<div class="docbook"><div class="navheader"><table summary="Navigation header" width="100%"><tr><th align="center" colspan="3">10.4. Behaviour of the Group</th></tr><tr><td align="left" width="20%"><a accesskey="p" href="Java-Broker-High-Availability-CreatingGroup.html">Prev</a> </td><th align="center" width="60%">Chapter 10. High Availability</th><td align="right" width="20%"> <a accesskey="n" href="Java-Broker-High-Availability-NodeOperations.html">Next</a></td></tr></table><hr /></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="Java-Broker-High-Availability-Behaviour"></a>10.4. Behaviour of the Group</h2></div></div></div><p>This section first describes the behaviour of the group in its default configuration. It | |
then goes on to talk about the various controls that are available to override it. It | |
describes the controls available that affect the <a class="link" href="http://en.wikipedia.org/wiki/ACID#Durability" target="_top">durability</a> of transactions and | |
the data consistency between the master and replicas and thus make trade offs between | |
performance and reliability.</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="Java-Broker-High-Availability-Behaviour-Default-Behaviour"></a>10.4.1. Default Behaviour</h3></div></div></div><p>Let's first look at the behaviour of a group in default configuration.</p><p>In the default configuration, for any messaging work to be done, there must be at least | |
<span class="emphasis"><em>quorum</em></span> nodes present. This means for example, in a three node group, | |
this means there must be at least two nodes available.</p><p>When a messaging client sends a transaction, it can be assured that, before the control | |
returns back to his application after the commit call that the following is true:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>At the master, the transaction is <span class="emphasis"><em>written to disk and OS level caches | |
are flushed</em></span> meaning the data is on the storage device.</p></li><li class="listitem"><p>At least quorum minus 1 replicas, <span class="emphasis"><em>acknowledge the receipt of | |
transaction</em></span>. The replicas will write the data to the storage device | |
sometime later.</p></li></ul></div><p>If there were to be a master failure immediately after the transaction was committed, | |
the transaction would be held by at least quorum minus one replicas. For example, if we had | |
a group of three, then we would be assured that at least one replica held the | |
transaction.</p><p>In the event of a master failure, if quorum nodes remain, those nodes hold an election. | |
The nodes will elect master the node with the most recent transaction. If two or more nodes | |
have the most recent transaction the group makes an arbitrary choice. If quorum number of | |
nodes does not remain, the nodes cannot elect a new master and will wait until nodes rejoin. | |
You will see later that manual controls are available allow service to be restored from | |
fewer than quorum nodes and to influence which node gets elected in the event of a | |
tie.</p><p>Whenever a group has fewer than quorum nodes present, the virtualhost will be | |
unavailable and messaging connections will be refused. If quorum disappears at the very | |
moment a messaging client sends a transaction that transaction will fail.</p><p>You will have noticed the difference in the synchronization policies applied the master | |
and the replicas. The replicas send the acknowledgement back before the data is written to | |
disk. The master synchronously writes the transaction to storage. This is an example of a | |
trade off between durability and performance. We will see more about how to control this | |
trade off later.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="Java-Broker-High-Availability-Behaviour-SynchronizationPolicy"></a>10.4.2. Synchronization Policy</h3></div></div></div><p>The <span class="emphasis"><em>synchronization policy</em></span> dictates what a node must do when it | |
receives a transaction before it acknowledges that transaction to the rest of the | |
group.</p><p>The following options are available: </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>SYNC</em></span>. The node must write the transaction to disk and flush | |
any OS level buffers before sending the acknowledgement. SYNC is offers the highest | |
durability but offers the least performance.</p></li><li class="listitem"><p><span class="emphasis"><em>WRITE_NO_SYNC</em></span>. The node must write the transaction to disk | |
before sending the acknowledgement. OS level buffers will be flush as some point | |
later. This typically provides an assurance against failure of the application but not | |
the operating system or hardware.</p></li><li class="listitem"><p><span class="emphasis"><em>NO_SYNC</em></span>. The node immediately sends the acknowledgement. The | |
transaction will be written and OS level buffers flushed as some point later. NO_SYNC | |
offers the highest performance but the lowest durability level. This synchronization | |
policy is sometimes known as <span class="emphasis"><em>commit to the network</em></span>.</p></li></ul></div><p>It is possible to assign a one policy to the master and a different policy to the | |
replicas. These are configured as <a class="link" href="Java-Broker-Management-Managing-Virtualhosts.html#Java-Broker-Management-Managing-Virtualhost-Attributes" title="7.4.3. Attributes">attributes on the | |
virtualhost</a>. By default the master uses <span class="emphasis"><em>SYNC</em></span> and replicas use | |
<span class="emphasis"><em>NO_SYNC</em></span>.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="Java-Broker-High-Availability-Behaviour-NodePriority"></a>10.4.3. Node Priority</h3></div></div></div><p>Node priority can be used to influence the behaviour of the election algorithm. It is | |
useful in the case were you want to favour some nodes over others. For instance, if you wish | |
to favour nodes located in a particular data centre over those in a remote site. </p><p>The following options are available: </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Highest</em></span>. Nodes with this priority will be more favoured. In | |
the event of two or more nodes having the most recent transaction, the node with this | |
priority will be elected master. If two or more nodes have this priority the algorithm | |
will make an arbitrary choice.</p></li><li class="listitem"><p><span class="emphasis"><em>High</em></span>. Nodes with this priority will be favoured but not as | |
much so as those with Highest.</p></li><li class="listitem"><p><span class="emphasis"><em>Normal</em></span>. This is default election priority.</p></li><li class="listitem"><p><span class="emphasis"><em>Never</em></span>. The node will never be elected <span class="emphasis"><em>even if the | |
node has the most recent transaction</em></span>. The node will still keep up to date | |
with the replication stream and will still vote itself, but can just never be | |
elected.</p></li></ul></div><p> | |
</p><p>Node priority is configured as an <a class="link" href="Java-Broker-Management-Managing-Virtualhost-Nodes.html#Java-Broker-Management-Managing-Virtualhost-Nodes-Attributes" title="7.3.2. Attributes">attribute on the | |
virtualhost node</a> and can be changed at runtime and is effective immediately.</p><div class="important" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Important</h3><p>Use of the Never priority can lead to transaction loss. For example, consider a group | |
of three where replica-2 is marked as Never. If a transaction were to arrive and it be | |
acknowledged only by Master and Replica-2, the transaction would succeed. Replica 1 is | |
running behind for some reason (perhaps a full-GC). If a Master failure were to occur at | |
that moment, the replicas would elect Replica-1 even though Replica-2 had the most recent | |
transaction.</p><p>Transaction loss is reported by message <a class="link" href="Java-Broker-Appendix-Operation-Logging.html#Java-Broker-Appendix-Operation-Logging-Message-HA-1014">HA-1014</a>.</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="Java-Broker-High-Availability-Behaviour-MinimumNumberOfNodes"></a>10.4.4. Required Minimum Number Of Nodes</h3></div></div></div><p>This controls the required minimum number of nodes to complete a transaction and to | |
elect a new master. By default, the required number of nodes is set to | |
<span class="emphasis"><em>Default</em></span> (which signifies quorum).</p><p>It is possible to reduce the required minimum number of nodes. The rationale for doing | |
this is normally to temporarily restore service from fewer than quorum nodes following an | |
extraordinary failure.</p><p>For example, consider a group of three. If one node were to fail, as quorum still | |
remained, the system would continue work without any intervention. If the failing node were | |
the master, a new master would be elected.</p><p>What if a further node were to fail? Quorum no longer remains, and the remaining node | |
would just wait. It cannot elect itself master. What if we wanted to restore service from | |
just this one node?</p><p>In this case, Required Number of Nodes can be reduced to 1 on the remain node, allowing | |
the node to elect itself and service to be restored from the singleton. Required minimum | |
number of nodes is configured as an <a class="link" href="Java-Broker-Management-Managing-Virtualhost-Nodes.html#Java-Broker-Management-Managing-Virtualhost-Nodes-Attributes" title="7.3.2. Attributes">attribute on the | |
virtualhost node</a> and can be changed at runtime and is effective immediately.</p><div class="important" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Important</h3><p>The attribute must be used cautiously. Careless use will lead to lost transactions and | |
can lead to a <a class="link" href="http://en.wikipedia.org/wiki/Split-brain_(computing)" target="_top">split-brain</a> in the event of a network partition. If used to temporarily restore | |
service from fewer than quorum nodes, it is <span class="emphasis"><em>imperative</em></span> to revert it | |
to the Default value as the failed nodes are restored.</p><p>Transaction loss is reported by message <a class="link" href="Java-Broker-Appendix-Operation-Logging.html#Java-Broker-Appendix-Operation-Logging-Message-HA-1014">HA-1014</a>.</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="Java-Broker-High-Availability-Behaviour-DesignatedPrimary"></a>10.4.5. Allow to Operate Solo</h3></div></div></div><p>This attribute only applies to groups containing exactly two nodes.</p><p> In a group of two, if a node were to fail then in default configuration work will cease | |
as quorum no longer exists. A single node cannot elect itself master. </p><p>The <code class="literal">allow to operate solo</code> flag allows a node in a two node group to elect itself master and | |
to operate sole. It is configured as an <a class="link" href="Java-Broker-Management-Managing-Virtualhost-Nodes.html#Java-Broker-Management-Managing-Virtualhost-Nodes-Attributes" title="7.3.2. Attributes">attribute on the | |
virtualhost node</a> and can be changed at runtime and is effective immediately.</p><p>For example, consider a group of two where the master fails. Service will be interrupted | |
as the remaining node cannot elect itself master. To allow it to become master, apply the | |
<code class="literal">allow to operate solo</code> flag to it. It will elect itself master and work can continue, albeit | |
from one node.</p><div class="important" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Important</h3><p>It is imperative not to allow the <code class="literal">allow to operate solo</code> flag to be set on both nodes at once. To | |
do so will mean, in the event of a network partition, a <a class="link" href="http://en.wikipedia.org/wiki/Split-brain_(computing)" target="_top">split-brain</a> will | |
occur.</p><p>Transaction loss is reported by message <a class="link" href="Java-Broker-Appendix-Operation-Logging.html#Java-Broker-Appendix-Operation-Logging-Message-HA-1014">HA-1014</a>.</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="Java-Broker-High-Availability-Behaviour-MaximumMessageSize"></a>10.4.6. Maximum message size</h3></div></div></div><p> | |
Internally, BDB JE HA restricts the maximum size of replication stream records passed from the master | |
to the replica(s). This helps prevent DOS attacks. | |
If expected application maximum message size is greater than 5MB, the BDB JE setting | |
<code class="literal">je.rep.maxMessageSize</code> and Qpid context variable <code class="literal">qpid.max_message_size</code> | |
needs to be adjusted to reflect this in order to avoid running into the BDB HA JE limit. | |
</p></div></div><div class="navfooter"><hr /><table summary="Navigation footer" width="100%"><tr><td align="left" width="40%"><a accesskey="p" href="Java-Broker-High-Availability-CreatingGroup.html">Prev</a> </td><td align="center" width="20%"><a accesskey="u" href="Java-Broker-High-Availability.html">Up</a></td><td align="right" width="40%"> <a accesskey="n" href="Java-Broker-High-Availability-NodeOperations.html">Next</a></td></tr><tr><td align="left" valign="top" width="40%">10.3. Creating a group </td><td align="center" width="20%"><a accesskey="h" href="Apache-Qpid-Broker-J-Book.html">Home</a></td><td align="right" valign="top" width="40%"> 10.5. Node Operations</td></tr></table></div></div> |