 <?xml version="1.0" encoding="utf-8"?>
 <!--

 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.

 -->

 <section id="chap-Messaging_User_Guide-Active_Passive_Cluster">

   <title>Active-passive Messaging Clusters</title>

   <section>
     <title>Overview</title>
     <para>

       The High Availability (HA) module provides
       <firstterm>active-passive</firstterm>, <firstterm>hot-standby</firstterm>
       messaging clusters for fault-tolerant message delivery.
     </para>
     <para>
       In an active-passive cluster only one broker, known as the
       <firstterm>primary</firstterm>, is active and serving clients at a time. The other
       brokers are standing by as <firstterm>backups</firstterm>. Changes on the primary
       are replicated to all the backups so they are always up-to-date or "hot". Backup
       brokers reject client connection attempts, to enforce the requirement that clients
       only connect to the primary.
     </para>
     <para>
       If the primary fails, one of the backups is promoted to take over as the new
       primary. Clients fail-over to the new primary automatically. If there are multiple
       backups, the other backups also fail-over to become backups of the new primary.
     </para>
     <para>
       This approach relies on an external <firstterm>cluster resource manager</firstterm>
       to detect failures, choose the new primary and handle network partitions. <ulink
       url="https://fedorahosted.org/cluster/wiki/RGManager">Rgmanager</ulink> is supported
       initially, but others may be supported in the future.
     </para>
     <section>
       <title>Avoiding message loss</title>
       <para>
 	In order to avoid message loss, the primary broker <emphasis>delays
 	acknowledgment</emphasis> of messages received from clients until the message has
 	been replicated to and acknowledged by all of the back-up brokers. This means that
 	all <emphasis>acknowledged</emphasis> messages are safely stored on all the backup
 	brokers.
       </para>
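       <para>
	The following Python sketch is a toy model of delayed acknowledgment, not the
	broker's implementation. The class and method names are hypothetical; the
	sketch only illustrates that the client is acknowledged once every backup has
	acknowledged the message.
       </para>
       <programlisting><![CDATA[
class DelayedAckModel:
    """Toy model: acknowledge a message only after every backup has it."""

    def __init__(self, backups):
        self.backups = set(backups)
        self.waiting = {}  # message id -> backups that have not acknowledged yet

    def receive(self, msg_id):
        """Record a message from a client; return True if it can be acknowledged now."""
        self.waiting[msg_id] = set(self.backups)
        return self._complete(msg_id)

    def backup_ack(self, backup, msg_id):
        """Record a backup's acknowledgment; return True once the client can be acknowledged."""
        if msg_id not in self.waiting:
            return False  # unknown or already acknowledged to the client
        self.waiting[msg_id].discard(backup)
        return self._complete(msg_id)

    def _complete(self, msg_id):
        if self.waiting[msg_id]:
            return False  # at least one backup is still outstanding
        del self.waiting[msg_id]
        return True
]]></programlisting>
       <para>
	With two backups, <literal>receive</literal> returns false and only the second
	call to <literal>backup_ack</literal> for that message returns true.
       </para>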
       <para>
 	Clients keep <emphasis>unacknowledged</emphasis> messages in a buffer
 	<footnote>
 	  <para>
 	    You can control the maximum number of messages in the buffer by setting the
 	    client's <literal>capacity</literal>. For details of how to set the capacity
 	    in client code see &#34;Using the Qpid Messaging API&#34; in
 	    <citetitle>Programming in Apache Qpid</citetitle>.
 	  </para>
 	</footnote>
 	until they are acknowledged by the primary. If the primary fails, clients will
 	fail-over to the new primary and <emphasis>re-send</emphasis> all their
 	unacknowledged messages.
 	<footnote>
 	  <para>
 	  Clients must use "at-least-once" reliability to enable re-send of unacknowledged
 	  messages. This is the default behavior; no options need to be set to enable it. For
 	  details of client addressing options see &#34;Using the Qpid Messaging API&#34;
 	  in <citetitle>Programming in Apache Qpid</citetitle>.
 	  </para>
 	</footnote>
       </para>
       <para>
 	  So if the primary crashes, all the <emphasis>acknowledged</emphasis>
 	  messages will be available on the backup that takes over as the new
 	  primary. The <emphasis>unacknowledged</emphasis> messages will be
 	  re-sent by the clients.  Thus no messages are lost.
       </para>
       <para>
 	Note that this means it is possible for messages to be
 	<emphasis>duplicated</emphasis>. In the event of a failure it is possible for a
 	message to be received by the backup that becomes the new primary
 	<emphasis>and</emphasis> re-sent by the client.  The application must take steps
 	to identify and eliminate duplicates.
       </para>
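       <para>
	One possible way to eliminate duplicates is sketched below in Python. It
	assumes that the sending application attaches a unique identifier to every
	message, which is an application convention and not something the cluster
	provides. The class name and the <literal>max_ids</literal> limit are
	hypothetical.
       </para>
       <programlisting><![CDATA[
from collections import OrderedDict


class DuplicateFilter:
    """Remember recently seen message identifiers and flag repeats."""

    def __init__(self, max_ids=10000):
        self.max_ids = max_ids
        self.seen = OrderedDict()  # identifiers in arrival order

    def is_new(self, msg_id):
        """Return True the first time msg_id is seen, False for a duplicate."""
        if msg_id in self.seen:
            return False
        self.seen[msg_id] = True
        if len(self.seen) > self.max_ids:
            self.seen.popitem(last=False)  # forget the oldest identifier
        return True
]]></programlisting>
       <para>
	A receiver would call <literal>is_new</literal> for each message and skip
	processing when it returns false. The bounded history keeps memory use fixed,
	at the cost of missing a duplicate that arrives after its identifier has been
	forgotten.
       </para>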
       <para>
 	When a new primary is promoted after a fail-over it is initially in
 	"recovering" mode. In this mode, it delays acknowledgment of messages
 	on behalf of all the backups that were connected to the previous
 	primary. This protects those messages against a failure of the new
 	primary until the backups have a chance to connect and catch up.
       </para>
       <para>
 	Not all messages need to be replicated to the back-up brokers. If a
 	message is consumed and acknowledged by a regular client before it has
 	been replicated to a backup, then it doesn't need to be replicated.
       </para>
       <variablelist>
 	<title>Status of a HA broker</title>
 	<varlistentry>
 	  <term>Joining</term>
 	  <listitem>
 	    <para>
 	      Initial status of a new broker that has not yet connected to the primary.
 	    </para>
 	  </listitem>
 	</varlistentry>
 	<varlistentry>
 	  <term>Catch-up</term>
 	  <listitem>
 	    <para>
 	      A backup broker that is connected to the primary and catching up
 	      on queues and messages.
 	    </para>
 	  </listitem>
 	</varlistentry>
 	<varlistentry>
 	  <term>Ready</term>
 	  <listitem>
 	    <para>
 	      A backup broker that is fully caught-up and ready to take over as
 	      primary.
 	    </para>
 	  </listitem>
 	</varlistentry>
 	<varlistentry>
 	  <term>Recovering</term>
 	  <listitem>
 	    <para>
 	      The newly-promoted primary, waiting for backups to connect and catch up.
 	    </para>
 	  </listitem>
 	</varlistentry>
 	<varlistentry>
 	  <term>Active</term>
 	  <listitem>
 	    <para>
 	      The active primary broker with all backups connected and caught-up.
 	    </para>
 	  </listitem>
 	</varlistentry>
       </variablelist>
     </section>
     <section>
       <title>Limitations</title>
       <para>
 	There are some known limitations in the current implementation. These
 	will be fixed in future versions.
       </para>
       <itemizedlist>
 	<listitem>
 	  <para>
 	    Transactional changes to queue state are not replicated atomically. If
 	    the primary crashes during a transaction, it is possible that the
 	    backup could contain only part of the changes introduced by a
 	    transaction.
 	  </para>
 	</listitem>
 	<listitem>
 	  <para>
 	    Configuration changes (creating or deleting queues, exchanges and
 	    bindings) are replicated asynchronously. Management tools used to
 	    make changes will consider the change complete when it is complete
 	    on the primary, although it may not yet be replicated to all the backups.
 	  </para>
 	</listitem>
 	<listitem>
 	  <para>
 	    Federated links <emphasis>from</emphasis> the primary will be lost
 	    in a fail-over; they will not be re-connected to the new
 	    primary. Federated links <emphasis>to</emphasis> the primary will
 	    fail over.
 	  </para>
 	</listitem>
       </itemizedlist>
     </section>
   </section>

   <section>
     <title>Virtual IP Addresses</title>
     <para>
       Some resource managers (including <command>rgmanager</command>) support
       <firstterm>virtual IP addresses</firstterm>. A virtual IP address is an IP
       address that can be relocated to any of the nodes in a cluster.  The
       resource manager associates this address with the primary node in the
       cluster, and relocates it to the new primary when there is a failure. This
       simplifies configuration as you can publish a single IP address rather
       than a list.
     </para>
     <para>
       A virtual IP address can be used by clients and backup brokers to connect
       to the primary. The following sections will explain how to configure
       virtual IP addresses for clients or brokers.
     </para>
   </section>

   <section>
     <title>Configuring the Brokers</title>
     <para>
       The broker must load the <filename>ha</filename> module, which is loaded by
       default. The following broker options are available for the HA module.
     </para>
     <table frame="all" id="ha-broker-options">
       <title>Broker Options for High Availability Messaging Cluster</title>
       <tgroup align="left" cols="2" colsep="1" rowsep="1">
 	<colspec colname="c1"/>
 	<colspec colname="c2"/>
 	<thead>
 	  <row>
 	    <entry align="center" nameend="c2" namest="c1">
 	      Options for High Availability Messaging Cluster
 	    </entry>
 	  </row>
 	</thead>
 	<tbody>
 	  <row>
 	    <entry>
 	      <literal>ha-cluster <replaceable>yes|no</replaceable></literal>
 	    </entry>
 	    <entry>
 	      Set to "yes" to have the broker join a cluster.
 	    </entry>
 	  </row>
 	  <row>
 	    <entry>
 	      <literal>ha-queue-replication <replaceable>yes|no</replaceable></literal>
 	    </entry>
 	    <entry>
 	      Enable replication of specific queues without joining a cluster, see <xref linkend="ha-queue-replication"/>.
 	    </entry>
 	  </row>
 	  <row>
 	    <entry>
 	      <literal>ha-brokers-url <replaceable>URL</replaceable></literal>
 	    </entry>
 	    <entry>
 	      <para>
 		The URL
 		<footnote id="ha-url-grammar">
 		  <para>
 		  The full format of the URL is given by this grammar:
 		  <programlisting>
 url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
 addr = tcp_addr / rdma_addr / ssl_addr / ...
 tcp_addr = ["tcp:"] host [":" port]
 rdma_addr = "rdma:" host [":" port]
 ssl_addr = "ssl:" host [":" port]
 		  </programlisting>
 		  </para>
 		</footnote>
 		used by cluster brokers to connect to each other. The URL should
 		contain a comma separated list of the broker addresses, rather than a
 		virtual IP address.
 	      </para>
 	    </entry>
 	  </row>
 	  <row>
 	    <entry><literal>ha-public-url <replaceable>URL</replaceable></literal> </entry>
 	    <entry>
 	      <para>
 		The URL <footnoteref linkend="ha-url-grammar"/> is advertised to
 		clients as the "known-hosts" for fail-over.  It can be a list or
 		a single virtual IP address. A virtual IP address is recommended.
 	      </para>
 	      <para>
 		Using this option you can put client and broker traffic on
 		separate networks, which is recommended.
 	      </para>
 	      <para>
 		Note: When HA clustering is enabled the broker option
 		<literal>known-hosts-url</literal> is ignored and over-ridden by
 		the <literal>ha-public-url</literal> setting.
 	      </para>
 	    </entry>
 	  </row>
 	  <row>
 	    <entry><literal>ha-replicate </literal><replaceable>VALUE</replaceable></entry>
 	    <entry>
 	      <para>
 		Specifies whether queues and exchanges are replicated by default.
 		<replaceable>VALUE</replaceable> is one of: <literal>none</literal>,
 		<literal>configuration</literal>, <literal>all</literal>.
 		For details see <xref linkend="ha-creating-replicated"/>.
 	      </para>
 	    </entry>
 	  </row>
 	  <row>
 	    <entry>
 	      <para><literal>ha-username <replaceable>USER</replaceable></literal></para>
 	      <para><literal>ha-password <replaceable>PASS</replaceable></literal></para>
 	      <para><literal>ha-mechanism <replaceable>MECH</replaceable></literal></para>
 	    </entry>
 	    <entry>
 	      Authentication settings used by HA brokers to connect to each other.
 	      If you are using authorization
 	      (<xref linkend="sect-Messaging_User_Guide-Security-Authorization"/>)
 	      then this user must have all permissions.
 	    </entry>
 	  </row>
 	  <row>
 	    <entry><literal>ha-backup-timeout <replaceable>SECONDS</replaceable></literal> </entry>
 	    <entry>
 	      <para>
 		Maximum time that a recovering primary will wait for an expected
 		backup to connect and become ready.
 	      </para>
 	    </entry>
 	  </row>
 	  <row>
 	    <entry><literal>link-maintenance-interval <replaceable>SECONDS</replaceable></literal></entry>
 	    <entry>
 	      <para>
 		Interval for the broker to check link health and re-connect links if need
 		be. If you want brokers to fail over quickly you can set this to a
 		fraction of a second, for example: 0.1.
 	      </para>
 	    </entry>
 	  </row>
 	  <row>
 	    <entry><literal>link-heartbeat-interval <replaceable>SECONDS</replaceable></literal></entry>
 	    <entry>
 	      <para>
 		Heartbeat interval for replication links. The link will be assumed broken
 		if there is no heartbeat for twice the interval.
 	      </para>
 	    </entry>
 	  </row>
 	</tbody>
       </tgroup>
     </table>
     <para>
       To configure a HA cluster you must set at least <literal>ha-cluster</literal> and
       <literal>ha-brokers-url</literal>.
     </para>
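     <para>
       As an illustration, the Python sketch below assembles a URL following the
       grammar given for <literal>ha-brokers-url</literal> and prints the two
       required settings in <filename>qpidd.conf</filename> form. The function
       names and the host names are assumptions made for the example.
     </para>
     <programlisting><![CDATA[
def broker_url(addresses, user=None, password=None):
    """Build a URL of the form ["amqp:"][user["/"password]"@"] addr ("," addr)*."""
    if not addresses:
        raise ValueError("at least one broker address is required")
    credentials = ""
    if user:
        credentials = user + ("/" + password if password else "") + "@"
    return "amqp:" + credentials + ",".join(addresses)


def minimal_ha_config(addresses):
    """Return the two settings that every HA cluster member needs."""
    return "ha-cluster=yes\nha-brokers-url=%s\n" % broker_url(addresses)


if __name__ == "__main__":
    # Host names are placeholders for the nodes of your own cluster.
    print(minimal_ha_config(["node1.example.com", "node2.example.com", "node3.example.com"]))
]]></programlisting>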
   </section>

   <section>
     <title>The Cluster Resource Manager</title>
     <para>
       Broker fail-over is managed by a <firstterm>cluster resource
       manager</firstterm>.  An integration with <ulink
       url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink> is
       provided, but it is possible to integrate with other resource managers.
     </para>
     <para>
       The resource manager is responsible for starting the <command>qpidd</command> broker
       on each node in the cluster. The resource manager then <firstterm>promotes</firstterm>
       one of the brokers to be the primary. The other brokers connect to the primary as
       backups, using the URL provided in the <literal>ha-brokers-url</literal> configuration
       option.
     </para>
     <para>
       Once connected, the backup brokers synchronize their state with the
       primary.  When a backup is synchronized, or "hot", it is ready to take
       over if the primary fails.  Backup brokers continually receive updates
       from the primary in order to stay synchronized.
     </para>
     <para>
       If the primary fails, backup brokers go into fail-over mode. The resource
       manager must detect the failure and promote one of the backups to be the
       new primary.  The other backups connect to the new primary and synchronize
       their state with it.
     </para>
     <para>
       The resource manager is also responsible for protecting the cluster from
       <firstterm>split-brain</firstterm> conditions resulting from a network partition.  A
       network partition divides a cluster into two sub-groups which cannot see each other.
       Usually a <firstterm>quorum</firstterm> voting algorithm is used that disables nodes
       in the inquorate sub-group.
     </para>
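     <para>
       The idea behind quorum voting can be sketched in a few lines of Python. This
       is a simple strict-majority rule with a hypothetical function name; a real
       resource manager makes this decision itself and may apply more elaborate
       rules.
     </para>
     <programlisting><![CDATA[
def has_quorum(visible_votes, total_votes):
    """Return True if a sub-group holds a strict majority of all votes."""
    if total_votes <= 0 or not 0 <= visible_votes <= total_votes:
        raise ValueError("votes must satisfy 0 <= visible <= total and total > 0")
    return 2 * visible_votes > total_votes
]]></programlisting>
     <para>
       In a 3 node cluster with one vote per node, a sub-group of two nodes is
       quorate and a single isolated node is not. With an even number of votes an
       equal split leaves neither side quorate, which is one reason to prefer an
       odd number of nodes.
     </para>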
   </section>

   <section>
     <title>Configuring <command>rgmanager</command> as resource manager</title>
     <para>
       This section assumes that you are already familiar with setting up and configuring
       clustered services using <command>cman</command> and
       <command>rgmanager</command>. It will show you how to configure an active-passive,
       hot-standby <command>qpidd</command> HA cluster with <command>rgmanager</command>.
     </para>
     <para>
       You must provide a <literal>cluster.conf</literal> file to configure
       <command>cman</command> and <command>rgmanager</command>.  Here is
       an example <literal>cluster.conf</literal> file for a cluster of 3 nodes named
       node1, node2 and node3. We will go through the configuration step-by-step.
     </para>
     <programlisting>
       <![CDATA[
 <?xml version="1.0"?>
 <!--
 This is an example of a cluster.conf file to run qpidd HA under rgmanager.
 This example assumes a 3 node cluster, with nodes named node1, node2 and node3.

 NOTE: fencing is not shown, you must configure fencing appropriately for your cluster.
 -->

 <cluster name="qpid-test" config_version="18">
   <!-- The cluster has 3 nodes. Each has a unique nodeid and one vote
        for quorum. -->
   <clusternodes>
     <clusternode name="node1.example.com" nodeid="1"/>
     <clusternode name="node2.example.com" nodeid="2"/>
     <clusternode name="node3.example.com" nodeid="3"/>
   </clusternodes>
   <!-- Resource Manager configuration. -->
   <rm>
     <!--
 	There is a failoverdomain for each node containing just that node.
 	This lets us stipulate that the qpidd service should always run on each node.
     -->
     <failoverdomains>
       <failoverdomain name="node1-domain" restricted="1">
 	<failoverdomainnode name="node1.example.com"/>
       </failoverdomain>
       <failoverdomain name="node2-domain" restricted="1">
 	<failoverdomainnode name="node2.example.com"/>
       </failoverdomain>
       <failoverdomain name="node3-domain" restricted="1">
 	<failoverdomainnode name="node3.example.com"/>
       </failoverdomain>
     </failoverdomains>

     <resources>
       <!-- This script starts a qpidd broker acting as a backup. -->
       <script file="/etc/init.d/qpidd" name="qpidd"/>

       <!-- This script promotes the qpidd broker on this node to primary. -->
       <script file="/etc/init.d/qpidd-primary" name="qpidd-primary"/>

       <!-- This is a virtual IP address for broker replication traffic. -->
       <ip address="20.0.10.200" monitor_link="1"/>

       <!-- This is a virtual IP address on a separate network for client traffic. -->
       <ip address="20.0.20.200" monitor_link="1"/>
     </resources>

     <!-- There is a qpidd service on each node, it should be restarted if it fails. -->
     <service name="node1-qpidd-service" domain="node1-domain" recovery="restart">
       <script ref="qpidd"/>
     </service>
     <service name="node2-qpidd-service" domain="node2-domain" recovery="restart">
       <script ref="qpidd"/>
     </service>
     <service name="node3-qpidd-service" domain="node3-domain"  recovery="restart">
       <script ref="qpidd"/>
     </service>

     <!-- There should always be a single qpidd-primary service, it can run on any node. -->
     <service name="qpidd-primary-service" autostart="1" exclusive="0" recovery="relocate">
       <script ref="qpidd-primary"/>
       <!-- The primary has the IP addresses for brokers and clients to connect. -->
       <ip ref="20.0.10.200"/>
       <ip ref="20.0.20.200"/>
     </service>
   </rm>
 </cluster>
       ]]>
     </programlisting>

     <para>
       There is a <literal>failoverdomain</literal> for each node containing just that
       one node.  This lets us stipulate that the qpidd service should always run on all
       nodes.
     </para>
     <para>
       The <literal>resources</literal> section defines the <command>qpidd</command>
       script used to start the <command>qpidd</command> service. It also defines the
       <command>qpidd-primary</command> script, which does not
       actually start a new service; rather, it promotes the existing
       <command>qpidd</command> broker to primary status.
     </para>
     <para>
       The <literal>resources</literal> section also defines a pair of virtual IP
       addresses on different sub-nets. One will be used for broker-to-broker
       communication, the other for client-to-broker.
     </para>
     <para>
       To take advantage of the virtual IP addresses, <filename>qpidd.conf</filename>
       should contain these  lines:
     </para>
     <programlisting>
       ha-cluster=yes
       ha-brokers-url=20.0.10.200
       ha-public-url=20.0.20.200
     </programlisting>
     <para>
       This configuration specifies that backup brokers will use 20.0.10.200
       to connect to the primary and will advertise 20.0.20.200 to clients.
       Clients should connect to 20.0.20.200.
     </para>
     <para>
       The <literal>service</literal> section defines 3 <literal>qpidd</literal>
       services, one for each node. Each service is in a restricted fail-over
       domain containing just that node, and has the <literal>restart</literal>
       recovery policy. The effect of this is that rgmanager will run
       <command>qpidd</command> on each node, restarting if it fails.
     </para>
     <para>
       There is a single <literal>qpidd-primary-service</literal> using the
       <command>qpidd-primary</command> script which is not restricted to a
       domain and has the <literal>relocate</literal> recovery policy. This means
       rgmanager will start <command>qpidd-primary</command> on one of the nodes
       when the cluster starts and will relocate it to another node if the
       original node fails. Running the <literal>qpidd-primary</literal> script
       does not start a new broker process; it promotes the existing broker to
       become the primary.
     </para>
   </section>

   <section>
     <title>Broker Administration Tools</title>
     <para>
       Normally, clients are not allowed to connect to a backup broker. However,
       management tools are allowed to connect to backup brokers. If you use
       these tools you <emphasis>must not</emphasis> add or remove messages from
       replicated queues, nor create or delete replicated queues or exchanges, as
       this will disrupt the replication process and may cause message loss.
     </para>
     <para>
       <command>qpid-ha</command> allows you to view and change HA configuration settings.
     </para>
     <para>
       The tools <command>qpid-config</command>, <command>qpid-route</command> and
       <command>qpid-stat</command> will connect to a backup if you pass the flag <command>--ha-admin</command> on the
       command line.
     </para>
   </section>

   <section id="ha-creating-replicated">
     <title>Controlling replication of queues and exchanges</title>
     <para>
       By default, queues and exchanges are not replicated automatically. You can change
       the default behavior by setting the <literal>ha-replicate</literal> configuration
       option. It has one of the following values:
       <itemizedlist>
 	<listitem>
 	  <para>
 	    <firstterm>all</firstterm>: Replicate everything automatically: queues,
 	    exchanges, bindings and messages.
 	  </para>
 	</listitem>
 	<listitem>
 	  <para>
 	    <firstterm>configuration</firstterm>: Replicate the existence of queues,
 	    exchanges and bindings but don't replicate messages.
 	  </para>
 	</listitem>
 	<listitem>
 	  <para>
 	    <firstterm>none</firstterm>: Don't replicate anything. This is the default.
 	  </para>
 	</listitem>
       </itemizedlist>
     </para>
     <para>
       You can over-ride the default for a particular queue or exchange by passing the
       argument <literal>qpid.replicate</literal> when creating the queue or exchange. It
       takes the same values as <literal>ha-replicate</literal>.
     </para>
     <para>
       Bindings are automatically replicated if the queue and exchange being bound both
       have replication <literal>all</literal> or <literal>configuration</literal>; they
       are not replicated otherwise.
     </para>
     <para>
       You can create replicated queues and exchanges with the
       <command>qpid-config</command> management tool like this:
     </para>
     <programlisting>
       qpid-config add queue myqueue --replicate all
     </programlisting>
     <para>
       To create replicated queues and exchanges via the client API, add a
       <literal>node</literal> entry to the address like this:
     </para>
     <programlisting>
       "myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}"
     </programlisting>
     <para>
       There are some built-in exchanges created automatically by the broker. These
       exchanges are never replicated. The built-in exchanges are the default (nameless)
       exchange, the AMQP standard exchanges (<literal>amq.direct, amq.topic, amq.fanout</literal> and
       <literal>amq.match</literal>) and the management exchanges (<literal>qpid.management, qmf.default.direct</literal> and
       <literal>qmf.default.topic</literal>).
     </para>
     <para>
       Note that if you bind a replicated queue to one of these exchanges, the
       binding will <emphasis>not</emphasis> be replicated, so the queue will not
       have the binding after a fail-over.
     </para>
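     <para>
       The replication rules in this section can be summarized as a small Python
       sketch. It is a restatement of the rules for illustration only, not broker
       code; the function names are hypothetical, and representing the nameless
       default exchange as an empty string is an assumption of the sketch.
     </para>
     <programlisting><![CDATA[
REPLICATED_LEVELS = {"all", "configuration"}

# Built-in exchanges are never replicated. "" stands for the default exchange.
BUILT_IN_EXCHANGES = {
    "", "amq.direct", "amq.topic", "amq.fanout", "amq.match",
    "qpid.management", "qmf.default.direct", "qmf.default.topic",
}


def effective_level(qpid_replicate=None, ha_replicate="none"):
    """A per-object qpid.replicate argument overrides the broker default."""
    return qpid_replicate if qpid_replicate is not None else ha_replicate


def binding_is_replicated(exchange_name, exchange_level, queue_level):
    """A binding is replicated only if both ends are replicated."""
    if exchange_name in BUILT_IN_EXCHANGES:
        return False
    return exchange_level in REPLICATED_LEVELS and queue_level in REPLICATED_LEVELS
]]></programlisting>
     <para>
       For example, a queue created with <literal>qpid.replicate</literal> set to
       <literal>all</literal> and bound to <literal>amq.topic</literal> keeps its
       messages after a fail-over but loses that binding.
     </para>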
   </section>

   <section>
     <title>Client Connection and Fail-over</title>
     <para>
       Clients can only connect to the primary broker. Backup brokers
       automatically reject any connection attempt by a client.
     </para>
     <para>
       Clients are configured with the URL for the cluster (details below for
       each type of client). There are two possibilities:
       <itemizedlist>
 	<listitem>
 	  <para>
 	    The URL contains multiple addresses, one for each broker in the cluster.
 	  </para>
 	</listitem>
 	<listitem>
 	  <para>
 	    The URL contains a single <firstterm>virtual IP address</firstterm>
 	    that is assigned to the primary broker by the resource manager.
 	    <footnote><para>Only if the resource manager supports virtual IP
 	    addresses.</para></footnote>
 	  </para>
 	</listitem>
       </itemizedlist>
       In the first case, clients will repeatedly re-try each address in the URL
       until they successfully connect to the primary. In the second case the
       resource manager will assign the virtual IP address to the primary broker,
       so clients only need to re-try on a single address.
     </para>
     <para>
       When the primary broker fails, clients re-try all known cluster addresses
       until they connect to the new primary.  The client re-sends any messages
       that were previously sent but not acknowledged by the broker at the time
       of the failure.  Similarly messages that have been sent by the broker, but
       not acknowledged by the client, are re-queued.
     </para>
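     <para>
       The retry behavior described above is built into the Qpid clients when
       reconnection is enabled, so you do not need to write it yourself. Purely to
       illustrate the idea, here is a Python 3 sketch in which
       <literal>connect</literal> is a hypothetical callable that either returns a
       connection or raises <literal>ConnectionError</literal>, as it would when a
       backup rejects the client. The parameter names and defaults are assumptions.
     </para>
     <programlisting><![CDATA[
import time


def connect_to_primary(addresses, connect, attempts=10, delay=1.0, sleep=time.sleep):
    """Try each address in turn, repeating until connect() succeeds."""
    for _ in range(attempts):
        for address in addresses:
            try:
                return connect(address)
            except ConnectionError:
                continue  # a backup rejected us, or the node is down
        sleep(delay)  # wait before the next pass over the list
    raise ConnectionError("no primary found at %s" % ", ".join(addresses))
]]></programlisting>
     <para>
       With a virtual IP address the list holds a single entry, and the loop
       reduces to re-trying that one address.
     </para>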
     <para>
       TCP can be slow to detect connection failures. A client can configure a
       connection to use a <firstterm>heartbeat</firstterm> to detect connection
       failure, and can specify a time interval for the heartbeat. If heartbeats
       are in use, failures will be detected no later than twice the heartbeat
       interval. The following sections explain how to enable heartbeat in each
       client.
     </para>
     <para>
       See &#34;Cluster Failover&#34; in <citetitle>Programming in Apache
       Qpid</citetitle> for details on how to keep the client aware of cluster
       membership.
     </para>
     <para>
       Suppose your cluster has 3 nodes: <literal>node1</literal>,
       <literal>node2</literal> and <literal>node3</literal> all using the
       default AMQP port, and you are not using a virtual IP address. To connect
       a client you need to specify the address(es) and set the
       <literal>reconnect</literal> property to <literal>true</literal>. The
       following sub-sections show how to connect each type of client.
     </para>
     <section>
       <title>C++ clients</title>
       <para>
 	With the C++ client, you specify multiple cluster addresses in a single URL
 	<footnote>
 	  <para>
 	    The full grammar for the URL is:
 	  </para>
 	  <programlisting>
 	    url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
 	    addr = tcp_addr / rdma_addr / ssl_addr / ...
 	    tcp_addr = ["tcp:"] host [":" port]
 	    rdma_addr = "rdma:" host [":" port]
 	    ssl_addr = "ssl:" host [":" port]
 	  </programlisting>
 	</footnote>
 	You also need to specify the connection option
 	<literal>reconnect</literal> to be true.  For example:
       </para>
       <programlisting>
 	qpid::messaging::Connection c("node1,node2,node3","{reconnect:true}");
       </programlisting>
       <para>
 	Heartbeats are disabled by default. You can enable them by specifying a
 	heartbeat interval (in seconds) for the connection via the
 	<literal>heartbeat</literal> option. For example:
 	<programlisting>
 	  qpid::messaging::Connection c("node1,node2,node3","{reconnect:true,heartbeat:10}");
 	</programlisting>
       </para>
     </section>
     <section>
       <title>Python clients</title>
       <para>
 	With the Python client, you specify <literal>reconnect=True</literal>
 	and a list of <replaceable>host:port</replaceable> addresses as
 	<literal>reconnect_urls</literal> when calling
 	<literal>Connection.establish</literal> or
 	<literal>Connection.open</literal>:
       </para>
       <programlisting>
 	connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"])
       </programlisting>
       <para>
 	Heartbeats are disabled by default. You can
 	enable them by specifying a heartbeat interval (in seconds) for the
 	connection via the <literal>heartbeat</literal> option. For example:
       </para>
       <programlisting>
 	connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"], heartbeat=10)
       </programlisting>
     </section>
     <section>
       <title>Java JMS Clients</title>
       <para>
 	In Java JMS clients, client fail-over is handled automatically if it is
 	enabled in the connection.  You can configure a connection to use
 	fail-over using the <command>failover</command> property:
       </para>

       <screen>
 	connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist=&#39;tcp://localhost:5672&#39;&amp;failover=&#39;failover_exchange&#39;
       </screen>
       <para>
 	This property can take three values:
       </para>
       <variablelist>
 	<title>Fail-over Modes</title>
 	<varlistentry>
 	  <term>failover_exchange</term>
 	  <listitem>
 	    <para>
 	      If the connection fails, fail over to any other broker in the cluster.
 	    </para>

 	  </listitem>

 	</varlistentry>
 	<varlistentry>
 	  <term>roundrobin</term>
 	  <listitem>
 	    <para>
 	      If the connection fails, fail over to one of the brokers specified in the <command>brokerlist</command>.
 	    </para>

 	  </listitem>

 	</varlistentry>
 	<varlistentry>
 	  <term>singlebroker</term>
 	  <listitem>
 	    <para>
 	      Fail-over is not supported; the connection is to a single broker only.
 	    </para>

 	  </listitem>

 	</varlistentry>

       </variablelist>
       <para>
 	In a Connection URL, heartbeat is set using the <command>idle_timeout</command> property, which is an integer corresponding to the heartbeat period in seconds. For instance, the following line from a JNDI properties file sets the heartbeat time out to 3 seconds:
       </para>

       <screen>
 	connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist=&#39;tcp://localhost:5672&#39;,idle_timeout=3
       </screen>
     </section>
   </section>

   <section>
     <title>Security</title>
     <para>
       You can secure your cluster using the authentication and authorization features
       described in <xref linkend="chap-Messaging_User_Guide-Security"/>.
     </para>
     <para>
       Backup brokers connect to the primary broker and subscribe for management
       events and queue contents. You can specify the identity used to connect
       to the primary with the following options:
     </para>
     <table frame="all" id="ha-broker-security-options">
       <title>Security options for High Availability Messaging Cluster</title>
       <tgroup align="left" cols="2" colsep="1" rowsep="1">
 	<colspec colname="c1" colwidth="1*"/>
 	<colspec colname="c2" colwidth="3*"/>
 	<thead>
 	  <row>
 	    <entry align="center" nameend="c2" namest="c1">
 	      Security options for High Availability Messaging Cluster
 	    </entry>
 	  </row>
 	</thead>
 	<tbody>
 	  <row>
 	    <entry>
 	      <para><literal>ha-username <replaceable>USER</replaceable></literal></para>
 	      <para><literal>ha-password <replaceable>PASS</replaceable></literal></para>
 	      <para><literal>ha-mechanism <replaceable>MECH</replaceable></literal></para>
 	    </entry>
 	    <entry>
 	      Authentication settings used by HA brokers to connect to each other.
 	      If you are using authorization
 	      (<xref linkend="sect-Messaging_User_Guide-Security-Authorization"/>)
 	      then this user must have all permissions.
 	    </entry>
 	  </row>
 	</tbody>
       </tgroup>
     </table>
     <para>
       This identity is also used to authorize actions taken on the backup broker to replicate
       from the primary, for example to create queues or exchanges.
     </para>
   </section>

   <section>
     <title>Integrating with other Cluster Resource Managers</title>
     <para>
       To integrate with a different resource manager you must configure it to:
       <itemizedlist>
 	<listitem><para>Start a qpidd process on each node of the cluster.</para></listitem>
 	<listitem><para>Restart qpidd if it crashes.</para></listitem>
 	<listitem><para>Promote exactly one of the brokers to primary.</para></listitem>
 	<listitem><para>Detect a failure and promote a new primary.</para></listitem>
       </itemizedlist>
     </para>
     <para>
       The <command>qpid-ha</command> command allows you to check if a broker is primary,
       and to promote a backup to primary.
     </para>
     <para>
       To test if a broker is the primary:
       <programlisting>
 	qpid-ha -b <replaceable>broker-address</replaceable> status --expect=primary
       </programlisting>
       This command will return 0 if the broker at <replaceable>broker-address</replaceable>
       is the primary, non-0 otherwise.
     </para>
     <para>
       To promote a broker to primary:
       <programlisting>
 	qpid-ha -b <replaceable>broker-address</replaceable> promote
       </programlisting>
     </para>
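     <para>
       A resource manager agent could wrap these two commands along the lines of
       the following Python sketch. The function names and the injectable
       <literal>run</literal> parameter are hypothetical. Treating a zero exit
       status from <command>qpid-ha promote</command> as success is an assumption
       of the sketch; only the exit status of the <literal>status</literal> check
       is described above.
     </para>
     <programlisting><![CDATA[
import subprocess


def is_primary(broker_address, run=subprocess.call):
    """Return True if qpid-ha reports that the broker is the primary."""
    command = ["qpid-ha", "-b", broker_address, "status", "--expect=primary"]
    return run(command) == 0


def promote(broker_address, run=subprocess.call):
    """Ask the broker to become primary; assumes exit status 0 means success."""
    command = ["qpid-ha", "-b", broker_address, "promote"]
    return run(command) == 0
]]></programlisting>
     <para>
       Passing a different <literal>run</literal> callable makes it possible to
       exercise the logic without a running broker.
     </para>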
     <para>
       <command>qpid-ha --help</command> gives information on other commands and options available.
       You can also use <command>qpid-ha</command> to manually examine and promote brokers. This
       can be useful for testing failover scenarios without having to set up a full resource manager,
       or to simulate a cluster on a single node. For deployment, a resource manager is required.
     </para>
   </section>
   <section id="ha-queue-replication">
     <title>Replicating specific queues</title>
     <para>
       In addition to the automatic replication performed in a cluster, you can
       set up replication for specific queues between arbitrary brokers, even if
       the brokers are not members of a cluster. The command:
     </para>
     <programlisting>
       qpid-ha replicate <replaceable>QUEUE</replaceable> <replaceable>REMOTE-BROKER</replaceable>
     </programlisting>
     <para>
     sets up replication of <replaceable>QUEUE</replaceable> on <replaceable>REMOTE-BROKER</replaceable> to <replaceable>QUEUE</replaceable> on the current broker.
     </para>
     <para>
       Set the configuration option
       <literal>ha-queue-replication=yes</literal> on both brokers to enable this
       feature on non-cluster brokers. It is automatically enabled for brokers
       that are part of a cluster.
     </para>
     <para>
       Note that this feature does not provide automatic fail-over; for that you
       need to run a cluster.
     </para>
   </section>
 </section>

 <!-- LocalWords:  scalability rgmanager multicast RGManager mailto LVQ qpidd IP dequeued Transactional username
 -->