
 <div class="wiki-content maincontent"><h2>Shared File System Master Slave</h2><p>If you have a SAN or shared file system, it can be used to provide <em>high availability</em> such that if a broker is killed, another broker can take over immediately.</p><structured-macro ac:macro-id="85475cc1-b94a-45ca-901e-7c43a6763787" ac:name="warning" ac:schema-version="1"><parameter ac:name="title">Ensure your shared file locks work</parameter><rich-text-body><p>This failover system requires a distributed file system, such as a SAN, on which exclusive file locks work reliably. If you do not have one available, consider using <link><page ri:content-title="MasterSlave"></page></link> instead, which implements something similar but works on commodity hardware using local file systems, with ActiveMQ doing the replication.</p><structured-macro ac:macro-id="ac672a77-53f7-40b0-8cf1-4074d0cbf732" ac:name="note" ac:schema-version="1"><parameter ac:name="title">OCFS2 Warning</parameter><rich-text-body><p>In testing with OCFS2, both brokers thought they had the master lock. This is because "OCFS2 only supports locking with 'fcntl' and not 'lockf and flock', therefore mutex file locking from Java isn't supported."</p><p>From <a shape="rect" class="external-link" href="http://sources.redhat.com/cluster/faq.html#gfs_vs_ocfs2" rel="nofollow">http://sources.redhat.com/cluster/faq.html#gfs_vs_ocfs2</a>:<br clear="none"> OCFS2: No cluster-aware flock or POSIX locks<br clear="none"> GFS: fully supports Cluster-wide flocks and POSIX locks and is supported.<br clear="none"> See this JIRA for more discussion: <a shape="rect" class="external-link" href="https://issues.apache.org/jira/browse/AMQ-4378">https://issues.apache.org/jira/browse/AMQ-4378</a></p></rich-text-body></structured-macro><structured-macro ac:macro-id="26037037-9d93-4407-807d-d8bb080027b9" ac:name="note" ac:schema-version="1"><parameter ac:name="title">NFSv3 Warning</parameter><rich-text-body><p>In the event of an abnormal NFSv3 client termination (i.e., the ActiveMQ master broker), the NFSv3 server will not time out the lock that is held by that client. This effectively renders the ActiveMQ data directory inaccessible because the ActiveMQ slave broker can't acquire the lock and therefore cannot start up. With NFSv3, the only way out of this predicament is to reboot all ActiveMQ instances to reset everything.</p><p>Using NFSv4 avoids the problem because its design includes timeouts for locks. With NFSv4, if the client holding the lock terminates abnormally, the lock is by design released after 30 seconds, allowing another client to grab it. For more information about this, see <a shape="rect" href="http://blogs.netapp.com/eislers_nfs_blog/2008/07/part-i-since-nf.html">this blog entry</a>.</p></rich-text-body></structured-macro></rich-text-body></structured-macro><p>You can run as many brokers as you wish from the same shared file system directory. The first broker to grab the exclusive lock on the file is the master broker. If that broker dies and releases the lock, another broker takes over. The slave brokers sit in a loop trying to grab the lock from the master broker.</p><p>The following example shows how to configure a broker for Shared File System Master Slave, where <strong>/sharedFileSystem</strong> is some directory on a shared file system. It is just a case of configuring a file-based store to use a shared directory.</p><structured-macro ac:macro-id="a9501ea9-3a9e-4a57-af82-e214b13403ef" ac:name="code" ac:schema-version="1"><plain-text-body>    &lt;persistenceAdapter&gt;
      &lt;kahaDB directory="/sharedFileSystem/sharedBrokerData"/&gt;
    &lt;/persistenceAdapter&gt;
 </plain-text-body></structured-macro><p>or:</p><structured-macro ac:macro-id="4ebd665a-ddb9-482b-824e-b47910adf715" ac:name="code" ac:schema-version="1"><plain-text-body>    &lt;persistenceAdapter&gt;
      &lt;levelDB directory="/sharedFileSystem/sharedBrokerData"/&gt;
    &lt;/persistenceAdapter&gt;
 </plain-text-body></structured-macro><p>or:</p><structured-macro ac:macro-id="a610da29-a160-4831-902b-96258209f92a" ac:name="code" ac:schema-version="1"><plain-text-body>    &lt;persistenceAdapter&gt;
      &lt;amqPersistenceAdapter directory="/sharedFileSystem/sharedBrokerData"/&gt;
    &lt;/persistenceAdapter&gt;
 </plain-text-body></structured-macro><h3>Startup</h3><p>On startup, one master grabs an exclusive lock on the broker file directory; all other brokers are slaves and pause, waiting for the exclusive lock.</p><p><image><attachment ri:filename="Startup.png"></attachment></image></p><p>Clients should use the <link><page ri:content-title="Failover Transport Reference"></page><plain-text-link-body>Failover Transport</plain-text-link-body></link> to connect to the available brokers, for example with a URL like the following:</p><structured-macro ac:macro-id="3c51402a-0246-453c-9e5a-38fab6f25e75" ac:name="code" ac:schema-version="1"><plain-text-body>failover:(tcp://broker1:61616,tcp://broker2:61616,tcp://broker3:61616)
 </plain-text-body></structured-macro><p>Only the master broker starts up its transport connectors, so clients can only connect to the master.</p><h3>Master failure</h3><p>If the master loses the exclusive lock, it immediately shuts down. If a master shuts down or fails, one of the slaves grabs the lock, and the topology switches to that shown in the following diagram:</p><p><image><attachment ri:filename="MasterFailed.png"></attachment></image></p><p>One of the other slaves immediately grabs the exclusive lock on the file system and then commences becoming the master, starting all of its transport connectors.</p><p>Clients lose their connection to the stopped master, and the failover transport then tries to connect to the available brokers, of which the only one available is the new master.</p><h3>Master restart</h3><p>At any time you can restart other brokers, which join the cluster and start as slaves waiting to become the master if the master is shut down or a failure occurs. The following topology is created after a restart of an old master:</p><p><image><attachment ri:filename="MasterRestarted.png"></attachment></image></p><h3>Scheduler Support</h3><p>ActiveMQ maintains information about schedules independently of the settings in the persistence adapter. With a shared file system it is therefore important to tell ActiveMQ explicitly where to store scheduler information. To do this, set the&#160;<code>dataDirectory</code> attribute on the&#160;<code>broker</code>, for example:</p><structured-macro ac:macro-id="4f436336-254b-4003-8118-ee80b6f4c689" ac:name="code" ac:schema-version="1"><parameter ac:name="language">xml</parameter><plain-text-body>&lt;broker xmlns="http://activemq.apache.org/schema/core"
 dataDirectory="/some/location"
 brokerName="mmuserb2" useJmx="true" advisorySupport="false"
 persistent="true" deleteAllMessagesOnStartup="false"
 useShutdownHook="false" schedulerSupport="true"&gt;</plain-text-body></structured-macro></div>
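<p>The lock-polling behaviour described on this page (the first broker to obtain the exclusive lock becomes the master, and the others loop until they can obtain it) can be illustrated with plain <code>java.nio</code> file locks. The following is a minimal sketch only and is not ActiveMQ's actual locker code: the class name <code>SharedLockSketch</code>, the method <code>acquireMasterLock</code>, the lock file and the retry interval are all hypothetical names chosen for illustration. It can also serve as a rough check of whether exclusive locks behave as expected on a given shared file system when run from two machines against the same file.</p>

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of the master/slave lock loop; not ActiveMQ code.
public class SharedLockSketch {

    /**
     * Polls until an exclusive lock on lockFile is obtained, then returns it.
     * The caller plays the "slave" role while this method loops and the
     * "master" role once it returns.
     */
    static FileLock acquireMasterLock(Path lockFile, long retryMillis)
            throws IOException, InterruptedException {
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        while (true) {
            try {
                // tryLock() returns null when another process holds the lock.
                FileLock lock = channel.tryLock();
                if (lock != null) {
                    return lock;
                }
            } catch (OverlappingFileLockException e) {
                // The lock is already held inside this JVM; keep waiting.
            }
            Thread.sleep(retryMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a temp file stands in for a lock file on the shared directory.
        Path lockFile = Files.createTempFile("broker", ".lock");
        FileLock lock = acquireMasterLock(lockFile, 100);
        System.out.println("lock valid: " + lock.isValid());
        lock.release();          // a master that shuts down releases the lock
        lock.channel().close();
    }
}
```

<p>In a real deployment the path would point at a file inside the shared directory rather than a temporary file. Whether the lock is truly exclusive across machines depends on the file system, as the OCFS2 and NFSv3 warnings above describe.</p>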