guide/ops/high-availability/index.md - brooklyn-docs - Git at Google

 ---
 title: High Availability
 layout: website-normal
 children:
 - high-availability-supplemental.md
 ---

 Brooklyn will automatically run in HA mode if multiple Brooklyn instances are started
 pointing at the same persistence store.  One Brooklyn node (e.g. the first one started)
 is elected as HA master:  all *write operations* against Brooklyn entities, such as creating
 an application or invoking an effector, should be directed to the master.

 Once one node is running as `MASTER`, other nodes start in either `STANDBY` or `HOT_STANDBY` mode:

 * In `STANDBY` mode, a Brooklyn instance will monitor the master and will be a candidate
   to become `MASTER` should the master fail. Standby nodes do *not* attempt to rebind
   until they are elected master, so the state of existing entities is not available at
   the standby node.  However a standby server consumes very little resource until it is
   promoted.

 * In `HOT_STANDBY` mode, a Brooklyn instance will read and make available the live state of
   entities.  Thus a hot-standby node is available as a read-only copy.
   As with the standby node, if a hot-standby node detects that the master fails,
   it will be a candidate for promotion to master.

 * In `HOT_BACKUP` mode, a Brooklyn instance will read and make available the live state of
   entities, as a read-only copy. However this node is not able to become master,
   so it can safely be used to test compatibility across different versions.

 To explicitly specify what HA mode a node should be in, the following options are available
 for the config option `highAvailabilityMode` in [`org.apache.brooklyn.osgilauncher.cfg`](/guide/ops/paths.md):

 * `DISABLED`: management node works in isolation; it will not cooperate with any other standby/master nodes in management plane
 * `AUTO`: will look for other management nodes, and will allocate itself as standby or master based on other nodes' states
 * `MASTER`: will startup as master; if there is already a master then fails immediately
 * `STANDBY`: will start up as lukewarm standby; if there is not already a master then fails immediately
 * `HOT_STANDBY`: will start up as hot standby; if there is not already a master then fails immediately
 * `HOT_BACKUP`: will start up as hot backup; this can be done even if there is not already a master; this node will not be a master

 The REST API offers live detection and control of the HA mode,
 including setting priority to control which nodes will be promoted on master failure:

 * `/server/ha/state`: Returns the HA state of a management node (GET),
   or changes the state (POST)
 * `/server/ha/states`: Returns the HA states and detail for all nodes in a management plane
 * `/server/ha/priority`: Returns the HA node priority for MASTER failover (GET),
   or sets that priority (POST)

 Note that when POSTing to a non-master server it is necessary to pass a `Brooklyn-Allow-Non-Master-Access: true` header.
 For example, the following cURL command could be used to change the state of a `STANDBY` node on `localhost:8082` to `HOT_STANDBY`:

     curl -v -X POST -d mode=HOT_STANDBY -H "Brooklyn-Allow-Non-Master-Access: true" http://localhost:8082/v1/server/ha/state

 When running a single server, you can disable HA mode. You can recover from a failure
 by restarting the process or launching a replacement machine, pointing at the same
 persisted state. A single server running in HA mode will have the following differences
 in behaviour:

 * If you run Brooklyn and then kill it (e.g. `kill -9` or turn off the
   VM), when you start Brooklyn again it will wait to confirm the previous
   server is really dead. It waits for 30 seconds after the old server's last
   heartbeat, by default.
 * The HA status shows all previous runs of the Brooklyn server (it gets
   a new node-id each time it restarts). This list will get longer and
   longer if you keep restarting Brooklyn, while pointing at the same persisted
   state, until you clear out terminated instances from the list (via the
   UI or the REST API).
 * The logging at startup can be quite different (e.g. in HA mode, "Brooklyn
   initialisation (part two) complete" can mean that the server has finished
   becoming the 'standby'. Care should be taken if searching or parsing the logs.
	---
	title: High Availability
	layout: website-normal
	children:
	- high-availability-supplemental.md
	---

	Brooklyn will automatically run in HA mode if multiple Brooklyn instances are started
	pointing at the same persistence store. One Brooklyn node (e.g. the first one started)
	is elected as HA master: all write operations against Brooklyn entities, such as creating
	an application or invoking an effector, should be directed to the master.

	Once one node is running as `MASTER`, other nodes start in either `STANDBY` or `HOT_STANDBY` mode:

	* In `STANDBY` mode, a Brooklyn instance will monitor the master and will be a candidate
	to become `MASTER` should the master fail. Standby nodes do not attempt to rebind
	until they are elected master, so the state of existing entities is not available at
	the standby node. However a standby server consumes very little resource until it is
	promoted.

	* In `HOT_STANDBY` mode, a Brooklyn instance will read and make available the live state of
	entities. Thus a hot-standby node is available as a read-only copy.
	As with the standby node, if a hot-standby node detects that the master fails,
	it will be a candidate for promotion to master.

	* In `HOT_BACKUP` mode, a Brooklyn instance will read and make available the live state of
	entities, as a read-only copy. However this node is not able to become master,
	so it can safely be used to test compatibility across different versions.

	To explicitly specify what HA mode a node should be in, the following options are available
	for the config option `highAvailabilityMode` in [`org.apache.brooklyn.osgilauncher.cfg`](/guide/ops/paths.md):

	* `DISABLED`: management node works in isolation; it will not cooperate with any other standby/master nodes in management plane
	* `AUTO`: will look for other management nodes, and will allocate itself as standby or master based on other nodes' states
	* `MASTER`: will startup as master; if there is already a master then fails immediately
	* `STANDBY`: will start up as lukewarm standby; if there is not already a master then fails immediately
	* `HOT_STANDBY`: will start up as hot standby; if there is not already a master then fails immediately
	* `HOT_BACKUP`: will start up as hot backup; this can be done even if there is not already a master; this node will not be a master

	The REST API offers live detection and control of the HA mode,
	including setting priority to control which nodes will be promoted on master failure:

	* `/server/ha/state`: Returns the HA state of a management node (GET),
	or changes the state (POST)
	* `/server/ha/states`: Returns the HA states and detail for all nodes in a management plane
	* `/server/ha/priority`: Returns the HA node priority for MASTER failover (GET),
	or sets that priority (POST)

	Note that when POSTing to a non-master server it is necessary to pass a `Brooklyn-Allow-Non-Master-Access: true` header.
	For example, the following cURL command could be used to change the state of a `STANDBY` node on `localhost:8082` to `HOT_STANDBY`:

	curl -v -X POST -d mode=HOT_STANDBY -H "Brooklyn-Allow-Non-Master-Access: true" http://localhost:8082/v1/server/ha/state

	When running a single server, you can disable HA mode. You can recover from a failure
	by restarting the process or launching a replacement machine, pointing at the same
	persisted state. A single server running in HA mode will have the following differences
	in behaviour:

	* If you run Brooklyn and then kill it (e.g. `kill -9` or turn off the
	VM), when you start Brooklyn again it will wait to confirm the previous
	server is really dead. It waits for 30 seconds after the old server's last
	heartbeat, by default.
	* The HA status shows all previous runs of the Brooklyn server (it gets
	a new node-id each time it restarts). This list will get longer and
	longer if you keep restarting Brooklyn, while pointing at the same persisted
	state, until you clear out terminated instances from the list (via the
	UI or the REST API).
	* The logging at startup can be quite different (e.g. in HA mode, "Brooklyn
	initialisation (part two) complete" can mean that the server has finished
	becoming the 'standby'. Care should be taken if searching or parsing the logs.