| = Adding, replacing, moving and removing nodes |
| |
| == Bootstrap |
| |
Adding new nodes is called "bootstrapping". The `num_tokens` parameter
defines the number of virtual nodes (tokens) the joining node will be
assigned during bootstrap. The tokens define the sections of the ring
(token ranges) the node will become responsible for.
| |
| === Token allocation |
| |
With the default token allocation algorithm the new node will pick
`num_tokens` random tokens to become responsible for. Since tokens are
distributed randomly, load distribution improves with a higher number of
virtual nodes, but a higher number also increases token management
overhead. The default of 256 virtual nodes should provide a reasonable
load balance with acceptable overhead.
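
For example, the default as shipped in `cassandra.yaml` (the path below
assumes a tarball install with the configuration under `conf/`):

[source,bash]
----
# inspect the configured number of virtual nodes for this node
$ grep '^num_tokens' conf/cassandra.yaml
num_tokens: 256
----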
| |
In 3.0 a new token allocation algorithm was introduced that allocates
tokens based on the load of existing virtual nodes for a given keyspace,
and thus yields an improved load distribution with a lower number of
tokens. To use this approach, the new node must be started with the JVM
option `-Dcassandra.allocate_tokens_for_keyspace=<keyspace>`, where
`<keyspace>` is the keyspace from which the algorithm reads the load
information used to optimize token assignment.
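
One way to set the flag, assuming a tarball install where extra JVM
options are added in `conf/cassandra-env.sh` (`my_keyspace` is a
placeholder; the keyspace must already exist with its final replication
settings):

[source,bash]
----
# allocate tokens balanced for my_keyspace's replica placement;
# typically combined with a lower num_tokens in cassandra.yaml
JVM_OPTS="$JVM_OPTS -Dcassandra.allocate_tokens_for_keyspace=my_keyspace"
----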
| |
| ==== Manual token assignment |
| |
You may specify a comma-separated list of tokens manually with the
`initial_token` parameter in `cassandra.yaml`; when it is specified,
Cassandra will skip the token allocation process. This may be useful
when assigning tokens with an external tool or when restoring a node
with its previous tokens.
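
A minimal sketch of such an assignment (the three token values are
arbitrary `Murmur3Partitioner` examples, not a recommendation):

[source,bash]
----
# append explicit tokens before the node's first start; num_tokens,
# if set, should match the number of tokens listed here (3)
$ echo 'initial_token: -9223372036854775808,-3074457345618258603,3074457345618258602' >> conf/cassandra.yaml
----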
| |
| === Range streaming |
| |
After the tokens are allocated, the joining node picks current replicas
of the token ranges it will become responsible for to stream data from.
By default it streams from the primary replica of each token range, in
order to guarantee that the data on the new node is consistent with the
current state.
| |
If any replica is unavailable, the consistent bootstrap process will
fail. To override this behavior and potentially miss data from an
unavailable replica, set the JVM flag
`-Dcassandra.consistent.rangemovement=false`.
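
As with the other JVM flags in this document, one way to set it is via
`conf/cassandra-env.sh` (assuming a tarball install):

[source,bash]
----
# accept that data from unavailable replicas may be missed during bootstrap
JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"
----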
| |
| === Resuming failed/hanged bootstrap |
| |
On 2.2+, if the bootstrap process fails, it's possible to resume
bootstrap from the previously saved state by calling
`nodetool bootstrap resume`. If for some reason the bootstrap hangs or
stalls, it may also be resumed by simply restarting the node. In order
to clean up the bootstrap state and start fresh, you may set the JVM
startup flag `-Dcassandra.reset_bootstrap_progress=true`.
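
For example, on the joining node:

[source,bash]
----
# resume a failed bootstrap from its last saved state (2.2+)
$ nodetool bootstrap resume

# or, to discard the saved state and bootstrap from scratch, add the
# reset flag (e.g. in conf/cassandra-env.sh) before restarting the node:
# JVM_OPTS="$JVM_OPTS -Dcassandra.reset_bootstrap_progress=true"
----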
| |
On older versions, when the bootstrap process fails it is recommended
to wipe the node (remove all of its data) and restart the bootstrap
process.
| |
| === Manual bootstrapping |
| |
It's possible to skip the bootstrapping process entirely and join the
ring straight away by setting the hidden parameter
`auto_bootstrap: false`. This may be useful when restoring a node from a
backup or creating a new datacenter.
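
Since the parameter is not listed in the shipped `cassandra.yaml`, one
way to set it is to append it (a sketch, assuming a tarball install):

[source,bash]
----
# skip bootstrap streaming entirely; the node joins the ring immediately
$ echo 'auto_bootstrap: false' >> conf/cassandra.yaml
----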
| |
| == Removing nodes |
| |
You can take a node out of the cluster with `nodetool decommission`
(run on the live node itself) or remove a dead one with
`nodetool removenode` (run from any other machine). Either operation
assigns the ranges the old node was responsible for to other nodes and
replicates the appropriate data there. If decommission is used, the data
is streamed from the decommissioned node. If removenode is used, the
data is streamed from the remaining replicas.
| |
No data is removed automatically from the node being decommissioned, so
if you want to put the node back into service at a different token on
the ring, its data should be removed manually.
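
For example (the host ID below is a made-up placeholder; look up the
real one with `nodetool status`):

[source,bash]
----
# on a live node that is leaving the cluster, run locally:
$ nodetool decommission

# for a dead node, run from any live node, passing the dead node's Host ID:
$ nodetool removenode 192d3b47-75b5-4e0c-9f4f-9a7a7d3b1a2e
----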
| |
| == Moving nodes |
| |
When `num_tokens: 1` is used, it's possible to move the node's position
in the ring with `nodetool move`. Moving is both a convenience over, and
more efficient than, decommission + bootstrap. After moving a node,
`nodetool cleanup` should be run to remove any unnecessary data.
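
A sketch of the sequence (the token value is an arbitrary
`Murmur3Partitioner` example):

[source,bash]
----
# on the single-token node being moved:
$ nodetool move 3074457345618258602

# once the move completes, on the nodes whose ranges shrank:
$ nodetool cleanup
----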
| |
| == Replacing a dead node |
| |
In order to replace a dead node, start Cassandra with the JVM startup
flag `-Dcassandra.replace_address_first_boot=<dead_node_ip>`. Once this
property is enabled, the node starts in a hibernate state, during which
all the other nodes will see this node as DOWN (DN); the node, however,
will see itself as UP (UN). The accurate replacement state can be found
in `nodetool netstats`.
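
One way to set the flag, assuming a tarball install and
`conf/cassandra-env.sh` (`10.0.0.1` is a placeholder for the dead node's
IP address):

[source,bash]
----
# on the brand-new replacement node, before its first start:
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.1"
----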
| |
The replacing node will now start to bootstrap the data from the rest
of the nodes in the cluster. A replacing node will only receive writes
during the bootstrapping phase if it has a different IP address from the
node that is being replaced (see CASSANDRA-8523 and CASSANDRA-12344).
| |
| Once the bootstrapping is complete the node will be marked "UP". |
| |
[NOTE]
| ==== |
If any of the following cases apply, you *MUST* run repair to make the
replaced node consistent again, since it missed ongoing writes
during/prior to bootstrapping (a sample repair invocation follows this
note). The _replacement_ timeframe refers to the period from when the
node initially dies to when a new node completes the replacement
process.
| |
| [arabic] |
| . The node is down for longer than `max_hint_window_in_ms` before being |
| replaced. |
| . You are replacing using the same IP address as the dead node *and* |
| replacement takes longer than `max_hint_window_in_ms`. |
| ==== |
| |
| == Monitoring progress |
| |
Bootstrap, replace, move and remove progress can be monitored using
`nodetool netstats`, which will show the progress of the streaming
operations.
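
For example, on the node performing the range movement (`watch` is a
common Linux utility; the interval is arbitrary):

[source,bash]
----
# refresh the streaming progress view every ten seconds
$ watch -n 10 nodetool netstats
----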
| |
| == Cleanup data after range movements |
| |
| As a safety measure, Cassandra does not automatically remove data from |
| nodes that "lose" part of their token range due to a range movement |
| operation (bootstrap, move, replace). Run `nodetool cleanup` on the |
| nodes that lost ranges to the joining node when you are satisfied the |
new node is up and working. If you do not do this, the old data will
still be counted against the load on that node.
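
For example, once `nodetool status` shows the new node as UN (up/normal):

[source,bash]
----
# on each node that lost ranges; running one node at a time keeps the
# extra compaction I/O from hitting the whole cluster at once
$ nodetool cleanup
----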