
 .. Licensed to the Apache Software Foundation (ASF) under one
 .. or more contributor license agreements.  See the NOTICE file
 .. distributed with this work for additional information
 .. regarding copyright ownership.  The ASF licenses this file
 .. to you under the Apache License, Version 2.0 (the
 .. "License"); you may not use this file except in compliance
 .. with the License.  You may obtain a copy of the License at
 ..
 ..     http://www.apache.org/licenses/LICENSE-2.0
 ..
 .. Unless required by applicable law or agreed to in writing, software
 .. distributed under the License is distributed on an "AS IS" BASIS,
 .. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 .. See the License for the specific language governing permissions and
 .. limitations under the License.

 .. highlight:: none

 .. _topology-changes:

 Adding, replacing, moving and removing nodes
 --------------------------------------------

 Bootstrap
 ^^^^^^^^^

 Adding new nodes is called "bootstrapping". The ``num_tokens`` parameter defines the number of virtual nodes
 (tokens) the joining node will be assigned during bootstrap. The tokens define the sections of the ring (token ranges)
 the node will become responsible for.

 Token allocation
 ~~~~~~~~~~~~~~~~

 With the default token allocation algorithm, the new node picks ``num_tokens`` random tokens to become responsible
 for. Since tokens are distributed randomly, load distribution improves with a higher number of virtual nodes, but
 token management overhead increases as well. The default of 256 virtual nodes should provide a reasonable load balance
 with acceptable overhead.
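
 The effect of ``num_tokens`` on balance can be illustrated with a small simulation. This is a minimal sketch and not
 Cassandra's own allocation code: it assumes the Murmur3 token range ``[-2**63, 2**63 - 1]``, ignores replication, and
 the helper names ``ownership`` and ``mean_imbalance`` are hypothetical.

 .. code-block:: python

     import random

     MIN_TOKEN = -2**63   # assumption: Murmur3 token range [-2**63, 2**63 - 1]
     RING_SIZE = 2**64


     def ownership(num_nodes, num_tokens, rng):
         """Return the fraction of the ring owned by each node with random tokens."""
         ring = sorted(
             (rng.randrange(MIN_TOKEN, MIN_TOKEN + RING_SIZE), node)
             for node in range(num_nodes)
             for _ in range(num_tokens)
         )
         owned = [0] * num_nodes
         for i, (token, node) in enumerate(ring):
             previous = ring[i - 1][0]  # index -1 wraps around to the last token
             # A token owns the range (previous, token]; a lone token owns the whole ring.
             owned[node] += (token - previous) % RING_SIZE if len(ring) > 1 else RING_SIZE
         return [span / RING_SIZE for span in owned]


     def mean_imbalance(num_nodes, num_tokens, trials=20, seed=0):
         """Average ratio between the most and least loaded node over several trials."""
         rng = random.Random(seed)
         ratios = []
         for _ in range(trials):
             fractions = ownership(num_nodes, num_tokens, rng)
             ratios.append(max(fractions) / min(fractions))
         return sum(ratios) / trials

 Comparing ``mean_imbalance(6, 1)`` with ``mean_imbalance(6, 256)`` shows the ratio between the most and least loaded
 node moving toward 1 as the number of tokens per node grows.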

 In 3.0+, a new token allocation algorithm was introduced that allocates tokens based on the load of existing virtual
 nodes for a given keyspace, and thus yields an improved load distribution with a lower number of tokens. To use this
 approach, the new node must be started with the JVM option ``-Dcassandra.allocate_tokens_for_keyspace=<keyspace>``,
 where ``<keyspace>`` is the keyspace whose load information the algorithm uses to optimize token assignment.

 Manual token assignment
 """""""""""""""""""""""

 You may specify a comma-separated list of tokens manually with the ``initial_token`` parameter in ``cassandra.yaml``.
 If it is specified, Cassandra skips the token allocation process. This may be useful when doing token assignment
 with an external tool or when restoring a node with its previous tokens.
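
 As an illustration of what such an external tool might compute, the sketch below spaces tokens evenly around the ring
 and renders them as an ``initial_token`` line. It assumes the Murmur3 token range ``[-2**63, 2**63 - 1]``; the
 function names ``tokens_for_node`` and ``initial_token_line`` are hypothetical and not part of Cassandra.

 .. code-block:: python

     MIN_TOKEN = -2**63   # assumption: Murmur3 token range [-2**63, 2**63 - 1]
     RING_SIZE = 2**64


     def tokens_for_node(node_index, num_nodes, num_tokens):
         """Evenly spaced tokens for one node, interleaved with the other nodes' tokens."""
         total = num_nodes * num_tokens
         step = RING_SIZE // total
         return [MIN_TOKEN + (i * num_nodes + node_index) * step for i in range(num_tokens)]


     def initial_token_line(node_index, num_nodes, num_tokens):
         """Render the tokens as a comma-separated ``initial_token`` setting."""
         tokens = tokens_for_node(node_index, num_nodes, num_tokens)
         return "initial_token: " + ",".join(str(t) for t in tokens)


     # Example: the line for the first of two nodes, each holding two tokens.
     print(initial_token_line(0, 2, 2))

 Interleaving the nodes' tokens keeps each node's ranges spread around the ring instead of clustering them in one
 contiguous section.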

 Range streaming
 ~~~~~~~~~~~~~~~~

 After the tokens are allocated, the joining node picks, for each token range it will become responsible for, a current
 replica to stream data from. By default it streams from the primary replica of each token range in order to
 guarantee that data in the new node is consistent with the current state.

 In the case of any unavailable replica, the consistent bootstrap process will fail. To override this behavior and
 potentially miss data from an unavailable replica, set the JVM flag ``-Dcassandra.consistent.rangemovement=false``.

 Resuming failed/hung bootstrap
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 On 2.2+, if the bootstrap process fails, it's possible to resume bootstrap from the previous saved state by calling
 ``nodetool bootstrap resume``. If for some reason the bootstrap hangs or stalls, it may also be resumed by simply
 restarting the node. To clean up the bootstrap state and start fresh, you may set the JVM startup flag
 ``-Dcassandra.reset_bootstrap_progress=true``.

 On lower versions, when the bootstrap process fails it is recommended to wipe the node (remove all the data) and
 restart the bootstrap process.

 Manual bootstrapping
 ~~~~~~~~~~~~~~~~~~~~

 It's possible to skip the bootstrapping process entirely and join the ring straight away by setting the hidden parameter
 ``auto_bootstrap: false``. This may be useful when restoring a node from a backup or creating a new data-center.

 Removing nodes
 ^^^^^^^^^^^^^^

 You can take a live node out of the cluster by running ``nodetool decommission`` on that node, or remove a dead node
 by running ``nodetool removenode`` on any other machine in the cluster. Either operation assigns the ranges the old
 node was responsible for to other nodes and replicates the appropriate data there. If decommission is used, the data
 streams from the decommissioned node. If removenode is used, the data streams from the remaining replicas.

 No data is removed automatically from the node being decommissioned, so if you want to put the node back into service at
 a different token on the ring, its data should be removed manually.

 Moving nodes
 ^^^^^^^^^^^^

 When ``num_tokens: 1``, it's possible to move the node's position in the ring with ``nodetool move``. Moving is both
 more convenient and more efficient than decommission followed by bootstrap. After moving a node, ``nodetool cleanup``
 should be run to remove any unnecessary data.

 Replacing a dead node
 ^^^^^^^^^^^^^^^^^^^^^

 In order to replace a dead node, start Cassandra with the JVM startup flag
 ``-Dcassandra.replace_address_first_boot=<dead_node_ip>``. Once this property is enabled, the node starts in a hibernate
 state, during which all the other nodes see it as DOWN (DN), while the node sees itself as UP (UN). Accurate
 replacement state can be found in ``nodetool netstats``.

 The replacing node will now start to bootstrap the data from the rest of the nodes in the cluster. A replacing node
 only receives writes during the bootstrapping phase if its IP address differs from that of the node being replaced
 (see CASSANDRA-8523 and CASSANDRA-12344).

 Once the bootstrapping is complete the node will be marked "UP".

 .. Note:: If any of the following cases apply, you **MUST** run repair to make the replaced node consistent again, since
     it missed ongoing writes during/prior to bootstrapping. The *replacement* timeframe refers to the period from when the
     node initially dies to when a new node completes the replacement process.

     1. The node is down for longer than ``max_hint_window_in_ms`` before being replaced.
     2. You are replacing using the same IP address as the dead node **and** replacement takes longer than ``max_hint_window_in_ms``.
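
 The two cases in the note can be expressed as a small check. This is only a sketch of the rule as stated above: the
 function name ``needs_repair`` and its parameter names are hypothetical, and the default hint window of three hours
 (10800000 ms) is an assumption that should be replaced by the value of ``max_hint_window_in_ms`` configured on your
 cluster.

 .. code-block:: python

     THREE_HOURS_MS = 3 * 60 * 60 * 1000  # assumed default for max_hint_window_in_ms


     def needs_repair(down_before_replace_ms, replacement_timeframe_ms, same_ip,
                      max_hint_window_in_ms=THREE_HOURS_MS):
         """Return True if repair must be run after replacing a dead node."""
         # Case 1: the node was down longer than the hint window before being replaced.
         if down_before_replace_ms > max_hint_window_in_ms:
             return True
         # Case 2: same IP address and the replacement outlasted the hint window.
         if same_ip and replacement_timeframe_ms > max_hint_window_in_ms:
             return True
         return False


     # Example: replaced after one hour, but the same-IP replacement took four hours.
     print(needs_repair(1 * 3600 * 1000, 4 * 3600 * 1000, same_ip=True))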

 Monitoring progress
 ^^^^^^^^^^^^^^^^^^^

 Bootstrap, replace, move and remove progress can be monitored using ``nodetool netstats``, which shows the progress
 of the streaming operations.

 Cleanup data after range movements
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 As a safety measure, Cassandra does not automatically remove data from nodes that "lose" part of their token range due
 to a range movement operation (bootstrap, move, replace). Run ``nodetool cleanup`` on the nodes that lost ranges to the
 joining node when you are satisfied the new node is up and working. If you do not do this, the old data will still be
 counted against the load on that node.
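
 To see which nodes "lose" ranges when a node joins, the sketch below compares replica sets before and after the join.
 It is a simplified model, not Cassandra's implementation: it assumes replicas are the first ``rf`` distinct nodes found
 walking clockwise around the ring (in the style of ``SimpleStrategy``), and the function names ``replicas`` and
 ``nodes_needing_cleanup`` are hypothetical.

 .. code-block:: python

     from bisect import bisect_left


     def replicas(ring, token, rf):
         """First ``rf`` distinct nodes found walking clockwise from ``token``."""
         start = bisect_left(ring, (token,))  # first ring entry with a token >= ``token``
         found = []
         for offset in range(len(ring)):
             node = ring[(start + offset) % len(ring)][1]
             if node not in found:
                 found.append(node)
             if len(found) == rf:
                 break
         return found


     def nodes_needing_cleanup(old_ring, new_node, new_tokens, rf):
         """Nodes that stop replicating at least one range once ``new_node`` has joined."""
         old_ring = sorted(old_ring)
         new_ring = sorted(old_ring + [(t, new_node) for t in new_tokens])
         losers = set()
         # Each token in the new ring closes one range; replica sets are constant within it.
         for token, _ in new_ring:
             before = replicas(old_ring, token, rf)
             after = replicas(new_ring, token, rf)
             losers.update(set(before) - set(after))
         return losers


     # Example: node "D" joins a three-node ring with a single token, replication factor 2.
     ring = [(0, "A"), (100, "B"), (200, "C")]
     print(sorted(nodes_needing_cleanup(ring, "D", [50], rf=2)))

 In this model, a higher replication factor means that more of the existing nodes give up a range to the joining node,
 so more of them carry stale data until ``nodetool cleanup`` is run.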