Adjusting the configuration, such as replacing failed servers or changing replication levels, is sometimes necessary in practice. Ratis supports such changes through joint consensus, which allows multiple servers in a cluster to be added or replaced at the same time.
For details on how the joint consensus algorithm works, please refer to Section 4.3 of the Raft paper.
During membership changes, the availability guarantee of Raft is more vulnerable than usual. Agreement (for elections and entry commitment) requires separate majorities from both the old and new configurations. For example, when changing from a cluster of 3 servers to a different cluster of 9 servers, agreement requires both 2 of the 3 servers in the old configuration and 5 of the 9 servers in the new configuration. Be careful to keep both separate majorities online!
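The separate-majorities rule can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the Ratis API: the class `JointQuorum`, its methods, and the server names `A0`..`A2` and `B0`..`B8` are hypothetical, and servers are modelled as plain strings.

```java
import java.util.Set;

// Illustrative model only; none of these names come from Ratis.
final class JointQuorum {
    /** Smallest number of servers that forms a majority of a configuration of the given size. */
    static int majority(int size) {
        return size / 2 + 1;
    }

    /** True if the online servers include a majority of the given configuration. */
    static boolean hasMajority(Set<String> conf, Set<String> online) {
        long up = conf.stream().filter(online::contains).count();
        return up >= majority(conf.size());
    }

    /** During a membership change, agreement needs a majority of BOTH configurations. */
    static boolean hasJointMajority(Set<String> oldConf, Set<String> newConf, Set<String> online) {
        return hasMajority(oldConf, online) && hasMajority(newConf, online);
    }

    public static void main(String[] args) {
        Set<String> oldConf = Set.of("A0", "A1", "A2");
        Set<String> newConf = Set.of("B0", "B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8");
        // Two old servers and five new servers are online.
        Set<String> online = Set.of("A0", "A1", "B0", "B1", "B2", "B3", "B4");
        System.out.println("old majority size: " + majority(oldConf.size()));
        System.out.println("new majority size: " + majority(newConf.size()));
        System.out.println("joint majority available: " + hasJointMajority(oldConf, newConf, online));
    }
}
```

With only four of the nine new servers online, the joint check fails even if all three old servers are up, which is why both majorities must be kept online for the duration of the change.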
To add a new node (e.g., N3) to an existing group (e.g., N0, N1, N2), follow these steps:

1. Start the new peer N3 with an EMPTY group.

       RaftServer N3 = RaftServer.newBuilder()
           .setGroup(RaftGroup.emptyGroup())
           .setProperties(properties)
           .setServerId(n3id)
           .setStateMachine(userStateMachine)
           .build();
       N3.start();

2. Call the setConfiguration method in the AdminApi with the new group as the parameter. It will wait for the new peer to catch up before returning the reply.

       RaftClientReply reply = client.admin().setConfiguration(List.of(N0, N1, N2, N3));
⚠️ Note
Avoid starting N3 with the group (N0, N1, N2, N3), as it may lead to shutdown scenarios, particularly if N3 initiates a leader election before being recognized by the original peers.
To remove an existing node (e.g., N3) from a group (e.g., N0, N1, N2, N3), follow these steps:

1. Send a setConfiguration request to inform the group of the configuration change.

       RaftClientReply reply = client.admin().setConfiguration(List.of(N0, N1, N2));

2. Check that reply.isSuccess() returns true.

Note that N3 will automatically shut down. The data in N3 can be safely deleted.
To perform multiple member changes at one time, such as replacing two old nodes with new ones in a five-node cluster (changing from an existing group {N0, N1, N2, N3, N4} to a new group {N0, N1, N2, N5, N6}), follow these steps:

1. Start the new peers N5 and N6 with an EMPTY group, as when adding a single node.
2. Send a setConfiguration(List.of(N0, N1, N2, N5, N6)) request to the original group.

The removed peers N3 and N4 will automatically shut down.
For full code examples, see examples/membership/server.
ADD Mode and COMPARE_AND_SET Mode

There are different setConfiguration modes: SET (default), ADD and COMPARE_AND_SET.

- SET: The leader will consider the peers given in the request to be the new group.
- ADD: The leader will only add the new peers in the request to the existing group and will not remove any existing peers.
- COMPARE_AND_SET: The leader will first compare the current configuration with the configuration provided in the request, and it will accept the new configuration only if the current configuration equals the provided one.

For the ADD mode and the COMPARE_AND_SET mode, build a SetConfigurationRequest.Arguments object and then pass it to setConfiguration(SetConfigurationRequest.Arguments).
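The effect of the three modes on the resulting group can be modelled as plain set operations. The sketch below is an illustration of the descriptions above and does not use or reproduce the Ratis implementation: the class `SetConfigurationModes`, the `resolve` method, and the `expectedCurrent` parameter are hypothetical names, and it assumes that a COMPARE_AND_SET request carries both the configuration the caller believes is current and the new configuration.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

// Illustrative model only; these names are not part of the Ratis API.
final class SetConfigurationModes {
    enum Mode { SET, ADD, COMPARE_AND_SET }

    /**
     * Computes the group that would result from a request.
     *
     * @param current         the peers in the current configuration
     * @param requested       the peers given in the request
     * @param expectedCurrent the configuration the caller believes is current
     *                        (assumed to be used only by COMPARE_AND_SET)
     * @return the resulting group, or empty if the request is rejected
     */
    static Optional<Set<String>> resolve(Mode mode, Set<String> current,
            Set<String> requested, Set<String> expectedCurrent) {
        switch (mode) {
            case SET: {
                // The requested peers become the new group.
                Set<String> result = new LinkedHashSet<>(requested);
                return Optional.of(result);
            }
            case ADD: {
                // Existing peers are kept; requested peers are added.
                Set<String> result = new LinkedHashSet<>(current);
                result.addAll(requested);
                return Optional.of(result);
            }
            case COMPARE_AND_SET: {
                // Accepted only if the caller's view of the current group is up to date.
                if (!current.equals(expectedCurrent)) {
                    return Optional.empty();
                }
                Set<String> result = new LinkedHashSet<>(requested);
                return Optional.of(result);
            }
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        Set<String> current = Set.of("N0", "N1", "N2");
        System.out.println(resolve(Mode.ADD, current, Set.of("N3"), null));
        System.out.println(resolve(Mode.COMPARE_AND_SET, current,
                Set.of("N0", "N1", "N3"), Set.of("N0", "N1")));
    }
}
```

In this model, COMPARE_AND_SET behaves like an optimistic lock: a caller working from a stale view of the group gets a rejection instead of silently overwriting a newer configuration.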
Majority-add is defined as follows: a single setConfiguration request that adds n members to a group of m members, where n >= m.

Note that, when a setConfiguration request removes and adds members at the same time, the majority is counted after the removal. For example, a setConfiguration request that adds 2 new members to a 3-member group is NOT a majority-add. However, a setConfiguration request that removes 2 members from a 3-member group and adds 2 new members at the same time is a majority-add.
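The definition can be expressed as a small check over sets of peer names. This is a sketch of the rule as stated here, not the check the Ratis server performs, which may handle additional cases; the class `MajorityAddCheck` and the method `isMajorityAdd` are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative model only; these names are not part of the Ratis API.
final class MajorityAddCheck {
    /**
     * Applies the stated rule: n >= m, where n is the number of newly added
     * members and m is the number of current members that remain after removal.
     */
    static boolean isMajorityAdd(Set<String> current, Set<String> requested) {
        // m: current members that are kept by the request.
        Set<String> remaining = new HashSet<>(current);
        remaining.retainAll(requested);

        // n: members in the request that are not in the current group.
        Set<String> added = new HashSet<>(requested);
        added.removeAll(current);

        return added.size() >= remaining.size();
    }

    public static void main(String[] args) {
        Set<String> current = Set.of("N0", "N1", "N2");
        // Adding 2 members to a 3-member group: n = 2, m = 3.
        System.out.println(isMajorityAdd(current, Set.of("N0", "N1", "N2", "N3", "N4")));
        // Removing 2 members and adding 2: n = 2, m = 1.
        System.out.println(isMajorityAdd(current, Set.of("N0", "N3", "N4")));
    }
}
```

Under the same rule, the five-node replacement from {N0, N1, N2, N3, N4} to {N0, N1, N2, N5, N6} gives n = 2 and m = 3, so it is not a majority-add.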
Majority-add is unsafe since it can end up in a no-leader state that cannot be recovered from automatically. The following property enables or disables majority-add.

raft.server.leaderelection.member.majority-add (boolean, default=false)

In a monitored environment, majority-add can be enabled in order to reduce the number of setConfiguration calls. For example, changing from a non-HA single server to an HA 3-server group can be done in a single setConfiguration call, instead of the 2 setConfiguration calls needed with majority-add disabled. If the no-leader state problem occurs, the monitor (a human or a program) can fix it by restarting the servers.