| # Catching up process |
| |
| ## Entry point: |
| |
| We have a closure named `NodeImpl.OnCaughtUp`, which is responsible for the catching up process for every stale node/replicator on a leader. |
| This closure is created every time we call `NodeImpl.ConfigurationCtx#addNewPeers` which happens on a raft configuration change, for example |
| when we call `NodeImpl#changePeerAndLearners`. In `NodeImpl.ConfigurationCtx#addNewPeers` method we assign `OnCaughtUp` closure with a corresponding |
| replicator for a stale node. This is done inside `ReplicatorGroupImpl#waitCaughtUp` |
| by calling `Replicator#waitForCaughtUp`. To be more precise, we save the closure in a field `Replicator#catchUpClosure` and also we schedule |
| timer on a replicator to call `Replicator#onCatchUpTimedOut` (by default it is called after election timeout). |
| |
| ## Closure invocation |
| |
| So, we saved closure, it is time to discuss what happens when the closure is run. It happens in `Replicator#notifyOnCaughtUp`, we call this |
| method with a status of success or failure of a catching up process and propagate it to the closure by setting `Status.setError(int, |
| java.lang.String, java.lang.Object...)`. When the closure is run, `NodeImpl#onCaughtUp` is called, this method checks the status of a process |
| and here we have several outcomes: |
| |
| - `Status` is `OK` and `NodeImpl.ConfigurationCtx#onCaughtUp` with a success flag equals true is called, so we can move |
| to `NodeImpl.ConfigurationCtx.nextStage` in a configuration changing process. |
| - `Status` is `Error` and more specific, it is timeout, so we retry catching up process by creating a new `NodeImpl.OnCaughtUp` closure |
| calling the same `Replicator#waitForCaughtUp` that we described before. |
| - If retrying went wrong or `Status` is `Error` and is not a timeout, we call `NodeImpl.ConfigurationCtx#onCaughtUp` with a success flag |
| equals false, so the whole process of a configuration changing is reset with `RaftError.ECATCHUP` and corresponding |
| `RaftGroupEventsListener#onReconfigurationError` is called. Note that we do not preserve the original reason for failed catch up. Also, it |
| is important, that `RaftGroupEventsListener#onReconfigurationError` can be called, when a current leader stepped down. In that case |
| `ConfigurationCtx#reset(org.apache.ignite.raft.jraft.Status)` with status equals `null` will be called, and this `null` value will be |
| passed to `RaftGroupEventsListener#onReconfigurationError`. |
| |
| ## Where the closure is invoked |
| |
| Now let's discuss when this closure is run. As we said before, it happens in `Replicator#notifyOnCaughtUp`, |
| so lets track who call `Replicator#Replicator#notifyOnCaughtUp` |
| |
| Calls with successful statuses: |
| - `Replicator#onInstallSnapshotReturned` -- called after a successful installation of a snapshot |
| - `Replicator#onAppendEntriesReturned` -- called after a successful appending of new entries from a leader |
| |
| Calls with error statuses: |
| |
| * `Replicator#onCatchUpTimedOut` with `RaftError.ETIMEDOUT`, called when a timer event happens. As was described before, this timer is started |
| when we call `Replicator#waitForCaughtUp`. |
| * `Replicator#onHeartbeatReturned`, `Replicator#onAppendEntriesReturned`, or `Replicator#onTimeoutNowReturned` with `RaftError.EPERM`, |
| called when a follower returns a term higher than a leader's current term. This is a general check for RPC calls where we check terms and |
| decide, should we step down or not. |
| * `Replicator#onTimeoutNowReturned` with `RaftError.ESTOP`, called when we passed to the method flag `stopAfterFinish` equals true. It happens |
| when a leader is stepped down and we try to wake up a potential candidate for the optimisation purposes |
| (see [waking up optimisation](#Waking-up-optimisation)), so we call |
| `Replicator#sendTimeoutNowAndStop(this.wakingCandidate, this.options.getElectionTimeoutMs())` on a leader. For more details see |
| `NodeImpl#stepDown` |
| * `Replicator#onError` with `RaftError.ESTOP`. This is a general case when some replicator was stopped. For example, it might happen when |
| a leader stepped down, or when a node was shutdown, etc. Let's consider all places where `Replicator#onError` with `RaftError.ESTOP` can happen, to |
| do that we need to trace `Replicator#stop` |
| - `NodeImpl#shutdown(org.apache.ignite.raft.jraft.Closure)` -- node shutdown case |
| - `ReplicatorGroupImpl#stopAll` -- happens when a leader steps down, including stopping all replicators. See `NodeImpl#stepDown`. |
| - `ReplicatorGroupImpl#stopReplicator` -- this happens when we call `ConfigurationCtx#reset(org.apache.ignite.raft.jraft.Status)`, |
| when we successfully or not successfully changed configuration, so we have to start or stop replicators for new peers. |
| - `ReplicatorGroupImpl#stopAllAndFindTheNextCandidate` -- called when a leader step down, in case we make |
| [waking up optimisation](#Waking-up-optimisation) |
| |
| |
| ## Waking up optimisation |
| |
| When a leader faces some problem, it makes some optimisation when it steps down, to start a new voting with a new candidate immediately. In |
| that case, instead of stopping all replicators as usual, it preserves one replicator for stopping and sends `TimeoutNowRequest` to it. When |
| the node receives that request, it elects itself and starts voting. Failed leader chose such a node by searching for the node with the largest |
| log id among peers in the current configuration. For more details see `ReplicatorGroupImpl#stopAllAndFindTheNextCandidate` |