The diagram above shows the component dependencies and provides an order in which components may be initialized.
For a cluster to become operational, the metastorage instance must be initialized first. The initialization command chooses a set of nodes (normally, 3 - 5 nodes) to host the distributed metastorage Raft group. When a node receives the initialization command, it either creates a bootstrapped Raft instance with the given members (if this is a metastorage group node), or writes the metastorage group member IDs to the vault as a private system-level property.
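To make this branching concrete, below is a minimal sketch of how a node might react to the initialization command. The `RaftManager` and `VaultManager` interfaces, the method names, the Raft group name and the vault key used here are illustrative assumptions for the sketch, not the actual Ignite APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/**
 * Illustrative handling of the cluster initialization command. The RaftManager and VaultManager
 * interfaces, the group name and the vault key below are assumptions made for this sketch only.
 */
public class InitCommandHandler {
    /** Hypothetical facade for starting Raft groups on the local node. */
    interface RaftManager {
        void bootstrapGroup(String groupName, List<String> memberIds);
    }

    /** Hypothetical facade for the node-local persistent vault. */
    interface VaultManager {
        void put(String key, byte[] value);
    }

    private final String localNodeId;
    private final RaftManager raftManager;
    private final VaultManager vault;

    InitCommandHandler(String localNodeId, RaftManager raftManager, VaultManager vault) {
        this.localNodeId = localNodeId;
        this.raftManager = raftManager;
        this.vault = vault;
    }

    /** Reacts to the init command carrying the chosen metastorage group member IDs. */
    void onInitCommand(List<String> metastorageNodeIds) {
        if (metastorageNodeIds.contains(localNodeId)) {
            // This node hosts the metastorage: create a bootstrapped Raft instance with the given members.
            raftManager.bootstrapGroup("metastorage", metastorageNodeIds);
        } else {
            // Otherwise remember the metastorage group members as a private system-level vault property.
            vault.put("internal.metastorage.members",
                String.join(",", metastorageNodeIds).getBytes(StandardCharsets.UTF_8));
        }
    }
}
```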
After the metastorage is initialized, components start to receive and process watch events, updating the local state according to the changes received from the watch.
An entry point for user-initiated cluster state changes is the cluster configuration. The configuration module provides convenient ways to manage the configuration, both through the Java API and from the ignite command line utility.
Any configuration change is translated to a metastorage multi-update and has a single configuration change ID. This ID is used to enforce CAS-style configuration changes and to ensure no duplicate configuration changes are executed during the cluster runtime. To reliably process cluster configuration changes, we introduce an additional metastorage key
`internal.configuration.applied=<change ID>`
that indicates the configuration change ID that has already been processed, with the corresponding changes written to the metastorage. Whenever a node processes a configuration change, it must also conditionally update the `internal.configuration.applied` value, checking that the previous value is smaller than the change ID being applied. This prevents a configuration change from being processed more than once. Any metastorage update that processes a configuration change must update this key to indicate that the configuration change has already been processed. It is safe to attempt the same configuration change more than once, since only one update will be applied.
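As a rough illustration of this pattern, the sketch below writes the affected configuration keys and the applied-change marker in a single conditional multi-update. The `MetastorageClient` interface and its `invokeIfLessThan` method are simplified assumptions made for the example, not the real metastorage API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative CAS-style application of a configuration change. The metastorage client interface
 * and its conditional multi-update method are simplified assumptions, not the actual metastorage API.
 */
public class ConfigurationChangeApplier {
    private static final String APPLIED_KEY = "internal.configuration.applied";

    /** A single key/value write participating in the multi-update. */
    record Update(String key, byte[] value) {}

    /** Hypothetical metastorage client exposing a conditional multi-update. */
    interface MetastorageClient {
        /** Atomically applies {@code updates} iff the numeric value stored under {@code key} is less than {@code bound}. */
        boolean invokeIfLessThan(String key, long bound, List<Update> updates);
    }

    private final MetastorageClient metastorage;

    ConfigurationChangeApplier(MetastorageClient metastorage) {
        this.metastorage = metastorage;
    }

    /** Writes the keys affected by the change together with the new change ID as one conditional multi-update. */
    void apply(long changeId, List<Update> configUpdates) {
        List<Update> multiUpdate = new ArrayList<>(configUpdates);

        // The applied-change marker is part of the same multi-update, so it advances atomically with the change.
        multiUpdate.add(new Update(APPLIED_KEY, Long.toString(changeId).getBytes(StandardCharsets.UTF_8)));

        // If the condition fails, the change was already processed (by this or another node), which is fine:
        // only one multi-update for a given change ID ever takes effect.
        metastorage.invokeIfLessThan(APPLIED_KEY, changeId, multiUpdate);
    }
}
```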
All cluster state is written and maintained in the metastorage. Nodes may update some state in the metastorage, which may require a recomputation of some other metastorage properties (for example, when cluster baseline changes, Ignite needs to recalculate table partition assignments). In other words, some properties in the metastore are dependent on each other and we may need to reliably update one property in response to an update to another.
To facilitate this pattern, Ignite uses the metastorage's ability to replay metastorage changes starting from a certain revision, a mechanism called a [watch](TODO link to metastorage watch). To process watch updates reliably, we associate a special persistent value called `applied revision`
(stored in the vault) with each watch. We rely on the following assumptions about the reliable watch processing:
* The `applied revision` is propagated only after the metastorage has confirmed that the proposed change is committed (note that in this case it does not matter whether this particular multi-update succeeds or not: we know that the first multi-update will succeed, and all further updates are idempotent, so we only need to make sure that at least one multi-update is committed).
* The metastorage updates produced while processing a watch event are idempotent and conditional on the `applied revision` value.
* In case of a crash, each watch is restarted from the revision stored in the corresponding `applied revision` variable of the watch, and the events that have not yet been processed are replayed.
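The sketch below puts these assumptions together: on start the watch resumes right after the persisted applied revision, each event is handled with a conditional multi-update, and the applied revision is advanced only after the metastorage confirms the commit. The `Vault`, `Metastorage` and `WatchListener` interfaces are simplified stand-ins invented for the example, not the actual APIs.

```java
import java.util.List;

/**
 * Illustrative reliable watch processing driven by a persisted applied revision. All interfaces
 * and method names below are simplified assumptions, not the actual metastorage or vault APIs.
 */
public class ReliableWatchProcessor {
    /** A single key/value write participating in a multi-update. */
    record Update(String key, byte[] value) {}

    /** Hypothetical vault access for the per-watch applied revision. */
    interface Vault {
        /** Returns 0 if the watch has never been processed on this node. */
        long appliedRevision(String watchId);

        void saveAppliedRevision(String watchId, long revision);
    }

    /** Hypothetical watch callback. */
    interface WatchListener {
        void onEvent(long revision, List<Update> changedEntries);
    }

    /** Hypothetical metastorage client: watch registration plus a conditional multi-update. */
    interface Metastorage {
        void watch(String keyPrefix, long fromRevision, WatchListener listener);

        /** Atomically applies {@code updates} iff the revision recorded for {@code watchId} is still below {@code revision}. */
        boolean invokeIfRevisionBelow(String watchId, long revision, List<Update> updates);
    }

    private final String watchId;
    private final Vault vault;
    private final Metastorage metastorage;

    ReliableWatchProcessor(String watchId, Vault vault, Metastorage metastorage) {
        this.watchId = watchId;
        this.vault = vault;
        this.metastorage = metastorage;
    }

    /** On (re)start, resume the watch right after the last reliably processed revision: unprocessed events are replayed. */
    void start(String keyPrefix) {
        long applied = vault.appliedRevision(watchId);

        metastorage.watch(keyPrefix, applied + 1, (revision, changedEntries) -> {
            // Recompute the dependent properties; the computation is idempotent for a given event.
            List<Update> derived = recompute(changedEntries);

            // Conditional multi-update: replaying the same event is harmless because only one attempt takes effect.
            metastorage.invokeIfRevisionBelow(watchId, revision, derived);

            // Propagate the applied revision only after the metastorage has confirmed the update is committed.
            vault.saveAppliedRevision(watchId, revision);
        });
    }

    private List<Update> recompute(List<Update> changedEntries) {
        // Placeholder for component-specific logic, e.g. recalculating table partition assignments.
        return List.of();
    }
}
```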
It's possible to stop a node both when it has already been started and while it is still starting; in the latter case, the node stop prevents any new components from starting and stops the components that have already been started.
The following method was added to the Ignition interface:
```java
/**
 * Stops the node with given {@code name}.
 * It's possible to stop both already started node or node that is currently starting.
 * Has no effect if node with specified name doesn't exist.
 *
 * @param name Node name to stop.
 * @throws IllegalArgumentException if null is specified instead of node name.
 */
public void stop(String name);
```
It's also possible to stop a node by calling `close()` on an already started Ignite instance.
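For illustration only, a schematic usage of the two stop paths might look as follows; the `Ignition#start` call and its signature are assumptions, since only `stop(String)` and `close()` are described in this section.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

// Schematic usage of the two stop paths described above; the start(...) call and its
// signature are assumptions made for the example.
class StopExample {
    void stopByName(Ignition ignition) {
        ignition.start("node-1");

        // Works both for a started node and for a node that is still starting.
        ignition.stop("node-1");
    }

    void stopByClose(Ignition ignition) throws Exception {
        Ignite node = ignition.start("node-1");

        // Stops an already started instance directly.
        node.close();
    }
}
```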
As a starting point, the stop process checks the node status. In all cases, the node stop process consists of two phases:
* `onNodeStop()` will be called on all started components in reverse order, meaning that the last started component will run `onNodeStop()` first. For most components `onNodeStop()` will be a no-op. The core idea is to stop network communication in `onNodeStop()` in order to terminate distributed operations gracefully: no new network communication is allowed, but node-local logic still remains consistent.
* `stop()` will be called on all started components, also in reverse order; this is where threads are stopped, internal structures are cleaned up, and other related shutdown logic runs. Note that at this point network communication is no longer possible.

Besides the local node stopping logic, two more actions take place in the cluster as a result of the node-left event: