modules/topology-validator-ext/README.md

#What problem is this module intended to solve?

Some network issues can cause the Ignite cluster to split into several isolated parts, called segments. Nodes from different segments cannot communicate with each other, while nodes within the same segment experience no communication problems. In this case, each segment marks the nodes it lost connection to as failed and considers itself an independent Ignite cluster. Let's call this scenario cluster segmentation.

Cluster segmentation can lead to cache data inconsistency across different segments because each segment can continue to handle cache update requests independently.

Apache Ignite allows the user to provide custom validation logic for Ignite caches. This logic is applied on each topology change, and if the validation fails, writes to the corresponding cache are blocked. The validation logic is passed to Ignite as an implementation of the TopologyValidator interface. This can be done through the cache configuration or through the Ignite plugin extensions mechanism (see the CacheTopologyValidatorProvider interface).
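As an illustration of the cache-configuration route, here is a minimal sketch of a custom validator. It is not the logic of this module: the `MinServersValidator` class, the cache name, and the "at least two server nodes" rule are hypothetical and chosen only to show where the validation hook is attached.

```java
import java.util.Collection;

import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.TopologyValidator;

public class MinServersValidatorExample {
    /** Hypothetical validator: the topology is valid only while at least two server nodes are present. */
    static class MinServersValidator implements TopologyValidator {
        @Override public boolean validate(Collection<ClusterNode> nodes) {
            // Count server nodes only; client nodes do not store cache data.
            long servers = nodes.stream().filter(n -> !n.isClient()).count();

            return servers >= 2;
        }
    }

    /** Attaches the validator to a single cache (the cache name is illustrative). */
    static CacheConfiguration<Integer, String> cacheConfiguration() {
        return new CacheConfiguration<Integer, String>("example-cache")
            .setTopologyValidator(new MinServersValidator());
    }
}
```

A validator attached this way applies only to the cache it is configured on, whereas the plugin route described above covers caches without per-cache configuration.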

This module implements an Ignite plugin that guarantees that, after cluster segmentation, no more than one segment can process write requests to all caches. This is achieved by providing an implementation of the TopologyValidator interface, as mentioned above.

The current implementation of TopologyValidator uses the Ignite baseline nodes that remain in the topology to detect segmentation.

#In what cases will cache writes be blocked for a segment?

The following rules are used to determine which segment can process cache write requests after segmentation and which cannot:

  1. A segment is allowed to process cache write requests after segmentation if and only if more than the configured fraction of the baseline nodes remains in the segment. Otherwise, all writes to the cache are blocked.
  2. If the cluster is split into two equal segments, writes to both of them are blocked.
  3. Since Ignite treats segmentation as a sequence of node failures, even a single node failure in an active cluster is considered segmentation, and results in a write block for all caches, if it leaves the number of alive baseline nodes less than or equal to the segmentation threshold.
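The rules above can be sketched as a single comparison. This is only an illustration of the rule as stated, not the module's actual code; the class, method, and parameter names are hypothetical.

```java
public class SegmentationRuleSketch {
    /**
     * Returns true if a segment may keep processing cache writes.
     *
     * @param aliveBaselineNodes Baseline nodes still visible in this segment.
     * @param baselineSize Total number of nodes in the baseline topology.
     * @param threshold Configured fraction of baseline nodes (0.5 by default).
     */
    static boolean writesAllowed(int aliveBaselineNodes, int baselineSize, double threshold) {
        // Strictly "more than" the configured fraction, so an exact half is not enough.
        return aliveBaselineNodes > threshold * baselineSize;
    }

    public static void main(String[] args) {
        // Rule 1: a 5-node baseline split 3/2 with the default threshold.
        System.out.println(writesAllowed(3, 5, 0.5)); // true
        System.out.println(writesAllowed(2, 5, 0.5)); // false

        // Rule 2: a 4-node baseline split into two equal halves.
        System.out.println(writesAllowed(2, 4, 0.5)); // false

        // Rule 3: a single node failure in a 2-node baseline.
        System.out.println(writesAllowed(1, 2, 0.5)); // false
    }
}
```

Because the comparison is strict, at most one segment can satisfy it for any threshold of 0.5 or higher.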

#Configuration

  1. Configure CacheTopologyValidatorPluginProvider on each server node:

    new IgniteConfiguration() 
        ... 
        .setPluginProviders(new CacheTopologyValidatorPluginProvider());
    
  2. Configure baseline nodes explicitly, or configure baseline nodes auto adjustment with a timeout that significantly exceeds the node failure detection timeout. This can be done through the Java API or through the control script. See [1] and [2] for more info.

Note that baseline nodes auto adjustment with a zero timeout must not be used together with the current TopologyValidator implementation.

  3. Configure the deactivation threshold. The deactivation threshold is a fraction of the baseline nodes: a segment is considered valid and continues to accept write requests only while more than this fraction of the baseline nodes remains in it. The value must be in the range from 0.5 (inclusive) to 1. The default value is 0.5. If the default value suits you, no action is required.

To set a custom deactivation threshold, set the org.apache.ignite.topology.validator.deactivation.threshold distributed configuration property via the control script (see https://ignite.apache.org/docs/latest/tools/control-script#working-with-cluster-properties).
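The baseline step above can be done from the Java API roughly as follows. This is a sketch under stated assumptions: the `Ignite` instance is assumed to belong to an already started and activated cluster, and the five-minute timeout is an illustrative value chosen to be well above the failure detection timeout, not a recommendation.

```java
import org.apache.ignite.Ignite;

public class BaselineConfigurationExample {
    /** Option A: keep auto adjustment, but with a timeout well above the failure detection timeout. */
    static void enableAutoAdjustWithTimeout(Ignite ignite) {
        ignite.cluster().baselineAutoAdjustEnabled(true);

        // Illustrative value: 5 minutes, in milliseconds. A zero timeout must not be used.
        ignite.cluster().baselineAutoAdjustTimeout(5 * 60_000L);
    }

    /** Option B: disable auto adjustment and fix the baseline to the current topology. */
    static void setBaselineExplicitly(Ignite ignite) {
        ignite.cluster().baselineAutoAdjustEnabled(false);

        // Use the nodes of the current topology version as the baseline.
        ignite.cluster().setBaselineTopology(ignite.cluster().topologyVersion());
    }
}
```

Only one of the two options is needed. With either of them, the baseline no longer shrinks immediately when nodes drop out, so the validator can still compare the remaining nodes against the full baseline.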

#Resolving segmentation manually

The state of each segment for which cache writes were blocked will eventually be switched to READ-ONLY mode. Manually switching the cluster state back to ACTIVE mode restores cache write availability. This can be done through the Java API or through the control script. See [1] and [2] for more info.
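From the Java API, switching the state back might look like the sketch below. The `Ignite` instance is assumed to be a node of the segment chosen to continue accepting writes; the class and method names are hypothetical.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.cluster.ClusterState;

public class RestoreWritesExample {
    /** Switches the cluster back to ACTIVE if it is currently in read-only mode. */
    static void restoreWrites(Ignite ignite) {
        if (ignite.cluster().state() == ClusterState.ACTIVE_READ_ONLY)
            ignite.cluster().state(ClusterState.ACTIVE);
    }
}
```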

[1] - https://ignite.apache.org/docs/latest/clustering/baseline-topology
[2] - https://ignite.apache.org/docs/latest/tools/control-script#activation-deactivation-and-topology-management