 <!--

 ```
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 ```

 -->

 ## Basic Concepts of IoTDB Cluster

 An Apache IoTDB cluster contains two types of nodes, ConfigNode and DataNode. Each is a process that can be deployed independently.

 An illustration of the cluster architecture:

 <img style="width:100%; max-width:500px; max-height:400px; margin-left:auto; margin-right:auto; display:block;" src="https://alioss.timecho.com/docs/img/UserGuide/Cluster/Architecture.png?raw=true">

 The ConfigNode is the control node of the cluster. It manages the cluster's node status, partition information, and so on. All ConfigNodes in the cluster form a fully replicated, highly available group.

 Note: The replication factor of the ConfigNode group equals the number of ConfigNodes that have joined the cluster. The cluster can work only while more than half of the ConfigNodes are running.
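The majority rule in the note above can be sketched as a small check. This is only an illustration of the arithmetic; the function name `config_nodes_have_quorum` is hypothetical and is not part of IoTDB.

```python
def config_nodes_have_quorum(total: int, running: int) -> bool:
    """Return True when strictly more than half of the ConfigNodes are running.

    `total` is the number of ConfigNodes that have joined the cluster,
    `running` is how many of them are currently running.
    """
    if total <= 0 or not 0 <= running <= total:
        raise ValueError("expected total > 0 and 0 <= running <= total")
    # "More than half" is a strict majority: exactly half is not enough.
    return running * 2 > total
```

For example, a group of 3 ConfigNodes keeps working with 2 running, while a group of 4 does not keep working with only 2 running, because 2 is exactly half.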

 The DataNode stores the data and schema of the cluster and manages multiple DataRegions and SchemaRegions. Data consists of time-value pairs, and schema consists of the path and data type of each time series.

 Clients can connect only to DataNodes to perform operations.

 ### Concepts

 | Concept           | Type             | Description                                                                                                                                      |
 |:------------------|:-----------------|:-------------------------------------------------------------------------------------------------------------------------------------------------|
 | ConfigNode        | node role        | Configuration node, which manages cluster node information and partition information, monitors cluster status, and controls load balancing      |
 | DataNode          | node role        | Data node, which manages data and metadata                                                                                                       |
 | Database          | metadata         | Database; data in different databases is physically isolated                                                                                     |
 | DeviceId          | device id        | The full path from root to the penultimate level in the metadata tree                                                                            |
 | SeriesSlot        | schema partition | Each database contains many SeriesSlots; the partition key is the DeviceId                                                                       |
 | SchemaRegion      | schema region    | A collection of multiple SeriesSlots                                                                                                             |
 | SchemaRegionGroup | logical concept  | A group of SchemaRegions that manage the same schema data and back each other up; the number of SchemaRegions equals the number of schema replicas |
 | SeriesTimeSlot    | data partition   | The data of one time interval of a SeriesSlot; a SeriesSlot contains multiple SeriesTimeSlots, and the partition key is the timestamp            |
 | DataRegion        | data region      | A collection of multiple SeriesTimeSlots                                                                                                         |
 | DataRegionGroup   | logical concept  | A group of DataRegions that manage the same data and back each other up; the number of DataRegions equals the number of data replicas            |

 ## Characteristics of Cluster

 * Native Cluster Architecture
     * All modules are designed for the cluster.
     * Standalone mode is a special form of the cluster.
 * High Scalability
     * Nodes can be added within a few seconds, without data migration.
 * Massively Parallel Processing Architecture
     * Data processing adopts the MPP architecture and the Volcano model, which are highly extensible.
 * Configurable Consensus Protocol
     * Different consensus protocols can be adopted for data replicas and schema replicas.
 * Extensible Partition Strategy
     * The cluster uses a lookup table for data and schema partitions, which is flexible to extend.
 * Built-in Metric Framework
     * Monitors the status of each node in the cluster.

 ## Partitioning Strategy

 The partitioning strategy divides data and schema into different Regions and allocates the Regions to different DataNodes.

 It is recommended to create a single database; the cluster then allocates resources dynamically according to the number of nodes and cores.

 The database contains multiple SchemaRegions and DataRegions, which are managed by DataNodes.

 * Schema partition strategy
     * For a time series schema, the ConfigNode maps the DeviceId (the full path from root to the penultimate-level node) to a SeriesSlot and allocates this SeriesSlot to a SchemaRegionGroup.
 * Data partition strategy
     * For a time series data point, the ConfigNode maps it to a SeriesSlot according to the DeviceId, then to a SeriesTimeSlot according to the timestamp, and allocates this SeriesTimeSlot to a DataRegionGroup.
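The two mappings above can be sketched as follows. This is a minimal illustration, not IoTDB's implementation: the hash function (CRC32), the slot count `SERIES_SLOT_NUM`, the interval `TIME_PARTITION_INTERVAL_MS`, and all function names are assumptions chosen for the example. The path split also assumes that node names contain no quoted dots.

```python
import zlib

# Assumed values for illustration only; the real cluster settings may differ.
SERIES_SLOT_NUM = 1000
TIME_PARTITION_INTERVAL_MS = 7 * 24 * 60 * 60 * 1000  # one week


def device_id_of(series_path: str) -> str:
    """Return the path from root to the penultimate level of a series path."""
    prefix, sep, _measurement = series_path.rpartition(".")
    if not sep or not prefix:
        raise ValueError(f"not a full time series path: {series_path!r}")
    return prefix


def series_slot_of(device_id: str) -> int:
    """Map a DeviceId to a SeriesSlot with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(device_id.encode("utf-8")) % SERIES_SLOT_NUM


def time_slot_of(timestamp_ms: int) -> int:
    """Map a timestamp to the index of its time interval."""
    return timestamp_ms // TIME_PARTITION_INTERVAL_MS


def series_time_slot_of(series_path: str, timestamp_ms: int) -> tuple[int, int]:
    """Locate a data point: first by DeviceId, then by timestamp."""
    device_id = device_id_of(series_path)
    return series_slot_of(device_id), time_slot_of(timestamp_ms)
```

Because the slot is derived from the DeviceId only, all time series of one device, such as `root.sg.d1.s1` and `root.sg.d1.s2`, fall into the same SeriesSlot.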

 IoTDB uses a slot-based partitioning strategy, so the size of the partition information is bounded and does not grow without limit as the number of time series or devices increases.

 Regions are allocated to different DataNodes to avoid a single point of failure, and the allocation keeps the load balanced across DataNodes.
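One way to picture such an allocation is a greedy placement that always picks the least-loaded DataNodes for the next RegionGroup. The sketch below is an assumption made for illustration; it is not the allocator IoTDB uses, and `allocate_region_groups` is a hypothetical name.

```python
def allocate_region_groups(
    data_nodes: list[str], group_count: int, replication_factor: int
) -> dict[int, list[str]]:
    """Place each RegionGroup's replicas on distinct, least-loaded DataNodes."""
    if replication_factor < 1 or replication_factor > len(data_nodes):
        raise ValueError("replication factor must be between 1 and the number of DataNodes")
    load = {node: 0 for node in data_nodes}   # Regions currently held per node
    order = {node: i for i, node in enumerate(data_nodes)}  # stable tie-breaker
    allocation: dict[int, list[str]] = {}
    for group_id in range(group_count):
        # Taking the first `replication_factor` nodes of the sorted list
        # guarantees the replicas of one group land on different nodes.
        chosen = sorted(data_nodes, key=lambda n: (load[n], order[n]))[:replication_factor]
        for node in chosen:
            load[node] += 1
        allocation[group_id] = chosen
    return allocation
```

With 4 DataNodes, 4 RegionGroups, and a replication factor of 2, every node ends up holding 2 Regions, and no group has two replicas on the same node.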

 ## Replication Strategy

 The replication strategy stores data in multiple replicas, which are copies of each other. Together, the replicas provide a highly available service and tolerate the failure of some of them.

 A Region is the basic unit of replication. Multiple replicas of a Region form a highly available RegionGroup.

 * Replication and consensus
   * ConfigNode Group: consists of all ConfigNodes.
   * SchemaRegionGroup: the cluster has multiple SchemaRegionGroups, and each SchemaRegionGroup has multiple SchemaRegions with the same id.
   * DataRegionGroup: the cluster has multiple DataRegionGroups, and each DataRegionGroup has multiple DataRegions with the same id.

 An illustration of the partition allocation in the cluster:

 <img style="width:100%; max-width:500px; max-height:500px; margin-left:auto; margin-right:auto; display:block;" src="https://alioss.timecho.com/docs/img/UserGuide/Cluster/Data-Partition.png?raw=true">

 The figure contains 1 SchemaRegionGroup, and the `schema_replication_factor` is 3, so the 3 white SchemaRegion-0s form a replication group.

 The figure contains 3 DataRegionGroups, and the `data_replication_factor` is 3, so there are 9 DataRegions in total.
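The counts in the figure follow from multiplying the number of RegionGroups by the replication factor. A minimal sketch of that arithmetic, with the hypothetical helper name `total_regions`:

```python
def total_regions(region_group_count: int, replication_factor: int) -> int:
    """Each RegionGroup holds one Region per replica."""
    if region_group_count < 0 or replication_factor < 1:
        raise ValueError("expected region_group_count >= 0 and replication_factor >= 1")
    return region_group_count * replication_factor


# Values taken from the figure described above.
schema_regions = total_regions(region_group_count=1, replication_factor=3)
data_regions = total_regions(region_group_count=3, replication_factor=3)
```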

 ## Consensus Protocol (Consistency Protocol)

 Within each RegionGroup, consistency among the replica Regions is guaranteed by a consensus protocol, which routes read and write requests to the replicas.

 * Currently supported consensus protocols
   * SimpleConsensus: provides strong consistency and can be used only when the number of replicas is 1. It is the empty implementation of the consensus protocol.
   * IoTConsensus: provides eventual consistency and can be used with any number of replicas; 2 replicas are enough to avoid a single point of failure. It is available only for DataRegions. Writes can be applied on any replica and are replicated asynchronously to the other replicas.
   * RatisConsensus: provides strong consistency using the Raft consensus protocol. It can be used with any number of replicas and for any RegionGroup.
     Currently, a DataRegion that uses RatisConsensus does not support multiple data directories. Support for this is planned for a future release.
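The constraints listed above can be collected into a small validity check. This sketch only encodes the rules stated in this section; the function name `check_consensus_choice` and the string labels for region types are hypothetical and do not correspond to an IoTDB API.

```python
SUPPORTED_PROTOCOLS = {"SimpleConsensus", "IoTConsensus", "RatisConsensus"}
REGION_TYPES = {"SchemaRegion", "DataRegion"}


def check_consensus_choice(protocol: str, region_type: str, replication_factor: int) -> list[str]:
    """Return a list of problems; an empty list means the choice is allowed."""
    problems = []
    if protocol not in SUPPORTED_PROTOCOLS:
        problems.append(f"unknown protocol: {protocol}")
    if region_type not in REGION_TYPES:
        problems.append(f"unknown region type: {region_type}")
    if replication_factor < 1:
        problems.append("replication factor must be at least 1")
    # SimpleConsensus is the empty implementation: single replica only.
    if protocol == "SimpleConsensus" and replication_factor != 1:
        problems.append("SimpleConsensus requires exactly 1 replica")
    # IoTConsensus is available only for DataRegions.
    if protocol == "IoTConsensus" and region_type != "DataRegion":
        problems.append("IoTConsensus can be used only for DataRegion")
    return problems
```

For example, `check_consensus_choice("IoTConsensus", "SchemaRegion", 3)` reports one problem, while `check_consensus_choice("RatisConsensus", "SchemaRegion", 3)` reports none.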