Apache IoTDB Cluster contains two types of nodes: ConfigNode and DataNode, each is a process that could be deployed independently.
An illustration of the cluster architecture:
ConfigNode is the control node of the cluster, which manages the cluster's node status, partition information, etc. All ConfigNodes in the cluster form a highly available group, which is fully replicated.
Notice:The replication factor of ConfigNode is all ConfigNodes that has joined the Cluster. Over half of the ConfigNodes is Running could the cluster work.
DataNode stores the data and schema of the cluster, which manages multiple data regions and schema regions. Data is a time-value pair, and schema is the path and data type of each time series.
Client could only connect to the DataNode for operation.
| Concept | Type | Description |
|---|---|---|
| ConfigNode | node role | Configuration node, which manages cluster node information and partition information, monitors cluster status and controls load balancing |
| DataNode | node role | Data node, which manages data and meta data |
| Database | meta data | Database, data are isolated physically from different databases |
| DeviceId | device id | The full path from root to the penultimate level in the metadata tree represents a device id |
| SeriesSlot | schema partition | Each database contains many SeriesSlots, the partition key being DeviceId |
| SchemaRegion | schema region | A collection of multiple SeriesSlots |
| SchemaRegionGroup | logical concept | The number of SchemaRegions contained in group is the number of schema replicas, it manages the same schema data, and back up each other |
| SeriesTimeSlot | data partition | The data of a time interval of SeriesSlot, a SeriesSlot contains multiple SeriesTimeSlots, the partition key being timestamp |
| DataRegion | data region | A collection of multiple SeriesTimeSlots |
| DataRegionGroup | logical concept | The number of DataRegions contained in group is the number of data replicas, it manages the same data, and back up each other |
The partitioning strategy partitions data and schema into different Regions, and allocates Regions to different DataNodes.
It is recommended to set 1 database, and the cluster will dynamically allocate resources according to the number of nodes and cores.
The database contains multiple SchemaRegions and DataRegions, which are managed by DataNodes.
IoTDB uses a slot-based partitioning strategy, so the size of the partition information is controllable and does not grow infinitely with the number of time series or devices.
Regions will be allocated to different DataNodes to avoid single point of failure, and the load balance of different DataNodes will be ensured when Regions are allocated.
The replication strategy replicates data in multiple replicas, which are copies of each other. Multiple copies can provide high-availability services together and tolerate the failure of some copies.
A region is the basic unit of replication. Multiple replicas of a region construct a high-availability RegionGroup.
An illustration of the partition allocation in cluster:
The figure contains 1 SchemaRegionGroup, and the schema_replication_factor is 3, so the 3 white SchemaRegion-0s form a replication group.
The figure contains 3 DataRegionGroups, and the data_replication_factor is 3, so there are 9 DataRegions in total.
Among multiple Regions of each RegionGroup, consistency is guaranteed through a consensus protocol, which routes read and write requests to multiple replicas.