Monitor Tool

For instructions on deploying the monitoring tools, see the Monitoring Panel Deployment chapter of this documentation.

1. Prometheus Mapping Relationship for Monitoring Metrics

For a monitoring metric whose Metric Name is name and whose Tags are K1=V1, ..., Kn=Vn, the following mapping applies, where value is the metric's concrete value.

Mapping Relationship by Monitoring Metric Type

Counter:
  name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", ..., Kn="Vn"} value

AutoGauge, Gauge:
  name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", ..., Kn="Vn"} value

Histogram:
  name_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", ..., Kn="Vn"} value
  name_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", ..., Kn="Vn"} value
  name_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", ..., Kn="Vn"} value
  name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", ..., Kn="Vn", quantile="0.5"} value
  name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", ..., Kn="Vn", quantile="0.99"} value

Rate:
  name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", ..., Kn="Vn"} value
  name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", ..., Kn="Vn", rate="m1"} value
  name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", ..., Kn="Vn", rate="m5"} value
  name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", ..., Kn="Vn", rate="m15"} value
  name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", ..., Kn="Vn", rate="mean"} value

Timer:
  name_seconds_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", ..., Kn="Vn"} value
  name_seconds_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", ..., Kn="Vn"} value
  name_seconds_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", ..., Kn="Vn"} value
  name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", ..., Kn="Vn", quantile="0.5"} value
  name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", ..., Kn="Vn", quantile="0.99"} value
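
To make the mapping above concrete, the following Python sketch lists the series names that one metric would expand to for each metric type. It is an illustration of the naming rules only, not IoTDB's own implementation; the function names series_for and format_labels and the sample metric in the usage note are hypothetical.

```python
def format_labels(cluster, node_type, node_id, tags, extra=None):
    """Build the {...} label section: fixed labels first, then tags, then extras."""
    labels = {"cluster": cluster, "nodeType": node_type, "nodeId": node_id}
    labels.update(tags)
    if extra:
        labels.update(extra)
    return "{" + ", ".join(f'{key}="{val}"' for key, val in labels.items()) + "}"


def series_for(metric_type, name, cluster, node_type, node_id, tags):
    """Return the series (without values) that one metric maps to."""

    def series(series_name, extra=None):
        return series_name + format_labels(cluster, node_type, node_id, tags, extra)

    if metric_type == "Counter":
        return [series(f"{name}_total")]
    if metric_type in ("Gauge", "AutoGauge"):
        return [series(name)]
    if metric_type in ("Histogram", "Timer"):
        # A Timer follows the Histogram pattern with a "_seconds" suffix.
        base = name if metric_type == "Histogram" else f"{name}_seconds"
        return [
            series(f"{base}_max"),
            series(f"{base}_sum"),
            series(f"{base}_count"),
            series(base, {"quantile": "0.5"}),
            series(base, {"quantile": "0.99"}),
        ]
    if metric_type == "Rate":
        return [series(f"{name}_total")] + [
            series(f"{name}_total", {"rate": window})
            for window in ("m1", "m5", "m15", "mean")
        ]
    raise ValueError(f"unknown metric type: {metric_type}")
```

For example, series_for("Timer", "my_metric", "clusterName", "DataNode", "1", {"K1": "V1"}) returns five series, from my_metric_seconds_max through my_metric_seconds with quantile="0.99".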

2. Modifying Configuration Files

  1. Taking DataNode as an example, modify the iotdb-system.properties configuration file as follows:
dn_metric_reporter_list=PROMETHEUS
dn_metric_level=CORE
dn_metric_prometheus_reporter_port=9091
  2. Start the IoTDB DataNode.

  3. Open a browser or use curl to access http://server_ip:9091/metrics. The returned metric data looks like this:

...
# HELP file_count
# TYPE file_count gauge
file_count{name="wal",} 0.0
file_count{name="unseq",} 0.0
file_count{name="seq",} 2.0
...
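
If you want to inspect this output from a script rather than a browser, a minimal Python sketch such as the following could be used. It assumes plain sample lines of the form shown above (no timestamps, no escaped quotes in label values) and tolerates the trailing comma inside the braces; parse_metrics and fetch_metrics are hypothetical helper names, and the URL is whatever host and port your reporter listens on.

```python
import re
import urllib.request

# metric_name, optional {labels}, then a single value
SAMPLE_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# key="value" pairs; escaped quotes inside values are not handled in this sketch
LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # skip anything that is not a plain sample line
        name, label_text, value = match.groups()
        labels = dict(LABEL.findall(label_text or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url):
    """Download the metrics page as text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")
```

With the DataNode running, parse_metrics(fetch_metrics("http://server_ip:9091/metrics")), with server_ip replaced by your host, would return one tuple per sample, for example ("file_count", {"name": "seq"}, 2.0) for the last line of the output above.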

3. Prometheus + Grafana

As shown above, IoTDB exposes monitoring metrics in the standard Prometheus format. You can use Prometheus to collect and store these metrics and Grafana to visualize them.

The relationship between IoTDB, Prometheus, and Grafana is illustrated below:

[Figure: iotdb_prometheus_grafana]

  1. IoTDB continuously collects monitoring metrics during operation.
  2. Prometheus pulls monitoring metrics from IoTDB's HTTP interface at fixed intervals (configurable).
  3. Prometheus stores the pulled monitoring metrics in its TSDB.
  4. Grafana queries monitoring metrics from Prometheus at fixed intervals (configurable) and visualizes them.
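
The query in step 4 goes through Prometheus's HTTP API, so you can reproduce it from a script to check that metrics have arrived before setting up Grafana. The Python sketch below uses the instant-query endpoint /api/v1/query and assumes the result is an instant vector; the helper names and the base URL in the usage note (Prometheus's default port 9090 on localhost) are assumptions, not values from this document.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(prometheus_base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query_string = urllib.parse.urlencode({"query": promql})
    return prometheus_base_url.rstrip("/") + "/api/v1/query?" + query_string


def extract_samples(response_body):
    """Turn an instant-vector JSON response into (labels, value) pairs."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    # Each result item carries its labels in "metric" and [timestamp, "value"] in "value".
    return [
        (item["metric"], float(item["value"][1]))
        for item in payload["data"]["result"]
    ]


def query(prometheus_base_url, promql):
    """Run one instant query against a running Prometheus server."""
    url = build_query_url(prometheus_base_url, promql)
    with urllib.request.urlopen(url, timeout=10) as response:
        return extract_samples(response.read().decode("utf-8"))
```

For example, query("http://localhost:9090", "file_count") would return the latest file_count samples once Prometheus has scraped IoTDB at least once.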

As this flow shows, Prometheus and Grafana are separate components that you must deploy and configure in addition to IoTDB.

For example, you can configure Prometheus as follows (some parameters can be adjusted as needed) to pull metrics from IoTDB:

job_name: pull-metrics
honor_labels: true
honor_timestamps: true
scrape_interval: 15s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
follow_redirects: true
static_configs:
  - targets:
      - localhost:9091
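
In a prometheus.yml file, a job like the one above is an entry in the top-level scrape_configs list. When a cluster has several nodes, it can be convenient to generate that file rather than edit it by hand. The Python sketch below renders a minimal configuration with one job and one target per node; the function name render_scrape_config and the host names in the usage note are hypothetical, and only a subset of the parameters shown above is emitted.

```python
def render_scrape_config(targets, job_name="pull-metrics", scrape_interval="15s"):
    """Render a minimal prometheus.yml with one job and the given host:port targets."""
    lines = [
        "scrape_configs:",
        f"  - job_name: {job_name}",
        "    honor_labels: true",
        f"    scrape_interval: {scrape_interval}",
        "    metrics_path: /metrics",
        "    scheme: http",
        "    static_configs:",
        "      - targets:",
    ]
    # One list entry per node that exposes the Prometheus reporter port.
    lines += [f"          - {target}" for target in targets]
    return "\n".join(lines) + "\n"
```

For example, render_scrape_config(["datanode-1:9091", "datanode-2:9091"]) produces a configuration that scrapes two nodes under the same job.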

For more details, refer to the following documents:

4. Apache IoTDB Dashboard

The Apache IoTDB Dashboard is a companion product of IoTDB Enterprise Edition, supporting unified centralized operation and maintenance management. It allows monitoring multiple clusters through a single monitoring panel. You can contact the business team to obtain the Dashboard's JSON file.

[Figure: Apache IoTDB Dashboard]

4.1 Cluster Overview

You can monitor the following, among other things:

  • Total CPU cores, total memory space, total disk space of the cluster.
  • Number of ConfigNodes and DataNodes in the cluster.
  • Cluster uptime.
  • Cluster write speed.
  • Current CPU, memory, and disk usage of each node in the cluster.
  • Node-specific information.

4.2 Data Writing

You can monitor the following, among other things:

  • Average write latency, median latency, 99th percentile latency.
  • Number and size of WAL files.
  • Per-node WAL flush SyncBuffer latency.

4.3 Data Query

You can monitor the following, among other things:

  • Per-node query latency for loading time series metadata.
  • Per-node query latency for reading time series.
  • Per-node query latency for modifying time series metadata.
  • Per-node query latency for loading the Chunk metadata list.
  • Per-node query latency for modifying Chunk metadata.
  • Per-node query latency for filtering by Chunk metadata.
  • Per-node average query latency for constructing the Chunk Reader.

4.4 Storage Engine

You can monitor the following, among other things:

  • Number and size of files by type.
  • Number and size of TsFiles in various stages.
  • Number and latency of various tasks.

4.5 System Monitoring

You can monitor the following, among other things:

  • System memory, swap memory, process memory.
  • Disk space, file count, file size.
  • JVM GC time ratio, GC count by type, GC data volume, heap memory usage by generation.
  • Network transmission rate, packet sending rate.