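Health Check Tool

1. Overview

The IoTDB health check tool inspects the runtime environment of an IoTDB node. It can be run before the database is installed and deployed or while it is running, and it reports a detailed result for each check.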

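2. Prerequisites

Linux

  • nc (netcat): installed by default; the user must have permission to run it.
  • lsof or netstat: at least one of them must be installed, and the user must have permission to run it.

To check whether these tools are installed:

Check whether nc is installed: nc -h

Check whether lsof is installed: lsof -v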
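
The checks above can also be scripted. The following is a minimal sketch, not part of the IoTDB tool itself: the function names have_tool and check_prereqs are hypothetical, and the only mechanism used is the standard command -v builtin.

```shell
# Hypothetical helper: succeeds if the named command can be found on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: prints the status of each Linux prerequisite and
# returns non-zero if anything is missing.
check_prereqs() {
  prereq_status=0
  if have_tool nc; then
    echo "nc: found"
  else
    echo "nc: missing"
    prereq_status=1
  fi
  # Only one of lsof / netstat is required.
  if have_tool lsof || have_tool netstat; then
    echo "lsof/netstat: found"
  else
    echo "lsof/netstat: missing (install at least one)"
    prereq_status=1
  fi
  return "$prereq_status"
}

check_prereqs || echo "Some prerequisites are missing."
```

This only confirms that the commands exist; whether the current user is allowed to run them still has to be verified by actually invoking them, for example with nc -h.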

Windows

  • PowerShell: enabled by default.

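3. Check Items

  • Port occupancy on the server that hosts the node (Windows/Linux)
  • Port connectivity between the current node and the other nodes in the cluster (Windows/Linux)
  • Whether a JDK is installed on the system (JAVA_HOME) (Windows/Linux)
  • System memory allocation and IoTDB memory allocation (Windows/Linux)
  • Directory access permissions (Windows/Linux)
  • Whether the system's maximum number of open files meets the requirement (>= 65535) (Linux only)
  • Whether swap is disabled on the system (Windows/Linux)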
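
To illustrate the first item, a local port occupancy check on Linux can be approximated by scanning the output of netstat -tln for a listening socket on the port. This is a sketch of the idea under that assumption, not the tool's actual implementation; the function names port_in_listing and port_is_occupied are hypothetical.

```shell
# Hypothetical helper: reads `netstat -tln`-style output on stdin and
# succeeds if some TCP socket has a local address ending in ":<port>".
port_in_listing() {
  awk -v port="$1" '
    $1 ~ /^tcp/ && $4 ~ (":" port "$") { found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Hypothetical wrapper: succeeds if the port is occupied on this machine.
# -t limits the listing to TCP, -l to listening sockets, -n keeps ports numeric.
port_is_occupied() {
  netstat -tln 2>/dev/null | port_in_listing "$1"
}
```

With these definitions, port_is_occupied 6667 && echo "port 6667 is in use" reports a conflict before the node is started. Keeping the parsing separate from the netstat call makes it possible to test the parser against saved output.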

4. Usage

4.1 Command Format

health_check.sh/health_check.bat -ips <remote server IPs and ports> -o <all(default)/remote/local>

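4.2 Parameters

Parameter   Description                                                        Required
-ips        Remote server IPs and ports. Multiple servers are supported,      No
            in the format: ip port1 port2,ip2 port2-1 port2-2
-o          Check scope. Valid values: local (check the local machine),       No
            remote (check port connectivity to remote servers), all
            (check both). Default: all.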
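
The -ips format groups ports by server: servers are separated by commas, and within each group the IP comes first, followed by its ports separated by spaces. In the format string, port2-1 denotes the first port of the second server, not a port range. As a minimal sketch of how such a value can be split into individual targets (the function name expand_ips is hypothetical, and the value is assumed to arrive as a single string):

```shell
# Hypothetical helper: prints one "ip port" pair per line for an
# -ips style value such as "ip port1 port2,ip2 port2-1 port2-2".
expand_ips() {
  printf '%s\n' "$1" | tr ',' '\n' | while read -r ip ports; do
    # Skip empty groups, e.g. from a trailing comma.
    [ -n "$ip" ] || continue
    for port in $ports; do
      printf '%s %s\n' "$ip" "$port"
    done
  done
}

expand_ips "172.20.31.19 6667 18080,10.0.6.230 10311"
# Prints:
# 172.20.31.19 6667
# 172.20.31.19 18080
# 10.0.6.230 10311
```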

5. Usage Examples

5.1 Example 1: Check Everything

health_check.sh/health_check.bat -ips 172.20.31.19 6667 18080,10.0.6.230 10311

Output:

Check: Installation Environment(JDK)
Requirement: JDK Version >=1.8
Result: JDK Version 11.0.21

Check: Installation Environment(Memory)
Requirement: Allocate sufficient memory for IoTDB
Result: Total Memory 7.8Gi, 2.33 G allocated to IoTDB ConfigNode, 3.88 G allocated to IoTDB DataNode

Check: Installation Environment(Directory Access)
Requirement: IoTDB needs data/datanode/data,data/datanode/consensus,data/datanode/system,data/datanode/wal,data/confignode/system,data/confignode/consensus,ext/pipe,ext/udf,ext/trigger write permission.
Result: 
data/datanode/data has write permission
data/datanode/consensus has write permission
data/datanode/system has write permission
data/datanode/wal has write permission
data/confignode/system has write permission
data/confignode/consensus has write permission
ext/pipe has write permission
ext/udf has write permission
ext/trigger has write permission

Check: Network(Local Port)
Requirement: Port 16668 10730 11742 10750 10760 10710 10720 is not occupied
Result: 
Port 16668  10730  11742  10750  10760  10710  10720  is free

Check: Network(Remote Port Connectivity)
Requirement: 172.20.31.19:6667 18080 ,10.0.6.230:10311  need to be accessible
Result: 
The following server ports are inaccessible:
IP: 10.0.6.230, Ports: 10311 

Check: System Settings(Maximum Open Files Number)
Requirement: >= 65535
Result: 65535

Check: System Settings(Swap)
Requirement: disabled
Result: disabled.

5.2 Example 2: Check the Local Machine

health_check.sh/health_check.bat -o local

Output:

Check: Installation Environment(JDK)
Requirement: JDK Version >=1.8
Result: JDK Version 11.0.21

Check: Installation Environment(Memory)
Requirement: Allocate sufficient memory for IoTDB
Result: Total Memory 7.8Gi, 2.33 G allocated to IoTDB ConfigNode, 3.88 G allocated to IoTDB DataNode

Check: Installation Environment(Directory Access)
Requirement: IoTDB needs data/datanode/data,data/datanode/consensus,data/datanode/system,data/datanode/wal,data/confignode/system,data/confignode/consensus,ext/pipe,ext/udf,ext/trigger write permission.
Result: 
data/datanode/data has write permission
data/datanode/consensus has write permission
data/datanode/system has write permission
data/datanode/wal has write permission
data/confignode/system has write permission
data/confignode/consensus has write permission
ext/pipe has write permission
ext/udf has write permission
ext/trigger has write permission

Check: Network(Local Port)
Requirement: Port 16668 10730 11742 10750 10760 10710 10720 is not occupied
Result: 
Port 16668  10730  11742  10750  10760  10710  10720  is free

Check: System Settings(Maximum Open Files Number)
Requirement: >= 65535
Result: 65535

Check: System Settings(Swap)
Requirement: disabled
Result: disabled.

5.3 Example 3: Check Remote Servers

health_check.sh/health_check.bat -o remote -ips 172.20.31.19 6667 18080,10.0.6.230 10311

Output:

Check: Network(Remote Port Connectivity)
Requirement: 172.20.31.19:6667 18080 ,10.0.6.230:10311  need to be accessible
Result: 
The following server ports are inaccessible:
IP: 10.0.6.230, Ports: 10311
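
When a port is reported as inaccessible, it can help to probe it by hand with nc. The sketch below assumes an nc variant that supports -z (connect without sending data) and -w (timeout in seconds); both are common but not universal, so check the local man page. The function names port_open and report_inaccessible are hypothetical, and this is not the tool's own code.

```shell
# Hypothetical helper: succeeds if a TCP connection to host:port can be
# opened within 3 seconds.
port_open() {
  nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# Hypothetical helper: reads "ip port" pairs on stdin, one per line, and
# prints every pair that cannot be reached.
report_inaccessible() {
  while read -r ip port; do
    # </dev/null keeps the probe from consuming the remaining input lines.
    port_open "$ip" "$port" </dev/null ||
      printf 'IP: %s, Port: %s\n' "$ip" "$port"
  done
}
```

For example, printf '%s\n' '172.20.31.19 6667' '10.0.6.230 10311' | report_inaccessible prints one line per unreachable target and nothing when every port responds. An unreachable port usually points to a firewall rule, a wrong address, or a service that is not running on the remote side.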

6. FAQ

6.1 How to Adjust Memory Allocation

  • Modify MEMORY_SIZE in confignode-env.sh.
  • Modify MEMORY_SIZE in datanode-env.sh.

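6.2 How to Change the Maximum Number of Open Files

  • Set the system's maximum number of open files to 65535 to avoid "too many open files" errors.
# Check the current limit
ulimit -n
# Change it temporarily (current session only)
ulimit -n 65535
# Change it permanently
echo "* soft nofile 65535" >> /etc/security/limits.conf
echo "* hard nofile 65535" >> /etc/security/limits.conf
# Log out of the current terminal session, log back in, and check again; the expected value is 65535
ulimit -n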
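
The final verification step can be automated. The following sketch compares the current limit with the 65535 threshold from the check list; the function name nofile_ok is hypothetical, and treating "unlimited" as sufficient is an assumption of this example.

```shell
# Hypothetical helper: succeeds if the given `ulimit -n` value is at
# least 65535. "unlimited" is treated as sufficient; anything that is
# not a plain number is rejected.
nofile_ok() {
  case "$1" in
    unlimited) return 0 ;;
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -ge 65535 ]
}

current="$(ulimit -n)"
if nofile_ok "$current"; then
  echo "open files limit OK: $current"
else
  echo "open files limit too low: $current (need >= 65535)"
fi
```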

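6.3 How to Disable Swap, and Why

  • Why: performance degrades when IoTDB uses swap, so disabling it is recommended.
  • How:
echo "vm.swappiness = 0" >> /etc/sysctl.conf
# Running swapoff -a and swapon -a together moves the data held in swap back into memory and empties swap.
# Do not skip the swappiness setting and run only swapoff -a; otherwise swap is re-enabled automatically after a reboot and the change is lost.
swapoff -a && swapon -a
# Apply the configuration without rebooting.
sysctl -p
# Check memory allocation; swap is expected to be 0
free -m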
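
Besides reading free -m by eye, the swap state can be inspected from /proc on Linux. The sketch below reports whether any swap area is active, based on /proc/swaps, and prints the current vm.swappiness value. It is an illustration only and not necessarily how the health check tool decides that swap is disabled; the function name swap_disabled is hypothetical.

```shell
# Hypothetical helper: succeeds if the given file, in /proc/swaps format,
# lists no active swap area. The first line of that format is a header;
# every further non-empty line is one active swap area.
swap_disabled() {
  awk 'NR > 1 && NF > 0 { active = 1 } END { exit active ? 1 : 0 }' "$1"
}

if [ -r /proc/swaps ]; then
  if swap_disabled /proc/swaps; then
    echo "swap areas: none active"
  else
    echo "swap areas: at least one active"
  fi
fi

# Show the swappiness value currently applied by the kernel.
if [ -r /proc/sys/vm/swappiness ]; then
  echo "vm.swappiness = $(cat /proc/sys/vm/swappiness)"
fi
```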