blob: cebcd1d905c3807690333224a9ec4bf5bc5036d1 [file] [log] [blame] [view]
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# 健康检查工具
## 1. 概述
IoTDB 健康检查工具是一个用于检测 IoTDB 节点运行环境的工具。它可以帮助用户在安装部署数据库前或运行期间检查节点的运行环境,并获取详细的检查结果。
## 2. 前置要求
Linux 系统
* `nc`(netcat)工具:默认已安装,用户需要有权限执行。
* `lsof` 或 `netstat`:至少安装其中一个,用户需要有权限执行。
> 检查相应工具是否已安装:
>
> 检查 `nc` 是否安装:`nc -h`
>
> 检查 `lsof` 是否安装:`lsof -v`
Windows 系统
* PowerShell:默认已启动。
## 3. 检查项
* 检查节点所在服务器的端口占用情况(windows/linux)
* 检查当前节点与集群中其他节点的端口连通性(windows/linux)
* 检查系统中是否安装了 JDK(java\_home)(windows/linux)
* 检查系统内存分配情况,检查 IoTDB 内存分配情况(windows/linux)
* 检查目录访问权限(windows/linux)
* 检查系统最大打开文件数是否满足要求(>= 65535)(仅 linux)
* 检查系统是否禁用了 swap(windows/linux)
## 4. 使用方法
### 4.1 命令格式
```Bash
health_check.sh/health_check.bat -ips<远程服务器IP+端口> , -o <all(default)/remote/local>
```
### 4.2 参数说明
| **参数** | **说明** | **是否必填** |
| ---------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------- |
| `-ips` | 远程服务器 IP 和端口,支持检查多个服务器,格式如下:`ip port1 port2,ip2 port2-1 port2-2` | 非必填 |
| `-o` | 检查参数,可选值为`local`(本机检查)、`remote`(远程服务器端口连接性检查)、`all`(本机和远程服务器端口一起检查),默认值为`all` | 非必填 |
## 5. 使用方法
### 5.1 示例 1:检查全部
```Bash
health_check.sh/health_check.bat -ips 172.20.31.19 6667 18080,10.0.6.230 10311
```
输出结果:
```Bash
Check: Installation Environment(JDK)
Requirement: JDK Version >=1.8
Result: JDK Version 11.0.21
Check: Installation Environment(Memory)
Requirement: Allocate sufficient memory for IoTDB
Result: Total Memory 7.8Gi, 2.33 G allocated to IoTDB ConfigNode, 3.88 G allocated to IoTDB DataNode
Check: Installation Environment(Directory Access)
Requirement: IoTDB needs data/datanode/data,data/datanode/consensus,data/datanode/system,data/datanode/wal,data/confignode/system,data/confignode/consensus,ext/pipe,ext/udf,ext/trigger write permission.
Result:
data/datanode/data has write permission
data/datanode/consensus has write permission
data/datanode/system has write permission
data/datanode/wal has write permission
data/confignode/system has write permission
data/confignode/consensus has write permission
ext/pipe has write permission
ext/udf has write permission
ext/trigger has write permission
Check: Network(Local Port)
Requirement: Port 16668 10730 11742 10750 10760 10710 10720 is not occupied
Result:
Port 16668 10730 11742 10750 10760 10710 10720 is free
Check: Network(Remote Port Connectivity)
Requirement: 172.20.31.19:6667 18080 ,10.0.6.230:10311 need to be accessible
Result:
The following server ports are inaccessible:
IP: 10.0.6.230, Ports: 10311
Check: System Settings(Maximum Open Files Number)
Requirement: >= 65535
Result: 65535
Check: System Settings(Swap)
Requirement: disabled
Result: disabled.
```
### 5.2 示例 2:检查本机
```Bash
health_check.sh/health_check.bat -o local
```
输出结果:
```Bash
Check: Installation Environment(JDK)
Requirement: JDK Version >=1.8
Result: JDK Version 11.0.21
Check: Installation Environment(Memory)
Requirement: Allocate sufficient memory for IoTDB
Result: Total Memory 7.8Gi, 2.33 G allocated to IoTDB ConfigNode, 3.88 G allocated to IoTDB DataNode
Check: Installation Environment(Directory Access)
Requirement: IoTDB needs data/datanode/data,data/datanode/consensus,data/datanode/system,data/datanode/wal,data/confignode/system,data/confignode/consensus,ext/pipe,ext/udf,ext/trigger write permission.
Result:
data/datanode/data has write permission
data/datanode/consensus has write permission
data/datanode/system has write permission
data/datanode/wal has write permission
data/confignode/system has write permission
data/confignode/consensus has write permission
ext/pipe has write permission
ext/udf has write permission
ext/trigger has write permission
Check: Network(Local Port)
Requirement: Port 16668 10730 11742 10750 10760 10710 10720 is not occupied
Result:
Port 16668 10730 11742 10750 10760 10710 10720 is free
Check: System Settings(Maximum Open Files Number)
Requirement: >= 65535
Result: 65535
Check: System Settings(Swap)
Requirement: disabled
Result: disabled.
```
### 5.3 示例 3:检查远程
```Bash
health_check.sh/health_check.bat -o remote -ips 172.20.31.19 6667 18080,10.0.6.230 10311
```
输出结果:
```Bash
Check: Network(Remote Port Connectivity)
Requirement: 172.20.31.19:6667 18080 ,10.0.6.230:10311 need to be accessible
Result:
The following server ports are inaccessible:
IP: 10.0.6.230, Ports: 10311
```
## 6. 常见问题
### 6.1 如何调整内存分配
* 修改`confignode-env.sh`中的MEMORY\_SIZE
* 修改`datanode-env.sh`中的MEMORY\_SIZE
### 6.2 如何修改最大打开数文件
* 设置系统最大打开文件数为 65535,以避免出现 "太多的打开文件 "的错误。
```Bash
#查看当前限制
ulimit -n
# 临时修改
ulimit -n 65535
# 永久修改
echo "* soft nofile 65535" >> /etc/security/limits.conf
echo "* hard nofile 65535" >> /etc/security/limits.conf
#退出当前终端会话后查看,预期显示65535
ulimit -n
```
### 6.3 如何禁用 Swap 及禁用原因
* 禁用原因:IoTDB 使用 Swap 会导致性能下降,建议禁用。
* 禁用方式:
```Bash
echo "vm.swappiness = 0">> /etc/sysctl.conf
# 一起执行 swapoff -a 和 swapon -a 命令是为了将 swap 里的数据转储回内存,并清空 swap 里的数据。
# 不可省略 swappiness 设置而只执行 swapoff -a;否则,重启后 swap 会再次自动打开,使得操作失效。
swapoff -a && swapon -a
# 在不重启的情况下使配置生效。
sysctl -p
# 检查内存分配,预期 swap 为 0
free -m
```