Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Configuration Recipe for monitoring ZooKeeper using Nagios
----------------------------------------------------------
I will start by making the assumption that you already have a working Nagios install.
WARNING: I wrote these instructions while installing and configuring the plugin on my desktop computer running Ubuntu 9.10. I installed Nagios using apt-get.
WARNING: You should customize the config files as suggested in order to match your Nagios and ZooKeeper install.
WARNING: This README assumes you know how to configure Nagios and how it works.
WARNING: You should customize the warning and critical levels on service checks to meet your own needs.
1. Install the plugin
$ cp check_zookeeper.py /usr/lib/nagios/plugins/
2. Install the new commands
$ cp zookeeper.cfg /etc/nagios-plugins/config
3. Update the list of servers in zookeeper.cfg for the command 'check_zookeeper' and update the port for the command 'check_zk_node' (default: 2181)
4. Create a virtual host in Nagios used for monitoring the cluster as a whole -OR- create a hostgroup named 'zookeeper-servers' and add all the ZooKeeper cluster nodes to it.
5. Define service checks as illustrated below, or just use the provided definitions.
define service {
    use                     generic-service
    host_name               zookeeper-cluster
    service_description     ...
    check_command           check_zookeeper!<exported-var>!<warning-level>!<critical-level>
}
define service {
    hostgroup_name          zookeeper-servers
    use                     generic-service
    service_description     ZK_Open_File_Descriptors_Count
    check_command           check_zk_node!<exported-var>!<warning-level>!<critical-level>
}
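The <exported-var> placeholder names a statistic reported by the ZooKeeper server, such as zk_open_file_descriptor_count. The following is a minimal sketch for listing the available names, assuming they come from the 'mntr' four-letter command, which replies with tab-separated key/value lines. The helper names fetch_mntr and parse_mntr and the SAMPLE text are hypothetical and are not part of check_zookeeper.py. On recent ZooKeeper releases the four-letter commands may need to be enabled in the server configuration.

```python
import socket


def fetch_mntr(host="localhost", port=2181, timeout=5.0):
    """Send the 'mntr' four-letter command and return the raw reply as text.

    Hypothetical helper: host, port and timeout are illustrative defaults.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text):
    """Turn 'key<TAB>value' lines into a dict; lines without a tab are skipped."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


# Illustrative reply only; real servers report more keys and other values.
SAMPLE = (
    "zk_version\t3.4.0\n"
    "zk_open_file_descriptor_count\t23\n"
    "zk_ephemerals_count\t0\n"
)

if __name__ == "__main__":
    for name, value in sorted(parse_mntr(SAMPLE).items()):
        print(name, value)
```

To list the names on a live server, replace SAMPLE with fetch_mntr("your-host", 2181). Any key printed this way is a candidate for <exported-var>.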
Examples:
a. check the number of open file descriptors
define service {
    use                     generic-service
    host_name               zookeeper-cluster
    service_description     ZK_Open_File_Descriptor_Count
    check_command           check_zookeeper!zk_open_file_descriptor_count!500!800
}
b. check the number of ephemeral nodes
define service {
    use                     generic-service
    host_name               localhost
    service_description     ZK_Ephemerals_Count
    check_command           check_zookeeper!zk_ephemerals_count!10000!100000
}
c. check the number of open file descriptors for each host in the group
define service {
    hostgroup_name          zookeeper-servers
    use                     generic-service
    service_description     ZK_Open_File_Descriptors_Count
    check_command           check_zk_node!zk_open_file_descriptor_count!500!800
}
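To help choose warning and critical levels, here is a rough sketch of how a pair such as 500!800 could translate into a Nagios state. It uses the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The classify function is hypothetical, and the inclusive ">=" comparison is an assumption; check check_zookeeper.py for the exact rule it applies.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(value, warning, critical):
    """Map a reported value to a Nagios state.

    Assumption: higher values are worse and the thresholds are inclusive.
    """
    try:
        value = float(value)
    except (TypeError, ValueError):
        # A missing or non-numeric value cannot be judged.
        return UNKNOWN
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


if __name__ == "__main__":
    # Thresholds taken from example (a): warning at 500, critical at 800.
    for sample in ("23", "500", "950", None):
        print(sample, classify(sample, 500, 800))
```

With these thresholds, a server reporting 23 open file descriptors is OK, one reporting 500 raises a WARNING, and one reporting 950 is CRITICAL.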