These scripts collect Hadoop JMX metrics and send them to stdout or Kafka. Tested with Python 2.7.
Edit the configuration file (a JSON file), for example:
{
  "env": {
    "site": "sandbox"
  },
  "input": [
    {
      "component": "namenode",
      "host": "sandbox.hortonworks.com",
      "port": "50070",
      "https": false
    },
    {
      "component": "resourcemanager",
      "host": "sandbox.hortonworks.com",
      "port": "8088",
      "https": false
    }
  ],
  "filter": {
    "monitoring.group.selected": ["hadoop", "java.lang"]
  },
  "output": {
    "kafka": {
      "default_topic": "nn_jmx_metric_sandbox",
      "component_topic_mapping": {
        "namenode": "nn_jmx_metric_sandbox",
        "resourcemanager": "rm_jmx_metric_sandbox"
      },
      "broker_list": [
        "sandbox.hortonworks.com:6667"
      ]
    }
  }
}
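As an illustration, the "input" section can be turned into the JMX endpoints the collector polls: Hadoop daemons expose metrics over HTTP at the /jmx path. The function below is a minimal sketch, not part of the actual scripts; its name and structure are assumptions.

```python
def jmx_urls(config):
    # Build a component -> JMX URL map from the "input" list of the
    # configuration file. Illustrative sketch only; the real scripts
    # may structure this differently.
    urls = {}
    for item in config["input"]:
        scheme = "https" if item.get("https") else "http"
        urls[item["component"]] = "%s://%s:%s/jmx" % (
            scheme, item["host"], item["port"])
    return urls

config = {
    "input": [
        {"component": "namenode", "host": "sandbox.hortonworks.com",
         "port": "50070", "https": False}
    ]
}
print(jmx_urls(config))
```

With the sample configuration above, this yields `http://sandbox.hortonworks.com:50070/jmx` for the namenode.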
Run the script:
python hadoop_jmx_kafka.py > 1.txt
eagle-collector.conf

input (monitored hosts)
“port” defines the Hadoop service port, e.g. 50070 for the namenode, 60010 for the HBase master.
filter
“monitoring.group.selected” selects the bean groups we care about; beans outside these groups are filtered out.
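One plausible way such a group filter works is to match the JMX bean's domain (the part of the bean name before the colon) against the selected groups. This is a sketch under that assumption, not the scripts' actual logic:

```python
def keep_bean(bean_name, selected_groups):
    # JMX bean names look like "Hadoop:service=NameNode,name=FSNamesystem"
    # or "java.lang:type=Memory"; the domain is the part before the colon.
    # Keep the bean only if its domain starts with a selected group name.
    domain = bean_name.split(":", 1)[0].lower()
    return any(domain.startswith(g.lower()) for g in selected_groups)

selected = ["hadoop", "java.lang"]
print(keep_bean("java.lang:type=Memory", selected))                      # True
print(keep_bean("JMImplementation:type=MBeanServerDelegate", selected))  # False
```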
output
If it is left empty, the output defaults to stdout:
"output": {}
It also supports Kafka as its output.
"output": {
"kafka": {
"topic": "test_topic",
"broker_list": [ "sandbox.hortonworks.com:6667"]
}
}