| ~~ Licensed to the Apache Software Foundation (ASF) under one or more |
| ~~ contributor license agreements. See the NOTICE file distributed with |
| ~~ this work for additional information regarding copyright ownership. |
| ~~ The ASF licenses this file to You under the Apache License, Version 2.0 |
| ~~ (the "License"); you may not use this file except in compliance with |
| ~~ the License. You may obtain a copy of the License at |
| ~~ |
| ~~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~~ |
| ~~ Unless required by applicable law or agreed to in writing, software |
| ~~ distributed under the License is distributed on an "AS IS" BASIS, |
| ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| ~~ See the License for the specific language governing permissions and |
| ~~ limitations under the License. |
| ~~ |
| |
| Apache Chukwa Programmers Guide |
| |
| At the core of Apache Chukwa is a flexible system for collecting and processing |
| monitoring data, particularly log files. This document describes how to use the |
| collected data. (For an overview of Apache Chukwa data model and collection |
| pipeline, see the {{{./design.html}Design Guide}}.) |
| |
| In particular, this document discusses the Apache Chukwa archive file formats, the |
| demux and archiving mapreduce jobs, and the layout of Apache Chukwa storage directories. |
| |
| Agent REST API |
| |
| Apache Chukwa Agent offers programmable API to control Agent adaptors for collecting data from |
| remote sources, or setup a listening port for incoming data stream. Usage guide and |
| examples are documented in {{{./apidocs/agent-rest.html} Agent REST API doc}}. |
| |
| Demux |
| |
| A key use for Apache Chukwa is processing arriving data, in parallel, using Apache Chukwa Demux. |
| The most common way to do this is using Apache Chukwa demux framework. |
| As {{{./design.html}data flows through Chukwa}}, the demux parsers are often the |
| first user defined function to process data. |
| |
| By default, Apache Chukwa will use the default TsProcessor. This parser will try to |
| extract the real log statement from the log entry using the ISO8601 date |
| format. If it fails, it will use the time at which the chunk was written to |
| disk (agent timestamp). |
| |
| * Demux Data To HBase |
| |
| Demux parsers can be configured to run in <${CHUKWA_HOME}/etc/chukwa/chukwa-demux-conf.xml>. See |
| {{{./pipeline.html} Pipeline configuration guide}}. HBaseWriter is not a |
| real map reduce job. It is designed to reuse Demux parsers for extraction and transformation purpose. |
| There are some limitations to consider before implementing |
| Demux parser for loading data to HBase. In MapReduce job, mutliple values can be merged and |
| group into a key/value pair in shuffle/combine and merge phases. This kind of aggregation is |
| unsupported by Demux in HBaseWriter because the data are not merged in memory, but send to HBase. |
| HBase takes the role of merging values into a record by primary key. Therefore, Demux |
| reducer parser is not invoked by HBaseWriter. |
| |
| For writing a demux parser that works with HBaseWriter, there are two piece information to |
| encode to Demux parser. First, HBase table name to store the data. This is encoded in Demux |
| parser by annotation. Second, the column family name to store the data is encoded in the |
| ReducerType of the Demux Reducer parser. |
| |
| ** Example of Demux mapper parser |
| |
| --- |
| @Tables(annotations={ |
| @Table(name="SystemMetrics",columnFamily="cpu) |
| }) |
| public class SystemMetrics extends AbstractProcessor { |
| @Override |
| protected void parse(String recordEntry, |
| OutputCollector<ChukwaRecordKey, ChukwaRecord> output, Reporter reporter) |
| throws Throwable { |
| ... |
| buildGenericRecord(record, null, cal.getTimeInMillis(), "cpu"); |
| output.collect(key, record); |
| } |
| } |
| --- |
| |
| In this example, the data collected by SystemMetrics parser is stored into <"SystemMetrics"> |
| HBase table, and column family is stored to <"cpu"> column family. |
| |
| HICC REST API |
| |
| HICC visualization API offers simple API to compose dashboard, and charting widgets. |
| Data visualization API offers features for end user to interact with data in the |
| final product format. They are designed to display and summarize data for human |
| interaction. HICC usage guide and examples are documented in |
| {{{./apidocs/hicc-rest.html} HICC REST API doc}}. |
| |