~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements.  See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License.  You may obtain a copy of the License at
~~
~~     http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

Chukwa Programmers Guide

  At the core of Chukwa is a flexible system for collecting and processing
  monitoring data, particularly log files. This document describes how to work
  with the collected data. (For an overview of the Chukwa data model and
  collection pipeline, see the {{{./design.html}Design Guide}}.)

  In particular, this document discusses the Chukwa archive file formats, the
  demux and archiving MapReduce jobs, and the layout of the Chukwa storage
  directories.

Agent REST API

  The Chukwa agent offers a programmable API for controlling agent adaptors,
  either to collect data from remote sources or to set up a listening port for
  an incoming data stream. A usage guide and examples are documented in the
  {{{./apidocs/agent-rest.html} Agent REST API doc}}.
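
  In addition to the REST interface, a running agent accepts plain-text adaptor
  commands on its control port (9093 by default). The sketch below registers a
  file-tailing adaptor from Java; it assumes a local agent with default
  settings, and the <AppLog> data type and log path are placeholders:

+---------------------------------------------------------------------------+
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

public class AddAdaptor {
  public static void main(String[] args) throws Exception {
    // Connect to the agent's control port (default 9093).
    try (Socket sock = new Socket("localhost", 9093);
         PrintWriter out = new PrintWriter(sock.getOutputStream(), true);
         BufferedReader in = new BufferedReader(
             new InputStreamReader(sock.getInputStream()))) {
      // Command syntax: add <adaptor class> <data type> <params> <offset>
      out.println("add filetailer.FileTailingAdaptor AppLog /var/log/app.log 0");
      System.out.println(in.readLine()); // agent replies with the new adaptor id
    }
  }
}
+---------------------------------------------------------------------------+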

Demux

  A key use of Chukwa is processing arriving data in parallel, and the most
  common way to do this is with the Chukwa Demux framework. As
  {{{./design.html}data flows through Chukwa}}, the demux parsers are often the
  first user-defined functions to process the data.

  By default, Chukwa uses the TsProcessor. This parser tries to extract the
  timestamp of the real log statement from the log entry using the ISO 8601
  date format. If that fails, it falls back to the time at which the chunk was
  written to disk (the agent timestamp).
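
  To route a data type to a different parser, map the data type name to a
  parser class in <chukwa-demux-conf.xml>, following the alias pattern of the
  stock entries in that file. A minimal sketch, in which the <AppLog> data type
  and the parser class name are placeholders for illustration:

+---------------------------------------------------------------------------+
<!-- Chunks whose data type is AppLog will be parsed by the named class. -->
<property>
  <name>AppLog</name>
  <value>org.example.chukwa.AppLogProcessor</value>
</property>
+---------------------------------------------------------------------------+
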
* Demux Data To HBase

  Demux parsers are configured in <${CHUKWA_HOME}/etc/chukwa/chukwa-demux-conf.xml>;
  see the {{{./pipeline.html} Pipeline configuration guide}}. HBaseWriter is not
  a real MapReduce job; it is designed to reuse Demux parsers for extraction and
  transformation.

  There are some limitations to consider before implementing a Demux parser for
  loading data into HBase. In a MapReduce job, multiple values can be merged and
  grouped into a key/value pair during the shuffle, combine, and merge phases.
  This kind of aggregation is unsupported by Demux running inside HBaseWriter,
  because the data are not merged in memory but sent directly to HBase. HBase
  takes over the role of merging values into a record by primary key. As a
  consequence, the Demux reducer parser is not invoked by HBaseWriter.

  To write a Demux parser that works with HBaseWriter, two pieces of
  information must be encoded in the parser. First, the name of the HBase table
  in which to store the data; this is encoded in the Demux parser by an
  annotation. Second, the name of the column family in which to store the data;
  this is encoded in the ReduceType of the Demux reducer parser.

** Example of Demux mapper parser

+---------------------------------------------------------------------------+
@Tables(annotations={ @Table(name="SystemMetrics", columnFamily="cpu") })
public class SystemMetrics extends AbstractProcessor {
  @Override
  protected void parse(String recordEntry,
      OutputCollector<ChukwaRecordKey, ChukwaRecord> output, Reporter reporter)
      throws Throwable {
    Calendar cal = Calendar.getInstance();  // in practice, set from the parsed log entry
    ChukwaRecord record = new ChukwaRecord();
    buildGenericRecord(record, null, cal.getTimeInMillis(), "cpu"); // "cpu" = reduce type
    output.collect(key, record);            // key is populated by buildGenericRecord
  }
}
+---------------------------------------------------------------------------+

  In this example, the data collected by the SystemMetrics parser is stored in
  the <"SystemMetrics"> HBase table, in the <"cpu"> column family.
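
  Note that the target table and column family must already exist in HBase
  before HBaseWriter can write to them. For the parsers bundled with Chukwa,
  the schema can be loaded with the HBase shell; the path below assumes a
  default installation layout:

+---------------------------------------------------------------------------+
hbase shell < ${CHUKWA_HOME}/etc/chukwa/hbase.schema
+---------------------------------------------------------------------------+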

HICC

  The HICC visualization API offers a simple way to compose dashboards and
  charting widgets. The data visualization API lets end users interact with
  data in its final product format; the widgets are designed to display and
  summarize data for human consumption. A HICC usage guide and examples are
  documented in the {{{./apidocs/hicc-rest.html} HICC REST API doc}}.
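
  As a sketch of programmatic access, the snippet below lists available widgets
  over HTTP. The default HICC port is assumed to be 4080, and the
  </hicc/v1/widget> resource path is an illustration only; consult the HICC
  REST API doc for the actual endpoints:

+---------------------------------------------------------------------------+
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class FetchWidget {
  public static void main(String[] args) throws Exception {
    // Placeholder endpoint; see the HICC REST API doc for real paths.
    URL url = new URL("http://localhost:4080/hicc/v1/widget");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
             new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);  // JSON description of available widgets
      }
    }
  }
}
+---------------------------------------------------------------------------+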