~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements. See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License. You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.
~~
Agent Configuration Guide
In a normal Apache Chukwa installation, an <Agent> process runs on every
machine being monitored. This process is responsible for all the data collection
on that host. Data collection might mean periodically running a Unix command,
or tailing a file, or listening for incoming UDP packets.
Each particular data source corresponds to a so-called <Adaptor>.
Adaptors are dynamically loadable modules that run inside the Agent process.
There is generally one Adaptor for each data source: for each file being
watched or for each Unix command being executed. Each adaptor has a unique name.
If you do not specify a name, one will be auto-generated by hashing the
Adaptor type and parameters.
There are a number of Adaptors built into Apache Chukwa, and you can also develop
your own. Apache Chukwa will use them if you add them to the Apache Chukwa library
search path (e.g., by putting them in a jar file in <$CHUKWA_HOME/lib>).
Agent Control
Once an Agent process is running, there are a number of commands that you can
use to inspect and control it. By default, Agents listen for incoming commands
on port 9093. Commands are case-insensitive.
*--------------------*-----------------------------------------*--------------:
| Command | Purpose | Options |
*--------------------*-----------------------------------------*--------------:
| <add> | Start an adaptor. | See below |
*--------------------*-----------------------------------------*--------------:
| <close> | Close socket connection to agent. | None |
*--------------------*-----------------------------------------*--------------:
| <help> | Display a list of available commands | None |
*--------------------*-----------------------------------------*--------------:
| <list> | List currently running adaptors | None |
*--------------------*-----------------------------------------*--------------:
| <reloadcollectors> | Re-read list of collectors (deprecated) | None |
*--------------------*-----------------------------------------*--------------:
| <stop> | Stop adaptor, abruptly | Adaptor name |
*--------------------*-----------------------------------------*--------------:
| <stopall> | Stop all adaptors, abruptly | None |
*--------------------*-----------------------------------------*--------------:
| <shutdown> | Stop adaptor, gracefully | Adaptor name |
*--------------------*-----------------------------------------*--------------:
| <stopagent> | Stop agent process | None |
*--------------------*-----------------------------------------*--------------:
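Because the control interface accepts text commands on a TCP port, it can be scripted. The following Python sketch is not part of Apache Chukwa. It assumes that commands are newline-terminated text and that the agent closes the connection once it receives <close>; the helper name <send_agent_command> is hypothetical.

```python
import socket


def send_agent_command(command, host="localhost", port=9093, timeout=5.0):
    """Send one control command to an agent and return whatever it replies.

    Assumptions (not confirmed by this guide): commands are newline-terminated
    text, and the agent closes the connection after it receives "close".
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Issue the command, then ask the agent to close the connection so
        # that the read loop below terminates at end-of-stream.
        sock.sendall((command + "\nclose\n").encode("utf-8"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

With an agent running on the local machine, calling send_agent_command("list") should return the agent's reply to the <list> command as a string. If the agent does not close the connection, the call ends with a socket timeout instead.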
The add command is by far the most complex; it takes several mandatory and
optional parameters. The general form is as follows:
---
add [name =] <adaptor_class_name> <datatype> <adaptor specific params> <initial offset>
---
There are four mandatory fields: the word <add>, the class name of
the Adaptor, the datatype of the Adaptor's output, and the sequence number of
the first byte. There are two optional fields: the adaptor instance name and
the adaptor parameters.
The adaptor name, if specified, goes immediately after the <add> command and is
followed by an equals sign. It should be a string of printable characters,
without whitespace or '='. Apache Chukwa Adaptor names all start with "adaptor_".
If you specify an adaptor name that does not start with that prefix, the prefix
will be added automatically.
The Apache Chukwa agent itself does not require adaptor parameters, but each class
of adaptor may define its own mandatory and optional parameters. See below.
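The field order described above can be captured in a small helper. This is only an illustrative Python sketch: the function <build_add_command> and its argument names are hypothetical, and the name check simply mirrors the rules stated above (printable, no whitespace, no '=').

```python
def build_add_command(adaptor_class, datatype, offset=0, params="", name=None):
    """Assemble an agent `add` command line from its fields.

    Field order follows the general form:
    add [name =] <adaptor_class_name> <datatype> <params> <initial offset>
    """
    if name is not None:
        # Names must be printable and contain neither whitespace nor '='.
        bad = (not name or not name.isprintable() or "=" in name
               or any(ch.isspace() for ch in name))
        if bad:
            raise ValueError(f"invalid adaptor name: {name!r}")
    if int(offset) < 0:
        raise ValueError("initial offset must not be negative")

    fields = ["add"]
    if name is not None:
        fields += [name, "="]
    fields += [adaptor_class, datatype]
    if params:
        fields.append(params)  # adaptor-specific; may contain spaces
    fields.append(str(int(offset)))
    return " ".join(fields)


print(build_add_command("FileAdaptor", "FooData", params="/tmp/foo"))
print(build_add_command("filetailer.LWFTAdaptor", "BarData",
                        params="/foo/bar", name="adaptor_bar"))
```

The first call prints the unnamed form of the command; the second shows where an explicit instance name and the equals sign are placed.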
Configuration options
Apache Chukwa agents are configured via the file <conf/chukwa-agent-conf.xml>.
The agent's control interface listens on port 9093 by default:
---
<property>
<name>chukwaAgent.control.port</name>
<value>9093</value>
<description>The socket port number the agent's control interface can be contacted at.</description>
</property>
---
The directory in which the agent stores its checkpoint files:
---
<property>
<name>chukwaAgent.checkpoint.dir</name>
<value>${CHUKWA_LOG_DIR}/</value>
<description>the location to put the agent's checkpoint file(s)</description>
</property>
---
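A script that talks to the agent may want to read these settings instead of hard-coding them. The Python sketch below parses property elements of the shape shown above. The <configuration> root element in the sample and the function name <read_agent_properties> are assumptions made for illustration, and variables such as ${CHUKWA_LOG_DIR} are returned unexpanded.

```python
import xml.etree.ElementTree as ET


def read_agent_properties(xml_text):
    """Return a {name: value} dict built from <property> elements."""
    root = ET.fromstring(xml_text)
    props = {}
    # iter() finds <property> elements at any depth, so the name of the
    # enclosing root element does not matter.
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Sample input; the <configuration> wrapper is an assumption.
SAMPLE = """<configuration>
  <property>
    <name>chukwaAgent.control.port</name>
    <value>9093</value>
  </property>
  <property>
    <name>chukwaAgent.checkpoint.dir</name>
    <value>${CHUKWA_LOG_DIR}/</value>
  </property>
</configuration>"""

props = read_agent_properties(SAMPLE)
# Fall back to the documented default when the property is absent.
port = int(props.get("chukwaAgent.control.port", "9093"))
print(port)
```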
Adaptors
This section lists the standard adaptors, and the arguments they take.
* <<FileAdaptor>> Pushes a whole file, as one Chunk, then exits.
Takes one mandatory parameter: the file to push.
---
add FileAdaptor FooData /tmp/foo 0
---
This pushes file </tmp/foo> as one chunk, with datatype <FooData>.
* <<filetailer.LWFTAdaptor>> Repeatedly tails a file, treating the file as
a sequence of bytes, ignoring the content. Chunk boundaries are arbitrary.
This is useful for streaming binary data. Takes one mandatory parameter:
a path to the file to tail. If the log file is rotated while there is unread
data, this adaptor will not attempt to recover it.
---
add filetailer.LWFTAdaptor BarData /foo/bar 0
---
This pushes </foo/bar> in a sequence of Chunks of type <BarData>.
* <<filetailer.FileTailingAdaptor>> Repeatedly tails a file, again
ignoring content and with unspecified Chunk boundaries. Takes one
mandatory parameter: a path to the file to tail. Keeps a
file handle open in order to detect log file rotation.
---
add filetailer.FileTailingAdaptor BarData /foo/bar 0
---
This pushes </foo/bar> in a sequence of Chunks of type <BarData>.
* <<filetailer.RCheckFTAdaptor>>
An experimental modification of the above, which avoids the need to
keep a file handle open. Same parameters and usage as the above.
* <<filetailer.CharFileTailingAdaptorUTF8>>
The same as the base FileTailingAdaptor, except that chunks are
guaranteed to end only at line breaks.
This is useful for most ASCII log file formats.
* <<filetailer.CharFileTailingAdaptorUTF8NewLineEscaped>>
The same, except that chunks are guaranteed to end only at
non-escaped line breaks. This is useful for pushing
Apache Chukwa-formatted log files, where exception
stack traces stay in a single chunk.
* <<filetailer.FileTailingAdaptorPreserveLines>>
Similar to CharFileTailingAdaptorUTF8. The difference is mainly seen in the
Demux process: with CharFileTailingAdaptorUTF8, every line is processed
one by one, whereas with FileTailingAdaptorPreserveLines, all the lines of
a Chunk are processed in one go, which makes the Demux jobs faster.
Same parameters and usage as the above.
* <<DirTailingAdaptor>> Takes a directory path and an
adaptor name as mandatory parameters; repeatedly scans that directory
and all subdirectories, and starts the indicated adaptor running on
each file. Since the DirTailingAdaptor does not, itself, emit data,
the datatype parameter is applied to the newly-spawned adaptors.
Note that if you try this on a large directory with an adaptor that
keeps file handles open, it is possible to exceed your system's limit
on open files.
A file pattern can be specified as an optional second parameter.
---
add DirTailingAdaptor logs /var/log/ *.log filetailer.CharFileTailingAdaptorUTF8 0
---
* <<ExecAdaptor>> Takes a frequency (in milliseconds) as an optional
parameter, followed by the program name as a mandatory parameter. Runs that
program repeatedly at the rate specified by the frequency.
---
add ExecAdaptor Df 60000 /bin/df -x nfs -x none 0
---
This adaptor will run <df> every minute, labeling output as Df.
* <<UDPAdaptor>> Takes a port number as mandatory parameter.
Binds to the indicated UDP port, and emits one Chunk for each received packet.
---
add UDPAdaptor Packets 1234 0
---
This adaptor will listen for incoming traffic on port 1234, labeling output as Packets.
* <<edu.berkeley.chukwa_xtrace.XtrAdaptor>> (available in <contrib>)
Takes an {{{http://www.x-trace.net/wiki/doku.php}Xtrace}} ReportSource
class name [without package] as mandatory argument, and no optional parameters.
Listens for incoming reports in the same way as that ReportSource would.
---
add edu.berkeley.chukwa_xtrace.XtrAdaptor Xtrace UdpReportSource 0
---
This adaptor will create and start a <UdpReportSource>, labeling its
output datatype as Xtrace.
* <<sigar.SystemMetrics>> This adaptor collects CPU, disk, and network
utilization, as well as the model and specifications of the machine, and
periodically emits the data as one Chunk.
---
add sigar.SystemMetrics SystemMetrics 60 0
---
This adaptor will take snapshots of system state every minute,
labeling output as SystemMetrics.
* <<SocketAdaptor>> This adaptor binds to a port and listens for Log4J
SocketAppender traffic. Each logging entry is converted to one
chunk.
---
add SocketAdaptor JobSummary 9098 0
---
This adaptor will bind to port 9098, and label output as JobSummary.
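To exercise a network-facing adaptor, it helps to send it some test traffic. The Python sketch below sends a single UDP datagram, which the UDPAdaptor described above would emit as one Chunk. The default port 1234 matches the UDPAdaptor example; the function name <send_udp_packet> and the sample payload are hypothetical.

```python
import socket


def send_udp_packet(payload, host="localhost", port=1234):
    """Send one UDP datagram and return the number of bytes sent.

    UDP is connectionless, so a successful send does not prove that an
    adaptor was listening; check the agent's output to confirm receipt.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))
```

For example, send_udp_packet(b"test event") sends one datagram to port 1234 on the local machine.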