| ~~ Licensed to the Apache Software Foundation (ASF) under one or more |
| ~~ contributor license agreements. See the NOTICE file distributed with |
| ~~ this work for additional information regarding copyright ownership. |
| ~~ The ASF licenses this file to You under the Apache License, Version 2.0 |
| ~~ (the "License"); you may not use this file except in compliance with |
| ~~ the License. You may obtain a copy of the License at |
| ~~ |
| ~~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~~ |
| ~~ Unless required by applicable law or agreed to in writing, software |
| ~~ distributed under the License is distributed on an "AS IS" BASIS, |
| ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| ~~ See the License for the specific language governing permissions and |
| ~~ limitations under the License. |
| ~~ |
| |
| Agent Configuration Guide |
| |
| In a normal Apache Chukwa installation, an <Agent> process runs on every |
| machine being monitored. This process is responsible for all the data collection |
| on that host. Data collection might mean periodically running a Unix command, |
| or tailing a file, or listening for incoming UDP packets. |
| |
| Each particular data source corresponds to a so-called <Adaptor>. |
| Adaptors are dynamically loadable modules that run inside the Agent process. |
| There is generally one Adaptor for each data source: for each file being |
| watched or for each Unix command being executed. Each adaptor has a unique name. |
| If you do not specify a name, one will be auto-generated by hashing the |
| Adaptor type and parameters. |
| |
| Apache Chukwa includes a number of built-in Adaptors, and you can also develop |
| your own. Apache Chukwa will use them if you add them to its library search |
| path (e.g., by putting them in a jar file in <$CHUKWA_HOME/lib>). |
| |
| Agent Control |
| |
| Once an Agent process is running, there are a number of commands that you can |
| use to inspect and control it. By default, Agents listen for incoming commands |
| on port 9093. Commands are case-insensitive. |
| |
| *--------------------*-----------------------------------------*--------------: |
| | Command | Purpose | Options | |
| *--------------------*-----------------------------------------*--------------: |
| | <add> | Start an adaptor. | See below | |
| *--------------------*-----------------------------------------*--------------: |
| | <close> | Close socket connection to agent. | None | |
| *--------------------*-----------------------------------------*--------------: |
| | <help> | Display a list of available commands | None | |
| *--------------------*-----------------------------------------*--------------: |
| | <list> | List currently running adaptors | None | |
| *--------------------*-----------------------------------------*--------------: |
| | <reloadcollectors> | Re-read list of collectors (deprecated) | None | |
| *--------------------*-----------------------------------------*--------------: |
| | <stop> | Stop adaptor, abruptly | Adaptor name | |
| *--------------------*-----------------------------------------*--------------: |
| | <stopall> | Stop all adaptors, abruptly | None | |
| *--------------------*-----------------------------------------*--------------: |
| | <shutdown> | Stop adaptor, gracefully | Adaptor name | |
| *--------------------*-----------------------------------------*--------------: |
| | <stopagent> | Stop agent process | None | |
| *--------------------*-----------------------------------------*--------------: |
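
The control port can also be driven from a script. The following is a minimal Python sketch, not part of Apache Chukwa. It assumes that the control interface accepts newline-terminated plain-text commands over TCP, and that the agent closes the connection once it receives <close>, as the table above suggests. The function name <send_agent_command> and its parameters are illustrative only.

```python
import socket


def send_agent_command(command, host="localhost", port=9093, timeout=5.0):
    """Send one command to an agent control port and return the raw reply text.

    Assumption: commands are newline-terminated text lines, and the agent
    closes the connection after receiving "close".
    """
    payload = (command.strip() + "\nclose\n").encode("utf-8")
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        try:
            while True:
                data = sock.recv(4096)
                if not data:  # the peer closed the connection
                    break
                chunks.append(data)
        except socket.timeout:
            # No more data arrived within the timeout; return what we have.
            pass
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example (requires a running agent on this host):
#   print(send_agent_command("list"))
```

The reply is returned unparsed, because this guide does not specify the format of the agent's responses.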
| |
| |
| The add command is by far the most complex; it takes several mandatory and |
| optional parameters. The general form is as follows: |
| |
| --- |
| add [name =] <adaptor_class_name> <datatype> <adaptor specific params> <initial offset> |
| --- |
| |
| There are four mandatory fields: the word <add>, the class name for |
| the Adaptor, the datatype of the Adaptor's output, and the sequence number for |
| the first byte. There are two optional fields: the adaptor instance name and |
| the adaptor parameters. |
| |
| The adaptor name, if specified, should go after the add command and be |
| followed by an equals sign. It should be a string of printable characters, |
| without whitespace or '='. Apache Chukwa Adaptor names all start with "adaptor_". |
| If you specify an adaptor name that does not start with that prefix, the prefix |
| will be added automatically. |
| |
| Adaptor parameters aren't required by the Apache Chukwa agent itself, but each class of |
| adaptor may specify its own mandatory and optional parameters. See below. |
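
The field layout described above can be made concrete with a small parser. This is a Python sketch for illustration only, not the agent's actual parsing code. The regular expression, the <AddCommand> type and the helper names are assumptions, and the instance name <tail1> in the usage note is hypothetical. It treats everything between the datatype and the final integer as adaptor parameters.

```python
import re
from typing import NamedTuple, Optional


class AddCommand(NamedTuple):
    name: Optional[str]    # optional adaptor instance name
    adaptor_class: str     # mandatory adaptor class name
    datatype: str          # mandatory datatype of the adaptor's output
    params: Optional[str]  # optional adaptor-specific parameters
    offset: int            # mandatory sequence number of the first byte


_ADD_RE = re.compile(
    r"""^\s*add\s+
        (?:(?P<name>[^\s=]+)\s*=\s*)?
        (?P<adaptor_class>[^\s=]+)\s+
        (?P<datatype>\S+)\s+
        (?:(?P<params>.*\S)\s+)?
        (?P<offset>\d+)\s*$""",
    re.IGNORECASE | re.VERBOSE,
)


def parse_add_command(line):
    """Split an add command into its fields; raise ValueError if malformed."""
    match = _ADD_RE.match(line)
    if match is None:
        raise ValueError("not a well-formed add command: %r" % (line,))
    return AddCommand(
        match.group("name"),
        match.group("adaptor_class"),
        match.group("datatype"),
        match.group("params"),
        int(match.group("offset")),
    )


def effective_name(name):
    """Mirror the documented rule that adaptor names start with 'adaptor_'."""
    return name if name.startswith("adaptor_") else "adaptor_" + name
```

With this sketch, parsing "add tail1 = filetailer.FileTailingAdaptor BarData /foo/bar 0" yields the name <tail1>, which <effective_name> turns into <adaptor_tail1>. A command with no datatype is rejected with a ValueError.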
| |
| Configuration options |
| |
| Apache Chukwa agents are configured via the file <conf/chukwa-agent-conf.xml>. |
| The agent's control interface listens on port 9093 by default. |
| |
| --- |
| <property> |
| <name>chukwaAgent.control.port</name> |
| <value>9093</value> |
| <description>The socket port number the agent's control interface can be contacted at.</description> |
| </property> |
| --- |
| |
| The directory in which the agent stores its checkpoint files: |
| |
| --- |
| <property> |
| <name>chukwaAgent.checkpoint.dir</name> |
| <value>${CHUKWA_LOG_DIR}/</value> |
| <description>the location to put the agent's checkpoint file(s)</description> |
| </property> |
| --- |
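
To check which values an agent configuration file actually sets, the properties can be read with a few lines of Python. This is a minimal sketch, assuming the file wraps its <property> elements in a single root element in the Hadoop style (the <configuration> root in the sample is an assumption). The function name <read_agent_properties> is illustrative. Placeholders such as $\{CHUKWA_LOG_DIR\} are returned verbatim and not expanded.

```python
import xml.etree.ElementTree as ET


def read_agent_properties(xml_text):
    """Return a dict mapping each property name to its (unexpanded) value."""
    root = ET.fromstring(xml_text)
    properties = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # skip malformed entries without a name
        value = prop.findtext("value") or ""
        properties[name.strip()] = value.strip()
    return properties


# Sample input built from the two properties shown above; the
# <configuration> root element is an assumption.
SAMPLE = """<configuration>
  <property>
    <name>chukwaAgent.control.port</name>
    <value>9093</value>
  </property>
  <property>
    <name>chukwaAgent.checkpoint.dir</name>
    <value>${CHUKWA_LOG_DIR}/</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for key, value in sorted(read_agent_properties(SAMPLE).items()):
        print(key, "=", value)
```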
| |
| Adaptors |
| |
| This section lists the standard adaptors, and the arguments they take. |
| |
| * <<FileAdaptor>> Pushes a whole file, as one Chunk, then exits. |
| Takes one mandatory parameter: the file to push. |
| |
| --- |
| add FileAdaptor FooData /tmp/foo 0 |
| --- |
| This pushes file </tmp/foo> as one chunk, with datatype <FooData>. |
| |
| * <<filetailer.LWFTAdaptor>> Repeatedly tails a file, treating the file as |
| a sequence of bytes, ignoring the content. Chunk boundaries are arbitrary. |
| This is useful for streaming binary data. Takes one mandatory parameter: |
| a path to the file to tail. If the log file is rotated while there is unread |
| data, this adaptor will not attempt to recover it. |
| |
| --- |
| add filetailer.LWFTAdaptor BarData /foo/bar 0 |
| --- |
| This pushes </foo/bar> in a sequence of Chunks of type <BarData>. |
| |
| * <<filetailer.FileTailingAdaptor>> Repeatedly tails a file, again |
| ignoring content and with unspecified Chunk boundaries. Takes one |
| mandatory parameter: a path to the file to tail. Keeps a |
| file handle open in order to detect log file rotation. |
| |
| --- |
| add filetailer.FileTailingAdaptor BarData /foo/bar 0 |
| --- |
| This pushes </foo/bar> in a sequence of Chunks of type <BarData>. |
| |
| * <<filetailer.RCheckFTAdaptor>> |
| An experimental modification of the above, which avoids the need to |
| keep a file handle open. Same parameters and usage as the above. |
| |
| * <<filetailer.CharFileTailingAdaptorUTF8>> |
| The same as the base FileTailingAdaptor, except that chunks are |
| guaranteed to end only at line breaks. |
| This is useful for most ASCII log file formats. |
| |
| * <<filetailer.CharFileTailingAdaptorUTF8NewLineEscaped>> |
| The same, except that chunks are guaranteed to end only at |
| non-escaped line breaks. This is useful for pushing |
| Apache Chukwa-formatted log files, where exception |
| stack traces stay in a single chunk. |
| |
| * <<filetailer.FileTailingAdaptorPreserveLines>> |
| Similar to CharFileTailingAdaptorUTF8. The difference is mainly seen in |
| the Demux process: CharFileTailingAdaptorUTF8 processes every line one |
| by one, whereas FileTailingAdaptorPreserveLines processes all the lines |
| of a Chunk in a single pass, which makes the Demux jobs faster. |
| Same parameters and usage as the above. |
| |
| * <<DirTailingAdaptor>> Takes a directory path and an |
| adaptor name as mandatory parameters; repeatedly scans that directory |
| and all subdirectories, and starts the indicated adaptor running on |
| each file. Since the DirTailingAdaptor does not, itself, emit data, |
| the datatype parameter is applied to the newly-spawned adaptors. |
| Note that if you try this on a large directory with an adaptor that |
| keeps file handles open, it is possible to exceed your system's limit |
| on open files. |
| A file pattern can be specified as an optional second parameter, placed |
| between the directory path and the adaptor name. |
| |
| --- |
| add DirTailingAdaptor logs /var/log/ *.log filetailer.CharFileTailingAdaptorUTF8 0 |
| --- |
| |
| * <<ExecAdaptor>> Takes an interval (in milliseconds) as an optional |
| parameter, followed by a program name as a mandatory parameter. Runs that |
| program repeatedly, once per interval. |
| |
| --- |
| add ExecAdaptor Df 60000 /bin/df -x nfs -x none 0 |
| --- |
| This adaptor will run <df> every minute, labeling output as Df. |
| |
| * <<UDPAdaptor>> Takes a port number as a mandatory parameter. |
| Binds to the indicated UDP port, and emits one Chunk for each received packet. |
| |
| --- |
| add UDPAdaptor Packets 1234 0 |
| --- |
| This adaptor will listen for incoming traffic on port 1234, labeling output as Packets. |
| |
| * <<edu.berkeley.chukwa_xtrace.XtrAdaptor>> (available in <contrib>) |
| Takes an {{{http://www.x-trace.net/wiki/doku.php}Xtrace}} ReportSource |
| class name [without package] as mandatory argument, and no optional parameters. |
| Listens for incoming reports in the same way as that ReportSource would. |
| |
| --- |
| add edu.berkeley.chukwa_xtrace.XtrAdaptor Xtrace UdpReportSource 0 |
| --- |
| This adaptor will create and start a <UdpReportSource>, labeling its |
| output datatype as Xtrace. |
| |
| * <<sigar.SystemMetrics>> This adaptor collects CPU, disk, and network |
| utilization, as well as the model and specifications of the machine, and |
| periodically emits the data as one Chunk. |
| |
| --- |
| add sigar.SystemMetrics SystemMetrics 60 0 |
| --- |
| This adaptor will take a snapshot of system state every minute (the |
| interval parameter, 60, is given in seconds), labeling output as SystemMetrics. |
| |
| * <<SocketAdaptor>> This adaptor binds to a port and listens for Log4J |
| SocketAppender traffic. Each logging entry is converted to one |
| chunk. |
| |
| --- |
| add SocketAdaptor JobSummary 9098 0 |
| --- |
| This adaptor will bind to port 9098, and label output as JobSummary. |
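
A <<UDPAdaptor>> started as in the example above can be smoke-tested by sending it a few datagrams. The following Python sketch is only a test aid under stated assumptions: the adaptor is bound to port 1234 as in that example, and the function name <send_test_packets> is hypothetical. According to the description above, each datagram should surface as one Chunk. The sketch does not verify that.

```python
import socket


def send_test_packets(messages, host="127.0.0.1", port=1234):
    """Send each message as its own UDP datagram; return how many were sent."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for message in messages:
            # One sendto() call produces exactly one datagram.
            sock.sendto(message.encode("utf-8"), (host, port))
            sent += 1
    return sent


# Example (requires an agent with a UDP adaptor listening on port 1234):
#   send_test_packets(["first test packet", "second test packet"])
```

Because UDP is connectionless, the function succeeds even when nothing is listening. Use the agent's <list> command to confirm that the adaptor is running.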