Flume Developer Notes
Jonathan Hsieh
// This is in asciidoc markup
== Introduction
This is meant to be a guide for issues that occur when building,
debugging and setting up Flume as a developer.
== High level directory and file structure.
Flume uses the Maven build system and has a Maven project object model
(pom) that has many components broken down into Maven modules. Below
we describe the contents of different directories.
./bin/ Flume startup scripts
./conf/ Flume configuration file samples
./contrib/flogger Flume logger: a Flume client implemented in C
./docs/man Flume man pages
./flume-config-web Flume master configuration servlet module
./flume-core Flume core module
./flume-distribution Flume distribution package module
./flume-docs Flume documentation generation module
./flume-log4j-appender Flume log4j-avro appender module
./flume-microbenchmarks Flume performance microbenchmark test suite
./flume-node-web Flume node status servlet module
./flume-windows-dist Flume node Windows distribution package module
./plugins/ Flume plugin modules (hello world skeleton and hbase)
./src/javaperf Flume performance tests (out of date)
./src/javatest-torture Flume reliability tests (out of date)
The files excluded in `.gitignore` are autogenerated by either Maven or Eclipse.
== Building and Testing Flume
=== Prerequisites
There are several tools required to do a full build of Flume but only
the Thrift compiler is required for development and testing builds.
To build documentation, you will need to have asciidoc installed.
To build Windows installers, you will need to have makensis installed.
==== Building Thrift
The Thrift compiler is required to build Flume and currently does not
have binary packages available for Linux-based platforms. (A Windows
binary is available.) There are several requirements necessary to
build it. Here's a link to the requirements
This page also contains links explaining how to install the
requirements for various platforms.
=== Using Maven
We are using Maven v2.x.x. The Maven build system steps through
several phases to create build artefacts. At the highest level, the
phases that are relevant to most devs are "compile" -> "test" ->
"package" -> "install".
There are several options and "profiles" available in the Flume build.
The default profile is a "dev" profile. Below we include examples
of common build command lines to build different profiles.
A development build that runs unit tests and installs to the local Maven
repo. This builds and tests all plugins, but excludes modules that
aren't needed during development (e.g. the Windows installer):
mvn install
A development build that skips the execution of unit tests.
mvn install -DskipTests
A development build that runs unit tests. (no package generation)
mvn test
A development build that runs unit tests including only specific tests
(where <ClassRegex> is a regex of a class name without .java or .class
or path).
mvn test -Dtest=<ClassRegex>
Windows node build, skipping unit tests (requires makensis).
NOTE: makensis is available on Linux and, through Homebrew, on Mac OS X, so this can
be built while running on these operating systems.
mvn install -Pwindows -DskipTests
Full build, skipping unit tests (requires asciidoc); does not build the Windows installer.
mvn install -Pfull-build -DskipTests
Full build, making both the docs and the Windows installer.
mvn install -Pfull-build,windows
==== Pointing the Maven build at the proper Thrift executable
Flume has, over time, upgraded to newer versions of Thrift. The Maven
build requires a pointer to the proper Thrift compiler.
If you install Thrift in a non-standard location (not
/usr/local/thrift/bin), you will need to provide the build some extra
information. This may be the case if you overrode the standard Thrift
install (+make install+ 's default target) or are running Thrift from
a home directory.
One way to provide this is via the Maven command line by setting the
thrift.executable variable (this assumes that we made different dirs
for different versions of Thrift):
mvn install -Dthrift.executable=/usr/local/thrift-0.6.0/bin/thrift
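As a rough sketch, the choice of compiler can be wrapped in a small shell function. The function name `flume_mvn_cmd` is hypothetical and not part of the Flume build. It only prints the command line, so the result can be reviewed before it is run.

```shell
# Print the Maven command line for a given Thrift compiler path.
# With no argument, fall back to whatever "thrift" resolves to on PATH.
# (flume_mvn_cmd is a hypothetical helper name, not part of Flume.)
flume_mvn_cmd() {
  thrift_bin="${1:-$(command -v thrift 2>/dev/null || true)}"
  if [ -z "$thrift_bin" ]; then
    echo "no thrift compiler found" >&2
    return 1
  fi
  echo "mvn install -Dthrift.executable=$thrift_bin"
}

flume_mvn_cmd /usr/local/thrift-0.6.0/bin/thrift
```

Running `thrift -version` on the chosen executable beforehand is a quick way to confirm which version the build will pick up.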
Another way to provide this information to your Maven build is to
modify your Maven profile by adding or modifying your
~/.m2/settings.xml file and overriding the default thrift.executable
setting to point to your Thrift compiler executable. This is needed,
for example, when different versions of the Thrift compiler are
installed in different directories.
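A minimal sketch of such a settings file follows. It is written to a scratch location rather than to ~/.m2/settings.xml so that an existing file is not overwritten. The profile id `thrift-local` and the scratch file name are placeholders chosen for illustration. Only the thrift.executable property name and the example path come from the text above. Whether a property set this way wins depends on how the Flume POM defines thrift.executable, so treat this as a starting point.

```shell
# Scratch location for the sample; merge its contents into ~/.m2/settings.xml by hand.
settings_file="${TMPDIR:-/tmp}/flume-settings-example.xml"

cat > "$settings_file" <<'EOF'
<settings>
  <profiles>
    <profile>
      <!-- "thrift-local" is a placeholder profile id -->
      <id>thrift-local</id>
      <properties>
        <thrift.executable>/usr/local/thrift-0.6.0/bin/thrift</thrift.executable>
      </properties>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>thrift-local</activeProfile>
  </activeProfiles>
</settings>
EOF

echo "wrote $settings_file"
```

To try the file without touching ~/.m2, Maven's `-s` option can point a single build at it, for example `mvn -s "$settings_file" install`.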
==== Including or excluding specific sets of tests.
We've added hooks to the Maven build that will enable you to exclude
or include specific tests on a test run. This is useful for excluding
flaky tests or making a build that focuses solely upon flaky tests.
To do this we created two variables:
# test.include.pattern
# test.exclude.pattern
These variables take regular expression patterns of the files to be
included or excluded.
For the next set of examples, let's say you have flaky tests called
TestFlaky1 and TestFlaky2.
You can execute tests that skip TestFlaky1 and TestFlaky2 by using the
following command line:
mvn test -Dtest.exclude.pattern=**/TestFlaky*.java
Alternatively, you could be more explicit:
mvn test -Dtest.exclude.pattern=**/,**/
Conversely, you could execute only the flaky tests by using:
mvn test -Dtest.include.pattern=**/TestFlaky*.java
You can also combine includes and excludes. This runs
TestFlaky* but skips over TestFlaky2:
mvn test -Dtest.include.pattern=**/TestFlaky*.java -Dtest.exclude.pattern=**/
NOTE: Both test.exclude.pattern and test.include.pattern get
overridden if the test parameter is used. Consider:
mvn test -Dtest.exclude.pattern=**/TestFlaky*.java -Dtest=TestFlaky1
In this case, TestFlaky1 will be run despite matching the exclude pattern.
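When several tests need to be listed explicitly, the comma-separated pattern can be assembled by a small helper. The sketch below assumes the patterns follow the `**/<ClassName>.java` shape used in the examples above. The function name `test_patterns` is hypothetical. Quoting the `-D` argument keeps the shell from trying to expand the `*` characters.

```shell
# Build a comma-separated pattern list such as "**/A.java,**/B.java"
# from simple test class names given as arguments.
# (test_patterns is a hypothetical helper name, not part of Flume.)
test_patterns() {
  out=""
  for name in "$@"; do
    out="${out:+$out,}**/${name}.java"
  done
  printf '%s\n' "$out"
}

# Print (rather than run) the resulting command line so it can be reviewed first.
echo mvn test "-Dtest.exclude.pattern=$(test_patterns TestFlaky1 TestFlaky2)"
```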
=== Running the most recent build
To run the most recent build of Flume, first build the distribution:
mvn install -DskipTests
You can then traverse into
This directory is set up exactly as the tarball installation of Flume
would be.
=== Running Performance Microbenchmarks.
The suite of source and sink microbenchmark tests (located in
./flume-microbenchmarks/javaperf) can be run by using `mvn test -Pperf`.
Just like with the normal test cases, you can use
`-Dtest=<TestClass>` to run a single benchmark class. For example:
mvn test -Pperf -Dtest=PerfThriftSinks
The logs should contain lines formatted similarly to these:
[junit] nullsink,ubuntu,begin,10998597,552872,disk_loaded,2895851957,301662152,receiver_started,156786445,305698624,sink_started,105303802,305704456,thrift sink to thrift source done,39520160510,320377056,MB/s,4.579940971898899,23094932,320379168
[junit] [ 0us, 547,544 b mem] Starting (after gc)
[junit] [ 10,998,597ns d 10,998,597ns 552,872 b mem] begin
[junit] [ 2,914,443,637ns d 2,895,851,957ns 301,662,152 b mem] disk_loaded
[junit] [ 3,514,297,391ns d 156,786,445ns 305,698,624 b mem] receiver_started
[junit] [ 4,082,661,503ns d 105,303,802ns 305,704,456 b mem] sink_started
[junit] [ 44,235,264,972ns d 39,520,160,510ns 320,377,056 b mem] thrift sink to thrift source done
[junit] [ 44,878,445,315ns d 23,094,932ns 320,379,168 b mem] MB/s,4.579940971898899
The first line is a summary of all the information in CSV format. The
other lines are in a tabular, more human-readable form. The left
column is cumulative time in ns and the middle column is the delta from
the previous line in ns. The last column of numbers is the amount of
memory in heap, followed by a comment or label.
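Because the summary line is plain CSV, the throughput figure can be pulled out with standard tools. The sketch below assumes the value always directly follows a field that is exactly `MB/s`, as in the sample output above. The function name `extract_mbps` and the log file name are hypothetical.

```shell
# Print the throughput value that follows the "MB/s" label in a CSV summary line.
# Tabular lines do not match, because there "MB/s" is never a whole comma-separated field.
# (extract_mbps is a hypothetical helper name.)
extract_mbps() {
  awk -F, '{ for (i = 1; i < NF; i++) if ($i == "MB/s") print $(i + 1) }'
}

# Example use against a saved build log (build.log is a hypothetical file name):
# extract_mbps < build.log
```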
=== Building on Windows platforms
Building Flume on Windows is possible. One can generate packages and
installer executables on Windows. This build assumes a Cygwin
environment, but may not require it.
This build requires
* Maven for Windows
* makensis (for Windows installer build)
* Java 1.6+
You should be able to run the normal mvn commands.
The current Windows installer executable does not handle all error
situations and does not check to see if not run as
=== Building documentation
Documentation for Flume is written in asciidoc. It relies on several
libraries to generate images.
* asciidoc v8.5.2
* graphviz (dot) v2.26.3
* xmlto
Documents can be built by running 'mvn install -Pfull-build'.
== Integrated Development Environments for Flume
Currently most Flume developers use the Eclipse IDE. We have included
some instructions for getting started with Eclipse.
=== Setting up Flume Eclipse projects from the Maven POMs.
If you use Eclipse, we suggest you use the m2eclipse plugin
to properly create an environment for dev and testing in Eclipse.
After installing it in Eclipse, you will want to "Import" the Flume
pom.xml project.
This can be done by going to the Eclipse applications menu, navigating
to File > Import... > Existing Maven Projects. From there, browse to
and select the directory that contains the root of the Flume project.
The build requires the location of the Thrift compiler executable --
see the instructions about .m2/settings.xml files in the building
Flume section for more details.
The flume-core project will have errors -- these can be resolved by manually adding these dirs to your build source dirs:
* ./flume-core/target/generated-sources/antlr3
* ./flume-core/target/generated-sources/avro
* ./flume-core/target/generated-sources/thrift
* ./flume-core/target/generated-sources/version
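Before adding these directories in Eclipse, it can help to confirm that they exist. The sketch below lists whichever of the four directories are missing under a checkout. The function name `missing_generated_dirs` is hypothetical. If any are reported, we assume they have simply not been generated yet, and running a command-line build such as `mvn install -DskipTests` first should create them.

```shell
# List the generated-source directories that are missing under a Flume checkout.
# The checkout root defaults to the current directory.
# (missing_generated_dirs is a hypothetical helper name.)
missing_generated_dirs() {
  root="${1:-.}"
  for d in antlr3 avro thrift version; do
    dir="$root/flume-core/target/generated-sources/$d"
    [ -d "$dir" ] || printf '%s\n' "$dir"
  done
}

missing_generated_dirs .
```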
== Debugging Flume
=== Flume's web applications
The default setup for Flume is to run its servlets from .WAR files
that include precompiled jsps.
One can have the node or master start specific servlet .WARs by
setting the following properties in the system's flume-site.conf
file.
Path where Flume master war lives. If a file it will load the
war, if a dir it will load all *.war in that dir.
Path where Flume node war lives. If a file it will load the
war, if a dir it will load all *.war in that dir.
// TODO document how to debug JSPs while in Eclipse
== Rules of the Repository
We have a few basic rules for code in the repository.
The master/trunk pointer:
* MUST always build.
* SHOULD always pass all unit tests
When committing code we tag pushes with JIRA numbers and their short descriptions.
Generally these are in the following format:
FLUME-42: Description from the jira
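A quick way to check a message against this format is a small shell test. The sketch below assumes the format is exactly `FLUME-<number>: <text>` on the first line. The function name `check_commit_subject` is hypothetical and is not an existing repository hook.

```shell
# Succeed when the first line of a commit message looks like "FLUME-42: some text".
# (check_commit_subject is a hypothetical helper name.)
check_commit_subject() {
  printf '%s\n' "$1" | head -n 1 | grep -Eq '^FLUME-[0-9]+: .+'
}

if check_commit_subject "FLUME-42: Description from the jira"; then
  echo "subject ok"
else
  echo "subject should look like 'FLUME-42: Description from the jira'" >&2
fi
```

The same function could be called from a local Git commit-msg hook, which receives the path of the message file as its first argument.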
All source files must include the following header (or a variant
depending on comment characters):
* Licensed to Cloudera, Inc. under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. Cloudera, Inc. licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
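* with the License. You may obtain a copy of the License at
*
*     http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.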
* See the License for the specific language governing permissions and
* limitations under the License.
No build-generated files should be checked in. Here are some examples
of generated files that should not be checked in:
* html documentation
* thrift-generated source
* avro-generated source
* antlr generated source
* auto-generated versioning annotations