Build instructions for Tez
----------------------------------------------------------------------------------
Requirements:
* JDK 1.6+
* Maven 3.0 or later
* Findbugs 2.0.2 or later (if running findbugs)
* ProtocolBuffer 2.5.0
* Internet connection for first build (to fetch all dependencies)
----------------------------------------------------------------------------------
Maven main modules:
tez................................(Main Tez project)
- tez-api .....................(Tez api)
- tez-common ..................(Tez common)
- tez-runtime-internals .......(Tez runtime internals)
- tez-runtime-library .........(Tez runtime library)
- tez-mapreduce ...............(Tez mapreduce)
- tez-dag .....................(Tez dag)
- tez-mapreduce-examples ......(Tez mapreduce examples)
- tez-tests ...................(Tez tests)
- tez-dist ....................(Tez dist)
----------------------------------------------------------------------------------
Maven build goals:
* Clean : mvn clean
* Compile : mvn compile
* Run tests : mvn test
* Create JAR : mvn package
* Run findbugs : mvn compile findbugs:findbugs
* Run checkstyle : mvn compile checkstyle:checkstyle
* Install JAR in M2 cache : mvn install
* Deploy JAR to Maven repo : mvn deploy
* Run clover : mvn test -Pclover [-Dclover.license=${user.home}/clover.license]
* Run Rat : mvn apache-rat:check
* Build javadocs : mvn javadoc:javadoc
* Build distribution : mvn package [-Dtar] [-Dhadoop.version=2.2.0]
* Visualize state machines : mvn compile -Pvisualize -DskipTests=true
Build options:
* Use -Dtar to create a TAR with the distribution (tar.gz will be created under tez-dist/target)
* Use -Dclover.license to specify the path to the clover license file
* Use -Dhadoop.version to specify the version of hadoop to build tez against
* Use -Dprotoc.path to specify the path to protoc
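The build options can be combined in one command line. As a minimal sketch, assuming a POSIX shell, the hypothetical dist_build function below appends -Dtar, -Dhadoop.version and -Dprotoc.path only when requested. The TEZ_TAR, HADOOP_VERSION, PROTOC and DRY_RUN variables are illustrative names for this helper, not settings read by Tez or Maven.

```shell
# Hypothetical helper (not part of Tez): assemble and run a distribution build.
# TEZ_TAR, HADOOP_VERSION, PROTOC and DRY_RUN are illustrative variable names.
dist_build() {
  # Start from the 'package' goal and append only the options that were requested.
  set -- package
  if [ "${TEZ_TAR:-0}" = 1 ]; then
    set -- "$@" -Dtar
  fi
  if [ -n "${HADOOP_VERSION:-}" ]; then
    set -- "$@" "-Dhadoop.version=$HADOOP_VERSION"
  fi
  if [ -n "${PROTOC:-}" ]; then
    set -- "$@" "-Dprotoc.path=$PROTOC"
  fi
  # With DRY_RUN=1, print the command instead of invoking Maven.
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'mvn %s\n' "$*"
  else
    mvn "$@"
  fi
}
```

For example, the following would invoke mvn package -Dtar -Dhadoop.version=2.2.0:
$ TEZ_TAR=1 HADOOP_VERSION=2.2.0 dist_build
The arguments are collected with set -- rather than in a single string so that a protoc path containing spaces reaches Maven intact.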
Test options:
* Use -DskipTests to skip tests when running the following Maven goals:
'package', 'install', 'deploy' or 'verify'
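* Use -Dtest=<TESTCLASSNAME>,<TESTCLASSNAME#METHODNAME>,.... to run only the listed test classes or methods
* Use -Dtest.exclude=<TESTCLASSNAME> to exclude a test class
* Use -Dtest.exclude.pattern=**/<TESTCLASSNAME1>.java,**/<TESTCLASSNAME2>.java to exclude test classes matching the given patterns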
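When several tests need to be excluded, the -Dtest.exclude.pattern value can be generated rather than typed by hand. This is a sketch assuming a POSIX shell; exclude_pattern is a hypothetical helper that only formats the **/<TESTCLASSNAME>.java list described under Test options.

```shell
# Hypothetical helper (not part of Tez): build the value for
# -Dtest.exclude.pattern from test class names, as **/<Name>.java entries.
exclude_pattern() {
  pattern=""
  for cls in "$@"; do
    # Add a comma separator only when the pattern is already non-empty.
    pattern="${pattern:+$pattern,}**/$cls.java"
  done
  printf '%s\n' "$pattern"
}
```

For example, to exclude the hypothetical classes TestA and TestB:
$ mvn test -Dtest.exclude.pattern="$(exclude_pattern TestA TestB)"
Quoting the value keeps the shell from treating ** as a glob.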
----------------------------------------------------------------------------------
Building against a specific version of hadoop:
Tez runs on top of Apache Hadoop YARN and requires Hadoop version 2.2.0 or higher.
For example, to build Tez against Hadoop 3.0.0-SNAPSHOT:
$ mvn package -Dtar -Dhadoop.version=3.0.0-SNAPSHOT
To skip tests and javadocs:
$ mvn package -Dtar -Dhadoop.version=3.0.0-SNAPSHOT -DskipTests -Dmaven.javadoc.skip=true
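Because Tez requires Hadoop 2.2.0 or higher, a build script may want to reject older values of -Dhadoop.version before starting Maven. The hypothetical hadoop_version_ok function below is a sketch assuming a POSIX shell and plain numeric versions; it drops any qualifier such as -SNAPSHOT and compares only the major and minor numbers, which is enough for a 2.2.0 minimum.

```shell
# Hypothetical pre-check (not part of Tez): succeed if a Hadoop version string
# meets the documented 2.2.0 minimum. Any qualifier such as "-SNAPSHOT" is
# dropped first; with a 2.2.0 minimum only major and minor need comparing.
hadoop_version_ok() {
  base=${1%%-*}
  major=${base%%.*}
  case $base in
    *.*) rest=${base#*.}; minor=${rest%%.*} ;;
    *) minor=0 ;;
  esac
  if [ "$major" -ne 2 ]; then
    # Any other major version: accept only if it is newer than 2.
    [ "$major" -gt 2 ]
    return
  fi
  # Major version 2: the minor version must be at least 2.
  [ "$minor" -ge 2 ]
}
```

For example:
$ if hadoop_version_ok 3.0.0-SNAPSHOT; then mvn package -Dtar -Dhadoop.version=3.0.0-SNAPSHOT; fi
Because the qualifier is ignored, a pre-release such as 2.2.0-SNAPSHOT would also be accepted.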
----------------------------------------------------------------------------------
Protocol Buffer compiler:
The version of the Protocol Buffer compiler, protoc, must be 2.5.0 and must match
the version of the protobuf JAR.
If you have multiple versions of protoc on your system, you can set the PROTOC_PATH
environment variable in your build shell to point to the one you want to use for
the Tez build. If this environment variable is not defined, protoc is looked up
in the PATH.
You can also specify the path to protoc at build time using -Dprotoc.path:
$ mvn package -DskipTests -Dtar -Dprotoc.path=/usr/local/bin/protoc
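The protoc lookup and version requirement can be checked before a long build. The following is a sketch assuming a POSIX shell and that protoc --version prints a line of the form "libprotoc 2.5.0"; resolve_protoc, is_protoc_250 and protoc_option are hypothetical helpers, not part of the Tez build.

```shell
# Hypothetical helpers (not part of Tez) mirroring the lookup described above:
# use PROTOC_PATH if set, otherwise the first protoc found on the PATH.
resolve_protoc() {
  if [ -n "${PROTOC_PATH:-}" ]; then
    printf '%s\n' "$PROTOC_PATH"
  else
    command -v protoc
  fi
}

# Succeed only if the given `protoc --version` output reports 2.5.0.
is_protoc_250() {
  [ "$1" = "libprotoc 2.5.0" ]
}

# Resolve protoc, check its version, and print the -Dprotoc.path option to use.
protoc_option() {
  protoc_bin=$(resolve_protoc) || { echo "protoc not found" >&2; return 1; }
  if ! is_protoc_250 "$("$protoc_bin" --version)"; then
    echo "$protoc_bin is not protoc 2.5.0" >&2
    return 1
  fi
  printf '%s\n' "-Dprotoc.path=$protoc_bin"
}
```

For example:
$ opt=$(protoc_option) && mvn package -DskipTests -Dtar "$opt"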
----------------------------------------------------------------------------------
Building the docs:
The following commands build a local copy of the Apache Tez website under the docs directory:
$ cd docs; mvn site
----------------------------------------------------------------------------------
Building components separately:
If you build a submodule directory on its own, the Tez modules that the
submodule depends on are resolved like any other third-party dependency, that
is, from the Maven cache or from a Maven repository (if they are not available
in the cache or the cached SNAPSHOT has 'timed out').
An alternative is to run 'mvn install -DskipTests' once from the top level of
the Tez source tree, and then work from the submodule. Keep in mind that
SNAPSHOTs time out after a while; using the Maven '-nsu' option will stop Maven
from trying to update SNAPSHOTs from external repositories.
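The two-step workflow above can be scripted. As a minimal sketch, assuming a POSIX shell run from the Tez source top level, the hypothetical build_submodule function installs all modules once and then builds one submodule with -nsu. It uses Maven's -f option to point at the submodule's pom.xml instead of changing directory, and DRY_RUN is an illustrative variable that prints the commands instead of running them.

```shell
# Hypothetical helper (not part of Tez): print the command when DRY_RUN=1,
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper (not part of Tez): install the whole tree once, then
# build a single submodule without refreshing SNAPSHOTs from remote repos.
build_submodule() {
  module=$1
  shift
  # Step 1: from the top level, install all Tez modules into the local Maven cache.
  run mvn install -DskipTests || return
  # Step 2: build only the submodule; -f selects its POM, -nsu skips SNAPSHOT updates.
  run mvn -nsu -f "$module/pom.xml" "$@"
}
```

For example, the following would run mvn install -DskipTests followed by mvn -nsu -f tez-dag/pom.xml test:
$ build_submodule tez-dag test
Once the first step has been done, later iterations can call mvn -nsu -f tez-dag/pom.xml test directly.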
----------------------------------------------------------------------------------
Visualize the State Machines used in Tez internals:
Use -Pvisualize to generate a graphviz file named Tez.gv, which can then be
converted into a diagram of the state transitions of the state machines for
the given classes.
Optional parameters:
* -Dtez.dag.state.classes=<comma-separated list of classes>
- By default, all four state machines (DAG, Vertex, Task and TaskAttempt) are generated.
* -Dtez.graphviz.title
- Title for the graph (default: Tez)
* -Dtez.graphviz.output.file
- Output file to be generated with the state machines (default: Tez.gv)
For example, to generate the state machine graphviz file for DAGImpl, run:
$ mvn compile -Pvisualize -Dtez.dag.state.classes=org.apache.tez.dag.app.dag.impl.DAGImpl -DskipTests=true
To render the diagram, you can use a Graphviz application or a command such as:
$ dot -Tpng -o Tez.png Tez.gv
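Generating and rendering a diagram can be wrapped in two small functions. This is a sketch assuming a POSIX shell with Graphviz's dot on the PATH; state_classes and render_graph are hypothetical helpers, and only DAGImpl's fully qualified name comes from the example above.

```shell
# Hypothetical helper (not part of Tez): join fully qualified class names
# with commas for -Dtez.dag.state.classes. The subshell keeps the IFS change local.
state_classes() {
  (IFS=,; printf '%s\n' "$*")
}

# Hypothetical helper (not part of Tez): render a graphviz file (default Tez.gv)
# with dot; the output format defaults to png and the file is named after the input.
render_graph() {
  gv=${1:-Tez.gv}
  fmt=${2:-png}
  dot "-T$fmt" -o "${gv%.gv}.$fmt" "$gv"
}
```

For example, to generate Tez.gv for DAGImpl and render it as Tez.png:
$ mvn compile -Pvisualize -Dtez.dag.state.classes="$(state_classes org.apache.tez.dag.app.dag.impl.DAGImpl)" -DskipTests=true && render_graph Tez.gv png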