Build instructions for Tez
For instructions on how to contribute to Tez, refer to:
Requirements:
* JDK 1.8+
* Maven 3.1 or later
* Findbugs 2.0.2 or later (if running findbugs)
* ProtocolBuffer 2.5.0
* Internet connection for first build (to fetch all dependencies)
* Hadoop version should be 2.7.0 or higher.
Maven main modules:
tez................................(Main Tez project)
- tez-api .....................(Tez api)
- tez-common ..................(Tez common)
- tez-runtime-internals .......(Tez runtime internals)
- tez-runtime-library .........(Tez runtime library)
- tez-mapreduce ...............(Tez mapreduce)
- tez-dag .....................(Tez dag)
- tez-examples ................(Tez examples)
- tez-plugins .................(Tez plugins)
- tez-tests ...................(Tez tests and additional test examples)
- tez-dist ....................(Tez dist)
- tez-ui ......................(Tez web user interface)
Maven build goals:
* Clean : mvn clean
* Compile : mvn compile
* Run tests : mvn test
* Create JAR : mvn package
* Run findbugs : mvn compile findbugs:findbugs
* Run checkstyle : mvn compile checkstyle:checkstyle
* Install JAR in M2 cache : mvn install
* Deploy JAR to Maven repo : mvn deploy
* Run clover : mvn test -Pclover [-Dclover.license=${user.home}/clover.license]
* Run Rat : mvn apache-rat:check
* Build javadocs : mvn javadoc:javadoc
* Build distribution : mvn package [-Dhadoop.version=2.7.0]
* Visualize state machines : mvn compile -Pvisualize -DskipTests=true
Build options:
* Use -Dpackage.format to create distributions with a format other than .tar.gz (mvn-assembly-plugin formats).
* Use -Dclover.license to specify the path to the clover license file
* Use -Dhadoop.version to specify the version of hadoop to build tez against
* Use -Dprotoc.path to specify the path to protoc
Test options:
* Use -DskipTests to skip tests when running the following Maven goals:
'package', 'install', 'deploy' or 'verify'
* Use -Dtest.exclude=<TESTCLASSNAME> to exclude a single test class
* Use -Dtest.exclude.pattern=**/<TESTCLASSNAME1>.java,**/<TESTCLASSNAME2>.java to exclude several test classes
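The value for -Dtest.exclude.pattern can be assembled from a list of class names. The following is a minimal sketch: exclude_pattern is a hypothetical helper, and TestFoo and TestBar are placeholder class names that do not come from the Tez source.

```shell
# exclude_pattern is a hypothetical helper, not part of the Tez build.
# It joins its arguments into the **/<TESTCLASSNAME>.java,... form
# used by -Dtest.exclude.pattern.
exclude_pattern() {
  pattern=""
  for class in "$@"; do
    # Append a comma only when the pattern already has an entry.
    pattern="${pattern:+$pattern,}**/${class}.java"
  done
  printf '%s\n' "$pattern"
}

# TestFoo and TestBar are placeholder class names.
echo "mvn test -Dtest.exclude.pattern=$(exclude_pattern TestFoo TestBar)"
```

When passing the pattern on a real command line, quote it so that the shell does not try to expand the ** wildcards.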
Building against a specific version of hadoop:
Tez runs on top of Apache Hadoop YARN and requires hadoop version 2.7.0 or higher.
By default, it can be compiled against other compatible hadoop versions by
specifying hadoop.version. For example, to build tez against hadoop 3.0.0-SNAPSHOT:
$ mvn package -Dhadoop.version=3.0.0-SNAPSHOT
To skip tests and javadocs:
$ mvn package -Dhadoop.version=3.0.0-SNAPSHOT -DskipTests -Dmaven.javadoc.skip=true
However, to build against hadoop versions higher than 2.7.0, you will need to do the following:
For Hadoop version X where X >= 2.8.0:
$ mvn package -Dhadoop.version=${X} -Phadoop28 -P\!hadoop27
For recent versions of Hadoop (which do not bundle aws and azure by default),
you can bundle AWS-S3 (2.7.0+) or Azure (2.7.0+) support:
$ mvn package -Dhadoop.version=${X} -Paws -Pazure
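The profile choice for a given Hadoop version can be scripted. The following is a sketch under stated assumptions: hadoop_profile_flags is a hypothetical helper, and the version comparison relies on 'sort -V', which GNU coreutils provides but which may be missing on other systems.

```shell
# hadoop_profile_flags is a hypothetical helper, not part of the Tez build.
# It prints the profile flags described above for the Hadoop version in $1.
hadoop_profile_flags() {
  # 'sort -V' orders version strings, so 2.10.0 sorts after 2.8.0.
  lowest=$(printf '%s\n%s\n' "2.8.0" "$1" | sort -V | head -n 1)
  if [ "$lowest" = "2.8.0" ]; then
    # 2.8.0 or higher: enable hadoop28 and disable hadoop27.
    printf '%s\n' '-Phadoop28 -P!hadoop27'
  else
    # Below 2.8.0: no extra profile flags are described above.
    printf '\n'
  fi
}

version=3.0.0-SNAPSHOT
echo "mvn package -Dhadoop.version=$version $(hadoop_profile_flags "$version")"
```

The backslash in -P\!hadoop27 is only needed when typing the command in an interactive shell with history expansion; inside single quotes in a script it is not required.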
Tez also has some shims to provide version-specific implementations for various APIs.
For more details, please refer to
UI build issues:
If the UI build fails, clean the UI cache:
$ mvn clean -PcleanUICache
PhantomJS issue when building on PowerPC:
Official PhantomJS binaries are not available for the Power platform. If the build fails on PPC,
please try installing PhantomJS manually and rerun. Refer
and install it globally for the build to work.
Protocol Buffer compiler:
The version of Protocol Buffer compiler, protoc, must be 2.5.0 and match the
version of the protobuf JAR.
If you have multiple versions of protoc on your system, you can set the
PROTOC_PATH environment variable in your build shell to point to the one you
want to use for the Tez build. If you don't define this environment variable,
protoc is looked up in the PATH.
You can also specify the path to protoc while building using -Dprotoc.path:
$ mvn package -DskipTests -Dprotoc.path=/usr/local/bin/protoc
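Before starting a build, it can help to confirm which protoc will be picked up and that it reports version 2.5.0. The following is a minimal sketch: resolve_protoc and is_protoc_2_5_0 are hypothetical helpers that mirror the lookup order described above, not the logic the Tez build itself runs.

```shell
# resolve_protoc is a hypothetical helper that mirrors the lookup order
# described above: PROTOC_PATH first, then the PATH.
resolve_protoc() {
  if [ -n "${PROTOC_PATH:-}" ]; then
    printf '%s\n' "$PROTOC_PATH"
  else
    command -v protoc
  fi
}

# Succeeds only when the given 'protoc --version' output reports 2.5.0.
# protoc prints its version in the form "libprotoc 2.5.0".
is_protoc_2_5_0() {
  [ "$1" = "libprotoc 2.5.0" ]
}

protoc_bin=$(resolve_protoc) || protoc_bin=""
if [ -n "$protoc_bin" ] && is_protoc_2_5_0 "$("$protoc_bin" --version 2>/dev/null)"; then
  echo "protoc 2.5.0 found at $protoc_bin"
else
  echo "protoc 2.5.0 not found; set PROTOC_PATH or pass -Dprotoc.path"
fi
```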
Building the docs:
The following command builds a local copy of the Apache Tez website under docs:
$ cd docs; mvn site
Building components separately:
If you are building a submodule directory, all the Tez dependencies of that
submodule are resolved like any other 3rd party dependency, that is,
from the Maven cache or from a Maven repository (if not available in the cache
or the SNAPSHOT 'timed out').
An alternative is to run 'mvn install -DskipTests' from the Tez source top
level once, and then work from the submodule. Keep in mind that SNAPSHOTs
time out after a while. The Maven '-nsu' option stops Maven from trying
to update SNAPSHOTs from external repos.
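The install-once, then work-in-the-submodule flow can be wrapped in a small function. The following is a sketch only: build_submodule and the MVN override are hypothetical and not part of the Tez source tree, and tez-dag in the usage note is just an example module.

```shell
# MVN can be overridden; MVN="echo mvn" prints the Maven commands
# instead of running them.
MVN=${MVN:-mvn}

# build_submodule is a hypothetical wrapper around the workflow above.
# $1 is the Tez source top level, $2 is the submodule directory name.
build_submodule() {
  # Install all Tez modules into the local Maven cache once.
  ( cd "$1" && $MVN install -DskipTests ) || return 1
  # Test only the submodule; -nsu stops Maven from updating SNAPSHOTs.
  ( cd "$1/$2" && $MVN test -nsu )
}
```

For example, 'build_submodule /path/to/tez tez-dag' installs everything once and then runs the tez-dag tests, where /path/to/tez stands for your own checkout.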
Visualize the State Machines used in Tez internals:
Use -Pvisualize to generate a graphviz file named Tez.gv, which can then be
converted into a diagram of the state transitions of the state machines
for the classes provided.
Optional parameters:
* -Dtez.dag.state.classes=<comma-separated list of classes>
- By default, all 4 state machines - DAG, Vertex, Task and TaskAttempt are generated.
* -Dtez.graphviz.title
- Title for the graph (default is Tez)
* -Dtez.graphviz.output.file
- Output file to be generated with the state machines (default is Tez.gv)
For example, to generate the state machine graphviz file for DAGImpl, run:
$ mvn compile -Pvisualize -Dtez.dag.state.classes=org.apache.tez.dag.app.dag.impl.DAGImpl -DskipTests=true
To generate the diagram, you can use a Graphviz application or something like:
$ dot -Tpng -o Tez.png Tez.gv
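When a different output file is set with -Dtez.graphviz.output.file, the PNG name can be derived from the graphviz file name. The following is a minimal sketch: png_name_for and render_state_machine are hypothetical helpers, and dot is assumed to come from a local Graphviz installation.

```shell
# png_name_for is a hypothetical helper: it derives the PNG file name
# from a graphviz file name by swapping the .gv suffix for .png.
png_name_for() {
  printf '%s\n' "${1%.gv}.png"
}

# render_state_machine is a hypothetical wrapper around the dot command above.
# $1 is the graphviz file; the PNG is written next to it.
render_state_machine() {
  if ! command -v dot >/dev/null 2>&1; then
    echo "dot not found; install Graphviz first" >&2
    return 1
  fi
  dot -Tpng -o "$(png_name_for "$1")" "$1"
}
```

For example, 'render_state_machine Tez.gv' writes Tez.png.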
Building contrib tools under tez-tools:
Use -Ptools to build various contrib tools present under tez-tools. For example, run:
$ mvn package -Ptools