| Build instructions for Tez |
| |
| For instructions on how to contribute to Tez, refer to: |
| https://cwiki.apache.org/confluence/display/TEZ |
| |
| ---------------------------------------------------------------------------------- |
| Requirements: |
| |
| * JDK 1.8+ |
| * Maven 3.6.3 or later |
| * Findbugs 2.0.2 or later (if running findbugs) |
| * ProtocolBuffer 3.21.1 |
| * Internet connection for first build (to fetch all dependencies) |
| * Hadoop version should be 2.7.0 or higher. |
| |
| ---------------------------------------------------------------------------------- |
| Maven main modules: |
| |
| tez................................(Main Tez project) |
| - tez-api .....................(Tez api) |
| - tez-common ..................(Tez common) |
| - tez-runtime-internals .......(Tez runtime internals) |
| - tez-runtime-library .........(Tez runtime library) |
| - tez-mapreduce ...............(Tez mapreduce) |
| - tez-dag .....................(Tez dag) |
| - tez-examples ................(Tez examples) |
| - tez-plugins .................(Tez plugins) |
| - tez-tests ...................(Tez tests and additional test examples) |
| - tez-dist ....................(Tez dist) |
| - tez-ui ......................(Tez web user interface) |
| |
| ---------------------------------------------------------------------------------- |
| Maven build goals: |
| |
| * Clean : mvn clean |
| * Compile : mvn compile |
| * Run tests : mvn test |
| * Create JAR : mvn package |
| * Run findbugs : mvn compile findbugs:findbugs |
| * Run checkstyle : mvn compile checkstyle:checkstyle |
| * Install JAR in M2 cache : mvn install |
| * Deploy JAR to Maven repo : mvn deploy |
| * Run clover : mvn test -Pclover [-Dclover.license=${user.home}/clover.license] |
| * Run Rat : mvn apache-rat:check |
| * Build javadocs : mvn javadoc:javadoc |
| * Build distribution : mvn package[-Dhadoop.version=2.7.0] |
| * Visualize state machines : mvn compile -Pvisualize -DskipTests=true |
| |
| Build options: |
| |
| * Use -Dpackage.format to create distributions with a format other than .tar.gz (mvn-assembly-plugin formats). |
| * Use -Dclover.license to specify the path to the clover license file |
| * Use -Dhadoop.version to specify the version of hadoop to build tez against |
| * Use -Dprotoc.path to specify the path to protoc |
| * Use -Dallow.root.build to root build tez-ui components |
| |
| Tests options: |
| |
| * Use -DskipTests to skip tests when running the following Maven goals: |
| 'package', 'install', 'deploy' or 'verify' |
| * -Dtest=<TESTCLASSNAME>,<TESTCLASSNAME#METHODNAME>,.... |
| * -Dtest.exclude=<TESTCLASSNAME> |
| * -Dtest.exclude.pattern=**/<TESTCLASSNAME1>.java,**/<TESTCLASSNAME2>.java |
| |
| ---------------------------------------------------------------------------------- |
| Building against a specific version of hadoop: |
| |
| Tez runs on top of Apache Hadoop YARN and requires hadoop version 2.7.0 or higher. |
| |
| By default, it can be compiled against other compatible hadoop versions by just |
| specifying the hadoop.version. For example, to build tez against hadoop 3.0.0-SNAPSHOT |
| |
| $ mvn package -Dhadoop.version=3.0.0-SNAPSHOT |
| |
| To skip Tests and java docs |
| |
| $ mvn package -Dhadoop.version=3.0.0-SNAPSHOT -DskipTests -Dmaven.javadoc.skip=true |
| |
| However, to build against hadoop versions higher than 2.7.0, you will need to do the |
| following: |
| |
| For Hadoop version X where X >= 2.8.0 |
| |
| $ mvn package -Dhadoop.version=${X} -Phadoop28 -P\!hadoop27 |
| |
| For recent versions of Hadoop (which do not bundle aws and azure by default), |
| you can bundle AWS-S3 (2.7.0+) or Azure (2.7.0+) support: |
| |
| $ mvn package -Dhadoop.version=${X} -Paws -Pazure |
| |
| Tez also has some shims to provide version-specific implementations for various APIs. |
| For more details, please refer to https://cwiki.apache.org/confluence/display/TEZ/HadoopShims |
| |
| ---------------------------------------------------------------------------------- |
| UI build issues: |
| |
| In case of issue with UI build, please clean the UI cache. |
| |
| $ mvn clean -PcleanUICache |
| |
| Issue with PhantomJS on building in PowerPC. |
| |
| Official PhantomJS binaries were not available for Power platform. Hence if the build fails in PPC |
| please try installing PhantomJS manually and rerun. Refer https://github.com/ibmsoe/phantomjs-1/blob/v2.1.1-ppc64/README.md |
| and install it globally for the build to work. |
| |
| ---------------------------------------------------------------------------------- |
| Skip UI build: |
| |
| In case you want to completely skip UI build, you can use 'noui' profile. |
| For instance, a full build without tests and tez-ui looks like: |
| |
| $ mvn clean install -DskipTests -Pnoui |
| |
| It's important to note that maven will still include tez-ui project, but all of the maven plugins are skipped. |
| |
| ---------------------------------------------------------------------------------- |
| Protocol Buffer compiler: |
| |
| The version of Protocol Buffer compiler, protoc, can be defined on-the-fly as: |
| $ mvn clean install -DskipTests -pl ./tez-api -Dprotobuf.version=3.7.1 |
| |
| The default version is defined in the root pom.xml. |
| |
| If you have multiple versions of protoc in your system, you can set in your |
| build shell the PROTOC_PATH environment variable to point to the one you |
| want to use for the Tez build. If you don't define this environment variable then the |
| embedded protoc compiler will be used with the version defined in ${protobuf.version}. |
| It detects the platform and executes the corresponding protoc binary at build time. |
| |
| You can also specify the path to protoc while building using -Dprotoc.path |
| |
| $ mvn package -DskipTests -Dprotoc.path=/usr/local/bin/protoc |
| |
| ---------------------------------------------------------------------------------- |
| Building the docs: |
| |
| The following commands will build a local copy of the Apache Tez website under docs |
| $ cd docs; mvn site |
| |
| ---------------------------------------------------------------------------------- |
| Building components separately: |
| |
| If you are building a submodule directory, all the Tez dependencies this |
| submodule has will be resolved as all other 3rd party dependencies. This is, |
| from the Maven cache or from a Maven repository (if not available in the cache |
| or the SNAPSHOT 'timed out'). |
| An alternative is to run 'mvn install -DskipTests' from Tez source top |
| level once; and then work from the submodule. Keep in mind that SNAPSHOTs |
| time out after a while, using the Maven '-nsu' will stop Maven from trying |
| to update SNAPSHOTs from external repos. |
| |
| ---------------------------------------------------------------------------------- |
| Visualize the State Machines used in Tez internals: |
| |
| Use -Pvisualize to generate a graphviz file named Tez.gv which can then be |
| converted into a state machine diagram that represents the state transitions of |
| the state machine for the classses provided. |
| |
| Optional parameters: |
| * -Dtez.dag.state.classes=<comma-separated list of classes> |
| - By default, all 4 state machines - DAG, Vertex, Task and TaskAttempt are generated. |
| * -Dtez.graphviz.title |
| - Title for the Graph ( Default is Tez ) |
| * -Dtez.graphviz.output.file |
| - Output file to be generated with the state machines ( Default is Tez.gv ) |
| |
| For example, to generate the state machine graphviz file for DAGImpl, run: |
| |
| $ mvn compile -Pvisualize -Dtez.dag.state.classes=org.apache.tez.dag.app.dag.impl.DAGImpl -DskipTests=true |
| |
| To generate the diagram, you can use a Graphviz application or something like: |
| |
| $ dot -Tpng -o Tez.png Tez.gv' |
| |
| ---------------------------------------------------------------------------------- |
| Building contrib tools under tez-tools : |
| |
| Use -Ptools to build various contrib tools present under tez-tools. For example, run: |
| |
| $ mvn package -Ptools |
| |
| ---------------------------------------------------------------------------------- |