Apache Tez is a generic data-processing pipeline engine envisioned as a low-level engine for higher abstractions such as Apache Hive, Apache Pig etc.
At its heart, tez is very simple and has just two components:
The data-processing pipeline engine where-in one can plug-in input, processing and output implementations to perform arbitrary data-processing. Every ‘task’ in tez has the following:
Input to consume key/value pairs from.
Processor to process them.
Output to collect the processed key/value pairs.
A master for the data-processing application, where-by one can put together arbitrary data-processing ‘tasks’ described above into a task-DAG to process data as desired. The generic master is implemented as a Apache Hadoop YARN ApplicationMaster.
For instructions on how to contribute to Tez, refer to: Tez Wiki - How to Contribute
mvn cleanmvn compilemvn testmvn packagemvn compile spotbugs:spotbugsmvn compile checkstyle:checkstylemvn installmvn deploymvn test -Pjacocomvn apache-rat:checkmvn javadoc:javadocmvn package -Dhadoop.version=3.4.2mvn compile -Pvisualize -DskipTests=true-Dpackage.format to create distributions with a format other than .tar.gz (mvn-assembly-plugin formats).-Dhadoop.version to specify the version of Hadoop to build Tez against.-Dprotoc.path to specify the path to protoc.-Dallow.root.build to root build tez-ui components.Tez runs on top of Apache Hadoop YARN and requires Hadoop 3.x.
By default, it can be compiled against other compatible Hadoop versions by specifying hadoop.version:
mvn package -Dhadoop.version=3.4.2
For recent versions of Hadoop (which do not bundle AWS and Azure by default), you can bundle AWS-S3 or Azure support:
mvn package -Dhadoop.version=3.4.2 -Paws -Pazure
Tez also has shims to provide version-specific implementations for various APIs. For more details, refer to Hadoop Shims.
UI Build Issues
In case of issues with the UI build, please clean the UI cache:
mvn clean -PcleanUICache
Skip UI Build
To skip the UI build, use the noui profile:
mvn clean install -DskipTests -Pnoui
Maven will still include the tez-ui project, but all related plugins will be skipped.
Issue with PhantomJS on building in PowerPC
Official PhantomJS binaries were not available for the Power platform. If the build fails on PPC, try installing PhantomJS manually and rerun. Refer to PhantomJS README and install it globally.
The version of the Protocol Buffer compiler (protoc) can be defined on-the-fly:
mvn clean install -DskipTests -pl ./tez-api -Dprotobuf.version=3.25.5
The default version is defined in the root pom.xml.
If you have multiple versions of protoc, set the PROTOC_PATH environment variable to point to the desired binary. If not defined, the embedded protoc compiler corresponding to ${protobuf.version} will be used.
Alternatively, specify the path during the build:
mvn package -DskipTests -Dprotoc.path=/usr/local/bin/protoc
Build a local copy of the Apache Tez website:
mvn site -pl docs
If you are building a submodule directory, dependencies will be resolved from the Maven cache or remote repositories. Alternatively, run mvn install -DskipTests from the Tez top level once and then work from the submodule.
Use -Pvisualize to generate a Graphviz file (Tez.gv) representing state transitions:
mvn compile -Pvisualize -DskipTests=true
Optional parameters:
-Dtez.dag.state.classes=<comma-separated list of classes> (Default: DAG, Vertex, Task, TaskAttempt)-Dtez.graphviz.title (Default: Tez)-Dtez.graphviz.output.file (Default: tez-dag/target/Tez.gv)Example for DAGImpl:
mvn compile -Pvisualize \ -Dtez.dag.state.classes=org.apache.tez.dag.app.dag.impl.DAGImpl \ -DskipTests=true
Convert the .gv file to an image:
dot -Tpng -o Tez.png tez-dag/target/Tez.gv
Use -Ptools to build tools under tez-tools:
mvn package -Ptools