Deploy Apache Hadoop using either the 2.2.0 release or a compatible 2.x version.
Build tez using mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true
Copy the relevant tez tarball into HDFS, and configure tezsite.xml
hadoop dfs -mkdir /apps/tez-0.5.0-SNAPSHOT hadoop dfs -copyFromLocal tez-dist/target/tez-0.5.0-SNAPSHOT-archive.tar.gz /apps/tez-0.5.0-SNAPSHOT/
set tez.lib.uris to "${fs.default.name}/apps/tez-0.5.0-SNAPSHOT/tez-0.5.0-SNAPSHOT.tar.gz"
Optional: If running existing MapReduce jobs on Tez. Modify mapred-site.xml to change “mapreduce.framework.name” property from its default value of “yarn” to “yarn-tez”
Configure the client node to include the tez-libraries in the hadoop classpath
tar -xvzf tez-dist/target/tez-0.5.0-SNAPSHOT.tar.gz -C $TEZ_JARS
export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
Submit a MR job as you normally would using something like:
$HADOOP_PREFIX/bin/hadoop jar hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT-tests.jar sleep -mt 1 -rt 1 -m 1 -r 1
This will use the TEZ DAG ApplicationMaster to run the MR job. This can be verified by looking at the AM’s logs from the YARN ResourceManager UI.
There is a basic example of using an MRR job in the tez-examples.jar. Refer to OrderedWordCount.java in the source code. To run this example:
$HADOOP_PREFIX/bin/hadoop jar tez-examples.jar orderedwordcount <input> <output>
This will use the TEZ DAG ApplicationMaster to run the ordered word count job. This job is similar to the word count example except that it also orders all words based on the frequency of occurrence.
Tez DAGs could be run separately as different applications or serially within a single TEZ session. There is a different variation of orderedwordcount in tez-tests that supports the use of Sessions and handling multiple input-output pairs. You can use it to run multiple DAGs serially on different inputs/outputs.
$HADOOP_PREFIX/bin/hadoop jar tez-tests.jar testorderedwordcount <input1> <output1> <input2> <output2> <input3> <output3> ...
The above will run multiple DAGs for each input-output pair.
To use TEZ sessions, set -DUSE_TEZ_SESSION=true
$HADOOP_PREFIX/bin/hadoop jar tez-tests.jar testorderedwordcount -DUSE_TEZ_SESSION=true <input1> <output1> <input2> <output2>