Deploy Apache Hadoop using either the 2.2.0 release or a compatible 2.x version.
Build Tez using:
mvn clean install -DskipTests=true -Dmaven.javadoc.skip=true
To create a tarball of the release instead, use:
mvn clean package -Dtar -DskipTests=true -Dmaven.javadoc.skip=true
Copy the Tez jars and their dependencies into HDFS, for example under /apps/:
hadoop dfs -put tez-dist/target/tez-0.4.1-incubating/tez-0.4.1-incubating /apps/
Configure tez-site.xml to set tez.lib.uris to the HDFS paths that contain the jars. The paths are not searched recursively, so for basedir and basedir/lib/ you need to list both paths, separated by a comma.
If you copied the jars to /apps/ with the command above, the value would be:
${fs.default.name}/apps/tez-0.4.1-incubating,${fs.default.name}/apps/tez-0.4.1-incubating/lib/
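As an illustration, the tez.lib.uris setting described above could be written to tez-site.xml with a shell heredoc. This is only a minimal sketch: the ./tez-conf fallback directory is an assumption made for the example, and the property value is the one given above for jars copied to /apps/.

```shell
# Directory holding the Tez configuration; ./tez-conf is a placeholder (assumption),
# used only when TEZ_CONF_DIR is not already set.
TEZ_CONF_DIR="${TEZ_CONF_DIR:-./tez-conf}"
mkdir -p "$TEZ_CONF_DIR"

# The quoted 'EOF' keeps ${fs.default.name} literal, so the Hadoop configuration
# (not the shell) substitutes it when the file is read.
cat > "$TEZ_CONF_DIR/tez-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.default.name}/apps/tez-0.4.1-incubating,${fs.default.name}/apps/tez-0.4.1-incubating/lib/</value>
  </property>
</configuration>
EOF
```

Both the base directory and its lib/ subdirectory appear in the value because the paths are not searched recursively.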
Modify mapred-site.xml to change the mapreduce.framework.name property from yarn to yarn-tez.
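For reference, a minimal mapred-site.xml carrying only this property might look like the file generated below. This is a sketch under assumptions: the ./example-conf directory is hypothetical and exists only so the example does not touch a real configuration. On an existing cluster you would edit the property in the current mapred-site.xml rather than replace the file.

```shell
# Placeholder directory for this sketch only (hypothetical); a real cluster keeps
# mapred-site.xml in its Hadoop configuration directory.
EXAMPLE_CONF_DIR="${EXAMPLE_CONF_DIR:-./example-conf}"
mkdir -p "$EXAMPLE_CONF_DIR"

# Caution: this overwrites the target file. On an existing cluster, change the
# mapreduce.framework.name property in place instead.
cat > "$EXAMPLE_CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn-tez</value>
  </property>
</configuration>
EOF
```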
Set HADOOP_CLASSPATH to have the following paths in it:
export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
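TEZ_CONF_DIR and TEZ_JARS are not defined elsewhere in these steps, so they need to be set first. A possible setup is sketched here; both paths are hypothetical and should be replaced with the local directory that holds tez-site.xml and the local directory that holds the Tez jars.

```shell
# Hypothetical local paths (assumptions): the directory containing tez-site.xml
# and the directory containing the built Tez jars with their lib/ subdirectory.
export TEZ_CONF_DIR=/etc/tez/conf
export TEZ_JARS=/opt/tez-0.4.1-incubating

# Quoting ensures the shell passes "*" through literally; the JVM expands
# classpath wildcards to the jar files in each directory.
export HADOOP_CLASSPATH="${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*"

# Print the result to confirm the three entries.
echo "$HADOOP_CLASSPATH"
```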
Please note the “*”, which is required when a classpath entry refers to a directory containing jar files.
Submit an MR job as you normally would, using something like:
$HADOOP_PREFIX/bin/hadoop jar hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT-tests.jar sleep -mt 1 -rt 1 -m 1 -r 1
This will use the TEZ DAG ApplicationMaster to run the MR job. This can be verified by looking at the AM’s logs from the YARN ResourceManager UI.
There is a basic example of an MRR (map-reduce-reduce) job in tez-mapreduce-examples.jar. Refer to OrderedWordCount.java in the source code. To run this example:
$HADOOP_PREFIX/bin/hadoop jar tez-mapreduce-examples.jar orderedwordcount <input> <output>
This will use the TEZ DAG ApplicationMaster to run the ordered word count job. This job is similar to the word count example except that it also orders all words based on the frequency of occurrence.
orderedwordcount can be run in several ways. You can use it to run multiple DAGs serially on different inputs/outputs. These DAGs can be run either as separate applications or serially within a single TEZ session.
$HADOOP_PREFIX/bin/hadoop jar tez-mapreduce-examples.jar orderedwordcount <input1> <output1> <input2> <output2> <input3> <output3> ...
The above will run one DAG for each input-output pair.
To use TEZ sessions, set -DUSE_TEZ_SESSION=true:
$HADOOP_PREFIX/bin/hadoop jar tez-mapreduce-examples.jar orderedwordcount -DUSE_TEZ_SESSION=true <input1> <output1> <input2> <output2>