| How to run MR on TEZ |
| ======================= |
| |
| Tez provides an ApplicationMaster that can run MR or MRR jobs. There is a |
| translation layer implemented ( may have bugs so please file JIRAs if you |
| come across any issues ) that allows a user to run an MR job against the TEZ |
| DAG ApplicationMaster. |
| |
| NOTE: Due to some quirks, all jobs will show an IOException and a failure at |
| the end. Please look at the YARN ResourceManager UI to check the correct result |
| of the job. |
| |
| Install/Deploy Instructions |
| =========================== |
| |
| 1) Deploy Apache Hadoop using the 3.0.0-SNAPSHOT from trunk. |
| - You can also use the 2.1.0 branch. For this, the only change will be to build |
| tez with the -Dhadoop.version set to the correct version matching the hadoop |
| branch being used. |
| 2) Copy the tez jars and their dependencies into HDFS. |
| 3) Configure tez-site.xml to set tez.lib.uris to point to the paths in HDFS containing |
| the jars. Please note that the paths are not searched recursively so for <basedir> |
| and <basedir>/lib/, you will need to configure the 2 paths as a comma-separated list. |
| 3) Modify mapred-site.xml to change "mapreduce.framework.name" property from its |
| default value of "yarn" to "yarn-tez" |
| 4) set HADOOP_CLASSPATH to have the following paths in it: |
| - TEZ_CONF_DIR - location of tez-site.xml |
| - TEZ_JARS and TEZ_JARS/libs - location of the tez jars and dependencies. |
| 5) Submit a MR job as you normally would using something like: |
| |
| $HADOOP_PREFIX/bin/hadoop jar hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT-tests.jar sleep -mt 1 -rt 1 -m 1 -r 1 |
| |
| This will use the TEZ DAG ApplicationMaster to run the MR job. This can be |
| verified by looking at the AM's logs from the YARN ResourceManager UI. |
| |
| 6) There is a basic example of using an MRR job in the tez-mapreduce-examples.jar. Refer to OrderedWordCount.java |
| in the source code. To run this example: |
| |
| $HADOOP_PREFIX/bin/hadoop jar tez-mapreduce-examples.jar orderedwordcount <input> <output> |
| |
| This will use the TEZ DAG ApplicationMaster to run the ordered word count job. This job is similar |
| to the word count example except that it also orders all words based on the frequency of |
| occurrence. |