Tez Local Mode is a development tool for testing Tez jobs without bringing up a Hadoop cluster. Local Mode runs the same Tez components (AppMaster, TaskRunner) that are used when executing a job on a cluster. As a developer tool, it offers several advantages.
While the majority of the components are reused in Local Mode, some are not.
Running a DAG in Local Mode
“tez.local.mode” must be set to true in the Configuration instance used to create the TezClient.
The FileSystem must be configured to the local file system (“fs.default.name” must be set to “file:///”). This must be set in every Configuration instance used to create a DAG. When using Tez for testing and prototyping without a Hadoop cluster, this is typically not a problem. It becomes a problem when Hadoop configuration files that specify a different default filesystem are on the classpath.
Set up the fetchers to use local reads instead of fetching from remote nodes (“tez.runtime.optimize.local.fetch” must be set to true).
Beyond this, no other changes are required to use Local Mode instead of running a job on a cluster.
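The three settings above can be checked before a client is created, which helps catch the case where a Hadoop configuration file on the classpath overrides the default filesystem. The following is a minimal sketch only: LocalModeCheck and its methods are hypothetical names, not part of Tez, and a plain Map stands in for a Hadoop Configuration so the example has no dependencies. Values are compared as exact strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tez: checks the three Local Mode settings.
public class LocalModeCheck {
    // Property names and values taken from the requirements listed above.
    static final Map<String, String> REQUIRED = new LinkedHashMap<>();
    static {
        REQUIRED.put("tez.local.mode", "true");
        REQUIRED.put("fs.default.name", "file:///");
        REQUIRED.put("tez.runtime.optimize.local.fetch", "true");
    }

    // Returns one message per setting that is missing or has a different value.
    static List<String> problems(Map<String, String> settings) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : REQUIRED.entrySet()) {
            String actual = settings.get(e.getKey());
            if (!e.getValue().equals(actual)) {
                out.add(e.getKey() + " should be " + e.getValue()
                        + " but is " + actual);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("tez.local.mode", "true");
        // Hypothetical value, as if picked up from a cluster config file.
        settings.put("fs.default.name", "hdfs://namenode:8020");
        settings.put("tez.runtime.optimize.local.fetch", "true");
        for (String problem : problems(settings)) {
            System.out.println(problem);
        }
    }
}
```

In a real job the same comparison could be made against the values read from the Configuration instance that is passed to the TezClient.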
When configuring Local Mode in code, make the following changes to a Configuration instance, then use that instance as the base for all other Configuration instances.
Configuration conf = new Configuration();
conf.setBoolean(TezConfiguration.TEZ_LOCAL_MODE, true);
conf.set("fs.default.name", "file:///");
conf.setBoolean(TezRuntimeConfiguration.TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH, true);
If using a tez-site.xml config file, it should contain the following entries:
<property>
  <name>fs.default.name</name>
  <value>file:///</value>
</property>
<property>
  <name>tez.local.mode</name>
  <value>true</value>
</property>
<property>
  <name>tez.runtime.optimize.local.fetch</name>
  <value>true</value>
</property>
Things to watch out for
Potential pitfalls when moving from Local Mode to a real cluster
Local Mode with External Services