layout: page
title: "Zeppelin on Yarn"
description: "Apache Zeppelin supports running interpreter processes in YARN containers"
group: usage/interpreter

{% include JB/setup %}

Zeppelin on Yarn

Zeppelin on YARN means running the interpreter process in a YARN container. The key benefit is scalability: the Zeppelin server host won't run out of memory even if you run a large number of interpreter processes.

Prerequisites

The following are required for YARN interpreter mode:

  • A Hadoop client (both 2.x and 3.x are supported) is installed.
  • $HADOOP_HOME/bin is on PATH, because Zeppelin internally runs the command hadoop classpath to collect the Hadoop jars and add them to its own classpath.
  • USE_HADOOP is set to true in zeppelin-env.sh.
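
The prerequisites above could be set up roughly as follows. This is a minimal sketch, not the project's official configuration: the fallback path /opt/hadoop is a hypothetical placeholder for wherever your Hadoop client is installed, and the final check is only a convenience for confirming that hadoop classpath works in the environment Zeppelin will see.

```shell
# Lines of this kind would go in conf/zeppelin-env.sh.

# Tell Zeppelin to pick up the Hadoop jars.
export USE_HADOOP=true

# Assumption: /opt/hadoop is a placeholder; use your real Hadoop client location.
export HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"

# Put the hadoop launcher on PATH so that `hadoop classpath` can be resolved.
export PATH="$HADOOP_HOME/bin:$PATH"

# Optional sanity check, run by hand: confirm that `hadoop classpath` succeeds.
if command -v hadoop >/dev/null 2>&1; then
  hadoop classpath >/dev/null && echo "hadoop classpath OK"
else
  echo "hadoop not found on PATH; check HADOOP_HOME" >&2
fi
```

If the check reports that hadoop is not found, Zeppelin will not be able to build the Hadoop classpath either, so fix PATH before enabling YARN interpreter mode.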

Configuration

YARN interpreter mode must be enabled for each interpreter individually. Set zeppelin.interpreter.launcher to yarn to run an interpreter in YARN mode. You can also set the other properties listed in the following table.

Differences with non-yarn interpreter mode (local mode)

There are several differences between YARN interpreter mode and non-YARN interpreter mode (local mode):

  • A new YARN application is allocated for the interpreter process.
  • Local path settings do not work in a YARN interpreter process. For example, if you run the Python interpreter in YARN interpreter mode, make sure the Python executable specified by zeppelin.python exists on every node of the YARN cluster, because the interpreter may be launched on any node.
  • Don't use it for the Spark interpreter. Instead, use Spark's built-in yarn-client or yarn-cluster mode, which is better suited to the Spark interpreter.
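
Because the interpreter may start on any node, it can help to confirm ahead of time that the executable named in zeppelin.python is present everywhere. The sketch below is one possible way to do that and is not part of Zeppelin: the function name check_python_on_nodes and the hostnames are hypothetical, and it assumes passwordless ssh access to each node.

```shell
# Sketch: check that a given Python executable exists on every listed node.
# Assumptions: passwordless ssh to each node; all names here are illustrative.
check_python_on_nodes() {
  local python_exec="$1"
  shift
  local missing=0
  local node
  for node in "$@"; do
    # `command -v` succeeds only if the executable can be resolved on the remote host.
    if ssh -n -o BatchMode=yes "$node" "command -v '$python_exec'" >/dev/null 2>&1; then
      echo "$node: ok"
    else
      echo "$node: missing $python_exec"
      missing=1
    fi
  done
  # Non-zero exit status if any node lacks the executable.
  return "$missing"
}

# Example usage (hypothetical hostnames):
# check_python_on_nodes /usr/bin/python3 worker1.example.com worker2.example.com
```

After an interpreter has been started in YARN mode, running yarn application -list -appStates RUNNING should show a separate application for the interpreter process.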