{% include JB/setup %}
The Zeppelin service runs on the local server, but Zeppelin is able to run each interpreter in a Docker container, isolating the interpreter's operating environment from the host. Zeppelin can then be used without installing Python, Spark, etc. on the local node.

Key benefits are:

- The interpreter environment is isolated in a container.
- There is no need to install interpreter runtimes (Python, Spark, etc.) on the local server.
`DockerInterpreterProcess` communicates with Docker via Docker's TCP interface. By default, the Docker daemon listens only on a Unix socket file, so you need to modify the daemon configuration file to also expose the TCP interface.
```bash
vi /etc/docker/daemon.json
```

Add `tcp://0.0.0.0:2375` to the `hosts` configuration item:

```json
{
  ...
  "hosts": ["tcp://0.0.0.0:2375", "unix:///var/run/docker.sock"]
}
```
`hosts` property reference: https://docs.docker.com/engine/reference/commandline/dockerd/
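As a sketch, the `hosts` entry can also be added programmatically. The block below works on a scratch copy at `/tmp/daemon.json.demo` (an assumed demo path); on a real host the file is `/etc/docker/daemon.json` and editing it requires root:

```shell
# Demo against a scratch copy; the real file is /etc/docker/daemon.json (root required).
conf=/tmp/daemon.json.demo
cat > "$conf" <<'EOF'
{
  "hosts": ["unix:///var/run/docker.sock"]
}
EOF

# Use python3 for JSON-safe editing (jq would work equally well).
python3 - "$conf" <<'EOF'
import json, sys

path = sys.argv[1]
with open(path) as f:
    cfg = json.load(f)

# Prepend the TCP endpoint while keeping the existing Unix socket.
hosts = cfg.setdefault("hosts", [])
if "tcp://0.0.0.0:2375" not in hosts:
    hosts.insert(0, "tcp://0.0.0.0:2375")

with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
EOF

cat "$conf"
```

After editing the real file, restart the Docker daemon (e.g. `sudo systemctl restart docker`) for the change to take effect. Note that exposing the TCP socket without TLS is insecure on untrusted networks.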
Set the following properties in `zeppelin-site.xml`:

```xml
<property>
  <name>zeppelin.run.mode</name>
  <value>docker</value>
  <description>'auto|local|k8s|docker'</description>
</property>

<property>
  <name>zeppelin.docker.container.image</name>
  <value>apache/zeppelin</value>
  <description>Docker image for interpreters</description>
</property>
```
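To illustrate the property layout, here is a small sketch (the scratch file path is assumed; the real file lives under `${ZEPPELIN_HOME}/conf/zeppelin-site.xml`) that writes a minimal fragment and reads the values back:

```shell
# Write a minimal zeppelin-site.xml fragment to a scratch path.
site=/tmp/zeppelin-site.demo.xml
cat > "$site" <<'EOF'
<configuration>
  <property>
    <name>zeppelin.run.mode</name>
    <value>docker</value>
  </property>
  <property>
    <name>zeppelin.docker.container.image</name>
    <value>apache/zeppelin</value>
  </property>
</configuration>
EOF

# Read the property values back with python3's stdlib XML parser.
python3 - "$site" <<'EOF'
import sys
import xml.etree.ElementTree as ET

root = ET.parse(sys.argv[1]).getroot()
props = {p.findtext("name"): p.findtext("value") for p in root.iter("property")}
print(props["zeppelin.run.mode"])                # -> docker
print(props["zeppelin.docker.container.image"])  # -> apache/zeppelin
EOF
```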
Set `DOCKER_TIME_ZONE` to the same time zone as the Zeppelin server, so that the time zone in the interpreter Docker container matches the server, e.g. "America/New_York" or "Asia/Shanghai":

```bash
export DOCKER_TIME_ZONE="America/New_York"
```
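If you prefer to follow the host's time zone rather than hard-coding one, a minimal sketch (assuming a Debian-style `/etc/timezone`, falling back to UTC where it is absent):

```shell
# Derive the host time zone; fall back to UTC when /etc/timezone does not exist
# (e.g. on non-Debian systems). An already-set TZ takes precedence.
export DOCKER_TIME_ZONE="${TZ:-$(cat /etc/timezone 2>/dev/null || echo UTC)}"
echo "$DOCKER_TIME_ZONE"
```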
To build a Zeppelin image that supports Kerberos authentication and includes the Spark binaries, use `/scripts/docker/interpreter/Dockerfile`:

```dockerfile
FROM apache/zeppelin:0.8.0
MAINTAINER Apache Software Foundation <dev@zeppelin.apache.org>

ENV SPARK_VERSION=2.3.3
ENV HADOOP_VERSION=2.7

# support Kerberos authentication
RUN export DEBIAN_FRONTEND=noninteractive && apt-get update && apt-get install -yq krb5-user libpam-krb5 && apt-get clean

RUN apt-get update && apt-get install -y curl unzip wget grep sed vim tzdata && apt-get clean

# auto upload zeppelin interpreter lib
RUN rm -rf /zeppelin
RUN rm -rf /spark

RUN wget https://www-us.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
RUN tar zxvf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
RUN mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} spark
RUN rm spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
```
Then build the Docker image:

```bash
# build image. Replace <tag>.
$ docker build -t <tag> .
```
When the Zeppelin service runs on the local server in docker mode, it automatically configures itself to use `DockerInterpreterLauncher`. `DockerInterpreterLauncher` creates each interpreter in a container from the configured Docker image, via `DockerInterpreterProcess`. `DockerInterpreterProcess` uploads the binaries and configuration files of the local Zeppelin service to the container. All files are uploaded into the container at the same paths they have locally, which ensures that all configurations are used correctly.
When the interpreter group is `spark`, Zeppelin automatically sets the necessary Spark configuration to use Spark on Docker. All running modes of the Zeppelin Spark interpreter are supported: `local[*]`, `yarn-client`, and `yarn-cluster`.
Because the interpreter image contains only the Spark binaries, no Spark configuration files are included. The configuration files in the `spark-<version>/conf/` directory local to the Zeppelin service need to be uploaded to the `/spark/conf/` directory in the Spark interpreter container. So you need to set `export SPARK_CONF_DIR=<spark-path>/conf/` in the `zeppelin-env.sh` file.
You can also configure it in the Spark interpreter properties.

Properties name | Value | Description |
---|---|---|
SPARK_CONF_DIR | `<spark-path>/conf/` | `spark-<version>/conf/` path local to the zeppelin service |
Because the interpreter image contains only the Spark binaries, no Hadoop configuration files are included either. The configuration files in the `hadoop-<version>/etc/hadoop` directory local to the Zeppelin service need to be uploaded to the Spark interpreter container. So you need to set `export HADOOP_CONF_DIR=hadoop-<version>-path/etc/hadoop` in the `zeppelin-env.sh` file.
You can also configure it in the Spark interpreter properties.

Properties name | Value | Description |
---|---|---|
HADOOP_CONF_DIR | `hadoop-<version>-path/etc/hadoop` | `hadoop-<version>/etc/hadoop` path local to the zeppelin service |
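Putting the two settings together, `zeppelin-env.sh` gains lines like the following. The sketch below appends them to a scratch copy rather than the real file (on a real install the file is `${ZEPPELIN_HOME}/conf/zeppelin-env.sh`), and the two paths are placeholders that must point at your local Spark and Hadoop conf directories:

```shell
# Append the conf-dir exports to a scratch copy of zeppelin-env.sh.
env_sh=/tmp/zeppelin-env.demo.sh
: > "$env_sh"
cat >> "$env_sh" <<'EOF'
export SPARK_CONF_DIR=/opt/spark-2.3.3/conf           # placeholder path
export HADOOP_CONF_DIR=/opt/hadoop-2.7.7/etc/hadoop   # placeholder path
EOF
cat "$env_sh"
```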
Because the Zeppelin interpreter container uses the host network, the `spark.ui.port` port is allocated automatically; do not configure `spark.ui.port=xxxx` in `spark-defaults.conf`.
Instead of building the Zeppelin distribution package and Docker image every time during development, Zeppelin can run locally (e.g. inside your IDE in debug mode) and still launch interpreters using `DockerInterpreterLauncher`, by configuring the following environment variables.
Configuration variable | Value | Description |
---|---|---|
ZEPPELIN_RUN_MODE | docker | Make Zeppelin run interpreter on Docker |
ZEPPELIN_DOCKER_CONTAINER_IMAGE | `<image>:<version>` | Zeppelin interpreter docker image to use |
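A sketch of the environment for local development (the image tag is a placeholder; start Zeppelin afterwards however you normally do, e.g. from the IDE in debug mode):

```shell
# Point a locally-running Zeppelin at Docker-launched interpreters.
export ZEPPELIN_RUN_MODE=docker
export ZEPPELIN_DOCKER_CONTAINER_IMAGE=apache/zeppelin:0.8.0   # placeholder tag

# Then start Zeppelin as usual.
echo "run mode: $ZEPPELIN_RUN_MODE, image: $ZEPPELIN_DOCKER_CONTAINER_IMAGE"
```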