branch-0.6 - zeppelin - Git at Google

commit	b8f4fb90273dc1c6607eccd36c493590ad0486cb	[log] [tgz]
author	z0621 <zhouyf0621@dtdream.com>	Mon May 08 18:19:26 2017 +0800
committer	Lee moon soo <moon@apache.org>	Mon May 15 18:09:51 2017 -0700
tree	85115eb3b962fa584a3279b1653e5202697e67fd
parent	662b7d6846c7b47c777916259f55e3ec1ffaf8df [diff]

README.md

# Zeppelin

Documentation: User Guide
Mailing Lists: User and Dev mailing list
Continuous Integration:
Contributing: Contribution Guide
Issue Tracker: Jira
License: Apache 2.0

Zeppelin is a web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive, and collaborative documents with SQL, Scala, and more.

Core features:

Web-based notebook-style editor
Built-in Apache Spark support

To learn more about Zeppelin, visit our website: http://zeppelin.apache.org

Requirements

Git
Java 1.7
Tested on Mac OSX, Ubuntu 14.X, CentOS 6.X, Windows 7 Pro SP1
Maven (if you want to build from the source code)
Node.js Package Manager (npm, downloaded by Maven during build phase)

Getting Started

Before Build

If you don't have the requirements installed, install them first. (The installation method may vary according to your environment; this example is for Ubuntu.)

sudo apt-get update
sudo apt-get install git
sudo apt-get install openjdk-7-jdk
sudo apt-get install npm
sudo apt-get install libfontconfig

Proxy settings (optional)

If you are behind a corporate proxy with NTLM authentication, you can use Cntlm Authentication Proxy.

Before starting the build, run these commands from a shell.

export http_proxy=http://localhost:3128
export https_proxy=http://localhost:3128
export HTTP_PROXY=http://localhost:3128
export HTTPS_PROXY=http://localhost:3128
npm config set proxy http://localhost:3128
npm config set https-proxy http://localhost:3128
npm config set registry "http://registry.npmjs.org/"
npm config set strict-ssl false
npm cache clean
git config --global http.proxy http://localhost:3128
git config --global https.proxy http://localhost:3128
git config --global url."http://".insteadOf git://

After the build is complete, run these commands to clean up.

npm config rm proxy
npm config rm https-proxy
git config --global --unset http.proxy
git config --global --unset https.proxy
git config --global --unset url."http://".insteadOf

Notes:

If you are on Windows, replace export with set to set the environment variables.
If your proxy requires credentials, replace localhost:3128 with the standard pattern http://user:pwd@host:port.
The Git configuration is needed because Bower uses Git to fetch from GitHub.
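
Because the same proxy address appears in every command above, it can help to keep it in one variable so it only has to be replaced in one place. The following is a minimal sketch; `PROXY_URL` is a name introduced here for illustration, not something the build reads.

```shell
# PROXY_URL is a hypothetical variable; the default matches the example above.
PROXY_URL="${PROXY_URL:-http://localhost:3128}"

# Export all four proxy environment variables from the single value.
for var in http_proxy https_proxy HTTP_PROXY HTTPS_PROXY; do
  export "$var=$PROXY_URL"
done

echo "proxy set to $PROXY_URL"
```

The npm and git settings listed above still need to be applied separately, for example with `npm config set proxy "$PROXY_URL"`.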

Install maven

wget http://www.eu.apache.org/dist/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
sudo tar -zxf apache-maven-3.3.3-bin.tar.gz -C /usr/local/
sudo ln -s /usr/local/apache-maven-3.3.3/bin/mvn /usr/local/bin/mvn

Notes:

Ensure node is installed by running node --version
Ensure maven is running version 3.1.x or higher with mvn -version
Configure Maven to use more memory than usual by running export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=1024m"
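
The checks in these notes can be collected into a small preflight script. This is only a sketch: `version_ge` and `check_tool` are helper names made up for this example, and `sort -V` assumes GNU coreutils (or another `sort` that supports version ordering).

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: report whether a tool is on PATH (installs nothing).
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

for tool in git java mvn node npm; do
  check_tool "$tool"
done

# Memory settings from the note above.
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=1024m"
```

For example, `version_ge 3.3.3 3.1.0` succeeds, so a Maven 3.3.3 install would pass the 3.1.x minimum.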

Build

If you want to build Zeppelin from the source, please first clone this repository, then:

mvn clean package -DskipTests [Options]

Each interpreter requires different options.

Spark Interpreter

To build with a specific Spark version, Hadoop version or specific features, define one or more of the following profiles and options:

`-Pspark-[version]`

Set the Spark major version.

Available profiles are

-Pspark-2.0
-Pspark-1.6
-Pspark-1.5
-Pspark-1.4
-Pspark-1.3
-Pspark-1.2
-Pspark-1.1
-Pcassandra-spark-1.5
-Pcassandra-spark-1.4
-Pcassandra-spark-1.3
-Pcassandra-spark-1.2
-Pcassandra-spark-1.1

The minor version can be adjusted with -Dspark.version=x.x.x

`-Phadoop-[version]`

Set the Hadoop major version.

Available profiles are

-Phadoop-0.23
-Phadoop-1
-Phadoop-2.2
-Phadoop-2.3
-Phadoop-2.4
-Phadoop-2.6

The minor version can be adjusted with -Dhadoop.version=x.x.x

`-Pscala-[version] (optional)`

Set the Scala version (default 2.10).

Available profiles are

-Pscala-2.10
-Pscala-2.11

`-Pyarn` (optional)

enable YARN support for local mode

YARN for local mode is not supported for Spark v1.5.0 or higher. Set SPARK_HOME instead.

`-Ppyspark` (optional)

enable PySpark support for local mode.

`-Pr` (optional)

enable R support with SparkR integration.

`-Psparkr` (optional)

an alternative R interpreter with SparkR integration that also supports local mode.

`-Pvendor-repo` (optional)

enable 3rd party vendor repository (cloudera)

`-Pmapr[version]` (optional)

For the MapR Hadoop Distribution, these profiles will handle the Hadoop version. As MapR allows different versions of Spark to be installed, you should specify which version of Spark is installed on the cluster by adding a Spark profile (-Pspark-1.2, -Pspark-1.3, etc.) as needed. The correct Maven artifacts can be found for every version of MapR at http://doc.mapr.com

Available profiles are

-Pmapr3
-Pmapr40
-Pmapr41
-Pmapr50
-Pmapr51

Example

Here are some examples:

# build with spark-2.0, scala-2.11
./dev/change_scala_version.sh 2.11
mvn clean package -Pspark-2.0 -Phadoop-2.4 -Pyarn -Ppyspark -Psparkr -Pscala-2.11

# build with spark-1.6, scala-2.10
mvn clean package -Pspark-1.6 -Phadoop-2.4 -Pyarn -Ppyspark -Psparkr

# spark-cassandra integration
mvn clean package -Pcassandra-spark-1.5 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests

# with CDH
mvn clean package -Pspark-1.5 -Dhadoop.version=2.6.0-cdh5.5.0 -Phadoop-2.6 -Pvendor-repo -DskipTests

# with MapR
mvn clean package -Pspark-1.5 -Pmapr50 -DskipTests
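
The examples above all follow the same pattern: one Spark profile, one Hadoop profile, then optional profiles. A small helper can assemble that command line so the pieces are easier to vary. This is an illustrative sketch; `zeppelin_build_cmd` is a hypothetical function that only prints the command and does not run Maven.

```shell
# Hypothetical helper: print (not run) a build command.
# Usage: zeppelin_build_cmd <spark-version> <hadoop-version> [extra profiles...]
zeppelin_build_cmd() {
  cmd="mvn clean package -Pspark-$1 -Phadoop-$2"
  shift 2
  # Append each remaining argument as an additional -P profile.
  for extra in "$@"; do
    cmd="$cmd -P$extra"
  done
  echo "$cmd -DskipTests"
}

zeppelin_build_cmd 1.6 2.4 yarn pyspark
```

Printing the command instead of executing it makes it easy to review the profile combination before starting a long build.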

Ignite Interpreter

mvn clean package -Dignite.version=1.6.0 -DskipTests

Scalding Interpreter

mvn clean package -Pscalding -DskipTests

Configure

To configure Zeppelin options (such as the port number), edit the following files:

./conf/zeppelin-env.sh
./conf/zeppelin-site.xml

(You can copy ./conf/zeppelin-env.sh.template into ./conf/zeppelin-env.sh. Same for zeppelin-site.xml.)
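
As a sketch of the copy step, the following creates each configuration file from its template only when the file does not exist yet, so existing settings are not overwritten. `init_conf` is a hypothetical helper name, and the script assumes it is run from the Zeppelin source root.

```shell
# Hypothetical helper: copy templates in the given conf directory,
# skipping any configuration file that already exists.
init_conf() {
  conf_dir="$1"
  for name in zeppelin-env.sh zeppelin-site.xml; do
    if [ -f "$conf_dir/$name.template" ] && [ ! -e "$conf_dir/$name" ]; then
      cp "$conf_dir/$name.template" "$conf_dir/$name"
      echo "created $conf_dir/$name"
    fi
  done
}

init_conf ./conf
```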

Setting SPARK_HOME and HADOOP_HOME

Without SPARK_HOME and HADOOP_HOME, Zeppelin uses the embedded Spark and Hadoop binaries that you specified with the mvn build options. If you want to use a system-provided Spark and Hadoop, export SPARK_HOME and HADOOP_HOME in zeppelin-env.sh. You can use any supported version of Spark without rebuilding Zeppelin.

# ./conf/zeppelin-env.sh
export SPARK_HOME=...
export HADOOP_HOME=...
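
Before starting Zeppelin, it can be useful to confirm what these variables point to. The check below is a minimal sketch; `check_home` is a hypothetical helper and is not part of Zeppelin.

```shell
# Hypothetical helper: report whether a *_HOME variable is set and
# points to an existing directory.
check_home() {
  name="$1"
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "$name is not set; Zeppelin falls back to the embedded binaries"
  elif [ ! -d "$value" ]; then
    echo "$name points to a missing directory: $value"
  else
    echo "$name=$value"
  fi
}

check_home SPARK_HOME
check_home HADOOP_HOME
```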

External cluster configuration

Mesos

# ./conf/zeppelin-env.sh
export MASTER=mesos://...
# Set either spark.executor.uri (as below) or SPARK_HOME (commented out):
export ZEPPELIN_JAVA_OPTS="-Dspark.executor.uri=/path/to/spark-*.tgz"
# export SPARK_HOME="/path/to/spark_home"
export MESOS_NATIVE_LIBRARY=/path/to/libmesos.so

If you set SPARK_HOME, you should deploy the Spark binaries to the same location on all worker nodes. If you set spark.executor.uri instead, every worker must be able to read that file on its node.

Yarn

# ./conf/zeppelin-env.sh
export SPARK_HOME=/path/to/spark_dir

Run

./bin/zeppelin-daemon.sh start

Then open localhost:8080 in your browser.

For configuration details, check the ./conf subdirectory.

Building for Scala 2.11

To produce a Zeppelin package compiled with Scala 2.11, use the -Pscala-2.11 profile:

./dev/change_scala_version.sh 2.11
mvn clean package -Pspark-1.6 -Phadoop-2.4 -Pyarn -Ppyspark -Pscala-2.11 -DskipTests clean install

Package

To package the final distribution including the compressed archive, run:

mvn clean package -Pbuild-distr

To build a distribution with specific profiles, run:

mvn clean package -Pbuild-distr -Pspark-1.5 -Phadoop-2.4 -Pyarn -Ppyspark

The profiles -Pspark-1.5 -Phadoop-2.4 -Pyarn -Ppyspark can be adjusted if you wish to build for a specific Spark version or omit support for features such as YARN.

The archive is generated under the zeppelin-distribution/target directory.

### Run end-to-end tests

Zeppelin comes with a set of end-to-end acceptance tests that drive a headless Selenium browser.

# assumes zeppelin-server running on localhost:8080 (use -Durl=.. to override)
mvn verify

# or take care of starting/stopping zeppelin-server from the packaged zeppelin-distribution/target
mvn verify -P using-packaged-distr
