The main goal of Toree is to provide the foundation for interactive applications to connect to and use Apache Spark. This branch supports Apache Spark 1.6+. See master for Spark 2+ support.
Toree provides an interface that allows clients to interact with a Spark cluster. Clients can send libraries and snippets of code that are interpreted and run against a preconfigured Spark context. These snippets can do a variety of things:
The main supported language is Scala, but Toree is also capable of processing both Python and R. It implements the latest Jupyter message protocol (5.0), so it can easily plug into the latest releases of Jupyter/IPython (3.2.x+ and 4.x+) for quick, interactive data exploration.
A version of Toree is deployed as part of the Try Jupyter! site. Select Scala 2.10.4 (Spark 1.4.1) under the New dropdown. Note that this version only supports
This project uses make as the entry point for build, test, and packaging. It supports two modes, local and vagrant. The default is local, in which all commands (e.g. sbt) are run locally on your machine. This means that you need to install sbt, jupyter/ipython, and other development requirements locally on your machine. The second mode uses Vagrant to simplify the development experience. In vagrant mode, all commands are sent to the vagrant box, which has all necessary dependencies pre-installed. To run in vagrant mode, run
To build and interact with Toree using Jupyter, run
This will start a Jupyter notebook server. Depending on your mode, it will be accessible at http://192.168.44.44:8888. From here you can create notebooks that use Toree configured for Spark local mode.
Tests can be run by doing
NOTE: Do not use
To build and package up Toree, you currently need docker installed. Once you do, run make release.
This results in 3 packages.
./dist/toree-<VERSION>-source-release.tar.gz is an archive containing the source used to build the binary
./dist/toree-<VERSION>-binary-release.tar.gz is a simple package that contains the JAR and executable
a pip installable package that adds Apache Toree as a Jupyter kernel
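As a quick sanity check after packaging, the two archive names listed above can be looked up in the output directory. This is a minimal sketch, not part of the project's Makefile: the helper name check_dist and its arguments are hypothetical, and it only covers the source and binary archives whose names are given above.

```shell
# Hypothetical helper (not part of Toree): report which of the two named
# release archives exist in a dist directory for a given version string.
check_dist() {
  dist_dir="$1"   # e.g. ./dist
  version="$2"    # the <VERSION> part of the archive names
  missing=0
  for suffix in source-release binary-release; do
    file="$dist_dir/toree-$version-$suffix.tar.gz"
    if [ -f "$file" ]; then
      echo "found: $file"
    else
      echo "missing: $file"
      missing=1
    fi
  done
  # Non-zero exit status when at least one archive is absent.
  return "$missing"
}

# Usage (after packaging has finished):
#   check_dist ./dist "<VERSION>"
```

The function only inspects file names, so it says nothing about whether the archives themselves are complete or signed.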
make release uses docker. Please refer to the docker installation instructions for your system.
USE_VAGRANT is not supported by this
To play with the example notebooks, run
A notebook server will be launched in a Docker container with Toree and some other dependencies installed. Refer to your Docker setup for the IP address. The notebook will be at
The pip packages are hosted on Apache dist. Currently we have a developer preview of 0.1.0.
pip install https://dist.apache.org/repos/dist/dev/incubator/toree/0.1.0/snapshots/toree-0.1.0.dev8.tar.gz
jupyter toree install
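After the install commands above, one way to confirm that a kernel was registered is to look through the output of jupyter kernelspec list. The sketch below is an illustration only: the function name is hypothetical, and matching on the word "toree" is an assumption about how the kernel is named on your system.

```shell
# Sketch: succeed if any registered Jupyter kernelspec mentions "toree".
# The case-insensitive match on "toree" is an assumption about the kernel name.
toree_kernel_installed() {
  jupyter kernelspec list 2>/dev/null | grep -qi 'toree'
}

# Usage:
#   toree_kernel_installed && echo "Toree kernel is registered"
```

If the check fails, listing the kernelspecs by hand shows what Jupyter actually sees.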
Refer to existing issues, or open a new one, here
We are working on publishing binary releases of Toree soon. As part of our move into Apache Incubator, Toree will start a new version sequence starting at
Our goal is to keep master up to date with the latest version of Spark. When new versions of Spark require specific code changes to Toree, we will branch out older Spark version support.
As it stands, we maintain several branches for legacy versions of Spark. The table below shows what is available now.
|Branch|Apache Spark Version|
Please note that for the most part, new features will mainly be added to the
Generate the source, binary, and pip distributables via make release. Copy the contents to the subversion repository https://dist.apache.org/repos/dist/dev/incubator/toree as a new release candidate for the specified version (e.g. 0.1.0).
Publish staging jars to be available on Apache via GPG_PASSWORD=... make publish-jars. From there, you need to close the open repo to promote it to staging. Closing is done via the UI here: https://repository.apache.org/#stagingRepositories
Create a vote thread similar to
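The jar publishing step described above depends on GPG_PASSWORD being present in the environment. A small guard like the following can stop the run early when it is missing. This is a sketch under that assumption only: the wrapper name publish_jars is hypothetical and is not a target or script shipped with the project.

```shell
# Hypothetical wrapper: refuse to start publishing without a GPG password.
publish_jars() {
  if [ -z "${GPG_PASSWORD:-}" ]; then
    echo "GPG_PASSWORD is not set; aborting before make publish-jars" >&2
    return 1
  fi
  # Pass the password through the environment, as in the command quoted above.
  GPG_PASSWORD="$GPG_PASSWORD" make publish-jars
}

# Usage:
#   GPG_PASSWORD=... publish_jars
```

The guard checks only that the variable is non-empty; it does not verify that the password unlocks the signing key.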