{% include JB/setup %}
Apache Zeppelin has a pluggable notebook storage mechanism, controlled by the zeppelin.notebook.storage configuration option, with multiple implementations. Several notebook storage systems are available out of the box:
GitNotebookRepo
VFSNotebookRepo
FileSystemNotebookRepo
S3NotebookRepo
AzureNotebookRepo
GCSNotebookRepo
OSSNotebookRepo
MongoNotebookRepo
GitHubNotebookRepo
Multiple storage systems can be used at the same time by providing a comma-separated list of class names in the configuration. By default, only the first two of them are automatically kept in sync by Zeppelin.
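As a sketch of the comma-separated form, the same list can also be supplied through the ZEPPELIN_NOTEBOOK_STORAGE environment variable in zeppelin-env.sh, the variable used elsewhere on this page. The pairing of Git and S3 below is only an illustrative choice.

```shell
# zeppelin-env.sh (illustrative): two storage layers, comma-separated.
# The pairing of Git and S3 is an example choice, not a recommendation.
export ZEPPELIN_NOTEBOOK_STORAGE="org.apache.zeppelin.notebook.repo.GitNotebookRepo,org.apache.zeppelin.notebook.repo.S3NotebookRepo"

# Print each configured class on its own line to double-check the list.
printf '%s\n' "$ZEPPELIN_NOTEBOOK_STORAGE" | tr ',' '\n'
```

Listing the classes one per line makes a missing package prefix or stray space easy to spot before restarting Zeppelin.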
To enable versioning for all your local notebooks through a standard Git repository, uncomment the following property in zeppelin-site.xml to use the GitNotebookRepo class:
<property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.GitNotebookRepo</value> <description>notebook persistence layer implementation</description> </property>
Notes may be stored in a Hadoop-compatible file system such as HDFS, so that multiple Zeppelin instances can share the same notes. All Hadoop 2.x versions are supported. If you use FileSystemNotebookRepo, then zeppelin.notebook.dir is the path on the Hadoop-compatible file system, and you need to specify HADOOP_CONF_DIR in zeppelin-env.sh so that Zeppelin can find the right Hadoop configuration files. If your Hadoop cluster is kerberized, you also need to specify zeppelin.server.kerberos.keytab and zeppelin.server.kerberos.principal.
<property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.FileSystemNotebookRepo</value> <description>hadoop compatible file system notebook persistence layer implementation</description> </property>
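The environment side of this setup might look like the sketch below. The paths are hypothetical, and ZEPPELIN_NOTEBOOK_DIR is assumed here to be the environment counterpart of zeppelin.notebook.dir; check both against your own installation.

```shell
# zeppelin-env.sh (illustrative values; adjust paths for your cluster)

# Directory that holds the Hadoop client configuration (hypothetical path).
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Select the Hadoop-backed storage layer.
export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.FileSystemNotebookRepo

# Path on the Hadoop-compatible file system (hypothetical);
# assumed to correspond to the zeppelin.notebook.dir property.
export ZEPPELIN_NOTEBOOK_DIR=/user/zeppelin/notebook

# Warn early if the Hadoop configuration directory is missing.
if [ ! -d "$HADOOP_CONF_DIR" ]; then
  echo "warning: HADOOP_CONF_DIR=$HADOOP_CONF_DIR does not exist" >&2
fi
```

The directory check is only a convenience: it surfaces a wrong HADOOP_CONF_DIR at startup rather than later, when Zeppelin fails to resolve the file system.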
Notebooks may be stored in S3, and optionally encrypted. The DefaultAWSCredentialsProviderChain credentials provider is used for credentials and checks the following:
The AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables
The aws.accessKeyId and aws.secretKey Java System properties
The credential profiles file at the default location (~/.aws/credentials) used by the AWS CLI
Notebooks are stored under the following folder structure in S3:
s3://bucket_name/username/notebook-id/
Configure by setting environment variables in the file zeppelin-env.sh:
export ZEPPELIN_NOTEBOOK_S3_BUCKET=bucket_name
export ZEPPELIN_NOTEBOOK_S3_USER=username
Or, in the file zeppelin-site.xml, uncomment and complete the S3 settings:
<property> <name>zeppelin.notebook.s3.bucket</name> <value>bucket_name</value> <description>bucket name for notebook storage</description> </property> <property> <name>zeppelin.notebook.s3.user</name> <value>username</value> <description>user name for s3 folder structure</description> </property>
Uncomment the following property to use the S3NotebookRepo class:
<property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.S3NotebookRepo</value> <description>notebook persistence layer implementation</description> </property>
Comment out the next property to disable local git notebook storage (the default):
<property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.GitNotebookRepo</value> <description>versioned notebook persistence layer implementation</description> </property>
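For setups configured entirely through zeppelin-env.sh, the storage class can be selected with ZEPPELIN_NOTEBOOK_STORAGE alongside the bucket and user variables. The sketch below also includes a hypothetical helper, note_prefix, which is not part of Zeppelin and only prints where a note would be expected to live under the s3://bucket_name/username/notebook-id/ layout given on this page. The exact layout may differ between Zeppelin versions.

```shell
# zeppelin-env.sh (illustrative): select S3 storage and name the bucket/user.
export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.S3NotebookRepo
export ZEPPELIN_NOTEBOOK_S3_BUCKET=bucket_name
export ZEPPELIN_NOTEBOOK_S3_USER=username

# Hypothetical helper (not part of Zeppelin): print the S3 prefix for a note ID,
# following the s3://bucket_name/username/notebook-id/ layout from this page.
note_prefix() {
  printf 's3://%s/%s/%s/\n' "$ZEPPELIN_NOTEBOOK_S3_BUCKET" "$ZEPPELIN_NOTEBOOK_S3_USER" "$1"
}

note_prefix notebook-id
```

Printing the prefix is a quick way to confirm that the bucket and user values are the ones you intend before any notes are written.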
To use an AWS KMS encryption key to encrypt notebooks, set the following environment variable in the file zeppelin-env.sh:
export ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID=kms-key-id
Or use the following setting in zeppelin-site.xml:
<property> <name>zeppelin.notebook.s3.kmsKeyID</name> <value>AWS-KMS-Key-UUID</value> <description>AWS KMS key ID used to encrypt notebook data in S3</description> </property>
To set a custom KMS key region, set the following environment variable in the file zeppelin-env.sh:
export ZEPPELIN_NOTEBOOK_S3_KMS_KEY_REGION=kms-key-region
Or use the following setting in zeppelin-site.xml:
<property> <name>zeppelin.notebook.s3.kmsKeyRegion</name> <value>target-region</value> <description>AWS KMS key region in your AWS account</description> </property>
The format of target-region is described in more detail in the AWS documentation on regions, in the second Region column (e.g. us-east-1).
You may use a custom EncryptionMaterialsProvider class as long as it is available on the classpath and able to initialize itself from system properties or another mechanism. To use this, set the following environment variable in the file zeppelin-env.sh:
export ZEPPELIN_NOTEBOOK_S3_EMP=class-name
Or use the following setting in zeppelin-site.xml:
<property> <name>zeppelin.notebook.s3.encryptionMaterialsProvider</name> <value>provider implementation class name</value> <description>Custom encryption materials provider used to encrypt notebook data in S3</description> </property>
To request server-side encryption of notebooks, set the following environment variable in the file zeppelin-env.sh:
export ZEPPELIN_NOTEBOOK_S3_SSE=true
Or use the following setting in zeppelin-site.xml:
<property> <name>zeppelin.notebook.s3.sse</name> <value>true</value> <description>Server-side encryption enabled for notebooks</description> </property>
Using AzureNotebookRepo, you can connect Zeppelin with your Azure account for notebook storage.
First, enter your AccountName, AccountKey, and Share Name in the file zeppelin-site.xml by uncommenting and completing the following properties:
<property> <name>zeppelin.notebook.azure.connectionString</name> <value>DefaultEndpointsProtocol=https;AccountName=&lt;accountName&gt;;AccountKey=&lt;accountKey&gt;</value> <description>Azure account credentials</description> </property> <property> <name>zeppelin.notebook.azure.share</name> <value>zeppelin</value> <description>share name for notebook storage</description> </property>
Secondly, you can initialize the AzureNotebookRepo class in the file zeppelin-site.xml by commenting out the following property:
<property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.GitNotebookRepo</value> <description>versioned notebook persistence layer implementation</description> </property>
and uncommenting:
<property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.AzureNotebookRepo</value> <description>notebook persistence layer implementation</description> </property>
If you want to use your local Git storage and Azure storage simultaneously, use the following property instead:
<property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.GitNotebookRepo, org.apache.zeppelin.notebook.repo.AzureNotebookRepo</value> <description>notebook persistence layer implementation</description> </property>
Optionally, you can specify the Azure folder structure name in the file zeppelin-site.xml by uncommenting the following property:
<property> <name>zeppelin.notebook.azure.user</name> <value>user</value> <description>optional user name for Azure folder structure</description> </property>
Using GCSNotebookRepo, you can connect Zeppelin with Google Cloud Storage using Application Default Credentials.
First, choose a GCS path under which to store notebooks.
<property> <name>zeppelin.notebook.gcs.dir</name> <value></value> <description> A GCS path in the form gs://bucketname/path/to/dir. Notes are stored at {zeppelin.notebook.gcs.dir}/{notebook-id}/note.json </description> </property>
Then, initialize the GCSNotebookRepo class in the file zeppelin-site.xml by commenting out the following property:
<property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.GitNotebookRepo</value> <description>versioned notebook persistence layer implementation</description> </property>
and uncommenting:
<property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.GCSNotebookRepo</value> <description>notebook persistence layer implementation</description> </property>
Or, if you want to simultaneously use your local git storage with GCS, use the following property instead:
<property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.GitNotebookRepo,org.apache.zeppelin.notebook.repo.GCSNotebookRepo</value> <description>notebook persistence layer implementation</description> </property>
Note: On Google App Engine, Google Cloud Shell, and Google Compute Engine, these steps are not necessary if you are using the default built-in service account.
For more information, see Application Default Credentials
See the gcloud docs
As the user running the zeppelin daemon, run:
gcloud auth application-default login
You can also use --scopes to restrict access to specific Google APIs, such as Cloud Storage and BigQuery.
Alternatively, to use a service account for authentication with GCS, you will need a JSON service account key file.
Click CREATE SERVICE ACCOUNT.
Select at least Storage -> Storage Object Admin. Note that this is different from Storage Admin.
If you also need BigQuery access, add the appropriate permissions (e.g. Bigquery -> Bigquery Data Viewer and BigQuery User).
Choose to download a new private key as a .json file. Click “Create”.
Move the downloaded file to a location of your choice (e.g. /path/to/my/key.json), and give it appropriate permissions. Ensure at least the user running the zeppelin daemon can read it.
If you wish to set this as your default credential file to access Google Services, point GOOGLE_APPLICATION_CREDENTIALS at your new key file in zeppelin-env.sh. For example:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/my/key.json
If you do not want to use this key file as the default credential file and instead want to specify a custom key file for authentication with GCS, update the following property:
<property> <name>zeppelin.notebook.google.credentialsJsonFilePath</name> <value>path/to/key.json</value> <description> Path to GCS credential key file for authentication with Google Storage. </description> </property>
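Whichever way the key file is referenced, it has to be readable by the user running the Zeppelin daemon. A minimal sketch of that check follows; check_key_file is a hypothetical helper, not a Zeppelin or gcloud command, and the path is the placeholder used above.

```shell
# Hypothetical helper: succeed only if the key file exists and is readable
# by the current user (run it as the user that starts the Zeppelin daemon).
check_key_file() {
  if [ -r "$1" ]; then
    echo "ok: $1 is readable"
  else
    echo "error: $1 is missing or not readable" >&2
    return 1
  fi
}

# Placeholder path taken from the text above; replace with your own.
KEY_FILE=/path/to/my/key.json
if check_key_file "$KEY_FILE"; then
  export GOOGLE_APPLICATION_CREDENTIALS="$KEY_FILE"
fi
```

Exporting the variable only after the check succeeds avoids pointing Zeppelin at a file it cannot open.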
Notebooks may be stored in Aliyun OSS, under the following folder structure:
oss://bucket_name/{notebook_dir}/note_path
Configure the OSS-related properties in the file zeppelin-site.xml:
<property> <name>zeppelin.notebook.oss.bucket</name> <value>zeppelin</value> <description>bucket name for notebook storage</description> </property> <property> <name>zeppelin.notebook.oss.endpoint</name> <value>http://oss-cn-hangzhou.aliyuncs.com</value> <description>endpoint for oss bucket</description> </property> <property> <name>zeppelin.notebook.oss.accesskeyid</name> <value></value> <description>Access key id for your OSS account</description> </property> <property> <name>zeppelin.notebook.oss.accesskeysecret</name> <value></value> <description>Access key secret for your OSS account</description> </property>
Uncomment the following property to use the OSSNotebookRepo class:
<property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.OSSNotebookRepo</value> <description>notebook persistence layer implementation</description> </property>
The ZeppelinHub storage layer allows an out-of-the-box connection between a Zeppelin instance and your ZeppelinHub account. First, either uncomment the following property in zeppelin-site.xml:
<!-- For connecting your Zeppelin with ZeppelinHub --> <!-- <property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.GitNotebookRepo, org.apache.zeppelin.notebook.repo.zeppelinhub.ZeppelinHubRepo</value> <description>two notebook persistence layers (local + ZeppelinHub)</description> </property> -->
or set the environment variable in the file zeppelin-env.sh:
export ZEPPELIN_NOTEBOOK_STORAGE="org.apache.zeppelin.notebook.repo.GitNotebookRepo, org.apache.zeppelin.notebook.repo.zeppelinhub.ZeppelinHubRepo"
Secondly, you need to set the environment variables in the file zeppelin-env.sh:
export ZEPPELINHUB_API_TOKEN="<ZeppelinHub token>"
export ZEPPELINHUB_API_ADDRESS="<address of ZeppelinHub service>"   # e.g. https://www.zeppelinhub.com
You can get more information on generating a token and using authentication on the corresponding help page.
Using MongoNotebookRepo, you can store your notebooks in MongoDB.
You can use MongoDB as notebook storage by editing zeppelin-env.sh or zeppelin-site.xml.
zeppelin-env.sh
Add the following line to $ZEPPELIN_HOME/conf/zeppelin-env.sh:
export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.MongoNotebookRepo
NOTE: The default MongoDB connection URI is mongodb://localhost
zeppelin-site.xml
Or, uncomment the following lines in $ZEPPELIN_HOME/conf/zeppelin-site.xml:
<property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.MongoNotebookRepo</value> <description>notebook persistence layer implementation</description> </property>
And comment out the following lines:
<property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.GitNotebookRepo</value> <description>versioned notebook persistence layer implementation</description> </property>
You can configure the following options in zeppelin-env.sh.
ZEPPELIN_NOTEBOOK_MONGO_URI: MongoDB connection URI used to connect to a MongoDB database server
ZEPPELIN_NOTEBOOK_MONGO_DATABASE: Database name
ZEPPELIN_NOTEBOOK_MONGO_COLLECTION: Collection name
ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT: If true, import local notes (refer to the description below for details)
Or, you can configure them in zeppelin-site.xml. The corresponding option names are as follows:
zeppelin.notebook.mongo.uri
zeppelin.notebook.mongo.database
zeppelin.notebook.mongo.collection
zeppelin.notebook.mongo.autoimport
zeppelin-env.sh
export ZEPPELIN_NOTEBOOK_MONGO_URI=mongodb://db1.example.com:27017
export ZEPPELIN_NOTEBOOK_MONGO_DATABASE=myfancy
export ZEPPELIN_NOTEBOOK_MONGO_COLLECTION=notebook
export ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT=true
By setting ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT to true (default false), you can import your local notes automatically when the Zeppelin daemon starts up. This feature is intended for easy migration from local file system storage to MongoDB storage. A note whose ID already exists in the collection will not be imported.
To enable GitHub tracking, uncomment the following properties in zeppelin-site.xml:
<property> <name>zeppelin.notebook.git.remote.url</name> <value></value> <description>remote Git repository URL</description> </property> <property> <name>zeppelin.notebook.git.remote.username</name> <value>token</value> <description>remote Git repository username</description> </property> <property> <name>zeppelin.notebook.git.remote.access-token</name> <value></value> <description>remote Git repository password</description> </property> <property> <name>zeppelin.notebook.git.remote.origin</name> <value>origin</value> <description>Git repository remote</description> </property>
And set the zeppelin.notebook.storage property to org.apache.zeppelin.notebook.repo.GitHubNotebookRepo:
<property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.GitHubNotebookRepo</value> </property>
The access token can be obtained by following the steps at https://github.com/settings/tokens.