---
title: Using Another Data Store
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
PredictionIO has a thin storage layer that abstracts access to meta data, event
data, and model data. The layer defines a set of standard interfaces to support
multiple data store backends. PredictionIO users can configure the backend of
their choice through configuration files or environment variables. Engine
developers need not worry about the actual underlying storage architecture, and
advanced developers can implement their own backend driver as an external library.
## Concepts
In this section, we will visit some storage layer concepts that are common to
users, engine developers, and advanced developers:
- **Repository** is the highest level of data access abstraction, through which
all engines and PredictionIO itself access data.
- **Source** is the actual data store backend that provides data access. A source
is an implementation of the set of data access interfaces defined by *repositories*.
Each of these is explained in detail below.
### Repositories
*Repository* is the highest level of data access abstraction, through which all
engines and PredictionIO itself access data.
The storage layer currently defines three mandatory data repositories: *meta
data*, *event data*, and *model data*. Each repository has its own set of data
access interfaces.
- **Meta data** is used by PredictionIO to store engine training and evaluation
information. Commands like `pio build`, `pio train`, `pio deploy`, and `pio
eval` all access meta data.
- **Event data** is used by the Event Server to collect events, and by engines to
source data.
- **Model data** is used by PredictionIO for automatic persistence of trained
models.
The following configuration variables are used to configure these repositories:
- *Meta data* is configured by the `PIO_STORAGE_REPOSITORIES_METADATA_XXX` variables.
- *Event data* is configured by the `PIO_STORAGE_REPOSITORIES_EVENTDATA_XXX` variables.
- *Model data* is configured by the `PIO_STORAGE_REPOSITORIES_MODELDATA_XXX` variables.
Configuration variables are explained in more detail in later sections (see
Data Store Configuration below).
For example, you may see the following configuration variables defined in `conf/pio-env.sh`:
```shell
PIO_STORAGE_REPOSITORIES_METADATA_NAME=predictionio_metadata
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=predictionio_eventdata
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
```
The configuration variable with the *NAME* suffix controls the namespace used by
the *source*.
The configuration variable with the *SOURCE* suffix points to the actual
*source* that backs this repository. Sources are explained below.
### Sources
*Sources* are actual data store backends that provide data access. A source is an
implementation of the set of data access interfaces defined by *repositories*.
PredictionIO comes with the following sources:
- **JDBC** (tested on MySQL and PostgreSQL):
  * Type name is **jdbc**.
  * Can be used for the *Meta Data*, *Event Data*, and *Model Data* repositories.
- **Elasticsearch**:
  * Type name is **elasticsearch**.
  * Can be used for the *Meta Data* repository.
- **Apache HBase**:
  * Type name is **hbase**.
  * Can be used for the *Event Data* repository.
- **Local file system**:
  * Type name is **localfs**.
  * Can be used for the *Model Data* repository.
- **HDFS**:
  * Type name is **hdfs**.
  * Can be used for the *Model Data* repository.
- **S3**:
  * Type name is **s3**.
  * Can be used for the *Model Data* repository.
Each repository can be configured to use a different source as shown above.
Each source has its own set of configuration parameters, which are explained in
more detail in later sections (see Data Store Configuration below).
The following is an example configuration of a source named "PGSQL" with type `jdbc`:
```shell
PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql:predictionio
PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio
```
The following is an example of using this source "PGSQL" for the *meta data* repository:
```shell
PIO_STORAGE_REPOSITORIES_METADATA_NAME=predictionio_metadata
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL
```
## Data Store Configuration
Data store configuration is done by setting environment variables. If you set
them inside `conf/pio-env.sh`, they will be automatically available whenever you
run a `pio` command, e.g. `pio train`.
Notice that all variables are prefixed by `PIO_STORAGE_`.
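Because these are plain environment variables, they can also be exported
directly in the shell for a one-off run instead of editing `conf/pio-env.sh`.
A minimal sketch, assuming a source named `PGSQL` has already been configured:

```shell
# Hypothetical one-off override: export the variables in the current shell
# before invoking pio; conf/pio-env.sh achieves the same effect persistently.
export PIO_STORAGE_REPOSITORIES_METADATA_NAME=predictionio_metadata
export PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL
pio train
```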
### Repositories Configuration
Variable Format: `PIO_STORAGE_REPOSITORIES_<REPO>_<KEY>`
Configuration variables of repositories are prefixed by
`PIO_STORAGE_REPOSITORIES_`, followed by the repository name (e.g. `METADATA`),
and then either `NAME` or `SOURCE`.
Consider the following example:
```shell
PIO_STORAGE_REPOSITORIES_METADATA_NAME=predictionio_metadata
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL
```
The above configures PredictionIO to look for a source configured with the name
`PGSQL`, and to use `predictionio_metadata` as the namespace within that source.
There is no restriction on how a source uses the namespace, so behavior may vary
between sources. As an example, the official JDBC source uses the namespace as a
database table prefix.
### Sources Configuration
Variable Format: `PIO_STORAGE_SOURCES_<NAME>_<KEY>`
Configuration variables of sources are prefixed by
`PIO_STORAGE_SOURCES_`, followed by the source name of choice (e.g. `PGSQL`,
`MYSQL`, `HBASE`, etc), and a configuration `KEY`.
INFO: The `TYPE` configuration key is mandatory. It is used by PredictionIO to
determine the actual driver type to load.
Depending on what the source `TYPE` is, different configuration keys are
required.
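For instance, a source named `ES` of type `elasticsearch` could be declared as
in the following sketch (the name `ES` is arbitrary; the type-specific keys are
covered under Elasticsearch Configuration below):

```shell
# The mandatory TYPE key selects the driver; the remaining keys depend on it.
PIO_STORAGE_SOURCES_ES_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ES_HOSTS=localhost
PIO_STORAGE_SOURCES_ES_PORTS=9200
```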
#### JDBC Configuration
Variable Format: `PIO_STORAGE_SOURCES_<NAME>_TYPE=jdbc`
Supported Repositories: **meta**, **event**, **model**
Tested on: MySQL 5.1+, PostgreSQL 9.1+
When `TYPE` is set to `jdbc`, the following configuration keys are supported.
- URL (mandatory)
The value must be a valid JDBC URL that points to a database, e.g.
`PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql:predictionio`
- USERNAME (mandatory)
The value must be a valid, non-empty username for the JDBC connection, e.g.
`PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio_user`
- PASSWORD (mandatory)
The value must be a valid, non-empty password for the JDBC connection, e.g.
`PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio_user_password`
- PARTITIONS (optional, defaults to 4)
This value is used by Apache Spark to determine the number of partitions to
use when it reads from the JDBC connection, e.g.
`PIO_STORAGE_SOURCES_PGSQL_PARTITIONS=4`
- CONNECTIONS (optional, defaults to 8)
This value is used by the scalikejdbc library to determine the maximum size of
the connection pool, e.g.
`PIO_STORAGE_SOURCES_PGSQL_CONNECTIONS=8`
- INDEX (optional since v0.9.6, defaults to disabled)
Setting this to `enabled` creates indexes on the `entityId` and `entityType`
columns to improve performance when the `findByEntity` function is called. Note
that the `entityId` and `entityType` columns will be created as `varchar(255)`,
e.g.
`PIO_STORAGE_SOURCES_PGSQL_INDEX=enabled`
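Putting these keys together, a JDBC setup that backs all three repositories with
a single source might look like the following sketch (the source name `PGSQL`,
the database name, the credentials, and the table prefixes are placeholders):

```shell
# Hypothetical PostgreSQL-backed setup serving all three repositories.
PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost:5432/predictionio
PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio
PIO_STORAGE_SOURCES_PGSQL_PARTITIONS=4
PIO_STORAGE_SOURCES_PGSQL_CONNECTIONS=8

PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL
```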
#### Apache HBase Configuration
Variable Format: `PIO_STORAGE_SOURCES_<NAME>_TYPE=hbase`
Supported Repositories: **event**
Tested on: Apache HBase 0.98.5+, 1.0.0+
When `TYPE` is set to `hbase`, no other configuration keys are required. Other
client-side HBase configuration must be done through the `hbase-site.xml` file
pointed to by the `HBASE_CONF_DIR` configuration variable.
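A minimal event store setup might therefore look like this sketch (the source
name `HBASE` and the configuration path are placeholders):

```shell
# Hypothetical HBase-backed event store; client settings live in
# $HBASE_CONF_DIR/hbase-site.xml rather than in PIO_STORAGE_* variables.
export HBASE_CONF_DIR=/opt/hbase/conf

PIO_STORAGE_SOURCES_HBASE_TYPE=hbase

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=predictionio_eventdata
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
```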
#### Elasticsearch Configuration
Variable Format: `PIO_STORAGE_SOURCES_<NAME>_TYPE=elasticsearch`
Supported Repositories: **meta**
When `TYPE` is set to `elasticsearch`, the following configuration keys are
supported.
- HOSTS (mandatory)
Comma-separated list of hostnames, e.g.
`PIO_STORAGE_SOURCES_ES_HOSTS=es1,es2,es3`
- PORTS (mandatory)
Comma-separated list of ports that correspond to `HOSTS`, e.g.
`PIO_STORAGE_SOURCES_ES_PORTS=9200,9200,9222`
- CLUSTERNAME (optional, defaults to `elasticsearch`)
Elasticsearch cluster name, e.g.
`PIO_STORAGE_SOURCES_ES_CLUSTERNAME=myescluster`
INFO: Other advanced Elasticsearch parameters can be set by pointing the
`ES_CONF_DIR` configuration variable to the location of `elasticsearch.yml`.
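Combining these keys, a meta data store backed by a three-node cluster could be
configured as in this sketch (host names, ports, and the source name
`ELASTICSEARCH` are placeholders):

```shell
# Hypothetical three-node Elasticsearch cluster backing the meta data repository.
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=es1,es2,es3
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,9200,9222
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=myescluster

PIO_STORAGE_REPOSITORIES_METADATA_NAME=predictionio_metadata
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
```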
#### Local File System Configuration
Variable Format: `PIO_STORAGE_SOURCES_<NAME>_TYPE=localfs`
Supported Repositories: **model**
When `TYPE` is set to `localfs`, the following configuration keys are
supported.
- PATH (mandatory)
File system path where models are stored, e.g.
`PIO_STORAGE_SOURCES_FS_PATH=/mymodels`
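Wired to the *model data* repository, a local file system model store might look
like this sketch (the source name `LOCALFS` and the path are placeholders):

```shell
# Hypothetical local file system model store.
PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=$HOME/.pio_store/models

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
```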
#### HDFS Configuration
Variable Format: `PIO_STORAGE_SOURCES_<NAME>_TYPE=hdfs`
Supported Repositories: **model**
When `TYPE` is set to `hdfs`, the following configuration keys are
supported.
- PATH (mandatory)
HDFS path where models are stored, e.g.
`PIO_STORAGE_SOURCES_HDFS_PATH=/mymodels`
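The wiring follows the same pattern as the other model stores (the source name
`HDFS` is a placeholder):

```shell
# Hypothetical HDFS-backed model store.
PIO_STORAGE_SOURCES_HDFS_TYPE=hdfs
PIO_STORAGE_SOURCES_HDFS_PATH=/mymodels

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS
```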
#### S3 Configuration
Variable Format: `PIO_STORAGE_SOURCES_<NAME>_TYPE=s3`
Supported Repositories: **model**
To provide authentication information, you can set the `AWS_ACCESS_KEY_ID`
and `AWS_SECRET_ACCESS_KEY` environment variables or use one of the other
methods described in the [AWS Setup Docs](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html#config-settings-and-precedence).
When `TYPE` is set to `s3`, the following configuration keys are
supported.
- REGION (mandatory)
AWS Region to use, e.g.
`PIO_STORAGE_SOURCES_S3_REGION=us-east-1`
- BUCKET_NAME (mandatory)
S3 Bucket where models are stored, e.g.
`PIO_STORAGE_SOURCES_S3_BUCKET_NAME=pio_bucket`
- BASE_PATH (optional)
S3 base path where models are stored, e.g.
`PIO_STORAGE_SOURCES_S3_BASE_PATH=pio_model`
- DISABLE_CHUNKED_ENCODING (optional)
Disable the use of Chunked Encoding when transferring files to/from S3, e.g.
`PIO_STORAGE_SOURCES_S3_DISABLE_CHUNKED_ENCODING=true`
- ENDPOINT (optional)
S3 Endpoint to use, e.g.
`PIO_STORAGE_SOURCES_S3_ENDPOINT=http://localstack:4572`
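Putting the keys together, a model store against a local S3-compatible endpoint
might look like this sketch (the credentials, bucket, and LocalStack endpoint
are placeholders):

```shell
# Hypothetical S3-backed model store pointed at a LocalStack endpoint.
export AWS_ACCESS_KEY_ID=placeholder
export AWS_SECRET_ACCESS_KEY=placeholder

PIO_STORAGE_SOURCES_S3_TYPE=s3
PIO_STORAGE_SOURCES_S3_REGION=us-east-1
PIO_STORAGE_SOURCES_S3_BUCKET_NAME=pio_bucket
PIO_STORAGE_SOURCES_S3_BASE_PATH=pio_model
PIO_STORAGE_SOURCES_S3_ENDPOINT=http://localstack:4572
PIO_STORAGE_SOURCES_S3_DISABLE_CHUNKED_ENCODING=true

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=S3
```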
## Adding Support for Other Backends
It is quite straightforward to implement support for other backends. A good
starting point is the JDBC implementation inside the
[org.apache.predictionio.data.storage.jdbc
package](https://github.com/apache/predictionio/tree/develop/data/src/main/scala/org/apache/predictionio/data/storage/jdbc).
Contributions of backend implementations are highly encouraged. To start
contributing, please refer to [this guide](/community/contribute-code/).
### Deploying Your Custom Backend Support as a Plugin
It is possible to deploy your custom backend implementation as a standalone JAR
apart from the main PredictionIO binary distribution. The following is an
outline of how this can be achieved.
1. Create an SBT project with a library dependency on PredictionIO's data
access base traits (inside the `data` artifact).
2. Implement the traits that you intend to support, and package everything into
an assembly JAR (e.g. with sbt-assembly).
3. Create a directory named `plugins` inside the PredictionIO binary
installation.
4. Copy the JAR from step 2 to `plugins`.
5. In the storage configuration, specify `TYPE` as your complete package name. As
an example, if you have implemented all your traits under the package name
`org.mystorage.jdbc`, use something like
```shell
PIO_STORAGE_SOURCES_MYJDBC_TYPE=org.mystorage.jdbc
...
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=MYJDBC
```
to instruct PredictionIO to pick up `StorageClient` from the appropriate
package.
6. Now you should be able to use your custom source and assign it to different
repositories as you wish.
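As a concrete sketch of steps 2 through 4, assuming `PIO_HOME` points at the
PredictionIO installation (project and JAR names are hypothetical):

```shell
# Build the assembly JAR for the hypothetical custom backend project.
cd my-pio-storage-backend
sbt assembly

# Drop it into the plugins directory of the PredictionIO installation.
mkdir -p $PIO_HOME/plugins
cp target/scala-2.11/my-pio-storage-backend-assembly-0.1.0.jar $PIO_HOME/plugins/
```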