---
title: Installing PredictionIO from Source Code
---
## Building
Run the following to download and build PredictionIO from its source code.
```
$ git clone https://github.com/PredictionIO/PredictionIO.git
$ cd PredictionIO
$ git checkout master
$ ./make-distribution.sh
```
When the build finishes successfully, you should see output like the following:
```
...
PredictionIO-<%= data.versions.pio %>/sbt/sbt
PredictionIO-<%= data.versions.pio %>/conf/
PredictionIO-<%= data.versions.pio %>/conf/pio-env.sh
PredictionIO binary distribution created at PredictionIO-<%= data.versions.pio %>.tar.gz
```
## Installing Dependencies
### Spark Setup
Apache Spark is the default processing engine for PredictionIO. Download the
[Apache Spark 1.2.0 package pre-built for Hadoop 2.4](http://spark.apache.org/downloads.html).
```
$ wget http://d3kbcqa49mib13.cloudfront.net/<%= data.versions.spark_download_filename %>.tgz
$ tar zxvf <%= data.versions.spark_download_filename %>.tgz
```
Copy the configuration template `conf/pio-env.sh.template` to `conf/pio-env.sh`
in your PredictionIO installation directory. After that, edit `conf/pio-env.sh`
and point `SPARK_HOME` to the location where you extracted Apache Spark.
```
SPARK_HOME=/home/abc/Downloads/<%= data.versions.spark_download_filename %>
```
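A quick sanity check can catch a mistyped `SPARK_HOME` before the first run. The sketch below assumes only that an extracted Spark package contains an executable `bin/spark-submit`; the function name `check_spark_home` is hypothetical and not part of PredictionIO.

```shell
# Hypothetical helper (not part of PredictionIO): succeed only if the given
# directory contains an executable bin/spark-submit, as an extracted Spark
# package does.
check_spark_home() {
  if [ -x "$1/bin/spark-submit" ]; then
    echo "SPARK_HOME looks valid: $1"
  else
    echo "bin/spark-submit not found under: $1" >&2
    return 1
  fi
}
```

One way to use it, assuming `conf/pio-env.sh` can be sourced on its own, is `. conf/pio-env.sh && check_spark_home "$SPARK_HOME"` from the PredictionIO installation directory.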
### Storage Setup
#### Elasticsearch Setup
By default, PredictionIO uses Elasticsearch at localhost as the data store for
its metadata. Download and extract
[Elasticsearch](http://www.elasticsearch.org/) as follows:
```
$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/<%= data.versions.elasticsearch_download_filename %>.tar.gz
$ tar zxvf <%= data.versions.elasticsearch_download_filename %>.tar.gz
$ cd <%= data.versions.elasticsearch_download_filename %>
```
If you are using a shared network, change the `network.host` line in
`config/elasticsearch.yml` to `network.host: 127.0.0.1`. By default,
Elasticsearch looks for other machines on the network at startup, and you may
run into unexpected errors if other machines on the network are also running
Elasticsearch.
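If you prefer to script the `network.host` change rather than edit the file by hand, something like the following may work. It is a minimal sketch: `set_network_host` is a hypothetical helper, and it assumes the setting appears at most as a single-line `network.host:` entry (active or commented out) in the YAML file.

```shell
# Hypothetical helper: rewrite an elasticsearch.yml so that it contains exactly
# one active network.host line, bound to the loopback address.
set_network_host() {
  config_file=$1
  tmp_file=$(mktemp)
  # Drop any existing network.host line (commented out or not). grep exits
  # non-zero when it prints nothing, hence the "|| true".
  grep -v -E '^[[:space:]]*#?[[:space:]]*network\.host:' "$config_file" > "$tmp_file" || true
  echo 'network.host: 127.0.0.1' >> "$tmp_file"
  # Write back in place so the original file keeps its permissions.
  cat "$tmp_file" > "$config_file"
  rm -f "$tmp_file"
}
```

From the extracted Elasticsearch directory it would be called as `set_network_host config/elasticsearch.yml`. Keeping a backup copy of the file first is a sensible precaution.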
If you are not using the default setting at localhost, change the following in `conf/pio-env.sh` to fit your setup.
WARNING: Make sure to set `PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME` if you plan
to use `bin/pio-start-all`. You can leave it unset only if you do not use that
script, e.g. because you have a custom HBase setup.
```
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/path/to/elasticsearch
```
#### <a name="hbase"></a>HBase Setup
By default, PredictionIO's Data API uses [HBase](http://hbase.apache.org/) at
localhost as the data store for event data. Download and extract it as follows:
```
$ wget https://archive.apache.org/dist/hbase/<%= data.versions.hbase_basename %>/<%= data.versions.hbase_basename %>-<%= data.versions.hbase_variant %>.tar.gz
$ tar zxvf <%= data.versions.hbase_basename %>-<%= data.versions.hbase_variant %>.tar.gz
$ cd <%= data.versions.hbase_basename %>-<%= data.versions.hbase_dir_suffix %>
```
You will need to add at least a minimal configuration to HBase to start it in
standalone mode. Details can be found in the
[HBase quick start guide](http://hbase.apache.org/book/quickstart.html). A
sample minimal configuration is shown below.
> For production deployment, run a fully distributed HBase configuration.
Edit `/path/to/hbase/conf/hbase-site.xml` and configure the local **data**
directories for HBase and ZooKeeper. These directories should be empty the first
time you launch PredictionIO.
```
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///path/to/hbase-data-dir</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/path/to/zookeeper-data-dir</value>
</property>
</configuration>
```
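Because the data directories should be empty on first launch, it can help to create and check them before starting HBase. The following is a minimal sketch; `prepare_empty_dir` is a hypothetical helper, and the paths you pass to it should match the values in your `hbase-site.xml`.

```shell
# Hypothetical helper: create a data directory if it does not exist yet, and
# fail if it already contains any files (including hidden ones).
prepare_empty_dir() {
  mkdir -p "$1" || return 1
  if [ -n "$(ls -A "$1")" ]; then
    echo "not empty: $1" >&2
    return 1
  fi
  echo "ready: $1"
}
```

For the sample configuration above, that would be `prepare_empty_dir /path/to/hbase-data-dir && prepare_empty_dir /path/to/zookeeper-data-dir`.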
Edit `/path/to/hbase/conf/hbase-env.sh` to set `JAVA_HOME` for the cluster. For
Mac users, it would be:
```
export JAVA_HOME=`/usr/libexec/java_home -v 1.7`
```
Navigate back to your PredictionIO installation directory and edit
`conf/pio-env.sh` to add the paths to HBase's root and config directories:
```
HBASE_CONF_DIR=/path/to/hbase/conf
...
PIO_STORAGE_SOURCES_HBASE_HOME=/path/to/hbase
```
<%= partial 'shared/install/dependent_services' %>
Now you have installed everything you need!
#### [Next: Recommendation Engine Quick Start](/templates/recommendation/quickstart/)