 Installing Apache PredictionIO (incubating) from Source Code

Assuming you are following the directory structure used below, replace /home/abc with your own home directory wherever you see it.

Building

Run the following to download and build Apache PredictionIO (incubating) from its source code.

$ git clone https://github.com/apache/incubator-predictionio.git
$ cd incubator-predictionio
$ git checkout master
$ ./make-distribution.sh

You should see something like the following when it finishes building successfully.

...
PredictionIO-0.9.6/sbt/sbt
PredictionIO-0.9.6/conf/
PredictionIO-0.9.6/conf/pio-env.sh
PredictionIO binary distribution created at PredictionIO-0.9.6.tar.gz

Extract the binary distribution you have just built.

$ tar zxvf PredictionIO-0.9.6.tar.gz

Installing Dependencies

Let us install dependencies inside a subdirectory of the Apache PredictionIO (incubating) installation. By following this convention, you can use Apache PredictionIO (incubating)'s default configuration as is.

$ mkdir PredictionIO-0.9.6/vendors

Spark Setup

Apache Spark is the default processing engine for PredictionIO. Download and extract it.

$ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz
$ tar zxvfC spark-1.5.1-bin-hadoop2.6.tgz PredictionIO-0.9.6/vendors

If you decide to install Apache Spark to another location, you must edit PredictionIO-0.9.6/conf/pio-env.sh and change the SPARK_HOME variable to point to your own Apache Spark installation.
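The edit described above can be scripted. The following is a minimal sketch, assuming pio-env.sh contains a line that begins with SPARK_HOME= and that the new path contains no `|` or `&` characters; set_spark_home is a hypothetical helper name, not something PredictionIO ships.

```shell
#!/bin/sh
# Hypothetical helper (not part of PredictionIO): point SPARK_HOME in a
# pio-env.sh file at a different Spark installation.
set_spark_home() {
    env_file=$1      # path to pio-env.sh
    spark_dir=$2     # directory of your own Spark installation
    tmp_file="${env_file}.tmp"

    # Rewrite the existing SPARK_HOME assignment; all other lines pass through.
    # '|' is used as the sed delimiter because directory paths contain '/'.
    sed "s|^SPARK_HOME=.*|SPARK_HOME=${spark_dir}|" "$env_file" > "$tmp_file" &&
        mv "$tmp_file" "$env_file"
}
```

For example, set_spark_home PredictionIO-0.9.6/conf/pio-env.sh /opt/spark-1.5.1-bin-hadoop2.6 would update the file in place, where /opt/spark-1.5.1-bin-hadoop2.6 stands in for wherever you extracted Spark.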

Storage Setup

PostgreSQL Setup

Setting up PostgreSQL to work with PredictionIO.

Make sure you have PostgreSQL installed. For Mac users, Homebrew is recommended:

$ brew install postgresql

or on Ubuntu:

$ apt-get install postgresql-9.4

Now that PostgreSQL is installed, use the following commands.

$ createdb pio

If you get an error of the form "could not connect to server: No such file or directory", you must first start the server manually:

$ pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start

Finally, use the command:

$ psql -c "create user pio with password 'pio'"

Your configuration in pio-env.sh is now ready to run with PostgreSQL.

HBase and Elasticsearch Setup

Elasticsearch Setup

You may skip this section if you are using PostgreSQL or MySQL.

$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.4.tar.gz
$ tar zxvfC elasticsearch-1.4.4.tar.gz PredictionIO-0.9.6/vendors

If you decide to install Elasticsearch to another location, you must edit PredictionIO-0.9.6/conf/pio-env.sh and change the PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME variable to point to your own Elasticsearch installation.

If you are using a shared network, change the network.host line in PredictionIO-0.9.6/vendors/elasticsearch-1.4.4/config/elasticsearch.yml to network.host: 127.0.0.1. By default, Elasticsearch looks for other machines on the network when it starts, and you may run into unexpected errors if other machines on the network are also running Elasticsearch.
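As a rough sketch of that change, the helper below drops any existing network.host line (commented out or not) and appends the loopback binding. The name set_es_network_host is hypothetical, and the approach assumes that appending the setting at the end of the file is acceptable for your configuration.

```shell
#!/bin/sh
# Hypothetical helper: make sure an elasticsearch.yml binds to 127.0.0.1.
set_es_network_host() {
    yml_file=$1      # path to elasticsearch.yml
    tmp_file="${yml_file}.tmp"

    # Copy every line except an existing network.host entry, whether it is
    # active or commented out. grep exits non-zero when nothing is copied,
    # which is not an error here.
    grep -v '^[#[:space:]]*network\.host:' "$yml_file" > "$tmp_file" || true

    # Append the loopback binding and replace the original file.
    echo 'network.host: 127.0.0.1' >> "$tmp_file"
    mv "$tmp_file" "$yml_file"
}
```

It would be called as set_es_network_host PredictionIO-0.9.6/vendors/elasticsearch-1.4.4/config/elasticsearch.yml before starting Elasticsearch.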

If you are not using the default setting at localhost, you may change the following in PredictionIO-0.9.6/conf/pio-env.sh to fit your setup.

PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300

HBase Setup

You may skip this section if you are using PostgreSQL or MySQL.

HBase is the default event data store for PredictionIO. Download and extract it.

$ wget http://archive.apache.org/dist/hbase/hbase-1.0.0/hbase-1.0.0-bin.tar.gz
$ tar zxvfC hbase-1.0.0-bin.tar.gz PredictionIO-0.9.6/vendors

If you decide to install HBase to another location, you must edit PredictionIO-0.9.6/conf/pio-env.sh and change the PIO_STORAGE_SOURCES_HBASE_HOME variable to point to your own HBase installation.

You will need to at least add a minimal configuration to HBase to start it in standalone mode. Details can be found here. Here, we are showing a sample minimal configuration.

For production deployment, run a fully distributed HBase configuration.

Edit PredictionIO-0.9.6/vendors/hbase-1.0.0/conf/hbase-site.xml.

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/abc/PredictionIO-0.9.6/vendors/hbase-1.0.0/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/abc/PredictionIO-0.9.6/vendors/hbase-1.0.0/zookeeper</value>
  </property>
</configuration>

HBase will create hbase.rootdir automatically to store its data.
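Because the sample configuration hard-codes /home/abc, it can be convenient to generate the file from your actual HBase directory instead. The sketch below prints a standalone configuration to standard output; write_hbase_site is a hypothetical helper name, and hbase.zookeeper.property.dataDir is the standard HBase property for the ZooKeeper data directory.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal standalone hbase-site.xml for the
# HBase installation at the given absolute path.
write_hbase_site() {
    hbase_dir=$1   # e.g. $HOME/PredictionIO-0.9.6/vendors/hbase-1.0.0
    cat <<EOF
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file://${hbase_dir}/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>${hbase_dir}/zookeeper</value>
  </property>
</configuration>
EOF
}
```

Redirect the output into conf/hbase-site.xml under the same HBase directory, for example write_hbase_site "$HOME/PredictionIO-0.9.6/vendors/hbase-1.0.0" > "$HOME/PredictionIO-0.9.6/vendors/hbase-1.0.0/conf/hbase-site.xml".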

Edit PredictionIO-0.9.6/vendors/hbase-1.0.0/conf/hbase-env.sh to set JAVA_HOME for the cluster. For example:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle/jre

For Mac users, use this instead (change 1.8 to 1.7 if you have Java 7 installed):

export JAVA_HOME=`/usr/libexec/java_home -v 1.8`

In addition, you must set your environment variable JAVA_HOME. For example, in /home/abc/.bashrc add the following line:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle
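A misconfigured JAVA_HOME is easier to catch before starting any services. The following is a small sketch of such a check, assuming a usable Java home contains an executable at bin/java; check_java_home is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given directory looks like a
# usable Java home, i.e. it contains an executable bin/java.
check_java_home() {
    java_home=$1
    if [ -z "$java_home" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$java_home/bin/java" ]; then
        echo "no executable found at $java_home/bin/java" >&2
        return 1
    fi
    return 0
}
```

After opening a new shell so that .bashrc is re-read, check_java_home "$JAVA_HOME" should return success without printing anything.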

Start PredictionIO and Dependent Services

If you are using PostgreSQL or MySQL, skip pio-start-all and pio-stop-all, and run PredictionIO-0.9.6/bin/pio eventserver & instead.

Simply run PredictionIO-0.9.6/bin/pio-start-all and you should see something similar to the following:

$ PredictionIO-0.9.6/bin/pio-start-all
Starting Elasticsearch...
Starting HBase...
starting master, logging to /home/abc/PredictionIO-0.9.6/vendors/hbase-1.0.0/bin/../logs/hbase-abc-master-yourhost.local.out
Waiting 10 seconds for HBase to fully initialize...
Starting PredictionIO Event Server...
$

You may use jps to verify that you have everything started:

$ jps -l
15344 org.apache.hadoop.hbase.master.HMaster
15409 io.prediction.tools.console.Console
15256 org.elasticsearch.bootstrap.Elasticsearch
15469 sun.tools.jps.Jps
$
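The jps check can also be automated. The sketch below reads jps -l output on standard input and reports which of the expected main classes are present; check_pio_services is a hypothetical helper name, and the class names to look for are passed as arguments.

```shell
#!/bin/sh
# Hypothetical helper: read `jps -l` output on stdin and report, for each
# class name given as an argument, whether a matching process is listed.
# Returns 0 if all classes were found and 1 if any is missing.
check_pio_services() {
    jps_output=$(cat)
    missing=0
    for class in "$@"; do
        # -F treats the class name as a fixed string, so dots are literal.
        if printf '%s\n' "$jps_output" | grep -qF "$class"; then
            echo "running: $class"
        else
            echo "missing: $class"
            missing=1
        fi
    done
    return "$missing"
}
```

To use it, pipe jps -l into check_pio_services followed by the class names from the output above: org.apache.hadoop.hbase.master.HMaster, io.prediction.tools.console.Console and org.elasticsearch.bootstrap.Elasticsearch.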

A working setup will have these processes running:

• io.prediction.tools.console.Console