With the Jupyter-based Docker setup, you can use Jupyter Notebook in a PredictionIO environment, which is useful for exploratory data analysis (EDA).
First, start the Jupyter container with the PredictionIO environment:
docker-compose -f docker-compose.jupyter.yml \
  -f pgsql/docker-compose.base.yml \
  -f pgsql/docker-compose.meta.yml \
  -f pgsql/docker-compose.event.yml \
  -f pgsql/docker-compose.model.yml \
  up
Open http://127.0.0.1:8888/ in a browser, then open a new terminal in Jupyter from the New pulldown button.
Clone a template using Git:
cd templates/
git clone https://github.com/apache/predictionio-template-recommender.git
cd predictionio-template-recommender/
Replace the placeholder app name in engine.json with MyApp1.
sed -i "s/INVALID_APP_NAME/MyApp1/" engine.json
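To see what the substitution does before running it on a real template, here is a minimal sketch that applies it to a throwaway file. The engine.json contents below are a hypothetical, trimmed-down stand-in for the template's file, and the `sed -i` form assumes GNU sed, as found in a Linux container.

```shell
# Work in a throwaway directory so no real template is modified.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical, trimmed-down engine.json; only the appName field matters here.
cat > engine.json <<'EOF'
{
  "id": "default",
  "datasource": {
    "params": {
      "appName": "INVALID_APP_NAME"
    }
  }
}
EOF

# Same substitution as in the step above (GNU sed syntax).
sed -i "s/INVALID_APP_NAME/MyApp1/" engine.json

# Confirm the placeholder is gone and show the updated field.
if grep -q "INVALID_APP_NAME" engine.json; then
  echo "placeholder still present"
else
  grep '"appName"' engine.json
fi
```

Running the same `grep` check inside the cloned template directory is a quick way to confirm the edit took effect before registering the application.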
Using the pio command, register a new application named MyApp1.
pio app new MyApp1
This command prints an access key, as shown below.
[INFO] [Pio$] Access Key: bbe8xRHN1j3Sa8WeAT8TSxt5op3lUqhvXmKY1gLRjg70K-DUhHIJJ0-UzgKumxGm
Set it as the environment variable ACCESS_KEY.
ACCESS_KEY=bbe8xRHN1j3Sa8WeAT8TSxt5op3lUqhvXmKY1gLRjg70K-DUhHIJJ0-UzgKumxGm
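Instead of copying the key by hand, it can be extracted from the log line. The sketch below assumes the line has the form shown above, with the key following the text "Access Key: ". The exact output format may differ between PredictionIO versions, so treat the pattern as an assumption to check against your own output.

```shell
# Sample log line copied from the output shown above.
# In a live session it would come from the output of `pio app new MyApp1`.
line='[INFO] [Pio$] Access Key: bbe8xRHN1j3Sa8WeAT8TSxt5op3lUqhvXmKY1gLRjg70K-DUhHIJJ0-UzgKumxGm'

# Keep only what follows "Access Key: "; -n plus the p flag prints matching lines only.
ACCESS_KEY=$(printf '%s\n' "$line" | sed -n 's/.*Access Key: *//p')

echo "ACCESS_KEY=$ACCESS_KEY"
```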
Download the training data and import it into the PredictionIO Event Server.
curl https://raw.githubusercontent.com/apache/spark/master/data/mllib/sample_movielens_data.txt --create-dirs -o data/sample_movielens_data.txt
python data/import_eventserver.py --access_key $ACCESS_KEY
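Before importing, it can help to sanity-check the downloaded file. The sketch below assumes the `user::item::rating` layout used by Spark's sample MovieLens data and runs against a small stand-in file so it works offline. The three sample lines and the file name sample.txt are illustrative only.

```shell
workdir=$(mktemp -d)

# Three illustrative lines in the user::item::rating layout.
printf '%s\n' '0::2::3' '0::3::1' '0::5::2' > "$workdir/sample.txt"

# Count lines that do not have exactly three "::"-separated fields.
bad=$(awk -F'::' 'NF != 3 { n++ } END { print n + 0 }' "$workdir/sample.txt")

# Count all lines; tr strips the padding some wc implementations add.
total=$(wc -l < "$workdir/sample.txt" | tr -d ' ')

echo "lines: $total, malformed: $bad"
```

Pointing the same `awk` command at data/sample_movielens_data.txt should report zero malformed lines if the download completed correctly.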
Build your template with the following command:
pio build --verbose
To create a model, run:
pio train
For a second example, clone the Iris template using Git (run this from the directory that contains templates/):
cd templates/
git clone https://github.com/jpioug/predictionio-template-iris.git
cd predictionio-template-iris/
Using the pio command, register a new application named IrisApp.
pio app new --access-key IRIS_TOKEN IrisApp
Download the training data and import it into the PredictionIO Event Server.
python data/import_eventserver.py
Build your template with the following command:
pio build --verbose
To do data analysis, open templates/predictionio-template-iris/eda.ipynb in Jupyter.
You need to clear the following environment variables in the terminal before executing pio train.
unset PYSPARK_PYTHON
unset PYSPARK_DRIVER_PYTHON
unset PYSPARK_DRIVER_PYTHON_OPTS
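A quick way to confirm the variables are really cleared in the current shell is sketched below. The exported values are hypothetical placeholders standing in for whatever the Jupyter image sets. Only the variable names come from the step above.

```shell
# Hypothetical values standing in for whatever the Jupyter image sets.
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

# Clear all three in one command; unset accepts multiple names.
unset PYSPARK_PYTHON PYSPARK_DRIVER_PYTHON PYSPARK_DRIVER_PYTHON_OPTS

# printenv exits non-zero for a name that is not in the environment.
still_set=0
for v in PYSPARK_PYTHON PYSPARK_DRIVER_PYTHON PYSPARK_DRIVER_PYTHON_OPTS; do
  if printenv "$v" > /dev/null; then
    echo "$v is still set"
    still_set=$((still_set + 1))
  fi
done
echo "variables still set: $still_set"
```

Note that unset only affects the terminal it runs in, so pio train has to be started from that same terminal.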
To create a model, run:
pio train --main-py-file train.py