blob: 56141f26bcadadccf24da4d87f32bf7cf66caa36 [file] [log] [blame]
---
title: Quick Start - Classification Engine Template
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
## Overview
An engine template is an almost-complete implementation of an engine.
PredictionIO's Classification Engine Template
has integrated **Apache Spark MLlib**'s Naive Bayes algorithm by default.
The default use case of Classification Engine Template is to predict the service
plan (*plan*) a user will subscribe to based on his 3 properties: *attr0*,
*attr1* and *attr2*.
You can customize it easily to fit your specific use case and needs.
We are going to show you how to create your own classification engine for
production use based on this template.
## Usage
### Event Data Requirements
By default, the template requires the following events to be collected:
- user $set event, which set the attributes of the user
NOTE: You can customize to use other event.
### Input Query
- individual attributes values (for version >= v0.3.1)
WARNING: for version < v0.3.1, it is array of features values
### Output PredictedResult
- the predicted label
## 1. Install and Run PredictionIO
<%= partial 'shared/quickstart/install' %>
## 2. Create a new Engine from an Engine Template
<%= partial 'shared/quickstart/create_engine', locals: { engine_name: 'MyClassification', template_name: 'Classification Engine Template', template_repo: 'apache/predictionio-template-attribute-based-classifier' } %>
## 3. Generate an App ID and Access Key
<%= partial 'shared/quickstart/create_app' %>
## 4. Collecting Data
Next, let's collect some training data. By default, the Classification Engine Template reads 4 properties of a user record: attr0, attr1, attr2 and plan. This templates requires '$set' user events.
INFO: This template can easily be customized to use different or more number of attributes.
<%= partial 'shared/quickstart/collect_data' %>
To set properties "attr0", "attr1", "attr2" and "plan" for user "u0" on time `2014-11-02T09:39:45.618-08:00` (current time will be used if eventTime is not specified), you can send `$set` event for the user. To send this event, run the following `curl` command:
<div class="tabs">
<div data-tab="REST API" data-lang="json">
```
$ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{
"event" : "$set",
"entityType" : "user",
"entityId" : "u0",
"properties" : {
"attr0" : 0,
"attr1" : 1,
"attr2" : 0,
"plan" : 1
}
"eventTime" : "2014-11-02T09:39:45.618-08:00"
}'
```
</div>
<div data-tab="Python SDK" data-lang="python">
```python
import predictionio
client = predictionio.EventClient(
access_key=<ACCESS KEY>,
url=<URL OF EVENTSERVER>,
threads=5,
qsize=500
)
# Set the 4 properties for a user
client.create_event(
event="$set",
entity_type="user",
entity_id=<USER ID>,
properties= {
"attr0" : int(<VALUE OF ATTR0>),
"attr1" : int(<VALUE OF ATTR1>),
"attr2" : int(<VALUE OF ATTR2>),
"plan" : int(<VALUE OF PLAN>)
}
)
```
</div>
<div data-tab="PHP SDK" data-lang="php">
```php
<?php
require_once("vendor/autoload.php");
use predictionio\EventClient;
$client = new EventClient(<ACCESS KEY>, <URL OF EVENTSERVER>);
// Set the 4 properties for a user
$client->createEvent(array(
'event' => '$set',
'entityType' => 'user',
'entityId' => <USER ID>,
'properties' => array(
'attr0' => <VALUE OF ATTR0>,
'attr1' => <VALUE OF ATTR1>,
'attr2' => <VALUE OF ATTR2>,
'plan' => <VALUE OF PLAN>
)
));
?>
```
</div>
<div data-tab="Ruby SDK" data-lang="ruby">
```ruby
# Create a client object.
client = PredictionIO::EventClient.new(<ACCESS KEY>, <URL OF EVENTSERVER>)
# Set the 4 properties for a user.
client.create_event(
'$set',
'user',
<USER ID>, {
'properties' => {
'attr0' => <VALUE OF ATTR0 (integer)>,
'attr1' => <VALUE OF ATTR1 (integer)>,
'attr2' => <VALUE OF ATTR2 (integer)>,
'plan' => <VALUE OF PLAN (integer)>,
}
}
)
```
</div>
<div data-tab="Java SDK" data-lang="java">
```java
import com.google.common.collect.ImmutableMap;
import org.apache.predictionio.Event;
import org.apache.predictionio.EventClient;
EventClient client = new EventClient(<ACCESS KEY>, <URL OF EVENTSERVER>);
// set the 4 properties for a user
Event event = new Event()
.event("$set")
.entityType("user")
.entityId(<USER ID>)
.properties(ImmutableMap.<String, Object>of(
"attr0", <VALUE OF ATTR0>,
"attr1", <VALUE OF ATTR1>,
"attr2", <VALUE OF ATTR2>,
"plan", <VALUE OF PLAN>
));
client.createEvent(event);
```
</div>
</div>
Note that you can also set the properties for the user with multiple `$set` events (They will be aggregated during engine training).
To set properties "attr0", "attr1" and "attr2", and "plan" for user "u1" at different time, you can send following `$set` events for the user. To send these events, run the following `curl` command:
<div class="tabs">
<div data-tab="REST API" data-lang="json">
```
$ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{
"event" : "$set",
"entityType" : "user",
"entityId" : "u1",
"properties" : {
"attr0" : 0
}
"eventTime" : "2014-11-02T09:39:45.618-08:00"
}'
$ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{
"event" : "$set",
"entityType" : "user",
"entityId" : "u1",
"properties" : {
"attr1" : 1,
"attr2": 0
}
"eventTime" : "2014-11-02T09:39:45.618-08:00"
}'
$ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{
"event" : "$set",
"entityType" : "user",
"entityId" : "u1",
"properties" : {
"plan" : 1
}
"eventTime" : "2014-11-02T09:39:45.618-08:00"
}'
```
</div>
<div data-tab="Python SDK" data-lang="python">
```python
# You may also set the properties one by one
client.create_event(
event="$set",
entity_type="user",
entity_id=<USER ID>,
properties= {
"attr0" : int(<VALUE OF ATTR0>)
}
)
client.create_event(
event="$set",
entity_type="user",
entity_id=<USER ID>,
properties= {
"attr1" : int(<VALUE OF ATTR1>),
"attr2" : int(<VALUE OF ATTR2>)
}
)
client.create_event(
event="$set",
entity_type="user",
entity_id=<USER ID>,
properties= {
"plan" : int(<VALUE OF PLAN>)
}
)
```
</div>
<div data-tab="PHP SDK" data-lang="php">
```php
<?php
// You may also set the properties one by one
$client->createEvent(array(
'event' => '$set',
'entityType' => 'user',
'entityId' => <USER ID>,
'properties' => array(
'attr0' => <VALUE OF ATTR0>
)
));
$client->createEvent(array(
'event' => '$set',
'entityType' => 'user',
'entityId' => <USER ID>,
'properties' => array(
'attr1' => <VALUE OF ATTR1>,
'attr2' => <VALUE OF ATTR2>
)
));
$client->createEvent(array(
'event' => '$set',
'entityType' => 'user',
'entityId' => <USER ID>,
'properties' => array(
'plan' => <VALUE OF PLAN>
)
));
?>
```
</div>
<div data-tab="Ruby SDK" data-lang="ruby">
```ruby
# You may also set the properties one by one.
client.create_event(
'$set',
'user',
<USER ID>, {
'properties' => {
'attr0' => <VALUE OF ATTR0 (integer)>
}
}
)
client.create_event(
'$set',
'user',
<USER ID>, {
'properties' => {
'attr1' => <VALUE OF ATTR1 (integer)>,
}
}
)
# Etc...
```
</div>
<div data-tab="Java SDK" data-lang="java">
```java
// you may also set the properties one by one
client.createEvent(new Event()
.event("$set")
.entityType("user")
.entityId(<USER ID>)
.property("attr0", <VALUE OF ATTR0>));
client.createEvent(new Event()
.event("$set")
.entityType("user")
.entityId(<USER ID>)
.property("attr1", <VALUE OF ATTR1>)
.property("attr2", <VALUE OF ATTR2>));
client.createEvent(new Event()
.event("$set")
.entityType("user")
.entityId(<USER ID>)
.property("plan", <VALUE OF PLAN>));
```
</div>
</div>
The properties of the `user` can be set, unset, or delete by special events **$set**, **$unset** and **$delete**. Please refer to [Event API](/datacollection/eventapi/#note-about-properties) for more details of using these events.
<%= partial 'shared/quickstart/query_eventserver' %>
### Import More Sample Data
<%= partial 'shared/quickstart/import_sample_data' %>
A Python import script `import_eventserver.py` is provided to import the data to
Event Server using Python SDK. Please upgrade to the latest Python SDK.
<%= partial 'shared/quickstart/install_python_sdk' %>
Make sure you are under the `MyClassification` directory. Execute the following to import the data:
```
$ cd MyClassification
$ python data/import_eventserver.py --access_key $ACCESS_KEY
```
You should see the following output:
```
Importing data...
6 events are imported.
```
Now the training data is stored as events inside the Event Store.
<%= partial 'shared/quickstart/query_eventserver_short' %>
## 5. Deploy the Engine as a Service
<%= partial 'shared/quickstart/deploy_enginejson', locals: { engine_name: 'MyClassification' } %>
<%= partial 'shared/quickstart/deploy', locals: { engine_name: 'MyClassification' } %>
## 6. Use the Engine
Now, You can try to retrieve predicted results. For example, to predict the
label (i.e. *plan* in this case) of a user with attr0=2, attr1=0 and attr2=0,
you send this JSON `{ "attr0":2, "attr1":0, "attr2":0 }` to the deployed engine and it will
return a JSON of the predicted plan. Simply send a query by making a HTTP
request or through the `EngineClient` of an SDK.
With the deployed engine running, open another terminal and run the following `curl` command or use SDK to send the query:
<div class="tabs">
<div data-tab="REST API" data-lang="bash">
```bash
$ curl -H "Content-Type: application/json" \
-d '{ "attr0":2, "attr1":0, "attr2":0 }' http://localhost:8000/queries.json
```
</div>
<div data-tab="Python SDK" data-lang="python">
```python
import predictionio
engine_client = predictionio.EngineClient(url="http://localhost:8000")
print engine_client.send_query({"attr0":2, "attr1":0, "attr2":0})
```
</div>
<div data-tab="PHP SDK" data-lang="php">
```php
<?php
require_once("vendor/autoload.php");
use predictionio\EngineClient;
$client = new EngineClient('http://localhost:8000');
$response = $client->sendQuery(array('attr0'=> 2, 'attr1' => 0, 'attr2' => 0));
print_r($response);
?>
```
</div>
<div data-tab="Ruby SDK" data-lang="ruby">
```ruby
# Create client object.
client = PredictionIO::EngineClient.new(<ENGINE DEPLOY URL>)
# Query PredictionIO.
response = client.send_query('attr0' => 2, 'attr1' => 0, 'attr2' => 0)
puts response
```
</div>
<div data-tab="Java SDK" data-lang="java">
```java
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import com.google.gson.JsonObject;
import org.apache.predictionio.EngineClient;
EngineClient engineClient = new EngineClient(<ENGINE DEPLOY URL>);
JsonObject response = engineClient.sendQuery(ImmutableMap.<String, Object>of(
"attr0", 2,
"attr1", 0,
"attr2", 0
));
```
</div>
</div>
WARNING: The Query format is changed since version v0.3.1. If you are using old Classification template version v0.3.0 or earlier, the query format is array of feature values instead: `{ "features": [2, 0, 0] } `.
The following is sample JSON response:
```
{"label":0.0}
```
Similarly, to predict the label (i.e. *plan* in this case) of a user with
attr0=4, attr1=3 and attr2=8, you send this JSON `{ "attr0": 4, "attr1": 3, "attr2": 8] }` to
the deployed engine and it will return a JSON of the predicted plan.
WARNING: For classification template version v0.3.0 or earlier, the query JSON would be `{ "features": [4, 3, 8] }`.
*MyClassification* is now running.
<%= partial 'shared/quickstart/production' %>
Next, we are going to take a look at the engine
architecture and explain how you can customize it completely.
#### [Next: DASE Components Explained](/templates/classification/dase/)