| --- |
| title: Quick Start - Classification Engine Template |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| ## Overview |
| |
| An engine template is an almost-complete implementation of an engine. |
| PredictionIO's Classification Engine Template |
| integrates **Apache Spark MLlib**'s Naive Bayes algorithm by default. |
| |
| The default use case of the Classification Engine Template is to predict the service |
| plan (*plan*) a user will subscribe to based on three of the user's properties: *attr0*, |
| *attr1* and *attr2*. |
| |
| You can customize it easily to fit your specific use case and needs. |
| |
| We are going to show you how to create your own classification engine for |
| production use based on this template. |
| |
| ## Usage |
| |
| ### Event Data Requirements |
| |
| By default, the template requires the following events to be collected: |
| |
| - user `$set` events, which set the attributes of the user |
| |
| NOTE: You can customize the template to use other events. |
| |
| ### Input Query |
| |
| - individual attribute values (for version >= v0.3.1) |
| |
| WARNING: For versions < v0.3.1, the query is an array of feature values. |
| |
| ### Output PredictedResult |
| |
| - the predicted label |
| |
| ## 1. Install and Run PredictionIO |
| |
| <%= partial 'shared/quickstart/install' %> |
| |
| ## 2. Create a new Engine from an Engine Template |
| |
| <%= partial 'shared/quickstart/create_engine', locals: { engine_name: 'MyClassification', template_name: 'Classification Engine Template', template_repo: 'apache/predictionio-template-attribute-based-classifier' } %> |
| |
| ## 3. Generate an App ID and Access Key |
| |
| <%= partial 'shared/quickstart/create_app' %> |
| |
| ## 4. Collecting Data |
| |
| Next, let's collect some training data. By default, the Classification Engine Template reads 4 properties of a user record: attr0, attr1, attr2 and plan. This template requires `$set` user events. |
| |
| INFO: This template can easily be customized to use a different or larger number of attributes. |
| |
| <%= partial 'shared/quickstart/collect_data' %> |
| |
| To set properties "attr0", "attr1", "attr2" and "plan" for user "u0" at time `2014-11-02T09:39:45.618-08:00` (the current time will be used if eventTime is not specified), you can send a `$set` event for the user. To send this event, run the following `curl` command: |
| |
| <div class="tabs"> |
| <div data-tab="REST API" data-lang="json"> |
| ``` |
| $ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \ |
| -H "Content-Type: application/json" \ |
| -d '{ |
| "event" : "$set", |
| "entityType" : "user", |
| "entityId" : "u0", |
| "properties" : { |
| "attr0" : 0, |
| "attr1" : 1, |
| "attr2" : 0, |
| "plan" : 1 |
| }, |
| "eventTime" : "2014-11-02T09:39:45.618-08:00" |
| }' |
| ``` |
| </div> |
| <div data-tab="Python SDK" data-lang="python"> |
| ```python |
| import predictionio |
| |
| client = predictionio.EventClient( |
| access_key=<ACCESS KEY>, |
| url=<URL OF EVENTSERVER>, |
| threads=5, |
| qsize=500 |
| ) |
| |
| # Set the 4 properties for a user |
| client.create_event( |
| event="$set", |
| entity_type="user", |
| entity_id=<USER ID>, |
| properties= { |
| "attr0" : int(<VALUE OF ATTR0>), |
| "attr1" : int(<VALUE OF ATTR1>), |
| "attr2" : int(<VALUE OF ATTR2>), |
| "plan" : int(<VALUE OF PLAN>) |
| } |
| ) |
| ``` |
| </div> |
| |
| <div data-tab="PHP SDK" data-lang="php"> |
| ```php |
| <?php |
| require_once("vendor/autoload.php"); |
| use predictionio\EventClient; |
| |
| $client = new EventClient(<ACCESS KEY>, <URL OF EVENTSERVER>); |
| |
| // Set the 4 properties for a user |
| $client->createEvent(array( |
| 'event' => '$set', |
| 'entityType' => 'user', |
| 'entityId' => <USER ID>, |
| 'properties' => array( |
| 'attr0' => <VALUE OF ATTR0>, |
| 'attr1' => <VALUE OF ATTR1>, |
| 'attr2' => <VALUE OF ATTR2>, |
| 'plan' => <VALUE OF PLAN> |
| ) |
| )); |
| ?> |
| ``` |
| </div> |
| <div data-tab="Ruby SDK" data-lang="ruby"> |
| ```ruby |
| # Create a client object. |
| client = PredictionIO::EventClient.new(<ACCESS KEY>, <URL OF EVENTSERVER>) |
| |
| # Set the 4 properties for a user. |
| client.create_event( |
| '$set', |
| 'user', |
| <USER ID>, { |
| 'properties' => { |
| 'attr0' => <VALUE OF ATTR0 (integer)>, |
| 'attr1' => <VALUE OF ATTR1 (integer)>, |
| 'attr2' => <VALUE OF ATTR2 (integer)>, |
| 'plan' => <VALUE OF PLAN (integer)>, |
| } |
| } |
| ) |
| ``` |
| </div> |
| <div data-tab="Java SDK" data-lang="java"> |
| ```java |
| import com.google.common.collect.ImmutableMap; |
| import org.apache.predictionio.Event; |
| import org.apache.predictionio.EventClient; |
| |
| EventClient client = new EventClient(<ACCESS KEY>, <URL OF EVENTSERVER>); |
| |
| // set the 4 properties for a user |
| Event event = new Event() |
| .event("$set") |
| .entityType("user") |
| .entityId(<USER ID>) |
| .properties(ImmutableMap.<String, Object>of( |
| "attr0", <VALUE OF ATTR0>, |
| "attr1", <VALUE OF ATTR1>, |
| "attr2", <VALUE OF ATTR2>, |
| "plan", <VALUE OF PLAN> |
| )); |
| client.createEvent(event); |
| ``` |
| </div> |
| </div> |
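
If you prefer not to hand-write the JSON body, the same `$set` event can be assembled programmatically. The following is a minimal sketch using only the Python standard library rather than the Python SDK. The helper names `build_set_event` and `build_event_request` and the `YOUR_ACCESS_KEY` placeholder are assumptions made for illustration; the URL and field names are taken from the `curl` example above.

```python
import json
import urllib.parse
import urllib.request


def build_set_event(entity_id, properties, event_time=None):
    """Return a dict describing a user $set event (hypothetical helper)."""
    event = {
        "event": "$set",
        "entityType": "user",
        "entityId": entity_id,
        "properties": properties,
    }
    # eventTime is optional; it is only included when a value is given.
    if event_time is not None:
        event["eventTime"] = event_time
    return event


def build_event_request(event, access_key, base_url="http://localhost:7070"):
    """Build (but do not send) the POST request for the Event Server."""
    query = urllib.parse.urlencode({"accessKey": access_key})
    return urllib.request.Request(
        f"{base_url}/events.json?{query}",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


event = build_set_event(
    "u0",
    {"attr0": 0, "attr1": 1, "attr2": 0, "plan": 1},
    "2014-11-02T09:39:45.618-08:00",
)
request = build_event_request(event, access_key="YOUR_ACCESS_KEY")
print(request.full_url)
print(request.data.decode("utf-8"))
# To actually send the event, pass the request to urllib.request.urlopen(request).
```

Serializing with `json.dumps` means the body is always syntactically valid JSON, which is easy to get wrong when editing a long `curl` command by hand.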
| |
| |
| Note that you can also set the properties for the user with multiple `$set` events (they will be aggregated during engine training). |
| |
| To set properties "attr0", "attr1", "attr2" and "plan" for user "u1" at different times, you can send the following `$set` events for the user. To send these events, run the following `curl` commands: |
| |
| <div class="tabs"> |
| <div data-tab="REST API" data-lang="json"> |
| ``` |
| $ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \ |
| -H "Content-Type: application/json" \ |
| -d '{ |
| "event" : "$set", |
| "entityType" : "user", |
| "entityId" : "u1", |
| "properties" : { |
| "attr0" : 0 |
| }, |
| "eventTime" : "2014-11-02T09:39:45.618-08:00" |
| }' |
| |
| $ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \ |
| -H "Content-Type: application/json" \ |
| -d '{ |
| "event" : "$set", |
| "entityType" : "user", |
| "entityId" : "u1", |
| "properties" : { |
| "attr1" : 1, |
| "attr2": 0 |
| }, |
| "eventTime" : "2014-11-02T09:39:45.618-08:00" |
| }' |
| |
| $ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \ |
| -H "Content-Type: application/json" \ |
| -d '{ |
| "event" : "$set", |
| "entityType" : "user", |
| "entityId" : "u1", |
| "properties" : { |
| "plan" : 1 |
| }, |
| "eventTime" : "2014-11-02T09:39:45.618-08:00" |
| }' |
| ``` |
| </div> |
| <div data-tab="Python SDK" data-lang="python"> |
| ```python |
| # You may also set the properties one by one |
| client.create_event( |
| event="$set", |
| entity_type="user", |
| entity_id=<USER ID>, |
| properties= { |
| "attr0" : int(<VALUE OF ATTR0>) |
| } |
| ) |
| client.create_event( |
| event="$set", |
| entity_type="user", |
| entity_id=<USER ID>, |
| properties= { |
| "attr1" : int(<VALUE OF ATTR1>), |
| "attr2" : int(<VALUE OF ATTR2>) |
| } |
| ) |
| |
| client.create_event( |
| event="$set", |
| entity_type="user", |
| entity_id=<USER ID>, |
| properties= { |
| "plan" : int(<VALUE OF PLAN>) |
| } |
| ) |
| ``` |
| </div> |
| |
| <div data-tab="PHP SDK" data-lang="php"> |
| ```php |
| <?php |
| |
| // You may also set the properties one by one |
| $client->createEvent(array( |
| 'event' => '$set', |
| 'entityType' => 'user', |
| 'entityId' => <USER ID>, |
| 'properties' => array( |
| 'attr0' => <VALUE OF ATTR0> |
| ) |
| )); |
| |
| $client->createEvent(array( |
| 'event' => '$set', |
| 'entityType' => 'user', |
| 'entityId' => <USER ID>, |
| 'properties' => array( |
| 'attr1' => <VALUE OF ATTR1>, |
| 'attr2' => <VALUE OF ATTR2> |
| ) |
| )); |
| |
| $client->createEvent(array( |
| 'event' => '$set', |
| 'entityType' => 'user', |
| 'entityId' => <USER ID>, |
| 'properties' => array( |
| 'plan' => <VALUE OF PLAN> |
| ) |
| )); |
| |
| ?> |
| ``` |
| </div> |
| <div data-tab="Ruby SDK" data-lang="ruby"> |
| ```ruby |
| # You may also set the properties one by one. |
| client.create_event( |
| '$set', |
| 'user', |
| <USER ID>, { |
| 'properties' => { |
| 'attr0' => <VALUE OF ATTR0 (integer)> |
| } |
| } |
| ) |
| |
| client.create_event( |
| '$set', |
| 'user', |
| <USER ID>, { |
| 'properties' => { |
| 'attr1' => <VALUE OF ATTR1 (integer)>, |
| } |
| } |
| ) |
| |
| # Etc... |
| ``` |
| </div> |
| <div data-tab="Java SDK" data-lang="java"> |
| ```java |
| // you may also set the properties one by one |
| client.createEvent(new Event() |
| .event("$set") |
| .entityType("user") |
| .entityId(<USER ID>) |
| .property("attr0", <VALUE OF ATTR0>)); |
| client.createEvent(new Event() |
| .event("$set") |
| .entityType("user") |
| .entityId(<USER ID>) |
| .property("attr1", <VALUE OF ATTR1>) |
| .property("attr2", <VALUE OF ATTR2>)); |
| client.createEvent(new Event() |
| .event("$set") |
| .entityType("user") |
| .entityId(<USER ID>) |
| .property("plan", <VALUE OF PLAN>)); |
| ``` |
| </div> |
| </div> |
| |
| The properties of the `user` can be set, unset, or deleted by the special events **$set**, **$unset** and **$delete**. Please refer to [Event API](/datacollection/eventapi/#note-about-properties) for more details on using these events. |
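
To build intuition for how several events collapse into one set of user properties, here is a simplified model in plain Python. It is an illustration only and not PredictionIO's actual aggregation code: the function name `aggregate_properties` is hypothetical, the timestamps in the sample events are made up for the example, and it assumes events are applied in `eventTime` order with later values overriding earlier ones.

```python
from datetime import datetime


def aggregate_properties(events):
    """Fold $set/$unset/$delete events into one property dict per entity.

    Simplified illustrative model, not PredictionIO's implementation.
    """
    # Apply events chronologically so that later values win.
    ordered = sorted(events, key=lambda e: datetime.fromisoformat(e["eventTime"]))
    entities = {}
    for e in ordered:
        entity_id = e["entityId"]
        props = e.get("properties", {})
        if e["event"] == "$set":
            # Merge the new properties into whatever is already known.
            entities.setdefault(entity_id, {}).update(props)
        elif e["event"] == "$unset":
            # Remove the named properties; their values are ignored.
            current = entities.get(entity_id, {})
            for key in props:
                current.pop(key, None)
        elif e["event"] == "$delete":
            # Drop the entity entirely.
            entities.pop(entity_id, None)
    return entities


events = [
    {"event": "$set", "entityId": "u1", "properties": {"attr0": 0},
     "eventTime": "2014-11-02T09:39:45.618-08:00"},
    {"event": "$set", "entityId": "u1", "properties": {"attr1": 1, "attr2": 0},
     "eventTime": "2014-11-02T09:40:45.618-08:00"},
    {"event": "$set", "entityId": "u1", "properties": {"plan": 1},
     "eventTime": "2014-11-02T09:41:45.618-08:00"},
]
print(aggregate_properties(events))
# {'u1': {'attr0': 0, 'attr1': 1, 'attr2': 0, 'plan': 1}}
```

In this model, three partial `$set` events for "u1" yield the same user record as a single event that sets all four properties at once.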
| |
| |
| <%= partial 'shared/quickstart/query_eventserver' %> |
| |
| ### Import More Sample Data |
| |
| <%= partial 'shared/quickstart/import_sample_data' %> |
| |
| A Python import script `import_eventserver.py` is provided to import the data into |
| the Event Server using the Python SDK. Please upgrade to the latest Python SDK. |
| |
| <%= partial 'shared/quickstart/install_python_sdk' %> |
| |
| Make sure you are under the `MyClassification` directory. Execute the following to import the data: |
| |
| ``` |
| $ cd MyClassification |
| $ python data/import_eventserver.py --access_key $ACCESS_KEY |
| ``` |
| |
| You should see the following output: |
| |
| ``` |
| Importing data... |
| 6 events are imported. |
| ``` |
| |
| Now the training data is stored as events inside the Event Store. |
| |
| <%= partial 'shared/quickstart/query_eventserver_short' %> |
| |
| ## 5. Deploy the Engine as a Service |
| |
| <%= partial 'shared/quickstart/deploy_enginejson', locals: { engine_name: 'MyClassification' } %> |
| |
| <%= partial 'shared/quickstart/deploy', locals: { engine_name: 'MyClassification' } %> |
| |
| ## 6. Use the Engine |
| |
| Now you can try to retrieve predicted results. For example, to predict the |
| label (i.e. *plan* in this case) of a user with attr0=2, attr1=0 and attr2=0, |
| you send this JSON `{ "attr0":2, "attr1":0, "attr2":0 }` to the deployed engine and it will |
| return a JSON of the predicted plan. Simply send a query by making an HTTP |
| request or through the `EngineClient` of an SDK. |
| |
| With the deployed engine running, open another terminal and run the following `curl` command or use an SDK to send the query: |
| |
| <div class="tabs"> |
| <div data-tab="REST API" data-lang="bash"> |
| ```bash |
| $ curl -H "Content-Type: application/json" \ |
| -d '{ "attr0":2, "attr1":0, "attr2":0 }' http://localhost:8000/queries.json |
| |
| ``` |
| </div> |
| <div data-tab="Python SDK" data-lang="python"> |
| ```python |
| import predictionio |
| engine_client = predictionio.EngineClient(url="http://localhost:8000") |
| print(engine_client.send_query({"attr0":2, "attr1":0, "attr2":0})) |
| ``` |
| </div> |
| <div data-tab="PHP SDK" data-lang="php"> |
| ```php |
| <?php |
| require_once("vendor/autoload.php"); |
| use predictionio\EngineClient; |
| |
| $client = new EngineClient('http://localhost:8000'); |
| |
| $response = $client->sendQuery(array('attr0'=> 2, 'attr1' => 0, 'attr2' => 0)); |
| print_r($response); |
| |
| ?> |
| ``` |
| </div> |
| <div data-tab="Ruby SDK" data-lang="ruby"> |
| ```ruby |
| # Create client object. |
| client = PredictionIO::EngineClient.new(<ENGINE DEPLOY URL>) |
| |
| # Query PredictionIO. |
| response = client.send_query('attr0' => 2, 'attr1' => 0, 'attr2' => 0) |
| |
| puts response |
| ``` |
| </div> |
| <div data-tab="Java SDK" data-lang="java"> |
| |
| ```java |
| import com.google.common.collect.ImmutableMap; |
| import com.google.gson.JsonObject; |
| |
| import org.apache.predictionio.EngineClient; |
| |
| EngineClient engineClient = new EngineClient(<ENGINE DEPLOY URL>); |
| |
| JsonObject response = engineClient.sendQuery(ImmutableMap.<String, Object>of( |
| "attr0", 2, |
| "attr1", 0, |
| "attr2", 0 |
| )); |
| ``` |
| </div> |
| </div> |
| |
| WARNING: The query format changed in version v0.3.1. If you are using Classification template version v0.3.0 or earlier, the query is an array of feature values instead: `{ "features": [2, 0, 0] }`. |
| |
| The following is a sample JSON response: |
| |
| ``` |
| {"label":0.0} |
| ``` |
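
For clients that cannot use an SDK, the query and the response handling can be sketched with the Python standard library alone. The helper names `build_query_request`, `parse_label` and `predict_plan` below are hypothetical; the URL, the query fields and the `label` key come from the examples above.

```python
import json
import urllib.request


def build_query_request(attr0, attr1, attr2,
                        url="http://localhost:8000/queries.json"):
    """Build (but do not send) the POST request for the deployed engine."""
    body = json.dumps({"attr0": attr0, "attr1": attr1, "attr2": attr2})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_label(response_text):
    """Extract the predicted label from the engine's JSON response."""
    return json.loads(response_text)["label"]


def predict_plan(attr0, attr1, attr2):
    """Send the query to a running engine and return the predicted label."""
    request = build_query_request(attr0, attr1, attr2)
    with urllib.request.urlopen(request) as response:
        return parse_label(response.read().decode("utf-8"))


# Parsing the sample response shown above:
print(parse_label('{"label":0.0}'))  # 0.0
```

`predict_plan(2, 0, 0)` only works while the engine from step 5 is running; `parse_label` can be exercised offline against a saved response.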
| |
| Similarly, to predict the label (i.e. *plan* in this case) of a user with |
| attr0=4, attr1=3 and attr2=8, you send this JSON `{ "attr0": 4, "attr1": 3, "attr2": 8 }` to |
| the deployed engine and it will return a JSON of the predicted plan. |
| |
| WARNING: For classification template version v0.3.0 or earlier, the query JSON would be `{ "features": [4, 3, 8] }`. |
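
If a client has to talk to engines built from both template generations, the two query shapes can be converted mechanically. This is a small sketch with hypothetical helper names; it assumes the legacy `features` array lists the values in the order attr0, attr1, attr2, which matches the examples on this page.

```python
# Assumed order of values inside the legacy "features" array.
ATTRIBUTE_NAMES = ("attr0", "attr1", "attr2")


def to_attribute_query(features):
    """Convert a legacy feature list into the v0.3.1+ query format."""
    if len(features) != len(ATTRIBUTE_NAMES):
        raise ValueError("expected %d feature values" % len(ATTRIBUTE_NAMES))
    return dict(zip(ATTRIBUTE_NAMES, features))


def to_legacy_query(query):
    """Convert a v0.3.1+ query into the v0.3.0-and-earlier format."""
    return {"features": [query[name] for name in ATTRIBUTE_NAMES]}


print(to_attribute_query([4, 3, 8]))
# {'attr0': 4, 'attr1': 3, 'attr2': 8}
print(to_legacy_query({"attr0": 2, "attr1": 0, "attr2": 0}))
# {'features': [2, 0, 0]}
```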
| |
| *MyClassification* is now running. |
| |
| <%= partial 'shared/quickstart/production' %> |
| |
| Next, we are going to take a look at the engine |
| architecture and explain how you can customize it completely. |
| |
| #### [Next: DASE Components Explained](/templates/classification/dase/) |