This short how-to explains how to add user-defined properties to the items returned by a PredictionIO engine. It is based on the Similar Product Engine Template, version v0.1.3. To follow it, you need to be familiar with the Scala programming language. We also assume that you were able to set up and run the Similar Product Engine (see its quick start guide).
A full end-to-end example can be found on GitHub.
Suppose you would like to use the Similar Product Engine to suggest videos that your users may also like. The engine will answer with a list of IDs for such videos. For example, the REST response from the engine currently looks like this:
{"itemScores":[ { "item":"i12", "score":1.1700499715209998 },{ "item":"i44", "score":1.1153550716504106 } ]}
But you want the engine to return more information about every video. Say you want to add the fields title, date, and imdbUrl to every item, so that the resulting REST response looks similar to the following:
{"itemScores":[ { "item":"i12", "title":"title for movie i12", "date":"1935", "imdbUrl":"http://imdb.com/fake-url/i12", "score":1.1700499715209998 },{ "item":"i44", "title":"title for movie i44", "date":"1974", "imdbUrl":"http://imdb.com/fake-url/i44", "score":1.1153550716504106 } ]}
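On the application side, the extra fields arrive as ordinary JSON keys next to item and score. The following is a minimal sketch of how a client might read them; the function name summarize and the display format are hypothetical and not part of the template, and the response text is the sample shown above.

```python
import json

# Sample response in the enriched format described above.
RESPONSE = """{"itemScores":[
  {"item":"i12","title":"title for movie i12","date":"1935",
   "imdbUrl":"http://imdb.com/fake-url/i12","score":1.1700499715209998},
  {"item":"i44","title":"title for movie i44","date":"1974",
   "imdbUrl":"http://imdb.com/fake-url/i44","score":1.1153550716504106}
]}"""


def summarize(response_text):
    """Return one display line per recommended video (hypothetical helper)."""
    item_scores = json.loads(response_text)["itemScores"]
    # Each entry carries the new fields alongside "item" and "score".
    return ["{title} ({date}) {imdbUrl}".format(**entry) for entry in item_scores]


if __name__ == "__main__":
    for line in summarize(RESPONSE):
        print(line)
```

With the new fields in the response, the application no longer needs a second lookup to turn an item ID into something it can display.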
Recall the DASE architecture: a PredictionIO engine has four main components, namely the Data Source, Data Preparator, Algorithm, and Serving components. To achieve your goal, you need to provide the video information to the engine (using an SDK) and then let this information pass from the Data Source through the whole engine to the Serving component, where the engine sends the required information back to your application.
In the file DataSource.scala#L104 you will find the class Item, defined as follows:
case class Item(categories: Option[List[String]])
First, simply add the required fields to this class:
case class Item( title: String, date: String, imdbUrl: String, categories: Option[List[String]])
Now your IDE (or the compiler) will point out all the places where you need to make changes in order to create an item properly. For example, DataSource.scala#L52:
Item(categories = properties.getOpt[List[String]]("categories"))
You now need to add the new properties to the item:
Item( title = properties.get[String]("title"), date = properties.get[String]("date"), imdbUrl = properties.get[String]("imdbUrl"), categories = properties.getOpt[List[String]]("categories"))
Now that you have fixed item creation, take a look at the class ItemScore in the file Engine.scala:
case class ItemScore( item: String, score: Double ) extends Serializable
The engine returns the class PredictedResult, which contains the property itemScores: Array[ItemScore]. Since your result items are of class ItemScore, you need to modify this class too. In our example, after the modification you will have something similar to the following:
case class ItemScore( item: String, title: String, date: String, imdbUrl: String, score: Double ) extends Serializable
Again, you now need to go through all the places where ItemScore is created and fix the compiler errors.
The result is initially created by the Algorithm component and is then passed to the Serving component. Take a look at the place where an object of class ItemScore is initially created, in the file ALSAlgorithm.scala#L171:
new ItemScore( item = model.itemIntStringMap(i), score = s )
Your code after the changes will be similar to the following:
val it = model.items(i)
new ItemScore( item = model.itemIntStringMap(i), title = it.title, date = it.date, imdbUrl = it.imdbUrl, score = s )
Using model.items(i) you get the corresponding object of the Item class, and you can now access the properties that you added in the previous step. Using model.itemIntStringMap(i) you get the ID of the corresponding item.
This is the final step. You should now supply your data to the engine in the new format. To get the idea, take a look at this piece of code in our sample Python script, which creates test data.
Creating an item before the modification:
client.create_event( event="$set", entity_type="item", entity_id=item_id, properties={ "categories" : random.sample(categories, random.randint(1, 4)) } )
Creating an item after the modification:
client.create_event( event="$set", entity_type="item", entity_id=item_id, properties={ "categories" : random.sample(categories, random.randint(1, 4)), "title": "title for movie " + item_id, "date": 1935 + random.randint(1, 25), "imdbUrl": "http://imdb.com/fake-url/" + item_id } )
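Because the modified engine reads title, date, and imdbUrl with properties.get[String] rather than getOpt, every item presumably has to carry all three properties, or reading the items is likely to fail. The following is a minimal sketch of how the properties could be built and checked before calling client.create_event. The helper names item_properties and missing_properties are hypothetical, and converting the date to a string to match the date: String field of Item is an assumption of this sketch.

```python
# Properties the modified Item class reads as mandatory values.
REQUIRED_PROPERTIES = ("title", "date", "imdbUrl")


def item_properties(item_id, categories, year):
    """Build the properties dict for one item's "$set" event (hypothetical helper)."""
    return {
        "categories": list(categories),
        "title": "title for movie " + item_id,
        # Assumption: send the date as a string to match `date: String` in Item.
        "date": str(year),
        "imdbUrl": "http://imdb.com/fake-url/" + item_id,
    }


def missing_properties(properties):
    """Return the names of required properties absent from the dict."""
    return [name for name in REQUIRED_PROPERTIES if name not in properties]


if __name__ == "__main__":
    props = item_properties("i12", ["c1", "c3"], 1935)
    print(props)
    print(missing_properties({"categories": ["c1"]}))
```

The resulting dict can be passed as the properties argument of client.create_event. Checking for missing names first gives a clearer error on the client side than a failure during training.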
When you are ready, don't forget to load the new data into your application, and then run:
$ pio build
$ pio train
$ pio deploy
Now you should be able to see the desired results by querying the engine:
curl -H "Content-Type: application/json" -d '{ "items": ["i1", "i3"], "num": 10}' http://localhost:8000/queries.json
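The same query can also be sent from Python. The following is a minimal sketch that uses only the standard library instead of the PredictionIO SDK; the function names build_query_request and query_engine are hypothetical, and the URL assumes the engine is deployed locally on port 8000, as in the curl command above.

```python
import json
import urllib.request

# Assumption: the engine is deployed locally, as in the curl example.
ENGINE_URL = "http://localhost:8000/queries.json"


def build_query_request(items, num, url=ENGINE_URL):
    """Build the POST request for a similar-items query (hypothetical helper)."""
    body = json.dumps({"items": items, "num": num}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_engine(items, num, url=ENGINE_URL):
    """Send the query and return the list of item scores with their properties."""
    with urllib.request.urlopen(build_query_request(items, num, url)) as response:
        return json.loads(response.read().decode("utf-8"))["itemScores"]


if __name__ == "__main__":
    # Requires a deployed engine; each entry should now include the new fields.
    for entry in query_engine(["i1", "i3"], 10):
        print(entry["item"], entry.get("title"), entry["score"])
```

Building the request separately from sending it makes it possible to inspect the request body without a running engine.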