tree: 0c319e3e5b6675afab31322f636dfed2543a044e [path history] [tgz]
  1. data/
  2. project/
  3. src/
  4. build.sbt
  5. engine.json
  6. README.md
  7. template.json
examples/scala-parallel-similarproduct/add-rateevent/README.md

Similar Product Template Modified to Add Explicit Rate Event

This example engine is based on Similar Product Tempplate version v0.1.2 and is modified to add Explicit Rate Event to training data.

For example, An User would rate an item with a score or rating.The rating is used to train the model.

Documentation

Please refer to http://docs.prediction.io/templates/similarproduct/quickstart/

Development Notes

Changes to DataSource.scala

  1. class “Rating” is created.
case class Rating(
  user: String,
  item: String,
  rating: Double,
  t:Long
)
  1. “rateEventsRDD” is initialized by filtering events of type “rate” from Events database as shown below.
   val rateEventsRDD: RDD[RateEvent] = eventsDb.find(
        appId = dsp.appId,
        entityType = Some("user"),
        eventNames = Some(List("rate")...
       
   val rateEvent = try {
        event.event match {
        case "rate" => RateEvent(
        user = event.entityId,
        item = event.targetEntityId.get,
        rating = event.properties.get[Double]("rating")...

Changes to Preparator.scala

  1. val “rateEvents” is added to class “PreparedData”
   class PreparedData(
    val users: RDD[(String, User)],
    val items: RDD[(String, Item)],
    val rateEvents: RDD[RateEvent]
  1. val “rateEvents” is initialized in the Object of class “PreparedData”.
   new PreparedData(
    users = trainingData.users,
    items = trainingData.items,
    rateEvents = trainingData.rateEvents)

Changes to ALSAlgorithm.scala

  1. Changed the Signature of “train” method to include SparkContext and edited method definition to specify the debug message.
   def train(sc:SparkContext ,data: PreparedData): ALSModel = {
    require(!data.rateEvents.take(1).isEmpty,
    s"rateEvents in PreparedData cannot be empty." +
  1. MlibRatings are initialized from rateEvents.
   val mllibRatings = data.rateEvents
  1. Invoke “ALS.train” method to train explicit rate events.
   val m = ALS.train(
    ratings = mllibRatings,
    rank = ap.rank,
    iterations = ap.numIterations,
    lambda = ap.lambda,
    blocks = -1,
    seed = seed)
  1. Define “rateEvent” RDD to filter rate events from events.
   val rateEventsRDD: RDD[RateEvent] = eventsDb.find(...)
  1. if a user may rate same item with different value at different times,use the latest value for this case.
    .reduceByKey { case (v1, v2) => // MODIFIED
       // if a user may rate same item with different value at different times,
       // use the latest value for this case.
       // Can remove this reduceByKey() if no need to support this case.
       val (rating1, t1) = v1
       val (rating2, t2) = v2
       // keep the latest value
       if (t1 > t2) v1 else v2
      }
  1. persist mlibRating.
   .map { case ((u, i), (rating, t)) => // MODIFIED
         // MLlibRating requires integer index for user and item
         MLlibRating(u, i, rating) // MODIFIED
        }.cache()
  1. Add “rateEvent” to class “TrainingData”.
   class TrainingData(
    val users: RDD[(String, User)],
    val items: RDD[(String, Item)],
    val rateEvents: RDD[RateEvent]
 )
  1. Add “rateEvent” to object “TrainingData”.
   new TrainingData(
    users = usersRDD,
    items = itemsRDD,
    rateEvents = rateEventsRDD)