Let's say you want to supply a blackList with each query to exclude some items from the recommendations. For example, the user may have just added some items to the shopping cart during the browsing session, or you may have a list of items you want to filter out. This how-to demonstrates how to do it.
You can find the complete modified source code here.
Note that you may also use the E-Commerce Recommendation Template, which supports this feature by default.
If you want to filter out items based on specific user-to-item events logged by the EventServer (e.g. filter out all items on which the user has "buy" events), you can use the E-Commerce Recommendation Template. Please refer to the algorithm parameters "unseenOnly" and "seenEvents" of the E-Commerce Recommendation Template.
First, we need to add a query parameter for sending the IDs of the items that the user has already seen. Let's modify the case class Query in MyRecommendation/src/main/scala/Engine.scala:
    case class Query(
      user: String,
      num: Int,
      blackList: Set[String] // ADDED
    )
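The Query above declares blackList as a required field. As a minimal sketch of an alternative, assuming you would rather make the field explicitly optional, the hypothetical variant below wraps it in an Option and falls back to an empty set. The names QueryWithOptionalBlackList and effectiveBlackList are illustrative and are not part of the template; if you adopted this variant, predict would need to call the helper instead of reading query.blackList directly.

```scala
// Hypothetical variant of Query (not the template's own class):
// blackList is optional, so a caller may leave it out entirely.
case class QueryWithOptionalBlackList(
  user: String,
  num: Int,
  blackList: Option[Set[String]] = None
)

object OptionalBlackListSketch {
  // Treat a missing blackList as "exclude nothing".
  def effectiveBlackList(query: QueryWithOptionalBlackList): Set[String] =
    query.blackList.getOrElse(Set.empty[String])
}
```

With this shape, a query built without a blackList yields an empty set, and a query built with Some(Set("32")) yields Set("32").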
Then we need to change the code that computes the recommendation scores so that it filters out the seen items. Let's modify the class in MyRecommendation/src/main/scala/ALSModel.scala. Just add the following two methods to that class.
    import com.github.fommil.netlib.BLAS.{getInstance => blas} // ADDED
    ...
    // ADDED
    def recommendProductsWithFilter(user: Int, num: Int, productIdFilter: Set[Int]) = {
      val filteredProductFeatures = productFeatures
        .filter { case (id, _) => !productIdFilter.contains(id) } // (*)
      recommend(userFeatures.lookup(user).head, filteredProductFeatures, num)
        .map(t => Rating(user, t._1, t._2))
    }

    // ADDED
    private def recommend(
        recommendToFeatures: Array[Double],
        recommendableFeatures: RDD[(Int, Array[Double])],
        num: Int): Array[(Int, Double)] = {
      val scored = recommendableFeatures.map { case (id, features) =>
        (id, blas.ddot(features.length, recommendToFeatures, 1, features, 1))
      }
      scored.top(num)(Ordering.by(_._2))
    }
    ...
Note that the method recommend is a copy of org.apache.spark.mllib.recommendation.MatrixFactorizationModel#recommend. We can't reuse the original because it is private. The method recommendProductsWithFilter is an almost exact copy of org.apache.spark.mllib.recommendation.MatrixFactorizationModel#recommendProducts. The only difference is the line marked with the comment (*), where we apply the filtering.
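To see the filter-then-score idea in isolation, here is a minimal sketch that runs on plain Scala collections instead of Spark RDDs. It is an illustration under stated assumptions, not the template's code: the object name FilterSketch, the helper dot (a stand-in for blas.ddot), and the use of Seq in place of RDD are all hypothetical.

```scala
object FilterSketch {
  // Dot product of two equal-length feature vectors (stand-in for blas.ddot).
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "feature vectors must have the same length")
    a.indices.map(i => a(i) * b(i)).sum
  }

  // Drop blacklisted item indices first, then score what remains
  // and keep the `num` highest-scoring items.
  def recommendWithFilter(
      userFeatures: Array[Double],
      productFeatures: Seq[(Int, Array[Double])],
      num: Int,
      productIdFilter: Set[Int]): Seq[(Int, Double)] =
    productFeatures
      .filter { case (id, _) => !productIdFilter.contains(id) } // same test as (*)
      .map { case (id, features) => (id, dot(userFeatures, features)) }
      .sortBy { case (_, score) => -score }
      .take(num)
}
```

Because the filter runs before the top-N selection, a blacklisted item never takes up one of the num slots. For example, with user features Array(1.0, 0.0) and three items whose features are Array(3.0, 0.0), Array(2.0, 0.0) and Array(1.0, 0.0) under indices 1, 2 and 3, asking for two items with the filter Set(1) returns items 2 and 3 rather than only item 2.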
Next, we need to invoke our new filtering method when we query recommendations. Let's modify the method predict in MyRecommendation/src/main/scala/ALSAlgorithm.scala:
    def predict(model: ALSModel, query: Query): PredictedResult = {
      // Convert String ID to Int index for Mllib
      model.userStringIntMap.get(query.user).map { userInt =>
        // create inverse view of itemStringIntMap
        val itemIntStringMap = model.itemStringIntMap.inverse
        // recommendProductsWithFilter() returns Array[MLlibRating], which uses item Int
        // index. Convert it to String ID for returning PredictedResult
        val blackList = query.blackList.flatMap(model.itemStringIntMap.get) // ADDED
        val itemScores = model
          .recommendProductsWithFilter(userInt, query.num, blackList) // MODIFIED
          .map (r => ItemScore(itemIntStringMap(r.product), r.rating))
        PredictedResult(itemScores)
      }.getOrElse{
        logger.info(s"No prediction for unknown user ${query.user}.")
        PredictedResult(Array.empty)
      }
    }
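The line marked ADDED converts the String item IDs in the query into the Int indices used by the model. The sketch below shows that conversion on its own, assuming a plain immutable Map as a hypothetical stand-in for model.itemStringIntMap; the object name, the helper toIndices, and the sample mappings are illustrative only.

```scala
object BlackListMappingSketch {
  // Hypothetical stand-in for model.itemStringIntMap: String item ID -> Int index.
  val itemStringIntMap: Map[String, Int] = Map("32" -> 0, "90" -> 1, "75" -> 2)

  // Map.get returns an Option[Int]. flatMap keeps the indices that were found
  // and silently drops IDs that are not in the map.
  def toIndices(blackList: Set[String]): Set[Int] =
    blackList.flatMap(id => itemStringIntMap.get(id))
}
```

One consequence of using flatMap here is that item IDs unknown to the model are ignored rather than causing an error: under the sample mappings, toIndices(Set("32", "unknown")) yields Set(0).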
Then we can build/train/deploy the engine and test the result:
The query
    curl \
      -H "Content-Type: application/json" \
      -d '{ "user": "1", "num": 4 }' \
      http://localhost:8000/queries.json
will return the result
    {
      "itemScores": [
        { "item": "32", "score": 13.405593705856901 },
        { "item": "90", "score": 10.980439687813178 },
        { "item": "75", "score": 10.748973860065737 },
        { "item": "1", "score": 9.769636099226231 }
      ]
    }
Let's say that the user has seen item 32. The query
    curl \
      -H "Content-Type: application/json" \
      -d '{ "user": "1", "num": 4, "blackList": ["32"] }' \
      http://localhost:8000/queries.json
will return the result
    {
      "itemScores": [
        { "item": "90", "score": 10.980439687813178 },
        { "item": "75", "score": 10.748973860065737 },
        { "item": "1", "score": 9.769636099226231 },
        { "item": "49", "score": 8.653951817512265 }
      ]
    }
without item 32.