This example engine is based on Classification Tempplate version v0.1.1 and is modified to use Random Forest algorithm and demonstrates how to use custom attributes for classification.
Please refer to http://docs.prediction.io/templates/classification/quickstart/
predict the service plan (plan) a user will subscribe to based on his 3 properties: Gender, Age and Education
Prepared below input data in the format “Plan, Gender, Age, Education”
0,Male,30,College 0,Female,29,College 1,Male,30,High School 1,Female,28,High School 2,Male,35,No School 2,Male,40,No School 2,Female,25,No School 0,Male,40,College
Input data is available in data/sample_random_forest_data.txt
Updated “import_eventserver.py” withe below changes
parser.add_argument('--file', default="./data/sample_random_forest_data.txt")
client.create_event( event="$set", entity_type="user", entity_id=str(count), # use the count num as user ID properties= { "gender" : data[1], "age" : int(data[2]), "education" : data[3], "plan" : int(data[0]) } )
In the engine.json, removed “naive” algorithm and added “randomforest” algorithm.
"algorithms": [ { "name": "randomforest", "params": { "numClasses": 3, "numTrees": 5, "featureSubsetStrategy": "auto", "impurity": "gini", "maxDepth": 4, "maxBins": 100 } } ]
val gendersMap = Map("Male" -> 0.0, "Female" -> 1.0) val educationMap = Map("No School" -> 0.0,"High School" -> 1.0,"College" -> 2.0)
Then encoded the categorical features values using map.
LabeledPoint(properties.get[Double]("plan"), Vectors.dense(Array( gendersMap(properties.get[String]("gender")), properties.get[Double]("age"), educationMap(properties.get[String]("education")) )) )
class TrainingData( val labeledPoints: RDD[LabeledPoint], val gendersMap: Map[String,Double], val educationMap: Map[String,Double] ) extends Serializable
readTraining returns below:
new TrainingData(labeledPoints, gendersMap, educationMap)
In the Engine.scala, replaced “naive” algorithm with “randomforest” algorithm
Map("randomforest" -> classOf[RandomForestAlgorithm]),
Updated Query.sclaa to include attributes
class Query( val gender: String, val age: Int, val education: String ) extends Serializable
added attributes to PreparedData
class PreparedData( val labeledPoints: RDD[LabeledPoint], val gendersMap: Map[String,Double], val educationMap: Map[String,Double] ) extends Serializable
new PreparedData(trainingData.labeledPoints,trainingData.gendersMap,trainingData.educationMap)
created new model class
class PIORandomForestModel( val gendersMap: Map[String, Double], val educationMap: Map[String, Double], val randomForestModel: RandomForestModel ) extends Serializable
train method returns new model class
new PIORandomForestModel( gendersMap = data.gendersMap, educationMap = data.educationMap, randomForestModel = m )
Predict method implementation
def predict( model: PIORandomForestModel, // CHANGED query: Query): PredictedResult = { val gendersMap = model.gendersMap val educationMap = model.educationMap val randomForestModel = model.randomForestModel val label = randomForestModel.predict(Vectors.dense(Array(gendersMap(query.gender),query.age.toDouble,educationMap(query.education)))) new PredictedResult(label) }
curl -H "Content-Type: application/json" -d '{ "gender":"Male", "age":30, "education":"College" }' http://localhost:8000/queries.json curl -H "Content-Type: application/json" -d '{ "gender":"Female", "age":35, "education":"No School" }' http://localhost:8000/queries.json