---
title: Building Evaluation Metrics
---

<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

PredictionIO enables developers to implement a custom evaluation metric with
just a few lines of code.
We illustrate this with [the classification
template](/templates/classification/quickstart/).

## Overview

In its simplest form, a metric is a function which takes a
`(Query, PredictedResult, ActualResult)`-tuple (*QPA-tuple*) as input
and returns a score.
Exploiting this property allows us to implement a custom metric with a single
line of code (plus some boilerplate). We demonstrate this with two metrics:
accuracy and precision.

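As a purely illustrative sketch of this shape, the snippet below uses
simplified stand-ins for the template's `Query`, `PredictedResult`, and
`ActualResult` classes (they are not the PredictionIO API) and scores a single
QPA-tuple; the helper classes introduced in the examples below wrap exactly
this kind of per-tuple function and aggregate the scores over all test data
points.

```scala
// Simplified stand-ins for the template's Query, PredictedResult and
// ActualResult classes, used only for this illustration.
case class Query(features: Array[Double])
case class PredictedResult(label: Double)
case class ActualResult(label: Double)

object MetricShape {
  // In its simplest form, a metric is just a function scoring one QPA-tuple;
  // an evaluator then aggregates these scores over all test data points.
  def perTupleScore(query: Query, predicted: PredictedResult, actual: ActualResult): Double =
    if (predicted.label == actual.label) 1.0 else 0.0
}
```
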
<!--
(Note: This simple form may not be able to handle metrics which require
multi-stage computation, for example root-mean-square-error.)
-->


## Example 1: Accuracy Metric

Accuracy is a metric capturing the proportion of correct predictions among all
test data points. One way to model this is to give each correct QPA-tuple a
score of 1.0 and each incorrect one a score of 0.0, then take the average of
all tuple scores.

PredictionIO has an [[AverageMetric]] helper class which provides this feature.
This class takes 4 type parameters, [[EvalInfo]], [[Query]],
[[PredictedResult]], and [[ActualResult]]; these types can be found in the
engine's signature. The body of the `calculate` method below is the custom
calculation.

```scala
case class Accuracy()
  extends AverageMetric[EmptyEvaluationInfo, Query, PredictedResult, ActualResult] {
  def calculate(query: Query, predicted: PredictedResult, actual: ActualResult)
  : Double =
    (if (predicted.label == actual.label) 1.0 else 0.0)
}
```

Once we define a metric, we tell PredictionIO to use it in an `Evaluation`
object.

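A minimal sketch of such an `Evaluation` object, following the same pattern as
the `PrecisionEvaluation` object shown later on this page
(`ClassificationEngine()` is the engine factory from the template; see the
template's ***Evaluation.scala*** for the exact contents):

```scala
object AccuracyEvaluation extends Evaluation {
  // Pair the engine factory with the Accuracy metric defined above.
  engineMetric = (ClassificationEngine(), new Accuracy())
}
```

We can then run the following commands to kick-start the evaluation:
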
```
$ pio build
...
$ pio eval org.example.classification.AccuracyEvaluation org.example.classification.EngineParamsList
...
```

(See MyClassification/src/main/scala/***Evaluation.scala*** for full usage.)


## Example 2: Precision Metric

Precision is a metric for binary classifiers capturing the proportion of
correct predictions among all *positive* predictions.
We don't care about the cases where the QPA-tuple gives a negative prediction.
(Recall that a binary classifier only provides two output values: *positive*
and *negative*.)
The following table illustrates all four cases:

| PredictedResult | ActualResult | Value |
| :----: | :----: | :----: |
| Positive | Positive | 1.0 |
| Positive | Negative | 0.0 |
| Negative | Positive | Don't care |
| Negative | Negative | Don't care |

Calculating the precision metric is slightly more involved than calculating the
accuracy metric, as we have to handle the *don't care* negative cases
specially.

PredictionIO provides a helper class `OptionAverageMetric` that allows users to
specify *don't care* values as `None`; it only aggregates the non-`None` values.
The key difference in the signature of the `calculate` method is that the
return type is `Option[Double]`, in contrast to `Double` for `AverageMetric`,
and this class only computes the average of the `Some(.)` results. In the
method body, the outer `if` singles out the positively predicted case, where
the computation is similar to the accuracy metric. The negatively predicted
cases are the *don't cares*, for which we return `None`.

```scala
case class Precision(label: Double)
  extends OptionAverageMetric[EmptyEvaluationInfo, Query, PredictedResult, ActualResult] {
  def calculate(query: Query, predicted: PredictedResult, actual: ActualResult)
  : Option[Double] = {
    if (predicted.label == label) {
      if (predicted.label == actual.label) {
        Some(1.0)  // True positive
      } else {
        Some(0.0)  // False positive
      }
    } else {
      None  // Unrelated case for calculating precision
    }
  }
}
```

We define a new `Evaluation` object to tell PredictionIO how to use this
new precision metric.

```scala
object PrecisionEvaluation extends Evaluation {
  engineMetric = (ClassificationEngine(), new Precision(label = 1.0))
}
```

We can kick-start the evaluation with the commands below. Notice that we are
reusing the same engine params list as before (sketched after this paragraph);
keeping the metric definition separate from the engine params list preserves
the separation of concerns when we conduct hyperparameter tuning.

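For reference, an engine params list along the following lines would produce
the three iterations seen in the output below. Treat it as a sketch rather
than the authoritative file: `DataSourceParams` and `AlgorithmParams` are the
parameter case classes defined in the template, and the `appId`, `evalK`, and
`lambda` values are taken from the evaluation output that follows.

```scala
object EngineParamsList extends EngineParamsGenerator {
  // Base engine params: the app to read data from and the k used for
  // k-fold cross-validation.
  private[this] val baseEP = EngineParams(
    dataSourceParams = DataSourceParams(appId = 19, evalK = Some(5)))

  // Evaluate three engine params, each differing only in the algorithm's lambda.
  engineParamsList = Seq(
    baseEP.copy(algorithmParamsList = Seq(("naive", AlgorithmParams(10.0)))),
    baseEP.copy(algorithmParamsList = Seq(("naive", AlgorithmParams(100.0)))),
    baseEP.copy(algorithmParamsList = Seq(("naive", AlgorithmParams(1000.0)))))
}
```
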
```
$ pio build
...
$ pio eval org.example.classification.PrecisionEvaluation org.example.classification.EngineParamsList
...
[INFO] [CoreWorkflow$] Starting evaluation instance ID: SMhzYbJ9QgKkD0fQzTA7MA
...
[INFO] [MetricEvaluator] Iteration 0
[INFO] [MetricEvaluator] EngineParams: {"dataSourceParams":{"":{"appId":19,"evalK":5}},"preparatorParams":{"":{}},"algorithmParamsList":[{"naive":{"lambda":10.0}}],"servingParams":{"":{}}}
[INFO] [MetricEvaluator] Result: MetricScores(0.8846153846153846,List())
[INFO] [MetricEvaluator] Iteration 1
[INFO] [MetricEvaluator] EngineParams: {"dataSourceParams":{"":{"appId":19,"evalK":5}},"preparatorParams":{"":{}},"algorithmParamsList":[{"naive":{"lambda":100.0}}],"servingParams":{"":{}}}
[INFO] [MetricEvaluator] Result: MetricScores(0.7936507936507936,List())
[INFO] [MetricEvaluator] Iteration 2
[INFO] [MetricEvaluator] EngineParams: {"dataSourceParams":{"":{"appId":19,"evalK":5}},"preparatorParams":{"":{}},"algorithmParamsList":[{"naive":{"lambda":1000.0}}],"servingParams":{"":{}}}
[INFO] [MetricEvaluator] Result: MetricScores(0.37593984962406013,List())
[INFO] [CoreWorkflow$] Updating evaluation instance with result: MetricEvaluatorResult:
# engine params evaluated: 3
Optimal Engine Params:
{
  "dataSourceParams":{
    "":{
      "appId":19,
      "evalK":5
    }
  },
  "preparatorParams":{
    "":{

    }
  },
  "algorithmParamsList":[
    {
      "naive":{
        "lambda":10.0
      }
    }
  ],
  "servingParams":{
    "":{

    }
  }
}
Metrics:
org.example.classification.Precision: 0.8846153846153846
```

(See MyClassification/src/main/scala/***PrecisionEvaluation.scala*** for
the full usage.)