layout: docs
title: Evaluation

Evaluation

In this tutorial, we will demonstrate how to implement an evaluation Metrics component to run Offline Evaluation for the Engine. We will continue to use the Item Recommendation Engine developed in Tutorial 1 as an example and implement a Metrics component that computes the Root Mean Square Error (RMSE).

Step 1 - Training and Test Set Split

To run Offline Evaluation, we need Training and Test Set data. We will modify DataSource.java to do a random split of the rating data to generate the Test Set. For demonstration purposes, the modified DataSource.java is placed under the tutorial3/ directory.

Recall that io.prediction.controller.java.LJavaDataSource takes the following type parameters:

public abstract class LJavaDataSource<DSP extends Params,DP,TD,Q,A>
  • DSP: DataSource Parameters class.
  • DP: Data Parameters class. It describes the generated Training Data and the Test Data (Query and Actual), and is used by the Metrics during evaluation.
  • TD: Training Data class.
  • Q: Input Query class.
  • A: Actual result class.

The Actual result is used by the Metrics to compare with the Prediction output and compute the score. In this tutorial, the Actual result is also the rating value, which is of type Float. Since we don't have any Data Parameters defined, we can simply use Object.

You can find the implementation in tutorial3/DataSource.java:

public class DataSource extends LJavaDataSource<
  DataSourceParams, Object, TrainingData, Query, Float> {
  //...
  @Override
  public Iterable<Tuple3<Object, TrainingData, Iterable<Tuple2<Query, Float>>>> read() {
    // ...
  }
}

As explained in earlier tutorials, the read() method should read data from the source (e.g. a database or text file) and return the Training Data (TD) and Test Data (Iterable[(Q, A)]), along with the Data Parameters (DP) associated with this Training and Test Data Set.

Note that the read() method's return type is Iterable because it can return one or more Training and Test Data Sets. For example, we may want to evaluate the engine with multiple iterations of random training and test data splits. In this case, each set corresponds to one such random split.

Note that the Test Data is actually an Iterable of input Query and Actual result. During evaluation, PredictionIO sends each Query to the engine and retrieves the Prediction output, which is evaluated against the Actual result by the Metrics.
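The snippet below is a minimal, self-contained sketch of the kind of random split read() performs. The Rating class, the field names, the test ratio, and the fixed seed are illustrative assumptions for this sketch; the actual tutorial3/DataSource.java additionally wraps the resulting training and test data in the Tuple3 structure shown above.

import java.util.List;
import java.util.Random;

// Minimal sketch of a random training/test split (illustrative only).
public class RandomSplitSketch {

  // Illustrative rating record; the actual tutorial classes may differ.
  public static class Rating {
    final int uid;     // user ID
    final int iid;     // item ID
    final float value; // rating value
    Rating(int uid, int iid, float value) {
      this.uid = uid;
      this.iid = iid;
      this.value = value;
    }
  }

  // Assigns each rating to the test set with probability testRatio,
  // otherwise to the training set.
  public static void split(List<Rating> ratings, double testRatio, long seed,
      List<Rating> trainingOut, List<Rating> testOut) {
    Random rand = new Random(seed);
    for (Rating r : ratings) {
      if (rand.nextDouble() < testRatio) {
        testOut.add(r);
      } else {
        trainingOut.add(r);
      }
    }
  }
}

Repeating a split like this with different seeds produces the multiple Training and Test Data Sets that read() returns as an Iterable.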

Step 2 - Metrics

We will implement a Root Mean Square Error (RMSE) metric. You can find the implementation in Metrics.java. The Metrics class extends io.prediction.controller.java.JavaMetrics, which requires the following type parameters:

public abstract class JavaMetrics<MP extends Params,DP,Q,P,A,MU,MR,MMR>
  • MP: Metrics Parameters class.
  • DP: Data Parameters class.
  • Q: Input Query class.
  • P: Prediction output class.
  • A: Actual result class.
  • MU: Metric Unit class.
  • MR: Metric Result class.
  • MMR: Multiple Metric Result class.

and overrides the following methods:

public abstract MU computeUnit(Q query, P predicted, A actual)

public abstract MR computeSet(DP dataParams, Iterable<MU> metricUnits)

public abstract MMR computeMultipleSets(Iterable<scala.Tuple2<DP,MR>> input)
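Putting these type parameters together for this tutorial's RMSE metric, the class declaration looks roughly like the sketch below. The concrete types are inferred from the method signatures that follow and the EmptyParams usage in Step 3; see tutorial3/Metrics.java for the exact code.

// Rough sketch of the class declaration (see tutorial3/Metrics.java for the actual code).
// MP = EmptyParams, DP = Object, Q = Query, P = Float, A = Float,
// MU = Double (square error), MR = Double (RMSE of a set), MMR = String (combined results).
public class Metrics extends JavaMetrics<
    EmptyParams, Object, Query, Float, Float, Double, Double, String> {
  // computeUnit(), computeSet(), and computeMultipleSets() are implemented below.
}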

The method computeUnit() computes the Metric Unit (MU) for each Prediction and Actual result of an input Query.

For this RMSE metric, computeUnit() returns the square error between each predicted rating and the actual rating.

@Override
public Double computeUnit(Query query, Float predicted, Float actual) {
  logger.info("Q: " + query.toString() + " P: " + predicted + " A: " + actual);
  // return squared error
  double error;
  if (predicted.isNaN())
    error = -actual; // treat a NaN prediction as a predicted rating of 0
  else
    error = predicted - actual;
  return (error * error);
}

The method computeSet() takes all of the Metric Units (MU) of the same set to compute the Metric Result (MR) for that set.

For this RMSE metric, computeSet() calculates the square root of the mean of all square errors in the same set and returns it.

@Override
public Double computeSet(Object dataParams, Iterable<Double> metricUnits) {
  double sum = 0.0;
  int count = 0;
  for (double squareError : metricUnits) {
    sum += squareError;
    count += 1;
  }
  return Math.sqrt(sum / count);
}
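Written out, with p_i the predicted rating and a_i the actual rating of the i-th test pair in a set of n pairs, the value returned is the usual RMSE:

\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(p_i - a_i)^2}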

The method computeMultipleSets() takes the Metric Results of all sets to do a final computation and returns the Multiple Metric Result. PredictionIO will display this final Multiple Metric Result in the terminal.

In this tutorial, it simply combines all Metric Results into a String and returns it.

@Override
public String computeMultipleSets(
  Iterable<Tuple2<Object, Double>> input) {
  return Arrays.toString(IteratorUtils.toArray(input.iterator()));
}

Step 3 - Run Evaluation

To run the evaluation with this metric, simply add the Metrics class to the JavaWorkflow.runEngine() call (as shown in Runner3.java).

Because our Metrics class doesn't take any parameters, the EmptyParams class is used.

JavaWorkflow.runEngine(
  (new EngineFactory()).apply(),
  engineParams,
  Metrics.class,
  new EmptyParams(),
  new WorkflowParamsBuilder().batch("MyEngine").verbose(3).build()
);

Execute the following command:

$ cd $PIO_HOME/examples
$ ../bin/pio run io.prediction.examples.java.recommendations.tutorial3.Runner3 -- -- data/test/ratings.csv

where $PIO_HOME is the root directory of the PredictionIO code tree.

You should see the following output when it finishes running.

2014-08-26 22:27:35,471 INFO  SparkContext - Job finished: collect at DebugWorkflow.scala:680, took 0.105194049 s
2014-08-26 22:27:35,720 WARN  APIDebugWorkflow$ - java.lang.String is not a NiceRendering instance.
2014-08-26 22:27:35,731 INFO  APIDebugWorkflow$ - Saved engine instance with ID: ka_oDJuLRnq3qDynEYcRCw

To view the Metric Result (RMSE score), start the dashboard with the pio dashboard command:

$ cd $PIO_HOME/examples
$ ../bin/pio dashboard

Then point your browser to http://localhost:9000 to view the result. You should see the result

[(null,1.0), (null,3.8078865529319543), (null,1.5811388300841898)]

on the page. Each element is a (Data Parameters, Metric Result) pair for one random training and test split; the Data Parameters are shown as null because we use Object and do not define any.

Step 4 - Running with the MovieLens 100K Data Set

Run the following to fetch the data set. The ml-100k data set will be downloaded into the data/ directory.

$ cd $PIO_HOME/examples
$ src/main/java/recommendations/fetch.sh

Re-run Runner3 with the ml-100k data set:

$ ../bin/pio run io.prediction.examples.java.recommendations.tutorial3.Runner3 -- -- data/ml-100k/u.data

You should see the following output when it finishes running.

2014-08-26 22:35:41,131 INFO  SparkContext - Job finished: collect at DebugWorkflow.scala:680, took 4.175164176 s
2014-08-26 22:35:41,511 WARN  APIDebugWorkflow$ - java.lang.String is not a NiceRendering instance.
2014-08-26 22:35:41,520 INFO  APIDebugWorkflow$ - Saved engine instance with ID: XG2sfCeXQ4WY2W1pXNSPCg

To view the Metric Result (RMSE score), start the dashboard with the pio dashboard command:

$ cd $PIO_HOME/examples
$ ../bin/pio dashboard

Then point your browser to http://localhost:9000 to view the result. You should see the result

[(null,1.052046904037191), (null,1.042766938101085), (null,1.0490312745374106)]

on the page.

Up to this point, you should be familiar with the basic components of PredictionIO (DataSource, Algorithm, and Metrics) and know how to develop your own algorithms and prediction engines, deploy them, and serve real-time prediction queries.

In the next tutorial, we will demonstrate how to use a Preparator to pre-process Training Data for the Algorithm, incorporate multiple Algorithms into the Engine, and create a custom Serving component.

Next: Combining Multiple Algorithms at Serving