---
layout: docs
title: Testing Engine Components
---

Testing Engine Components

During development, you may want to run each component step by step to test out the data pipeline. This tutorial demonstrates an easy way to do that.

Test Run DataSource

In src/main/java/recommendations/tutorial2, you can find Runner1.java. It is a small program that uses JavaSimpleEngineBuilder to build an engine and JavaWorkflow to run the workflow.

To test the DataSource component, we can simply create an Engine with the DataSource component only and leave the other components unspecified:

private static class HalfBakedEngineFactory implements IJavaEngineFactory {
  public JavaSimpleEngine<TrainingData, Object, Query, Float, Object> apply() {
    return new JavaSimpleEngineBuilder<
      TrainingData, Object, Query, Float, Object> ()
      .dataSourceClass(DataSource.class) // specify only the DataSource
      .build();
  }
}

Similarly, we only need to add DataSourceParams to the JavaEngineParamsBuilder:

JavaEngineParams engineParams = new JavaEngineParamsBuilder()
  .dataSourceParams(new DataSourceParams(filePath))
  .build();
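
For context, the DataSourceParams passed above is just a thin parameter holder defined in tutorial 1. Conceptually it is no more than the following sketch (the field name and Params import reflect our assumption of the tutorial 1 definition):

import io.prediction.controller.Params;

// Sketch of DataSourceParams; the actual class lives in tutorial1
// and may differ in detail.
public class DataSourceParams implements Params {
  public final String filePath; // path to the ratings CSV file

  public DataSourceParams(String filePath) {
    this.filePath = filePath;
  }
}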

Then, you can run this Engine using JavaWorkflow:

    JavaWorkflow.runEngine(
      (new HalfBakedEngineFactory()).apply(),
      engineParams,
      null,              // no metrics class: skip evaluation in this test run
      new EmptyParams(), // matching empty metrics parameters
      new WorkflowParamsBuilder().batch("MyEngine").verbose(3).build()
    );

For quick testing purposes, a very simple set of test data is provided in data/test/ratings.csv. Each row of the file represents a user ID, an item ID, and a rating value:

1,1,2
1,2,3
1,3,4
...
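
If you want to double-check how such a file is interpreted, the standalone sketch below parses it into (user ID, item ID, rating) triples in the same way. The class name RatingsPreview is ours, purely for illustration; the real parsing lives inside the tutorial's DataSource:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class RatingsPreview {
  public static void main(String[] args) throws IOException {
    // Read the CSV given as the first argument and print each
    // (user ID, item ID, rating) triple.
    try (BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
      String line;
      while ((line = reader.readLine()) != null) {
        String[] tokens = line.split(",");
        int uid = Integer.parseInt(tokens[0]);
        int iid = Integer.parseInt(tokens[1]);
        float rating = Float.parseFloat(tokens[2]); // "2" becomes 2.0
        System.out.println("user=" + uid + " item=" + iid + " rating=" + rating);
      }
    }
  }
}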

Runner1.java takes the path of the rating file as its argument. Execute the following commands to run it (the ../bin/pio run command automatically compiles and packages the JARs):

$ cd $PIO_HOME/examples
$ ../bin/pio run io.prediction.examples.java.recommendations.tutorial2.Runner1 -- -- data/test/ratings.csv

where $PIO_HOME is the root directory of the PredictionIO code tree. The two -- separate the parameters passed to pio run (the Runner1 class in this case), the parameters passed to Apache Spark (none in this case), and the parameters passed to the main class (the CSV file in this case).

If it runs successfully, you should see console output similar to the following at the end. It prints out the TrainingData generated by the DataSource.

2014-08-05 15:24:40,140 INFO  SparkContext - Job finished: collect at DebugWorkflow.scala:411, took 0.022947 s
2014-08-05 15:24:40,141 INFO  APIDebugWorkflow$ - Data Set 0
2014-08-05 15:24:40,142 INFO  APIDebugWorkflow$ - Params: null
2014-08-05 15:24:40,142 INFO  APIDebugWorkflow$ - TrainingData:
2014-08-05 15:24:40,142 INFO  APIDebugWorkflow$ - [[(1,1,2.0), (1,2,3.0), (1,3,4.0), (2,3,4.0), (2,4,1.0), (3,2,2.0), (3,3,1.0), (3,4,3.0), (4,1,5.0), (4,2,3.0), (4,4,2.0)]]
2014-08-05 15:24:40,143 INFO  APIDebugWorkflow$ - TestingData: (count=0)
2014-08-05 15:24:40,143 INFO  APIDebugWorkflow$ - Data source complete
2014-08-05 15:24:40,143 INFO  APIDebugWorkflow$ - Preparator is null. Stop here

As you can see, the workflow stops after running the DataSource component and prints out the TrainingData for debugging.

Test Run Algorithm

By adding addAlgorithmClass() and addAlgorithmParams() to the JavaSimpleEngineBuilder and JavaEngineParamsBuilder respectively, you can test the Algorithm class in the workflow as well, as shown in Runner2.java:

private static class HalfBakedEngineFactory implements IJavaEngineFactory {
  public JavaSimpleEngine<TrainingData, Object, Query, Float, Object> apply() {
    return new JavaSimpleEngineBuilder<
      TrainingData, Object, Query, Float, Object> ()
      .dataSourceClass(DataSource.class)
      .preparatorClass() // Use default Preparator
      .addAlgorithmClass("MyRecommendationAlgo", Algorithm.class) // Add Algorithm
      .build();
  }
}

JavaEngineParams engineParams = new JavaEngineParamsBuilder()
  .dataSourceParams(new DataSourceParams(filePath))
  .addAlgorithmParams("MyRecommendationAlgo", new AlgoParams(0.2)) // Add Algorithm Params
  .build();
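
For reference, the Algorithm class registered above is implemented in tutorial 1. Assuming the LJavaAlgorithm base class from io.prediction.controller.java, its overall shape is roughly the skeleton below; the method bodies here are placeholders, not the real logic:

import io.prediction.controller.java.LJavaAlgorithm;

// Skeleton only; see tutorial1 for the actual implementation.
public class Algorithm
    extends LJavaAlgorithm<AlgoParams, TrainingData, Model, Query, Float> {

  private final AlgoParams params;

  public Algorithm(AlgoParams params) {
    this.params = params; // e.g. the 0.2 value passed in above
  }

  @Override
  public Model train(TrainingData data) {
    // Build item-to-item similarities and per-user rating histories.
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public Float predict(Model model, Query query) {
    // Score the queried item for the queried user using the model.
    throw new UnsupportedOperationException("sketch only");
  }
}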

Execute the following commands to run:

$ cd $PIO_HOME/examples
$ ../bin/pio run io.prediction.examples.java.recommendations.tutorial2.Runner2 -- -- data/test/ratings.csv

You should see the Model generated by the Algorithm at the end of the console output:

2014-08-26 21:17:28,174 INFO  SparkContext - Job finished: collect at DebugWorkflow.scala:71, took 0.051342917 s
2014-08-26 21:17:28,174 INFO  APIDebugWorkflow$ - [Model: [itemSimilarity: {1=org.apache.commons.math3.linear.OpenMapRealVector@65fa6c0, 2=org.apache.commons.math3.linear.OpenMapRealVector@c2eb7f66, 3=org.apache.commons.math3.linear.OpenMapRealVector@2302395e, 4=org.apache.commons.math3.linear.OpenMapRealVector@d2fb7858}]
[userHistory: {1=org.apache.commons.math3.linear.OpenMapRealVector@5a1123a3, 2=org.apache.commons.math3.linear.OpenMapRealVector@d1225bfd, 3=org.apache.commons.math3.linear.OpenMapRealVector@572123a3, 4=org.apache.commons.math3.linear.OpenMapRealVector@a51523a3}]]
2014-08-26 21:17:28,175 INFO  APIDebugWorkflow$ - Serving is null. Stop here

By adding each component step by step, we can easily test and debug the data pipeline.

Next: Evaluation