
 // Licensed to the Apache Software Foundation (ASF) under one or more
 // contributor license agreements.  See the NOTICE file distributed with
 // this work for additional information regarding copyright ownership.
 // The ASF licenses this file to You under the Apache License, Version 2.0
 // (the "License"); you may not use this file except in compliance with
 // the License.  You may obtain a copy of the License at
 //
 // http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing, software
 // distributed under the License is distributed on an "AS IS" BASIS,
 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 // See the License for the specific language governing permissions and
 // limitations under the License.
 = Evaluator

 Apache Ignite ML provides a number of machine learning algorithms for learning from data and making predictions on it. Once a model has been trained, its performance must be evaluated against criteria that depend on the application and its requirements. For this purpose, Apache Ignite ML also provides a suite of classification and regression metrics.

 == Classification model evaluation

 Although there are many types of classification algorithms, the evaluation of classification models follows the same principles. In a supervised classification problem, each data point has a true output (the label) and a predicted output generated by the model. The result for each data point therefore falls into one of four categories:

 * True Positive (TP) - label is positive and prediction is also positive
 * True Negative (TN) - label is negative and prediction is also negative
 * False Positive (FP) - label is negative but prediction is positive
 * False Negative (FN) - label is positive but prediction is negative
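
The four categories above can be counted directly from paired labels and predictions. The following is a minimal standalone sketch in plain Java. `ConfusionCounts` and its `of` method are hypothetical names introduced for illustration and are not part of the Apache Ignite ML API. The sketch assumes that labels and predictions are encoded as `double` values and that the caller states which value denotes the positive class.

```java
/** Hypothetical helper for illustration only; not part of Apache Ignite ML. */
final class ConfusionCounts {
    final long tp;
    final long tn;
    final long fp;
    final long fn;

    private ConfusionCounts(long tp, long tn, long fp, long fn) {
        this.tp = tp;
        this.tn = tn;
        this.fp = fp;
        this.fn = fn;
    }

    /** Counts TP, TN, FP and FN for paired labels and predictions. */
    static ConfusionCounts of(double[] labels, double[] predictions, double positiveLabel) {
        if (labels.length != predictions.length)
            throw new IllegalArgumentException("labels and predictions must have the same length");

        long tp = 0, tn = 0, fp = 0, fn = 0;

        for (int i = 0; i < labels.length; i++) {
            boolean actualPositive = labels[i] == positiveLabel;
            boolean predictedPositive = predictions[i] == positiveLabel;

            if (actualPositive && predictedPositive)
                tp++; // label positive, prediction positive
            else if (!actualPositive && !predictedPositive)
                tn++; // label negative, prediction negative
            else if (predictedPositive)
                fp++; // label negative, prediction positive
            else
                fn++; // label positive, prediction negative
        }

        return new ConfusionCounts(tp, tn, fp, fn);
    }

    public static void main(String[] args) {
        double[] labels      = {1, 1, 1, 1, 0, 0, 0, 0, 0, 0};
        double[] predictions = {1, 1, 1, 0, 1, 1, 0, 0, 0, 0};

        ConfusionCounts c = ConfusionCounts.of(labels, predictions, 1.0);

        // Prints "TP=3 TN=4 FP=2 FN=1".
        System.out.println("TP=" + c.tp + " TN=" + c.tn + " FP=" + c.fp + " FN=" + c.fn);
    }
}
```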

 These four categories are the basis of the metrics used to evaluate binary classification.

 CAUTION: Multiclass classification evaluation is not yet supported in Apache Ignite ML.

 Apache Ignite ML supports the following binary classification metrics:

 * Accuracy
 * Balanced accuracy
 * F-Measure
 * FallOut
 * FN
 * FP
 * FDR
 * MissRate
 * NPV
 * Precision
 * Recall
 * Specificity
 * TN
 * TP

 Explanations and formulas for these metrics can be found in the Wikipedia article https://en.wikipedia.org/wiki/Evaluation_of_binary_classifiers[Evaluation of binary classifiers].
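
As a rough illustration of how the listed metrics relate to the four counts, the sketch below implements the textbook formulas in plain Java. `BinaryMetricsSketch` is a hypothetical class written for this page and is not the Apache Ignite ML implementation. It assumes the F-Measure is the F1 score (beta = 1), and it returns `NaN` when a denominator is zero; the library may handle both points differently.

```java
/** Hypothetical helper for illustration only; not part of Apache Ignite ML. */
final class BinaryMetricsSketch {
    private BinaryMetricsSketch() {
        // Static methods only.
    }

    /** Share of all predictions that are correct. */
    static double accuracy(long tp, long tn, long fp, long fn) {
        return (double) (tp + tn) / (tp + tn + fp + fn);
    }

    /** Share of positive predictions that are correct. */
    static double precision(long tp, long fp) {
        return (double) tp / (tp + fp);
    }

    /** Share of positive labels that are found (true positive rate). */
    static double recall(long tp, long fn) {
        return (double) tp / (tp + fn);
    }

    /** Share of negative labels that are found (true negative rate). */
    static double specificity(long tn, long fp) {
        return (double) tn / (tn + fp);
    }

    /** False positive rate, equal to 1 - specificity. */
    static double fallOut(long fp, long tn) {
        return (double) fp / (fp + tn);
    }

    /** False negative rate, equal to 1 - recall. */
    static double missRate(long fn, long tp) {
        return (double) fn / (fn + tp);
    }

    /** False discovery rate, equal to 1 - precision. */
    static double fdr(long fp, long tp) {
        return (double) fp / (fp + tp);
    }

    /** Negative predictive value: share of negative predictions that are correct. */
    static double npv(long tn, long fn) {
        return (double) tn / (tn + fn);
    }

    /** Mean of recall and specificity. */
    static double balancedAccuracy(long tp, long tn, long fp, long fn) {
        return (recall(tp, fn) + specificity(tn, fp)) / 2.0;
    }

    /** F1 score: harmonic mean of precision and recall (assumes beta = 1). */
    static double f1(long tp, long fp, long fn) {
        return 2.0 * tp / (2.0 * tp + fp + fn);
    }

    public static void main(String[] args) {
        long tp = 3, tn = 4, fp = 2, fn = 1;

        System.out.println("accuracy  = " + accuracy(tp, tn, fp, fn));
        System.out.println("precision = " + precision(tp, fp));
        System.out.println("recall    = " + recall(tp, fn));
        System.out.println("F1        = " + f1(tp, fp, fn));
    }
}
```

With TP = 3, TN = 4, FP = 2 and FN = 1, accuracy is 7/10, precision is 3/5, recall is 3/4, and F1 is 6/9.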


 [source, java]
 ----
 // Define the vectorizer.
 Vectorizer<Integer, Vector, Integer, Double> vectorizer = new DummyVectorizer<Integer>()
    .labeled(Vectorizer.LabelCoordinate.FIRST);

 // Define the trainer.
 SVMLinearClassificationTrainer trainer = new SVMLinearClassificationTrainer();

 // Train the model.
 SVMLinearClassificationModel mdl = trainer.fit(ignite, dataCache, vectorizer);

 // Calculate all classification metrics.
 EvaluationResult res = Evaluator
   .evaluateBinaryClassification(dataCache, mdl, vectorizer);

 // Read a single metric from the evaluation result.
 double accuracy = res.get(MetricName.ACCURACY);
 ----


 == Regression model evaluation

 Regression analysis is used to predict a continuous output variable from a number of independent variables.

 The full list of regression metrics supported in Apache Ignite ML is as follows:

 * MAE
 * R2
 * RMSE
 * RSS
 * MSE
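
For orientation, the abbreviations above stand for mean absolute error (MAE), coefficient of determination (R2), root mean squared error (RMSE), residual sum of squares (RSS), and mean squared error (MSE). The sketch below computes them with the standard textbook definitions in plain Java. `RegressionMetricsSketch` is a hypothetical class written for this page, not the Apache Ignite ML implementation, and the library's handling of edge cases such as empty input or constant labels may differ.

```java
/** Hypothetical helper for illustration only; not part of Apache Ignite ML. */
final class RegressionMetricsSketch {
    private RegressionMetricsSketch() {
        // Static methods only.
    }

    /** Residual sum of squares: sum of (label - prediction)^2. */
    static double rss(double[] labels, double[] predictions) {
        checkSameLength(labels, predictions);

        double sum = 0.0;
        for (int i = 0; i < labels.length; i++) {
            double residual = labels[i] - predictions[i];
            sum += residual * residual;
        }
        return sum;
    }

    /** Mean squared error: RSS divided by the number of data points. */
    static double mse(double[] labels, double[] predictions) {
        return rss(labels, predictions) / labels.length;
    }

    /** Root mean squared error: square root of MSE. */
    static double rmse(double[] labels, double[] predictions) {
        return Math.sqrt(mse(labels, predictions));
    }

    /** Mean absolute error: mean of |label - prediction|. */
    static double mae(double[] labels, double[] predictions) {
        checkSameLength(labels, predictions);

        double sum = 0.0;
        for (int i = 0; i < labels.length; i++)
            sum += Math.abs(labels[i] - predictions[i]);
        return sum / labels.length;
    }

    /** Coefficient of determination: 1 - RSS / TSS, where TSS is the total sum of squares. */
    static double r2(double[] labels, double[] predictions) {
        double mean = 0.0;
        for (double label : labels)
            mean += label;
        mean /= labels.length;

        double tss = 0.0;
        for (double label : labels)
            tss += (label - mean) * (label - mean);

        return 1.0 - rss(labels, predictions) / tss;
    }

    private static void checkSameLength(double[] labels, double[] predictions) {
        if (labels.length != predictions.length)
            throw new IllegalArgumentException("labels and predictions must have the same length");
    }

    public static void main(String[] args) {
        double[] labels      = {3, 5, 7, 9};
        double[] predictions = {2, 5, 8, 11};

        System.out.println("MAE  = " + mae(labels, predictions));
        System.out.println("RSS  = " + rss(labels, predictions));
        System.out.println("MSE  = " + mse(labels, predictions));
        System.out.println("RMSE = " + rmse(labels, predictions));
        System.out.println("R2   = " + r2(labels, predictions));
    }
}
```

For the sample data, the residuals are 1, 0, -1 and -2, so RSS is 6, MSE is 1.5, and MAE is 1. The labels have a mean of 6 and a total sum of squares of 20, which gives R2 = 1 - 6/20 = 0.7.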


 [source, java]
 ----
 // Define the vectorizer.
 Vectorizer<Integer, Vector, Integer, Double> vectorizer = new DummyVectorizer<Integer>()
    .labeled(Vectorizer.LabelCoordinate.FIRST);

 // Define the trainer.
 KNNRegressionTrainer trainer = new KNNRegressionTrainer()
     .withK(5)
     .withDistanceMeasure(new ManhattanDistance())
     .withIdxType(SpatialIndexType.BALL_TREE)
     .withWeighted(true);

 // Train the model.
 KNNRegressionModel knnMdl = trainer.fit(ignite, dataCache, vectorizer);

 // Calculate all regression metrics.
 EvaluationResult res = Evaluator
   .evaluateRegression(dataCache, knnMdl, vectorizer);

 double mse = res.get(MetricName.MSE);
 ----