blob: 0660f41c65c30be82b28f9fd9bdc54cd3ba7b713 [file] [log] [blame]
// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
= Evaluator
Apache Ignite ML comes with a number of machine learning algorithms that can be used to learn from and make predictions on data. When these algorithms are applied to build machine learning models, there is a need to evaluate the performance of the model on some criteria, which depends on the application and its requirements. Apache Ignite ML also provides a suite of classification and regression metrics for the purpose of evaluating the performance of machine learning models.
== Classification model evaluation
While there are many different types of classification algorithms, the evaluation of classification models all share similar principles. In a supervised classification problem, there exists a true output and a model-generated predicted output for each data point. For this reason, the results for each data point can be assigned to one of four categories:
* True Positive (TP) - label is positive and prediction is also positive
* True Negative (TN) - label is negative and prediction is also negative
* False Positive (FP) - label is negative but prediction is positive
* False Negative (FN) - label is positive but prediction is negative
Especially, these metrics are important for binary classification.
CAUTION: Multiclass classification evalution is not supported yet in Apache Ignite ML.
The full list of binary classification metrics supported in Apache Ignite ML is next:
* Accuracy
* Balanced accuracy
* F-Measure
* FallOut
* FN
* FP
* FDR
* MissRate
* NPV
* Precision
* Recall
* Specificity
* TN
* TP
The explanation and formulas for these metrics can be found https://en.wikipedia.org/wiki/Evaluation_of_binary_classifiers[here].
[source, java]
----
// Define the vectorizer.
Vectorizer<Integer, Vector, Integer, Double> vectorizer = new DummyVectorizer<Integer>()
.labeled(Vectorizer.LabelCoordinate.FIRST);
// Define the trainer.
SVMLinearClassificationTrainer trainer = new SVMLinearClassificationTrainer();
// Train the model.
SVMLinearClassificationModel mdl = trainer.fit(ignite, dataCache, vectorizer);
// Calculate all classification metrics.
EvaluationResult res = Evaluator
.evaluateBinaryClassification(dataCache, mdl, vectorizer);
double accuracy = res.get(MetricName.ACCURACY)
----
== Regression model evaluation
Regression analysis is used when predicting a continuous output variable from a number of independent variables.
The full list of regression metrics supported in Apache Ignite ML is as follows:
* MAE
* R2
* RMSE
* RSS
* MSE
[source, java]
----
// Define the vectorizer.
Vectorizer<Integer, Vector, Integer, Double> vectorizer = new DummyVectorizer<Integer>()
.labeled(Vectorizer.LabelCoordinate.FIRST);
// Define the trainer.
KNNRegressionTrainer trainer = new KNNRegressionTrainer()
.withK(5)
.withDistanceMeasure(new ManhattanDistance())
.withIdxType(SpatialIndexType.BALL_TREE)
.withWeighted(true);
// Train the model.
KNNRegressionModel knnMdl = trainer.fit(ignite, dataCache, vectorizer);
// Calculate all classification metrics.
EvaluationResult res = Evaluator
.evaluateRegression(dataCache, mdl, vectorizer);
double mse = res.get(MetricName.MSE);
----