| // Licensed to the Apache Software Foundation (ASF) under one or more |
| // contributor license agreements. See the NOTICE file distributed with |
| // this work for additional information regarding copyright ownership. |
| // The ASF licenses this file to You under the Apache License, Version 2.0 |
| // (the "License"); you may not use this file except in compliance with |
| // the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, software |
| // distributed under the License is distributed on an "AS IS" BASIS, |
| // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| // See the License for the specific language governing permissions and |
| // limitations under the License. |
| = Evaluator |
| |
| Apache Ignite ML comes with a number of machine learning algorithms that can be used to learn from data and make predictions on it. Once a model has been built with one of these algorithms, its performance needs to be evaluated against criteria that depend on the application and its requirements. For this purpose, Apache Ignite ML provides a suite of classification and regression metrics. |
| |
| == Classification model evaluation |
| |
| While there are many different types of classification algorithms, the evaluation of classification models follows similar principles. In a supervised classification problem, each data point has a true output (the label) and a predicted output generated by the model. The result for each data point can therefore be assigned to one of four categories: |
| |
| * True Positive (TP) - label is positive and prediction is also positive |
| * True Negative (TN) - label is negative and prediction is also negative |
| * False Positive (FP) - label is negative but prediction is positive |
| * False Negative (FN) - label is positive but prediction is negative |
| |
| These categories are especially important for binary classification. |
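
The four categories can be illustrated with a small, library-independent sketch. The class `ConfusionMatrixSketch` and its `count` method are hypothetical names used only for illustration and are not part of Apache Ignite ML; the sketch also assumes that the positive class is encoded as `1.0` and every other value is negative.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of Apache Ignite ML. */
final class ConfusionMatrixSketch {
    /**
     * Counts the four outcome categories for a binary problem.
     * Assumption: 1.0 encodes the positive class, any other value is negative.
     *
     * @return counts in the order {TP, TN, FP, FN}
     */
    static long[] count(double[] labels, double[] predictions) {
        if (labels.length != predictions.length)
            throw new IllegalArgumentException("labels and predictions must have the same length");

        long tp = 0, tn = 0, fp = 0, fn = 0;

        for (int i = 0; i < labels.length; i++) {
            boolean actual = labels[i] == 1.0;
            boolean predicted = predictions[i] == 1.0;

            if (actual && predicted)
                tp++; // label positive, prediction positive
            else if (!actual && !predicted)
                tn++; // label negative, prediction negative
            else if (!actual)
                fp++; // label negative, prediction positive
            else
                fn++; // label positive, prediction negative
        }

        return new long[] {tp, tn, fp, fn};
    }

    public static void main(String[] args) {
        double[] labels      = {1, 0, 1, 1, 0, 0};
        double[] predictions = {1, 0, 0, 1, 1, 0};

        // Prints [2, 2, 1, 1]: two TP, two TN, one FP, one FN.
        System.out.println(Arrays.toString(count(labels, predictions)));
    }
}
```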
| |
| CAUTION: Multiclass classification evaluation is not yet supported in Apache Ignite ML. |
| |
| The full list of binary classification metrics supported in Apache Ignite ML is as follows: |
| |
| * Accuracy |
| * Balanced accuracy |
| * F-Measure |
| * FallOut |
| * FN |
| * FP |
| * FDR |
| * MissRate |
| * NPV |
| * Precision |
| * Recall |
| * Specificity |
| * TN |
| * TP |
| |
| Explanations and formulas for these metrics can be found in the Wikipedia article https://en.wikipedia.org/wiki/Evaluation_of_binary_classifiers[Evaluation of binary classifiers]. |
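
As a rough illustration of how several of these metrics derive from the TP, TN, FP and FN counts, the sketch below applies the textbook formulas in plain Java. `BinaryMetricsSketch` is a hypothetical class, not an Apache Ignite ML API, and the library's own implementations may treat edge cases differently. Here a zero denominator simply yields `NaN`.

```java
/** Hypothetical helper for illustration only; not part of Apache Ignite ML. */
final class BinaryMetricsSketch {
    /** Share of all predictions that are correct: (TP + TN) / (TP + TN + FP + FN). */
    static double accuracy(long tp, long tn, long fp, long fn) {
        return (double)(tp + tn) / (tp + tn + fp + fn);
    }

    /** Share of positive predictions that are correct: TP / (TP + FP). */
    static double precision(long tp, long fp) {
        return (double)tp / (tp + fp);
    }

    /** Share of actual positives that are found: TP / (TP + FN). */
    static double recall(long tp, long fn) {
        return (double)tp / (tp + fn);
    }

    /** Share of actual negatives that are found: TN / (TN + FP). */
    static double specificity(long tn, long fp) {
        return (double)tn / (tn + fp);
    }

    /** Harmonic mean of precision and recall (the F1 variant of the F-measure). */
    static double fMeasure(long tp, long fp, long fn) {
        double p = precision(tp, fp);
        double r = recall(tp, fn);

        return 2 * p * r / (p + r);
    }

    /** Mean of recall and specificity. */
    static double balancedAccuracy(long tp, long tn, long fp, long fn) {
        return (recall(tp, fn) + specificity(tn, fp)) / 2;
    }

    public static void main(String[] args) {
        long tp = 6, tn = 10, fp = 2, fn = 2;

        System.out.println("accuracy    = " + accuracy(tp, tn, fp, fn));
        System.out.println("precision   = " + precision(tp, fp));
        System.out.println("recall      = " + recall(tp, fn));
        System.out.println("specificity = " + specificity(tn, fp));
        System.out.println("F-measure   = " + fMeasure(tp, fp, fn));
        System.out.println("balanced    = " + balancedAccuracy(tp, tn, fp, fn));
    }
}
```

With the counts used in `main` (6 TP, 10 TN, 2 FP, 2 FN), accuracy is 16/20 = 0.8 and both precision and recall are 6/8 = 0.75.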
| |
| |
| [source, java] |
| ---- |
| // Define the vectorizer. |
| Vectorizer<Integer, Vector, Integer, Double> vectorizer = new DummyVectorizer<Integer>() |
| .labeled(Vectorizer.LabelCoordinate.FIRST); |
| |
| // Define the trainer. |
| SVMLinearClassificationTrainer trainer = new SVMLinearClassificationTrainer(); |
| |
| // Train the model. |
| SVMLinearClassificationModel mdl = trainer.fit(ignite, dataCache, vectorizer); |
| |
| // Calculate all classification metrics. |
| EvaluationResult res = Evaluator |
| .evaluateBinaryClassification(dataCache, mdl, vectorizer); |
| |
| double accuracy = res.get(MetricName.ACCURACY); |
| ---- |
| |
| |
| == Regression model evaluation |
| |
| Regression analysis is used when predicting a continuous output variable from a number of independent variables. |
| |
| The full list of regression metrics supported in Apache Ignite ML is as follows: |
| |
| * MAE |
| * R2 |
| * RMSE |
| * RSS |
| * MSE |
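
For orientation, the usual definitions behind these abbreviations (mean absolute error, coefficient of determination, root mean squared error, residual sum of squares, mean squared error) can be sketched in plain Java as follows. `RegressionMetricsSketch` is a hypothetical class, not part of Apache Ignite ML, and the library's implementations may differ in detail.

```java
/** Hypothetical helper for illustration only; not part of Apache Ignite ML. */
final class RegressionMetricsSketch {
    /** Residual sum of squares: sum of squared differences between labels and predictions. */
    static double rss(double[] labels, double[] predictions) {
        double sum = 0;

        for (int i = 0; i < labels.length; i++) {
            double err = labels[i] - predictions[i];
            sum += err * err;
        }

        return sum;
    }

    /** Mean squared error: RSS divided by the number of data points. */
    static double mse(double[] labels, double[] predictions) {
        return rss(labels, predictions) / labels.length;
    }

    /** Root mean squared error: square root of MSE. */
    static double rmse(double[] labels, double[] predictions) {
        return Math.sqrt(mse(labels, predictions));
    }

    /** Mean absolute error: average absolute difference between labels and predictions. */
    static double mae(double[] labels, double[] predictions) {
        double sum = 0;

        for (int i = 0; i < labels.length; i++)
            sum += Math.abs(labels[i] - predictions[i]);

        return sum / labels.length;
    }

    /** Coefficient of determination: 1 - RSS / TSS, where TSS is the squared deviation of labels from their mean. */
    static double r2(double[] labels, double[] predictions) {
        double mean = 0;

        for (double y : labels)
            mean += y;

        mean /= labels.length;

        double tss = 0;

        for (double y : labels)
            tss += (y - mean) * (y - mean);

        return 1 - rss(labels, predictions) / tss;
    }

    public static void main(String[] args) {
        double[] labels      = {3, 5, 7, 9};
        double[] predictions = {2, 5, 8, 11};

        System.out.println("RSS  = " + rss(labels, predictions));
        System.out.println("MSE  = " + mse(labels, predictions));
        System.out.println("RMSE = " + rmse(labels, predictions));
        System.out.println("MAE  = " + mae(labels, predictions));
        System.out.println("R2   = " + r2(labels, predictions));
    }
}
```

For the sample data in `main`, the errors are 1, 0, -1 and -2, so RSS is 6, MSE is 1.5, MAE is 1, and R2 is 1 - 6/20 = 0.7.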
| |
| |
| [source, java] |
| ---- |
| // Define the vectorizer. |
| Vectorizer<Integer, Vector, Integer, Double> vectorizer = new DummyVectorizer<Integer>() |
| .labeled(Vectorizer.LabelCoordinate.FIRST); |
| |
| // Define the trainer. |
| KNNRegressionTrainer trainer = new KNNRegressionTrainer() |
| .withK(5) |
| .withDistanceMeasure(new ManhattanDistance()) |
| .withIdxType(SpatialIndexType.BALL_TREE) |
| .withWeighted(true); |
| |
| // Train the model. |
| KNNRegressionModel knnMdl = trainer.fit(ignite, dataCache, vectorizer); |
| |
| // Calculate all regression metrics. |
| EvaluationResult res = Evaluator |
| .evaluateRegression(dataCache, knnMdl, vectorizer); |
| |
| double mse = res.get(MetricName.MSE); |
| ---- |
| |