blob: 4afa98d0a92d036263910c74b777d054df442c33 [file] [log] [blame]
// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
= Linear Regression
== Overview
Apache Ignite supports the ordinary least squares Linear Regression algorithm - one of the most basic and powerful machine learning algorithms. This documentation describes how the algorithm works, and is implemented in Apache Ignite.
The basic idea behind the Linear Regression algorithm is an assumption that a dependent variable `y` and an explanatory variable `x` are in the following relationship:
image::images/111.gif[]
WARNING:Be aware that further documentation uses a dot product of vectors `x` and `b`, and explicitly avoids using a constant term. It is mathematically correct in the case where vector `x` is supplemented by one value equal to 1.
The above assumption allows us to make a prediction based on a feature vector `x` if a vector `b` is known. This fact is reflected in Apache Ignite in the `LinearRegressionModel` class responsible for making predictions.
== Model
A Model in the case of linear regression is represented by the class `LinearRegressionModel`. It enables a prediction to be made for a given vector of features, in the following way:
[source, java]
----
LinearRegressionModel model = ...;
double prediction = model.predict(observation);
----
Model is fully independent object and after the training it can be saved, serialized and restored.
== Trainers
Linear Regression is a supervised learning algorithm. This means that to find parameters (vector `b`), we need to train on a training dataset and minimize the loss function:
image::images/222.gif[]
Apache Ignite provides two linear regression trainers: trainer based on the LSQR algorithm and another trainer based on the Stochastic Gradient Descent method.
=== LSQR Trainer
The LSQR algorithm finds the least-squares solution to a large, sparse, linear system of equations. The Apache Ignite implementation is a distributed version of this algorithm.
[source, java]
----
// Create linear regression trainer.
LinearRegressionLSQRTrainer trainer = new LinearRegressionLSQRTrainer();
// Train model.
LinearRegressionModel mdl = trainer.fit(ignite, dataCache, vectorizer);
// Make a prediction.
double prediction = mdl.apply(coordinates);
----
=== SGD Trainer
Another Linear Regression Trainer uses the stochastic gradient descent method to find a minimum of the loss function. The configuration of this trainer is similar to link:machine-learning/binary-classification/multilayer-perceptron[multilayer perceptron trainer] configuration and we can specify the type of updater (`SGD`, `RProp` of `Nesterov`), max number of iterations, batch size, number of local iterations and seed.
[source, java]
----
// Create linear regression trainer.
LinearRegressionSGDTrainer<?> trainer = new LinearRegressionSGDTrainer<>(
new UpdatesStrategy<>(
new RPropUpdateCalculator(),
RPropParameterUpdate::sumLocal,
RPropParameterUpdate::avg
),
100000, // Max iterations.
10, // Batch size.
100, // Local iterations.
123L // Random seed.
);
// Train model.
LinearRegressionModel mdl = trainer.fit(ignite, dataCache, vectorizer);
// Make a prediction.
double prediction = mdl.apply(coordinates);
----
== Examples
To see how the Linear Regression can be used in practice, try these https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/ml/regression/linear[examples] that are available on GitHub and delivered with every Apache Ignite distribution.