docs/libs/ml/min_max

mathjax: include htmlTitle: FlinkML - MinMax Scaler title: FlinkML - MinMax Scaler

This will be replaced by the TOC {:toc}

Description

The MinMax scaler scales the given data set, so that all values will lie between a user specified range [min,max]. In case the user does not provide a specific minimum and maximum value for the scaling range, the MinMax scaler transforms the features of the input data set to lie in the [0,1] interval. Given a set of input data $x_1, x_2,... x_n$, with minimum value:

$$x_{min} = min({x_1, x_2,..., x_n})$$

and maximum value:

$$x_{max} = max({x_1, x_2,..., x_n})$$

The scaled data set $z_1, z_2,...,z_n$ will be:

$$z_{i}= \frac{x_{i} - x_{min}}{x_{max} - x_{min}} \left ( max - min \right ) + min$$

where $\textit{min}$ and $\textit{max}$ are the user specified minimum and maximum values of the range to scale.

Operations

MinMaxScaler is a Transformer. As such, it supports the fit and transform operation.

Fit

MinMaxScaler is trained on all subtypes of Vector or LabeledVector:

fit[T <: Vector]: DataSet[T] => Unit
fit: DataSet[LabeledVector] => Unit

Transform

MinMaxScaler transforms all subtypes of Vector or LabeledVector into the respective type:

transform[T <: Vector]: DataSet[T] => DataSet[T]
transform: DataSet[LabeledVector] => DataSet[LabeledVector]

Parameters

The MinMax scaler implementation can be controlled by the following two parameters:

Examples

{% highlight scala %} // Create MinMax scaler transformer val minMaxscaler = MinMaxScaler() .setMin(-1.0)

// Obtain data set to be scaled val dataSet: DataSet[Vector] = ...

// Learn the minimum and maximum values of the training data minMaxscaler.fit(dataSet)

// Scale the provided data set to have min=-1.0 and max=1.0 val scaledDS = minMaxscaler.transform(dataSet) {% endhighlight %}