---
mathjax: include
htmlTitle: FlinkML - Distance Metrics
title: FlinkML - Distance Metrics
---

* This will be replaced by the TOC
{:toc}

Description

Different distance metrics suit different types of analysis. FlinkML provides built-in implementations of many standard distance metrics. You can also define a custom metric by implementing the DistanceMetric trait.

Built-in Implementations

Currently, FlinkML supports the following metrics:

<tbody>
  <tr>
    <td><strong>Euclidean Distance</strong></td>
    <td>
      $$d(\x, \y) = \sqrt{\sum_{i=1}^n \left(x_i - y_i \right)^2}$$
    </td>
  </tr>
  <tr>
    <td><strong>Squared Euclidean Distance</strong></td>
    <td>
      $$d(\x, \y) = \sum_{i=1}^n \left(x_i - y_i \right)^2$$
    </td>
  </tr>
  <tr>
    <td><strong>Cosine Distance</strong></td>
    <td>
      $$d(\x, \y) = 1 - \frac{\x^T \y}{\Vert \x \Vert \Vert \y \Vert}$$
    </td>
  </tr>
  <tr>
    <td><strong>Chebyshev Distance</strong></td>
    <td>
      $$d(\x, \y) = \max_{i}\left(\left \vert x_i - y_i \right\vert \right)$$
    </td>
  </tr>
  <tr>
    <td><strong>Manhattan Distance</strong></td>
    <td>
      $$d(\x, \y) = \sum_{i=1}^n \left\vert x_i - y_i \right\vert$$
    </td>
  </tr>
  <tr>
    <td><strong>Minkowski Distance</strong></td>
    <td>
      $$d(\x, \y) = \left( \sum_{i=1}^{n} \left\vert x_i - y_i \right\vert^p \right)^{\rfrac{1}{p}}$$
    </td>
  </tr>
  <tr>
    <td><strong>Tanimoto Distance</strong></td>
    <td>
      $$d(\x, \y) = 1 - \frac{\x^T\y}{\Vert \x \Vert^2 + \Vert \y \Vert^2 - \x^T\y}$$
      where $\x$ and $\y$ are bit vectors
    </td>
  </tr>
</tbody>
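
To make the formulas in the table concrete, here is a minimal plain-Scala sketch that evaluates each of them on `Array[Double]` inputs. It is not FlinkML's implementation: the object name `DistanceSketch` and all method names are hypothetical and chosen only for illustration. The Minkowski variant uses the absolute difference, as in the standard definition.

```scala
// Plain-Scala sketch of the distance formulas; NOT the FlinkML classes.
// DistanceSketch and its method names are hypothetical.
object DistanceSketch {
  type Vec = Array[Double]

  // Element-wise differences x_i - y_i
  private def diffs(x: Vec, y: Vec): Array[Double] = {
    require(x.length == y.length, "vectors must have the same length")
    x.zip(y).map { case (a, b) => a - b }
  }

  // Inner product x^T y
  private def dot(x: Vec, y: Vec): Double = {
    require(x.length == y.length, "vectors must have the same length")
    x.zip(y).map { case (a, b) => a * b }.sum
  }

  def squaredEuclidean(x: Vec, y: Vec): Double =
    diffs(x, y).map(d => d * d).sum

  def euclidean(x: Vec, y: Vec): Double =
    math.sqrt(squaredEuclidean(x, y))

  def manhattan(x: Vec, y: Vec): Double =
    diffs(x, y).map(d => math.abs(d)).sum

  // Largest absolute coordinate difference (throws on empty input)
  def chebyshev(x: Vec, y: Vec): Double =
    diffs(x, y).map(d => math.abs(d)).max

  // Uses |x_i - y_i|^p so the sum stays non-negative for any p >= 1
  def minkowski(x: Vec, y: Vec, p: Double): Double =
    math.pow(diffs(x, y).map(d => math.pow(math.abs(d), p)).sum, 1.0 / p)

  // 1 - cosine of the angle between x and y (undefined for zero vectors)
  def cosine(x: Vec, y: Vec): Double =
    1.0 - dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))

  // Intended for bit vectors (entries 0.0 or 1.0)
  def tanimoto(x: Vec, y: Vec): Double = {
    val xy = dot(x, y)
    1.0 - xy / (dot(x, x) + dot(y, y) - xy)
  }
}
```

For example, with x = (1, 2, 3) and y = (4, 6, 3) the coordinate differences are (-3, -4, 0), so the Euclidean distance is 5, the Manhattan distance is 7, and the Chebyshev distance is 4. Minkowski with p = 2 reduces to the Euclidean case and with p = 1 to the Manhattan case.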

Custom Implementation

You can create your own distance metric by implementing the DistanceMetric trait and overriding its distance method, which takes two vectors and returns the distance between them.

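{% highlight scala %}
import org.apache.flink.ml.math.Vector
import org.apache.flink.ml.metrics.distances.DistanceMetric

class MyDistance extends DistanceMetric {
  // your implementation for distance metric
  override def distance(a: Vector, b: Vector): Double = ...
}

// Companion object, so the metric can be created without `new`
object MyDistance {
  def apply() = new MyDistance()
}

val myMetric = MyDistance()
{% endhighlight %}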
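
As one possible way to fill in the skeleton above, the following sketch implements the Canberra distance, which is not among the built-in metrics listed earlier. So that it compiles without Flink on the classpath, it declares minimal stand-ins for `Vector` and `DistanceMetric`. These stand-ins, the `SimpleVector` helper, and the `Double` return type are assumptions for illustration only. In a real project you would drop them and extend FlinkML's own trait instead.

```scala
object CustomMetricSketch {
  // Hypothetical stand-ins, NOT the FlinkML types: just enough surface
  // for the example to compile and run on its own.
  trait Vector extends Serializable {
    def size: Int
    def apply(index: Int): Double
  }

  trait DistanceMetric extends Serializable {
    def distance(a: Vector, b: Vector): Double
  }

  // Hypothetical array-backed vector used only for the demo
  final case class SimpleVector(values: Array[Double]) extends Vector {
    def size: Int = values.length
    def apply(index: Int): Double = values(index)
  }

  // Canberra distance: sum over i of |a_i - b_i| / (|a_i| + |b_i|),
  // where a term with a zero denominator contributes 0.
  class CanberraDistance extends DistanceMetric {
    override def distance(a: Vector, b: Vector): Double = {
      require(a.size == b.size, "vectors must have the same size")
      (0 until a.size).map { i =>
        val denominator = math.abs(a(i)) + math.abs(b(i))
        if (denominator == 0.0) 0.0 else math.abs(a(i) - b(i)) / denominator
      }.sum
    }
  }

  // Companion object mirroring the MyDistance pattern
  object CanberraDistance {
    def apply(): CanberraDistance = new CanberraDistance()
  }
}
```

With a = (1, 2, 0) and b = (3, 2, 0), the three terms are 2/4, 0/4, and 0 (zero denominator), so `CanberraDistance().distance(a, b)` evaluates to 0.5.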