tree a5e963c9cc61668a6d7051a6addd4d97c2c9fc81
parent e96e1cbafc53e90f64c08899cccbc1f02a9e46b3
author Matthias Boehm <mboehm7@gmail.com> 1604181947 +0100
committer Matthias Boehm <mboehm7@gmail.com> 1604182004 +0100

[SYSTEMDS-2710] Fix K-Means and GMM builtin functions, robustness IPA

This patch fixes various issues related to the recent addition of seeds
to the Kmeans builtin function.

First, the GMM built-in function was failing in IPA because the GMM call
to Kmeans used by-position arguments and thus missed the new seed
argument. We now use named parameters to guard against future additions.

Second, the error handling for such mismatches in the number of
arguments was poor: inter-procedural analysis (IPA) failed with
index-out-of-bounds exceptions. IPA now handles these cases more
robustly such that the user gets the intended error message explaining
the problem.

Third, the kmeans addition of seeds used a uniform random matrix and
incrementally added uniform random matrices per centroid. However, the
sum of multiple independent uniformly distributed random variables is
no longer uniform but approximately normally distributed. For K-Means
initialization this is not desirable and ultimately led to the flaky
Python test failures.
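
As an illustration of the distributional effect described above, here is
a minimal Python sketch. It is not the SystemDS code; the sample count,
the number of added draws (n_terms), and the helper fraction_in_band are
assumptions chosen only to make the effect visible.

```python
import random

def fraction_in_band(samples, lo, hi):
    """Return the share of samples that fall into [lo, hi]."""
    return sum(1 for s in samples if lo <= s <= hi) / len(samples)

rng = random.Random(7)   # fixed seed so the sketch is repeatable
n_samples = 20000
n_terms = 12             # assumed number of added uniform draws

# One uniform draw per sample, range [0, 1).
single = [rng.random() for _ in range(n_samples)]

# Sum of n_terms independent uniform draws per sample, range [0, n_terms).
summed = [sum(rng.random() for _ in range(n_terms)) for _ in range(n_samples)]

# Share of samples in the middle 20% of each range.
frac_single = fraction_in_band(single, 0.4, 0.6)
frac_summed = fraction_in_band(summed, 0.4 * n_terms, 0.6 * n_terms)

print(f"single uniform: {frac_single:.2f} of samples in the middle 20%")
print(f"sum of {n_terms}:      {frac_summed:.2f} of samples in the middle 20%")
```

A single uniform draw places roughly one fifth of its samples in the
middle band, whereas the sum places a clear majority there, so values
built this way cluster around the center instead of spreading evenly.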
