Builtin Reference

Introduction

The DML (Declarative Machine Learning) language has built-in functions which enable access to both low- and high-level functions to support all kinds of use cases.

A builtin is either implemented on a compiler level or as a DML script that is loaded at compile time.

Built-In Construction Functions

There are some functions which generate an object for us. They create matrices, tensors, lists and other non-primitive objects.

tensor-Function

The tensor-function creates a tensor for us.

tensor(data, dims, byRow = TRUE)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| data | Matrix[?], Tensor[?], Scalar[?] | required | The data with which the tensor should be filled. See data-Argument. |
| dims | Matrix[Integer], Tensor[Integer], Scalar[String], List[Integer] | required | The dimensions of the tensor. See dims-Argument. |
| byRow | Boolean | TRUE | NOT USED. Will probably be removed or replaced. |

Note that this function is highly unstable. It will be reworked, and its signature and functionality might change.

Returns

| Type | Description |
| --- | --- |
| Tensor[?] | The generated Tensor. Will support more datatypes than Double. |

data-Argument

The data-argument can be a Matrix of any datatype; its elements are taken in order and placed in the tensor until the tensor is filled. If data is given as a Tensor, the same procedure takes place. We iterate through a Matrix or Tensor by starting with each dimension index at 0 and incrementing the lowest one until we have made a complete pass over that dimension, then increasing the dimension index above it. This is repeated until the Tensor is completely filled.

If data is a Scalar, we fill the whole tensor with the value.

dims-Argument

The dimensions of the tensor can be given as a vector represented by a Matrix, Tensor, String or List. Dimensions given as a String are expected to be separated by spaces.
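
The scalar-fill rule and the space-separated dimension string can be illustrated with a small Python sketch. The helper names `parse_dims` and `fill_tensor` are hypothetical and are not part of DML; the sketch only mirrors the behaviour described above and says nothing about how the compiler implements it.

```python
def parse_dims(dims):
    """Turn a space-separated string such as "2 3 4" (or a list) into a list of ints."""
    if isinstance(dims, str):
        return [int(tok) for tok in dims.split()]
    return [int(d) for d in dims]


def fill_tensor(value, dims):
    """Build a nested list with the given dimensions, every cell set to `value`."""
    dims = parse_dims(dims)
    if not dims:
        return value
    return [fill_tensor(value, dims[1:]) for _ in range(dims[0])]


A = fill_tensor(3, "2 3 4")
print(len(A), len(A[0]), len(A[0][0]))  # 2 3 4
```

Nested lists stand in for the tensor here only to keep the sketch free of dependencies.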

Example

print("Dimension matrix:");
d = matrix("2 3 4", 1, 3);
print(toString(d, decimal=1))

print("Tensor A: Fillvalue=3, dims=2 3 4");
A = tensor(3, d); # fill with value, dimensions given by matrix
print(toString(A))

print("Tensor B: Reshape A, dims=4 2 3");
B = tensor(A, "4 2 3"); # reshape tensor, dimensions given by string
print(toString(B))

print("Tensor C: Reshape dimension matrix, dims=1 3");
C = tensor(d, list(1, 3)); # values given by matrix, dimensions given by list
print(toString(C, decimal=1))

print("Tensor D: Values=tst, dims=Tensor C");
D = tensor("tst", C); # fill with string, dimensions given by tensor
print(toString(D))

Note that reshape construction is not yet supported for SPARK execution.

DML-Bodied Built-In Functions

DML-bodied built-in functions are written as DML-Scripts and executed as such when called.

confusionMatrix-Function

The confusionMatrix-function accepts a vector of predictions and a one-hot-encoded matrix of true labels. It computes the max value of each vector and compares them, then calculates and returns the sum of classifications and the average of each true class.

Usage

confusionMatrix(P, Y)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| P | Matrix[Double] | --- | Vector of predictions |
| Y | Matrix[Double] | --- | Gold-standard labels, one-hot encoded |

Returns

| Name | Type | Description |
| --- | --- | --- |
| ConfusionSum | Matrix[Double] | The Confusion Matrix Sums of classifications |
| ConfusionAvg | Matrix[Double] | The Confusion Matrix averages of each true class |

Example

numClasses = 1
z = rand(rows = 5, cols = 1, min = 1, max = 9)
X = round(rand(rows = 5, cols = 1, min = 1, max = numClasses))
y = toOneHot(X, numClasses)
[ConfusionSum, ConfusionAvg] = confusionMatrix(P=z, Y=y)
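
To make the two outputs concrete, here is a minimal Python sketch of one way to compute them. It assumes that predictions are given as 1-based class ids, that rows of the result correspond to predicted classes and columns to true classes, and that the averages are obtained by normalising each true-class column. These are assumptions of the sketch, not statements about the DML script.

```python
def confusion_matrix(pred, y_onehot):
    """pred: list of 1-based predicted class ids; y_onehot: list of one-hot rows."""
    num_classes = len(y_onehot[0])
    # True class = position of the maximum entry in each one-hot row (1-based).
    truth = [row.index(max(row)) + 1 for row in y_onehot]
    # counts[p][t]: how often class p+1 was predicted while class t+1 was true.
    counts = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, truth):
        counts[p - 1][t - 1] += 1
    # Normalise each column (true class) so that it sums to 1.
    col_sums = [sum(counts[p][t] for p in range(num_classes))
                for t in range(num_classes)]
    avg = [[counts[p][t] / col_sums[t] if col_sums[t] else 0.0
            for t in range(num_classes)] for p in range(num_classes)]
    return counts, avg


sums, avgs = confusion_matrix([1, 2, 2, 1], [[1, 0], [0, 1], [1, 0], [0, 1]])
print(sums)  # [[1, 1], [1, 1]]
```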

cvlm-Function

The cvlm-function is used for cross-validation of the provided data model. This function follows a non-exhaustive cross validation method. It uses lm and lmpredict functions to solve the linear regression and to predict the class of a feature vector with no intercept, shifting, and rescaling.

Usage

cvlm(X, y, k)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | Recorded data set as a matrix |
| y | Matrix[Double] | required | 1-column matrix of response values. |
| k | Integer | required | Number of subsets needed. It should always be more than 1 and less than nrow(X) |
| icpt | Integer | 0 | Intercept presence, shifting and rescaling the columns of X |
| reg | Double | 1e-7 | Regularization constant (lambda) for L2-regularization. Set to nonzero for highly dependent/sparse/numerous features |

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | Response values |
| Matrix[Double] | Validated data set |

Example

X = rand (rows = 5, cols = 5)
y = X %*% rand(rows = ncol(X), cols = 1)
[predict, beta] = cvlm(X = X, y = y, k = 4)

DBSCAN-Function

The dbscan-function implements the DBSCAN clustering algorithm using Euclidean distance.

Usage

Y = dbscan(X = X, eps = 2.5, minPts = 5)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | The input Matrix to do DBSCAN on. |
| eps | Double | 0.5 | Maximum distance between two points for one to be considered reachable from the other. |
| minPts | Int | 5 | Number of points in a neighborhood for a point to be considered as a core point (includes the point itself). |

Returns

| Type | Description |
| --- | --- |
| Matrix[Integer] | The mapping of records to clusters |

Example

X = rand(rows=1780, cols=180, min=1, max=20) 
dbscan(X = X, eps = 2.5, minPts = 360)

discoverFD-Function

The discoverFD-function finds the functional dependencies.

Usage

discoverFD(X, Mask, threshold)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Double | -- | Input Matrix X, encoded Matrix if data is categorical |
| Mask | Double | -- | A row vector selecting the features of interest, e.g. Mask = [1, 0, 1] will exclude the second column from processing |
| threshold | Double | -- | Threshold value in interval [0, 1] for robust FDs |

Returns

| Type | Description |
| --- | --- |
| Double | Matrix of functional dependencies |

dist-Function

The dist-function is used to compute Euclidean distances between n d-dimensional points.

Usage

dist(X)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | (n x d) matrix of d-dimensional points |

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | (n x n) symmetric matrix of Euclidean distances |

Example

X = rand (rows = 5, cols = 5)
Y = dist(X)
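
The result can be pictured with a short Python sketch that builds the (n x n) matrix of pairwise Euclidean distances directly from the definition. The name `pairwise_dist` is hypothetical, and the sketch favours readability over the vectorised form a DML script would typically use.

```python
import math


def pairwise_dist(points):
    """Return the n x n matrix of Euclidean distances between the given points."""
    n = len(points)
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
             for j in range(n)] for i in range(n)]


D = pairwise_dist([[0.0, 0.0], [3.0, 4.0]])
print(D)  # [[0.0, 5.0], [5.0, 0.0]]
```

The diagonal is zero and the matrix is symmetric, matching the description of the return value.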

glm-Function

The glm-function is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution.

Usage

glm(X,Y)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | matrix X of feature vectors |
| Y | Matrix[Double] | required | matrix Y with either 1 or 2 columns: if dfam = 2, Y is 1-column Bernoulli or 2-column Binomial (#pos, #neg) |
| dfam | Int | 1 | Distribution family code: 1 = Power, 2 = Binomial |
| vpow | Double | 0.0 | Power for Variance defined as (mean)^power (ignored if dfam != 1): 0.0 = Gaussian, 1.0 = Poisson, 2.0 = Gamma, 3.0 = Inverse Gaussian |
| link | Int | 0 | Link function code: 0 = canonical (depends on distribution), 1 = Power, 2 = Logit, 3 = Probit, 4 = Cloglog, 5 = Cauchit |
| lpow | Double | 1.0 | Power for Link function defined as (mean)^power (ignored if link != 1): -2.0 = 1/mu^2, -1.0 = reciprocal, 0.0 = log, 0.5 = sqrt, 1.0 = identity |
| yneg | Double | 0.0 | Response value for Bernoulli “No” label, usually 0.0 or -1.0 |
| icpt | Int | 0 | Intercept presence, X columns shifting and rescaling: 0 = no intercept, no shifting, no rescaling; 1 = add intercept, but neither shift nor rescale X; 2 = add intercept, shift & rescale X columns to mean = 0, variance = 1 |
| reg | Double | 0.0 | Regularization parameter (lambda) for L2 regularization |
| tol | Double | 1e-6 | Tolerance (epsilon) value. |
| disp | Double | 0.0 | (Over-)dispersion value, or 0.0 to estimate it from data |
| moi | Int | 200 | Maximum number of outer (Newton / Fisher Scoring) iterations |
| mii | Int | 0 | Maximum number of inner (Conjugate Gradient) iterations, 0 = no maximum |

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | Matrix whose size depends on icpt (icpt=0: ncol(X) x 1; icpt=1: (ncol(X) + 1) x 1; icpt=2: (ncol(X) + 1) x 2) |

Example

X = rand (rows = 5, cols = 5 )
y = X %*% rand(rows = ncol(X), cols = 1)
beta = glm(X=X,Y=y)

gridSearch-Function

The gridSearch-function is used to find the optimal hyper-parameters of a model which results in the most accurate predictions. This function takes train and eval functions by name.

Usage

gridSearch(X, y, train, predict, params, paramValues, verbose)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | Input Matrix of vectors. |
| y | Matrix[Double] | required | Input Matrix of vectors. |
| train | String | required | Specified training function. |
| predict | String | required | Evaluation based function. |
| params | List[String] | required | List of parameters |
| paramValues | List[Unknown] | required | Range of values for the parameters |
| verbose | Boolean | TRUE | If TRUE print messages are activated |

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | Parameter combination |
| Frame[Unknown] | Best results model |

Example

X = rand (rows = 50, cols = 10)
y = X %*% rand(rows = ncol(X), cols = 1)
params = list("reg", "tol", "maxi")
paramRanges = list(10^seq(0,-4), 10^seq(-5,-9), 10^seq(1,3))
[B, opt]= gridSearch(X=X, y=y, train="lm", predict="lmPredict", params=params, paramValues=paramRanges, verbose = TRUE)
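
The idea of pairing a list of parameter names with a list of value ranges can be sketched in Python as follows. The sketch assumes an exhaustive enumeration of all combinations and uses a hypothetical `score` callback as a stand-in for training and evaluating a model; neither detail is taken from the DML implementation.

```python
from itertools import product


def enumerate_grid(params, param_values):
    """Yield one dict per hyper-parameter combination."""
    for combo in product(*param_values):
        yield dict(zip(params, combo))


def grid_search(params, param_values, score):
    """Return the combination with the lowest score (lower is better)."""
    return min(enumerate_grid(params, param_values), key=score)


grid = list(enumerate_grid(["reg", "tol"], [[1.0, 0.1], [1e-5, 1e-6, 1e-7]]))
print(len(grid))  # 6
```

With three parameters of 5, 5 and 3 candidate values, as in the example above, such an enumeration would visit 75 combinations.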

hyperband-Function

The hyperband-function is used for hyper parameter optimization and is based on multi-armed bandits and early elimination. Through multiple parallel brackets and consecutive trials it will return the hyper parameter combination which performed best on a validation dataset. A set of hyper parameter combinations is drawn from uniform distributions with given ranges; those make up the candidates for hyperband. Notes:

  • hyperband is hard-coded for lmCG, and uses lmpredict for validation
  • hyperband is hard-coded to use the number of iterations as a resource
  • hyperband can only optimize continuous hyperparameters

Usage

hyperband(X_train, y_train, X_val, y_val, params, paramRanges, R, eta, verbose)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X_train | Matrix[Double] | required | Input Matrix of training vectors. |
| y_train | Matrix[Double] | required | Labels for training vectors. |
| X_val | Matrix[Double] | required | Input Matrix of validation vectors. |
| y_val | Matrix[Double] | required | Labels for validation vectors. |
| params | List[String] | required | List of parameters to optimize. |
| paramRanges | Matrix[Double] | required | The min and max values for the uniform distributions to draw from. One row per hyper parameter, first column specifies min, second column max value. |
| R | Scalar[int] | 81 | Controls number of candidates evaluated. |
| eta | Scalar[int] | 3 | Determines fraction of candidates to keep after each trial. |
| verbose | Boolean | TRUE | If TRUE print messages are activated. |

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | 1-column matrix of weights of best performing candidate |
| Frame[Unknown] | hyper parameters of best performing candidate |

Example

X_train = rand(rows=50, cols=10);
y_train = rowSums(X_train) + rand(rows=50, cols=1);
X_val = rand(rows=50, cols=10);
y_val = rowSums(X_val) + rand(rows=50, cols=1);

params = list("reg");
paramRanges = matrix("0 20", rows=1, cols=2);

[bestWeights, optHyperParams] = hyperband(X_train=X_train, y_train=y_train, 
    X_val=X_val, y_val=y_val, params=params, paramRanges=paramRanges);

img_brightness-Function

The img_brightness-function is an image data augmentation function. It changes the brightness of the image.

Usage

img_brightness(img_in, value, channel_max)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| img_in | Matrix[Double] | --- | Input matrix/image |
| value | Double | --- | The amount of brightness to be changed for the image |
| channel_max | Integer | --- | Maximum value of the brightness of the image |

Returns

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| img_out | Matrix[Double] | --- | Output matrix/image |

Example

A = rand(rows = 3, cols = 3, min = 0, max = 255)
B = img_brightness(img_in = A, value = 128, channel_max = 255)
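
A brightness change of this kind can be sketched in Python as an element-wise addition followed by clamping. That the result is clamped to the range [0, channel_max] is an assumption of this sketch, based on the description of `channel_max` as the maximum brightness value.

```python
def img_brightness(img, value, channel_max):
    """Add `value` to every pixel and clamp the result to [0, channel_max]."""
    return [[min(max(px + value, 0), channel_max) for px in row] for row in img]


print(img_brightness([[0, 100], [200, 255]], 128, 255))
# [[128, 228], [255, 255]]
```

A negative `value` darkens the image in the same way, with pixels bottoming out at 0.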

img_crop-Function

The img_crop-function is an image data augmentation function. It cuts out a subregion of an image.

Usage

img_crop(img_in, w, h, x_offset, y_offset)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| img_in | Matrix[Double] | --- | Input matrix/image |
| w | Integer | --- | The width of the subregion required |
| h | Integer | --- | The height of the subregion required |
| x_offset | Integer | --- | The horizontal coordinate in the image to begin the crop operation |
| y_offset | Integer | --- | The vertical coordinate in the image to begin the crop operation |

Returns

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| img_out | Matrix[Double] | --- | Cropped matrix/image |

Example

A = rand(rows = 20, cols = 30, min = 0, max = 255)
B = img_crop(img_in = A, w = 20, h = 10, x_offset = 0, y_offset = 0)

img_mirror-Function

The img_mirror-function is an image data augmentation function. It flips an image on the X (horizontal) or Y (vertical) axis.

Usage

img_mirror(img_in, horizontal_axis)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| img_in | Matrix[Double] | --- | Input matrix/image |
| horizontal_axis | Boolean | --- | If TRUE, the image is flipped with respect to the horizontal axis, otherwise with respect to the vertical axis |

Returns

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| img_out | Matrix[Double] | --- | Flipped matrix/image |

Example

A = rand(rows = 3, cols = 3, min = 0, max = 255)
B = img_mirror(img_in = A, horizontal_axis = TRUE)
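
The two flip directions can be sketched in Python on a nested list. The sketch assumes that flipping with respect to the horizontal axis reverses the order of the rows (top becomes bottom) and that the other case reverses each row (left becomes right); this reading of the parameter is an assumption.

```python
def img_mirror(img, horizontal_axis):
    """Flip an image stored as a list of rows."""
    if horizontal_axis:
        # Mirror across the horizontal axis: reverse the row order.
        return [list(row) for row in reversed(img)]
    # Mirror across the vertical axis: reverse every row.
    return [list(reversed(row)) for row in img]


print(img_mirror([[1, 2], [3, 4]], True))   # [[3, 4], [1, 2]]
print(img_mirror([[1, 2], [3, 4]], False))  # [[2, 1], [4, 3]]
```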

imputeByFD-Function

The imputeByFD-function imputes missing values from observed values (if they exist) using robust functional dependencies.

Usage

imputeByFD(F, sourceAttribute, targetAttribute, threshold)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| F | String | -- | A data frame |
| source | Integer | -- | Source attribute to use for imputation and error correction |
| target | Integer | -- | Attribute to be fixed |
| threshold | Double | -- | Threshold value in interval [0, 1] for robust FDs |

Returns

| Type | Description |
| --- | --- |
| String | Frame with possible imputations |

KMeans-Function

The kmeans-function implements the KMeans clustering algorithm.

Usage

kmeans(X = X, k = 20, runs = 10, max_iter = 5000, eps = 0.000001, is_verbose = FALSE, avg_sample_size_per_centroid = 50)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | The input Matrix to do KMeans on. |
| k | Int | 10 | Number of centroids |
| runs | Int | 10 | Number of runs (with different initial centroids) |
| max_iter | Int | 100 | Max no. of iterations allowed |
| eps | Double | 0.000001 | Tolerance (epsilon) for WCSS change ratio |
| is_verbose | Boolean | FALSE | do not print per-iteration stats |

Returns

| Type | Description |
| --- | --- |
| String | The mapping of records to centroids |
| String | The output matrix with the centroids |

Example

X = rand (rows = 3972, cols = 972)
kmeans(X = X, k = 20, runs = 10, max_iter = 5000, eps = 0.000001, is_verbose = FALSE, avg_sample_size_per_centroid = 50)

lm-Function

The lm-function solves linear regression using either the direct solve method or the conjugate gradient algorithm depending on the input size of the matrices (See lmDS-function and lmCG-function respectively).

Usage

lm(X, y, icpt = 0, reg = 1e-7, tol = 1e-7, maxi = 0, verbose = TRUE)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | Matrix of feature vectors. |
| y | Matrix[Double] | required | 1-column matrix of response values. |
| icpt | Integer | 0 | Intercept presence, shifting and rescaling the columns of X (see icpt-Argument) |
| reg | Double | 1e-7 | Regularization constant (lambda) for L2-regularization. Set to nonzero for highly dependent/sparse/numerous features |
| tol | Double | 1e-7 | Tolerance (epsilon); conjugate gradient procedure terminates early if L2 norm of the beta-residual is less than tolerance * its initial norm |
| maxi | Integer | 0 | Maximum number of conjugate gradient iterations. 0 = no maximum |
| verbose | Boolean | TRUE | If TRUE print messages are activated |

Note that if the number of features is small enough (rows of X/y < 2000), the lmDS-function is called internally and the parameters tol and maxi are ignored.

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | 1-column matrix of weights. |

icpt-Argument

The icpt-argument can be set to 3 modes:

  • 0 = no intercept, no shifting, no rescaling
  • 1 = add intercept, but neither shift nor rescale X
  • 2 = add intercept, shift & rescale X columns to mean = 0, variance = 1

Example

X = rand (rows = 50, cols = 10)
y = X %*% rand(rows = ncol(X), cols = 1)
lm(X = X, y = y)

intersect-Function

The intersect-function implements set intersection for numeric data.

Usage

intersect(X, Y)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Double | -- | matrix X, set A |
| Y | Double | -- | matrix Y, set B |

Returns

| Type | Description |
| --- | --- |
| Double | intersection matrix, set of intersecting items |
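
Since no example is given for this function, the following Python sketch illustrates the set semantics on plain lists. Returning the common items without duplicates and in ascending order is an assumption of the sketch.

```python
def intersect(x, y):
    """Return the distinct values that occur in both x and y, sorted ascending."""
    return sorted(set(x) & set(y))


print(intersect([1, 2, 3, 3, 5], [3, 5, 7]))  # [3, 5]
```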

lmDS-Function

The lmDS-function solves linear regression by directly solving the linear system.

Usage

lmDS(X, y, icpt = 0, reg = 1e-7, verbose = TRUE)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | Matrix of feature vectors. |
| y | Matrix[Double] | required | 1-column matrix of response values. |
| icpt | Integer | 0 | Intercept presence, shifting and rescaling the columns of X (see the icpt-Argument of the lm-function) |
| reg | Double | 1e-7 | Regularization constant (lambda) for L2-regularization. Set to nonzero for highly dependent/sparse/numerous features |
| verbose | Boolean | TRUE | If TRUE print messages are activated |

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | 1-column matrix of weights. |

Example

X = rand (rows = 50, cols = 10)
y = X %*% rand(rows = ncol(X), cols = 1)
lmDS(X = X, y = y)
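
"Directly solving the linear system" can be illustrated with a Python sketch that forms the regularized normal equations (X^T X + reg * I) beta = X^T y and solves them by Gaussian elimination. The sketch assumes icpt = 0 (no intercept, no shifting, no rescaling) and is a teaching aid with a hypothetical name, not a transcription of the DML script.

```python
def lm_direct_solve(X, y, reg=1e-7):
    """Solve (X^T X + reg * I) beta = X^T y by Gaussian elimination."""
    n, m = len(X), len(X[0])
    # Build A = X^T X + reg * I and b = X^T y.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) + (reg if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(X[r][i] * y[r] for r in range(n)) for i in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    beta = [0.0] * m
    for i in range(m - 1, -1, -1):
        s = sum(A[i][j] * beta[j] for j in range(i + 1, m))
        beta[i] = (b[i] - s) / A[i][i]
    return beta


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, 3.0, 5.0]
print([round(v, 3) for v in lm_direct_solve(X, y)])  # [2.0, 3.0]
```

The small default `reg` keeps the system solvable when columns of X are nearly dependent, which matches the role the reg argument plays in the table above.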

lmCG-Function

The lmCG-function solves linear regression using the conjugate gradient algorithm.

Usage

lmCG(X, y, icpt = 0, reg = 1e-7, tol = 1e-7, maxi = 0, verbose = TRUE)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | Matrix of feature vectors. |
| y | Matrix[Double] | required | 1-column matrix of response values. |
| icpt | Integer | 0 | Intercept presence, shifting and rescaling the columns of X (see the icpt-Argument of the lm-function) |
| reg | Double | 1e-7 | Regularization constant (lambda) for L2-regularization. Set to nonzero for highly dependent/sparse/numerous features |
| tol | Double | 1e-7 | Tolerance (epsilon); conjugate gradient procedure terminates early if L2 norm of the beta-residual is less than tolerance * its initial norm |
| maxi | Integer | 0 | Maximum number of conjugate gradient iterations. 0 = no maximum |
| verbose | Boolean | TRUE | If TRUE print messages are activated |

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | 1-column matrix of weights. |

Example

X = rand (rows = 50, cols = 10)
y = X %*% rand(rows = ncol(X), cols = 1)
lmCG(X = X, y = y, maxi = 10)

lmpredict-Function

The lmpredict-function predicts the class of a feature vector.

Usage

lmpredict(X, w)

Arguments

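| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | Matrix of feature vector(s). |
| w | Matrix[Double] | required | 1-column matrix of weights. |
| icpt | Integer | 0 | Intercept presence, shifting and rescaling of X (see the icpt-Argument of the lm-function) |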

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | 1-column matrix of classes. |

Example

X = rand (rows = 50, cols = 10)
y = X %*% rand(rows = ncol(X), cols = 1)
w = lm(X = X, y = y)
yp = lmpredict(X, w)

mice-Function

The mice-function implements Multiple Imputation using Chained Equations (MICE) for nominal data.

Usage

mice(F, cMask, iter, complete, verbose)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| F | Frame[String] | required | Data Frame with one-dimensional row matrix with N columns where N>1. |
| cMask | Matrix[Double] | required | 0/1 row vector for identifying numeric (0) and categorical features (1) with one-dimensional row matrix with column = ncol(F). |
| iter | Integer | 3 | Number of iterations for multiple imputations. |
| complete | Integer | 3 | A complete dataset generated through a specific iteration. |
| verbose | Boolean | FALSE | Boolean value. |

Returns

| Type | Description |
| --- | --- |
| Frame[String] | Imputed dataset. |
| Frame[String] | A complete dataset generated through a specific iteration. |

Example

F = as.frame(matrix("4 3 2 8 7 8 5", rows=1, cols=7))
cMask = round(rand(rows=1,cols=ncol(F),min=0,max=1))
[dataset, singleSet] = mice(F, cMask, iter = 3, complete = 3, verbose = FALSE)

multiLogReg-Function

The multiLogReg-function solves Multinomial Logistic Regression using Trust Region method. (See: Trust Region Newton Method for Logistic Regression, Lin, Weng and Keerthi, JMLR 9 (2008) 627-650)

Usage

multiLogReg(X, Y, icpt, reg, tol, maxi, maxii, verbose)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Double | -- | The matrix of feature vectors |
| Y | Double | -- | The matrix with category labels |
| icpt | Int | 0 | Intercept presence, shifting and rescaling X columns: 0 = no intercept, no shifting, no rescaling; 1 = add intercept, but neither shift nor rescale X; 2 = add intercept, shift & rescale X columns to mean = 0, variance = 1 |
| reg | Double | 0 | regularization parameter (lambda = 1/C); intercept is not regularized |
| tol | Double | 1e-6 | tolerance (“epsilon”) |
| maxi | Int | 100 | max. number of outer Newton iterations |
| maxii | Int | 0 | max. number of inner (conjugate gradient) iterations |

Returns

| Type | Description |
| --- | --- |
| Double | Regression betas as output for prediction |

Example

X = rand(rows = 50, cols = 30)
Y = X %*% rand(rows = ncol(X), cols = 1)
betas = multiLogReg(X = X, Y = Y, icpt = 2,  tol = 0.000001, reg = 1.0, maxi = 100, maxii = 20, verbose = TRUE)

pnmf-Function

The pnmf-function implements Poisson Non-negative Matrix Factorization (PNMF). Matrix X is factorized into two non-negative matrices, W and H based on Poisson probabilistic assumption. This non-negativity makes the resulting matrices easier to inspect.

Usage

pnmf(X, rnk, eps = 10^-8, maxi = 10, verbose = TRUE)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | Matrix of feature vectors. |
| rnk | Integer | required | Number of components into which matrix X is to be factored. |
| eps | Double | 10^-8 | Tolerance |
| maxi | Integer | 10 | Maximum number of conjugate gradient iterations. |
| verbose | Boolean | TRUE | If TRUE, ‘iter’ and ‘obj’ are printed. |

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | List of pattern matrices, one for each repetition. |
| Matrix[Double] | List of amplitude matrices, one for each repetition. |

Example

X = rand(rows = 50, cols = 10)
[W, H] = pnmf(X = X, rnk = 2, eps = 10^-8, maxi = 10, verbose = TRUE)

scale-Function

The scale-function is a generic function whose default method centers and/or scales the columns of a numeric matrix.

Usage

scale(X, center=TRUE, scale=TRUE)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | Matrix of feature vectors. |
| center | Boolean | required | either a logical value or numerical value. |
| scale | Boolean | required | either a logical value or numerical value. |

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | Matrix with centered and/or scaled columns. |

Example

X = rand(rows = 20, cols = 10)
center=TRUE;
scale=TRUE;
Y= scale(X,center,scale)
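
Column-wise centering and scaling can be sketched in Python as below. The sketch assumes Boolean arguments only and divides each (possibly centered) column by the root of its sum of squares over n - 1, which equals the sample standard deviation when centering is on. Both the name `scale_columns` and this choice of divisor are assumptions, and at least two rows are required.

```python
import math


def scale_columns(X, center=True, scale=True):
    """Center and/or scale every column of a row-major matrix."""
    n = len(X)
    cols = [list(c) for c in zip(*X)]
    for j, col in enumerate(cols):
        if center:
            mean = sum(col) / n
            col = [v - mean for v in col]
        if scale:
            rms = math.sqrt(sum(v * v for v in col) / (n - 1))
            # Leave constant (all-zero) columns untouched to avoid dividing by 0.
            col = [v / rms if rms else v for v in col]
        cols[j] = col
    return [list(r) for r in zip(*cols)]


print(scale_columns([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]))
# [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

Both columns end up identical because they differ only by a constant factor, which scaling removes.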

sigmoid-Function

The sigmoid-function is a type of activation function, also known as a squashing function, which limits the output to a range between 0 and 1. This makes it useful for predicting probabilities.

Usage

sigmoid(X)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | Matrix of feature vectors. |

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | Matrix with the sigmoid applied to each element of X. |

Example

X = rand (rows = 20, cols = 10)
Y = sigmoid(X)
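
The squashing behaviour follows from the standard logistic formula 1 / (1 + exp(-x)), which the Python sketch below applies element-wise to a nested list. It is a plain illustration and does not guard against overflow for very large negative inputs.

```python
import math


def sigmoid(X):
    """Apply 1 / (1 + exp(-x)) to every element of a row-major matrix."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in X]


print(sigmoid([[0.0]]))  # [[0.5]]
```

Every output lies strictly between 0 and 1, and an input of 0 maps to exactly 0.5.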

steplm-Function

The steplm-function (stepwise linear regression) implements a classical forward feature selection method. This method iteratively runs what-if scenarios and greedily selects the next best feature until the Akaike information criterion (AIC) does not improve anymore. Each configuration trains a regression model via lm, which in turn calls either the closed form lmDS or iterative lmCG.

Usage

steplm(X, y, icpt);

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | Matrix of feature vectors. |
| y | Matrix[Double] | required | 1-column matrix of response values. |
| icpt | Integer | 0 | Intercept presence, shifting and rescaling the columns of X (see icpt-Argument) |
| reg | Double | 1e-7 | Regularization constant (lambda) for L2-regularization. Set to nonzero for highly dependent/sparse/numerous features |
| tol | Double | 1e-7 | Tolerance (epsilon); conjugate gradient procedure terminates early if L2 norm of the beta-residual is less than tolerance * its initial norm |
| maxi | Integer | 0 | Maximum number of conjugate gradient iterations. 0 = no maximum |
| verbose | Boolean | TRUE | If TRUE print messages are activated |

Returns

TypeDescription
| Type | Description |
| --- | --- |
| Matrix[Double] | Matrix of regression parameters (the betas); its size depends on the icpt input value. (C in the example) |
| Matrix[Double] | Matrix of selected features ordered as computed by the algorithm. (S in the example) |

icpt-Argument

The icpt-argument can be set to 2 modes:

  • 0 = no intercept, no shifting, no rescaling
  • 1 = add intercept, but neither shift nor rescale X

selected-Output

If the best AIC is achieved without any features, the matrix of selected features contains 0. Moreover, in this case no further statistics will be produced.

Example

X = rand (rows = 50, cols = 10)
y = X %*% rand(rows = ncol(X), cols = 1)
[C, S] = steplm(X = X, y = y, icpt = 1);

slicefinder-Function

The slicefinder-function returns top-k worst performing subsets according to a model calculation.

Usage

slicefinder(X,W, y, k, paq, S);

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | Recoded dataset as a Matrix |
| W | Matrix[Double] | required | Trained model |
| y | Matrix[Double] | required | 1-column matrix of response values. |
| k | Integer | 1 | Number of subsets required |
| paq | Integer | 1 | Number of values wanted for each column; if paq = 1 then it is off |
| S | Integer | 2 | Number of subsets to combine (for now only 1 and 2 are supported) |

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | Matrix containing the information of top_K slices (relative error, standard error, value0, value1, col_number(sort), rows, cols, range_row, range_cols, value00, value01, col_number2(sort), rows2, cols2, range_row2, range_cols2) |

Example

X = rand (rows = 50, cols = 10)
y = X %*% rand(rows = ncol(X), cols = 1)
w = lm(X = X, y = y)
ress = slicefinder(X = X, W = w, y = y, k = 5, paq = 1, S = 2);

normalize-Function

The normalize-function normalises the values of a matrix by changing the dataset to use a common scale. This is done while preserving differences in the ranges of values. The output is a matrix of values in range [0,1].

Usage

normalize(X); 

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | Matrix of feature vectors. |

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | Matrix of normalized values. |

Example

X = rand(rows = 50, cols = 10)
y = X %*% rand(rows = ncol(X), cols = 1)
y = normalize(X = X)
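
Mapping values to the range [0,1] is commonly done with min-max scaling, sketched below in Python. That the scaling is applied per column, and that constant columns map to 0, are assumptions of this sketch rather than documented behaviour.

```python
def normalize(X):
    """Min-max scale every column of a row-major matrix to [0, 1]."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j, v in enumerate(row)] for row in X]


print(normalize([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]))
# [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```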

gnmf-Function

The gnmf-function does Gaussian Non-Negative Matrix Factorization. In this, a matrix X is factorized into two matrices W and H, such that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect.

Usage

gnmf(X, rnk, eps = 10^-8, maxi = 10)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | Matrix of feature vectors. |
| rnk | Integer | required | Number of components into which matrix X is to be factored. |
| eps | Double | 10^-8 | Tolerance |
| maxi | Integer | 10 | Maximum number of conjugate gradient iterations. |

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | List of pattern matrices, one for each repetition. |
| Matrix[Double] | List of amplitude matrices, one for each repetition. |

Example

X = rand(rows = 50, cols = 10)
[W, H] = gnmf(X = X, rnk = 2, eps = 10^-8, maxi = 10)

naivebayes-Function

The naivebayes-function computes the class conditional probabilities and class priors.

Usage

naivebayes(D, C, laplace, verbose)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| D | Matrix[Double] | required | One dimensional column matrix with N rows. |
| C | Matrix[Double] | required | One dimensional column matrix with N rows. |
| laplace | Double | 1 | Any Double value. |
| verbose | Boolean | TRUE | Boolean value. |

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | Class priors, one dimensional column matrix with N rows. |
| Matrix[Double] | Class conditional probabilities, one dimensional column matrix with N rows. |

Example

D=rand(rows=10,cols=1,min=10)
C=rand(rows=10,cols=1,min=10)
[prior, classConditionals] = naivebayes(D, C, laplace = 1, verbose = TRUE)

outlier-Function

The outlier-function takes a matrix data set as input and determines which point(s) have the largest difference from the mean.

Usage

outlier(X, opposite)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | Matrix of Recoded dataset for outlier evaluation |
| opposite | Boolean | required | (1) TRUE for evaluating outlier from upper quartile range, (0) FALSE for evaluating outlier from lower quartile range |

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | matrix indicating outlier values |

Example

X = rand (rows = 50, cols = 10)
outlier(X=X, opposite=1)

toOneHot-Function

The toOneHot-function encodes an unordered categorical vector as multiple binarized vectors.

Usage

toOneHot(X, numClasses)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | vector with N integer entries between 1 and numClasses. |
| numClasses | int | required | number of columns, must be greater than or equal to largest value in X. |

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | one-hot-encoded matrix with shape (N, numClasses). |

Example

numClasses = 5
X = round(rand(rows = 10, cols = 1, min = 1, max = numClasses))
y = toOneHot(X,numClasses)
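
The shape (N, numClasses) of the result is easiest to see in a small Python sketch of the encoding. The name `to_one_hot` and the explicit range check are choices of this sketch; how the DML function reacts to out-of-range values is not covered here.

```python
def to_one_hot(x, num_classes):
    """Encode a list of 1-based class ids as rows with a single 1.0 each."""
    if min(x) < 1 or max(x) > num_classes:
        raise ValueError("entries must lie between 1 and num_classes")
    return [[1.0 if c == v else 0.0 for c in range(1, num_classes + 1)]
            for v in x]


print(to_one_hot([1, 3, 2], 3))
# [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```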

msvm-Function

The msvm-function implements builtin multiclass SVM with squared slack variables. It learns one-against-the-rest binary-class classifiers by making a function call to l2SVM.

Usage

msvm(X, Y, intercept, epsilon, lambda, maxIterations, verbose)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Double | --- | Matrix X of feature vectors. |
| Y | Double | --- | Matrix Y of class labels. |
| intercept | Boolean | False | No Intercept (if set to TRUE then a constant bias column is added to X) |
| num_classes | Integer | 10 | Number of classes. |
| epsilon | Double | 0.001 | Procedure terminates early if the reduction in objective function value is less than epsilon (tolerance) times the initial objective function value. |
| lambda | Double | 1.0 | Regularization parameter (lambda) for L2 regularization |
| maxIterations | Integer | 100 | Maximum number of conjugate gradient iterations |
| verbose | Boolean | False | Set to true to print while training. |

Returns

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| model | Double | --- | Model matrix. |

Example

X = rand(rows = 50, cols = 10)
y = round(X %*% rand(rows=ncol(X), cols=1))
model = msvm(X = X, Y = y, intercept = FALSE, epsilon = 0.005, lambda = 1.0, maxIterations = 100, verbose = FALSE)

winsorize-Function

The winsorize-function removes outliers from the data. It does so by computing the upper and lower quartile range of the given data, then replaces any value that falls outside this range (less than the lower quartile range or more than the upper quartile range).

Usage

winsorize(X)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Matrix[Double] | required | recorded data set with possible outlier values |

Returns

| Type | Description |
| --- | --- |
| Matrix[Double] | Matrix without outlier values |

Example

X = rand(rows=10, cols=10,min = 1, max=9)
Y = winsorize(X=X)
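
Winsorizing in general means pulling extreme values back to a lower and an upper bound. The Python sketch below shows this on a flat list, assuming that out-of-range values are replaced by the nearest bound and that the bounds are nearest-rank percentiles chosen by the caller. The percentile defaults of 5 and 95 are illustrative assumptions, not values taken from the DML function.

```python
def percentile(sorted_vals, pct):
    """Nearest-rank percentile (pct in 0..100) of an already sorted list."""
    idx = max(0, (pct * len(sorted_vals) + 99) // 100 - 1)
    return sorted_vals[idx]


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp every value to the [lower, upper] percentile bounds."""
    s = sorted(values)
    lo, hi = percentile(s, lower_pct), percentile(s, upper_pct)
    return [min(max(v, lo), hi) for v in values]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(winsorize(data, 10, 90))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 9]
```

The outlier 100 is replaced by the upper bound 9 while all other values are left unchanged.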

gmm-Function

The gmm-function implements builtin Gaussian Mixture Model with four different types of covariance matrices, i.e., VVV, EEE, VVI, VII, and two initialization methods, namely “kmeans” and “random”.

Usage

gmm(X=X, n_components = 3,  model = "VVV",  init_params = "random", iter = 100, reg_covar = 0.000001, tol = 0.0001, verbose=TRUE)

Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | Double | --- | Matrix X of feature vectors. |
| n_components | Integer | 3 | Number of n_components in the Gaussian mixture model |
| model | String | “VVV” | “VVV”: unequal variance (full), each component has its own general covariance matrix; “EEE”: equal variance (tied), all components share the same general covariance matrix; “VVI”: spherical, unequal volume (diag), each component has its own diagonal covariance matrix; “VII”: spherical, equal volume (spherical), each component has its own single variance |
| init_params | String | “kmeans” | initialize weights with “kmeans” or “random” |
| iter | Integer | 100 | Number of iterations |
| reg_covar | Double | 1e-6 | regularization parameter for covariance matrix |
| tol | Double | 0.000001 | tolerance value for convergence |
| verbose | Boolean | False | Set to true to print intermediate results. |

Returns

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| weight | Double | --- | A matrix whose [i,k]th entry is the probability that observation i in the test data belongs to the kth class |
| labels | Double | --- | Prediction matrix |
| df | Integer | --- | Number of estimated parameters |
| bic | Double | --- | Bayesian information criterion for best iteration |

Example

X = read($1)
[labels, df, bic] = gmm(X=X, n_components = 3,  model = "VVV",  init_params = "random", iter = 100, reg_covar = 0.000001, tol = 0.0001, verbose=TRUE)