blob: 8555a5bcd8f137e5a71db1d99b003853a781f49e [file] [log] [blame]
 \begin{comment} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. \end{comment} \subsection{Generalized Linear Models (GLM)} \label{sec:GLM} \noindent{\bf Description} \smallskip Generalized Linear Models~\cite{Gill2000:GLM,McCullagh1989:GLM,Nelder1972:GLM} extend the methodology of linear and logistic regression to a variety of distributions commonly assumed as noise effects in the response variable. As before, we are given a collection of records $(x_1, y_1)$, \ldots, $(x_n, y_n)$ where $x_i$ is a numerical vector of explanatory (feature) variables of size~\mbox{$\dim x_i = m$}, and $y_i$ is the response (dependent) variable observed for this vector. GLMs assume that some linear combination of the features in~$x_i$ determines the \emph{mean}~$\mu_i$ of~$y_i$, while the observed $y_i$ is a random outcome of a noise distribution $\Prob[y\mid \mu_i]\,$\footnote{$\Prob[y\mid \mu_i]$ is given by a density function if $y$ is continuous.} with that mean~$\mu_i$: \begin{equation*} x_i \,\,\,\,\mapsto\,\,\,\, \eta_i = \beta_0 + \sum\nolimits_{j=1}^m \beta_j x_{i,j} \,\,\,\,\mapsto\,\,\,\, \mu_i \,\,\,\,\mapsto \,\,\,\, y_i \sim \Prob[y\mid \mu_i] \end{equation*} In linear regression the response mean $\mu_i$ \emph{equals} some linear combination over~$x_i$, denoted above by~$\eta_i$. In logistic regression with $y\in\{0, 1\}$ (Bernoulli) the mean of~$y$ is the same as $\Prob[y=1]$ and equals $1/(1+e^{-\eta_i})$, the logistic function of~$\eta_i$. In GLM, $\mu_i$ and $\eta_i$ can be related via any given smooth monotone function called the \emph{link function}: $\eta_i = g(\mu_i)$. The unknown linear combination parameters $\beta_j$ are assumed to be the same for all records. The goal of the regression is to estimate the parameters~$\beta_j$ from the observed data. Once the~$\beta_j$'s are accurately estimated, we can make predictions about~$y$ for a new feature vector~$x$. To do so, compute $\eta$ from~$x$ and use the inverted link function $\mu = g^{-1}(\eta)$ to compute the mean $\mu$ of~$y$; then use the distribution $\Prob[y\mid \mu]$ to make predictions about~$y$. Both $g(\mu)$ and $\Prob[y\mid \mu]$ are user-provided. Our GLM script supports a standard set of distributions and link functions, see below for details. \smallskip \noindent{\bf Usage} \smallskip {\hangindent=\parindent\noindent\it% {\tt{}-f }path/\/{\tt{}GLM.dml} {\tt{} -nvargs} {\tt{} X=}path/file {\tt{} Y=}path/file {\tt{} B=}path/file {\tt{} fmt=}format {\tt{} O=}path/file {\tt{} Log=}path/file {\tt{} dfam=}int {\tt{} vpow=}double {\tt{} link=}int {\tt{} lpow=}double {\tt{} yneg=}double {\tt{} icpt=}int {\tt{} reg=}double {\tt{} tol=}double {\tt{} disp=}double {\tt{} moi=}int {\tt{} mii=}int } \smallskip \noindent{\bf Arguments} \begin{Description} \item[{\tt X}:] Location (on HDFS) to read the matrix of feature vectors; each row constitutes an example. \item[{\tt Y}:] Location to read the response matrix, which may have 1 or 2 columns \item[{\tt B}:] Location to store the estimated regression parameters (the $\beta_j$'s), with the intercept parameter~$\beta_0$ at position {\tt B[}$m\,{+}\,1$, {\tt 1]} if available \item[{\tt fmt}:] (default:\mbox{ }{\tt "text"}) Matrix file output format, such as {\tt text}, {\tt mm}, or {\tt csv}; see read/write functions in SystemML Language Reference for details. \item[{\tt O}:] (default:\mbox{ }{\tt " "}) Location to write certain summary statistics described in Table~\ref{table:GLM:stats}, by default it is standard output. \item[{\tt Log}:] (default:\mbox{ }{\tt " "}) Location to store iteration-specific variables for monitoring and debugging purposes, see Table~\ref{table:GLM:log} for details. \item[{\tt dfam}:] (default:\mbox{ }{\tt 1}) Distribution family code to specify $\Prob[y\mid \mu]$, see Table~\ref{table:commonGLMs}:\\ {\tt 1} = power distributions with $\Var(y) = \mu^{\alpha}$; {\tt 2} = binomial or Bernoulli \item[{\tt vpow}:] (default:\mbox{ }{\tt 0.0}) When {\tt dfam=1}, this provides the~$q$ in $\Var(y) = a\mu^q$, the power dependence of the variance of~$y$ on its mean. In particular, use:\\ {\tt 0.0} = Gaussian, {\tt 1.0} = Poisson, {\tt 2.0} = Gamma, {\tt 3.0} = inverse Gaussian \item[{\tt link}:] (default:\mbox{ }{\tt 0}) Link function code to determine the link function~$\eta = g(\mu)$:\\ {\tt 0} = canonical link (depends on the distribution family), see Table~\ref{table:commonGLMs};\\ {\tt 1} = power functions, {\tt 2} = logit, {\tt 3} = probit, {\tt 4} = cloglog, {\tt 5} = cauchit \item[{\tt lpow}:] (default:\mbox{ }{\tt 1.0}) When {\tt link=1}, this provides the~$s$ in $\eta = \mu^s$, the power link function; {\tt lpow=0.0} gives the log link $\eta = \log\mu$. Common power links:\\ {\tt -2.0} = $1/\mu^2$, {\tt -1.0} = reciprocal, {\tt 0.0} = log, {\tt 0.5} = sqrt, {\tt 1.0} = identity \item[{\tt yneg}:] (default:\mbox{ }{\tt 0.0}) When {\tt dfam=2} and the response matrix $Y$ has 1~column, this specifies the $y$-value used for Bernoulli No'' label. All other $y$-values are treated as the Yes'' label. For example, {\tt yneg=-1.0} may be used when $y\in\{-1, 1\}$; either {\tt yneg=1.0} or {\tt yneg=2.0} may be used when $y\in\{1, 2\}$. \item[{\tt icpt}:] (default:\mbox{ }{\tt 0}) Intercept and shifting/rescaling of the features in~$X$:\\ {\tt 0} = no intercept (hence no~$\beta_0$), no shifting/rescaling of the features;\\ {\tt 1} = add intercept, but do not shift/rescale the features in~$X$;\\ {\tt 2} = add intercept, shift/rescale the features in~$X$ to mean~0, variance~1 \item[{\tt reg}:] (default:\mbox{ }{\tt 0.0}) L2-regularization parameter (lambda) \item[{\tt tol}:] (default:\mbox{ }{\tt 0.000001}) Tolerance (epsilon) used in the convergence criterion: we terminate the outer iterations when the deviance changes by less than this factor; see below for details \item[{\tt disp}:] (default:\mbox{ }{\tt 0.0}) Dispersion parameter, or {\tt 0.0} to estimate it from data \item[{\tt moi}:] (default:\mbox{ }{\tt 200}) Maximum number of outer (Fisher scoring) iterations \item[{\tt mii}:] (default:\mbox{ }{\tt 0}) Maximum number of inner (conjugate gradient) iterations, or~0 if no maximum limit provided \end{Description} \begin{table}[t]\small\centerline{% \begin{tabular}{|ll|} \hline Name & Meaning \\ \hline {\tt TERMINATION\_CODE} & A positive integer indicating success/failure as follows: \\ & $1 = {}$Converged successfully; $2 = {}$Maximum \# of iterations reached; \\ & $3 = {}$Input ({\tt X}, {\tt Y}) out of range; $4 = {}$Distribution/link not supported \\ {\tt BETA\_MIN} & Smallest beta value (regression coefficient), excluding the intercept \\ {\tt BETA\_MIN\_INDEX} & Column index for the smallest beta value \\ {\tt BETA\_MAX} & Largest beta value (regression coefficient), excluding the intercept \\ {\tt BETA\_MAX\_INDEX} & Column index for the largest beta value \\ {\tt INTERCEPT} & Intercept value, or NaN if there is no intercept (if {\tt icpt=0}) \\ {\tt DISPERSION} & Dispersion used to scale deviance, provided in {\tt disp} input argument \\ & or estimated (same as {\tt DISPERSION\_EST}) if {\tt disp} argument is${} \leq 0$ \\ {\tt DISPERSION\_EST} & Dispersion estimated from the dataset \\ {\tt DEVIANCE\_UNSCALED} & Deviance from the saturated model, assuming dispersion${} = 1.0$ \\ {\tt DEVIANCE\_SCALED} & Deviance from the saturated model, scaled by {\tt DISPERSION} value \\ \hline \end{tabular}} \caption{Besides~$\beta$, GLM regression script computes a few summary statistics listed above. They are provided in CSV format, one comma-separated name-value pair per each line.} \label{table:GLM:stats} \end{table} \begin{table}[t]\small\centerline{% \begin{tabular}{|ll|} \hline Name & Meaning \\ \hline {\tt NUM\_CG\_ITERS} & Number of inner (Conj.\ Gradient) iterations in this outer iteration \\ {\tt IS\_TRUST\_REACHED} & $1 = {}$trust region boundary was reached, $0 = {}$otherwise \\ {\tt POINT\_STEP\_NORM} & L2-norm of iteration step from old point ($\beta$-vector) to new point \\ {\tt OBJECTIVE} & The loss function we minimize (negative partial log-likelihood) \\ {\tt OBJ\_DROP\_REAL} & Reduction in the objective during this iteration, actual value \\ {\tt OBJ\_DROP\_PRED} & Reduction in the objective predicted by a quadratic approximation \\ {\tt OBJ\_DROP\_RATIO} & Actual-to-predicted reduction ratio, used to update the trust region \\ {\tt GRADIENT\_NORM} & L2-norm of the loss function gradient (omitted if point is rejected) \\ {\tt LINEAR\_TERM\_MIN} & The minimum value of $X \pxp \beta$, used to check for overflows \\ {\tt LINEAR\_TERM\_MAX} & The maximum value of $X \pxp \beta$, used to check for overflows \\ {\tt IS\_POINT\_UPDATED} & $1 = {}$new point accepted; $0 = {}$new point rejected, old point restored \\ {\tt TRUST\_DELTA} & Updated trust region size, the delta'' \\ \hline \end{tabular}} \caption{ The {\tt Log} file for GLM regression contains the above \mbox{per-}iteration variables in CSV format, each line containing triple (Name, Iteration\#, Value) with Iteration\# being~0 for initial values.} \label{table:GLM:log} \end{table} \begin{table}[t]\hfil \begin{tabular}{|ccccccc|} \hline \multicolumn{4}{|c}{INPUT PARAMETERS} & Distribution & Link & Cano- \\ {\tt dfam} & {\tt vpow} & {\tt link} & {\tt\ lpow} & family & function & nical?\\ \hline {\tt 1} & {\tt 0.0} & {\tt 1} & {\tt -1.0} & Gaussian & inverse & \\ {\tt 1} & {\tt 0.0} & {\tt 1} & {\tt\ 0.0} & Gaussian & log & \\ {\tt 1} & {\tt 0.0} & {\tt 1} & {\tt\ 1.0} & Gaussian & identity & Yes \\ {\tt 1} & {\tt 1.0} & {\tt 1} & {\tt\ 0.0} & Poisson & log & Yes \\ {\tt 1} & {\tt 1.0} & {\tt 1} & {\tt\ 0.5} & Poisson & sq.root & \\ {\tt 1} & {\tt 1.0} & {\tt 1} & {\tt\ 1.0} & Poisson & identity & \\ {\tt 1} & {\tt 2.0} & {\tt 1} & {\tt -1.0} & Gamma & inverse & Yes \\ {\tt 1} & {\tt 2.0} & {\tt 1} & {\tt\ 0.0} & Gamma & log & \\ {\tt 1} & {\tt 2.0} & {\tt 1} & {\tt\ 1.0} & Gamma & identity & \\ {\tt 1} & {\tt 3.0} & {\tt 1} & {\tt -2.0} & Inverse Gauss & $1/\mu^2$ & Yes \\ {\tt 1} & {\tt 3.0} & {\tt 1} & {\tt -1.0} & Inverse Gauss & inverse & \\ {\tt 1} & {\tt 3.0} & {\tt 1} & {\tt\ 0.0} & Inverse Gauss & log & \\ {\tt 1} & {\tt 3.0} & {\tt 1} & {\tt\ 1.0} & Inverse Gauss & identity & \\ \hline {\tt 2} & {\tt *} & {\tt 1} & {\tt\ 0.0} & Binomial & log & \\ {\tt 2} & {\tt *} & {\tt 1} & {\tt\ 0.5} & Binomial & sq.root & \\ {\tt 2} & {\tt *} & {\tt 2} & {\tt\ *} & Binomial & logit & Yes \\ {\tt 2} & {\tt *} & {\tt 3} & {\tt\ *} & Binomial & probit & \\ {\tt 2} & {\tt *} & {\tt 4} & {\tt\ *} & Binomial & cloglog & \\ {\tt 2} & {\tt *} & {\tt 5} & {\tt\ *} & Binomial & cauchit & \\ \hline \end{tabular}\hfil \caption{Common GLM distribution families and link functions. (Here {\tt *}'' stands for any value.'')} \label{table:commonGLMs} \end{table} \noindent{\bf Details} \smallskip In GLM, the noise distribution $\Prob[y\mid \mu]$ of the response variable~$y$ given its mean~$\mu$ is restricted to have the \emph{exponential family} form \begin{equation} Y \sim\, \Prob[y\mid \mu] \,=\, \exp\left(\frac{y\theta - b(\theta)}{a} + c(y, a)\right),\,\,\textrm{where}\,\,\,\mu = \E(Y) = b'(\theta). \label{eqn:GLM} \end{equation} Changing the mean in such a distribution simply multiplies all \mbox{$\Prob[y\mid \mu]$} by~$e^{\,y\hspace{0.2pt}\theta/a}$ and rescales them so that they again integrate to~1. Parameter $\theta$ is called \emph{canonical}, and the function $\theta = b'^{\,-1}(\mu)$ that relates it to the mean is called the~\emph{canonical link}; constant~$a$ is called \emph{dispersion} and rescales the variance of~$y$. Many common distributions can be put into this form, see Table~\ref{table:commonGLMs}. The canonical parameter~$\theta$ is often chosen to coincide with~$\eta$, the linear combination of the regression features; other choices for~$\eta$ are possible too. Rather than specifying the canonical link, GLM distributions are commonly defined by their variance $\Var(y)$ as the function of the mean~$\mu$. It can be shown from Eq.~(\ref{eqn:GLM}) that $\Var(y) = a\,b''(\theta) = a\,b''(b'^{\,-1}(\mu))$. For example, for the Bernoulli distribution $\Var(y) = \mu(1-\mu)$, for the Poisson distribution \mbox{$\Var(y) = \mu$}, and for the Gaussian distribution $\Var(y) = a\cdot 1 = \sigma^2$. It turns out that for many common distributions $\Var(y) = a\mu^q$, a power function. We support all distributions where $\Var(y) = a\mu^q$, as well as the Bernoulli and the binomial distributions. For distributions with $\Var(y) = a\mu^q$ the canonical link is also a power function, namely $\theta = \mu^{1-q}/(1-q)$, except for the Poisson ($q = 1$) whose canonical link is $\theta = \log\mu$. We support all power link functions in the form $\eta = \mu^s$, dropping any constant factor, with $\eta = \log\mu$ for $s=0$. The binomial distribution has its own family of link functions, which includes logit (the canonical link), probit, cloglog, and cauchit (see Table~\ref{table:binomial_links}); we support these only for the binomial and Bernoulli distributions. Links and distributions are specified via four input parameters: {\tt dfam}, {\tt vpow}, {\tt link}, and {\tt lpow} (see Table~\ref{table:commonGLMs}). \begin{table}[t]\hfil \begin{tabular}{|cc|cc|} \hline Name & Link function & Name & Link function \\ \hline Logit & $\displaystyle \eta = 1 / \big(1 + e^{-\mu}\big)^{\mathstrut}$ & Cloglog & $\displaystyle \eta = \log \big(\!- \log(1 - \mu)\big)^{\mathstrut}$ \\ Probit & $\displaystyle \mu = \frac{1}{\sqrt{2\pi}}\int\nolimits_{-\infty_{\mathstrut}}^{\,\eta\mathstrut} \!\!\!\!\! e^{-\frac{t^2}{2}} dt$ & Cauchit & $\displaystyle \eta = \tan\pi(\mu - 1/2)$ \\ \hline \end{tabular}\hfil \caption{The supported non-power link functions for the Bernoulli and the binomial distributions. (Here $\mu$~is the Bernoulli mean.)} \label{table:binomial_links} \end{table} The observed response values are provided to the regression script as matrix~$Y$ having 1 or 2 columns. If a power distribution family is selected ({\tt dfam=1}), matrix $Y$ must have 1~column that provides $y_i$ for each~$x_i$ in the corresponding row of matrix~$X$. When {\tt dfam=2} and $Y$ has 1~column, we assume the Bernoulli distribution for $y_i\in\{y_{\mathrm{neg}}, y_{\mathrm{pos}}\}$ with $y_{\mathrm{neg}}$ from the input parameter {\tt yneg} and with $y_{\mathrm{pos}} \neq y_{\mathrm{neg}}$. When {\tt dfam=2} and $Y$ has 2~columns, we assume the binomial distribution; for each row~$i$ in~$X$, cells $Y[i, 1]$ and $Y[i, 2]$ provide the positive and the negative binomial counts respectively. Internally we convert the 1-column Bernoulli into the 2-column binomial with 0-versus-1 counts. We estimate the regression parameters via L2-regularized negative log-likelihood minimization: \begin{equation*} f(\beta; X, Y) \,\,=\,\, -\sum\nolimits_{i=1}^n \big(y_i\theta_i - b(\theta_i)\big) \,+\,(\lambda/2) \sum\nolimits_{j=1}^m \beta_j^2\,\,\to\,\,\min \end{equation*} where $\theta_i$ and $b(\theta_i)$ are from~(\ref{eqn:GLM}); note that $a$ and $c(y, a)$ are constant w.r.t.~$\beta$ and can be ignored here. The canonical parameter $\theta_i$ depends on both $\beta$ and~$x_i$: \begin{equation*} \theta_i \,\,=\,\, b'^{\,-1}(\mu_i) \,\,=\,\, b'^{\,-1}\big(g^{-1}(\eta_i)\big) \,\,=\,\, \big(b'^{\,-1}\circ g^{-1}\big)\left(\beta_0 + \sum\nolimits_{j=1}^m \beta_j x_{i,j}\right) \end{equation*} The user-provided (via {\tt reg}) regularization coefficient $\lambda\geq 0$ can be used to mitigate overfitting and degeneracy in the data. Note that the intercept is never regularized. Our iterative minimizer for $f(\beta; X, Y)$ uses the Fisher scoring approximation to the difference $\varDelta f(z; \beta) = f(\beta + z; X, Y) \,-\, f(\beta; X, Y)$, recomputed at each iteration: \begin{gather*} \varDelta f(z; \beta) \,\,\,\approx\,\,\, 1/2 \cdot z^T A z \,+\, G^T z, \,\,\,\,\textrm{where}\,\,\,\, A \,=\, X^T\!\diag(w) X \,+\, \lambda I\\ \textrm{and}\,\,\,\,G \,=\, - X^T u \,+\, \lambda\beta, \,\,\,\textrm{with $n\,{\times}\,1$ vectors $w$ and $u$ given by}\\ \forall\,i = 1\ldots n: \,\,\,\, w_i = \big[v(\mu_i)\,g'(\mu_i)^2\big]^{-1} \!\!\!\!\!\!,\,\,\,\,\,\,\,\,\, u_i = (y_i - \mu_i)\big[v(\mu_i)\,g'(\mu_i)\big]^{-1} \!\!\!\!\!\!.\,\,\,\, \end{gather*} Here $v(\mu_i)=\Var(y_i)/a$, the variance of $y_i$ as the function of the mean, and $g'(\mu_i) = d \eta_i/d \mu_i$ is the link function derivative. The Fisher scoring approximation is minimized by trust-region conjugate gradient iterations (called the \emph{inner} iterations, with the Fisher scoring iterations as the \emph{outer} iterations), which approximately solve the following problem: \begin{equation*} 1/2 \cdot z^T A z \,+\, G^T z \,\,\to\,\,\min\,\,\,\,\textrm{subject to}\,\,\,\, \|z\|_2 \leq \delta \end{equation*} The conjugate gradient algorithm closely follows Algorithm~7.2 on page~171 of~\cite{Nocedal2006:Optimization}. The trust region size $\delta$ is initialized as $0.5\sqrt{m}\,/ \max\nolimits_i \|x_i\|_2$ and updated as described in~\cite{Nocedal2006:Optimization}. The user can specify the maximum number of the outer and the inner iterations with input parameters {\tt moi} and {\tt mii}, respectively. The Fisher scoring algorithm terminates successfully if $2|\varDelta f(z; \beta)| < (D_1(\beta) + 0.1)\hspace{0.5pt}\eps$ where $\eps > 0$ is a tolerance supplied by the user via {\tt tol}, and $D_1(\beta)$ is the unit-dispersion deviance estimated as \begin{equation*} D_1(\beta) \,\,=\,\, 2 \cdot \big(\Prob[Y \mid \! \begin{smallmatrix}\textrm{saturated}\\\textrm{model}\end{smallmatrix}, a\,{=}\,1] \,\,-\,\,\Prob[Y \mid X, \beta, a\,{=}\,1]\,\big) \end{equation*} The deviance estimate is also produced as part of the output. Once the Fisher scoring algorithm terminates, if requested by the user, we estimate the dispersion~$a$ from Eq.~\ref{eqn:GLM} using Pearson residuals \begin{equation} \hat{a} \,\,=\,\, \frac{1}{n-m}\cdot \sum_{i=1}^n \frac{(y_i - \mu_i)^2}{v(\mu_i)} \label{eqn:dispersion} \end{equation} and use it to adjust our deviance estimate: $D_{\hat{a}}(\beta) = D_1(\beta)/\hat{a}$. If input argument {\tt disp} is {\tt 0.0} we estimate $\hat{a}$, otherwise we use its value as~$a$. Note that in~(\ref{eqn:dispersion}) $m$~counts the intercept ($m \leftarrow m+1$) if it is present. \smallskip \noindent{\bf Returns} \smallskip The estimated regression parameters (the $\hat{\beta}_j$'s) are populated into a matrix and written to an HDFS file whose path/name was provided as the {\tt B}'' input argument. What this matrix contains, and its size, depends on the input argument {\tt icpt}, which specifies the user's intercept and rescaling choice: \begin{Description} \item[{\tt icpt=0}:] No intercept, matrix~$B$ has size $m\,{\times}\,1$, with $B[j, 1] = \hat{\beta}_j$ for each $j$ from 1 to~$m = {}$ncol$(X)$. \item[{\tt icpt=1}:] There is intercept, but no shifting/rescaling of~$X$; matrix~$B$ has size $(m\,{+}\,1) \times 1$, with $B[j, 1] = \hat{\beta}_j$ for $j$ from 1 to~$m$, and $B[m\,{+}\,1, 1] = \hat{\beta}_0$, the estimated intercept coefficient. \item[{\tt icpt=2}:] There is intercept, and the features in~$X$ are shifted to mean${} = 0$ and rescaled to variance${} = 1$; then there are two versions of the~$\hat{\beta}_j$'s, one for the original features and another for the shifted/rescaled features. Now matrix~$B$ has size $(m\,{+}\,1) \times 2$, with $B[\cdot, 1]$ for the original features and $B[\cdot, 2]$ for the shifted/rescaled features, in the above format. Note that $B[\cdot, 2]$ are iteratively estimated and $B[\cdot, 1]$ are obtained from $B[\cdot, 2]$ by complementary shifting and rescaling. \end{Description} Our script also estimates the dispersion $\hat{a}$ (or takes it from the user's input) and the deviances $D_1(\hat{\beta})$ and $D_{\hat{a}}(\hat{\beta})$, see Table~\ref{table:GLM:stats} for details. A log file with variables monitoring progress through the iterations can also be made available, see Table~\ref{table:GLM:log}. \smallskip \noindent{\bf Examples} \smallskip {\hangindent=\parindent\noindent\tt \hml -f GLM.dml -nvargs X=/user/biadmin/X.mtx Y=/user/biadmin/Y.mtx B=/user/biadmin/B.mtx fmt=csv dfam=2 link=2 yneg=-1.0 icpt=2 reg=0.01 tol=0.00000001 disp=1.0 moi=100 mii=10 O=/user/biadmin/stats.csv Log=/user/biadmin/log.csv } \smallskip \noindent{\bf See Also} \smallskip In case of binary classification problems, consider using L2-SVM or binary logistic regression; for multiclass classification, use multiclass~SVM or multinomial logistic regression. For the special cases of linear regression and logistic regression, it may be more efficient to use the corresponding specialized scripts instead of~GLM.