\begin{comment}

 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.

\end{comment}

\subsubsection{Multi-class Support Vector Machines}
\label{msvm}

\noindent{\bf Description}

Support Vector Machines are used to model the relationship between a categorical
dependent variable y and one or more explanatory variables denoted X. This
implementation supports dependent variables whose domain size is greater than or
equal to 2, and hence is not restricted to binary class labels.
\\

\noindent{\bf Usage}
\begin{tabbing}
\texttt{-f} \textit{path}/\texttt{m-svm.dml -nvargs}
\=\texttt{X=}\textit{path}/\textit{file}
\texttt{Y=}\textit{path}/\textit{file}
\texttt{icpt=}\textit{int}\\
\>\texttt{tol=}\textit{double}
\texttt{reg=}\textit{double}
\texttt{maxiter=}\textit{int}
\texttt{model=}\textit{path}/\textit{file}\\
\>\texttt{Log=}\textit{path}/\textit{file}
\texttt{fmt=}\textit{csv}$\vert$\textit{text}
\end{tabbing}

\begin{tabbing}
\texttt{-f} \textit{path}/\texttt{m-svm-predict.dml -nvargs}
\=\texttt{X=}\textit{path}/\textit{file}
\texttt{Y=}\textit{path}/\textit{file}
\texttt{icpt=}\textit{int}
\texttt{model=}\textit{path}/\textit{file}\\
\>\texttt{scores=}\textit{path}/\textit{file}
\texttt{accuracy=}\textit{path}/\textit{file}\\
\>\texttt{confusion=}\textit{path}/\textit{file}
\texttt{fmt=}\textit{csv}$\vert$\textit{text}
\end{tabbing}

\noindent{\bf Arguments}
\begin{itemize}
\item X: Location (on HDFS) of a matrix containing the explanatory variables.
Each row constitutes an example.
\item Y: Location (on HDFS) of a 1-column matrix specifying the categorical
dependent variable (label). Labels are assumed to be contiguously numbered
from 1 $\ldots$ \#classes. Note that this argument is optional for prediction.
\item icpt (default: {\tt 0}): If set to 1, a constant bias column is added
to X.
\item tol (default: {\tt 0.001}): The procedure terminates early if the
reduction in objective function value is less than tolerance times the
initial objective function value.
\item reg (default: {\tt 1}): Regularization constant. See Details for where
lambda appears in the objective function. To draw an analogy with C-SVM,
note that C = 2/lambda. Cross validation is usually employed to determine
the optimal value of lambda.
\item maxiter (default: {\tt 100}): The maximum number of iterations.
\item model: Location (on HDFS) that contains the learnt weights.
\item Log: Location (on HDFS) to collect various metrics (e.g., objective
function value) that depict training progress across iterations.
\item fmt (default: {\tt text}): Specifies the output format: either
comma-separated values (csv) or sparse-matrix format (text).
\item scores: Location (on HDFS) to store scores for a held-out test set.
Note that this is an optional argument.
\item accuracy: Location (on HDFS) to store the accuracy computed on a
held-out test set. Note that this is an optional argument.
\item confusion: Location (on HDFS) to store the confusion matrix computed
using a held-out test set. Note that this is an optional argument.
\end{itemize}

\noindent{\bf Details}

Support vector machines learn a classification function by solving the
following optimization problem ($L_2$-SVM):
\begin{eqnarray*}
&\textrm{argmin}_w& \frac{\lambda}{2} ||w||_2^2 + \sum_i \xi_i^2\\
&\textrm{subject to:}& y_i w^{\top} x_i \geq 1 - \xi_i ~ \forall i
\end{eqnarray*}
where $x_i$ is an example from the training set with its label given by $y_i$,
$w$ is the vector of parameters and $\lambda$ is the regularization constant
specified by the user.
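Substituting the constraints into the objective gives the equivalent unconstrained squared-hinge form, $\frac{\lambda}{2}||w||_2^2 + \sum_i \max(0, 1 - y_i w^{\top} x_i)^2$, which is what the solver actually minimizes. A minimal NumPy sketch of this objective (the function name is illustrative and not part of the DML scripts):

```python
import numpy as np

def l2svm_objective(w, X, y, lam):
    """Primal L2-SVM objective: (lam/2)*||w||^2 + sum_i max(0, 1 - y_i w^T x_i)^2.

    X: (n, d) data matrix, y: (n,) labels in {-1, +1}, lam: regularization constant.
    """
    margins = y * (X @ w)                # y_i * w^T x_i for every example
    xi = np.maximum(0.0, 1.0 - margins)  # slack: only violated margins contribute
    return 0.5 * lam * np.dot(w, w) + np.sum(xi ** 2)
```

Note that the squared slack makes the objective differentiable, unlike the ordinary hinge loss.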

To extend the above formulation (binary class SVM) to the multiclass setting,
one standard approach is to learn one binary class SVM per class that
separates data belonging to that class from the rest of the training data
(one-against-the-rest SVM, see B. Scholkopf, 1995).
To account for the missing bias term, one may augment the data with a column
of constants, which is achieved by setting the icpt argument to 1 (C-J Hsieh
et al, 2008).
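For illustration, the one-against-the-rest label construction and the bias-column augmentation can be sketched as follows (NumPy, with hypothetical helper names; the DML scripts perform the equivalent matrix operations internally):

```python
import numpy as np

def one_vs_rest_labels(Y, num_classes):
    """Turn labels in 1..num_classes into one {-1, +1} column per class:
    column k is +1 for examples of class k+1 and -1 for all other examples."""
    Y = np.asarray(Y).reshape(-1, 1)
    classes = np.arange(1, num_classes + 1)
    return np.where(Y == classes, 1.0, -1.0)   # shape (n, num_classes)

def add_bias_column(X):
    """icpt=1: append a column of ones so the last weight acts as the bias."""
    return np.hstack([X, np.ones((X.shape[0], 1))])
```

Each column of the resulting label matrix then defines one binary $L_2$-SVM subproblem.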

This implementation optimizes the primal directly (Chapelle, 2007). It uses
nonlinear conjugate gradient descent to minimize the objective function,
choosing step sizes by performing one-dimensional Newton minimization along
the search direction.
\\
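As a rough sketch of this optimization scheme, the following NumPy code trains a single binary $L_2$-SVM with Fletcher--Reeves conjugate gradient and a one-step Newton line search. It is a simplification, not a transcription of m-svm.dml: for instance, the stopping rule here uses the gradient norm rather than the objective reduction described under tol.

```python
import numpy as np

def train_l2svm(X, y, lam=1.0, tol=1e-3, maxiter=100):
    """Binary L2-SVM via nonlinear (Fletcher-Reeves) conjugate gradient.
    Step sizes come from a single 1-D Newton step along the search
    direction -- a simplified stand-in for the line search in m-svm.dml."""
    def grad(w):
        m = y * (X @ w)                    # margins y_i * w^T x_i
        v = m < 1.0                        # margin violators
        return lam * w - 2.0 * X[v].T @ (y[v] * (1.0 - m[v]))

    w = np.zeros(X.shape[1])
    g = grad(w)
    s = -g                                 # initial direction: steepest descent
    for _ in range(maxiter):
        m = y * (X @ w)
        v = m < 1.0
        Xs = X @ s
        # 1-D Newton step t = -g'(0) / g''(0) along s, using the
        # violator set at the current point
        gp = g @ s
        gpp = lam * (s @ s) + 2.0 * np.sum(Xs[v] ** 2)
        if gpp <= 0.0:
            break
        w = w + (-gp / gpp) * s
        g_new = grad(w)
        if np.linalg.norm(g_new) < tol:    # simplified stopping rule
            break
        beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves update
        s = -g_new + beta * s
        g = g_new
    return w
```

The multiclass script effectively runs one such solve per one-against-the-rest subproblem.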

\noindent{\bf Returns}

The learnt weights produced by m-svm.dml are populated into a matrix that
has as many columns as there are classes in the training data, and written
to the file provided on HDFS (see model in section Arguments). The number of
rows in this matrix is ncol(X) if icpt was set to 0 during invocation, and
ncol(X) + 1 otherwise. The bias terms, if used, are placed in the last row.
Depending on the arguments provided during invocation, m-svm-predict.dml may
compute one or more of scores, accuracy and confusion matrix in the output
format specified.
\\
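A sketch of how the prediction step can consume such a weight matrix (NumPy; function names are illustrative, and the confusion-matrix layout shown here, rows indexed by true label, may differ from the script's own output):

```python
import numpy as np

def msvm_predict(X, W, icpt=0):
    """Score a test set with the ncol(X) [+1] x #classes weight matrix
    produced by m-svm.dml; predicted label = argmax over per-class scores."""
    if icpt == 1:
        # bias terms sit in the last row of W, so append a ones column to X
        X = np.hstack([X, np.ones((X.shape[0], 1))])
    scores = X @ W
    return scores, scores.argmax(axis=1) + 1   # labels are 1-based

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows: true label, columns: predicted label (both 1-based)."""
    C = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        C[t - 1, p - 1] += 1
    return C
```

Accuracy is then simply the fraction of examples on the diagonal of the confusion matrix.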

%%\noindent{\bf See Also}
%%
%%In case of binary classification problems, please consider using a binary class classifier
%%learning algorithm, e.g., binary class $L_2$-SVM (see Section \ref{l2svm}) or logistic regression
%%(see Section \ref{logreg}). To model the relationship between a scalar dependent variable
%%y and one or more explanatory variables X, consider Linear Regression instead (see Section
%%\ref{linreg-solver} or Section \ref{linreg-iterative}).
%%\\
%%

\noindent{\bf Examples}
\begin{verbatim}
hadoop jar SystemML.jar -f m-svm.dml -nvargs X=/user/biadmin/X.mtx
                                             Y=/user/biadmin/y.mtx
                                             icpt=0 tol=0.001
                                             reg=1.0 maxiter=100 fmt=csv
                                             model=/user/biadmin/weights.csv
                                             Log=/user/biadmin/Log.csv
\end{verbatim}

\begin{verbatim}
hadoop jar SystemML.jar -f m-svm-predict.dml -nvargs X=/user/biadmin/X.mtx
                                                     Y=/user/biadmin/y.mtx
                                                     icpt=0 fmt=csv
                                                     model=/user/biadmin/weights.csv
                                                     scores=/user/biadmin/scores.csv
                                                     accuracy=/user/biadmin/accuracy.csv
                                                     confusion=/user/biadmin/confusion.csv
\end{verbatim}

\noindent{\bf References}
\begin{itemize}
\item W. T. Vetterling and B. P. Flannery. \newblock{\em Conjugate Gradient Methods in Multidimensions.} \newblock In {\em Numerical Recipes in C: The Art of Scientific Computing}, W. H. Press and S. A. Teukolsky (eds.), Cambridge University Press, 1992.
\item J. Nocedal and S. J. Wright. \newblock{\em Numerical Optimization.} \newblock Springer-Verlag, 1999.
\item C-J Hsieh, K-W Chang, C-J Lin, S. S. Keerthi and S. Sundararajan. \newblock{\em A Dual Coordinate Descent Method for Large-scale Linear SVM.} \newblock International Conference on Machine Learning (ICML), 2008.
\item Olivier Chapelle. \newblock{\em Training a Support Vector Machine in the Primal.} \newblock Neural Computation, 2007.
\item B. Scholkopf, C. Burges and V. Vapnik. \newblock{\em Extracting Support Data for a Given Task.} \newblock International Conference on Knowledge Discovery and Data Mining (KDD), 1995.
\end{itemize}