| <!-- HTML header for doxygen 1.8.4--> |
| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
| <html xmlns="http://www.w3.org/1999/xhtml"> |
| <head> |
| <meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/> |
| <meta http-equiv="X-UA-Compatible" content="IE=9"/> |
| <meta name="generator" content="Doxygen 1.8.4"/> |
| <meta name="keywords" content="madlib,postgres,greenplum,machine learning,data mining,deep learning,ensemble methods,data science,market basket analysis,affinity analysis,pca,lda,regression,elastic net,huber white,proportional hazards,k-means,latent dirichlet allocation,bayes,support vector machines,svm"/> |
| <title>MADlib: Support Vector Decomposition</title> |
| <link href="tabs.css" rel="stylesheet" type="text/css"/> |
| <script type="text/javascript" src="jquery.js"></script> |
| <script type="text/javascript" src="dynsections.js"></script> |
| <link href="navtree.css" rel="stylesheet" type="text/css"/> |
| <script type="text/javascript" src="resize.js"></script> |
| <script type="text/javascript" src="navtree.js"></script> |
| <script type="text/javascript"> |
| $(document).ready(initResizable); |
| $(window).load(resizeHeight); |
| </script> |
| <link href="search/search.css" rel="stylesheet" type="text/css"/> |
| <script type="text/javascript" src="search/search.js"></script> |
| <script type="text/javascript"> |
| $(document).ready(function() { searchBox.OnSelectItem(0); }); |
| </script> |
| <script type="text/x-mathjax-config"> |
| MathJax.Hub.Config({ |
| extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"], |
| jax: ["input/TeX","output/HTML-CSS"], |
| }); |
| </script><script src="../mathjax/MathJax.js"></script> |
| <link href="doxygen.css" rel="stylesheet" type="text/css" /> |
| <link href="madlib_extra.css" rel="stylesheet" type="text/css"/> |
| <!-- google analytics --> |
| <script> |
| (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ |
| (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), |
| m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) |
| })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); |
| ga('create', 'UA-45382226-1', 'auto'); |
| ga('send', 'pageview'); |
| </script> |
| </head> |
| <body> |
| <div id="top"><!-- do not remove this div, it is closed by doxygen! --> |
| <div id="titlearea"> |
| <table cellspacing="0" cellpadding="0"> |
| <tbody> |
| <tr style="height: 56px;"> |
| <td style="padding-left: 0.5em;"> |
| <div id="projectname">MADlib |
|  <span id="projectnumber">1.2</span> <span style="font-size:10pt; font-style:italic"><a href="../latest/./group__grp__svd.html"> A newer version is available</a></span> |
| </div> |
| <div id="projectbrief">User Documentation</div> |
| </td> |
| <!--BEGIN VERSIONS LINKS--> |
| <td style="padding-left: 0.5em;"> |
| <div class="versionlist"><ul> |
| <li class="head">More versions:</li> |
| <li><a href="../v1.1/index.html">v1.1</li> |
| <li><a href="../v1.0/index.html">v1.0</li> |
| <li><a href="../v0.7/index.html">v0.7</li> |
| <li><a href="../v0.5/index.html">v0.5</li></ul> |
| </div> |
| </td> |
| <td> <div id="MSearchBox" class="MSearchBoxInactive"> |
| <span class="left"> |
| <img id="MSearchSelect" src="search/mag_sel.png" |
| onmouseover="return searchBox.OnSearchSelectShow()" |
| onmouseout="return searchBox.OnSearchSelectHide()" |
| alt=""/> |
| <input type="text" id="MSearchField" value="Search" accesskey="S" |
| onfocus="searchBox.OnSearchFieldFocus(true)" |
| onblur="searchBox.OnSearchFieldFocus(false)" |
| onkeyup="searchBox.OnSearchFieldChange(event)"/> |
| </span><span class="right"> |
| <a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a> |
| </span> |
| </div> |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| </div> |
| <!-- end header part --> |
| <!-- Generated by Doxygen 1.8.4 --> |
| <script type="text/javascript"> |
| var searchBox = new SearchBox("searchBox", "search",false,'Search'); |
| </script> |
| </div><!-- top --> |
| <div id="side-nav" class="ui-resizable side-nav-resizable"> |
| <div id="nav-tree"> |
| <div id="nav-tree-contents"> |
| <div id="nav-sync" class="sync"></div> |
| </div> |
| </div> |
| <div id="splitbar" style="-moz-user-select:none;" |
| class="ui-resizable-handle"> |
| </div> |
| </div> |
| <script type="text/javascript"> |
| $(document).ready(function(){initNavTree('group__grp__svd.html','');}); |
| </script> |
| <div id="doc-content"> |
| <!-- window showing the filter options --> |
| <div id="MSearchSelectWindow" |
| onmouseover="return searchBox.OnSearchSelectShow()" |
| onmouseout="return searchBox.OnSearchSelectHide()" |
| onkeydown="return searchBox.OnSearchSelectKey(event)"> |
| <a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(0)"><span class="SelectionMark"> </span>All</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(1)"><span class="SelectionMark"> </span>Files</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(2)"><span class="SelectionMark"> </span>Functions</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(3)"><span class="SelectionMark"> </span>Variables</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(4)"><span class="SelectionMark"> </span>Groups</a></div> |
| |
| <!-- iframe showing the search results (closed by default) --> |
| <div id="MSearchResultsWindow"> |
| <iframe src="javascript:void(0)" frameborder="0" |
| name="MSearchResults" id="MSearchResults"> |
| </iframe> |
| </div> |
| |
| <div class="header"> |
| <div class="headertitle"> |
| <div class="title">Support Vector Decomposition<div class="ingroups"><a class="el" href="group__grp__matrix__factorization.html">Matrix Factorization</a></div></div> </div> |
| </div><!--header--> |
| <div class="contents"> |
| |
| <p>In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix, with many useful applications in signal processing and statistics. |
| <a href="#details">More...</a></p> |
| <p>Let \(A\) be a \(mxn\) matrix, where \(m \ge n\). Then \(A\) can be decomposed as follows: </p> |
| <p class="formulaDsp"> |
| \[ A = U \Sigma V^T, \] |
| </p> |
| <p> where \(U\) is a \(m \times n\) orthonormal matrix, \(\Sigma\) is a \(n \times n\) diagonal matrix, and \(V\) is an \(n \times n\) orthonormal matrix. The diagonal elements of \(\Sigma\) are called the <em>{singular</em> values}.</p> |
| <dl class="section user"><dt>Input</dt><dd>The input table can be two forms:<ul> |
| <li>Dense matrix representation The table contains a 'row_id' column that identifies each row. Further, the other columns are assumed to be the data for the matrix represented in two possible forms, illustrated in a the 2x2 matrix example below:<ol type="1"> |
| <li><pre> |
| row_id col1 col2 |
| row1 1 1 0 |
| row2 2 0 1 |
| </pre></li> |
| <li><pre> |
| row_id row_vec |
| row1 1 {1, 0} |
| row2 2 {0, 1} |
| </pre></li> |
| </ol> |
| </li> |
| </ul> |
| </dd></dl> |
| <ul> |
| <li>Sparse matrix representation An example representation is given below: <pre> |
| row_id col_id value |
| row1 1 1 1 |
| row2 2 2 1 |
| </pre> The 'row_id' represents the row number, 'col_id' represents the column id and the 'value' represents the matrix value at ['row_id', 'col_id']</li> |
| </ul> |
| <dl class="section user"><dt>Output</dt><dd>Ouptut for eigen vectors/values will be in the following 3 tables:<ul> |
| <li>Left singular matrix: Table named <output_table_prefix>_left (e.g. ‘netflix_u’)</li> |
| <li>Right singular matrix: Table named <output_table_prefix>_right (e.g. ‘netflix_v’)</li> |
| <li>Singular values: Table named <output_table_prefix>_s (e.g. ‘netflix_s’)</li> |
| </ul> |
| </dd></dl> |
| <p>The singular vector tables are of the format: </p> |
| <pre class="fragment">row_id INTEGER, -- ID corresponding to each eigen value (in decreasing order) |
| row_vec FLOAT8[] -- Singular vector elements for this row_id. Each array is of size k |
| </pre><p>The singular values table is in a sparse table format: </p> |
| <pre class="fragment">row_id INTEGER, -- i for ith eigen value |
| col_id INTEGER, -- i for ith eigen value (same as row_id) |
| value FLOAT8 -- Eigen Value |
| </pre><p>Result summary table will be in the following format: </p> |
| <pre class="fragment">rows_used INTEGER, -- Number of rows used for SVD calculation |
| exec_time FLOAT8, -- Total time for executing SVD |
| iter INTEGER, -- Total number of iterations run |
| recon_error FLOAT8 -- Total quality score (i.e. approximation quality) for this set of |
| orthonormal basis |
| relative_recon_error FLOAT8 -- relative quality score |
| </pre><p>In the result summary table, the reconstruction error is computed as \( \sqrt{mean((X - USV^T)_{ij}^2)} \), where the average is over all elements of the matrices. The relative reconstruction error is then computed as ratio of the reconstruction error and \( \sqrt{mean(X_{ij}^2)} \).</p> |
| <dl class="section user"><dt>Usage</dt><dd>Dense matrices: <pre class="fragment">SELECT {schema_madlib}.svd( |
| source_table, -- TEXT, Source table name (dense matrix) |
| output_table_prefix, -- TEXT, Prefix for output tables |
| row_id, -- TEXT, ID for each row |
| k, -- INTEGER, Number of singular vectors to compute |
| nIterations, -- INTEGER, Number of iterations to run (OPTIONAL) |
| result_summary_table, -- TEXT Table name to store result summary (OPTIONAL) |
| ); |
| </pre></dd></dl> |
| <p>Sparse matrices: </p> |
| <pre class="fragment">SELECT {schema_madlib}.svd_sparse( |
| source_table, -- TEXT, Source table name (sparse matrix) |
| output_table_prefix, -- TEXT, Prefix for output tables |
| row_id, -- TEXT, ID for each row |
| col_id, -- TEXT, ID for each column |
| value, -- TEXT, Non-zero values of the sparse matrix |
| row_dim, -- INTEGER, Row dimension of sparse matrix |
| col_dim, -- INTEGER, Col dimension of sparse matrix |
| k, -- INTEGER, Number of singular vectors to compute |
| nIterations, -- INTEGER, Number of iterations to run (OPTIONAL) |
| result_summary_table -- TEXT Table name to store result summary (OPTIONAL) |
| ); |
| </pre><p>Block matrices: </p> |
| <pre class="fragment">SELECT {schema_madlib}.svd_block( |
| source_table, -- TEXT, Source table name (block matrix) |
| output_table_prefix, -- TEXT, Prefix for output tables |
| k, -- INTEGER, Number of singular vectors to compute |
| nIterations, -- INTEGER, Number of iterations to run (OPTIONAL) |
| result_summary_table -- TEXT Table name to store result summary (OPTIONAL) |
| ); |
| </pre><p>Native implementation for sparse matrix: </p> |
| <pre class="fragment">SELECT {schema_madlib}.svd_sparse_native( |
| source_table, -- TEXT, Source table name (sparse matrix) |
| output_table_prefix, -- TEXT, Prefix for output tables |
| row_id, -- TEXT, ID for each row |
| col_id, -- TEXT, ID for each column |
| value, -- TEXT, Non-zero values of the sparse matrix |
| row_dim, -- INTEGER, Row dimension of sparse matrix |
| col_dim, -- INTEGER, Col dimension of sparse matrix |
| k, -- INTEGER, Number of singular vectors to compute |
| lanczos_iter, -- INTEGER, Number of iterations to run, optional |
| result_summary_table -- TEXT Table name to store result summary, optional |
| ); |
| </pre><dl class="section user"><dt>Examples</dt><dd><pre class="fragment">CREATE TABLE mat ( |
| row_id integer, |
| row_vec double precision[] |
| ); |
| -- sample input data |
| COPY mat (row_id, row_vec) FROM stdin; |
| 1 {691,58,899,163,159,533,604,582,269,390} |
| 0 {396,840,353,446,318,886,15,584,159,383} |
| 3 {462,532,787,265,982,306,600,608,212,885} |
| 2 {293,742,298,75,404,857,941,662,846,2} |
| 5 {327,946,368,943,7,516,272,24,591,204} |
| 4 {304,151,337,387,643,753,603,531,459,652} |
| 7 {458,959,774,376,228,354,300,669,718,565} |
| 6 {877,59,260,302,891,498,710,286,864,675} |
| 9 {882,761,398,688,761,405,125,484,222,873} |
| 8 {824,390,818,844,180,943,424,520,65,913} |
| 11 {492,220,576,289,321,261,173,1,44,241} |
| 10 {528,1,860,18,814,242,314,965,935,809} |
| 13 {350,192,211,633,53,783,30,444,176,932} |
| 12 {415,701,221,503,67,393,479,218,219,916} |
| 15 {739,651,678,577,273,935,661,47,373,618} |
| 14 {909,472,871,695,930,455,398,893,693,838} |
| \. |
| DROP TABLE if exists svd_u; |
| DROP TABLE if exists svd_v; |
| DROP TABLE if exists svd_s; |
| -- SVD for dense matrices |
| SELECT {schema_madlib}.svd('mat', 'svd', 'row_id', 10); |
| ---------------------------------------------------------------- |
| DROP TABLE if exists mat_sparse; |
| SELECT {schema_madlib}.matrix_sparsify('mat', 'mat_sparse', False); |
| DROP TABLE if exists svd_u; |
| DROP TABLE if exists svd_v; |
| DROP TABLE if exists svd_s; |
| -- SVD for sparse matrices |
| SELECT {schema_madlib}.svd_sparse('mat_sparse', 'svd', 'row_id', 'col_id', 'value', 10); |
| </pre></dd></dl> |
| <dl class="section user"><dt>Technical Background</dt><dd>In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix, with many useful applications in signal processing and statistics. Let \(A\) be a \(m \times n\) matrix, where \(m \ge n\). Then \(A\) can be decomposed as follows: <p class="formulaDsp"> |
| \[ A = U \Sigma V^T, \] |
| </p> |
| where \(U\) is a \(m \times n\) orthonormal matrix, \(\Sigma\) is a \(n \times n\) diagonal matrix, and \(V\) is an \(n \times n\) orthonormal matrix. The diagonal elements of \(\Sigma\) are called the <em>{singular</em> values}. It is possible to formulate the problem of computing the singular triplets ( \(\sigma_i, u_i, v_i\)) of \(A\) as an eigenvalue problem involving a Hermitian matrix related to \(A\). There are two possible ways of achieving this:<ol type="1"> |
| <li>With the cross product matrix, \(A^TA\) and \(AA^T\)</li> |
| <li>With the cyclic matrix <p class="formulaDsp"> |
| \[ H(A) = \begin{bmatrix} 0 & A\\ A^* & 0 \end{bmatrix} \] |
| </p> |
| The singular values are the nonnegative square roots of the eigenvalues of the cross product matrix. This approach may imply a severe loss of accuracy in the smallest singular values. The cyclic matrix approach is an alternative that avoids this problem, but at the expense of significantly increasing the cost of the computation. Computing the cross product matrix explicitly is not recommended, especially in the case of sparse A. Bidiagonalization was proposed by Golub and Kahan [citation?] as a way of tridiagonalizing the cross product matrix without forming it explicitly. Consider the following decomposition <p class="formulaDsp"> |
| \[ A = P B Q^T, \] |
| </p> |
| where \(P\) and \(Q\) are unitary matrices and \(B\) is an \(m \times n\) upper bidiagonal matrix. Then the tridiagonal matrix \(B*B\) is unitarily similar to \(A*A\). Additionally, specific methods exist that compute the singular values of \(B\) without forming \(B*B\). Therefore, after computing the SVD of B, <p class="formulaDsp"> |
| \[ B = X\Sigma Y^T, \] |
| </p> |
| it only remains to compute the SVD of the original matrix with \(U = PX\) and \(V = QY\). </li> |
| </ol> |
| </dd></dl> |
| </div><!-- contents --> |
| </div><!-- doc-content --> |
| <!-- start footer part --> |
| <div id="nav-path" class="navpath"><!-- id is needed for treeview function! --> |
| <ul> |
| <li class="footer">Generated on Thu Jan 9 2014 20:35:40 for MADlib by |
| <a href="http://www.doxygen.org/index.html"> |
| <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.4 </li> |
| </ul> |
| </div> |
| </body> |
| </html> |