blob: 1e2d1b16ec0d42cd4611edaa11fde96678984b2c [file] [log] [blame]
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.4"/>
<title>MADlib: Linear-Algebra Operations</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
$(document).ready(initResizable);
$(window).load(resizeHeight);
</script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/search.js"></script>
<script type="text/javascript">
$(document).ready(function() { searchBox.OnSelectItem(0); });
</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
jax: ["input/TeX","output/HTML-CSS"],
});
</script><script src="../mathjax/MathJax.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td style="padding-left: 0.5em;">
<div id="projectname">MADlib
&#160;<span id="projectnumber">1.1</span> <span style="font-size:10pt; font-style:italic"><a href="../latest/./group__grp__linalg.html"> A newer version is available</a></span>
</div>
<div id="projectbrief">User Documentation</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.4 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
<div id="navrow1" class="tabs">
<ul class="tablist">
<li><a href="index.html"><span>Main&#160;Page</span></a></li>
<li><a href="modules.html"><span>Modules</span></a></li>
<li><a href="files.html"><span>Files</span></a></li>
<li>
<div id="MSearchBox" class="MSearchBoxInactive">
<span class="left">
<img id="MSearchSelect" src="search/mag_sel.png"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
alt=""/>
<input type="text" id="MSearchField" value="Search" accesskey="S"
onfocus="searchBox.OnSearchFieldFocus(true)"
onblur="searchBox.OnSearchFieldFocus(false)"
onkeyup="searchBox.OnSearchFieldChange(event)"/>
</span><span class="right">
<a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
</span>
</div>
</li>
</ul>
</div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('group__grp__linalg.html','');});
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
<a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(0)"><span class="SelectionMark">&#160;</span>All</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(1)"><span class="SelectionMark">&#160;</span>Files</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(2)"><span class="SelectionMark">&#160;</span>Functions</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(3)"><span class="SelectionMark">&#160;</span>Groups</a></div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0"
name="MSearchResults" id="MSearchResults">
</iframe>
</div>
<div class="header">
<div class="headertitle">
<div class="title">Linear-Algebra Operations<div class="ingroups"><a class="el" href="group__grp__early__stage.html">Early Stage Development</a></div></div> </div>
</div><!--header-->
<div class="contents">
<p>In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix, with many useful applications in signal processing and statistics.
<a href="#details">More...</a></p>
<div id="dynsection-0" onclick="return toggleVisibility(this)" class="dynheader closed" style="cursor:pointer;">
<img id="dynsection-0-trigger" src="closed.png" alt="+"/> Collaboration diagram for Linear-Algebra Operations:</div>
<div id="dynsection-0-summary" class="dynsummary" style="display:block;">
</div>
<div id="dynsection-0-content" class="dyncontent" style="display:none;">
<center><table><tr><td><div class="center"><iframe scrolling="no" frameborder="0" src="group__grp__linalg.svg" width="432" height="40"><p><b>This browser is not able to show SVG: try Firefox, Chrome, Safari, or Opera instead.</b></p></iframe>
</div>
</td></tr></table></center>
</div>
<dl class="section warning"><dt>Warning</dt><dd><em> This MADlib method is still in early stage development. There may be some issues that will be addressed in a future version. Interface and implementation is subject to change. </em></dd></dl>
<dl class="section user"><dt>About:</dt><dd></dd></dl>
<p>The linalg module consists of useful utility functions for basic linear algebra operations. Several of these functions can be used while implementing new algorithms.</p>
<p>Refer to the file for documentation on each of the utlity functions.</p>
<p>Linear-algebra functions.</p>
<dl class="section see"><dt>See Also</dt><dd>File <a class="el" href="linalg_8sql__in.html" title="SQL functions for linear algebra. ">linalg.sql_in</a> documenting the SQL functions.</dd></dl>
<p>Let \(A\) be a \(mxn\) matrix, where \(m \ge n\). Then \(A\) can be decomposed as follows: </p>
<p class="formulaDsp">
\[ A = U \Sigma V^T, \]
</p>
<p> where \(U\) is a \(m \times n\) orthonormal matrix, \(\Sigma\) is a \(n \times n\) diagonal matrix, and \(V\) is an \(n \times n\) orthonormal matrix. The diagonal elements of \(\Sigma\) are called the {singular values}.</p>
<dl class="section user"><dt>Input:</dt><dd>The input table can be two forms:<ul>
<li>Dense matrix representation The table contains a 'row_id' column that identifies each row. Further, the other columns are assumed to be the data for the matrix represented in two possible forms, illustrated in a the 2x2 matrix example below:<ol type="1">
<li><pre>
row_id col1 col2
row1 1 1 0
row2 2 0 1
</pre></li>
<li><pre>
row_id row_vec
row1 1 {1, 0}
row2 2 {0, 1}
</pre></li>
</ol>
</li>
<li>Sparse matrix representation An example representation is given below: <pre>
row_id col_id value
row1 1 1 1
row2 2 2 1
</pre> The 'row_id' represents the row number, 'col_id' represents the column id and the 'value' represents the matrix value at ['row_id', 'col_id']</li>
</ul>
</dd></dl>
<h2>The output and summary table </h2>
<h2>OUTPUT TABLE </h2>
<p>Ouptut for eigen vectors/values will be in the following 3 tables:</p>
<ul>
<li>Left singular matrix: Table named &lt;output_table_prefix&gt;_left (e.g. ‘netflix_u’)</li>
<li>Right singular matrix: Table named &lt;output_table_prefix&gt;_right (e.g. ‘netflix_v’)</li>
<li>Singular values: Table named &lt;output_table_prefix&gt;_s (e.g. ‘netflix_s’) The singular vector tables are of the format: <div class="fragment"><div class="line">row_id INTEGER, -- ID corresponding to each eigen value (in decreasing order)</div>
<div class="line">row_vec FLOAT8[] -- Singular vector elements <span class="keywordflow">for</span> <span class="keyword">this</span> row_id</div>
</div><!-- fragment --> Each array is of size k The singular values table is in a sparse table format: <div class="fragment"><div class="line">row_id INTEGER, -- i <span class="keywordflow">for</span> ith eigen value</div>
<div class="line">col_id INTEGER, -- i <span class="keywordflow">for</span> ith eigen value (same as row_id)</div>
<div class="line">value FLOAT8 -- Eigen Value</div>
</div><!-- fragment --> <hr/>
<h2>RESULT SUMMARY TABLE </h2>
</li>
</ul>
<p>Result summary table will be in the following format: </p>
<div class="fragment"><div class="line">rows_used INTEGER, -- Number of rows used <span class="keywordflow">for</span> SVD calculation</div>
<div class="line">exec_time FLOAT8, -- Total time <span class="keywordflow">for</span> executing SVD</div>
<div class="line">iter INTEGER, -- Total number of iterations run</div>
<div class="line">recon_error FLOAT8 -- Total quality score (i.e. approximation quality) <span class="keywordflow">for</span> <span class="keyword">this</span> set of</div>
<div class="line"> orthonormal basis</div>
<div class="line">relative_recon_error FLOAT8 -- relative quality score</div>
</div><!-- fragment --><p>In the result summary table, the reconstruction error is computed as \( \sqrt{mean((X - USV^T)_{ij}^2)} \), where the average is over all elements of the matrices. The relative reconstruction error is then computed as ratio of the reconstruction error and \( \sqrt{mean(X_{ij}^2)} \).</p>
<dl class="section user"><dt>Usage:</dt><dd>Dense matrices: <div class="fragment"><div class="line">SELECT {schema_madlib}.svd(</div>
<div class="line"> source_table, -- TEXT, Source table name (dense matrix)</div>
<div class="line"> output_table_prefix, -- TEXT, Prefix <span class="keywordflow">for</span> output tables</div>
<div class="line"> row_id, -- TEXT, ID <span class="keywordflow">for each</span> row</div>
<div class="line"> k, -- INTEGER, Number of singular vectors to compute</div>
<div class="line"> nIterations, -- INTEGER, Number of iterations to run (OPTIONAL)</div>
<div class="line"> result_summary_table, -- TEXT Table name to store result summary (OPTIONAL)</div>
<div class="line">);</div>
</div><!-- fragment --> <pre class="fragment">Sparse matrices:
</pre> <div class="fragment"><div class="line">SELECT {schema_madlib}.svd_sparse(</div>
<div class="line"> source_table, -- TEXT, Source table name (sparse matrix)</div>
<div class="line"> output_table_prefix, -- TEXT, Prefix <span class="keywordflow">for</span> output tables</div>
<div class="line"> row_id, -- TEXT, ID <span class="keywordflow">for each</span> row</div>
<div class="line"> col_id, -- TEXT, ID <span class="keywordflow">for each</span> column</div>
<div class="line"> value, -- TEXT, Non-zero values of the sparse matrix</div>
<div class="line"> row_dim, -- INTEGER, Row dimension of sparse matrix</div>
<div class="line"> col_dim, -- INTEGER, Col dimension of sparse matrix</div>
<div class="line"> k, -- INTEGER, Number of singular vectors to compute</div>
<div class="line"> nIterations, -- INTEGER, Number of iterations to run (OPTIONAL)</div>
<div class="line"> result_summary_table -- TEXT Table name to store result summary (OPTIONAL)</div>
<div class="line">);</div>
</div><!-- fragment --> <pre class="fragment">Block matrices:
</pre> <div class="fragment"><div class="line">SELECT {schema_madlib}.svd_block(</div>
<div class="line"> source_table, -- TEXT, Source table name (block matrix)</div>
<div class="line"> output_table_prefix, -- TEXT, Prefix <span class="keywordflow">for</span> output tables</div>
<div class="line"> k, -- INTEGER, Number of singular vectors to compute</div>
<div class="line"> nIterations, -- INTEGER, Number of iterations to run (OPTIONAL)</div>
<div class="line"> result_summary_table -- TEXT Table name to store result summary (OPTIONAL)</div>
<div class="line">);</div>
</div><!-- fragment --> <pre class="fragment">Native implementation for sparse matrix:
</pre> <div class="fragment"><div class="line">SELECT {schema_madlib}.svd_sparse_native(</div>
<div class="line"> source_table, -- TEXT, Source table name (sparse matrix)</div>
<div class="line"> output_table_prefix, -- TEXT, Prefix <span class="keywordflow">for</span> output tables</div>
<div class="line"> row_id, -- TEXT, ID <span class="keywordflow">for each</span> row</div>
<div class="line"> col_id, -- TEXT, ID <span class="keywordflow">for each</span> column</div>
<div class="line"> value, -- TEXT, Non-zero values of the sparse matrix</div>
<div class="line"> row_dim, -- INTEGER, Row dimension of sparse matrix</div>
<div class="line"> col_dim, -- INTEGER, Col dimension of sparse matrix</div>
<div class="line"> k, -- INTEGER, Number of singular vectors to compute</div>
<div class="line"> lanczos_iter, -- INTEGER, Number of iterations to run, optional</div>
<div class="line"> result_summary_table -- TEXT Table name to store result summary, optional</div>
<div class="line">);</div>
</div><!-- fragment --></dd></dl>
<div class="fragment"><div class="line">CREATE TABLE mat (</div>
<div class="line"> row_id integer,</div>
<div class="line"> row_vec <span class="keywordtype">double</span> precision[]</div>
<div class="line">);</div>
<div class="line"></div>
<div class="line">-- sample input data</div>
<div class="line">COPY mat (row_id, row_vec) FROM stdin;</div>
<div class="line">1 {691,58,899,163,159,533,604,582,269,390}</div>
<div class="line">0 {396,840,353,446,318,886,15,584,159,383}</div>
<div class="line">3 {462,532,787,265,982,306,600,608,212,885}</div>
<div class="line">2 {293,742,298,75,404,857,941,662,846,2}</div>
<div class="line">5 {327,946,368,943,7,516,272,24,591,204}</div>
<div class="line">4 {304,151,337,387,643,753,603,531,459,652}</div>
<div class="line">7 {458,959,774,376,228,354,300,669,718,565}</div>
<div class="line">6 {877,59,260,302,891,498,710,286,864,675}</div>
<div class="line">9 {882,761,398,688,761,405,125,484,222,873}</div>
<div class="line">8 {824,390,818,844,180,943,424,520,65,913}</div>
<div class="line">11 {492,220,576,289,321,261,173,1,44,241}</div>
<div class="line">10 {528,1,860,18,814,242,314,965,935,809}</div>
<div class="line">13 {350,192,211,633,53,783,30,444,176,932}</div>
<div class="line">12 {415,701,221,503,67,393,479,218,219,916}</div>
<div class="line">15 {739,651,678,577,273,935,661,47,373,618}</div>
<div class="line">14 {909,472,871,695,930,455,398,893,693,838}</div>
<div class="line">\.</div>
<div class="line"></div>
<div class="line"> DROP TABLE <span class="keywordflow">if</span> exists svd_u;</div>
<div class="line">DROP TABLE <span class="keywordflow">if</span> exists svd_v;</div>
<div class="line">DROP TABLE <span class="keywordflow">if</span> exists svd_s;</div>
<div class="line">-- SVD <span class="keywordflow">for</span> dense matrices</div>
<div class="line">SELECT {schema_madlib}.svd(<span class="stringliteral">&#39;mat&#39;</span>, <span class="stringliteral">&#39;svd&#39;</span>, <span class="stringliteral">&#39;row_id&#39;</span>, 10);</div>
<div class="line">----------------------------------------------------------------</div>
<div class="line">DROP TABLE <span class="keywordflow">if</span> exists mat_sparse;</div>
<div class="line">SELECT {schema_madlib}.matrix_sparsify(<span class="stringliteral">&#39;mat&#39;</span>, <span class="stringliteral">&#39;mat_sparse&#39;</span>, False);</div>
<div class="line"></div>
<div class="line">DROP TABLE <span class="keywordflow">if</span> exists svd_u;</div>
<div class="line">DROP TABLE <span class="keywordflow">if</span> exists svd_v;</div>
<div class="line">DROP TABLE <span class="keywordflow">if</span> exists svd_s;</div>
<div class="line">-- SVD <span class="keywordflow">for</span> sparse matrices</div>
<div class="line">SELECT {schema_madlib}.svd_sparse(<span class="stringliteral">&#39;mat_sparse&#39;</span>, <span class="stringliteral">&#39;svd&#39;</span>, <span class="stringliteral">&#39;row_id&#39;</span>, <span class="stringliteral">&#39;col_id&#39;</span>, <span class="stringliteral">&#39;value&#39;</span>, 10);*</div>
</div><!-- fragment --><dl class="section user"><dt>Technical Background</dt><dd>In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix, with many useful applications in signal processing and statistics.</dd></dl>
<p>Let \(A\) be a \(m \times n\) matrix, where \(m \ge n\). Then \(A\) can be decomposed as follows: </p>
<p class="formulaDsp">
\[ A = U \Sigma V^T, \]
</p>
<p> where \(U\) is a \(m \times n\) orthonormal matrix, \(\Sigma\) is a \(n \times n\) diagonal matrix, and \(V\) is an \(n \times n\) orthonormal matrix. The diagonal elements of \(\Sigma\) are called the {singular values}.</p>
<p>It is possible to formulate the problem of computing the singular triplets ( \(\sigma_i, u_i, v_i\)) of \(A\) as an eigenvalue problem involving a Hermitian matrix related to \(A\). There are two possible ways of achieving this:</p>
<ol type="1">
<li>With the cross product matrix, \(A^TA\) and \(AA^T\)</li>
<li>With the cyclic matrix <p class="formulaDsp">
\[ H(A) = \begin{bmatrix} 0 &amp; A\\ A^* &amp; 0 \end{bmatrix} \]
</p>
</li>
</ol>
<p>The singular values are the nonnegative square roots of the eigenvalues of the cross product matrix. This approach may imply a severe loss of accuracy in the smallest singular values. The cyclic matrix approach is an alternative that avoids this problem, but at the expense of significantly increasing the cost of the computation.</p>
<p>Computing the cross product matrix explicitly is not recommended, especially in the case of sparse A. Bidiagonalization was proposed by Golub and Kahan [citation?] as a way of tridiagonalizing the cross product matrix without forming it explicitly.</p>
<p>Consider the following decomposition </p>
<p class="formulaDsp">
\[ A = P B Q^T, \]
</p>
<p> where \(P\) and \(Q\) are unitary matrices and \(B\) is an \(m \times n\) upper bidiagonal matrix. Then the tridiagonal matrix \(B*B\) is unitarily similar to \(A*A\). Additionally, specific methods exist that compute the singular values of \(B\) without forming \(B*B\). Therefore, after computing the SVD of B, </p>
<p class="formulaDsp">
\[ B = X\Sigma Y^T, \]
</p>
<p> it only remains to compute the SVD of the original matrix with \(U = PX\) and \(V = QY\). </p>
</div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="footer">Generated on Wed Aug 21 2013 16:09:52 for MADlib by
<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.4 </li>
</ul>
</div>
</body>
</html>