Documentation: Correct + improve linear systems
diff --git a/src/ports/postgres/modules/linear_systems/dense_linear_systems.sql_in b/src/ports/postgres/modules/linear_systems/dense_linear_systems.sql_in
index 553ee7c..3ab5a57 100644
--- a/src/ports/postgres/modules/linear_systems/dense_linear_systems.sql_in
+++ b/src/ports/postgres/modules/linear_systems/dense_linear_systems.sql_in
@@ -15,10 +15,24 @@
/**
@addtogroup grp_dense_linear_solver
+
+<div class="toc"><b>Contents</b>
+<ul>
+<li class="level1"><a href="#dls_about">About</a></li>
+<li class="level1"><a href="#dls_online_help">Online Help</a></li>
+<li class="level1"><a href="#dls_function">Function Syntax</a></li>
+<li class="level1"><a href="#dls_args">Arguments</a></li>
+<li class="level1"><a href="#dls_opt_params">Optimizer Parameters</a></li>
+<li class="level1"><a href="#dls_output">Output Tables</a></li>
+<li class="level1"><a href="#dls_examples">Examples</a></li>
+</ul>
+</div>
+
+@anchor dls_about
@about
-The linear systems module implements solution methods for systems of a consistent
-linear equations.
+The linear systems module implements solution methods for systems of consistent
+linear equations. Systems of linear equations take the form:
\f[
Ax = b
\f]
@@ -27,9 +41,10 @@
We assume that there are no rows of \f$A\f$ where all elements are zero.
The algorithms implemented in this module can handle large dense
linear systems. Currently, the algorithms implemented in this module
-solve the lienar system by a direct decomposition. Hence, these methods are
+solve the linear system by a direct decomposition. Hence, these methods are
-known as <em>direct method</em>.
+known as <em>direct methods</em>.
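The following sketch illustrates what a direct method does. It is plain Python for illustration only (the module itself solves systems through Eigen's matrix decompositions, such as Householder QR): the system is reduced by Gaussian elimination with partial pivoting and then solved by back substitution.

```python
# A sketch of a direct method: Gaussian elimination with partial
# pivoting on a small dense system Ax = b. Illustration only; not
# the module's implementation.

def solve_dense(A, b):
    n = len(A)
    # Work on an augmented copy [A | b] so the inputs are not modified.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting: swap the row with the largest entry into place.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

# 2x + y = 3 and x + 3y = 5 have the solution x = 0.8, y = 1.4.
print(solve_dense([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
```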
+@anchor dls_online_help
@par Online Help
View short help messages using the following statements:
@@ -44,21 +59,21 @@
SELECT madlib.linear_solver_dense('direct');
@endverbatim
-@usage
+@anchor dls_function
+@par Function Syntax
<pre>
-SELECT {schema_madlib}.linear_solve_dense (
- 'tbl_source', -- Data table
- 'tbl_result', -- Result table
- 'row_id', -- Name of column containing row_id
- 'left_hand_side', -- Left Hand Side of the equations
- 'right_hand_side', -- Right Hand side of the equations
- 'grouping_cols', -- Grouping columns (Default: NULL)
- 'optimizer', -- Name of optimizer. Default: 'direct'
- 'optimizer_params' -- Text array of optimizer parameters
-);
+SELECT linear_solver_dense(tbl_source, tbl_result, row_id, LHS,
+ RHS, grouping_col := NULL, optimizer := 'direct',
+ optimizer_params := 'algorithm = householderqr');
</pre>
+@anchor dls_args
+@par Arguments
+
+<DL class="arglist">
+<DT>tbl_source</DT>
+<DD>Text value. The name of the table containing the training data.
The input data is expected to be of the following form:
<pre>{TABLE|VIEW} <em>sourceName</em> (
...
@@ -68,30 +83,12 @@
...
)</pre>
-Here, each row represents a single equation using. The <em> rhs </em> refers
-to the right hand side of the equations while the <em> lhs_array </em>
+Here, each row represents a single equation. The <em> right_hand_side
+</em> refers
+to the right hand side of the equations while the <em> left_hand_side </em>
refers to the multipliers on the variables on the left hand side of the same
equations.
-
-Output is stored in the <em>tbl_result</em>
-@verbatim
- tbl_result | Data Types
---------------------|---------------------
-solution | DOUBLE PRECISION[]
-residual_norm | DOUBLE PRECISIOn
-iters | INTEGER
-@endverbatim
-
-@par Syntax
-
-<pre>
-SELECT dense_linear_sytems('tbl_source', 'tbl_result', 'row_id', 'LHS',
- 'RHS', NULL, 'direct', 'algorithm = householderqr');
-</pre>
-
-<DL>
-<DT>tbl_source</DT>
-<DD>Text value. The name of the table containing the training data.</DD>
+</DD>
<DT>tbl_result</DT>
<DD>Text value. The name of the table where the output is saved.</DD>
@@ -102,12 +99,12 @@
-\note For a system with N equations, the row_id's must be a continuous
-range of integers from \f$ 0 \ldots n-1 \f$.
+\note For a system with \f$n\f$ equations, the row_id's must be a contiguous
+range of integers from \f$ 0 \ldots n-1 \f$.
-<DT>left_hand_size</DT>
-<DD>Text value. The name of the column storing the 'left hand size' of the
+<DT>LHS</DT>
+<DD>Text value. The name of the column storing the 'left hand side' of the
equations stored as an array.</DD>
-<DT>right_hand_size</DT>
-<DD>Text value. The name of the column storing the 'right hand size' of the
+<DT>RHS</DT>
+<DD>Text value. The name of the column storing the 'right hand side' of the
equations.</DD>
<DT>grouping_col (optional) </DT>
@@ -121,13 +118,13 @@
<DD>Text value. Optimizer specific parameters. Default: NULL.</DD>
</DL>
-
+@anchor dls_opt_params
@par Optimizer Parameters
For each optimizer, there are specific parameters that can be tuned
for better performance.
-
-<DL>
+\par
+<DL class="arglist">
<DT>algorithm (default: householderqr)</dT>
<DD>
@@ -148,16 +145,19 @@
llt | Pos. Definite | +++ | +
ldlt | Pos. or Neg Def | +++ | ++
+ For speed, '+++' is faster than '++', which is faster than '+'.
+ For accuracy, '+++' is more accurate than '++', which is more accurate than '+'.
+
More details about the individual algorithms can be found on the <a href="http://eigen.tuxfamily.org/dox-devel/group__TutorialLinearAlgebra.html"> Eigen documentation</a>. Eigen is an open source library for linear algebra.
</DD>
</DL>
-
+@anchor dls_output
-@par Output statistics
+@par Output Tables
-
-<DL>
+Output is stored in the <em>tbl_result</em> table.
+<DL class="arglist">
<DT>solution</dT>
<DD>
The solution is an array (of double precision) with the variables in the same
@@ -167,8 +167,8 @@
<DT>residual_norm</dT>
<DD>
-Computes the scaled residual norm, defined as \f$ \frac{|Ax - b|}{|b|} \f$
-gives the user an indication of the accuracy of the solution.
+Computes the scaled residual norm, defined as \f$ \frac{|Ax - b|}{|b|} \f$.
+This value is an indication of the accuracy of the solution.
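As a sketch of how this quantity is computed (plain Python with illustrative data, not the module's API):

```python
import math

# Illustrative system: A is the 3x3 identity and b = [20, 15, 20],
# matching the example data used later on this page.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [20.0, 15.0, 20.0]
x = [20.0, 15.0, 20.0]  # exact solution of Ax = b

# Scaled residual norm |Ax - b| / |b|; zero for an exact solution.
Ax = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
num = math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(Ax, b)))
den = math.sqrt(sum(bi * bi for bi in b))
print(num / den)  # prints 0.0 for an exact solution
```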
</DD>
<DT>iters</dT>
@@ -184,15 +184,15 @@
-
+@anchor dls_examples
@examp
-# Create the sample data set:
\verbatim
-sql> CREATE TABLE source_table (id INTEGER NOT NULL,
+sql> CREATE TABLE linear_systems_test_data (id INTEGER NOT NULL,
lhs DOUBLE PRECISION[],
rhs DOUBLE PRECISION);
-sql> INSERT INTO linear_systems_test_data(id, a, b) VaLUES
+sql> INSERT INTO linear_systems_test_data(id, lhs, rhs) VALUES
(0, ARRAY[1,0,0], 20),
(1, ARRAY[0,1,0], 15),
(2, ARRAY[0,0,1], 20);
@@ -200,11 +200,11 @@
-# Solve the linear systems with default parameters
\verbatim
-sql> SELECT madlib.linregr_train('source_table',
- 'output_table',
- 'id',
- 'lhs',
- 'rhs');
+sql> SELECT madlib.linear_solver_dense('linear_systems_test_data',
+ 'output_table',
+ 'id',
+ 'lhs',
+ 'rhs');
\endverbatim
-# Obtain the output from the output table
@@ -224,17 +224,14 @@
'linear_systems_test_data',
'result_table',
'id',
- 'a',
- 'b',
+ 'lhs',
+ 'rhs',
NULL,
'direct',
'algorithm=llt'
);
\endverbatim
-\anchor Background
-@background
-
-@sa File dense_linear_sytems.sql_in documenting the SQL functions.
+@sa File dense_linear_systems.sql_in documenting the SQL functions.
diff --git a/src/ports/postgres/modules/linear_systems/sparse_linear_systems.sql_in b/src/ports/postgres/modules/linear_systems/sparse_linear_systems.sql_in
index 286b4e9..19b2aff 100644
--- a/src/ports/postgres/modules/linear_systems/sparse_linear_systems.sql_in
+++ b/src/ports/postgres/modules/linear_systems/sparse_linear_systems.sql_in
@@ -15,17 +15,30 @@
/**
@addtogroup grp_sparse_linear_solver
+<div class="toc"><b>Contents</b>
+<ul>
+<li class="level1"><a href="#sls_about">About</a></li>
+<li class="level1"><a href="#sls_online_help">Online Help</a></li>
+<li class="level1"><a href="#sls_function">Function Syntax</a></li>
+<li class="level1"><a href="#sls_args">Arguments</a></li>
+<li class="level1"><a href="#sls_opt_params">Optimizer Parameters</a></li>
+<li class="level1"><a href="#sls_output">Output Tables</a></li>
+<li class="level1"><a href="#sls_examples">Examples</a></li>
+</ul>
+</div>
+
+@anchor sls_about
@about
-The sparse linear systems module implements solution methods for systems of a consistent
-linear equations.
+The sparse linear systems module implements solution methods for systems of consistent
+linear equations. Systems of linear equations take the form:
\f[
Ax = b
\f]
where \f$x \in \mathbb{R}^{n}\f$, \f$A \in \mathbb{R}^{m \times n} \f$ and \f$b \in \mathbb{R}^{m}\f$.
-This module accepts sparse matrix input formats for \f$A\f$ and \f$b\f$.We assume
-that all there are no rows of \f$A\f$ where all elements are zero.
+This module accepts sparse matrix input formats for \f$A\f$ and \f$b\f$.
+We assume that there are no rows of \f$A\f$ where all elements are zero.
-\note Algorithms with fail if there is an row of the input matrix containing all zeros.
+\note Algorithms will fail if there is a row of the input matrix containing all zeros.
@@ -33,6 +46,7 @@
square linear systems. Currently, the algorithms implemented in this module
solve the linear system using direct or iterative methods.
+@anchor sls_online_help
@par Online Help
View short help messages using the following statements:
@@ -50,25 +64,23 @@
SELECT madlib.linear_solver_sparse('iterative');
@endverbatim
-@usage
+@anchor sls_function
+@par Function Syntax
<pre>
-SELECT {schema_madlib}.linear_solve_sparse (
- 'tbl_lhs_source', -- Data table containing the LHS matrix
- 'tbl_rhs_source', -- Data table containing the RHS vector
- 'tbl_result', -- Result table
- 'lhs_row_id', -- Name of column with row_id for the LHS
- 'lhs_col_id', -- Name of column with col_id for the LHS
- 'lhs_value', -- Name of column with value for the LHS
- 'rhs_row_id', -- Name of column with row_id for the RHS
- 'rhs_col_id', -- Name of column with col_id for the RHS
- 'num_vars', -- Number of variables in the system of equations
- 'grouping_cols', -- Grouping columns (Default: NULL)
- 'optimizer', -- Name of optimizer. Default: 'direct'
- 'optimizer_params' -- Text array of optimizer parameters
-);
+SELECT linear_solver_sparse(tbl_source_lhs, tbl_source_rhs, tbl_result,
+                lhs_row_id, lhs_col_id, lhs_value,
+                rhs_row_id, rhs_value, num_vars,
+                grouping_cols := NULL, optimizer := 'direct',
+                optimizer_params := 'algorithm = llt');
</pre>
+@anchor sls_args
+@par Arguments
+
+
+<DL class="arglist">
+<DT>tbl_source_lhs</DT>
+<DD>Text value. The name of the table containing the left hand side matrix.
For the LHS matrix, the input data is expected to be of the following form:
<pre>{TABLE|VIEW} <em>sourceName</em> (
...
@@ -78,6 +90,15 @@
...
)</pre>
+
+Here, each row represents a single nonzero entry of the matrix. The
+<em> rhs </em> table holds the right hand side of the equations, while the
+<em> lhs </em> table holds the multipliers on the variables on the left hand
+side of the same equations.
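As a sketch of what this sparse (row, column, value) layout encodes (plain Python with illustrative data; the actual column names are whatever the arguments specify):

```python
# Each tuple stands for one row of a hypothetical LHS table:
# (row_id, col_id, value) for a nonzero entry of A.
triplets = [(0, 0, 1.0), (1, 1, 1.0), (2, 2, 1.0)]  # 3x3 identity

# Reconstruct the dense matrix the triplets encode; entries that
# never appear in the table are implicitly zero.
n = 3
A = [[0.0] * n for _ in range(n)]
for r, c, v in triplets:
    A[r][c] = v
print(A)
```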
+</DD>
+
+<DT>tbl_source_rhs</DT>
+<DD>Text value. The name of the table containing the right hand side vector.
For the RHS matrix, the input data is expected to be of the following form:
<pre>{TABLE|VIEW} <em>sourceName</em> (
...
@@ -151,13 +172,13 @@
<DD>Text value. Optimizer specific parameters. Default: NULL.</DD>
</DL>
-
+@anchor sls_opt_params
@par Optimizer Parameters
-For each optimizer, there are specific parameters that can be used to tuned
+For each optimizer, there are specific parameters that can be tuned
for better performance.
-<DL>
+<DL class="arglist">
<DT>algorithm (default: ldlt)</dT>
<DD>
@@ -170,8 +191,12 @@
-Algorithm | Contitions on A | Speed | Memory
+Algorithm | Conditions on A | Speed | Memory
----------------------------------------------------------
- llt | Sym. Pos Def | ++ | ++
- ldlt | Sym. Pos Def | ++ | ++
+ llt | Sym. Pos Def | ++ | ---
+ ldlt | Sym. Pos Def | ++ | ---
+
+ For speed, '++' is faster than '+'.
+ For memory, '-' uses less memory than '--', which uses less than '---'.
Note: ldlt is often preferred over llt
@@ -186,10 +211,13 @@
-Algorithm | Contitions on A | Speed | Memory | Convergence
+Algorithm | Conditions on A | Speed | Memory | Convergence
----------------------------------------------------------------------
- cg-mem | Sym. Pos Def | +++ | + | ++
- bicgstab-mem | Square | ++ | + | +
- precond-cg-mem | Sym. Pos Def | ++ | + | +++
- precond-bicgstab-mem | Square | + | + | ++
+ cg-mem | Sym. Pos Def | +++ | - | ++
+ bicgstab-mem | Square | ++ | - | +
+ precond-cg-mem | Sym. Pos Def | ++ | - | +++
+ precond-bicgstab-mem | Square | + | - | ++
+
+ For speed, '+++' is faster than '++', which is faster than '+'.
+ For convergence, '+++' converges faster than '++', which converges faster than '+'.
Details:
-# <b>cg-mem: </b> In memory conjugate gradient with diagonal preconditioners.
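The following is a minimal sketch of the conjugate gradient idea behind <b>cg-mem</b>, written in plain Python for illustration only; it is not the module's implementation.

```python
# Conjugate gradient sketch for a symmetric positive definite A.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def cg(A, b, tol=1e-12, max_iter=100):
    n = len(b)
    x = [0.0] * n
    r = b[:]              # residual b - Ax, with the initial guess x = 0
    p = r[:]              # initial search direction
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs / sum(pi * ai for pi, ai in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = sum(v * v for v in r)
        if rs_new < tol:
            break
        # New direction, conjugate to the previous ones.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# A = [[4, 1], [1, 3]] is symmetric positive definite; the solution
# of Ax = [1, 2] is x = [1/11, 7/11].
print(cg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```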
@@ -206,9 +234,10 @@
</DL>
+@anchor sls_output
-@par Output statistics
+@par Output Tables
-<DL>
+<DL class="arglist">
<DT>solution</dT>
<DD>
The solution is an array (of double precision) with the variables in the same
@@ -231,7 +260,7 @@
</DL>
-
+@anchor sls_examples
@examp
-# Create the sample data set:
@@ -264,8 +293,9 @@
-# Solve the linear systems with default parameters
\verbatim
-sql> SELECT madlib.linregr_train('lhs_source_table',
- 'rhs_source_table',
+sql> SELECT madlib.linear_solver_sparse(
+ 'sparse_linear_systems_lhs',
+ 'sparse_linear_systems_rhs',
'output_table',
'rid',
'cid',
@@ -285,41 +315,42 @@
\endverbatim
--# Chose a different algorirhm that the default algorithm
+-# Choose a different algorithm than the default
\verbatim
drop table if exists result_table;
-select madlib.linear_solver_sparse(
- 'linear_systems_test_data',
- 'result_table',
- 'id',
- 'a',
- 'b',
- NULL,
- 'direct',
- 'algorithm=llt'
- );
+SELECT madlib.linear_solver_sparse(
+ 'sparse_linear_systems_lhs',
+ 'sparse_linear_systems_rhs',
+ 'output_table',
+ 'rid',
+ 'cid',
+ 'val',
+ 'rid',
+ 'val',
+ 4,
+ NULL,
+ 'direct',
+                     'algorithm=llt');
+\endverbatim
--# Chose a different algorirhm that the default algorithm
+-# Choose a different algorithm than the default
\verbatim
drop table if exists result_table;
-select madlib.linear_solver_sparse(
- 'linear_systems_test_data',
- 'result_table',
- 'id',
- 'a',
- 'b',
- NULL,
- 'iterative',
- 'algorithm=cg-mem, toler=1e-5'
- );
-
+SELECT madlib.linear_solver_sparse(
+ 'sparse_linear_systems_lhs',
+ 'sparse_linear_systems_rhs',
+ 'output_table',
+ 'rid',
+ 'cid',
+ 'val',
+ 'rid',
+ 'val',
+ 4,
+ NULL,
+ 'iterative',
+ 'algorithm=cg-mem, toler=1e-5');
\endverbatim
-\anchor Background
-@background
-
-
-@sa File sparse_linear_sytems.sql_in documenting the SQL functions.
+@sa File sparse_linear_systems.sql_in documenting the SQL functions.
@internal
diff --git a/src/ports/postgres/modules/pca/pca.sql_in b/src/ports/postgres/modules/pca/pca.sql_in
index 17594f8..f8b734a 100644
--- a/src/ports/postgres/modules/pca/pca.sql_in
+++ b/src/ports/postgres/modules/pca/pca.sql_in
@@ -68,7 +68,7 @@
@par Training Function
The training functions have the following formats:
@verbatim
-madlib.pca_project( source_table, out_table, row_id,
+pca_project( source_table, out_table, row_id,
k, grouping_cols:= NULL,
lanczos_iter := min(k+40, <smallest_matrix_dimension>),
use_correlation := False, result_summary_table := NULL)
@@ -101,7 +101,7 @@
two standard MADlib dense matrix formats, and a sparse input table
should be in the standard MADlib sparse matrix format.
-The two standard MADlib dense matrix formats are
+The two standard MADlib dense matrix formats are
<pre>{TABLE|VIEW} <em>source_table</em> (
<em>row_id</em> INTEGER,
row_vec FLOAT8[],
@@ -183,6 +183,7 @@
The output is divided into three tables (one of which is optional).
The output table (<em>'out_table'</em> above) encodes the principal components with the
<em>k</em> highest eigenvalues. The table has the following columns:
\par
<DL class="arglist">
diff --git a/src/ports/postgres/modules/regress/linear.sql_in b/src/ports/postgres/modules/regress/linear.sql_in
index faeb062..091b5f1 100644
--- a/src/ports/postgres/modules/regress/linear.sql_in
+++ b/src/ports/postgres/modules/regress/linear.sql_in
@@ -28,7 +28,7 @@
</ul></div>
@anchor about
-@about
+@about
Ordinary Least Squares Regression, also called Linear Regression, is a
statistical model used to fit linear models.
@@ -69,7 +69,7 @@
@anchor output
@par Output Table
-The output table produced by the linear regression training function contains the following columns.
+The output table produced by the linear regression training function contains the following columns.
<DL class="arglist">
<DT>\<...></DT>
<DD>Any grouping columns provided during training.