Documentation: Correct + improve linear systems
diff --git a/src/ports/postgres/modules/linear_systems/dense_linear_systems.sql_in b/src/ports/postgres/modules/linear_systems/dense_linear_systems.sql_in
index 553ee7c..3ab5a57 100644
--- a/src/ports/postgres/modules/linear_systems/dense_linear_systems.sql_in
+++ b/src/ports/postgres/modules/linear_systems/dense_linear_systems.sql_in
@@ -15,10 +15,24 @@
/**
@addtogroup grp_dense_linear_solver
+
+<div class="toc"><b>Contents</b>
+<ul>
+<li class="level1"><a href="#dls_about">About</a></li>
+<li class="level1"><a href="#dls_online_help">Online Help</a></li>
+<li class="level1"><a href="#dls_function">Function Syntax</a></li>
+<li class="level1"><a href="#dls_args">Arguments</a></li>
+<li class="level1"><a href="#dls_opt_params">Optimizer Parameters</a></li>
+<li class="level1"><a href="#dls_output">Output Tables</a></li>
+<li class="level1"><a href="#dls_examples">Examples</a></li>
+</ul>
+</div>
+
+@anchor dls_about
@about
-The linear systems module implements solution methods for systems of a consistent
-linear equations.
+The linear systems module implements solution methods for systems of consistent
+linear equations. Systems of linear equations take the form:
\f[
Ax = b
\f]
@@ -27,9 +41,10 @@
We assume that there are no rows of \f$A\f$ where all elements are zero.
The algorithms implemented in this module can handle large dense
linear systems. Currently, the algorithms implemented in this module
-solve the lienar system by a direct decomposition. Hence, these methods are
+solve the linear system by a direct decomposition. Hence, these methods are
-known as <em>direct method</em>.
+known as <em>direct methods</em>.
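The following sketch illustrates what a direct method does. It is plain Python for illustration only (the module itself solves systems through Eigen's matrix decompositions, such as Householder QR): the system is reduced by Gaussian elimination with partial pivoting and then solved by back substitution.

```python
# A sketch of a direct method: Gaussian elimination with partial
# pivoting on a small dense system Ax = b. Illustration only; not
# the module's implementation.

def solve_dense(A, b):
    n = len(A)
    # Work on an augmented copy [A | b] so the inputs are not modified.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting: swap the row with the largest entry into place.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

# 2x + y = 3 and x + 3y = 5 have the solution x = 0.8, y = 1.4.
print(solve_dense([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
```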
+@anchor dls_online_help
@par Online Help
View short help messages using the following statements:
@@ -44,21 +59,21 @@
SELECT madlib.linear_solver_dense('direct');
@endverbatim
-@usage
+@anchor dls_function
+@par Function Syntax
<pre>
-SELECT {schema_madlib}.linear_solve_dense (
- 'tbl_source', -- Data table
- 'tbl_result', -- Result table
- 'row_id', -- Name of column containing row_id
- 'left_hand_side', -- Left Hand Side of the equations
- 'right_hand_side', -- Right Hand side of the equations
- 'grouping_cols', -- Grouping columns (Default: NULL)
- 'optimizer', -- Name of optimizer. Default: 'direct'
- 'optimizer_params' -- Text array of optimizer parameters
-);
+SELECT linear_solver_dense(tbl_source, tbl_result, row_id, LHS,
+ RHS, grouping_col := NULL, optimizer := 'direct',
+ optimizer_params := 'algorithm = householderqr');
</pre>
+@anchor dls_args
+@par Arguments
+
+<DL class="arglist">
+<DT>tbl_source</DT>
+<DD>Text value. The name of the table containing the training data.
The input data is expected to be of the following form:
<pre>{TABLE|VIEW} <em>sourceName</em> (
...
@@ -68,30 +83,12 @@
...
)</pre>
-Here, each row represents a single equation using. The <em> rhs </em> refers
-to the right hand side of the equations while the <em> lhs_array </em>
+Here, each row represents a single equation. The <em> right_hand_side
+</em> refers
+to the right hand side of the equations while the <em> left_hand_side </em>
refers to the multipliers on the variables on the left hand side of the same
equations.
-
-Output is stored in the <em>tbl_result</em>
-@verbatim
- tbl_result | Data Types
---------------------|---------------------
-solution | DOUBLE PRECISION[]
-residual_norm | DOUBLE PRECISIOn
-iters | INTEGER
-@endverbatim
-
-@par Syntax
-
-<pre>
-SELECT dense_linear_sytems('tbl_source', 'tbl_result', 'row_id', 'LHS',
- 'RHS', NULL, 'direct', 'algorithm = householderqr');
-</pre>
-
-<DL>
-<DT>tbl_source</DT>
-<DD>Text value. The name of the table containing the training data.</DD>
+</DD>
<DT>tbl_result</DT>
<DD>Text value. The name of the table where the output is saved.</DD>
@@ -102,12 +99,12 @@
-\note For a system with N equations, the row_id's must be a continuous
-range of integers from \f$ 0 \ldots n-1 \f$.
+\note For a system with \f$n\f$ equations, the row_id's must be a contiguous
+range of integers from \f$ 0 \ldots n-1 \f$.
-<DT>left_hand_size</DT>
-<DD>Text value. The name of the column storing the 'left hand size' of the
+<DT>LHS</DT>
+<DD>Text value. The name of the column storing the 'left hand side' of the
equations stored as an array.</DD>
-<DT>right_hand_size</DT>
-<DD>Text value. The name of the column storing the 'right hand size' of the
+<DT>RHS</DT>
+<DD>Text value. The name of the column storing the 'right hand side' of the
equations.</DD>
<DT>grouping_col (optional) </DT>
@@ -121,13 +118,13 @@
<DD>Text value. Optimizer specific parameters. Default: NULL.</DD>
</DL>
-
+@anchor dls_opt_params
@par Optimizer Parameters
For each optimizer, there are specific parameters that can be tuned
for better performance.
-
-<DL>
+\par
+<DL class="arglist">
<DT>algorithm (default: householderqr)</dT>
<DD>
@@ -148,16 +145,19 @@
llt | Pos. Definite | +++ | +
ldlt | Pos. or Neg Def | +++ | ++
+ For speed, '+++' is faster than '++', which is faster than '+'.
+ For accuracy, '+++' is more accurate than '++', which is more accurate than '+'.
+
More details about the individual algorithms can be found on the <a href="http://eigen.tuxfamily.org/dox-devel/group__TutorialLinearAlgebra.html"> Eigen documentation</a>. Eigen is an open source library for linear algebra.
</DD>
</DL>
-
+@anchor dls_output
-@par Output statistics
+@par Output Tables
-
-<DL>
+Output is stored in the <em>tbl_result</em> table.
+<DL class="arglist">
<DT>solution</dT>
<DD>
The solution is an array (of double precision) with the variables in the same
@@ -167,8 +167,8 @@
<DT>residual_norm</dT>
<DD>
-Computes the scaled residual norm, defined as \f$ \frac{|Ax - b|}{|b|} \f$
-gives the user an indication of the accuracy of the solution.
+Computes the scaled residual norm, defined as \f$ \frac{|Ax - b|}{|b|} \f$.
+This value is an indication of the accuracy of the solution.
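As a sketch of how this quantity is computed (plain Python with illustrative data, not the module's API):

```python
import math

# Illustrative system: A is the 3x3 identity and b = [20, 15, 20],
# matching the example data used later on this page.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [20.0, 15.0, 20.0]
x = [20.0, 15.0, 20.0]  # exact solution of Ax = b

# Scaled residual norm |Ax - b| / |b|; zero for an exact solution.
Ax = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
num = math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(Ax, b)))
den = math.sqrt(sum(bi * bi for bi in b))
print(num / den)  # prints 0.0 for an exact solution
```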
</DD>
<DT>iters</dT>
@@ -184,15 +184,15 @@
-
+@anchor dls_examples
@examp
-# Create the sample data set:
\verbatim
-sql> CREATE TABLE source_table (id INTEGER NOT NULL,
+sql> CREATE TABLE linear_systems_test_data (id INTEGER NOT NULL,
lhs DOUBLE PRECISION[],
rhs DOUBLE PRECISION);
-sql> INSERT INTO linear_systems_test_data(id, a, b) VaLUES
+sql> INSERT INTO linear_systems_test_data(id, lhs, rhs) VALUES
(0, ARRAY[1,0,0], 20),
(1, ARRAY[0,1,0], 15),
(2, ARRAY[0,0,1], 20);
@@ -200,11 +200,11 @@
-# Solve the linear systems with default parameters
\verbatim
-sql> SELECT madlib.linregr_train('source_table',
- 'output_table',
- 'id',
- 'lhs',
- 'rhs');
+sql> SELECT madlib.linear_solver_dense('linear_systems_test_data',
+ 'output_table',
+ 'id',
+ 'lhs',
+ 'rhs');
\endverbatim
-# Obtain the output from the output table
@@ -224,17 +224,14 @@
'linear_systems_test_data',
'result_table',
'id',
- 'a',
- 'b',
+ 'lhs',
+ 'rhs',
NULL,
'direct',
'algorithm=llt'
);
\endverbatim
-\anchor Background
-@background
-
-@sa File dense_linear_sytems.sql_in documenting the SQL functions.
+@sa File dense_linear_systems.sql_in documenting the SQL functions.
diff --git a/src/ports/postgres/modules/linear_systems/sparse_linear_systems.sql_in b/src/ports/postgres/modules/linear_systems/sparse_linear_systems.sql_in
index 286b4e9..19b2aff 100644
--- a/src/ports/postgres/modules/linear_systems/sparse_linear_systems.sql_in
+++ b/src/ports/postgres/modules/linear_systems/sparse_linear_systems.sql_in
@@ -15,17 +15,30 @@
/**
@addtogroup grp_sparse_linear_solver
+<div class="toc"><b>Contents</b>
+<ul>
+<li class="level1"><a href="#sls_about">About</a></li>
+<li class="level1"><a href="#sls_online_help">Online Help</a></li>
+<li class="level1"><a href="#sls_function">Function Syntax</a></li>
+<li class="level1"><a href="#sls_args">Arguments</a></li>
+<li class="level1"><a href="#sls_opt_params">Optimizer Parameters</a></li>
+<li class="level1"><a href="#sls_output">Output Tables</a></li>
+<li class="level1"><a href="#sls_examples">Examples</a></li>
+</ul>
+</div>
+
+@anchor sls_about
@about
-The sparse linear systems module implements solution methods for systems of a consistent
-linear equations.
+The sparse linear systems module implements solution methods for systems of consistent
+linear equations. Systems of linear equations take the form:
\f[
Ax = b
\f]
where \f$x \in \mathbb{R}^{n}\f$, \f$A \in \mathbb{R}^{m \times n} \f$ and \f$b \in \mathbb{R}^{m}\f$.
-This module accepts sparse matrix input formats for \f$A\f$ and \f$b\f$.We assume
-that all there are no rows of \f$A\f$ where all elements are zero.
+This module accepts sparse matrix input formats for \f$A\f$ and \f$b\f$.
+We assume that there are no rows of \f$A\f$ where all elements are zero.
-\note Algorithms with fail if there is an row of the input matrix containing all zeros.
+\note Algorithms will fail if there is a row of the input matrix containing all zeros.
@@ -33,6 +46,7 @@
square linear systems. Currently, the algorithms implemented in this module
solve the linear system using direct or iterative methods.
+@anchor sls_online_help
@par Online Help
View short help messages using the following statements:
@@ -50,25 +64,23 @@
SELECT madlib.linear_solver_sparse('iterative');
@endverbatim
-@usage
+@anchor sls_function
+@par Function Syntax
<pre>
-SELECT {schema_madlib}.linear_solve_sparse (
- 'tbl_lhs_source', -- Data table containing the LHS matrix
- 'tbl_rhs_source', -- Data table containing the RHS vector
- 'tbl_result', -- Result table
- 'lhs_row_id', -- Name of column with row_id for the LHS
- 'lhs_col_id', -- Name of column with col_id for the LHS
- 'lhs_value', -- Name of column with value for the LHS
- 'rhs_row_id', -- Name of column with row_id for the RHS
- 'rhs_col_id', -- Name of column with col_id for the RHS
- 'num_vars', -- Number of variables in the system of equations
- 'grouping_cols', -- Grouping columns (Default: NULL)
- 'optimizer', -- Name of optimizer. Default: 'direct'
- 'optimizer_params' -- Text array of optimizer parameters
-);
+SELECT linear_solver_sparse(tbl_source_lhs, tbl_source_rhs, tbl_result,
+                lhs_row_id, lhs_col_id, lhs_value,
+                rhs_row_id, rhs_value, num_vars,
+                grouping_cols := NULL, optimizer := 'direct',
+                optimizer_params := 'algorithm = llt');
</pre>
+@anchor sls_args
+@par Arguments
+
+
+<DL class="arglist">
+<DT>tbl_source_lhs</DT>
+<DD>Text value. The name of the table containing the left hand side matrix.
For the LHS matrix, the input data is expected to be of the following form:
<pre>{TABLE|VIEW} <em>sourceName</em> (
...
@@ -78,6 +90,15 @@
...
)</pre>
+
+Here, each row represents a single nonzero entry of the matrix. The
+<em> rhs </em> table holds the right hand side of the equations, while the
+<em> lhs </em> table holds the multipliers on the variables on the left hand
+side of the same equations.
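As a sketch of what this sparse (row, column, value) layout encodes (plain Python with illustrative data; the actual column names are whatever the arguments specify):

```python
# Each tuple stands for one row of a hypothetical LHS table:
# (row_id, col_id, value) for a nonzero entry of A.
triplets = [(0, 0, 1.0), (1, 1, 1.0), (2, 2, 1.0)]  # 3x3 identity

# Reconstruct the dense matrix the triplets encode; entries that
# never appear in the table are implicitly zero.
n = 3
A = [[0.0] * n for _ in range(n)]
for r, c, v in triplets:
    A[r][c] = v
print(A)
```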
+</DD>
+
+<DT>tbl_source_rhs</DT>
+<DD>Text value. The name of the table containing the right hand side vector.
For the RHS matrix, the input data is expected to be of the following form:
<pre>{TABLE|VIEW} <em>sourceName</em> (
...
@@ -151,13 +172,13 @@
<DD>Text value. Optimizer specific parameters. Default: NULL.</DD>
</DL>
-
+@anchor sls_opt_params
@par Optimizer Parameters
-For each optimizer, there are specific parameters that can be used to tuned
+For each optimizer, there are specific parameters that can be tuned
for better performance.
-<DL>
+<DL class="arglist">
<DT>algorithm (default: ldlt)</dT>
<DD>
@@ -170,8 +191,12 @@
-Algorithm | Contitions on A | Speed | Memory
+Algorithm | Conditions on A | Speed | Memory
----------------------------------------------------------
- llt | Sym. Pos Def | ++ | ++
- ldlt | Sym. Pos Def | ++ | ++
+ llt | Sym. Pos Def | ++ | ---
+ ldlt | Sym. Pos Def | ++ | ---
+
+ For speed, '++' is faster than '+'.
+ For memory, '-' uses less memory than '--', which uses less than '---'.
Note: ldlt is often preferred over llt
@@ -186,10 +211,13 @@
-Algorithm | Contitions on A | Speed | Memory | Convergence
+Algorithm | Conditions on A | Speed | Memory | Convergence
----------------------------------------------------------------------
- cg-mem | Sym. Pos Def | +++ | + | ++
- bicgstab-mem | Square | ++ | + | +
- precond-cg-mem | Sym. Pos Def | ++ | + | +++
- precond-bicgstab-mem | Square | + | + | ++
+ cg-mem | Sym. Pos Def | +++ | - | ++
+ bicgstab-mem | Square | ++ | - | +
+ precond-cg-mem | Sym. Pos Def | ++ | - | +++
+ precond-bicgstab-mem | Square | + | - | ++
+
+ For speed, '+++' is faster than '++', which is faster than '+'.
+ For convergence, '+++' converges faster than '++', which converges faster than '+'.
Details:
-# <b>cg-mem: </b> In memory conjugate gradient with diagonal preconditioners.
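The following is a minimal sketch of the conjugate gradient idea behind <b>cg-mem</b>, written in plain Python for illustration only; it is not the module's implementation.

```python
# Conjugate gradient sketch for a symmetric positive definite A.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def cg(A, b, tol=1e-12, max_iter=100):
    n = len(b)
    x = [0.0] * n
    r = b[:]              # residual b - Ax, with the initial guess x = 0
    p = r[:]              # initial search direction
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs / sum(pi * ai for pi, ai in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = sum(v * v for v in r)
        if rs_new < tol:
            break
        # New direction, conjugate to the previous ones.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# A = [[4, 1], [1, 3]] is symmetric positive definite; the solution
# of Ax = [1, 2] is x = [1/11, 7/11].
print(cg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```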
@@ -206,9 +234,10 @@
</DL>
+@anchor sls_output
-@par Output statistics
+@par Output Tables
-<DL>
+<DL class="arglist">
<DT>solution</dT>
<DD>
The solution is an array (of double precision) with the variables in the same
@@ -231,7 +260,7 @@
</DL>
-
+@anchor sls_examples
@examp
-# Create the sample data set:
@@ -264,8 +293,9 @@
-# Solve the linear systems with default parameters
\verbatim
-sql> SELECT madlib.linregr_train('lhs_source_table',
- 'rhs_source_table',
+sql> SELECT madlib.linear_solver_sparse(
+ 'sparse_linear_systems_lhs',
+ 'sparse_linear_systems_rhs',
'output_table',
'rid',
'cid',
@@ -285,41 +315,42 @@
\endverbatim
--# Chose a different algorirhm that the default algorithm
+-# Choose a different algorithm than the default
\verbatim
drop table if exists result_table;
-select madlib.linear_solver_sparse(
- 'linear_systems_test_data',
- 'result_table',
- 'id',
- 'a',
- 'b',
- NULL,
- 'direct',
- 'algorithm=llt'
- );
+SELECT madlib.linear_solver_sparse(
+ 'sparse_linear_systems_lhs',
+ 'sparse_linear_systems_rhs',
+ 'output_table',
+ 'rid',
+ 'cid',
+ 'val',
+ 'rid',
+ 'val',
+ 4,
+ NULL,
+ 'direct',
+                     'algorithm=llt');
+\endverbatim
--# Chose a different algorirhm that the default algorithm
+-# Choose a different algorithm than the default
\verbatim
drop table if exists result_table;
-select madlib.linear_solver_sparse(
- 'linear_systems_test_data',
- 'result_table',
- 'id',
- 'a',
- 'b',
- NULL,
- 'iterative',
- 'algorithm=cg-mem, toler=1e-5'
- );
-
+SELECT madlib.linear_solver_sparse(
+ 'sparse_linear_systems_lhs',
+ 'sparse_linear_systems_rhs',
+ 'output_table',
+ 'rid',
+ 'cid',
+ 'val',
+ 'rid',
+ 'val',
+ 4,
+ NULL,
+ 'iterative',
+ 'algorithm=cg-mem, toler=1e-5');
\endverbatim
-\anchor Background
-@background
-
-
-@sa File sparse_linear_sytems.sql_in documenting the SQL functions.
+@sa File sparse_linear_systems.sql_in documenting the SQL functions.
@internal
diff --git a/src/ports/postgres/modules/pca/pca.sql_in b/src/ports/postgres/modules/pca/pca.sql_in
index 17594f8..f8b734a 100644
--- a/src/ports/postgres/modules/pca/pca.sql_in
+++ b/src/ports/postgres/modules/pca/pca.sql_in
@@ -68,7 +68,7 @@
@par Training Function
The training functions have the following formats:
@verbatim
-madlib.pca_project( source_table, out_table, row_id,
+pca_project( source_table, out_table, row_id,
k, grouping_cols:= NULL,
lanczos_iter := min(k+40, <smallest_matrix_dimension>),
use_correlation := False, result_summary_table := NULL)
@@ -101,7 +101,7 @@
two standard MADlib dense matrix formats, and a sparse input table
should be in the standard MADlib sparse matrix format.
-The two standard MADlib dense matrix formats are
+The two standard MADlib dense matrix formats are
<pre>{TABLE|VIEW} <em>source_table</em> (
<em>row_id</em> INTEGER,
row_vec FLOAT8[],
@@ -183,6 +183,7 @@
The output is divided into three tables (one of which is optional).
The output table (<em>'out_table'</em> above) encodes the principal components with the
<em>k</em> highest eigenvalues. The table has the following columns:
\par
<DL class="arglist">
diff --git a/src/ports/postgres/modules/regress/linear.sql_in b/src/ports/postgres/modules/regress/linear.sql_in
index faeb062..091b5f1 100644
--- a/src/ports/postgres/modules/regress/linear.sql_in
+++ b/src/ports/postgres/modules/regress/linear.sql_in
@@ -28,7 +28,7 @@
</ul></div>
@anchor about
-@about
+@about
Ordinary Least Squares Regression, also called Linear Regression, is a
statistical model used to fit linear models.
@@ -69,7 +69,7 @@
@anchor output
@par Output Table
-The output table produced by the linear regression training function contains the following columns.
+The output table produced by the linear regression training function contains the following columns.
<DL class="arglist">
<DT>\<...></DT>
<DD>Any grouping columns provided during training.