blob: fac519c26de50c6cd50b8405362771270056a9a8 [file] [log] [blame]
/**
@mainpage
MADlib is an open-source library for scalable in-database analytics. It provides
data-parallel implementations of mathematical, statistical and machine learning
methods for structured and unstructured data.
The MADlib mission: to foster widespread development of scalable analytic
skills, by harnessing efforts from commercial practice, academic research, and
open-source development.
Useful links:
<ul>
<li>MADlib project site http://madlib.net/</li>
<li>MADlib bug reporting site: http://jira.madlib.net/ and quick guide: https://github.com/madlib/madlib/wiki/Bug-reporting</li>
<li>User documentation for earlier releases:
<a href="http://doc.madlib.net/v1.7.1/index.html">v1.7.1</a>,
<a href="http://doc.madlib.net/v1.7/index.html">v1.7</a>,
<a href="http://doc.madlib.net/v1.6/index.html">v1.6</a>,
<a href="http://doc.madlib.net/v1.5/index.html">v1.5</a>,
<a href="http://doc.madlib.net/v1.4/index.html">v1.4</a>,
<a href="http://doc.madlib.net/v1.3/index.html">v1.3</a>,
<a href="http://doc.madlib.net/v1.2/index.html">v1.2</a>
</li>
</ul>
Please refer to the <a href="@DOXYGEN_README_FILE@">Read-Me</a> file for information
about incorporated third-party material. License information regarding MADlib
and included third-party libraries can be found inside the
<a href="@DOXYGEN_LICENSE_DIR@">license</a> directory.
@defgroup grp_datatrans Data Types and Transformations
@{Data types and transformation operations @}
@defgroup grp_arraysmatrix Arrays and Matrices
@ingroup grp_datatrans
@brief Mathematical operations for arrays and matrices
@details
These modules provide basic mathematical operations to be run on array and matrices.
For a distributed system, a matrix cannot simply be represented as a 2D array of numbers in memory.
<b>We provide two forms of distributed representation of a matrix</b>:
- Dense: The matrix is represented as a distributed collection of 1-D arrays.
An example 3x10 matrix would be the below table:
<pre>
row_id | row_vec
--------+-------------------------
1 | {9,6,5,8,5,6,6,3,10,8}
2 | {8,2,2,6,6,10,2,1,9,9}
3 | {3,9,9,9,8,6,3,9,5,6}
</pre>
- Sparse: The matrix is represented using the row and column indices for each
non-zero entry of the matrix. Example:
<pre>
row_id | col_id | value
--------+--------+-------
1 | 1 | 9
1 | 5 | 6
1 | 6 | 6
2 | 1 | 8
3 | 1 | 3
3 | 2 | 9
4 | 7 | 0
(6 rows)
</pre>
&nbsp;
All matrix operations work with either form of representation.
In many cases, a matrix function can be <b>decomposed to vector operations
applied independently on each row of a matrix (or corresponding rows of two
matrices)</b>. We have also provided access to these internal vector operations
(\ref grp_array) for greater flexibility. Matrix operations like
<em>matrix_add</em> use the corresponding vector operation (<em>array_add</em>)
and also include additional validation and formating. Other functions like
<em>matrix_mult</em> are complex and use a combination of such vector operations
and other SQL operations.
<b>It's important to note</b> that these array functions are only available for the
dense format representation of the matrix. In general, the scope of a single
array function invocation is limited to only an array (1-dimensional or
2-dimensional) that fits in memory. When such function is executed on a table of
arrays, the function is called multiple times - once for each array (or pair of
arrays). On contrary, scope of a single matrix function invocation is the
complete matrix stored as a distributed table.
@{
@defgroup grp_array Array Operations
@defgroup grp_matrix Matrix Operations
@defgroup grp_matrix_factorization Matrix Factorization
@brief Matrix Factorization methods including Singular Value Decomposition and Low-rank Matrix Factorization
@{
@defgroup grp_lmf Low-rank Matrix Factorization
@defgroup grp_svd Singular Value Decomposition
@}
@defgroup grp_linalg Norms and Distance functions
@defgroup grp_svec Sparse Vectors
@}
@defgroup grp_pca Dimensionality Reduction
@ingroup grp_datatrans
@brief A collection of methods for dimensionality reduction.
@details A collection of methods for dimensionality reduction.
@{
@defgroup grp_pca_train Principal Component Analysis
@defgroup grp_pca_project Principal Component Projection
@}
@defgroup grp_data_prep Encoding Categorical Variables
@ingroup grp_datatrans
@defgroup grp_mdl Model Evaluation
@{Contains the cross-validation module, a collection of routines useful for
<a href="http://en.wikipedia.org/wiki/Cross-validation_(statistics)">Cross-validation</a>. @}
@defgroup grp_validation Cross Validation
@ingroup grp_mdl
@defgroup grp_stats Statistics
@{Contains statistics modules @}
@defgroup grp_desc_stats Descriptive Statistics
@ingroup grp_stats
@{A collection of methods to compute descriptive statistics of the dataset @}
@defgroup grp_summary Summary
@ingroup grp_desc_stats
@defgroup grp_correlation Pearson's Correlation
@ingroup grp_desc_stats
@defgroup grp_inf_stats Inferential Statistics
@ingroup grp_stats
@{A collection of methods to compute inferential statistics on a dataset.@}
@defgroup grp_stats_tests Hypothesis Tests
@ingroup grp_inf_stats
@defgroup grp_prob Probability Functions
@ingroup grp_stats
@defgroup grp_super Supervised Learning
@{Contains methods which perform supervised learning tasks @}
@defgroup grp_regml Regression Models
@ingroup grp_super
A collection of methods for modeling conditional expectation of a response variable.
@{
@defgroup grp_clustered_errors Clustered Variance
@defgroup grp_cox_prop_hazards Cox-Proportional Hazards Regression
@defgroup grp_elasticnet Elastic Net Regularization
@defgroup grp_glm Generalized Linear Models
@defgroup grp_linreg Linear Regression
@defgroup grp_logreg Logistic Regression
@defgroup grp_marginal Marginal Effects
@defgroup grp_multinom Multinomial Regression
@defgroup grp_ordinal Ordinal Regression
@defgroup grp_robust Robust Variance
@}
@defgroup grp_tree Tree Methods
@ingroup grp_super
@{A collection of recursive partitioning (tree) methods. @}
@defgroup grp_decision_tree Decision Tree
@ingroup grp_tree
@defgroup grp_random_forest Random Forest
@ingroup grp_tree
@defgroup grp_crf Conditional Random Field
@ingroup grp_super
@defgroup grp_tsa Time Series Analysis
@{A collection of methods to analyze time series data. @}
@defgroup grp_arima ARIMA
@ingroup grp_tsa
@defgroup grp_unsupervised Unsupervised Learning
@{A collection of methods for unsupervised learning tasks @}
@defgroup grp_association_rules Association Rules
@ingroup grp_unsupervised
@{A collection of methods used to uncover interesting patterns in transactional datasets. @}
@defgroup grp_assoc_rules Apriori Algorithm
@ingroup grp_association_rules
@defgroup grp_clustering Clustering
@ingroup grp_unsupervised
@{A collection of methods for clustering data@}
@defgroup grp_kmeans k-Means Clustering
@ingroup grp_clustering
@defgroup grp_topic_modelling Topic Modelling
@ingroup grp_unsupervised
@{A collection of methods to uncover abstract topics in a document corpus @}
@defgroup grp_lda Latent Dirichlet Allocation
@ingroup grp_topic_modelling
@defgroup grp_utility_functions Utility Functions
@defgroup @grp_utilities Developer Database Functions
@ingroup grp_utility_functions
@defgroup grp_linear_solver Linear Solvers
@ingroup grp_utility_functions
@{A collection of methods that implement solutions for systems of consistent linear equations. @}
@defgroup grp_dense_linear_solver Dense Linear Systems
@ingroup grp_linear_solver
@defgroup grp_sparse_linear_solver Sparse Linear Systems
@ingroup grp_linear_solver
@defgroup grp_pmml PMML Export
@ingroup grp_utility_functions
@defgroup grp_text_analysis Text Analysis
@ingroup grp_utility_functions
@{A collection of methods to find patterns in textual data. @}
@defgroup grp_text_utilities Term Frequency
@ingroup grp_text_analysis
@defgroup grp_early_stage Early Stage Development
@{A collection of implementations which are in early stage of development.
There may be some issues that will be addressed in a future version.
Interface and implementation are subject to change. @}
@defgroup grp_sketches Cardinality Estimators
@ingroup grp_early_stage
@defgroup grp_countmin CountMin (Cormode-Muthukrishnan)
@ingroup grp_sketches
@defgroup grp_fmsketch FM (Flajolet-Martin)
@ingroup grp_sketches
@defgroup grp_mfvsketch MFV (Most Frequent Values)
@ingroup grp_sketches
@defgroup grp_cg Conjugate Gradient
@ingroup grp_early_stage
@defgroup grp_bayes Naive Bayes Classification
@ingroup grp_early_stage
@defgroup grp_sample Random Sampling
@ingroup grp_early_stage
@defgroup grp_kernmach Support Vector Machines
@ingroup grp_early_stage
@defgroup grp_deprecated Deprecated Modules
@{A collection of deprecated modules. @}
@defgroup grp_dectree Decision Tree (old C4.5 implementation)
@ingroup grp_deprecated
@defgroup grp_svdmf Matrix Factorization
@ingroup grp_deprecated
@defgroup grp_mlogreg Multinomial Logistic Regression
@ingroup grp_deprecated
@defgroup grp_profile Profile
@ingroup grp_deprecated
@defgroup grp_quantile Quantile
@ingroup grp_deprecated
@defgroup grp_rf Random Forest (old implementation)
@ingroup grp_deprecated
*/