 /** @mainpage Apache MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical, graph and machine learning methods for structured and unstructured data. Useful links:
Please refer to the ReadMe file for information about incorporated third-party material. License information regarding MADlib and included third-party libraries can be found in the License directory.

@defgroup grp_datatrans Data Types and Transformations
@details Data types and operations that transform and shape data.

@defgroup grp_arraysmatrix Arrays and Matrices
@ingroup grp_datatrans
@brief Mathematical operations for arrays and matrices.
@details These modules provide basic mathematical operations to be run on arrays and matrices.

For a distributed system, a matrix cannot simply be represented as a 2D array of numbers in memory. We provide two forms of distributed representation of a matrix:

- Dense: The matrix is represented as a distributed collection of 1-D arrays, one array per row. An example 3x10 matrix is stored as the following table:
 row_id |         row_vec
--------+-------------------------
   1    | {9,6,5,8,5,6,6,3,10,8}
   2    | {8,2,2,6,6,10,2,1,9,9}
   3    | {3,9,9,9,8,6,3,9,5,6}
- Sparse: The matrix is represented using the row and column indices for each non-zero entry of the matrix. Example:
 row_id | col_id | value
--------+--------+-------
      1 |      1 |     9
      1 |      5 |     6
      1 |      6 |     6
      2 |      1 |     8
      3 |      1 |     3
      3 |      2 |     9
      4 |      7 |     0
(7 rows)
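
The relationship between the two representations can be sketched outside the database. The following Python snippet is a minimal illustration only, not MADlib code: the function names `dense_to_sparse` and `sparse_to_dense` are hypothetical, and it assumes 1-based row and column indices as in the tables above. It also assumes that the zero-valued entry at (4, 7) in the sparse example serves to mark the matrix dimensions, which the text above does not state explicitly.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-ins for the two table layouts shown above.
Dense = Dict[int, List[int]]           # row_id -> row_vec
Sparse = List[Tuple[int, int, int]]    # (row_id, col_id, value), 1-based indices


def dense_to_sparse(dense: Dense) -> Sparse:
    """Emit one (row_id, col_id, value) triple per non-zero entry."""
    triples: Sparse = []
    for row_id in sorted(dense):
        for col_id, value in enumerate(dense[row_id], start=1):
            if value != 0:
                triples.append((row_id, col_id, value))
    return triples


def sparse_to_dense(sparse: Sparse, n_rows: int, n_cols: int) -> Dense:
    """Fill an n_rows x n_cols matrix of zeros with the listed entries."""
    dense: Dense = {r: [0] * n_cols for r in range(1, n_rows + 1)}
    for row_id, col_id, value in sparse:
        dense[row_id][col_id - 1] = value
    return dense


# The sparse example from the table above.
sparse_example: Sparse = [
    (1, 1, 9), (1, 5, 6), (1, 6, 6),
    (2, 1, 8),
    (3, 1, 3), (3, 2, 9),
    (4, 7, 0),  # assumed to mark the dimensions (4 rows, 7 columns)
]

# Infer the dimensions from the largest indices that appear.
n_rows = max(r for r, _, _ in sparse_example)
n_cols = max(c for _, c, _ in sparse_example)

dense_example = sparse_to_dense(sparse_example, n_rows, n_cols)
round_trip = dense_to_sparse(dense_example)
```

Under these assumptions, converting back with `dense_to_sparse` keeps only the six non-zero entries, so the dimension-marking entry at (4, 7) would have to be re-added by the caller if the 4x7 shape needs to be preserved.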