MADlib® is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data.
See the project webpage, MADlib Home, for links to the latest binary and source packages. For installation and contribution guides, please see the MADlib Wiki.
The latest documentation of MADlib modules can be found at MADlib Docs, or can be accessed directly from the MADlib installation directory by opening doc/user/html/index.html.
The following block diagram gives a high-level overview of MADlib's architecture.
MADlib incorporates material from the following third-party components:
argparse 1.2.1 “provides an easy, declarative interface for creating command line tools”
Boost 1.47.0 (or newer) “provides peer-reviewed portable C++ source libraries”
Eigen 3.2.2 “is a C++ template library for linear algebra”
PyYAML 3.10 “is a YAML parser and emitter for Python”
PyXB 1.2.4 “is a Python library for XML Schema Bindings”
License information regarding MADlib and included third-party libraries can be found inside the license directory.
Changes between MADlib versions are described in the ReleaseNotes.txt file.
MAD Skills: New Analysis Practices for Big Data (VLDB 2009)
Hybrid In-Database Inference for Declarative Information Extraction (SIGMOD 2011)
Towards a Unified Architecture for In-Database Analytics (SIGMOD 2012)
The MADlib Analytics Library or MAD Skills, the SQL (VLDB 2012)