| MADlib Release Notes |
| -------------------- |
| |
| These release notes contain the significant changes in each MADlib release, |
| with most recent versions listed at the top. |
| |
| A complete list of changes for each release can be obtained by viewing the git |
| commit history located at https://github.com/madlib/madlib/commits/master. |
| |
| Current list of bugs and issues can be found at http://jira.madlib.net. |
| |
| -------------------------------------------------------------------------------- |
| MADlib v0.3 |
| |
| Relase Date: 2012-Feb-9 |
| |
| New features: |
| * Installer: |
| - Single installer package targeting all supported DBMSs per OS (MADLIB-218) |
| * C++ Abstraction Layer: |
| - Switched from using Armadillo to using Eigen for linear-algebra |
| operations, thereby eliminating the dependency on LAPACK/BLAS (MADLIB-275) |
| - Reimplemented as a template library for performance improvements |
| (MADLIB-295) |
| * Decision Trees: |
| - Major update |
| - Now supports multiple split criteria (information gain, gini, gain ratio) |
| - Now supports tree pruning using a validation set to address over fitting |
| - Now supports additional functions for tree output |
| - Now supports continuous features in addition to categorical features |
| - Additional support for handling null values |
| - Improved scalability and performance |
| * k-Means Clustering: |
| - Now handles any input that is convertible to SVEC. (MADLIB-42) |
| - Multiple distance functions (L1-norm, L2-norm, cosine similarity, Tanimoto |
| similarity) (MADLIB-43) |
| - Supports multiple seedings methods (kmeans++, random, user-specified list |
| of centroids) |
| - Replaced goodness of fit with the (simplified) Silhouette coefficient |
| (MADLIB-45) |
| - New run-time parameters (MADLIB-47) |
| * Linear Regression: |
| - Major speed improvement |
| * Logistic Regression: |
| - Major speed improvement |
| - Now handles any input that is convertible to BOOLEAN (dependent variable) |
| or DOUBLE PRECISION[] (independent variables). (MADLIB-283) |
| - An under-/overflow safe version to evaluate the (usual) logistic function, |
| for scoring logistic regression (MADLIB-271) |
| - A third optimizer: Incremental-gradient-descent (MADLIB-303) |
| * Support: |
| - For Greenplum <= 4.2.0, added a workaround for INSERT INTO in the same way |
| as the existing CREATE TABLE AS workaround. This workaround is not needed |
| in Greenplum >= 4.2.1 any more. (MADLIB-265) |
| - Function version() returns Madlib build information (MADLIB-309) |
| |
| Bug fixes: |
| Sparse vectors: |
| - Fixed sparse-vector type case problems (MADLIB-282, MADLIB-305) |
| - Fixed a situation where using svec_svf() could cause a segmentation fault |
| (MADLIB-350) |
| - Increased compatibility with internal PostgreSQL conventions (MADLIB-257) |
| Logistic regression: |
| - Handle numerical instability more gracefully (MADLIB-343, MADLIB-345) |
| - Handle unexpected inputs more gracefully (MADLIB-284, MADLIB-344) |
| - Fixed "Random variate x is nan, but must be finite" issue (MADLIB-356) |
| |
| Known issues: |
| - Decision Trees not supported on Greenplum 4.0 (MADLIB-346, MADLIB-347) |
| - K-means: the error '"nan" does not exist' may be raised when input vectors |
| contain NaN. (MADLIB-364) |
| - Association Rules require the madlib schema to be in the search path |
| (MADLIB-353) |
| - Invalid arguments are not always guaranteed to be handled gracefully and |
| may lead to confusing error messages (MADLIB-28, 336, 359, 361, 363, 364) |
| |
| -------------------------------------------------------------------------------- |
| MADlib v0.2.1beta |
| |
| Release Date: 2011-Sep-14 |
| |
| General changes: |
| * numerous improvements to the C++ abstraction layer: |
| - code clean-up |
| - fixed issue where incorrect values were returned when used with |
| debug builds of PostgreSQL/Greenplum (MADLIB-253) |
| - fixed issue where returning arrays to PostgreSQL/Greenplum could lead |
| to a crash (MADLIB-250) |
| - allocated memory is now 16-byte aligned for improved stability and |
| performance (MADLIB-236) |
| * compiling with advanced warnings enabled by default now |
| * all C/C++ code now free of warnings. On gcc <= 4.6, there might still be |
| warnings due to "unclean" macros in DBMS header files (MADLIB-228) |
| * prepared Solaris support in a later release (MADLIB-204) |
| - added support for Sun Compiler in CMake build script |
| - fixed all compilation errors with Sun compiler |
| * added UDF to mimic "CREATE TABLE AS ...", as a workaround for a Greenplum |
| issue (MADLIB-241). Included this as GP Compatibility module. |
| * madpack utility: |
| - dropped madpack dependency on PygreSQL (MADLIB-217) |
| - improved security in madpack install-check (MADLIB-229) |
| - fixed bashism in madpack (MADLIB-222) |
| - fixed install-check not running on non-default schema (MADLIB-251) |
| |
| Modules/methods: |
| * SVM (kernel_machines): |
| - fixed cumulative error count in svm_cls_update() function |
| - improved memory management in SVM module |
| * Linear regression (regress): |
| - fixed unexpected behavior for some edge cases (MADLIB-214) |
| - fixed crashing with huge number of independent vars (MADLIB-250) |
| * Logistic regression (regress): |
| - added support for arbitrary expressions for dep./indep. variables, not |
| just column names (MADLIB-255) |
| * Quantile: |
| - fixed quantile() function to be exact |
| - added simple version for small data sets |
| * Sparse Vectors: |
| - added check for sorted dictionary to svec_sfv (MADLIB-187) |
| * Decision Tree (decision_tree): |
| - now can be run multiple times in one session (MADLIB-156) |
| |
| Known issues: |
| * non-unified API for several SQL UDFs (MADLIB-208) |
| * performance of the conjugate-gradient optimizer in logistic regression |
| can be very poor (MADLIB-164) |
| |
| -------------------------------------------------------------------------------- |
| MADlib v0.2.0beta |
| |
| Release Date: 2011-Jul-8 |
| |
| General changes: |
| * new build and installation framework based on CMake |
| * new C++ abstraction layer for easy and secure method development |
| * new database installation utility (madpack) |
| |
| Modules/methods: |
| * new: Association Rules (assoc_rules) |
| * new: Array Operators (array_ops) |
| * new: Decision Tree (decision_tree) |
| * new: Conjugate Gradient (conjugate_gradient) |
| * new: Parallel LDA (plda) |
| * improved: all methods from previous release |
| |
| Known issues: |
| * non-unified API for several SQL UDFs (MADLIB-208) |
| * running decision tree more than once in one session fails (MADLIB-156) |
| * performance of the conjugate-gradient optimizer in logistic regression |
| can be very poor (MADLIB-164) |
| * svec_sfv function doesn't check for sorted dictionary (MADLIB-187) |
| |
| -------------------------------------------------------------------------------- |
| MADlib v0.1.0alpha |
| |
| Release Date: 2011-Jan-31 |
| |
| Initial release. |
| |
| Included modules/methods: |
| * Naive-Bayes Classification (bayes) |
| * k-Means Clustering (kmeans) |
| * Support Vector Machines (kernel_machines) |
| * Sketch-based Estimators (sketch) |
| * Sketch-based Profile (data_profile) |
| * Quantile (quantile) |
| * Linear & Logistic Regression (regress) |
| * SVD Matrix Factorisation (svdmf) |
| * Sparse Vectors (svec) |
| |
| -------------------------------------------------------------------------------- |
| MADlib v0.1.0prerelease |
| |
| Release date: 2011-Jan-25 |
| |
| Demo release. |