blob: e93362450f7e63bcc1c5d2d3cfb9b35c53e24cef [file] [log] [blame]
Installing MADlib on Pivotal HAWQ
=================================
MADlib is a library of statistics and machine learning functions that can be
installed in HAWQ. MADlib is installed separately from the main HAWQ
installation. For a description of the general MADlib installation process,
refer to the MADlib installation guide for PostgreSQL and GPDB:
https://github.com/madlib/madlib/wiki/Installation-Guide
An installation script, hawq_install.sh, installs the MADlib RPM distribution on
the HAWQ master and segment nodes. It installs the MADlib files but does not
register MADlib functions with HAWQ databases. After running hawq_install.sh,
use the madpack utility program to install, reinstall, or upgrade the MADlib
database objects.
After adding new segment nodes to HAWQ, MADlib must be installed on the new
segment nodes. This should be done after the HAWQ binaries are properly
installed and preferably before running gpexpand.
Upgrading HAWQ from 1.1 to 1.2
------------------------------
In HAWQ 1.1 a portion of MADlib v0.5 came preinstalled. These functions in their
original form are incompatible with HAWQ 1.2 and will be removed as part of the
HAWQ 1.2 upgrade. Dependencies on MADlib 0.5 should be removed from the
installation before performing the HAWQ 1.2 upgrade. When the HAWQ upgrade is
complete, install MADlib 1.5 or higher and then reinstall the MADlib database
objects using the madpack utility.
Requirements
------------
Check that you have completed the following tasks before running the MADlib
installation script:
- Make sure you have rpm, gpssh, and gpscp in your PATH.
- Make sure that you have HAWQ binaries installed properly on all master and
segment nodes in your cluster (also new segment nodes when adding new nodes).
- Add hawq_install.sh to your PATH.
- Make sure the HOSTFILE lists all segment nodes (also new segment nodes when
adding new nodes).
To install MADlib:
1. Run the following command to install MADlib:
hawq_install.sh -r <RPM_FILEPATH> -f <HOSTFILE> [-s] [-d <GPHOME>] [--prefix <MADLIB_INSTALL_PATH>]
Required Settings
-----------------
-r | --rpm-path <RPM_FILEPATH> The path to the MADlib RPM file.
-f | --host-file <HOSTFILE> The file containing the host names of all new segments.
Optional Settings
-----------------
-s | --skip-localhost Set this option to prevent MADlib installation on the localhost.
-d | --set-gphome <GPHOME> Indicates the HAWQ installation path. If you do not specify
one, the installer uses the value stored in the environment variable
GPHOME.
--prefix <MADLIB_INSTALL_PATH> Indicates MADlib installation path. If not set, the default value
${GPHOME}/madlib is used.
-h | -? | --help Displays help.
Example
-------
hawq_install.sh -r /home/gpadmin/madlib/madlib-1.5-Linux.rpm -f /usr/local/greenplum-db/hostfile