Installing Apache MADlib on Apache HAWQ (incubating)
=================================================================
Apache MADlib is a library of statistics and machine learning
functions that can be installed in Apache HAWQ. MADlib is installed
separately from the main HAWQ installation. For a description of the
general MADlib installation process, refer to the MADlib installation
guide for PostgreSQL and Greenplum Database (GPDB):
https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide
An installation script, hawq_install.sh, installs the MADlib RPM distribution on
the HAWQ master and segment nodes. It installs the MADlib files but does not
register MADlib functions with HAWQ databases. After running hawq_install.sh,
use the madpack utility program to install, reinstall, or upgrade the MADlib
database objects.
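Registering MADlib objects in a database is the separate madpack step. The following sketch wraps that call for HAWQ. It assumes madpack is on your PATH. The function name madlib_deploy and the MADLIB_DB_USER, MADLIB_DB_HOST, and MADLIB_DB_PORT variables (with gpadmin, localhost, and 5432 as defaults) are hypothetical. The -p hawq platform flag and the user@host:port/database form of -c follow madpack's command-line syntax as we understand it, so check madpack --help for your MADlib version.

```shell
# Hypothetical wrapper around madpack (not part of MADlib).
# Usage: madlib_deploy DATABASE [install|reinstall|upgrade|install-check]
# Connection defaults come from hypothetical MADLIB_DB_* variables.
madlib_deploy() {
    db=$1
    action=${2:-install}
    user=${MADLIB_DB_USER:-gpadmin}
    host=${MADLIB_DB_HOST:-localhost}
    port=${MADLIB_DB_PORT:-5432}

    if [ -z "$db" ]; then
        echo "usage: madlib_deploy DATABASE [ACTION]" >&2
        return 2
    fi

    # Pass through only the actions named in this section, plus install-check.
    case $action in
        install|reinstall|upgrade|install-check) ;;
        *) echo "unsupported action: $action" >&2; return 2 ;;
    esac

    # -p selects the platform; -c takes user@host:port/database.
    madpack -p hawq -c "${user}@${host}:${port}/${db}" "$action"
}
```

For example, madlib_deploy testdb runs madpack install against a database named testdb. After a newer RPM has been installed with hawq_install.sh, madlib_deploy testdb upgrade upgrades the objects.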
After you add new segment nodes to HAWQ, you must also install MADlib on them.
Do this after the HAWQ binaries have been installed on the new nodes and,
preferably, before running gpexpand.
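When you expand the cluster, the host file passed to hawq_install.sh must include the new segment nodes. The following is a minimal sketch of keeping that file up to date. It assumes the one-host-name-per-line format that gpssh and gpscp host files use. The function name add_segment_hosts is hypothetical.

```shell
# Hypothetical helper (not part of MADlib): append new segment hosts to a
# host file, one name per line, skipping names that are already listed.
# Usage: add_segment_hosts HOSTFILE NEW_HOST...
add_segment_hosts() {
    if [ "$#" -lt 2 ]; then
        echo "usage: add_segment_hosts HOSTFILE NEW_HOST..." >&2
        return 2
    fi
    hostfile=$1
    shift

    # Create the host file if it does not exist yet.
    [ -e "$hostfile" ] || : > "$hostfile" || return 1

    # If the last line lacks a trailing newline, add one before appending.
    if [ -s "$hostfile" ] && [ -n "$(tail -c 1 "$hostfile")" ]; then
        echo >> "$hostfile" || return 1
    fi

    for host in "$@"; do
        # -x matches whole lines only; -F treats the host name literally.
        if ! grep -qxF -- "$host" "$hostfile"; then
            printf '%s\n' "$host" >> "$hostfile" || return 1
        fi
    done
}
```

For example, add_segment_hosts hostfile sdw5 sdw6 (hypothetical host names) appends only those names not already present. You can then pass the same file to hawq_install.sh with -f.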
Requirements
------------
Check that you have completed the following tasks before running the MADlib
installation script:
- Make sure you have rpm, gpssh, and gpscp in your PATH.
- Make sure that the HAWQ binaries are installed on the master and all segment
nodes in your cluster, including any new segment nodes you are adding.
- Add hawq_install.sh to your PATH.
- Make sure the HOSTFILE lists all segment nodes, including any new segment
nodes you are adding.
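You can script the local parts of this checklist. The following sketch uses the hypothetical function name check_madlib_prereqs. It verifies that the required tools are on PATH and that the host file names at least one host. It cannot confirm that the HAWQ binaries are installed on remote segments or that the host file lists every segment node.

```shell
# Hypothetical pre-flight check (not part of MADlib); run it on the machine
# where you will invoke hawq_install.sh.
# Usage: check_madlib_prereqs HOSTFILE
# Returns 0 when every check passes and 1 otherwise.
check_madlib_prereqs() {
    hostfile=$1
    status=0

    # rpm, gpssh, gpscp, and hawq_install.sh must all be found on PATH.
    for tool in rpm gpssh gpscp hawq_install.sh; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool" >&2
            status=1
        fi
    done

    # The host file must exist and contain at least one non-blank line.
    if [ -z "$hostfile" ] || ! grep -q '[^[:space:]]' "$hostfile" 2>/dev/null; then
        echo "host file is missing or lists no hosts: ${hostfile:-<none>}" >&2
        status=1
    fi

    return "$status"
}
```

For example, check_madlib_prereqs /usr/local/greenplum-db/hostfile && echo ready reports every failed check on standard error before you run the installer.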
To install MADlib, run the following command:

    hawq_install.sh -r <RPM_FILEPATH> -f <HOSTFILE> [-s] [-d <GPHOME>] [--prefix <MADLIB_INSTALL_PATH>]

Required Settings
-----------------
-r | --rpm-path <RPM_FILEPATH>    The path to the MADlib RPM file.
-f | --host-file <HOSTFILE>       The file containing the host names of the segment
                                  nodes on which to install MADlib.
Optional Settings
-----------------
-s | --skip-localhost             Prevents MADlib installation on the local host.
-d | --set-gphome <GPHOME>        The HAWQ installation path. If you do not specify
                                  one, the installer uses the value of the GPHOME
                                  environment variable.
--prefix <MADLIB_INSTALL_PATH>    The MADlib installation path. If not set, the
                                  default ${GPHOME}/madlib is used.
-h | -? | --help                  Displays help.
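The two path settings interact. The MADlib path falls back to ${GPHOME}/madlib, and GPHOME itself comes from -d or the environment. The following sketch mirrors that documented behavior. It is not the actual code of hawq_install.sh. The function name resolve_madlib_prefix is hypothetical. It also assumes that a -d value determines the default prefix.

```shell
# Hypothetical sketch of the documented path defaults; not hawq_install.sh's code.
# Usage: resolve_madlib_prefix [D_VALUE] [PREFIX_VALUE]
# Prints the MADlib installation path that would be used.
resolve_madlib_prefix() {
    # A -d value takes precedence over the GPHOME environment variable.
    gphome=${1:-$GPHOME}
    if [ -z "$gphome" ]; then
        echo "no -d value given and GPHOME is not set" >&2
        return 1
    fi
    # An explicit --prefix wins; otherwise use ${GPHOME}/madlib.
    printf '%s\n' "${2:-$gphome/madlib}"
}
```

For example, with a hypothetical GPHOME of /usr/local/hawq and neither option given, resolve_madlib_prefix prints /usr/local/hawq/madlib. To pass only a prefix, leave the first argument empty: resolve_madlib_prefix "" /opt/madlib.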
Example
-------
    hawq_install.sh -r /home/gpadmin/madlib/madlib-1.11-Linux.rpm -f /usr/local/greenplum-db/hostfile