| Installing Apache MADlib on Apache HAWQ (incubating) |
| ================================================================= |
| |
| Apache MADlib is a library of statistics and machine learning |
| functions that can be installed in Apache HAWQ. MADlib is installed |
| separately from the main HAWQ installation. For a description of the |
| general MADlib installation process, refer to the MADlib installation |
| guide for PostgreSQL and GPDB: |
| https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide |
| |
| An installation script, hawq_install.sh, installs the MADlib RPM distribution on |
| the HAWQ master and segment nodes. It installs the MADlib files but does not |
| register MADlib functions with HAWQ databases. After running hawq_install.sh, |
| use the madpack utility program to install, reinstall, or upgrade the MADlib |
| database objects. |
| |
| After adding new segment nodes to HAWQ, MADlib must be installed on the new |
| segment nodes. This should be done after the HAWQ binaries are properly |
| installed and preferably before running gpexpand. |
| |
| Requirements |
| ------------ |
| |
| Check that you have completed the following tasks before running the MADlib |
| installation script: |
| |
| - Make sure you have rpm, gpssh, and gpscp in your PATH. |
| |
| - Make sure that you have HAWQ binaries installed properly on all master and |
| segment nodes in your cluster (also new segment nodes when adding new nodes). |
| |
| - Add hawq_install.sh to your PATH. |
| |
| - Make sure the HOSTFILE lists all segment nodes (also new segment nodes when |
| adding new nodes). |
| |
| |
| To install MADlib: |
| |
| 1. Run the following command to install MADlib: |
| |
| hawq_install.sh -r <RPM_FILEPATH> -f <HOSTFILE> [-s] [-d <GPHOME>] [--prefix <MADLIB_INSTALL_PATH>] |
| |
| |
| Required Settings |
| ----------------- |
| -r | --rpm-path <RPM_FILEPATH> The path to the MADlib RPM file. |
| |
| -f | --host-file <HOSTFILE> The file containing the host names of all new segments. |
| |
| |
| Optional Settings |
| ----------------- |
| -s | --skip-localhost Set this option to prevent MADlib installation on the localhost. |
| |
| -d | --set-gphome <GPHOME> Indicates the HAWQ installation path. If you do not specify |
| one, the installer uses the value stored in the environment variable |
| GPHOME. |
| |
| --prefix <MADLIB_INSTALL_PATH> Indicates MADlib installation path. If not set, the default value |
| ${GPHOME}/madlib is used. |
| |
| -h | -? | --help Displays help. |
| |
| |
| Example |
| ------- |
| |
| hawq_install.sh -r /home/gpadmin/madlib/madlib-1.11-Linux.rpm -f /usr/local/greenplum-db/hostfile |
| |