blob: 52fd6b1da4a248ebea8689286f9786654b8d7483 [file] [log] [blame]
Installing Apache MADlib (incubating) on Apache HAWQ (incubating)
=================================================================
Apache MADlib (incubating) is a library of statistics and machine learning
functions that can be installed in Apache HAWQ. MADlib is installed separately
from the main HAWQ installation. For a description of the general MADlib
installation process, refer to the MADlib installation guide for PostgreSQL and
GPDB: https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide
An installation script, hawq_install.sh, installs the MADlib RPM distribution on
the HAWQ master and segment nodes. It installs the MADlib files but does not
register MADlib functions with HAWQ databases. After running hawq_install.sh,
use the madpack utility program to install, reinstall, or upgrade the MADlib
database objects.
After adding new segment nodes to HAWQ, MADlib must be installed on the new
segment nodes. This should be done after the HAWQ binaries are properly
installed and preferably before running gpexpand.
Apache MADlib is an effort undergoing incubation at the Apache Software
Foundation (ASF), sponsored by the Apache Incubator PMC.
Incubation is required of all newly accepted projects until a further review
indicates that the infrastructure, communications, and decision making process
have stabilized in a manner consistent with other successful ASF projects.
While incubation status is not necessarily a reflection of the completeness or
stability of the code, it does indicate that the project has yet to be fully
endorsed by the ASF.
Requirements
------------
Check that you have completed the following tasks before running the MADlib
installation script:
- Make sure you have rpm, gpssh, and gpscp in your PATH.
- Make sure that you have HAWQ binaries installed properly on all master and
segment nodes in your cluster (also new segment nodes when adding new nodes).
- Add hawq_install.sh to your PATH.
- Make sure the HOSTFILE lists all segment nodes (also new segment nodes when
adding new nodes).
To install MADlib:
1. Run the following command to install MADlib:
hawq_install.sh -r <RPM_FILEPATH> -f <HOSTFILE> [-s] [-d <GPHOME>] [--prefix <MADLIB_INSTALL_PATH>]
Required Settings
-----------------
-r | --rpm-path <RPM_FILEPATH> The path to the MADlib RPM file.
-f | --host-file <HOSTFILE> The file containing the host names of all new segments.
Optional Settings
-----------------
-s | --skip-localhost Set this option to prevent MADlib installation on the localhost.
-d | --set-gphome <GPHOME> Indicates the HAWQ installation path. If you do not specify
one, the installer uses the value stored in the environment variable
GPHOME.
--prefix <MADLIB_INSTALL_PATH> Indicates MADlib installation path. If not set, the default value
${GPHOME}/madlib is used.
-h | -? | --help Displays help.
Example
-------
hawq_install.sh -r /home/gpadmin/madlib/madlib-1.11-Linux.rpm -f /usr/local/greenplum-db/hostfile