This chapter describes how to install additional HAWQ components.
pgcrypto is available as a package that you can download from the Pivotal Download Center and install using the Package Manager utility (gppkg). gppkg installs pgcrypto and other HAWQ extensions, along with any dependencies, on all hosts across a cluster. It also automatically installs extensions on new hosts in the case of system expansion and segment recovery.
Note: Before you install the pgcrypto software package, make sure that your HAWQ database is running, that you have sourced greenplum_path.sh, and that the $MASTER_DATA_DIRECTORY and $GPHOME variables are set.
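The variable part of this precondition can be checked from the shell before calling gppkg. The following is a minimal sketch, not part of the HAWQ tooling: check_hawq_env is a hypothetical helper name, and it only tests that the two variables are non-empty. It does not confirm that the database is running.

```shell
# Sketch: confirm the variables named in the note are set before calling gppkg.
# check_hawq_env is a hypothetical helper name, not a HAWQ utility.
check_hawq_env() {
    missing=0
    for name in GPHOME MASTER_DATA_DIRECTORY; do
        # Read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            missing=1
        fi
    done
    # Exit status 0 when both are set, 1 otherwise.
    return "$missing"
}
```

A typical use would be to chain it in front of the install command, for example check_hawq_env && gppkg -i pgcrypto-1.0-rhel5-x86_64.gppkg.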
Download the pgcrypto package from the Pivotal Download Center, then copy it to the master host. Install the software package by running the following command:
gppkg -i pgcrypto-1.0-rhel5-x86_64.gppkg
You will see output similar to the following.
[gpadmin@gp-single-host ~]$ gppkg -i pgcrypto-1.0-rhel5-x86_64.gppkg
20120418:23:54:20:gppkg:gp-single-host:gpadmin-[INFO]:-Starting gppkg with args: -i pgcrypto-1.0-rhel5-x86_64.gppkg
20120418:23:54:20:gppkg:gp-single-host:gpadmin-[INFO]:-Installing package pgcrypto-1.0-rhel5-x86_64.gppkg
20120418:23:54:21:gppkg:gp-single-host:gpadmin-[INFO]:-Validating rpm installation cmdStr='rpm --test -i /usr/local/greenplum-db/./.tmp/pgcrypto-1.0-1.x86_64.rpm --dbpath /usr/local/greenplum-db/./share/packages/database --prefix /usr/local/greenplum-db/.'
20120418:23:54:22:gppkg:gp-single-host:gpadmin-[INFO]:-Please run psql -d mydatabase -f $GPHOME/share/postgresql/contrib/pgcrypto.sql to enable the package.
20120418:23:54:22:gppkg:gp-single-host:gpadmin-[INFO]:-pgcrypto-1.0-rhel5-x86_64.gppkg successfully installed.
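When the installation is scripted, the captured output can be scanned for the final success line. This is a rough sketch: install_succeeded is a hypothetical helper, and the phrase it looks for is taken from the sample output above, so the wording may differ in other gppkg versions.

```shell
# Sketch: succeed only if the log text on stdin reports a successful install.
# install_succeeded is a hypothetical helper; the search phrase is copied
# from the sample output and is an assumption for other versions.
install_succeeded() {
    grep -q 'successfully installed'
}
```

For example, redirect the gppkg output to a file such as gppkg.log, then run install_succeeded < gppkg.log and test its exit status.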
To remove pgcrypto support from a database, run the uninstall_pgcrypto.sql script.
For each database on which you enabled pgcrypto support, execute the following:
psql -d dbname -f $GPHOME/share/postgresql/contrib/uninstall_pgcrypto.sql
Note: This script does not remove dependent user-created objects.
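If pgcrypto was enabled in several databases, the uninstall command can be generated once per database and reviewed before it is run. The sketch below only prints the commands. print_uninstall_cmds is a hypothetical helper, and the database names passed to it are whatever you supply.

```shell
# Sketch: print one uninstall command per database name given as an argument.
# print_uninstall_cmds is a hypothetical helper; nothing is executed here.
print_uninstall_cmds() {
    for db in "$@"; do
        # Single quotes keep $GPHOME literal in the printed command;
        # it is expanded later, when the printed command is actually run.
        printf 'psql -d %s -f $GPHOME/share/postgresql/contrib/uninstall_pgcrypto.sql\n' "$db"
    done
}
```

After checking the printed lines, they can be piped to a shell in which greenplum_path.sh has been sourced.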
You can uninstall the pgcrypto software using the Greenplum Package Manager (gppkg), as follows:
gppkg -r pgcrypto-1.0
PL/R is available as a package that you can download from the Pivotal Download Center and install using the Package Manager utility (gppkg). gppkg installs PL/R and other HAWQ extensions, along with any dependencies, on all hosts across a cluster. It also automatically installs extensions on new hosts in the case of system expansion and segment recovery.
Note: Before you install the PL/R software package, make sure that your HAWQ database is running, that you have sourced greenplum_path.sh, and that the $MASTER_DATA_DIRECTORY and $GPHOME variables are set.
Download the PL/R package from the Pivotal Download Center, then copy it to the master host. Install the software package by running the following command:
$ gppkg -i plr-1.0-rhel5-x86_64.gppkg
Source the $GPHOME/greenplum_path.sh file. The extension and the R environment are installed in the $GPHOME/ext/R-2.13.0/ directory.
Restart the database:
$ gpstop -r
For each database that requires its use, register the PL/R language with the CREATE LANGUAGE SQL command or the createlang utility. For example, running the following command as the gpadmin user registers the language for a database named testdb:
$ createlang plr -d testdb
PL/R is registered as an untrusted language.
You are now ready to create new PL/R functions. A library of convenient PL/R functions is provided in $GPHOME/share/postgresql/contrib/plr.sql. To install these functions, use the psql utility to execute plr.sql, as follows:
psql -d <dbname> -f $GPHOME/share/postgresql/contrib/plr.sql
When you remove PL/R language support from a database, the PL/R routines that you created in the database will no longer work.
For a database that no longer requires the PL/R language, remove support for PL/R with the SQL command DROP LANGUAGE or the droplang utility. For example, running the following command as the gpadmin user removes support for PL/R from the database testdb:
$ droplang plr -d testdb
If no databases have PL/R as a registered language, uninstall the PL/R extension with the gppkg utility. This example uninstalls PL/R package version 1.0:
$ gppkg -r plr-1.0
You can run the gppkg utility with the options -q --all to list the installed extensions and their versions.
Then, restart the database.
$ gpstop -r
For a given R library, identify all dependent R libraries and each library's web URL. You can find this information by selecting the specific package from the following page: http://cran.r-project.org/web/packages/available_packages_by_name.html
From the page for the arm library, you can see that this library requires the following R libraries: Matrix, lattice, lme4, R2WinBUGS, coda, abind, foreign, and MASS.
From the command line, use wget to download the tar.gz files for the required libraries to the master node:
$ wget http://cran.r-project.org/src/contrib/arm_1.5-03.tar.gz
$ wget http://cran.r-project.org/src/contrib/Archive/Matrix/Matrix_1.0-1.tar.gz
$ wget http://cran.r-project.org/src/contrib/Archive/lattice/lattice_0.19-33.tar.gz
$ wget http://cran.r-project.org/src/contrib/lme4_0.999375-42.tar.gz
$ wget http://cran.r-project.org/src/contrib/R2WinBUGS_2.1-18.tar.gz
$ wget http://cran.r-project.org/src/contrib/coda_0.14-7.tar.gz
$ wget http://cran.r-project.org/src/contrib/abind_1.4-0.tar.gz
$ wget http://cran.r-project.org/src/contrib/foreign_0.8-49.tar.gz
$ wget http://cran.r-project.org/src/contrib/MASS_7.3-17.tar.gz
Using gpscp and the hostname file, copy the tar.gz files to the same directory on all nodes of the HAWQ cluster. You may require root access to do this.
$ gpscp -f /home/gpadmin/hosts_all lattice_0.19-33.tar.gz =:/home/gpadmin
$ gpscp -f /home/gpadmin/hosts_all Matrix_1.0-1.tar.gz =:/home/gpadmin
$ gpscp -f /home/gpadmin/hosts_all abind_1.4-0.tar.gz =:/home/gpadmin
$ gpscp -f /home/gpadmin/hosts_all coda_0.14-7.tar.gz =:/home/gpadmin
$ gpscp -f /home/gpadmin/hosts_all R2WinBUGS_2.1-18.tar.gz =:/home/gpadmin
$ gpscp -f /home/gpadmin/hosts_all lme4_0.999375-42.tar.gz =:/home/gpadmin
$ gpscp -f /home/gpadmin/hosts_all MASS_7.3-17.tar.gz =:/home/gpadmin
$ gpscp -f /home/gpadmin/hosts_all arm_1.5-03.tar.gz =:/home/gpadmin
Use R CMD INSTALL to install the packages from the command line. You may require root access to do this.
$ R CMD INSTALL lattice_0.19-33.tar.gz Matrix_1.0-1.tar.gz abind_1.4-0.tar.gz coda_0.14-7.tar.gz R2WinBUGS_2.1-18.tar.gz lme4_0.999375-42.tar.gz MASS_7.3-17.tar.gz arm_1.5-03.tar.gz
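Because the same tarball names appear in the copy and install steps, it can help to keep them in one list. The sketch below prints the commands rather than running them, so the result can be compared with the steps above. R_TARBALLS, HOSTS_FILE, DEST_DIR, and print_r_library_cmds are names chosen for this example, and the list mirrors the order of the R CMD INSTALL command above.

```shell
# Sketch: keep the tarball list in one place and derive the commands from it.
# All variable and function names here are illustrative, not HAWQ settings.
R_TARBALLS="lattice_0.19-33.tar.gz Matrix_1.0-1.tar.gz abind_1.4-0.tar.gz coda_0.14-7.tar.gz R2WinBUGS_2.1-18.tar.gz lme4_0.999375-42.tar.gz MASS_7.3-17.tar.gz arm_1.5-03.tar.gz"
HOSTS_FILE=/home/gpadmin/hosts_all
DEST_DIR=/home/gpadmin

# Print one gpscp command per tarball, then a single R CMD INSTALL command.
print_r_library_cmds() {
    for tarball in $R_TARBALLS; do
        echo "gpscp -f $HOSTS_FILE $tarball =:$DEST_DIR"
    done
    echo "R CMD INSTALL $R_TARBALLS"
}
```

Printing first is a deliberate choice: the commands may need root access, so it is safer to inspect them before running anything.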
The PL/Java extension is available as a package that you can download from the Pivotal Download Center and then install with the Package Manager utility (gppkg).
Note: Before you install PL/Java:
Ensure that the $JAVA_HOME variable is set to the same path on the master and all the segments.
Perform the following step on all machines to set up ldconfig for JDK:
$ echo "$JAVA_HOME/jre/lib/amd64/server" > /etc/ld.so.conf.d/libjdk.conf
$ ldconfig
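Before registering the directory with ldconfig, it may be worth confirming that it actually contains the JVM shared library. This is a small sketch that assumes a 64-bit Linux JDK 1.6 or 1.7 layout, in which the library is named libjvm.so. has_libjvm is a hypothetical helper that takes a JAVA_HOME path as its only argument.

```shell
# Sketch: check that the directory written to libjdk.conf holds libjvm.so.
# has_libjvm is a hypothetical helper; the file name libjvm.so is an
# assumption based on the standard 64-bit Linux JDK 1.6/1.7 layout.
has_libjvm() {
    [ -f "$1/jre/lib/amd64/server/libjvm.so" ]
}
```

For example, has_libjvm "$JAVA_HOME" || echo "libjvm.so not found under $JAVA_HOME" reports a mismatch on the host where it is run.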
If you are upgrading to the latest version of Java or installing it as part of the expansion process, follow the instructions in the chapter, Expanding the HAWQ System in the HAWQ Administrator Guide.
PL/Java is compatible with JDK 1.6 and 1.7.
The gppkg utility installs HAWQ extensions, along with any dependencies, on all hosts across a cluster. It also automatically installs extensions on new hosts in the case of system expansion and segment recovery.
To install and use PL/Java:
The pljava_classpath environment variable lists the installed JAR files.
Note: Before you install the PL/Java extension, make sure that your HAWQ database is running, that you have sourced greenplum_path.sh, and that the $MASTER_DATA_DIRECTORY and $GPHOME variables are set.
Download the PL/Java extension package from the Pivotal Download Center and copy it to the master host.
Install the software extension package by running the gppkg command. This example installs the PL/Java extension package on a Linux system:
$ gppkg -i pljava-1.1-rhel5-x86_64.gppkg
Restart the database:
$ gpstop -r
Source the $GPHOME/greenplum_path.sh file.
Perform the following steps as the HAWQ administrator gpadmin:
Enable PL/Java by running the $GPHOME/share/postgresql/pljava/install.sql SQL script in the databases that use PL/Java. This example enables PL/Java on a database named mytestdb:
$ psql -d mytestdb -f $GPHOME/share/postgresql/pljava/install.sql
The install.sql script registers both the trusted and untrusted PL/Java.
Copy your Java archives (JAR files) to $GPHOME/lib/postgresql/java/ on all the HAWQ hosts. This example uses the gpscp utility to copy the myclasses.jar file:
$ gpscp -f gphosts_file myclasses.jar =:/usr/local/greenplum-db/lib/postgresql/java/
The gphosts_file file contains a list of the Greenplum Database hosts.
Set the pljava_classpath server configuration parameter in the master hawq-site.xml file. The parameter value is a colon-separated (:) list of the JAR files containing the Java classes used in any PL/Java functions. For example:
$ gpconfig -c pljava_classpath -v \'examples.jar:myclasses.jar\' --masteronly
Restart the database:
$ gpstop -r
(Optional) Pivotal provides an examples.sql file containing sample PL/Java functions that you can use for testing. Run the commands in this file to create the test functions (which use the Java classes in examples.jar):
$ psql -f $GPHOME/share/postgresql/pljava/examples.sql
Enabling the PL/Java extension in the template1 database enables PL/Java in any new HAWQ databases:
$ psql template1 -f $GPHOME/share/postgresql/pljava/install.sql
PL/Java JVM options can be configured via the pljava_vmoptions parameter in the hawq-site.xml file. For example, pljava_vmoptions=-Xmx512M sets the maximum heap size of the JVM to 512 MB. The default Xmx value is -Xmx64M.
To uninstall PL/Java, you should:
For a database that no longer requires the PL/Java language, remove support for PL/Java by running the uninstall.sql file as the gpadmin user. For example, the following command disables the PL/Java language in the specified database:
$ psql -d mydatabase -f $GPHOME/share/postgresql/pljava/uninstall.sql
If no databases have PL/Java as a registered language, remove the Java JAR files and uninstall the PL/Java extension with the gppkg utility:
Remove the pljava_classpath server configuration parameter in the master hawq-site.xml file.
Remove the JAR files from the $GPHOME/lib/postgresql/java/ directory of the HAWQ hosts.
Use the gppkg utility with the -r option to uninstall the PL/Java extension. The following example uninstalls the PL/Java extension on a Linux system:
$ gppkg -r pljava-1.1
You can run the gppkg utility with the options -q --all to list the installed extensions and their versions.
After you uninstall the extension, restart the database:
$ gpstop -r
Copy the JAR file to $GPHOME/lib/postgresql/java on the master host.
Copy the JAR file to the same location on all segments using gpscp from the master:
$ cd $GPHOME/lib/postgresql/java
$ gpscp -f ~/hosts.txt myfunc.jar =:$GPHOME/lib/postgresql/java/
Set pljava_classpath to include the newly copied JAR file:
From the psql session, execute set to affect the current session:
set pljava_classpath='myfunc.jar';
To affect all sessions, use gpconfig:
gpconfig -c pljava_classpath -v \'myfunc.jar\'
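Since pljava_classpath is a colon-separated list, adding a new JAR usually means extending the existing value rather than replacing it. The following sketch shows one way to build the new value. add_to_classpath is a hypothetical helper, not a HAWQ command: its first argument is the current value and its second is the JAR name to add.

```shell
# Sketch: add a JAR name to a colon-separated classpath unless already listed.
# add_to_classpath is a hypothetical helper: $1 = current value, $2 = JAR name.
add_to_classpath() {
    case ":$1:" in
        *":$2:"*) echo "$1" ;;      # already present: return the value unchanged
        "::")     echo "$2" ;;      # empty classpath: the JAR becomes the value
        *)        echo "$1:$2" ;;   # otherwise append with a colon separator
    esac
}
```

Wrapping the value in colons before matching means a name such as func.jar is not mistaken for a match inside myfunc.jar. The resulting string is what you would pass as the value in the set or gpconfig commands above.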
The MADlib library adds statistical and machine learning functionality to HAWQ. MADlib is provided as a package that you can download from the Pivotal Download Center and install using the Package Manager utility (gppkg). gppkg installs MADlib and other HAWQ extensions, along with any dependencies, on all hosts across a cluster. It also automatically installs extensions on new hosts in the case of system expansion and segment recovery.
Note: Before you install the MADlib software package, make sure that your HAWQ database is running, that you have sourced greenplum_path.sh, and that the $MASTER_DATA_DIRECTORY and $GPHOME variables are set.
Download the MADlib package from the Pivotal Download Center, then copy it to the master host. Install the software package by running the command:
$ gppkg -i madlib-ossv1.7.1_pv1.9.3_hawq1.3-rhel5-x86_64.gppkg
The installation process begins and shows output similar to:
20150330:21:28:33:021734 gppkg:gpdb11:gpdbchina-[INFO]:-Starting gppkg with args: -i /data/home/gpdbchina/pulse2-data/agents/agent1/work/ MADlib%20TINC%20Feature%20Test%20on%20HAWQ%201.3/rhel5_x86_64/madlib/ madlib-ossv1.7.1_pv1.9.3_hawq1.3-rhel5-x86_64.gppkg
20150330:21:28:33:021734 gppkg:gpdb11:gpdbchina-[INFO]:-Installing package madlib-ossv1.7.1_pv1.9.3_hawq1.3-rhel5-x86_64.gppkg
Installed GPDB Version: pg_ctl (HAWQ) 1.3.0.0 build 12954
[...]
Restart the database:
$ gpstop -r
Source the $GPHOME/greenplum_path.sh file.
Deploy the MADlib objects to a database using the $GPHOME/madlib/bin/madpack utility. The syntax for installing objects is:
madpack install [-s schema_name] -p hawq -c user@host:port/database
The default schema name is madlib.
For example, the following command installs the objects to a database named “testdb” on server mdw:5432 using the gpadmin user:
$ $GPHOME/madlib/bin/madpack install -s madlib -p hawq -c gpadmin@mdw:5432/testdb
Enter the password for the specified user when prompted.
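The -c argument follows the user@host:port/database pattern shown in the syntax above. When the parts come from separate settings, a small helper can assemble the string. This is only a sketch: madpack_conn is a hypothetical function name, and its arguments are the user, host, port, and database, in that order.

```shell
# Sketch: build the madpack -c argument in the user@host:port/database form.
# madpack_conn is a hypothetical helper: $1 user, $2 host, $3 port, $4 database.
madpack_conn() {
    printf '%s@%s:%s/%s\n' "$1" "$2" "$3" "$4"
}
```

With the values from the example above, madpack_conn gpadmin mdw 5432 testdb prints gpadmin@mdw:5432/testdb, which can be passed to madpack as -c "$(madpack_conn gpadmin mdw 5432 testdb)".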
To learn more about additional options for the madpack utility, enter:
$GPHOME/madlib/bin/madpack --help
See also the documentation available at madlib.net.