| COMMAND NAME: gpmapreduce |
| |
| Runs Cloudberry MapReduce jobs as defined in a YAML specification document. |
| |
| |
| ***************************************************** |
| SYNOPSIS |
| ***************************************************** |
| |
| gpmapreduce -f <yaml_file> [<dbname> [<username>]] |
| [-k <name>=<value> | --key <name>=<value>] |
| [-h <hostname> | --host <hostname>] |
| [-p <port>| --port <port>] |
| [-U <username> | --username <username>] [-W] [-v] |
| |
| gpmapreduce -V | --version |
| |
| gpmapreduce -h | --help |
| |
| gpmapreduce -x | --explain |
| |
| gpmapreduce -X | --explain-analyze |
| |
| |
| ***************************************************** |
| PREREQUISITES |
| ***************************************************** |
| |
| The following are required prior to running this program: |
| |
| * You must have your MapReduce job defined in a YAML file. |
| |
| * You must be a Apache Cloudberry superuser to run MapReduce jobs |
| written in untrusted Perl or Python. |
| |
| * You must be a Apache Cloudberry superuser to run MapReduce jobs |
| with EXEC and FILE inputs. |
| |
| * Non-superuser roles must be granted external table permissions |
| using CREATE/ALTER ROLE in order to run MapReduce jobs. |
| |
| ***************************************************** |
| DESCRIPTION |
| ***************************************************** |
| |
| MapReduce is a programming model developed by Google for |
| processing and generating large data sets on an array of commodity |
| servers. Cloudberry MapReduce allows programmers who are familiar |
| with the MapReduce paradigm to write map and reduce functions and |
| submit them to the Apache Cloudberry parallel engine for processing. |
| |
| In order for Cloudberry to be able to process MapReduce functions, |
| the functions need to be defined in a YAML document, which is then |
| passed to the Cloudberry MapReduce program, gpmapreduce, for execution |
| by the Apache Cloudberry parallel engine. The Cloudberry system takes |
| care of the details of distributing the input data, executing the |
| program across a set of machines, handling machine failures, |
| and managing the required inter-machine communication. |
| |
| |
| ***************************************************** |
| OPTIONS |
| ***************************************************** |
| |
| |
| -f <yaml_file> |
| |
| Required. The YAML file that contains the Cloudberry MapReduce |
| job definitions. See the Apache Cloudberry Administrator Guide |
| for more information about creating YAML documents. |
| |
| |
| -? | --help |
| |
| Show help, then exit. |
| |
| |
| -V | --version |
| |
| Show version information, then exit. |
| |
| |
| -v | --verbose |
| |
| Show verbose output. |
| |
| |
| -x | --explain |
| |
| Do not run MapReduce jobs, but produce explain plans. |
| |
| |
| -X | --explain-analyze |
| |
| Run MapReduce jobs and produce explain-analyze plans. |
| |
| |
| -k | --key <name>=<value> |
| |
| Sets a YAML variable. A value is required. Defaults to "key" |
| if no variable name is specified. |
| |
| |
| -h <host> | --host <host> |
| |
| Specifies the host name of the machine on which the Cloudberry |
| coordinator database server is running. If not specified, reads |
| from the environment variable PGHOST or defaults to localhost. |
| |
| |
| -p <port> | --port <port> |
| |
| Specifies the TCP port on which the Cloudberry coordinator database |
| server is listening for connections. If not specified, reads |
| from the environment variable PGPORT or defaults to 5432. |
| |
| |
| -U <username> | --username <username> |
| |
| The database role name to connect as. If not specified, reads |
| from the environment variable PGUSER or defaults to the |
| current system user name. |
| |
| |
| -W | --password |
| |
| Force a password prompt. |
| |
| |
| ***************************************************** |
| EXAMPLES |
| ***************************************************** |
| |
| Run a MapReduce job as defined in my_yaml.txt: |
| |
| gpmapreduce -f my_yaml.txt |
| |
| |
| ***************************************************** |
| SEE ALSO |
| ***************************************************** |
| |
| "Cloudberry MapReduce YAML Specification" in the |
| Apache Cloudberry Administrator Guide |