| COMMAND NAME: gpfdist |
| |
| Serves data files to or writes data files out from HAWQ segments. |
| |
| ***************************************************** |
| SYNOPSIS |
| ***************************************************** |
| |
| |
| gpfdist [-d <directory>] [-p <http_port>] [-l <log_file>] [-t <timeout>] [-c <config_file>] |
| [-S] [-v | -V] [-m <max_length>] [--ssl <certificate_path>] |
| |
| gpfdist {-? | --help | --version} |
| |
| ***************************************************** |
| DESCRIPTION |
| ***************************************************** |
| |
| |
| gpfdist is HAWQ's parallel file distribution program. |
| It is used by readable external tables and gpload to serve |
| external table files to all HAWQ segments in parallel. |
| It is used by writable external tables to accept output |
| streams from HAWQ segments in parallel and write them out to a file. |
| |
| In order for gpfdist to be used by an external table, the |
| LOCATION clause of the external table definition must specify |
| the correct file location using the gpfdist:// protocol |
| (see CREATE EXTERNAL TABLE). |
| |
| NOTE: If the --ssl option is specified to enable SSL security, |
| create the external table with the gpfdists:// protocol. |
| |
| The benefit of using gpfdist is that all segments read from or |
| write to external tables in parallel, which improves load and |
| unload performance and makes external tables easier to administer. |
| |
| For readable external tables, gpfdist parses and serves data |
| files evenly to all the segment instances in the HAWQ |
| system when users SELECT from the external table. For writable |
| external tables, gpfdist accepts parallel output streams from |
| the segments when users INSERT into the external table, and |
| writes to an output file. |
| |
| For readable external tables, if load files are compressed using |
| gzip or bzip2 (have a .gz or .bz2 file extension), gpfdist |
| uncompresses the files automatically before loading provided |
| that gunzip or bunzip2 is in your path. |
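| |
| As a rough illustration of the PATH requirement above, the following |
| shell sketch checks for the matching tool before a compressed file is |
| served. check_decompressor is a hypothetical helper, not part of gpfdist. |

```shell
# Report whether the decompression tool needed for a load file is on the
# PATH. check_decompressor is a hypothetical helper, not a gpfdist command.
check_decompressor() {
    case $1 in
        *.gz)  tool=gunzip ;;
        *.bz2) tool=bunzip2 ;;
        *)     return 0 ;;   # not compressed: nothing to check
    esac
    if command -v "$tool" >/dev/null 2>&1; then
        return 0
    fi
    echo "$tool not found in PATH; $1 cannot be decompressed" >&2
    return 1
}
```

| Run it on the host where gpfdist runs, with the same PATH that the |
| gpfdist process will see. |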
| |
| NOTE: Currently, readable external tables do not support |
| compression on Windows platforms, and writable external |
| tables do not support compression on any platform. |
| |
| Most likely, you will want to run gpfdist on your ETL machines |
| rather than the hosts where HAWQ is installed. |
| To install gpfdist on another host, simply copy the utility |
| over to that host and add gpfdist to your $PATH. |
| |
| NOTE: When using IPv6, always enclose the numeric IP address |
| in brackets. |
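| |
| The protocol and IPv6 rules above can be sketched as a small shell |
| function that assembles the URL for a LOCATION clause. gpfdist_url and |
| its argument order are assumptions made for illustration only. |

```shell
# Build a LOCATION URL for a gpfdist instance.
# Usage: gpfdist_url <host> <port> <file> [ssl]
# gpfdist_url is a hypothetical helper, not part of gpfdist.
gpfdist_url() {
    host=$1
    port=$2
    file=$3
    scheme=gpfdist
    # Use the gpfdists:// protocol when gpfdist was started with --ssl.
    if [ "${4:-}" = "ssl" ]; then
        scheme=gpfdists
    fi
    # A numeric IPv6 address contains colons and must be bracketed.
    case $host in
        \[*\]) ;;                  # already bracketed
        *:*)   host="[$host]" ;;
    esac
    # Strip one leading slash so the path is not doubled.
    printf '%s://%s:%s/%s\n' "$scheme" "$host" "$port" "${file#/}"
}
```

| The printed string is what would go inside the quotes of the LOCATION |
| clause in CREATE EXTERNAL TABLE. |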
| |
| You can also run gpfdist as a Windows Service. See RUNNING GPFDIST |
| AS A WINDOWS SERVICE for details. |
| |
| ***************************************************** |
| OPTIONS |
| ***************************************************** |
| |
| |
| -d <directory> |
| |
| The directory from which gpfdist will serve files for |
| readable external tables or create output files for writable |
| external tables. If not specified, defaults to the current directory. |
| |
| |
| -l <log_file> |
| |
| The fully qualified path and log file name where standard output |
| messages are to be logged. |
| |
| |
| -p <http_port> |
| |
| The HTTP port on which gpfdist will serve files. Defaults to 8080. |
| |
| |
| -t <timeout> |
| |
| Sets the time (in seconds) allowed for HAWQ to |
| establish a connection to a gpfdist process. Default is 5 seconds. |
| Valid values are 2 to 30 seconds. Increase this value on |
| systems with heavy network traffic. |
| |
| -m <max_length> |
| |
| Sets the maximum allowed data row length in bytes. Default is 32768. |
| Use this option when user data includes very wide rows, that is, when a |
| "line too long" error message is received. Do not use it otherwise, |
| because it increases resource allocation. |
| Valid range is 32K to 256MB. (The upper limit is 1MB on Windows systems.) |
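| |
| A wrapper script might validate -t and -m values before starting |
| gpfdist. The sketch below assumes that 32K and 256MB are binary units |
| (32768 and 268435456 bytes); both function names are hypothetical. |

```shell
# Hypothetical pre-flight checks for the documented -t and -m ranges.
# Each function returns 0 for a valid value and non-zero otherwise.

valid_timeout() {
    # Reject empty or non-numeric input first.
    case $1 in ''|*[!0-9]*) return 1 ;; esac
    # -t accepts 2 to 30 seconds.
    [ "$1" -ge 2 ] && [ "$1" -le 30 ]
}

valid_max_length() {
    case $1 in ''|*[!0-9]*) return 1 ;; esac
    # -m accepts 32768 bytes up to 268435456 bytes (assumed binary units).
    [ "$1" -ge 32768 ] && [ "$1" -le 268435456 ]
}
```

| On Windows systems the upper bound for -m would be 1MB instead. |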
| |
| |
| -S (use O_SYNC) |
| |
| Opens the file for synchronous I/O with the O_SYNC flag. Any writes to |
| the resulting file descriptor block gpfdist until the data is |
| physically written to the underlying hardware. |
| |
| --ssl <certificate_path> |
| |
| Adds SSL encryption to data transferred with gpfdist. After executing |
| gpfdist with the --ssl certificate_path option, the only way |
| to load data from this file server is with the gpfdists protocol. |
| The location specified in certificate_path must |
| contain the following files: |
| |
| - The server certificate file, server.crt |
| - The server private key file, server.key |
| - The trusted certificate authorities, root.crt |
| |
| The root directory (/) cannot be specified as certificate_path. |
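| |
| The certificate_path requirements above could be checked before |
| starting gpfdist with --ssl. This is a minimal sketch; check_ssl_dir is |
| a hypothetical helper, and it tests only that the files exist. |

```shell
# Verify that a directory is usable as the --ssl certificate_path.
# check_ssl_dir is a hypothetical helper, not part of gpfdist.
check_ssl_dir() {
    dir=$1
    # The root directory cannot be specified as certificate_path.
    if [ "$dir" = "/" ]; then
        echo "certificate_path must not be /" >&2
        return 1
    fi
    status=0
    # All three files listed in the option description must be present.
    for f in server.crt server.key root.crt; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing $dir/$f" >&2
            status=1
        fi
    done
    return $status
}
```

| It does not validate the certificate contents or file permissions. |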
| |
| -c <config_file> |
| |
| Configuration file for transformations. The config_file option specifies |
| the location of the transformation configuration file, passed to gpload via -c. |
| The gpfdist configuration is expected to be a YAML file with the following format: |
| --- |
| VERSION: 1.0.0.1 |
| TRANSFORMATIONS: |
|   transformname1: |
|     TYPE: input | output |
|     COMMAND: command1 |
|     CONTENT: data | paths |
|     SAFE: posix-regex |
| |
|   transformname2: |
|     TYPE: input | output |
|     COMMAND: command2 |
|     ... |
| |
| -v (verbose) |
| |
| Verbose mode shows progress and status messages. |
| |
| |
| -V (very verbose) |
| |
| Very verbose mode shows all output messages generated by this utility. |
| |
| |
| --version |
| |
| Prints out the version of this utility. |
| |
| |
| -? |
| --help |
| |
| Displays online help. |
| |
| ***************************************************** |
| RUNNING GPFDIST AS A WINDOWS SERVICE |
| ***************************************************** |
| |
| HAWQ Loaders allow gpfdist to run as a Windows Service. |
| |
| Follow the instructions below to download, register and |
| activate gpfdist as a service: |
| |
| 1. Update your HAWQ Loader package to the latest |
| version. This package is available from the |
| EMC Download Center (https://emc.subscribenet.com) |
| |
| 2. Register gpfdist as a Windows service: |
| * Open a Windows command window |
| * Run the following command: |
| sc create gpfdist binpath= "path_to_gpfdist.exe -p 8081 |
| -d External\load\files\path -l Log\file\path" |
| |
| You can create multiple instances of gpfdist by |
| running the same command again, with a unique |
| name and port number for each instance, for example: |
| sc create gpfdistN binpath= "path_to_gpfdist.exe |
| -p 8082 -d External\load\files\path -l Log\file\path" |
| |
| 3. Activate the gpfdist service: |
| * Open the Windows Control Panel and select |
| Administrative Tools>Services. |
| * Highlight the gpfdist service in the list of |
| services, then right-click it. |
| * Select Properties from the right-click menu. |
| The Service Properties window opens. |
| Note that you can also stop this service |
| from the Service Properties window. |
| * Optional: Change the Startup Type to |
| Automatic (after a system restart, this |
| service will be running), then under Service |
| status, click Start. |
| * Click OK. |
| Repeat the above steps for each instance of |
| gpfdist that you created. |
| |
| |
| ***************************************************** |
| EXAMPLES |
| ***************************************************** |
| |
| Serve files from a specified directory using port 8081 |
| (and start gpfdist in the background): |
| |
| gpfdist -d /var/load_files -p 8081 & |
| |
| |
| Start gpfdist in the background and redirect output and |
| errors to a log file: |
| |
| gpfdist -d /var/load_files -p 8081 -l /home/gpadmin/log & |
| |
| |
| To stop gpfdist when it is running in the background: |
| |
| --First find its process id: |
| |
| ps ax | grep gpfdist |
| |
| Or, on Solaris: |
| |
| ps -ef | grep gpfdist |
| |
| --Then kill the process, for example: |
| |
| kill 3456 |
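| |
| Note that a plain "grep gpfdist" can also match the grep process |
| itself. As one possible alternative, the sketch below filters process |
| listings by executable name. gpfdist_pids is a hypothetical helper, and |
| the ps invocation in the comments assumes a ps that supports -o. |

```shell
# Print the PIDs of gpfdist processes.
# Reads "PID COMMAND [ARGS...]" lines on stdin and prints the PID of each
# line whose executable basename is exactly "gpfdist", so a "grep gpfdist"
# process is not matched. gpfdist_pids is a hypothetical helper.
gpfdist_pids() {
    awk '{
        n = split($2, parts, "/")          # split the command path
        if (parts[n] == "gpfdist") print $1
    }'
}

# Example usage (not executed here):
#   ps -e -o pid= -o args= | gpfdist_pids
#   kill $(ps -e -o pid= -o args= | gpfdist_pids)
```

| The kill form stops every gpfdist instance on the host, so filter |
| further if you run several instances. |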
| |
| |
| ***************************************************** |
| SEE ALSO |
| ***************************************************** |
| |
| CREATE EXTERNAL TABLE |
| gpload |