| --- |
| title: Configuration File Format |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| The `gpfdist` configuration file uses the YAML 1.1 document format and implements a schema for defining the transformation parameters. The configuration file must be a valid YAML document. |
| |
| The `gpfdist` program processes the document in order and uses indentation (spaces) to determine the document hierarchy and the relationships of the sections to one another. White space is therefore significant: use spaces only to express the hierarchy, not for cosmetic formatting, and do not use tabs. |
| |
| The following is the basic structure of a configuration file. |
| |
| ``` pre |
| --- |
| VERSION: 1.0.0.1 |
| TRANSFORMATIONS: |
|   transformation_name1: |
|     TYPE: input | output |
|     COMMAND: command |
|     CONTENT: data | paths |
|     SAFE: posix-regex |
|     STDERR: server | console |
|   transformation_name2: |
|     TYPE: input | output |
|     COMMAND: command |
|     ... |
| ``` |
| |
| VERSION |
| Required. The version of the `gpfdist` configuration file schema. The current version is 1.0.0.1. |
| |
| TRANSFORMATIONS |
| Required. Begins the transformation specification section. A configuration file must have at least one transformation. When `gpfdist` receives a transformation request, it looks in this section for an entry with the matching transformation name. |
| |
| TYPE |
| Required. Specifies the direction of transformation. Values are `input` or `output`. |
| |
| - `input`: `gpfdist` treats the standard output of the transformation process as a stream of records to load into HAWQ. |
| - `output`: `gpfdist` treats the standard input of the transformation process as a stream of records from HAWQ to transform and write to the appropriate output. |
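| |
| As a minimal sketch of the two directions, the fragment below declares one transformation of each type. The transformation names (`prices_input`, `prices_output`) and the script `output_transform.sh` are hypothetical and used only for illustration. |
| |
| ``` pre |
| TRANSFORMATIONS: |
|   # Hypothetical name: the script's standard output is loaded into HAWQ. |
|   prices_input: |
|     TYPE: input |
|     COMMAND: /bin/bash input_transform.sh %filename% |
|   # Hypothetical name and script: records from HAWQ arrive on standard input. |
|   prices_output: |
|     TYPE: output |
|     COMMAND: /bin/bash output_transform.sh %filename% |
| ``` |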
| |
| COMMAND |
| Required. Specifies the command `gpfdist` will execute to perform the transformation. |
| |
| For input transformations, `gpfdist` invokes this command as specified in the `CONTENT` setting. The command is expected to open the underlying file(s) as appropriate and produce one line of `TEXT` for each row to load into HAWQ. The input transform determines whether the entire content should be converted to one row or to multiple rows. |
| |
| For output transformations, `gpfdist` invokes this command as specified in the `CONTENT` setting. The output command is expected to open and write to the underlying file(s) as appropriate. The output transformation determines the final placement of the converted output. |
| |
| CONTENT |
| Optional. The values are `data` and `paths`. The default value is `data`. |
| |
| - When `CONTENT` specifies `data`, the text `%filename%` in the `COMMAND` section is replaced by the path to the file to read or write. |
| - When `CONTENT` specifies `paths`, the text `%filename%` in the `COMMAND` section is replaced by the path to the temporary file that contains the list of files to read or write. |
| |
| The following is an example of a `COMMAND` section showing the text `%filename%` that is replaced. |
| |
| ``` pre |
| COMMAND: /bin/bash input_transform.sh %filename% |
| ``` |
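| |
| For comparison, the following sketch sets `CONTENT` to `paths`. In that case `%filename%` is replaced by the path to the temporary file that lists the files to process, so the script must read that list rather than treat its argument as the data file. The transformation name `batch_input` and the script `input_list_transform.sh` are hypothetical. |
| |
| ``` pre |
| TRANSFORMATIONS: |
|   # Hypothetical name and script, shown for illustration only. |
|   batch_input: |
|     TYPE: input |
|     # %filename% becomes the path of the temporary file listing the files. |
|     COMMAND: /bin/bash input_list_transform.sh %filename% |
|     CONTENT: paths |
| ``` |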
| |
| SAFE |
| Optional. A POSIX regular expression that the paths must match to be passed to the transformation. Specify `SAFE` when there is a concern about injection or improper interpretation of paths passed to the command. The default is no restriction on paths. |
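| |
| The fragment below is one possible way to restrict a transformation to `.xml` files under a single directory. The directory `/data/incoming`, the transformation name, and the pattern itself are assumptions made for illustration. The pattern is anchored explicitly with `^` and `$` because this page does not state whether `gpfdist` requires the expression to match the whole path, and it is quoted so that YAML passes it through unchanged. |
| |
| ``` pre |
| TRANSFORMATIONS: |
|   # Hypothetical name, directory, and pattern. |
|   xml_input: |
|     TYPE: input |
|     COMMAND: /bin/bash input_transform.sh %filename% |
|     # Accept only letters, digits, '_', '.', and '-' in the file name. |
|     SAFE: '^/data/incoming/[A-Za-z0-9_.-]*[.]xml$' |
| ``` |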
| |
| STDERR |
| Optional. The values are `server` and `console`. |
| |
| This setting specifies how to handle standard error output from the transformation. The default, `server`, specifies that `gpfdist` will capture the standard error output from the transformation in a temporary file and send the first 8k of that file to HAWQ as an error message. The error message will appear as a SQL error. `console` specifies that `gpfdist` does not redirect or transmit the standard error output from the transformation. |
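| |
| Putting the settings together, a complete configuration file might look like the following sketch. The transformation name `debug_input` is hypothetical. It sets `STDERR` to `console` so that `gpfdist` leaves the script's standard error output alone instead of forwarding it to HAWQ. `CONTENT: data` is written out even though it is the default. |
| |
| ``` pre |
| --- |
| VERSION: 1.0.0.1 |
| TRANSFORMATIONS: |
|   # Hypothetical transformation name, shown for illustration only. |
|   debug_input: |
|     TYPE: input |
|     COMMAND: /bin/bash input_transform.sh %filename% |
|     CONTENT: data |
|     STDERR: console |
| ``` |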
| |
| |