---
title: Configuration File Format
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
The `gpfdist` configuration file uses the YAML 1.1 document format and implements a schema for defining the transformation parameters. The configuration file must be a valid YAML document.
The `gpfdist` program processes the document in order and uses indentation (spaces) to determine the document hierarchy and the relationships of the sections to one another. White space is significant: use spaces only for indentation, not for other formatting, and do not use tabs.
The following is the basic structure of a configuration file.
``` pre
---
VERSION: 1.0.0.1
TRANSFORMATIONS:
  transformation_name1:
    TYPE: input | output
    COMMAND: command
    CONTENT: data | paths
    SAFE: posix-regex
    STDERR: server | console
  transformation_name2:
    TYPE: input | output
    COMMAND: command
...
```
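As an illustration, the skeleton above might be filled in as follows. This is a minimal sketch: the transformation names `prices_input` and `prices_output` and the script `output_transform.sh` are hypothetical and are not defined by `gpfdist`.

```yaml
---
VERSION: 1.0.0.1
TRANSFORMATIONS:
  # Hypothetical input transformation: the script's standard output
  # is treated as records to load into HAWQ.
  prices_input:
    TYPE: input
    COMMAND: /bin/bash input_transform.sh %filename%
  # Hypothetical output transformation: the script receives records
  # from HAWQ on its standard input.
  prices_output:
    TYPE: output
    COMMAND: /bin/bash output_transform.sh %filename%
```

Each transformation is indented under `TRANSFORMATIONS` with spaces, and its settings are indented one level further. The optional `CONTENT`, `SAFE`, and `STDERR` settings are omitted here, so their defaults apply.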
VERSION
Required. The version of the `gpfdist` configuration file schema. The current version is 1.0.0.1.
TRANSFORMATIONS
Required. Begins the transformation specification section. A configuration file must have at least one transformation. When `gpfdist` receives a transformation request, it looks in this section for an entry with the matching transformation name.
TYPE
Required. Specifies the direction of transformation. Values are `input` or `output`.
- `input`: `gpfdist` treats the standard output of the transformation process as a stream of records to load into HAWQ.
- `output`: `gpfdist` treats the standard input of the transformation process as a stream of records from HAWQ to transform and write to the appropriate output.
COMMAND
Required. Specifies the command `gpfdist` will execute to perform the transformation.
For input transformations, `gpfdist` invokes this command as specified in the `CONTENT` setting. The command is expected to open the underlying file(s) as appropriate and produce one line of `TEXT` for each row to load into HAWQ. The input transform determines whether the entire content should be converted to one row or to multiple rows.
For output transformations, `gpfdist` invokes this command as specified in the `CONTENT` setting. The output command is expected to open and write to the underlying file(s) as appropriate. The output transformation determines the final placement of the converted output.
CONTENT
Optional. The values are `data` and `paths`. The default value is `data`.
- When `CONTENT` specifies `data`, the text `%filename%` in the `COMMAND` section is replaced by the path to the file to read or write.
- When `CONTENT` specifies `paths`, the text `%filename%` in the `COMMAND` section is replaced by the path to the temporary file that contains the list of files to read or write.
The following is an example of a `COMMAND` section showing the text `%filename%` that is replaced.
``` pre
COMMAND: /bin/bash input_transform.sh %filename%
```
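For comparison, a transformation that sets `CONTENT: paths` might look like the sketch below. The transformation name `multi_file_input` and the script `input_transform_list.sh` are hypothetical. The script is assumed to read the list of files from the temporary file whose path replaces `%filename%`.

```yaml
---
VERSION: 1.0.0.1
TRANSFORMATIONS:
  # Hypothetical transformation that receives a list of files
  # rather than a single file path.
  multi_file_input:
    TYPE: input
    CONTENT: paths
    # %filename% becomes the path to a temporary file that
    # contains the list of files to read.
    COMMAND: /bin/bash input_transform_list.sh %filename%
```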
SAFE
Optional. A POSIX regular expression that the paths must match to be passed to the transformation. Specify `SAFE` when there is a concern about injection or improper interpretation of paths passed to the command. The default is no restriction on paths.
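A sketch of a `SAFE` setting follows. The directory `/data/staging`, the `.xml` suffix, and the transformation name are hypothetical. The pattern also assumes that paths are presented to the expression as absolute paths, and it is anchored with `^` and `$` so that it does not rely on how `gpfdist` applies the match. Adjust the expression to the paths your installation actually passes.

```yaml
---
VERSION: 1.0.0.1
TRANSFORMATIONS:
  restricted_input:
    TYPE: input
    COMMAND: /bin/bash input_transform.sh %filename%
    # Accept only names made of letters, digits, underscores, and
    # hyphens, ending in .xml, directly under the hypothetical
    # /data/staging directory. Single quotes keep the backslash literal.
    SAFE: '^/data/staging/[A-Za-z0-9_-]*\.xml$'
```

Excluding spaces, shell metacharacters, and `/` from the file name portion is one way to address the injection and path interpretation concerns described above.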
STDERR
Optional. The values are `server` and `console`.
This setting specifies how to handle standard error output from the transformation. The default, `server`, specifies that `gpfdist` captures the standard error output from the transformation in a temporary file and sends the first 8k of that file to HAWQ as an error message. The error message appears as a SQL error. The value `console` specifies that `gpfdist` does not redirect or transmit the standard error output from the transformation.
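The sketch below sets `STDERR` explicitly. The transformation name `debug_input` is hypothetical. With `console`, error output from the script is neither captured nor sent to HAWQ, which may be convenient while developing a transformation script.

```yaml
---
VERSION: 1.0.0.1
TRANSFORMATIONS:
  debug_input:
    TYPE: input
    COMMAND: /bin/bash input_transform.sh %filename%
    # Leave the script's standard error output alone instead of
    # capturing it and reporting it to HAWQ as a SQL error.
    STDERR: console
```

Omitting the setting, or setting it to `server`, restores the default behavior described above.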