blob: 702ae0445d00be33f588c344537adb65ad0aa7a1 [file] [log] [blame]
---
title: Loading Data with hawq load
---
The HAWQ `hawq load` utility loads data using readable external tables and the HAWQ parallel file server ( `gpfdist` or `gpfdists`). It handles parallel file-based external table setup and allows users to configure their data format, external table definition, and `gpfdist` or `gpfdists` setup in a single configuration file.
## To use hawq load<a id="topic62__du168147"></a>
1. Ensure that your environment is set up to run `hawq load`. Some dependent files from your HAWQ /&gt; installation are required, such as `gpfdist` and Python, as well as network access to the HAWQ segment hosts.
2. Create your load control file. This is a YAML-formatted file that specifies the HAWQ connection information, `gpfdist` configuration information, external table options, and data format.
For example:
``` pre
---
VERSION: 1.0.0.1
DATABASE: ops
USER: gpadmin
HOST: mdw-1
PORT: 5432
GPLOAD:
INPUT:
- SOURCE:
LOCAL_HOSTNAME:
- etl1-1
- etl1-2
- etl1-3
- etl1-4
PORT: 8081
FILE:
- /var/load/data/*
- COLUMNS:
- name: text
- amount: float4
- category: text
- description: text
- date: date
- FORMAT: text
- DELIMITER: '|'
- ERROR_LIMIT: 25
- ERROR_TABLE: payables.err_expenses
OUTPUT:
- TABLE: payables.expenses
- MODE: INSERT
SQL:
- BEFORE: "INSERT INTO audit VALUES('start', current_timestamp)"
- AFTER: "INSERT INTO audit VALUES('end', current_timestamp)"
```
3. Run `hawq load`, passing in the load control file. For example:
``` pre
hawq load -f my_load.yml
```