---
title: hawq extract
---

Extracts the metadata of a specified table into a YAML file.

## Synopsis

``` pre
hawq extract [<connection_options>] [-o <output_file>] [-v] <tablename>

hawq extract -?

hawq extract --version
```
where:

``` pre
<connection_options> =
  [-h <host>]
  [-p <port>]
  [-U <username>]
  [-d <database>]
  [-W]
```

## Description

`hawq extract` is a utility that extracts a table's metadata into a YAML-formatted file. HAWQ's InputFormat uses this YAML file to read a HAWQ file stored on HDFS directly into a MapReduce program. The YAML configuration file can also be used to provide the metadata for registering files into HAWQ with the `hawq register` command.

**Note:**
`hawq extract` is bound by the following rules:

- You must start HAWQ before you can use `hawq extract`.
- `hawq extract` supports only append-only (AO) and Parquet tables.
- `hawq extract` supports partitioned tables, but does not support sub-partitions.
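
For example, a typical round trip extracts a table's metadata and then registers the same data files under a new table. This is a minimal sketch: the database and table names are hypothetical, and it assumes the `-c <config_file>` usage of `hawq register` described on its reference page.

``` shell
# Extract metadata for an existing AO or Parquet table.
$ hawq extract -d postgres -o rank_table.yaml public.rank

# Register the data files described in the YAML file under a new table.
$ hawq register -d postgres -c rank_table.yaml public.rank_copy
```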

## Arguments

<dt><tablename> </dt>
<dd>The name of the table whose metadata you want to extract. You can use the format *namespace\_name.table\_name*.</dd>
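
For example, to extract metadata for a table outside the default namespace (hypothetical database, namespace, and table names):

``` shell
$ hawq extract -d sales_db -o daily_orders.yaml sales.daily_orders
```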

## Options

<dt>-o <output\_file> </dt>
<dd>The name of the file to which `hawq extract` writes the metadata. If you do not specify a name, `hawq extract` writes to `stdout`.</dd>

<dt>-v (verbose mode) </dt>
<dd>Displays the verbose output of the extraction process.</dd>

<dt>-? (help) </dt>
<dd>Displays the online help.</dd>

<dt>-\-version </dt>
<dd>Displays the version of this utility.</dd>

**<connection_options>**

<dt>-h <host> </dt>
<dd>Specifies the host name of the machine on which the HAWQ master database server is running. If not specified, the utility reads from the environment variable `$PGHOST` or defaults to `localhost`.</dd>

<dt>-p <port> </dt>
<dd>Specifies the TCP port on which the HAWQ master database server is listening for connections. If not specified, the utility reads from the environment variable `$PGPORT` or defaults to 5432.</dd>

<dt>-U <username> </dt>
<dd>The database role name to connect as. If not specified, the utility reads from the environment variable `$PGUSER` or defaults to the current system user name.</dd>

<dt>-d <database> </dt>
<dd>The database to connect to. If not specified, the utility reads from the environment variable `$PGDATABASE` or defaults to `template1`.</dd>

<dt>-W (force password prompt) </dt>
<dd>Forces a password prompt. If not specified, the utility reads the password from the environment variable `$PGPASSWORD`, or from a password file specified by `$PGPASSFILE` or in `~/.pgpass`.</dd>

## Metadata File Format

`hawq extract` exports the table metadata into a file using the YAML 1.1 document format. The file contains key information about the table, such as its schema, data file locations and sizes, and partition constraints.

The basic structure of the metadata file is as follows:

``` pre
Version: string (1.0.0)
DBVersion: string
FileFormat: string (AO/Parquet)
TableName: string (schemaname.tablename)
DFS_URL: string (hdfs://127.0.0.1:9000)
Encoding: UTF8
AO_Schema:
    - name: string
      type: string
Bucketnum: 6
Distribution_policy: DISTRIBUTED RANDOMLY

AO_FileLocations:
    Blocksize: int
    Checksum: boolean
    CompressionType: string
    CompressionLevel: int
    PartitionBy: string ('PARTITION BY ...')
    Files:
    - path: string (/gpseg0/16385/35469/35470.1)
      size: long

    Partitions:
    - Blocksize: int
      Checksum: boolean
      CompressionType: string
      CompressionLevel: int
      Name: string
      Constraint: string (PARTITION Jan08 START (date '2008-01-01') INCLUSIVE)
      Files:
      - path: string
        size: long

Parquet_Schema:
    - name: string
      type: string

Parquet_FileLocations:
    RowGroupSize: long
    PageSize: long
    CompressionType: string
    CompressionLevel: int
    Checksum: boolean
    EnableDictionary: boolean
    PartitionBy: string
    Files:
    - path: string
      size: long
    Partitions:
    - Name: string
      RowGroupSize: long
      PageSize: long
      CompressionType: string
      CompressionLevel: int
      Checksum: boolean
      EnableDictionary: boolean
      Constraint: string
      Files:
      - path: string
        size: long
```
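
Before handing an extracted file to HAWQ's InputFormat or to `hawq register`, a quick check of the top-level fields can confirm that you extracted the table you intended. A minimal sketch (the file name matches the AO example below):

``` shell
$ grep -E '^(Version|FileFormat|TableName|DFS_URL):' rank_table.yaml
```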

## Example - Extracting an AO table

Extract the `rank` table's metadata into a file named `rank_table.yaml`:

``` shell
$ hawq extract -o rank_table.yaml -d postgres rank
```

**Output content in rank\_table.yaml**

``` pre
AO_FileLocations:
  Blocksize: 32768
  Checksum: false
  CompressionLevel: 0
  CompressionType: null
  Files:
  - path: /gpseg0/16385/35469/35692.1
    size: 0
  - path: /gpseg1/16385/35469/35692.1
    size: 0
  PartitionBy: PARTITION BY list (gender)
  Partitions:
  - Blocksize: 32768
    Checksum: false
    CompressionLevel: 0
    CompressionType: null
    Constraint: PARTITION girls VALUES('F') WITH (appendonly=true)
    Files:
    - path: /gpseg0/16385/35469/35697.1
      size: 0
    - path: /gpseg1/16385/35469/35697.1
      size: 0
    Name: girls
  - Blocksize: 32768
    Checksum: false
    CompressionLevel: 0
    CompressionType: null
    Constraint: PARTITION boys VALUES('M') WITH (appendonly=true)
    Files:
    - path: /gpseg0/16385/35469/35703.1
      size: 0
    - path: /gpseg1/16385/35469/35703.1
      size: 0
    Name: boys
  - Blocksize: 32768
    Checksum: false
    CompressionLevel: 0
    CompressionType: null
    Constraint: DEFAULT PARTITION other WITH (appendonly=true)
    Files:
    - path: /gpseg0/16385/35469/35709.1
      size: 90071728
    - path: /gpseg1/16385/35469/35709.1
      size: 90071512
    Name: other
AO_Schema:
- name: id
  type: int4
- name: rank
  type: int4
- name: year
  type: int4
- name: gender
  type: bpchar
- name: count
  type: int4
DFS_URL: hdfs://127.0.0.1:9000
Distribution_policy: DISTRIBUTED RANDOMLY
Encoding: UTF8
FileFormat: AO
TableName: public.rank
Version: 1.0.0
```
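
Because `hawq extract` writes to `stdout` when `-o` is omitted, you can also inspect parts of the metadata without creating a file. A minimal sketch that lists only the partition constraints of the `rank` table:

``` shell
$ hawq extract -d postgres rank | grep 'Constraint:'
```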

## Example - Extracting a Parquet table

Extract the `orders` table's metadata into a file named `orders.yaml`:

``` shell
$ hawq extract -o orders.yaml -d postgres orders
```

**Output content in orders.yaml**

``` pre
DFS_URL: hdfs://127.0.0.1:9000
Encoding: UTF8
FileFormat: Parquet
TableName: public.orders
Version: 1.0.0
Parquet_FileLocations:
  Checksum: false
  CompressionLevel: 0
  CompressionType: none
  EnableDictionary: false
  Files:
  - path: /hawq-data/gpseg0/16385/16626/16657.1
    size: 0
  - path: /hawq-data/gpseg1/16385/16626/16657.1
    size: 0
  PageSize: 1048576
  PartitionBy: PARTITION BY range (o_orderdate)
  Partitions:
  - Checksum: false
    CompressionLevel: 0
    CompressionType: none
    Constraint: PARTITION p1_1 START ('1992-01-01'::date) END ('1994-12-31'::date)
      EVERY ('3 years'::interval) WITH (appendonly=true, orientation=parquet, pagesize=1048576,
      rowgroupsize=8388608, compresstype=none, compresslevel=0)
    EnableDictionary: false
    Files:
    - path: /hawq-data/gpseg0/16385/16626/16662.1
      size: 8140599
    - path: /hawq-data/gpseg1/16385/16626/16662.1
      size: 8099760
    Name: orders_1_prt_p1_1
    PageSize: 1048576
    RowGroupSize: 8388608
  - Checksum: false
    CompressionLevel: 0
    CompressionType: none
    Constraint: PARTITION p1_11 START ('1995-01-01'::date) END ('1997-12-31'::date)
      EVERY ('3 years'::interval) WITH (appendonly=true, orientation=parquet, pagesize=1048576,
      rowgroupsize=8388608, compresstype=none, compresslevel=0)
    EnableDictionary: false
    Files:
    - path: /hawq-data/gpseg0/16385/16626/16668.1
      size: 8088559
    - path: /hawq-data/gpseg1/16385/16626/16668.1
      size: 8075056
    Name: orders_1_prt_p1_11
    PageSize: 1048576
    RowGroupSize: 8388608
  - Checksum: false
    CompressionLevel: 0
    CompressionType: none
    Constraint: PARTITION p1_21 START ('1998-01-01'::date) END ('2000-12-31'::date)
      EVERY ('3 years'::interval) WITH (appendonly=true, orientation=parquet, pagesize=1048576,
      rowgroupsize=8388608, compresstype=none, compresslevel=0)
    EnableDictionary: false
    Files:
    - path: /hawq-data/gpseg0/16385/16626/16674.1
      size: 8065770
    - path: /hawq-data/gpseg1/16385/16626/16674.1
      size: 8126669
    Name: orders_1_prt_p1_21
    PageSize: 1048576
    RowGroupSize: 8388608
  RowGroupSize: 8388608
Parquet_Schema:
- name: o_orderkey
  type: int8
- name: o_custkey
  type: int4
- name: o_orderstatus
  type: bpchar
- name: o_totalprice
  type: numeric
- name: o_orderdate
  type: date
- name: o_orderpriority
  type: bpchar
- name: o_clerk
  type: bpchar
- name: o_shippriority
  type: int4
- name: o_comment
  type: varchar
Distribution_policy: DISTRIBUTED RANDOMLY
```
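
The per-file `size` values make it easy to verify that the extracted partitions actually contain data. A small sketch that totals every `size` entry in the file:

``` shell
$ awk '$1 == "size:" { total += $2 } END { print total " bytes" }' orders.yaml
```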

## See Also

[hawq load](hawqload.html#topic1), [hawq register](hawqregister.html#topic1)