| COMMAND NAME: hawq register |
| |
| Usage1: Register parquet files generated by other systems into the corresponding table in HAWQ. |
| Usage2: Register a parquet/AO table from a YAML configuration file. |
| |
| ***************************************************** |
| SYNOPSIS |
| ***************************************************** |
| Usage1: hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-f filepath] [-e eof] [-l log_directory] <tablename> |
| Usage2: hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-c config] [-F, --force] [-l log_directory] <tablename> |
| |
| hawq register help |
| hawq register -? |
| |
| hawq register --version |
| |
| ***************************************************** |
| DESCRIPTION |
| ***************************************************** |
| Use Case1: |
| "hawq register" is a utility that registers file(s) on HDFS into |
| a table in HAWQ. It moves the file (if the path refers to a file) |
| or the files under the path (if the path refers to a |
| directory) into the table's directory, |
| and then updates the table metadata to include the files. |
| |
| To use "hawq register", HAWQ must have been started. |
| |
| Currently "hawq register" supports parquet tables only. |
| Users must make sure that the metadata of the parquet file(s) |
| and the table are consistent. |
| The target table must not be hash distributed, that is, it must not |
| have been created with a "distributed by" clause. |
| The file(s) to be registered and the table in HAWQ must be in the |
| same HDFS cluster. |
| |
| Use Case2: |
| "hawq register" can register both AO and parquet format tables; the files to be registered are listed in the .yml configuration file. |
| This configuration file can be generated by hawq extract. Registering through a .yml configuration file does not require the table to exist already, |
| since the .yml file already contains the table schema. |
| "hawq register" behaves differently depending on the options given and on whether the table exists: |
| * If the table does not exist, hawq register creates the table and registers the files. |
| * If the table already exists, hawq register appends the files to the existing table. |
| * If the --force option is specified, hawq register erases the table's existing data in the catalog |
| table pg_aoseg.pg_aoseg_$relid/pg_aoseg.pg_paqseg_$relid and |
| re-registers the files according to the .yml configuration file. |
| Note. Without the --force option, if any file specified in the .yml configuration file already lies under the table directory, |
| hawq register reports an error. |
| Note. With the --force option, if there are files under the table directory that are not specified in the .yml configuration file, |
| hawq register reports an error. |
| Note. In usage2, if the table is hash distributed, hawq register only checks that the number of files to be registered |
| is a multiple of the table's bucket number and that the distribution key specified |
| in the .yml configuration file is the same as that of the table. It does not check whether the files are actually distributed by the key. |
| Note. To register a hash distributed table through a YAML file, make sure the order of the files in the YAML file preserves the hash distribution. |
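The file-count rule for hash distributed tables can be checked by hand before running the command. The following is a minimal sketch of that arithmetic only; the function name and the sample numbers are hypothetical, and it does not reproduce how hawq register performs the check internally.

```shell
# Hypothetical pre-check: succeeds when the number of files to register
# is a multiple of the table's bucket number.
is_bucket_multiple() {
    file_count=$1
    bucket_number=$2
    # Guard against a zero or negative bucket number before dividing.
    [ "$bucket_number" -gt 0 ] || return 1
    [ $((file_count % bucket_number)) -eq 0 ]
}

# Sample values (assumed): 12 files listed in the YAML file, 6 buckets.
if is_bucket_multiple 12 6; then
    echo "file count OK"
else
    echo "file count is not a multiple of the bucket number"
fi
```

As the notes above state, passing this check says nothing about whether the files are really distributed by the key.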
| |
| To use "hawq register", HAWQ must have been started. |
| Currently "hawq register" supports both AO and Parquet formats in this case. |
| Partitioned tables are not supported in this version; support is planned for a future release. |
| |
| ***************************************************** |
| ARGUMENTS |
| ***************************************************** |
| <tablename> |
| |
| Name of the table to be registered into. |
| |
| ***************************************************** |
| OPTIONS |
| ***************************************************** |
| -? (help) |
| |
| Displays the online help. |
| |
| --version |
| |
| Displays the version of this utility. |
| |
| -l log_directory |
| |
| Specifies the name of the directory where hawq register log files will be stored. |
| |
| ***************************************************** |
| CONNECTION OPTIONS |
| ***************************************************** |
| -h hostname |
| |
| Specifies the host name of the machine on which the HAWQ master |
| database server is running. If not specified, reads from the |
| environment variable $PGHOST which defaults to localhost. |
| |
| -p port |
| |
| Specifies the TCP port on which the HAWQ master database server |
| is listening for connections. If not specified, reads from the |
| environment variable $PGPORT which defaults to 5432. |
| |
| -U username |
| |
| The database role name to connect as. If not specified, reads |
| from the environment variable $PGUSER which defaults to the current |
| system user name. |
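The fallback order described for -h, -p, and -U can be illustrated with ordinary shell parameter expansion. This is a sketch of the documented defaults only; the function name is hypothetical, and it is not taken from the hawq register implementation. Note that `${VAR:-default}` also falls back when the variable is set but empty.

```shell
# Hypothetical helper: print the host, port, and user that would be used
# when -h, -p, and -U are all omitted.
resolve_conn() {
    printf '%s:%s %s\n' \
        "${PGHOST:-localhost}" \
        "${PGPORT:-5432}" \
        "${PGUSER:-$(id -un)}"
}

resolve_conn
```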
| |
| ***************************************************** |
| EXAMPLE FOR USAGE1 |
| ***************************************************** |
| Run "hawq register" to register a parquet file generated by Hive, located in HDFS at |
| 'hdfs://localhost:8020/temp/hive.paq', into the table |
| 'parquet_table' in HAWQ, which is in the database named 'postgres'. |
| |
| Assume the location of the database is 'hdfs://localhost:8020/hawq_default', |
| the tablespace id is '16385', the database id is '16387', the table filenode id is '77160', |
| and the last file under the filenode is numbered '7'. |
| |
| $ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq parquet_table |
| |
| This moves the file 'hdfs://localhost:8020/temp/hive.paq' to its |
| new location 'hdfs://localhost:8020/hawq_default/16385/16387/77160/8' in HDFS, then |
| updates the metadata of the table 'parquet_table', which is stored in the |
| catalog table 'pg_aoseg.pg_paqseg_77160'. |
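The destination path in this example follows from the ids listed above. The sketch below rebuilds it, assuming (as the example implies) that the registered file takes the number after the last existing file; the variable names are illustrative only.

```shell
# Values copied from the example; variable names are hypothetical.
db_location="hdfs://localhost:8020/hawq_default"
tablespace_id=16385
database_id=16387
filenode_id=77160
last_file_no=7

# Assumption: the new file is numbered one past the last existing file.
next_file_no=$((last_file_no + 1))
target="${db_location}/${tablespace_id}/${database_id}/${filenode_id}/${next_file_no}"

echo "$target"
# prints hdfs://localhost:8020/hawq_default/16385/16387/77160/8
```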
| |
| ***************************************************** |
| EXAMPLE FOR USAGE2 |
| ***************************************************** |
| This example shows how hawq register registers a table according to a yml configuration file. |
| Usually the yml configuration file is generated by hawq extract. |
| The example walks through the life cycle of hawq extract and hawq register. |
| |
| First, create a table and insert some data into it: |
| $ psql -c "create table paq1(a int, b varchar(10)) with (appendonly=true, orientation=parquet);" |
| $ psql -c "insert into paq1 values(generate_series(1,1000), 'abcde');" |
| |
| Second, extract the table metadata: |
| $ hawq extract -o paq1.yml paq1 |
| |
| Third, register the files into a new table paq2 using the yml file: |
| $ hawq register --config paq1.yml paq2 |
| |
| Finally, query the new table to check that the content has been registered: |
| $ psql -c "select count(*) from paq2;" |
| |
| In the above example, the query should return 1000. |
| |
| ***************************************************** |
| DATA TYPES |
| ***************************************************** |
| The data types used in HAWQ and in the parquet format are not the same, so there is a |
| mapping between them, summarized as follows: |
| |
| Data types in HAWQ          Data types in parquet |
| bool                        boolean |
| int2                        int32 |
| int4                        int32 |
| date                        int32 |
| int8                        int64 |
| time                        int64 |
| timestamptz                 int64 |
| timestamp                   int64 |
| money                       int64 |
| float4                      float |
| float8                      double |
| bit                         byte_array |
| varbit                      byte_array |
| bytea                       byte_array |
| numeric                     byte_array |
| name                        byte_array |
| char                        byte_array |
| bpchar                      byte_array |
| varchar                     byte_array |
| text                        byte_array |
| xml                         byte_array |
| timetz                      byte_array |
| interval                    byte_array |
| macaddr                     byte_array |
| inet                        byte_array |
| cidr                        byte_array |