COMMAND NAME: hawq register
Usage1: Register parquet files generated by other systems into the corresponding table in HAWQ.
Usage2: Register a parquet/AO table from a YAML configuration file.
*****************************************************
SYNOPSIS
*****************************************************
Usage1: hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-f filepath] [-e eof] <tablename>
Usage2: hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-c config] [-F, --force] <tablename>
hawq register help
hawq register -?
hawq register --version
*****************************************************
DESCRIPTION
*****************************************************
Use Case1:
"hawq register" is a utility that registers file(s) on HDFS into
a table in HAWQ. It moves the file at the given path (if the path
refers to a file) or the files under the path (if the path refers
to a directory) into the table's directory, and then updates the
table metadata to include the files.
To use "hawq register", HAWQ must have been started.
Currently "hawq register" supports parquet tables only in this case.
Users must make sure that the metadata of the parquet file(s)
and of the table are consistent.
The target table must not be hash distributed, that is, it must
not have been created with a "distributed by" clause.
The file(s) to be registered and the table in HAWQ must be in the
same HDFS cluster.
Use Case2:
"hawq register" can register both AO and parquet tables. The files to be registered are listed in a .yml configuration file.
This configuration file can be generated by "hawq extract". Registering through a .yml configuration file does not require
the table to exist already, since the .yml file contains the table schema.
"hawq register" behaves differently depending on the options given:
* If the table does not exist, hawq register creates the table and registers the files.
* If the table already exists, hawq register appends the files to the existing table.
* If the --force option is specified, hawq register erases the existing data in the catalog
table pg_aoseg.pg_aoseg_$relid/pg_aoseg.pg_paqseg_$relid for the table and
re-registers according to the .yml configuration file definition.
Note. Without --force, if any file specified in the .yml configuration file already lies under the table directory,
hawq register reports an error.
Note. With --force, if there are files under the table directory that are not specified in the .yml configuration file,
hawq register reports an error.
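The rules above can be read as a small decision procedure. The following Python sketch is only an illustration of that reading, not the utility's actual implementation; the function name, argument names, and returned strings are hypothetical, and the behavior of --force on a table that does not yet exist is an assumption, since the text above does not state it.

```python
def plan_register(table_exists, force, yml_files, table_dir_files):
    """Return the action described above for Use Case2, or raise on a rule violation.

    yml_files:       HDFS paths listed in the .yml configuration file.
    table_dir_files: HDFS paths currently under the table directory.
    (Hypothetical helper; names and return values are illustrative only.)
    """
    yml_files = set(yml_files)
    table_dir_files = set(table_dir_files)

    if force:
        # With --force, every file under the table directory must be listed in the .yml file.
        unlisted = table_dir_files - yml_files
        if unlisted:
            raise ValueError("not in .yml file: %s" % sorted(unlisted))
        # Assumption: the same action applies even if the table does not exist yet.
        return "erase catalog data and re-register"

    # Without --force, no file from the .yml file may already lie under the table directory.
    already_present = yml_files & table_dir_files
    if already_present:
        raise ValueError("already under table directory: %s" % sorted(already_present))

    if table_exists:
        return "append files to existing table"
    return "create table and register files"
```

For example, plan_register(True, False, ["/tmp/a"], ["/tbl/1"]) selects the append action, while listing "/tbl/1" in the .yml file without --force raises an error.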
Note. In usage2, if the table is hash distributed, hawq register only checks that the number of files to be registered
is a multiple of the table's bucket number, and that the distribution key specified
in the .yml configuration file is the same as that of the table. It does not check whether the files are actually distributed by the key.
Note. To register a hash distributed table through a yaml file, make sure the order of the files in the yaml file preserves the hash distribution.
To use "hawq register", HAWQ must have been started.
Currently "hawq register" supports both AO and Parquet formats in this case.
Partitioned tables are not supported in this version; support is planned for a later version.
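The two checks described in the notes on hash distributed tables can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the utility's code: the function and parameter names are hypothetical, and distribution keys are modeled as ordered lists of column names.

```python
def check_hash_registration(num_files, bucket_number, yml_dist_keys, table_dist_keys):
    """Apply the two usage2 checks for a hash distributed table.

    Hypothetical helper: it does NOT verify that the files are actually
    distributed by the key, which the utility does not check either.
    """
    if bucket_number <= 0:
        raise ValueError("bucket number must be positive")
    # Check 1: the number of files must be a multiple of the bucket number.
    if num_files % bucket_number != 0:
        raise ValueError(
            "%d files is not a multiple of bucket number %d" % (num_files, bucket_number)
        )
    # Check 2: the distribution key in the .yml file must match the table's key.
    if list(yml_dist_keys) != list(table_dist_keys):
        raise ValueError("distribution key in .yml file differs from the table's key")
    return True
```

With a bucket number of 6, registering 12 files passes the first check and registering 10 files does not.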
*****************************************************
ARGUMENTS
*****************************************************
<tablename>
Name of the table to be registered into.
*****************************************************
OPTIONS
*****************************************************
-? (help)
Displays the online help.
--version
Displays the version of this utility.
*****************************************************
CONNECTION OPTIONS
*****************************************************
-h hostname
Specifies the host name of the machine on which the HAWQ master
database server is running. If not specified, reads from the
environment variable $PGHOST which defaults to localhost.
-p port
Specifies the TCP port on which the HAWQ master database server
is listening for connections. If not specified, reads from the
environment variable $PGPORT which defaults to 5432.
-U username
The database role name to connect as. If not specified, reads
from the environment variable $PGUSER which defaults to the current
system user name.
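The fallback order for the connection options can be sketched in Python as below. This is only an illustration of the defaults listed above; the function name and the dictionary it returns are hypothetical and are not part of the utility.

```python
import getpass
import os


def resolve_connection(host=None, port=None, user=None, env=None):
    """Resolve -h, -p and -U using the fallbacks described above.

    Hypothetical helper: an explicit option wins, then the environment
    variable, then the documented default.
    """
    env = os.environ if env is None else env
    return {
        "host": host or env.get("PGHOST", "localhost"),
        "port": int(port or env.get("PGPORT", 5432)),
        # getpass.getuser() is only called when neither -U nor $PGUSER is set.
        "user": user or env.get("PGUSER") or getpass.getuser(),
    }
```

Calling resolve_connection(env={"PGUSER": "gpadmin"}) falls back to host localhost and port 5432 while taking the role name from $PGUSER.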
*****************************************************
EXAMPLE FOR USAGE1
*****************************************************
Run "hawq register" to register a parquet file in HDFS with path
'hdfs://localhost:8020/temp/hive.paq' generated by hive into table
'parquet_table' in HAWQ, which is in the database named 'postgres'.
Assume the location of the database is 'hdfs://localhost:8020/hawq_default',
the tablespace id is '16385', the database id is '16387', the table filenode id
is '77160', and the last file under the filenode is numbered '7'.
$ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq parquet_table
This moves the file 'hdfs://localhost:8020/temp/hive.paq' to its
new location 'hdfs://localhost:8020/hawq_default/16385/16387/77160/8' in HDFS, then
updates the metadata of the table 'parquet_table' in HAWQ, which is stored in the
catalog table 'pg_aoseg.pg_paqseg_77160'.
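The destination path in this example follows the pattern <database location>/<tablespace id>/<database id>/<filenode id>/<next file number>. The Python sketch below reproduces that arithmetic for the values above; the function name is hypothetical, and the pattern is inferred from this single example rather than from the utility's source.

```python
def registered_file_path(db_location, tablespace_id, database_id, filenode_id, last_file_no):
    """Build the HDFS path a newly registered file is moved to.

    Hypothetical helper based on the example above: the new file takes the
    number after the last file already under the filenode directory.
    """
    next_file_no = last_file_no + 1
    return "%s/%d/%d/%d/%d" % (
        db_location.rstrip("/"),
        tablespace_id,
        database_id,
        filenode_id,
        next_file_no,
    )


print(registered_file_path("hdfs://localhost:8020/hawq_default", 16385, 16387, 77160, 7))
```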
*****************************************************
EXAMPLE FOR USAGE2
*****************************************************
This example shows how hawq register works with a yml configuration file.
Usually the yml configuration file is generated by hawq extract.
This example shows the life cycle of hawq extract and hawq register.
First, create a table and insert some data into it:
$ psql -c "create table paq1(a int, b varchar(10)) with (appendonly=true, orientation=parquet);"
$ psql -c "insert into paq1 values(generate_series(1,1000), 'abcde');"
Second, extract the table metadata:
$ hawq extract -o paq1.yml paq1
Third, register into the new table paq2 using the yml file:
$ hawq register --config paq1.yml paq2
Finally, query the new table to check that the content has been registered:
$ psql -c "select count(*) from paq2;"
In the above example, the final query should return 1000.
*****************************************************
DATA TYPES
*****************************************************
The data types used in HAWQ and in the parquet format are not the same, so there is a
mapping between them, summarized as follows:
Data types in HAWQ      Data types in parquet
bool                    boolean
int2                    int32
int4                    int32
date                    int32
int8                    int64
time                    int64
timestamptz             int64
timestamp               int64
money                   int64
float4                  float
float8                  double
bit                     byte_array
varbit                  byte_array
byte                    byte_array
numeric                 byte_array
name                    byte_array
char                    byte_array
bpchar                  byte_array
varchar                 byte_array
text                    byte_array
xml                     byte_array
timetz                  byte_array
interval                byte_array
macaddr                 byte_array
inet                    byte_array
cidr                    byte_array
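
When checking that a parquet file is consistent with a table before registering it, the mapping above can be used as a lookup table. The Python sketch below copies the table verbatim; the dictionary and function names are hypothetical and are not part of HAWQ.

```python
# HAWQ type name -> parquet type, copied from the table above.
HAWQ_TO_PARQUET = {
    "bool": "boolean",
    "int2": "int32", "int4": "int32", "date": "int32",
    "int8": "int64", "time": "int64", "timestamptz": "int64",
    "timestamp": "int64", "money": "int64",
    "float4": "float", "float8": "double",
    "bit": "byte_array", "varbit": "byte_array", "byte": "byte_array",
    "numeric": "byte_array", "name": "byte_array", "char": "byte_array",
    "bpchar": "byte_array", "varchar": "byte_array", "text": "byte_array",
    "xml": "byte_array", "timetz": "byte_array", "interval": "byte_array",
    "macaddr": "byte_array", "inet": "byte_array", "cidr": "byte_array",
}


def parquet_type(hawq_type):
    """Return the parquet type for a HAWQ type name; raise KeyError if unlisted."""
    return HAWQ_TO_PARQUET[hawq_type.strip().lower()]
```

For instance, parquet_type("int4") gives "int32" and parquet_type("varchar") gives "byte_array".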