**************************************************************************
ABOUT THE HAWQ GPFDIST TRANSFORM DEMOS
**************************************************************************
This package contains example programs of HAWQ gpfdist transformations:
1. dblp - This example demonstrates loading and extracting
database research data.
Specification File: 1_dblp.yaml
Config Files: config.yaml
Data Files: data/dblp.xml (downloaded)
2. mef - Example loading IRS Modernized eFile format data
Specification File: 2_mef.yaml
Config Files: config.yaml
Data Files: data/RET990EZ_2006.xml
3. rig - Example loading WITSML oil well data
Specification File: 3_rig.yaml
Config Files: config.yaml
Data Files: data/rig.xml (downloaded)
**************************************************************************
BEFORE YOU BEGIN
**************************************************************************
1. Download the demo data and jar files from www.cs.washington.edu,
w3.energistics.org and sourceforge.net:
$ cd data
$ make
2. Create a database to use for the HAWQ gpfdist transform demos:
$ createdb gptransform
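Before moving on, it can help to confirm that the data files listed at
the top of this README are in place. The following is a minimal sketch,
not part of the demo package: the function name check_demo_data is
hypothetical, and the file names are taken from the "Data Files" entries
above. The downloaded jar files are not checked because their names are
not given here.

```shell
# Hypothetical helper: report which demo data files are missing or empty.
# Takes the data directory as its argument (defaults to ./data).
check_demo_data() {
    data_dir=${1:-data}
    missing=0
    for f in dblp.xml RET990EZ_2006.xml rig.xml; do
        # -s is true only if the file exists and is not empty
        if [ ! -s "$data_dir/$f" ]; then
            echo "missing or empty: $data_dir/$f"
            missing=1
        fi
    done
    # Exit status 0 when every file is present, 1 otherwise
    return $missing
}
```

Run from the top-level demo directory, "check_demo_data data" prints one
line per missing file and returns a non-zero status if any are absent.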
**************************************************************************
RUNNING DEMO 1: input transformation via hawq load
**************************************************************************
DBLP example
------------
1. Edit the 1_dblp.yaml file and change the LOCAL_HOSTNAME setting
to reflect the name of the host (e.g. myhostname) where you will
run hawq load.
LOCAL_HOSTNAME: [ myhostname ]
HAWQ will attempt to connect to this host to access the data.
2. Create the dblp_thesis table
$ psql -d gptransform -f dblp/create.sql
3. Load the data
$ hawq load -f 1_dblp.yaml
4. Review some of the data
$ psql -d gptransform -c "select * from dblp_thesis limit 5;"
MeF example
------------
5. Edit the 2_mef.yaml file and change the LOCAL_HOSTNAME setting
to reflect the name of the host (e.g. myhostname) where you will
run hawq load.
LOCAL_HOSTNAME: [ myhostname ]
HAWQ will attempt to connect to this host to access the data.
6. Create the mef_xml table
$ psql -d gptransform -f mef/create.sql
7. Load the data
$ hawq load -f 2_mef.yaml
8. Review the data
$ psql -d gptransform -c "select id, substring(text(doc) from 1 for 200) as doctext from mef_xml;"
WITSML example
--------------
9. Edit the 3_rig.yaml file and change the LOCAL_HOSTNAME setting
to reflect the name of the host (e.g. myhostname) where you will
run hawq load.
LOCAL_HOSTNAME: [ myhostname ]
HAWQ will attempt to connect to this host to access the data.
10. Create the rig_xml table
$ psql -d gptransform -f rig/create.sql
11. Load the data
$ hawq load -f 3_rig.yaml
12. Review some of the data
$ psql -d gptransform -c "select well_uid, well_name, rig_uid, rig_name, rig_owner from rig_xml;"
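The LOCAL_HOSTNAME edit is the same in all three specification files, so
it can be scripted. The following is a minimal sketch under one
assumption: that the setting sits on a single line in the form shown in
this README (LOCAL_HOSTNAME: [ myhostname ]). The function name
set_local_hostname is hypothetical and not part of the demo package.

```shell
# Hypothetical helper: rewrite the LOCAL_HOSTNAME line of a spec file.
# Assumes the setting is on one line, e.g. "LOCAL_HOSTNAME: [ myhostname ]".
set_local_hostname() {
    spec_file=$1
    host=$2
    tmp_file="${spec_file}.tmp"
    # Keep the leading indentation and key; replace only the value.
    # Writing to a temp file avoids relying on a non-portable "sed -i".
    sed "s/^\([[:space:]]*LOCAL_HOSTNAME:\).*/\1 [ ${host} ]/" "$spec_file" > "$tmp_file" &&
        mv "$tmp_file" "$spec_file"
}
```

For example, set_local_hostname 1_dblp.yaml "$(hostname)" would point
the DBLP specification at the current machine. If a specification file
spreads the setting across several lines, edit it by hand instead.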
**************************************************************************
RUNNING DEMO 2: input/output transformations via gpfdist
**************************************************************************
1. Edit the dblp/external.sql file and change the location settings
to reflect the name of the host (e.g. myhostname) where you will
run gpfdist.
location ('gpfdist://myhostname:8080/data/dblp.xml#transform=dblp_input')
HAWQ will attempt to connect to this host to read and write the data.
2. In a separate window, start gpfdist:
$ gpfdist -c config.yaml
In this example, gpfdist runs in the foreground in its own window
until the demo is finished.
3. Re-create the dblp_thesis table
$ psql -d gptransform -f dblp/create.sql
4. Create readable and writable external tables
$ psql -d gptransform -f dblp/external.sql
5. Load the data
$ psql -d gptransform -c "insert into dblp_thesis select * from dblp_thesis_readable;"
6. Review some of the data
$ psql -d gptransform -c "select * from dblp_thesis limit 5;"
7. Extract the data
$ psql -d gptransform -c "insert into dblp_thesis_writable select * from dblp_thesis;"
This will create a file called 'data/out.dblp_thesis.xml'.
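The host name edit from step 1 can also be scripted. This is a minimal
sketch, assuming every location in the SQL file uses the
gpfdist://host:port/path form shown in step 1; the function name
set_gpfdist_host is hypothetical and not part of the demo package.

```shell
# Hypothetical helper: replace the host in every gpfdist:// URL of a file.
# The port, path, and #transform= suffix are left untouched.
set_gpfdist_host() {
    sql_file=$1
    host=$2
    tmp_file="${sql_file}.tmp"
    # [^:/]* matches the host name up to the ":port" or the first "/"
    sed "s|gpfdist://[^:/]*|gpfdist://${host}|g" "$sql_file" > "$tmp_file" &&
        mv "$tmp_file" "$sql_file"
}
```

For example, set_gpfdist_host dblp/external.sql "$(hostname)" would
rewrite the locations to use the current machine's name.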
**************************************************************************
DEMO CLEANUP
**************************************************************************
After you have run the demos, run the following commands to clean up:
1. Stop the gpfdist process started in step 2 of DEMO 2 by pressing
Ctrl-C in its window.
^C
$
2. Drop the demo database
$ dropdb gptransform
3. (Optional) Remove the downloaded files
$ cd data
$ make clean