| # @(#)README 2.4.0 |
| |
| Table of Contents |
| =================== |
| 0. What is this document? |
| 1. What is DBGEN? |
| 2. What will DBGEN create? |
| 3. How is DBGEN built? |
| 4. Command Line Options for DBGEN |
| 5. Building Large Data Sets with DBGEN |
| 6. DBGEN limitations and compliant usage |
| 7. Sample DBGEN executions |
| 8. What is QGEN? |
| 9. What will QGEN create? |
| 10. How is QGEN built? |
| 11. Command Line Options for QGEN |
| 12. Query Template Syntax |
| 13. Sample QGEN executions and Query Templates |
| 14. Environment variable |
| 15. Version Numbering in DBGEN and QGEN |
| 16. Validated Platforms |
| |
| 0. What is this document? |
| |
| This is the general README file for DBGEN and QGEN, the data- |
| base population and executable query text generation programs |
| used in the TPC-H benchmark. It covers the proper use |
| of DBGEN and QGEN. For information on porting the utility to your |
| particular platform see Porting.Notes. |
| |
| 1. What is DBGEN? |
| |
| DBGEN is a database population program for use with the TPC-H benchmark. |
| It is written in ANSI 'C' for portability, and has |
| been successfully ported to over a dozen different systems. While the |
| TPC-H specification allow an implementor to use any utility |
| to populate the benchmark database, the resultant population must exactly |
| match the output of DBGEN. The source code has been provided to make the |
| process of building a compliant database population as simple as possible. |
| |
| 2. What will DBGEN create? |
| |
| Without any command line options, DBGEN will generate 8 separate ascii |
| files. Each file will contain pipe-delimited load data for one of the |
| tables defined in the TPC-H database schema. The default tables |
| will contain the load data required for a scale factor 1 database. By |
| default the file will be created in the current directory and be |
| named <table>.tbl. As an example, customer.tbl will contain the |
| load data for the customer table. |
| |
| When invoked with the '-U' flag, DBGEN will create the data sets to be |
| used in the update functions and the SQL syntax required to delete the |
| data sets. The update files will be created in the same directory as |
| the load data files and will be named "u_<table>.set". The delete |
| syntax will be written to "delete.set". For instance, the data set to |
| be used in the third query set to update the lineitem table will be |
| named "u_lineitem.tbl.3", and the SQL to remove those rows will be |
| found in "delete.3". The size of the update files can be controlled |
| with the '-r' flag. |
| |
| 3. How is DBGEN built? |
| |
| Create an appropriate makefile, using makefile.suite as a basis, |
| and type make. Refer to Porting.Notes for more details and for |
| suggested compile time options. |
| |
| 4. Command Line Options for DBGEN |
| |
| DBGEN's output is controlled by a combination of command line options |
| and environment variables. Command line options are assumed to be single |
| letter flags preceded by a minus sign. They may be followed by an |
| optional argument. |
| |
| option argument default action |
| ------ -------- ------- ------ |
| -h Display a usage summary |
| |
| -f none Force. Existing data files will be |
| overwritten. |
| |
| -F none yes Flat file output. |
| |
| -D none Direct database load. ld_XXXX() routines |
| must be defined in load_stub.c |
| |
| -s <scale> 1 Scale of the database population. Scale |
| 1.0 represents ~1 GB of data |
| |
| -T <table> Generate the data for a particular table |
| ONLY. Arguments: p -- part/partuspp, |
| c -- customer, s -- supplier, |
| o -- orders/lineitem, n -- nation, r -- region, |
| l -- code (same as n and r), |
| O -- orders, L -- lineitem, P -- part, |
| S -- partsupp |
| |
| -O d Generate SQL for delete function |
| instead of key ranges |
| |
| -O f Allow over-ride of default output file |
| names |
| |
| -O h Generate headers in flat ascii files. |
| hd_XXX routines must be defined in |
| load_stub.c |
| |
| -O m Flat files generate fixed length records |
| |
| -O r Generate key ranges for the UF2 update |
| function |
| |
| -O v Verify data set without generating it. |
| |
| -r <percentage> 10 Scale each udpate file to the given |
| percentage (expressed in basis points) |
| of the data set |
| |
| -v none Verbose. Progress messages are |
| displayed as data is generated. |
| |
| -n <name> Use database <name> for in-line load |
| |
| -C <children> Use <children> separate processes to |
| generate data |
| |
| -S <n> Generate the <n>th part of a multi-part load |
| or update set |
| |
| -U <updates> Create a specified number of data sets |
| in flat files for the update/delete |
| functions |
| |
| -i <n> Split the inserted rows in an refresh pair |
| between <n> files |
| |
| -d <n> Split the deleted rows in an refresh pair |
| between <n> files |
| |
| 5. DBGEN limitations and compliant usage |
| |
| DBGEN is meant to be a robust population generator for use with the |
| TPC-H benchmark. It is hoped that DBGEN will make it easier |
| to experiment with and become proficient in the execution of TPC decision |
| support benchmarks. As a result, it includes a number of command line |
| options which are not, strictly speaking, necessary to generate a compliant |
| data set for a TPC-D run. In addition, some command line options will accept |
| arguments which result in the generation of NON-COMPLIANT data sets. Options |
| which should be used with care include: |
| |
| -s -- scale factor. TPC-H runs are only compliant when run against SF's |
| of 1, 10, 100, 300, 1000, 3000, 10000, 30000, 100000 |
| -r -- refresh percentage. TPC-H runs are only compliant when run with |
| -r 10, the default. |
| |
| 6. Sample DBGEN executions |
| |
| DBGEN has been built to allow as much flexibility as possible, but is |
| fundementally intended to generate two things: a database population |
| against which the queries in TPC-H can be run, and the updates |
| that are used during the update functions in TPC-H. Here are |
| some sample uses of DBGEN. |
| |
| 1. To generate the database population for the qualification database |
| dbgen -s 1 |
| 2. To generate the lineitem table only, for a scale factor 10 database, |
| and over-write any existing flat files: |
| dbgen -s 10 -f -T L |
| 4. To geterate a 100GB data set in 1GB pieces, generate only the part and |
| partsupplier tables, and include some progress reports along the way: |
| dbgen -s 100 -S 1 -C 100 -T p -v (to generate the first 1GB file) |
| dbgen -s 100 -S 2 -C 100 -T p -v (to generate the second 1GB file) |
| (and so on, incrementing the argument to -S each time) |
| 5. To generate the update files needed for a 4 stream run of the throughput |
| test at 100 GB, using an existing set of seed files from an 8 process |
| load: |
| dbgen -s 100 -U 4 -C 8 |
| |
| |
| 7. What is QGEN? |
| |
| QGEN is a query generation program for use with the TPC-H benchmark. |
| It is written in ANSI 'C' for portability, and has been successfully |
| ported to over a dozen different systems. While the benchmark specifications |
| allow an implementor to use any utility to create the benchmark query |
| sets, QGEN has been provided to make the process of building |
| a benchmark implementation as simple as possible. |
| |
| 8. What will QGEN create? |
| |
| QGEN is a filter, triggered by :'s. It does line-at-a-time reads of its |
| input (more on that later), scanning for :foo, where foo determines the |
| substitution that occurs. Including: |
| |
| :<int> replace with the appropriate value for parameter <int> |
| :b replace with START_TRAN (from tpcd.h) |
| :c replace with SET_DBASE (from tpcd.h) |
| :n<int> replace with SET_ROWCOUNT(<int>) (from tpcd.h) |
| :o replace with SET_OUTPUT (from tpcd.h) |
| :q replace with query number |
| :s replace with stream number |
| :x replace with GEN_QUERY_PLAN (from tpcd.h) |
| |
| Qgen takes an assortment of command line options, controlling which of these |
| options should be active during the translation from template to EQT, and a |
| list of query "names". It then translates the template found in |
| $DSS_QUERY/<name>.sql and puts the result of stdout. |
| |
| Here is a sample query template: |
| |
| { Sccsid: @(#)1.sql 9.1.1.1 1/25/95 10:51:56 } |
| :n 0 |
| :o |
| select |
| l_returnflag, |
| l_linestatus, |
| sum(l_quantity) as sum_qty, |
| sum(l_extendedprice) as sum_base_price, |
| sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, |
| sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, |
| avg(l_quantity) as avg_qty, |
| avg(l_extendedprice) as avg_price, |
| avg(l_discount) as avg_disc, |
| count(*) as count_order |
| from lineitem |
| where l_shipdate <= date '1998-12-01' - interval :1 day |
| group by l_returnflag, l_linestatus |
| order by l_returnflag, l_linestatus; |
| |
| And here is what is generated: |
| $ qgen -d 1 |
| |
| {return 0 rows} |
| |
| select |
| l_returnflag, |
| l_linestatus, |
| sum(l_quantity) as sum_qty, |
| sum(l_extendedprice) as sum_base_price, |
| sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, |
| sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, |
| avg(l_quantity) as avg_qty, |
| avg(l_extendedprice) as avg_price, |
| avg(l_discount) as avg_disc, |
| count(*) as count_order |
| from lineitem |
| where l_shipdate <= date('1998-12-01') - interval (90) day to day |
| group by l_returnflag, l_linestatus |
| order by l_returnflag, l_linestatus; |
| |
| See "Query Template Syntax" below for more detail on converting your prefered query |
| phrasing for use with QGEN. |
| |
| 9. How is QGEN built? |
| |
| QGEN is built by the same makefile that creates DBGEN. If the makefile |
| is successfully creating DBGEN, no further compilation modifications |
| should be necessary. You may need to modify some of the options which |
| allow QGEN to integrate with your preferred query tool. Refer to |
| Porting.Notes for more detail. |
| |
| 10. Command Line Options for QGEN |
| |
| Like DBGEN, QGEN is controlled by a combination of command line options |
| and environment variables (See "Environment Variables", below for more |
| detail). Command line options are assumed to be single |
| letter flags preceded by a minus sign. They may be followed by an |
| optional argument. |
| |
| option argument default action |
| ------ -------- ------- ------ |
| -c none Retain comments in translation of template to |
| EQT |
| |
| -d none Default. Use the parameter substitutions |
| required for query validation |
| |
| -h Display a usage summary |
| |
| -i <file> Use contents of <file> to init a query stream |
| |
| -l <file> Save query parameters to <file> |
| |
| -n <name> Use database <name> for queries |
| |
| -N Always use default rowcount, and ignore :n directives |
| |
| -o <path> Save query n's output in <path>/n.<stream> |
| Uses -p option, and uses :o tag |
| |
| -p <stream> Use the query permutation defined for |
| stream <stream>. If this option is |
| omited, EQT will be generated for the |
| queries named on the command line. |
| |
| -r <n> Seed the rnadom number generator with <n> |
| |
| -s <n> Set scale to <n> for parameter |
| substitutions. |
| |
| -t <file> Use contents of <file> to complete a query |
| stream |
| |
| -T none Use time table format for date substitution |
| |
| -v none Verbose. Progress messages are |
| displayed as data is generated. |
| |
| -x none Generate a query plan as part of query |
| execution. |
| |
| 11. Query Template Syntax |
| |
| QGEN is a simple ASCII text filter, meant to translate query generalized |
| query syntax("query template") into the executable query text(EQT) re- |
| quired by the benchmarks. It provides a number of shorthands and syntactic |
| extensions that allow the automatic generation of query parameters and some |
| control over the operation of the benchmark implementation. |
| |
| QGEN first strips all comments from the query template, recognizing both |
| {comment} and --comment styles. Next it traverses the query template |
| one line at a time, locating required substitution points, called |
| parameter tags. The values substituted for a given tag are summarized |
| below. QGEN does not support nested substitutions. That is, if |
| the text substituted for tag itself contains a valid tag the second tag |
| will not be expanded. |
| |
| Tag Converted To Based on |
| === ============ ======== |
| :c database <dbname>;(1) -n from the command line |
| :x set explain on;(1) -x from the command line |
| :<number> paremeter <number> |
| :s stream number |
| :o output to outpath/qnum.stream;(1) |
| -o from command line, -s from |
| command line |
| :b BEGIN WORK;(1) -a from comand line |
| :e COMMIT WORK(1) -a from command line |
| :q query number |
| :n <number> sets rowcount to be returned |
| to <number>, unless -N appears on the command line |
| |
| Notes: |
| (1) This is Informix-specific syntax. Refer to Porting.Notes for |
| tailoring the generated text to your database environment. |
| |
| 12. Sample QGEN executions and Query Templates |
| |
| QGEN translates generic query templates into valid SQL. In addition, it |
| allows conditional inclusion of the commands necessary to connect to a |
| database, produce diagnostic output, etc. Here are some sample of QGEN |
| usage, and the way that command line parameters and the query templates |
| interact to produce valid SQL. |
| |
| Template, in $DSS_QUERY/1.sql: |
| :c |
| :o |
| select count(*) from foo; |
| :x |
| select count(*) from lineitem |
| where l_orderdate < ':1'; |
| |
| 1. "qgen 1", would produce: |
| select count(*) from foo; |
| select count(*) from lineitem |
| where l_orderdate < '1997-01-01'; |
| Assuming that 1 January 1997 was a valid substitution for parameter 1. |
| |
| 2. "qgen -d -c dss1 1, would produce: |
| database dss1; |
| select count(*) from foo; |
| select count(*) from lineitem |
| where l_orderdate < '1995-07-18'; |
| Assuming that 18 July 1995 was the default substitution for parameter 1, |
| and using Informix syntax. |
| |
| 3. "qgen -d -c dss1 -x -o somepath 1, would produce: |
| database dss1; |
| output to "somepath/1.0" |
| select count(*) from foo; |
| set explain on; |
| select count(*) from lineitem |
| where l_orderdate < '1995-07-18'; |
| Assuming that 18 July 1995 was the default substitution for parameter 1, |
| and using Informix syntax. |
| |
| |
| 13. Environment Variables |
| |
| Enviroment variables are used to control features of DBGEN and QGEN |
| which are unlikely to change from one execution to another. |
| |
| Variable Default Action |
| ------- ------- ------ |
| DSS_PATH . Directory in which to build flat files |
| DSS_CONFIG . Directory in which to find configuration files |
| DSS_DIST dists.dss Name of distribution definition file |
| DSS_QUERY . Directory in which to find query templates |
| |
| 14. Version Numbering in DBGEN and QGEN |
| |
| DBGEN and QGEN use a common version numbering algorithm. Each executable |
| is stamped with a version number which is displayed in the usage messages |
| available with the '-h' option. A version number is of the form: |
| |
| V.R.P.M |
| | | | | |
| | | | | |
| | | | | |
| | | | -- modification: alphabetic, incremented for any trivial changes |
| | | | to the source (e.g, porting ifdef's) |
| | | ---- patch level: numeric, incremented for any minor bug fix |
| | | (e.g, qgen parameter range) |
| | ------- release: numeric, incremented for each minor revision of the |
| | specification |
| |-------- version: numeric, incremented for each major revision of the |
| specification |
| |
| An implementation of TPC-H is valid only if it conforms to the |
| following version usage rules: |
| |
| -- The Version of DBGEN and QGEN must match the integer portion of the |
| current specification revision |
| |
| 15. The current revisions are: |
| DBGEN: 2.4.0 |
| QGEN: 2.4.0 |
| |
| 16. Validated Platforms |
| The following platforms have been validated to produce the reference |
| data set for TPC-H 2.4.0 |
| Processor Operating System (version) Compiler (version) Compiler Flags |
| ---------------------------------------------------------------------------- |
| POWER5 AIX 64-bit (5.3) C for AIX Compiler, v7 -q64 (no -g) |
| IA-64 HPUX 64-bit () icc |
| Linux 32-bit () gcc |
| |