# Licensed to the Apache Software Foundation (ASF) under one | |
# or more contributor license agreements. See the NOTICE file | |
# distributed with this work for additional information | |
# regarding copyright ownership. The ASF licenses this file | |
# to you under the Apache License, Version 2.0 (the | |
# "License"); you may not use this file except in compliance | |
# with the License. You may obtain a copy of the License at | |
# | |
# http://www.apache.org/licenses/LICENSE-2.0 | |
# | |
# Unless required by applicable law or agreed to in writing, | |
# software distributed under the License is distributed on an | |
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | |
# KIND, either express or implied. See the License for the | |
# specific language governing permissions and limitations | |
# under the License. | |
# @(#)PORTING.NOTES 2.1.8.1 | |
Table of Contents | |
================== | |
1. General Program Structure | |
2. Naming Conventions and Variable Usage | |
3. Porting Procedures | |
4. Compilation Options | |
5. Customizing QGEN | |
6. Further Enhancements | |
7. Known Porting Problems | |
8. Reporting Problems | |
1. General Program Structure | |
The code provided with TPC-H and TPC-R benchmarks includes a database | |
population generator (DBGEN) and a query template translator(QGEN). It | |
is written in ANSI-C, and is meant to be easily portable to a broad variety | |
of platforms. The program is composed of five source files and some | |
support and header files. The main modules are: | |
build.c: each table in the database schema is represented by a | |
routine mk_XXXX, which populates a structure | |
representing one row in table XXXX. | |
See Also: dss_types.h, bm_utils.c, rnd.* | |
print.c: each table in the database schema is represented by a | |
routine pr_XXXX, which prints the contents of a | |
structure representing one row in table XXX. | |
See Also: dss_types.h, dss.h | |
driver.c: this module contains the main control functions for | |
DBGEN, including command line parsing, distribution | |
management, database scaling and the calls to mk_XXXX | |
and pr_XXXX for each table generated. | |
qgen.c: this module contains the main control functions for | |
QGEN, including query template parsing. | |
varsub.c: each query template includes one or more parameter | |
substitution points; this routine handles the | |
parameter generation for the TPC-H/TPC-R benchmark. | |
The support utilities provide a generalized set of functions for data | |
generation and include: | |
bm_utils.c: data type generators, string management and | |
portability routines. | |
rnd.*: a general purpose random number generator used | |
throughout the code. | |
dss.h: | |
shared.h: a set of '#defines' for limits, formats and fixed | |
values | |
dsstypes.h: structure definitions for each table definition | |
2. Naming Conventions and Variable Usage | |
Since DBGEN will be maintained by a large number of people, it is | |
particularly important to observe the coding, variable naming and usage | |
conventions detailed here. | |
#define | |
-------- | |
All #define directives are found in header files (*.h). In general, | |
the header files segregate variables and macros as follows: | |
rnd.h -- anything exclusively referenced by rnd.c | |
dss.h -- general defines for the benchmark, including *all* | |
extern declarations (see below). | |
shared.h -- defines related to the tuple definitions in | |
dsstypes.h. Isolated to ease automatic processing needed by many | |
direct load routines (see below). | |
dsstypes.h -- structure definitons and typedef directives to | |
detail the contents of each table's tuples. | |
config.h -- any porting and configuration related defines should | |
go here, to localize the changes necessary to move the suite | |
from one machine to another. | |
tpcd.h -- defines related to QGEN, rather than DBGEN | |
extern | |
------ | |
DBGEN and QGEN make extensive use of extern declarations. This could | |
probably stand to be changed at some point, but has made the rapid | |
turnaround of prototypes easier. In order to be sure that each | |
declaration was matched by exactly one definition per executatble, | |
they are all declared as EXTERN, a macro dependent on DECLARER. In | |
any module that defines DECLARER, all variables declared EXTERN will | |
be defined as globals. DECLARER should be declared only in modules | |
containing a main() routine. | |
Naming Conventions | |
------------------ | |
defines | |
o All defines use upper case | |
o All defines use a table prefix, if appropriate: | |
O_* relates to orders table | |
L_* realtes to lineitem table | |
P_* realtes to part table | |
PS_* relates to partsupplier table | |
C_* realtes to customer table | |
S_* relates to supplier table | |
N_* relates to nation table | |
R_* realtes to region table | |
T_* relates to time table | |
o All defines have a usage prefix, if appropriate: | |
*_TAG environment variable name | |
*_DFLT environment variable default | |
*_MAX upper bound | |
*_MIN lower bound | |
*_LEN average length | |
*_SD random number seed (see rnd.*) | |
*_FMT printf format string | |
*_SCL divisor (for scaled arithmetic) | |
*_SIZE tuple length | |
3. Porting Procedures | |
The code provided should be easily portable to any machine providing an | |
ANSI C compiler. | |
-- Copy makefile.suite to makefile | |
-- Edit the makefile to match the name of your C compiler | |
and to include appropriate compilation options in the CFLAGS | |
definition | |
-- make. | |
Special care should be taken in modifying any of the monetary calcu- | |
lations in DBGEN. These have proven to be particularly sensitive to | |
portability problems. If you decide to create the routines for inline | |
data load (see below), be sure to compare the resulting data to that | |
generated by a flat file data generation to be sure that all numeric | |
conversions have been correct. | |
If the compile generates errors, refer to "Compilation Options", below. | |
The problem you are encountering may already have been addressed in the | |
code. | |
If the compile is successful, but QGEN is not generating the appropriate | |
query syntax for your environment, refer to "Customizing QGEN", below. | |
For other problems, refer to "Reporting Problems" at the end of this | |
document. | |
4. Compilation Options | |
config.h and makefile.suite contain a number of compile time options intended | |
to make the process of porting the code provided with TPC-H/TPC-R as easy as | |
possible on a broad range of platforms. Most ports should consist of reviewing | |
the possible settings described in config.h and modifying the makefile | |
to employ them appropriately. | |
5. Customizing QGEN | |
QGEN relies on a number of vendor-specific conventions to generate | |
appropriate query syntax. These are controlled by #defines in tpcd.h, | |
and enabled by a #define in config.h. If you find that the syntax | |
generated by QGEN is not sufficient for your environment you will need | |
to modify these to files. It is strongly recomended that you not change | |
the general organization of the files. | |
Currently defined options are: | |
VTAG -- marks a variable substitution point [:] | |
QDIR_TAG -- environent variable which points to query templates | |
[DSS_QUERY] | |
GEN_QUERY_PLAN -- syntax to generate a query plan ["Set Explain On;"] | |
START_TRAN -- syntax to begin a transaction ["Begin Work;"] | |
END_TRAN -- syntax to end a transaction ["Commit Work;"] | |
SET_OUTPUT -- syntax to redirect query output ["Output to"] | |
SET_ROWCOUNT -- syntax to set the number of rows returned | |
["{return %d rows}"] | |
SET_DBASE -- syntax to connect to a database | |
6. Further Enhancements | |
load_stub.c provides entry points for two likely enhancements. | |
The ld_XXXX routines make it possible to load the | |
database directly from DBGEN without first writing the database | |
population out to the filesystem. This may prove particularly useful | |
when loading larger database populations. Be particularly careful about | |
monetary amounts. To assure portability, all monetary calcualtion are | |
done using long integers (which hold money amounts as a number of | |
pennies). These will need to be scaled to dollars and cents (by dividing | |
by 100), before the values are presented to the DBMS. | |
The hd_XXXX routines allow header information to be written before the | |
creation of the flat files. This should allow system which require | |
formatting information in database load files to use DBGEN with only | |
a small amount of custom code. | |
qgen.c defines the translation table for query templates in the | |
routine qsub(). | |
varsub.c defines the parameter substitutions in the routine varsub(). | |
If you are porting DBGEN to a machine that is not supports a native word | |
size larger that 32 bits, you may wish to modify the default values for | |
BITS_PER_LONG and MAX_LONG. These values are used in the generation of | |
the sparse primary keys in the order and lineitem tables. The code has | |
been structured to run on any machine supporting a 32 bit long, but | |
may be slightly more efficient on machines that are able to make use of | |
a larger native type. | |
7. Known Porting Problems | |
The current codeline will not compile under SunOS 4.1. Solaris 2.4 and later | |
are supported, and anyone wishing to use DBGEN on a Sun platform is | |
encouraged to use one of these OS releases. | |
8. Reporting Problems | |
The code provided with TPC-H/TPC-R has been written to be easily portable, | |
and has been tested on a wide variety of platforms, If you have any | |
trouble porting the code to your platform, please help us to correct | |
the problem in a later release by sending the following information | |
to the TPC D subcommittee: | |
Computer Make and Model | |
Compiler Type and Revision Number | |
Brief Description of the problem | |
Suggested modification to correct the problem | |