blob: bc8bf012b4b58d32e76d3ca57b34bb0cbb93c116 [file] [log] [blame]
tidycat tools
=============
The system catalog is a complicated beast. Adding or modifying
catalog tables is difficult, since this task requires a set of
co-ordinated changes across multiple files. The difficulty is further
compounded by Greenplum's additional requirements to support catalog
upgrade and perform catalog integrity checks. The tidycat tools were
developed to simplify this process by automatically generating catalog
code and validation scripts.
In Postgres, the header files are sprinkled with special CATALOG and
DATA macros that define the structure and contents of the system
catalog tables. Postgres uses scripts (such as Gen_fmgrtab.sh and
genbki.sh) to process header files to generate the postgres.bki file,
which is used for system bootstrap. Usage of these macros requires
some detailed knowledge of internal system types, and it is quite easy
to introduce subtle bugs when updating these structures manually.
The Greenplum tidycat tools eliminate much of the difficulty
associated with this process. Instead of manually constructing these
CATALOG and DATA statements, the developer writes SQL-like
declarations like "CREATE TABLE..." or "CREATE FUNCTION..." that get
converted to the required C code and macro statements. A single,
centralized specification for a table in a header file will generate a
set of updates to the header file and other places like the catalog
Makefile, indexing.h, toasting.h, toasting.c, catalog.c, bootparse.y,
pg_type.h, and pg_tidycat.h, plus regression scripts to self-check the
table definitions. In addition, these tools build a structured JSON
document of catalog tables and their dependencies that is used for
upgrade, catalog validation, and visual graphs of the catalog
structure for documentation. The generated code is properly formatted
and commented, and it is almost breathtaking in its harmonious beauty
and elegant construction.
catullus - function definitions
===============================
catullus.pl is part of the tidycat suite dedicated to function
definitions in pg_proc.h and basic bootstrap types in pg_type.h. For
example, the function definition for int2 equality is defined using
this DATA statement:
DATA(insert OID = 63 ( int2eq PGNSP PGUID 12 f f t f i 2 16 f "21 21" _null_ _null_ _null_ int2eq - _null_ ));
DESCR("equal");
Instead of constructing these DATA statements by hand, catullus uses a
pg_proc.sql file with SQL-like definitions to generate them:
CREATE FUNCTION int2eq(int2, int2) RETURNS bool LANGUAGE internal IMMUTABLE STRICT AS 'int2eq' WITH (OID=63, DESCRIPTION="equal");
When catullus.pl is run, it converts these SQL statements to DATA
macros and updates the contents of pg_proc.h and pg_type.h. In
Greenplum, the entire set of pg_proc bootstrap DATA is automatically
generated, and the majority of pg_type.h DATA is generated from
pg_type.sql and tidycat table definitions spread throught the catalog
header files. The generated statments retain the order and comments
of the original, hand-coded statements (with some additional comments
describing input and return types).
pablopcatso - graphing table dependencies
=========================================
The pablopcatso.pl tool reads the JSON document and generates a
Graphviz .dot file of the table relationships. A variety of tools can
process the .dot file to produce JPEG or PDF output for documentation.
tidycat - table definitions
===========================
In Postgres, the definition of a catalog table is spread across
multiple files. The basic structure of the table is usually defined
in an eponymous header file. Any indexes on that table are defined in
indexing.h. If the table uses "large" columns, it requires a separate
toast table in toasting.c|h. If the table is shared, then
catalog.c:isSharedRelation() must list the table and its associated
indexes, and toast table and toast indexes if they exist. In order to
support catalog upgrade, Greenplum has the further requirement that
new catalog tables have a static type in pg_type.h, and a special
modification of bootparse.y to handle the upgrade of the new table
type. Finally, any new catalog table header must be listed in the
Postgres .bki source file dependencies in the catalog Makefile.
The tidycat.pl tool reads a single, central, SQL-like specification
for a table and generates all of the required code across this set of
files.
For example, the Greenplum resource queue capability table is a shared
table with a unique index and two subsidiary indexes. It also has a
foreign key relationship with pg_resqueue and pg_resourcetype. The
tidycat definition is:
CREATE TABLE pg_resqueuecapability
with (camelcase=ResQueueCapability, shared=true, relid=6060, reltype_oid=6446)
(
resqueueid oid, -- OID of the queue with this capability
restypid smallint, -- resource type id (key to pg_resourcetype)
ressetting text -- resource setting (opaque type)
);
create unique index on pg_resqueuecapability(oid) with (indexid=6064);
create index on pg_resqueuecapability(resqueueid) with (indexid=6065);
create index on pg_resqueuecapability(restypid) with (indexid=6066);
alter table pg_resqueuecapability add fk resqueueid on pg_resqueue(oid);
alter table pg_resqueuecapability add fk restypid on pg_resourcetype(restypid);
Note that the CREATE TABLE and CREATE INDEX statements are very
similar to standard SQL, but they have special WITH clauses that
define some of the special characteristics of catalog table and
indexes, such as the static oids. In addition, tidycat supports a
special ALTER TABLE ... ADD FK statement that records any foreign
dependencies for the table. pablopcatso can graph these dependencies,
and the Greenplum checkcat tool can use this information to perform
catalog verification checks.
tidycat.pl reads this definition and spits out heaps of modified
catalog files like some crazy mofo with a chainsaw. Because the
modifications are widespread, the developer must manually copy the
changed files from the /tmp directory to the appropriate locations in
the source tree and do a "make distclean", a new make, install, and
initdb. When the build and install is complete, the basic
installcheck regression contains a tidycat test that scans for tidycat
table definitions in the catalog headers and ensures that the catalog
has been rebuilt and updated correctly.
About one-third of the catalog tables in Greenplum have been converted
to actual tidycat definitions and generated code. The remainder have
placeholder "fake defs" that track index and dependency information,
but do not generate catalog code.