| tidycat tools |
| ============= |
| |
| The system catalog is a complicated beast. Adding or modifying |
| catalog tables is difficult, since this task requires a set of |
| co-ordinated changes across multiple files. The difficulty is further |
| compounded by Greenplum's additional requirements to support catalog |
| upgrade and perform catalog integrity checks. The tidycat tools were |
| developed to simplify this process by automatically generating catalog |
| code and validation scripts. |
| |
| In Postgres, the header files are sprinkled with special CATALOG and |
| DATA macros that define the structure and contents of the system |
| catalog tables. Postgres uses scripts (such as Gen_fmgrtab.sh and |
| genbki.sh) to process header files to generate the postgres.bki file, |
| which is used for system bootstrap. Usage of these macros requires |
| some detailed knowledge of internal system types, and it is quite easy |
| to introduce subtle bugs when updating these structures manually. |
| |
| The Greenplum tidycat tools eliminate much of the difficulty |
| associated with this process. Instead of manually constructing these |
| CATALOG and DATA statements, the developer writes SQL-like |
| declarations like "CREATE TABLE..." or "CREATE FUNCTION..." that get |
| converted to the required C code and macro statements. A single, |
| centralized specification for a table in a header file will generate a |
| set of updates to the header file and other places like the catalog |
| Makefile, indexing.h, toasting.h, toasting.c, catalog.c, bootparse.y, |
| pg_type.h, and pg_tidycat.h, plus regression scripts to self-check the |
| table definitions. In addition, these tools build a structured JSON |
| document of catalog tables and their dependencies that is used for |
| upgrade, catalog validation, and visual graphs of the catalog |
| structure for documentation. The generated code is properly formatted |
| and commented, and it is almost breathtaking in its harmonious beauty |
| and elegant construction. |
| |
| catullus - function definitions |
| =============================== |
| |
| catullus.pl is part of the tidycat suite dedicated to function |
| definitions in pg_proc.h and basic bootstrap types in pg_type.h. For |
| example, the function definition for int2 equality is defined using |
| this DATA statement: |
| |
| DATA(insert OID = 63 ( int2eq PGNSP PGUID 12 f f t f i 2 16 f "21 21" _null_ _null_ _null_ int2eq - _null_ )); |
| DESCR("equal"); |
| |
| Instead of constructing these DATA statements by hand, catullus uses a |
| pg_proc.sql file with SQL-like definitions to generate them: |
| |
| CREATE FUNCTION int2eq(int2, int2) RETURNS bool LANGUAGE internal IMMUTABLE STRICT AS 'int2eq' WITH (OID=63, DESCRIPTION="equal"); |
| |
| When catullus.pl is run, it converts these SQL statements to DATA |
| macros and updates the contents of pg_proc.h and pg_type.h. In |
| Greenplum, the entire set of pg_proc bootstrap DATA is automatically |
| generated, and the majority of pg_type.h DATA is generated from |
| pg_type.sql and tidycat table definitions spread throught the catalog |
| header files. The generated statments retain the order and comments |
| of the original, hand-coded statements (with some additional comments |
| describing input and return types). |
| |
| pablopcatso - graphing table dependencies |
| ========================================= |
| |
| The pablopcatso.pl tool reads the JSON document and generates a |
| Graphviz .dot file of the table relationships. A variety of tools can |
| process the .dot file to produce JPEG or PDF output for documentation. |
| |
| tidycat - table definitions |
| =========================== |
| |
| In Postgres, the definition of a catalog table is spread across |
| multiple files. The basic structure of the table is usually defined |
| in an eponymous header file. Any indexes on that table are defined in |
| indexing.h. If the table uses "large" columns, it requires a separate |
| toast table in toasting.c|h. If the table is shared, then |
| catalog.c:isSharedRelation() must list the table and its associated |
| indexes, and toast table and toast indexes if they exist. In order to |
| support catalog upgrade, Greenplum has the further requirement that |
| new catalog tables have a static type in pg_type.h, and a special |
| modification of bootparse.y to handle the upgrade of the new table |
| type. Finally, any new catalog table header must be listed in the |
| Postgres .bki source file dependencies in the catalog Makefile. |
| |
| The tidycat.pl tool reads a single, central, SQL-like specification |
| for a table and generates all of the required code across this set of |
| files. |
| |
| For example, the Greenplum resource queue capability table is a shared |
| table with a unique index and two subsidiary indexes. It also has a |
| foreign key relationship with pg_resqueue and pg_resourcetype. The |
| tidycat definition is: |
| |
| CREATE TABLE pg_resqueuecapability |
| with (camelcase=ResQueueCapability, shared=true, relid=6060, reltype_oid=6446) |
| ( |
| resqueueid oid, -- OID of the queue with this capability |
| restypid smallint, -- resource type id (key to pg_resourcetype) |
| ressetting text -- resource setting (opaque type) |
| ); |
| |
| create unique index on pg_resqueuecapability(oid) with (indexid=6064); |
| create index on pg_resqueuecapability(resqueueid) with (indexid=6065); |
| create index on pg_resqueuecapability(restypid) with (indexid=6066); |
| |
| alter table pg_resqueuecapability add fk resqueueid on pg_resqueue(oid); |
| alter table pg_resqueuecapability add fk restypid on pg_resourcetype(restypid); |
| |
| Note that the CREATE TABLE and CREATE INDEX statements are very |
| similar to standard SQL, but they have special WITH clauses that |
| define some of the special characteristics of catalog table and |
| indexes, such as the static oids. In addition, tidycat supports a |
| special ALTER TABLE ... ADD FK statement that records any foreign |
| dependencies for the table. pablopcatso can graph these dependencies, |
| and the Greenplum checkcat tool can use this information to perform |
| catalog verification checks. |
| |
| tidycat.pl reads this definition and spits out heaps of modified |
| catalog files like some crazy mofo with a chainsaw. Because the |
| modifications are widespread, the developer must manually copy the |
| changed files from the /tmp directory to the appropriate locations in |
| the source tree and do a "make distclean", a new make, install, and |
| initdb. When the build and install is complete, the basic |
| installcheck regression contains a tidycat test that scans for tidycat |
| table definitions in the catalog headers and ensures that the catalog |
| has been rebuilt and updated correctly. |
| |
| About one-third of the catalog tables in Greenplum have been converted |
| to actual tidycat definitions and generated code. The remainder have |
| placeholder "fake defs" that track index and dependency information, |
| but do not generate catalog code. |
| |