Fix upgrade test: replace data-integrity checks with catalog comparison (#2403)

The age_upgrade regression test (added in #2364, improved in #2377, #2397)
was designed to validate the upgrade template (age--<VER>--y.y.y.sql) by
creating graph data before the upgrade and verifying it survived afterward.
This approach had two fundamental problems:

1. It did not detect incomplete upgrade templates. The test verified that
   graph data (vertices, edges, checksums, GIN indexes) survived ALTER
   EXTENSION UPDATE, but never checked whether new SQL objects (functions,
   views, relations, indexes, types, operators, casts, constraints) were
   actually created by the template. A developer could add a new function
   to sql/ and sql_files, forget to add it to the upgrade template, and
   all tests would pass — the function existed via the fresh CREATE
   EXTENSION install that ran before the upgrade test, but would be
   missing for users who upgraded via ALTER EXTENSION UPDATE.

2. The data-integrity checks relied on cypher queries (MATCH/RETURN) within
   the same backend session after DROP EXTENSION + CREATE EXTENSION. This
   caused intermittent failures on some PostgreSQL versions where AGE's
   internal type cache (agtype OID) was not properly refreshed after the
   extension was dropped and recreated, resulting in 'type with OID 0
   does not exist' errors. The data-integrity aspect was also redundant —
   ALTER EXTENSION UPDATE runs DDL statements and does not touch heap data,
   so data survival is guaranteed by PostgreSQL and not a meaningful test.

The fix replaces the entire test with a comprehensive catalog comparison:

  1. Snapshot the ag_catalog schema from the fresh install across seven
     PostgreSQL system catalogs:
       - pg_proc: functions, aggregates, procedures (name, args, and
         properties: volatility, strictness, kind, return type, setof)
       - pg_class: tables, views, sequences, indexes (name, kind)
       - pg_type: types (name, type category)
       - pg_operator: operators (name, left/right operand types)
       - pg_cast: casts involving AGE types (source, target, context)
       - pg_opclass: operator classes (name, access method)
       - pg_constraint: constraints (name, type, table, referenced table)
  2. DROP EXTENSION, CREATE EXTENSION at the synthetic initial version,
     then ALTER EXTENSION UPDATE to the current version via the stamped
     upgrade template.
  3. Snapshot the catalog again after upgrade.
  4. Compare: any object present in the fresh snapshot but missing after
     upgrade means the template is incomplete. Any object present after
     upgrade but not in the fresh snapshot means the template creates
     something unexpected. Function properties (volatility, strictness,
     prokind, return type) are also compared for functions that exist in
     both — catching cases where a CREATE OR REPLACE in the template
     changes a function's signature or behavior.

Additional improvements from code review feedback:

  - Graph cleanup in Step 1 uses a DO block with PERFORM and suppressed
    NOTICEs to produce deterministic output regardless of prior test state.
  - The pg_class snapshot includes indexes (relkind 'i') in addition to
    tables, views, and sequences.
  - Diagnostic output includes relkind/typtype suffixes for actionable diffs.
  - Summary uses boolean equality checks (funcs_match, rels_match, etc.)
    instead of absolute counts, so the expected output does not need
    updating when new objects are added to AGE. Developers who correctly
    add objects to both sql/ and the template will never need to modify
    this test or its expected output.

This approach:
  - Catches the actual failure mode: incomplete upgrade templates.
  - Covers all SQL object categories: functions (including aggregates),
    relations, types, operators, casts, operator classes, and constraints.
  - Detects property changes on existing functions (volatility, strictness,
    kind, return type changes).
  - Uses only plain SQL catalog queries — no cypher, no .so cache issues.
  - Works reliably across all PostgreSQL versions.
  - Reports the exact missing/extra/changed object in the diff output.
  - Is maintenance-free: no expected output changes needed when AGE grows.

Makefile: updated step 5 comment to reflect catalog comparison approach.

All 33 regression tests pass.

Co-authored-by: Claude <noreply@anthropic.com>

modified:   Makefile
modified:   regress/expected/age_upgrade.out
modified:   regress/sql/age_upgrade.sql
3 files changed
tree: 6b08021af72b952ab1fadfb1f69086c06bea5529
  1. .github/
  2. docker/
  3. drivers/
  4. img/
  5. regress/
  6. sql/
  7. src/
  8. tools/
  9. .asf.yaml
  10. .dockerignore
  11. .gitignore
  12. age--1.6.0--1.7.0.sql
  13. age--1.7.0--y.y.y.sql
  14. age.control
  15. clang-format.5
  16. CONTRIBUTING.md
  17. LICENSE
  18. Makefile
  19. META.json
  20. NOTICE
  21. README.md
  22. RELEASE
README.md

Apache AGE is an extension for PostgreSQL that enables users to leverage a graph database on top of the existing relational databases. AGE is an acronym for A Graph Extension and is inspired by Bitnine's AgensGraph, a multi-model database fork of PostgreSQL. The basic principle of the project is to create a single storage that handles both the relational and graph data model so that the users can use the standard ANSI SQL along with openCypher, one of the most popular graph query languages today. There is a strong need for cohesive, easy-to-implement multi-model databases. As an extension of PostgreSQL, AGE supports all the functionalities and features of PostgreSQL while also offering a graph model to boot.

Apache AGE is :

  • Powerful: adds graph database support to the already popular PostgreSQL database: PostgreSQL is used by organizations including Apple, Spotify, and NASA.
  • Flexible: allows you to perform openCypher queries, which makes complex queries much easier to write. It also enables querying multiple graphs at the same time.
  • Intelligent: allows you to perform graph queries that are the basis for many next-level web services such as fraud detection, master data management, product recommendations, identity and relationship management, experience personalization, knowledge management, and more.
  • Cypher Query: supports graph query language
  • Hybrid Querying: enables SQL and/or Cypher
  • Querying: enables multiple graphs
  • Hierarchical: graph label organization
  • Property Indexes: on both vertices(nodes) and edges
  • Full PostgreSQL: supports PG features

Refer to our latest Apache AGE documentation to learn about installation, features, built-in functions, and Cypher queries.

Install the following essential libraries according to each OS. Building AGE from the source depends on the following Linux libraries (Ubuntu package names shown below):

  • CentOS
yum install gcc glibc glib-common readline readline-devel zlib zlib-devel flex bison
  • Fedora
dnf install gcc glibc bison flex readline readline-devel zlib zlib-devel
  • Ubuntu
sudo apt-get install build-essential libreadline-dev zlib1g-dev flex bison

Apache AGE is intended to be simple to install and run. It can be installed with Docker and other traditional ways.

You will need to install an AGE compatible version of Postgres, for now AGE supports Postgres 11, 12, 13, 14, 15, 16, 17 & 18. Supporting the latest versions is on AGE roadmap.

You can use a package management that your OS provides to download PostgreSQL.

sudo apt install postgresql

You can download the Postgres source code and install your own instance of Postgres. You can read instructions on how to install from source code for different versions on the official Postgres Website.

Clone the github repository or download the download an official release. Run the pg_config utility and check the version of PostgreSQL. Currently, only PostgreSQL versions 11, 12, 13, 14, 15, 16, 17 & 18 are supported. If you have any other version of Postgres, you will need to install PostgreSQL version 11, 12, 13, 14, 15, 16, 17 & 18.

pg_config

Run the following command in the source code directory of Apache AGE to build and install the extension.

make install

If the path to your Postgres installation is not in the PATH variable, add the path in the arguments:

make PG_CONFIG=/path/to/postgres/bin/pg_config install
docker pull apache/age

docker run \
    --name age  \
    -p 5455:5432 \
    -e POSTGRES_USER=postgresUser \
    -e POSTGRES_PASSWORD=postgresPW \
    -e POSTGRES_DB=postgresDB \
    -d \
    apache/age
docker exec -it age psql -d postgresDB -U postgresUser

For every connection of AGE you start, you will need to load the AGE extension.

CREATE EXTENSION age;
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

If you are using AGE from a database client that does not default to autocommit — most commonly psycopg v3 or JDBC — you must understand how PostgreSQL‘s transaction semantics apply to AGE’s setup and DDL-like functions. Otherwise, you may see graphs or labels that appear to be created successfully, but are not visible from new connections.

This is not a bug in AGE — it is standard PostgreSQL behavior. AGE's DDL-like functions write to the catalog, and catalog writes only become visible to other sessions after the enclosing transaction is committed.

What is and isn't transactional

StatementScopeNeeds commit to be visible elsewhere?
LOAD 'age'Session-local (loads the .so into the current backend)No
SET search_path = ag_catalog, "$user", publicSession-localNo
SELECT create_graph('g')Writes to ag_graph and creates a schemaYes
SELECT create_vlabel('g', 'L') / create_elabel(...)Writes to ag_label and creates a tableYes
SELECT drop_graph('g', true) / drop_label(...)Writes to catalogYes
SELECT load_labels_from_file(...) / load_edges_from_file(...)Writes to catalog + dataYes
cypher('g', $$ CREATE (:L {...}) $$)Writes dataYes

In a client that defaults to autocommit (e.g. psql), every statement commits automatically, so this is never noticed. In a non-autocommit client, the first statement you run implicitly opens a transaction that stays open until you call commit(), rollback(), or close the connection.

psycopg v3 — the “savepoint gotcha”

The common pitfall is that with connection.transaction(): in psycopg does not start a new top-level transaction when one is already open — it creates a savepoint inside the existing outer transaction. Releasing a savepoint is not a commit, so your create_graph write stays invisible to other sessions until the outer transaction is explicitly committed.

❌ Broken: graph is not visible from a new connection

import psycopg

params = {"host": "localhost", "port": 5432, "user": "postgres",
          "password": "pw", "dbname": "mydb"}

# --- First connection ---
conn = psycopg.connect(**params)
conn.execute("LOAD 'age'")                                      # implicitly opens a txn
conn.execute("SET search_path = ag_catalog, '$user', public")

with conn.transaction(), conn.cursor() as cur:                  # <-- SAVEPOINT, not a real txn
    cur.execute("SELECT * FROM create_graph('my_graph')")
# outer transaction is STILL OPEN here

conn.close()  # outer transaction is rolled back on close → my_graph is gone

# --- New connection ---
conn = psycopg.connect(**params)
conn.execute("LOAD 'age'")
conn.execute("SET search_path = ag_catalog, '$user', public")
with conn.cursor() as cur:
    cur.execute("SELECT name FROM ag_graph;")
    # 'my_graph' is NOT in the results

✅ Fix 1: explicit commit() after setup

conn = psycopg.connect(**params)
conn.execute("LOAD 'age'")
conn.execute("SET search_path = ag_catalog, '$user', public")
conn.commit()   # <-- closes the implicit outer txn

with conn.transaction(), conn.cursor() as cur:
    cur.execute("SELECT * FROM create_graph('my_graph')")
# this transaction block is now top-level and commits on exit
conn.close()

✅ Fix 2: enable autocommit on the connection

conn = psycopg.connect(**params, autocommit=True)
conn.execute("LOAD 'age'")
conn.execute("SET search_path = ag_catalog, '$user', public")
conn.execute("SELECT * FROM create_graph('my_graph')")          # commits immediately
conn.close()

You can also toggle autocommit at runtime with conn.set_autocommit(True).

JDBC

JDBC connections also default to autocommit true per the JDBC spec, but many frameworks (Spring, etc.) flip it off. If you are running AGE DDL-like calls from JDBC, either:

connection.setAutoCommit(true);
// ... LOAD 'age'; SET search_path ...; SELECT create_graph(...);

or keep autocommit off and explicitly commit after DDL-like calls:

stmt.execute("LOAD 'age'");
stmt.execute("SET search_path = ag_catalog, \"$user\", public;");
stmt.execute("SELECT create_graph('my_graph');");
connection.commit();   // make the graph visible to other sessions

Rule of thumb

If an AGE call creates, drops, or modifies a graph, label, vertex, edge, or property, it is a transactional write. In a non-autocommit client, it will not be visible to other sessions until you explicitly commit().

To create a graph, use the create_graph function located in the ag_catalog namespace.

SELECT create_graph('graph_name');

To create a single vertex with label and properties, use the CREATE clause.

SELECT * 
FROM cypher('graph_name', $$
    CREATE (:label {property:"Node A"})
$$) as (v agtype);
SELECT * 
FROM cypher('graph_name', $$
    CREATE (:label {property:"Node B"})
$$) as (v agtype);

To create an edge between two nodes and set its properties:

SELECT * 
FROM cypher('graph_name', $$
    MATCH (a:label), (b:label)
    WHERE a.property = 'Node A' AND b.property = 'Node B'
    CREATE (a)-[e:RELTYPE {property:a.property + '<->' + b.property}]->(b)
    RETURN e
$$) as (e agtype);

And to query the connected nodes:

SELECT * from cypher('graph_name', $$
        MATCH (V)-[R]-(V2)
        RETURN V,R,V2
$$) as (V agtype, R agtype, V2 agtype);

Starting with Apache AGE is very simple. You can easily select your platform and incorporate the relevant SDK into your code.

Apache AGE Viewer is a user interface for Apache AGE that provides visualization and exploration of data. This web visualization tool allows users to enter complex graph queries and explore the results in graph and table forms. Apache AGE Viewer is enhanced to proceed with extensive graph data and discover insights through various graph algorithms. Apache AGE Viewer will become a graph data administration and development platform for Apache AGE to support multiple relational databases: https://github.com/apache/age-viewer.

This is a visualization tool. After installing AGE Extension, you may use this tool to get access to the visualization features.

Viewer gdb, and graph

You can also get help from these videos.

You can improve ongoing efforts or initiate new ones by sending pull requests to this repository. Also, you can learn from the code review process, how to merge pull requests, and from code style compliance to documentation by visiting the Apache AGE official site - Developer Guidelines. Send all your comments and inquiries to the user mailing list, users@age.apache.org.