A searchable catalog of all Apache Airflow providers and their modules (operators, hooks, sensors, triggers, transfers, and more). Built with Eleventy and deployed to airflow.apache.org/registry/.
- pyyaml Python package (uv pip install pyyaml)

```bash
# 1. Extract metadata from provider.yaml files into JSON
uv run python dev/registry/extract_metadata.py

# 2. Install Node.js dependencies
cd registry
pnpm install

# 3. Start dev server (http://localhost:8080)
pnpm dev
```
The full registry data extraction (metadata, parameters, and connections) is available as a breeze subcommand:
```bash
breeze registry extract-data                      # Extract all registry data
breeze registry extract-data --python 3.12        # With a specific Python version
breeze registry extract-data --provider amazon    # Extract only one provider (incremental)
```
This runs inside the breeze CI container where all providers are installed. It is the same command used by CI in registry-build.yml.
The dev script sets REGISTRY_PATH_PREFIX=/ so links work at the root during local development. In production the prefix defaults to /registry/.
```
registry_tools/types.py           ← Single source of truth for module type definitions
        │
        ├─── generate_types_json.py → registry/src/_data/types.json (for frontend)
        │
provider.yaml files (providers/*/provider.yaml)
        │
        ▼
extract_metadata.py               ← Parses YAML, fetches PyPI stats, resolves logos
        │                           → providers.json
        │
extract_parameters.py             ← Runtime class discovery + parameter extraction (breeze)
        │                           → modules.json + parameters.json
        ▼
registry/src/_data/
  ├── providers.json              ← Provider metadata (name, versions, downloads, lifecycle, ...)
  ├── modules.json                ← Individual modules (operators, hooks, sensors, ...)
  ├── types.json                  ← Module type definitions (generated from types.py)
  └── versions/{id}/{ver}/        ← Per-version metadata, parameters, connections
        │
        ▼
Eleventy build (pnpm build)       ← Generates static HTML + Pagefind search index
        │
        ▼
registry/_site/                   ← Deployable static site
```
The root-level JSON files (providers.json, modules.json) are generated artifacts and are listed in .gitignore. The versions/ directory is also gitignored. The JavaScript data files (exploreCategories.js, statsData.js, latestVersionData.js, providerVersions.js, openapiSpec.js, and providerVersionPayloads.js) are checked in because they contain hand-authored or computed logic.
The extraction scripts live in dev/registry/.

extract_metadata.py (runs on host) walks every providers/*/provider.yaml and:

- Fetches PyPI download stats from pypistats.org/api/packages/{name}/recent
- Checks public/logos/ for matching images
- Reads lifecycle from provider.yaml
- Writes providers.json

extract_parameters.py (runs inside breeze) handles module discovery and parameter extraction:

- Imports each module listed in provider.yaml, iterates over classes with inspect.getmembers(), and uses issubclass() to classify them (operator, hook, sensor, trigger, etc.)
- Fetches objects.inv files from S3 for each provider (cached locally with a 12-hour TTL) and parses them to get canonical documentation URLs for each class. Falls back to manual URL construction for providers not yet published. See Documentation URL Resolution below.
- Writes modules.json, the full module catalog with all 11 fields (id, name, type, import_path, module_path, short_description, docs_url, source_url, category, provider_id, provider_name)
- Extracts __init__ parameters: walks the MRO and extracts parameter names, types, defaults, and docstrings. Writes versions/{provider_id}/{version}/parameters.json.

registry_contract_models.py defines Pydantic models that validate the shape of every JSON payload the registry produces. Each extraction script calls _validate(ModelType, payload) before writing JSON, which catches schema drift at generation time without a separate jsonschema layer. The same models generate the OpenAPI 3.1 spec served at /api/openapi.json.
export_registry_schemas.py generates registry/schemas/openapi.json from the contract models. It runs automatically via pnpm prebuild before Eleventy builds.
extract_connections.py (runs inside breeze) reads connection-types from provider.yaml, falling back to runtime inspection of hook get_connection_form_widgets() and get_ui_field_behaviour(). Writes versions/{provider_id}/{version}/connections.json.
The docs_url field in modules.json links each class to its Sphinx-generated API reference page. Rather than constructing these URLs manually (which breaks if the Sphinx output structure changes), the extractor uses Sphinx inventory files (objects.inv).
How it works:
Every Sphinx build produces an objects.inv file that maps every documented symbol to its URL. Apache publishes these for all released providers at: http://apache-airflow-docs.s3-website.eu-central-1.amazonaws.com/docs/{package_name}/stable/objects.inv
Before discovering modules, extract_parameters.py fetches inventories for all providers in parallel using a thread pool.
For each discovered class, it looks up the fully qualified name (e.g., airflow.providers.amazon.hooks.s3.S3Hook) in the inventory. If found, the inventory-sourced URL is used. If not (e.g., a brand new class not yet in a published docs build), it falls back to manual URL construction.
Caching:
Inventory files are cached locally in dev/registry/.inventory_cache/ with a 12-hour TTL. This matches the caching strategy used by devel-common/src/sphinx_exts/docs_build/fetch_inventories.py. The cache directory is gitignored.
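A minimal sketch of the freshness check implied by the 12-hour TTL, assuming freshness is judged by file modification time. The helper name is_cache_fresh and the constant CACHE_TTL_SECONDS are hypothetical and are not taken from extract_parameters.py.

```python
import time
from pathlib import Path
from typing import Optional

# 12-hour TTL, matching the value stated above.
CACHE_TTL_SECONDS = 12 * 60 * 60


def is_cache_fresh(path: Path, now: Optional[float] = None) -> bool:
    """Return True when the cached file exists and is younger than the TTL.

    Assumption: freshness is derived from the file's mtime.
    """
    if not path.exists():
        return False
    current = time.time() if now is None else now
    return (current - path.stat().st_mtime) < CACHE_TTL_SECONDS
```

Under this sketch, a caller would download an inventory again only when is_cache_fresh() returns False for its cached copy.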
Why not just construct URLs manually?
The previous approach assembled URLs like {base_docs_url}/_api/{module/path}/index.html#{full.class.path}. This is fragile: Sphinx can change its output layout, and some classes end up in different URL structures than expected. The inventory file is the canonical source of truth — it's produced by the same Sphinx build that generates the docs.
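The lookup-then-fallback behaviour can be sketched as follows. This is an illustration only: it assumes the parsed inventory is available as a plain mapping from fully qualified class name to URL, and the function names and example URLs are hypothetical. The fallback follows the URL pattern quoted above.

```python
from typing import Mapping


def manual_docs_url(base_docs_url: str, module_path: str, class_name: str) -> str:
    """Build the fallback URL using the pattern quoted above."""
    module_dir = module_path.replace(".", "/")
    anchor = f"{module_path}.{class_name}"
    return f"{base_docs_url.rstrip('/')}/_api/{module_dir}/index.html#{anchor}"


def resolve_docs_url(
    inventory: Mapping[str, str],
    base_docs_url: str,
    module_path: str,
    class_name: str,
) -> str:
    """Prefer the inventory URL; fall back to manual construction on a miss."""
    qualified_name = f"{module_path}.{class_name}"
    url = inventory.get(qualified_name)
    return url if url else manual_docs_url(base_docs_url, module_path, class_name)


# Hypothetical inputs: an empty inventory forces the manual fallback.
fallback = resolve_docs_url(
    {}, "https://example.invalid/docs", "airflow.providers.amazon.hooks.s3", "S3Hook"
)
```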
Data files in src/_data/:

| File | Type | Purpose |
|---|---|---|
| providers.json | Generated | All providers with metadata, sorted alphabetically |
| modules.json | Generated | All extracted modules (operators, hooks, etc.) |
| types.json | Generated | Module type definitions (from registry_tools/types.py) |
| versions/ | Generated | Per-provider, per-version metadata/parameters/connections |
| exploreCategories.js | Checked-in | Category definitions with keyword lists for the Explore page |
| statsData.js | Checked-in | Computed statistics (lifecycle counts, top providers, etc.) |
| providerVersions.js | Checked-in | Builds the provider × version page collection |
| latestVersionData.js | Checked-in | Latest version parameters/connections lookup |
| openapiSpec.js | Checked-in | Builds OpenAPI 3.1 spec from Pydantic contract models |
| providerVersionPayloads.js | Checked-in | Generates /api/providers/{id}/versions.json payloads |

| Page | Template | URL |
|---|---|---|
| Home | src/index.njk | / |
| All Providers | src/providers.njk | /providers/ |
| Explore by Category | src/explore.njk | /explore/ |
| Statistics | src/stats.njk | /stats/ |
| Provider Detail | src/provider-detail.njk | /providers/{id}/ (redirects to latest) |
| Provider Version | src/provider-version.njk | /providers/{id}/{version}/ |
| API Explorer | src/api-explorer.njk | /api-explorer/ |

| Script | Purpose |
|---|---|
| js/provider-filters.js | Search, lifecycle filter, category filter, sort on /providers/ |
| js/search.js | Global search modal (Cmd+K) powered by Pagefind |
| js/provider-detail.js | Module tabs, copy-to-clipboard on provider version pages |
| js/connection-builder.js | Interactive connection form builder on provider detail pages |
| js/copy-button.js | Generic copy button utility |
| js/theme.js | Dark/light mode toggle |
| js/mobile-menu.js | Responsive navigation |
The site is deployed under /registry/ on airflow.apache.org. Eleventy's pathPrefix is configured via the REGISTRY_PATH_PREFIX environment variable:
- Production: REGISTRY_PATH_PREFIX=/registry/ (the default)
- Local development: REGISTRY_PATH_PREFIX=/ (set automatically by pnpm dev)

All internal links in templates use the | url Nunjucks filter, which prepends the prefix. Client-side JavaScript accesses the base path via window.__REGISTRY_BASE__ (injected in base.njk).
Full-text search is powered by Pagefind. During postbuild:
- scripts/build-pagefind-index.mjs creates a Pagefind index with custom records from providers.json and modules.json
- […] REGISTRY_PATH_PREFIX
- […] (js/search.js)

Providers follow the AIP-95 lifecycle stages:

| Stage | Meaning |
|---|---|
| incubation | New provider, API may change |
| production / stable | Stable, recommended for use |
| mature | Well-established, widely adopted |
| deprecated | No longer maintained, consider alternatives |
The UI displays stable for both production and mature stages.
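The display rule can be written as a small mapping. This is a sketch, not the registry's template logic: the names are hypothetical, and passing other stages through unchanged is an assumption.

```python
# Stages that the UI collapses into the single label "stable", per the text above.
STABLE_STAGES = frozenset({"production", "stable", "mature"})


def display_stage(stage: str) -> str:
    """Return the label shown for a lifecycle stage.

    Assumption: stages outside STABLE_STAGES are displayed unchanged.
    """
    return "stable" if stage in STABLE_STAGES else stage
```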
The Explore page and provider filtering use categories defined in src/_data/exploreCategories.js. Each category has:
- id: URL-safe identifier (e.g., cloud, databases, ai-ml)
- name: Display name
- keywords: List of substrings matched against provider.id
- icon, color, description: Visual properties

Providers are assigned to categories at build time by checking whether any keyword in a category is a substring of the provider's ID. A provider can belong to multiple categories.
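The substring matching described above can be sketched in a few lines. The real logic lives in exploreCategories.js; this Python version is only an illustration, and the keyword lists, display names, and provider IDs below are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Category:
    id: str
    name: str
    keywords: Tuple[str, ...]


# Hypothetical keyword lists; only the category ids come from the text above.
EXAMPLE_CATEGORIES = [
    Category("cloud", "Cloud", ("amazon", "google", "microsoft")),
    Category("databases", "Databases", ("sql", "postgres")),
    Category("ai-ml", "AI & ML", ("openai",)),
]


def categories_for(provider_id: str, categories: List[Category]) -> List[str]:
    """Return the ids of every category with a keyword contained in provider_id."""
    return [
        category.id
        for category in categories
        if any(keyword in provider_id for keyword in category.keywords)
    ]
```

With these example keywords, a provider ID such as microsoft-mssql would land in both cloud and databases, which illustrates how one provider can belong to several categories.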
The homepage has a “New Providers” section powered by dates fetched from the PyPI JSON API during extraction. It shows providers sorted by first_released (the upload date of their earliest PyPI release) descending, highlighting providers that are new to the ecosystem.
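A sketch of the ordering described above, assuming first_released is stored as an ISO 8601 date string (so that string order matches chronological order) and that providers without a date are skipped. The function name and the limit parameter are hypothetical.

```python
from typing import Any, Dict, List


def newest_providers(providers: List[Dict[str, Any]], limit: int = 5) -> List[Dict[str, Any]]:
    """Return providers sorted by first_released, newest first.

    Assumptions: first_released is an ISO 8601 string in one consistent
    format, and entries without a date are left out.
    """
    dated = [p for p in providers if p.get("first_released")]
    return sorted(dated, key=lambda p: p["first_released"], reverse=True)[:limit]
```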
The src/api/ directory contains Eleventy templates that generate JSON API endpoints, providing programmatic access to provider and module data:
- /api/providers.json: All providers
- /api/modules.json: All modules
- /api/providers/{id}/modules.json: Modules for a specific provider
- /api/providers/{id}/parameters.json: Parameters for a provider
- /api/providers/{id}/connections.json: Connection types
- /api/providers/{id}/versions.json: Deployed versions (generated by publish_versions.py from S3)
- /api/providers/{id}/{version}/modules.json: Version-specific modules
- /api/providers/{id}/{version}/parameters.json: Version-specific parameters
- /api/providers/{id}/{version}/connections.json: Version-specific connections
- /api/openapi.json: OpenAPI 3.1 spec (generated at build time from Pydantic contracts)

An interactive API Explorer at /api-explorer/ renders the OpenAPI spec using swagger-ui (vendored from node_modules/swagger-ui-dist).
The registry supports two build modes: full builds (all providers) and per-provider incremental builds (single provider).
Each full CI run builds pages for only the latest version of each provider. Old version pages persist in S3 from previous deploys.
This follows the same pattern as Airflow docs (see publish_docs_to_s3.py and packages-metadata.json): the source of truth for which versions exist is the S3 bucket itself, not git or a stored manifest.
1. extract_metadata.py writes providers.json and extract_parameters.py writes modules.json with all known versions (from provider.yaml), but only the latest version gets a full page built by Eleventy.
2. The S3 sync runs without --delete: new pages are uploaded; old version pages already in S3 are left untouched.
3. publish_versions.py: after the sync, this script lists S3 directories under providers/{id}/ to discover every deployed version, then writes api/providers/{id}/versions.json with the full version list.
4. provider-detail.js fetches versions.json on page load and replaces the static <select> options, so even old pages get an up-to-date dropdown. The statically rendered dropdown is the fallback if the fetch fails.

When publish-docs-to-s3.yml publishes provider docs (e.g., providers-amazon/9.22.0), it triggers registry-build.yml with the provider ID. The incremental flow:

1. providers.json and modules.json are fetched from the current S3 bucket (/api/providers.json, /api/modules.json).
2. extract_metadata.py --provider amazon extracts metadata and PyPI stats; extract_parameters.py discovers modules for only the specified provider.
3. merge_registry_data.py replaces the updated provider's entries in the downloaded JSON while keeping all other providers intact. Only global files (providers.json, modules.json) are merged; per-version files like connections.json and parameters.json are not downloaded from S3.
4. Eleventy builds the site. Only the target provider has real data behind its connections.json and parameters.json API endpoints (see Known limitation below).
5. The first S3 sync excludes the api/providers/ subtree to avoid overwriting real data with incomplete/empty stubs. A second sync uploads only the target provider's API files.
6. publish_versions.py updates api/providers/{id}/versions.json.

The merge script (dev/registry/merge_registry_data.py) handles edge cases.
Known limitation: Eleventy's pagination templates generate API files for every provider in providers.json, even when per-version data (connections, parameters) only exists for the target provider. The templates emit empty fallback JSON ({"connection_types":[]}) for providers without data. The S3 sync step works around this with --exclude patterns during incremental builds. A proper template-level fix (skipping file generation) is tracked as a follow-up — permalink: false does not work with Eleventy 3.x pagination templates.
To run an incremental build locally:
```bash
# Extract only amazon
breeze registry extract-data --python 3.12 --provider amazon

# If you have existing full JSON from a previous build or S3 download:
uv run dev/registry/merge_registry_data.py \
  --existing-providers /tmp/existing/providers.json \
  --existing-modules /tmp/existing/modules.json \
  --new-providers dev/registry/providers.json \
  --new-modules dev/registry/modules.json \
  --output dev/registry/

# Build site from merged data
cd registry && pnpm build
```
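The "replace one provider, keep the rest" behaviour of the merge step can be pictured with the sketch below. It is not the code of merge_registry_data.py: the function names are hypothetical, and it assumes provider entries carry an id key and module entries carry the provider_id field listed in the module catalog.

```python
from typing import Any, Dict, List, Set

Entry = Dict[str, Any]


def merge_providers(existing: List[Entry], new: List[Entry]) -> List[Entry]:
    """Keep untouched providers and replace those present in `new`."""
    updated_ids: Set[str] = {p["id"] for p in new}
    kept = [p for p in existing if p["id"] not in updated_ids]
    return kept + new


def merge_modules(existing: List[Entry], new: List[Entry], updated_ids: Set[str]) -> List[Entry]:
    """Drop every module of the updated providers, then append the fresh ones."""
    kept = [m for m in existing if m["provider_id"] not in updated_ids]
    return kept + new
```

Dropping all modules of the updated provider before appending the new ones means a module removed upstream also disappears from the merged catalog.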
To populate S3 with historical version pages (e.g., when setting up a new bucket), temporarily restore the older-versions loop in providerVersions.js so Eleventy builds all version pages, then:
```bash
# Extract metadata (includes all versions from provider.yaml)
uv run python dev/registry/extract_metadata.py

# Build the full site (with older-versions loop enabled).
# The subshell keeps the working directory at the repository root for the next commands.
(cd registry && pnpm build)

# Sync everything, then generate versions.json
aws s3 sync registry/_site/ s3://bucket/registry/ --cache-control "public, max-age=300"
breeze registry publish-versions --s3-bucket s3://bucket/registry/
```
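The version discovery performed after the sync (listing directories under providers/{id}/) can be sketched over a plain list of object keys. This makes no S3 calls and is not the code of publish_versions.py; the key layout, the purely numeric version pattern, and the newest-first ordering are all assumptions.

```python
import re
from typing import Iterable, List

# Assumption: deployed versions are dotted numeric strings such as 9.22.0.
_VERSION_RE = re.compile(r"^\d+(\.\d+)*$")


def deployed_versions(
    keys: Iterable[str], provider_id: str, prefix: str = "registry/providers/"
) -> List[str]:
    """Collect version directories found directly under providers/{id}/."""
    base = f"{prefix}{provider_id}/"
    found = set()
    for key in keys:
        if not key.startswith(base):
            continue
        # The first path segment after the provider directory is the candidate version.
        first_segment = key[len(base):].split("/", 1)[0]
        if _VERSION_RE.match(first_segment):
            found.add(first_segment)
    # Sort numerically, newest first, so 9.22.0 ranks above 9.3.0.
    return sorted(found, key=lambda v: tuple(int(part) for part in v.split(".")), reverse=True)
```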
- .github/workflows/registry-build.yml: Reusable workflow that extracts metadata (host), builds a breeze CI image to run parameter/connection extraction, builds the Eleventy site, syncs to S3 (without --delete), and runs publish_versions.py to update version metadata. Supports staging and live destinations. Accepts an optional provider input for incremental builds.
- .github/workflows/registry-tests.yml: Runs extraction script unit tests on PRs that touch dev/registry/, registry/, or providers/*/provider.yaml.
- .github/workflows/publish-docs-to-s3.yml: Main docs workflow. When publishing provider docs, the update-registry job automatically triggers registry-build.yml with the provider ID for an incremental registry update.

The registry can be rebuilt independently via workflow_dispatch on registry-build.yml. Only designated committers can trigger manual builds. The provider input can be set to run an incremental build for a specific provider (e.g., amazon).
The built site is synced to:
- Staging: s3://staging-docs-airflow-apache-org/registry/
- Live: s3://live-docs-airflow-apache-org/registry/

Module discovery (modules.json) uses runtime inspection inside Breeze, where all providers are installed. extract_parameters.py imports each module listed in provider.yaml, iterates over its classes with inspect.getmembers(), and uses issubclass() checks against base classes like BaseOperator and BaseHook to classify them.
Runtime discovery is more accurate than AST-based alternatives: it resolves dynamic class definitions, runtime-computed attributes, and complex inheritance chains that static analysis misses. Validation showed runtime discovery found 9 classes that AST missed (triggers and a hook) with 0 type mismatches across 1600+ modules.
Since extract_parameters.py already runs inside Breeze for parameter inspection, module discovery adds no extra infrastructure cost — the same Breeze session handles both.
scripts/in_container/run_provider_yaml_files_check.py (run by the check-provider-yaml-valid pre-commit hook inside Breeze) validates that provider.yaml is correct and complete: modules exist, classes are importable, and every Python file in the operators/, hooks/, sensors/, and triggers/ directories is listed. This is a correctness guarantee that extract_parameters.py builds on.
The distinction: provider.yaml lists operators/hooks/sensors/triggers/transfers/bundles at the module level (e.g., airflow.providers.amazon.operators.s3), while the registry needs individual class names within each module. Runtime discovery fills that gap by importing each module and inspecting its members. For class-level entries (notifications, secrets-backends, logging, executors, task-decorators), provider.yaml already has the full class path and extract_parameters.py uses it directly.
extract_parameters.py and extract_connections.py need runtime access to provider classes (to discover modules via issubclass(), inspect __init__ signatures, and call get_connection_form_widgets()). They run inside Breeze where all providers are installed. extract_parameters.py produces both modules.json (the module catalog) and per-provider parameters.json files. extract_metadata.py and extract_versions.py only need filesystem access and run on the host. This split means the CI workflow can run the fast scripts (metadata) without spinning up Breeze, while module discovery, parameter extraction, and connection extraction are a separate step inside Breeze.
Static site generators produce zero-JS pages by default. The registry works without JavaScript — filtering and search are layered on top progressively. Eleventy has no opinion on frontend frameworks, which keeps the dependency surface small (~30 packages in the lockfile).
The site deploys at /registry/ on airflow.apache.org but runs at / during local dev. Eleventy's pathPrefix config handles this via the REGISTRY_PATH_PREFIX env var. Templates use the | url filter, and client-side JS reads window.__REGISTRY_BASE__ (injected in base.njk).
Classes are discovered by runtime issubclass() checks against type-specific base classes — e.g., BaseOperator for operators, BaseHook for hooks. Since extract_parameters.py runs inside Breeze where all providers are installed, Python's MRO handles transitive inheritance natively: chains like S3ListOperator → AwsBaseOperator → BaseOperator are resolved without needing to build a cross-file inheritance map. After inheritance filtering, a post-filter skips private, Base*, Abstract*, and *Mixin classes. There is no suffix-based matching.
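The discovery and post-filter steps can be illustrated with stand-in classes. This is a sketch rather than the code of extract_parameters.py: the base classes here are local placeholders for the Airflow ones, the helper names are hypothetical, and both the "defined in this module" check and the first-match-wins ordering are assumptions.

```python
import inspect
import types
from typing import Dict


# Stand-ins for the real Airflow base classes (placeholders for illustration only).
class BaseOperator:
    pass


class BaseHook:
    pass


class AwsBaseOperator(BaseOperator):
    pass


# Assumption: the first matching base wins, so ordering matters in this sketch.
TYPE_BASES = {"operator": BaseOperator, "hook": BaseHook}


def is_filtered_out(name: str) -> bool:
    """Post-filter: skip private, Base*, Abstract*, and *Mixin classes."""
    return name.startswith(("_", "Base", "Abstract")) or name.endswith("Mixin")


def discover_classes(module: types.ModuleType) -> Dict[str, str]:
    """Map class name to module type using issubclass() checks."""
    found: Dict[str, str] = {}
    for name, cls in inspect.getmembers(module, inspect.isclass):
        # Assumption: only classes defined in this module count, not re-exports.
        if cls.__module__ != module.__name__ or is_filtered_out(name):
            continue
        for type_name, base in TYPE_BASES.items():
            if issubclass(cls, base):
                found[name] = type_name
                break
    return found


def build_demo_module() -> types.ModuleType:
    """Create an in-memory module with a few classes to classify."""
    demo = types.ModuleType("demo_provider_module")
    for name, bases in [
        ("S3ListOperator", (AwsBaseOperator,)),  # transitive: AwsBaseOperator -> BaseOperator
        ("S3Hook", (BaseHook,)),
        ("AbstractS3Operator", (BaseOperator,)),
        ("_PrivateOperator", (BaseOperator,)),
        ("S3Mixin", (object,)),
    ]:
        setattr(demo, name, type(name, bases, {"__module__": demo.__name__}))
    return demo
```

Calling discover_classes(build_demo_module()) classifies S3ListOperator through the transitive chain without any inheritance map, while the abstract, private, and mixin classes are skipped by the post-filter.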
No registry-specific changes are needed. When extract_metadata.py runs during CI, it automatically discovers all providers under providers/*/provider.yaml. To make sure your provider is presented well in the registry:

- provider.yaml: include description, integrations with how-to-guide links, and logo references
- Logo at registry/public/logos/{provider-id}-{Name}.png

Module types (operator, hook, sensor, etc.) are defined in a single place: dev/registry/registry_tools/types.py. To add a new type (e.g., auth_manager):

1. Add it to MODULE_TYPES in dev/registry/registry_tools/types.py
2. Run uv run python dev/registry/generate_types_json.py to update registry/src/_data/types.json (auto-propagates to frontend templates and JS)
3. Add --color-auth-manager and class .auth-manager in src/css/tokens.css and src/css/main.css
4. Add its base class to BASE_CLASS_IMPORTS in types.py

- Run uv run python dev/registry/extract_metadata.py whenever provider metadata changes
- The pnpm dev command runs both the Eleventy build and starts a live-reload dev server
- […] src/css/tokens.css for theming
- […] js/ scripts