This document describes Apache Paimon's detailed security threat model for maintainers and automated security triage.
It complements the shorter public-facing security model in docs/docs/project/security.md (published at the project website) by making Paimon's trust assumptions, security boundaries, and recurring non-security bug classes more explicit.
Apache Paimon is a streaming data lake platform that is often deployed as a library and integration layer inside larger systems (Flink, Spark, Hive, and other query engines) that provide their own authentication, authorization, and credential management. Because of that deployment model, many bug classes that look security-relevant in the abstract are not actually security vulnerabilities in Paimon itself.
This model is intended to answer:
This model is scoped to the Apache Paimon project itself:
It is not a general threat model for every deployment that embeds Paimon.
In particular, it does not attempt to define the complete security model for:
Paimon should:
Paimon does not aim to be the primary enforcement point for:
The operator deploys and configures the catalog, REST Catalog server, engine, and storage integration around Paimon. This role is trusted to choose endpoints, warehouses, and storage integrations, configure credentials, and decide which users may create, read, or modify tables.
The catalog control plane is responsible for resolving tables and supplying metadata, locations, configuration, and delegated credentials to Paimon. This role may be implemented by:
Regardless of implementation, it should not expose secrets to unintended principals or leak credential-bearing state across unintended boundaries.
Paimon assumes a trusted catalog or metastore, which is outside its primary security boundary.
In REST deployments, part of the catalog control plane is implemented by a server that returns metadata, configuration, delegated storage credentials (data tokens), and query-level authorization (row filters and column masking) to the client. This server is generally treated as a trusted control-plane component.
The REST Catalog server is responsible for:
In REST deployments, the client-side catalog (RESTCatalog, RESTApi) consumes server-provided metadata, configuration, and credentials. Where the client and server are meaningfully distinct, client-side bugs in token handling, caching, or reuse may still be security-relevant. This is especially true when the Paimon-owned client implementation leaks credential-bearing state across catalog, session, or principal boundaries it is expected to preserve.
The REST Catalog client is responsible for:
- Authenticating requests through the configured AuthProvider
- Caching FileIO instances keyed by data token (via RESTTokenFileIO) and evicting them when tokens expire

Query engines (Flink, Spark, Hive, Trino, StarRocks, etc.) and applications may expose only a subset of Paimon capabilities to users. They are responsible for their own user-facing authorization boundaries unless Paimon explicitly documents otherwise.
A writer or maintainer role may already have legitimate power to write or replace table metadata, write or delete data files, manage snapshots, create or delete branches and tags, and invoke destructive maintenance operations (compaction, expiration, rollback). If a report only shows a new way to achieve the same effect this role can already cause legitimately, it is usually not a security issue in Paimon.
The following are generally treated as trusted operator or deployment inputs:
- Catalog options (uri, warehouse, token.provider)
- Kerberos options (security.kerberos.login.keytab, security.kerberos.login.principal)
- Header options (header.*)

If a report depends on the attacker controlling those values directly, it is usually not a vulnerability in Paimon itself.
Paimon often accepts metadata locations, table properties, database properties, schema definitions, and related control-plane information from a catalog or metastore. By default, Paimon treats those sources as trusted.
This means a malicious catalog supplying incorrect or malicious metadata is usually not a Paimon vulnerability by itself.
In REST deployments, Paimon accepts the following from the REST Catalog server:
- Configuration from the /v1/config endpoint, including catalog prefix and additional headers
- Data tokens from the /v1/{prefix}/databases/{database}/tables/{table}/token endpoint, used by RESTTokenFileIO to access the underlying object store
- Row filters and column masking rules from the /v1/{prefix}/databases/{database}/tables/{table}/auth endpoint

By default, these are treated as trusted control-plane inputs unless Paimon explicitly documents a stronger guarantee.
This means a malicious REST Catalog server sending dangerous configuration or overly broad data tokens is usually not a Paimon vulnerability by itself. It also means that many client-side token-selection bugs are correctness or specification issues rather than security boundary failures.
The major exception is secret exposure. If Paimon surfaces credentials or secrets to a new audience that was not already trusted with them, that is security-relevant. In particular:
Object store permissions (e.g., OSS, S3, HDFS ACLs) are enforced by the storage provider and the credentials the surrounding deployment chooses to hand to Paimon. Paimon is not the root authority for bucket- or object-level authorization.
Reports that depend primarily on over-broad IAM policies or permissive storage ACLs are usually deployment-sensitive rather than product-security issues in Paimon.
Paimon integrations may surface data and operations through a query engine or application, but Paimon is not a complete user-authorization framework for those systems.
Paimon does provide a mechanism for the REST Catalog server to supply row-level filters and column masking rules via authTableQuery, but enforcement of those rules is a shared responsibility between the engine integration and the catalog server. Paimon relays the rules; the engine must apply them.
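The split between relaying and enforcing can be illustrated with a minimal sketch. Everything in it is hypothetical: `RuleEnforcer`, `AuthRules`, and the map-based row representation are invented for illustration and are not Paimon or engine APIs. It only shows that receiving rules is not the same as applying them: engine-side code has to drop filtered rows and mask columns before results reach the user.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical illustration only; these are not Paimon or engine APIs.
final class RuleEnforcer {

    /** Rules as they might arrive from a catalog server (assumed shape). */
    static final class AuthRules {
        final Predicate<Map<String, Object>> rowFilter;
        final Map<String, Function<Object, Object>> columnMasks;

        AuthRules(Predicate<Map<String, Object>> rowFilter,
                  Map<String, Function<Object, Object>> columnMasks) {
            this.rowFilter = rowFilter;
            this.columnMasks = columnMasks;
        }
    }

    /** Engine-side enforcement: drop filtered rows, then mask columns on copies. */
    static List<Map<String, Object>> enforce(List<Map<String, Object>> rows, AuthRules rules) {
        List<Map<String, Object>> visible = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            if (!rules.rowFilter.test(row)) {
                continue; // this row is not visible to the caller
            }
            // Work on a copy so the unmasked source row is never mutated.
            Map<String, Object> copy = new HashMap<>(row);
            for (Map.Entry<String, Function<Object, Object>> mask : rules.columnMasks.entrySet()) {
                String column = mask.getKey();
                if (copy.containsKey(column)) {
                    copy.put(column, mask.getValue().apply(copy.get(column)));
                }
            }
            visible.add(copy);
        }
        return visible;
    }
}
```

If an integration fetched the rules but returned `rows` directly instead of the result of `enforce`, the rules would have been relayed correctly and still not enforced, which is why the location of the bypass matters when triaging such reports.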
The following categories are generally security-relevant in Paimon when the report is credible and reproducible.
Examples include:
- Leaking RESTTokenFileIO or RESTApi state beyond their intended scope

Security issues exist when Paimon itself is expected to separate catalogs, principals, or sessions and fails to do so.
Examples include:
- Cache reuse across principals (e.g., FILE_IO_CACHE in RESTTokenFileIO returning a FileIO belonging to a different principal)
- State from one RESTApi instance leaking into another

If Paimon's client-side handling of authTableQuery responses (row filters or column masking rules) allows a caller to bypass filters that the server intended to enforce, that is security-relevant when the bypass occurs within Paimon-owned code rather than in the engine integration.
These categories may still be real bugs worth fixing, but they are not usually security vulnerabilities in Paimon itself.
Examples:
Malformed-input crashes, raw runtime exceptions from invalid JSON or Avro data, and memory amplification from oversized manifests or schemas are usually treated as robustness or hardening work rather than security issues in Paimon itself.
Reports that require a malicious catalog, metastore, REST Catalog server, or other external service are usually outside Paimon's primary security boundary.
Examples:
If the actor already has a legitimate capability that can cause the same harm, the new path is usually not a security issue. This often applies to writers or maintainers who already control metadata layout, file layout, or destructive maintenance operations (snapshot expiration, orphan file cleanup, branch deletion).
Resource exhaustion caused by legitimate but expensive operations (e.g., large compaction, scanning many partitions, listing all snapshots) is usually treated as an operational concern rather than a security vulnerability.
Paimon's REST Catalog client supports pluggable authentication through the AuthProvider interface.
Authentication providers are created via the AuthProviderFactory SPI, which is loaded through Java's ServiceLoader mechanism and selected by the token.provider configuration. Each authentication provider is scoped to one catalog instance within the process and must not share mutable state with the providers of other instances.
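The selection step and the no-shared-state requirement can be sketched as follows. This is a minimal illustration under stated assumptions: `DemoAuthProvider`, `DemoAuthProviderFactory`, `BearerFactory`, the `demo-bearer` identifier, and the `demo.token` option are all hypothetical, and the real AuthProvider and AuthProviderFactory interfaces may differ in names and signatures. The factories are passed in explicitly so the sketch runs without a `META-INF/services` registration; in an SPI setup the iterable would typically come from `java.util.ServiceLoader.load(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Paimon AuthProvider SPI.
final class AuthProviderLookup {

    interface DemoAuthProvider {
        String authHeader();
    }

    interface DemoAuthProviderFactory {
        String identifier();

        DemoAuthProvider create(Map<String, String> options);
    }

    /** Example factory: each create() call returns a fresh provider with its own state. */
    static final class BearerFactory implements DemoAuthProviderFactory {
        @Override
        public String identifier() {
            return "demo-bearer";
        }

        @Override
        public DemoAuthProvider create(Map<String, String> options) {
            // Captured per provider instance; nothing is stored in static fields.
            final String token = options.get("demo.token");
            return () -> "Bearer " + token;
        }
    }

    /** Picks the factory whose identifier matches the token.provider option. */
    static DemoAuthProvider create(Iterable<DemoAuthProviderFactory> factories,
                                   Map<String, String> options) {
        String wanted = options.get("token.provider");
        List<String> available = new ArrayList<>();
        for (DemoAuthProviderFactory factory : factories) {
            if (factory.identifier().equals(wanted)) {
                return factory.create(options);
            }
            available.add(factory.identifier());
        }
        throw new IllegalArgumentException(
                "No auth provider factory for '" + wanted + "'; available: " + available);
    }
}
```

Because the token is captured per provider rather than held in a static or otherwise shared field, two catalog instances configured with different credentials cannot observe each other's state through the provider.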
When data-token.enabled is true, RESTTokenFileIO manages delegated storage credentials:
- Each data token is used to build a FileIO instance for storage access
- FileIO instances are cached in a process-global cache (FILE_IO_CACHE) keyed by RESTToken, with a maximum size of 1000 entries and 10-hour expiry

Security-relevant invariants:
- FILE_IO_CACHE keys on the full RESTToken (token content + expiration), so different tokens produce different FileIO instances
- RESTTokenFileIO recreates a RESTApi instance from the catalog context if the original instance is unavailable (e.g., after deserialization)

Paimon supports Kerberos authentication for Hadoop-based deployments through SecurityContext and SecurityConfiguration. Keytab paths and principals are treated as trusted operator configuration.
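Returning to the FILE_IO_CACHE keying invariant, a minimal sketch may help reviewers recognize what a correct key looks like. It is not the RESTTokenFileIO implementation: `TokenKeyedCache` and `Token` are hypothetical names, the token content is simplified to a string, and only the size bound is modeled (time-based expiry is omitted for brevity). The point is that key equality covers both token content and expiration, so a cached value is only ever returned for an identical token.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical illustration only; not the RESTTokenFileIO implementation.
final class TokenKeyedCache<V> {

    /** Cache key: equality covers both the token content and its expiration. */
    static final class Token {
        final String content;
        final long expireAtMillis;

        Token(String content, long expireAtMillis) {
            this.content = content;
            this.expireAtMillis = expireAtMillis;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Token)) {
                return false;
            }
            Token other = (Token) o;
            return expireAtMillis == other.expireAtMillis && content.equals(other.content);
        }

        @Override
        public int hashCode() {
            return Objects.hash(content, expireAtMillis);
        }
    }

    private final LinkedHashMap<Token, V> entries;

    TokenKeyedCache(final int maxSize) {
        // Access-ordered map that evicts the least recently used entry past maxSize.
        this.entries = new LinkedHashMap<Token, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Token, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    /** Returns the cached value for this exact token, creating it on first use. */
    synchronized V get(Token token, Function<Token, V> loader) {
        V value = entries.get(token);
        if (value == null) {
            value = loader.apply(token);
            entries.put(token, value);
        }
        return value;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

A cache keyed on something narrower, such as a table name alone, would hand the same value to callers holding different tokens. That is the cross-principal reuse pattern this model treats as security-relevant.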
A scanner targeting Paimon should treat a finding as higher-confidence only if it plausibly shows one of the following:
A finding should be downgraded or rejected by default if it instead depends primarily on: