| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| # SQL Integration |
| |
| [Apache DataFusion](https://datafusion.apache.org/) is a fast, extensible query engine for building data-centric systems in Rust. The `paimon-datafusion` crate provides a full SQL integration that lets you create, query, and modify Paimon tables. |
| |
| ## Setup |
| |
| ```toml |
| [dependencies] |
| paimon = "0.1.0" |
| paimon-datafusion = "0.1.0" |
| datafusion = "53" |
| tokio = { version = "1", features = ["full"] } |
| ``` |
| |
| ## Registering Catalog |
| |
| Register an entire Paimon catalog so all databases and tables are accessible via `paimon.database.table` syntax: |
| |
| ```rust |
| use std::sync::Arc; |
| use paimon::{CatalogOptions, FileSystemCatalog, Options}; |
| use paimon_datafusion::SQLContext; |
| |
| async fn example() -> Result<(), Box<dyn std::error::Error>> { |
| let mut options = Options::new(); |
| options.set(CatalogOptions::WAREHOUSE, "file:///tmp/paimon-warehouse"); |
| let catalog = Arc::new(FileSystemCatalog::new(options)?); |
| |
| let mut ctx = SQLContext::new(); |
| ctx.register_catalog("paimon", catalog)?; |
| let df = ctx.sql("SELECT * FROM paimon.default.my_table").await?; |
| df.show().await?; |
| Ok(()) |
| } |
| ``` |
| |
| `SQLContext::new` creates a session context with the Paimon relation planner pre-registered. Use `register_catalog` to add one or more Paimon catalogs. It also manages session-scoped dynamic options internally for `SET`/`RESET` support. |
| |
| ## Data Types |
| |
| The following SQL data types are supported in CREATE TABLE and mapped to their corresponding Paimon types: |
| |
| | SQL Type | Paimon Type | Notes | |
| |---|---|---| |
| | `BOOLEAN` | BooleanType | | |
| | `TINYINT` | TinyIntType | | |
| | `SMALLINT` | SmallIntType | | |
| | `INT` / `INTEGER` | IntType | | |
| | `BIGINT` | BigIntType | | |
| | `FLOAT` / `REAL` | FloatType | | |
| | `DOUBLE` / `DOUBLE PRECISION` | DoubleType | | |
| | `VARCHAR` / `TEXT` / `STRING` / `CHAR` | VarCharType | | |
| | `BINARY` / `VARBINARY` / `BYTEA` | VarBinaryType | | |
| | `BLOB` | BlobType | Binary large object | |
| | `DATE` | DateType | | |
| | `TIMESTAMP[(p)]` | TimestampType | Precision p: 0/3/6/9, default 3 | |
| | `TIMESTAMP WITH TIME ZONE` | LocalZonedTimestampType | | |
| | `DECIMAL(p, s)` | DecimalType | | |
| | `ARRAY<element>` | ArrayType | e.g. `ARRAY<INT>` | |
| | `MAP(key, value)` | MapType | e.g. `MAP(STRING, INT)` | |
| | `STRUCT<field TYPE, ...>` | RowType | e.g. `STRUCT<city STRING, zip INT>` | |
| |
| ## DDL |
| |
| ### CREATE SCHEMA / DROP SCHEMA |
| |
| ```sql |
| CREATE SCHEMA paimon.my_db; |
| DROP SCHEMA paimon.my_db CASCADE; |
| ``` |
| |
| ### CREATE TABLE |
| |
| ```sql |
| CREATE TABLE paimon.my_db.users ( |
| id INT NOT NULL, |
| name STRING, |
| age INT, |
| PRIMARY KEY (id) |
| ) WITH ('bucket' = '4'); |
| ``` |
| |
| `IF NOT EXISTS` is supported: |
| |
| ```sql |
| CREATE TABLE IF NOT EXISTS paimon.my_db.users ( |
| id INT NOT NULL |
| ); |
| ``` |
| |
| Unsupported syntax (will return an error): |
| - `CREATE EXTERNAL TABLE` |
| - `LOCATION` |
| - `CREATE TABLE AS SELECT` |
| |
| ### Partitioned Tables |
| |
| Use `PARTITIONED BY` to specify partition columns. Partition columns must already be declared in the column definitions and must not include a type: |
| |
| ```sql |
| CREATE TABLE paimon.my_db.events ( |
| id INT NOT NULL, |
| name STRING, |
| dt STRING, |
| PRIMARY KEY (id, dt) |
| ) PARTITIONED BY (dt) |
| WITH ('bucket' = '2'); |
| ``` |
| |
| Invalid usage (will return an error): |
| |
| ```sql |
| -- Partition columns must not specify a type |
| CREATE TABLE paimon.my_db.events ( |
| id INT NOT NULL, |
| dt STRING |
| ) PARTITIONED BY (dt STRING); |
| ``` |
| |
| ### Complex Types |
| |
| ```sql |
| CREATE TABLE paimon.my_db.complex_types ( |
| id INT NOT NULL, |
| tags ARRAY<STRING>, |
| props MAP(STRING, INT), |
| address STRUCT<city STRING, zip INT>, |
| PRIMARY KEY (id) |
| ); |
| ``` |
| |
| ### DROP TABLE |
| |
| ```sql |
| DROP TABLE paimon.my_db.users; |
| DROP TABLE IF EXISTS paimon.my_db.users; |
| ``` |
| |
| ### CREATE TEMPORARY TABLE |
| |
| Create an in-memory temporary table from a query result. Temporary tables exist only for the lifetime of the `SQLContext` instance and are automatically cleaned up when the context is dropped. |
| |
| ```sql |
| -- Without column types (types inferred from the query) |
| CREATE TEMPORARY TABLE paimon.my_db.source AS SELECT * FROM (VALUES (1, 'alice'), (2, 'bob')) AS t(id, name); |
| |
| -- With explicit column types (recommended when integer precision matters) |
| CREATE TEMPORARY TABLE paimon.my_db.source (id INT, name STRING) AS SELECT * FROM (VALUES (1, 'alice'), (2, 'bob')) AS t(id, name); |
| ``` |
| |
| `IF NOT EXISTS` is supported — if the table already exists, the statement is silently ignored: |
| |
| ```sql |
| CREATE TEMPORARY TABLE IF NOT EXISTS paimon.my_db.source AS SELECT 1; |
| ``` |
| |
| > **Note:** When using `VALUES` without explicit column types, DataFusion infers integer literals as `Int64`. If the temporary table will be used as a source in `MERGE INTO` against a Paimon table with `Int32` columns, specify the column types explicitly to avoid type mismatch errors. |
| |
| ### CREATE TEMPORARY VIEW |
| |
| Create a temporary view from a query: |
| |
| ```sql |
| CREATE TEMPORARY VIEW paimon.my_db.active_users AS SELECT * FROM paimon.my_db.users WHERE id > 0; |
| ``` |
| |
| `IF NOT EXISTS` is supported: |
| |
| ```sql |
| CREATE TEMPORARY VIEW IF NOT EXISTS paimon.my_db.active_users AS SELECT * FROM paimon.my_db.users WHERE id > 0; |
| ``` |
| |
| ### DROP TEMPORARY TABLE / DROP TEMPORARY VIEW |
| |
| Remove a temporary table or view: |
| |
| ```sql |
| DROP TEMPORARY TABLE paimon.my_db.source; |
| DROP TEMPORARY TABLE IF EXISTS paimon.my_db.source; |
| DROP TEMPORARY VIEW paimon.my_db.active_users; |
| DROP TEMPORARY VIEW IF EXISTS paimon.my_db.active_users; |
| ``` |
| |
| ### ALTER TABLE |
| |
| ```sql |
| -- Add a column |
| ALTER TABLE paimon.my_db.users ADD COLUMN email STRING; |
| |
| -- Drop a column |
| ALTER TABLE paimon.my_db.users DROP COLUMN age; |
| |
| -- Rename a column |
| ALTER TABLE paimon.my_db.users RENAME COLUMN name TO username; |
| |
| -- Rename a table |
| ALTER TABLE paimon.my_db.users RENAME TO members; |
| |
| -- Set table properties |
| ALTER TABLE paimon.my_db.users SET TBLPROPERTIES('data-evolution.enabled' = 'true'); |
| ``` |
| |
| `IF EXISTS` is supported: |
| |
| ```sql |
| ALTER TABLE IF EXISTS paimon.my_db.users ADD COLUMN age INT; |
| ``` |
| |
| ## DML |
| |
| ### INSERT INTO |
| |
| ```sql |
| INSERT INTO paimon.my_db.users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol'); |
| ``` |
| |
| `INSERT INTO ... SELECT ...` is also supported: |
| |
| ```sql |
| INSERT INTO paimon.my_db.users SELECT * FROM source_table; |
| ``` |
| |
| For primary-key tables, records with duplicate keys are deduplicated according to the merge engine (default: Deduplicate engine, where the last written value wins). |
| |
| ### INSERT OVERWRITE |
| |
| For partitioned tables, `INSERT OVERWRITE` replaces only the affected partitions. For unpartitioned tables, it replaces the entire table: |
| |
| ```sql |
| -- Dynamic partition overwrite: overwrites only the dt='2024-01-01' partition |
| INSERT OVERWRITE paimon.my_db.events VALUES ('2024-01-01', 10, 'new_alice'); |
| ``` |
| |
| Hive-style static partition overwrite is also supported via the `PARTITION` clause. The source query provides only non-partition columns, and partition values are specified explicitly: |
| |
| ```sql |
| -- Static partition overwrite: explicitly specify the target partition |
| INSERT OVERWRITE paimon.my_db.events PARTITION (dt = '2024-01-01') |
| VALUES (10, 'new_alice'), (20, 'new_bob'); |
| |
| -- With a SELECT source |
| INSERT OVERWRITE paimon.my_db.events PARTITION (dt = '2024-01-01') |
| SELECT id, name FROM source_table; |
| ``` |
| |
| For multi-level partitioned tables, you can specify a subset of partition columns. Unspecified partition columns are read from the source query (dynamic partition). All sub-partitions under the specified partition are replaced: |
| |
| ```sql |
| -- Only dt is static; all data under dt='2024-01-01' is replaced. |
| -- region comes from the source data. |
| INSERT OVERWRITE paimon.my_db.events PARTITION (dt = '2024-01-01') |
| VALUES ('us', 10, 'alice'), ('eu', 20, 'bob'); |
| ``` |
| |
| ### UPDATE |
| |
| For append-only tables (no primary key), updates are executed using Copy-on-Write: |
| |
| ```sql |
| UPDATE paimon.my_db.t SET name = 'a_new' WHERE id = 1; |
| ``` |
| |
| For primary-key tables, `data-evolution.enabled` must be enabled to perform UPDATE. |
| |
| ### DELETE |
| |
| For append-only tables, deletes are executed using Copy-on-Write: |
| |
| ```sql |
| DELETE FROM paimon.my_db.t WHERE name = 'b'; |
| ``` |
| |
| ### MERGE INTO |
| |
| Standard SQL MERGE INTO syntax is supported, allowing INSERT, UPDATE, and DELETE in a single statement: |
| |
| ```sql |
| MERGE INTO paimon.my_db.target |
| USING source ON target.a = source.a |
| WHEN MATCHED THEN UPDATE SET a = source.a, b = source.b, c = source.c |
| WHEN NOT MATCHED THEN INSERT (a, b, c) VALUES (source.a, source.b, source.c); |
| ``` |
| |
| Delete matched rows only: |
| |
| ```sql |
| MERGE INTO paimon.my_db.target |
| USING source ON target.a = source.a |
| WHEN MATCHED THEN DELETE; |
| ``` |
| |
| UPDATE + INSERT combination: |
| |
| ```sql |
| MERGE INTO paimon.my_db.target |
| USING source ON target.a = source.a |
| WHEN MATCHED THEN UPDATE SET b = source.b |
| WHEN NOT MATCHED THEN INSERT (a, b, c) VALUES (source.a, source.b, source.c); |
| ``` |
| |
| The source can also be a subquery: |
| |
| ```sql |
| MERGE INTO paimon.my_db.target |
| USING (SELECT * FROM other_table WHERE active = true) AS source |
| ON target.id = source.id |
| WHEN MATCHED THEN UPDATE SET name = source.name; |
| ``` |
| |
| For data-evolution tables, MERGE INTO uses the `_ROW_ID` virtual column for row-level tracking. For append-only tables, it uses Copy-on-Write file rewriting. |
| |
| ### TRUNCATE TABLE |
| |
| Truncate an entire table or specific partitions: |
| |
| ```sql |
| -- Truncate the entire table |
| TRUNCATE TABLE paimon.my_db.users; |
| |
| -- Truncate specific partitions |
| TRUNCATE TABLE paimon.my_db.events PARTITION (dt = '2024-01-01'); |
| ``` |
| |
| ### DROP PARTITION |
| |
| Drop specific partitions from a table using `ALTER TABLE ... DROP PARTITION`: |
| |
| ```sql |
| ALTER TABLE paimon.my_db.events DROP PARTITION (dt = '2024-01-01'); |
| ``` |
| |
| Multiple partition key-value pairs can be specified: |
| |
| ```sql |
| ALTER TABLE paimon.my_db.events DROP PARTITION (dt = '2024-01-01', region = 'us'); |
| ``` |
| |
| ## Procedures |
| |
| Use `CALL` to invoke built-in procedures. All procedures are under the `sys` namespace. |
| |
| ### create_tag |
| |
| Create a named tag from a snapshot: |
| |
| ```sql |
| CALL sys.create_tag(table => 'paimon.my_db.my_table', tag => 'my_tag', snapshot_id => 1); |
| ``` |
| |
| ### create_tag_from_timestamp |
| |
| Create a named tag from a timestamp (finds the latest snapshot at or before the given time): |
| |
| ```sql |
| CALL sys.create_tag_from_timestamp(table => 'paimon.my_db.my_table', tag => 'my_tag', timestamp => 1234567890000); |
| ``` |
| |
| ### delete_tag |
| |
| Delete a named tag: |
| |
| ```sql |
| CALL sys.delete_tag(table => 'paimon.my_db.my_table', tag => 'my_tag'); |
| ``` |
| |
| ### rollback_to |
| |
| Rollback a table to a specific snapshot or tag: |
| |
| ```sql |
| -- Rollback to a snapshot |
| CALL sys.rollback_to(table => 'paimon.my_db.my_table', snapshot_id => 1); |
| |
| -- Rollback to a tag |
| CALL sys.rollback_to(table => 'paimon.my_db.my_table', tag => 'my_tag'); |
| ``` |
| |
| ### rollback_to_timestamp |
| |
| Rollback a table to a specific timestamp: |
| |
| ```sql |
| CALL sys.rollback_to_timestamp(table => 'paimon.my_db.my_table', timestamp => 1234567890000); |
| ``` |
| |
| ## Queries |
| |
| ### Basic Queries |
| |
| All DataFusion query capabilities are supported (JOINs, aggregations, subqueries, CTEs, etc.): |
| |
| ```sql |
| SELECT id, name FROM paimon.my_db.users WHERE id > 10 ORDER BY id LIMIT 100; |
| ``` |
| |
| ### Column Projection |
| |
| Only the required columns are read, reducing I/O: |
| |
| ```sql |
| SELECT name FROM paimon.my_db.users; |
| ``` |
| |
| ### Filter Pushdown |
| |
| The following filter predicates are pushed down to the Paimon storage layer: |
| |
| - Comparison: `=`, `!=`, `<`, `<=`, `>`, `>=` |
| - Logical: `AND`, `OR` |
| - Null checks: `IS NULL`, `IS NOT NULL` |
| - Range: `IN`, `NOT IN`, `BETWEEN` |
| |
| Filters on partition columns enable exact partition pruning, avoiding scans of irrelevant data. |
| |
| ### COUNT(*) Pushdown |
| |
| When the following conditions are met, `COUNT(*)` retrieves exact row counts directly from split metadata without a full table scan: |
| |
| - All splits have a known `merged_row_count` |
| - No LIMIT clause |
| - Filter predicates only involve partition columns (Exact level) |
| |
| ## Vector Search |
| |
| Paimon supports approximate nearest neighbor (ANN) vector search via the Lumina vector index. The `vector_search` table-valued function is registered as a UDTF on the DataFusion session context. |
| |
| ### Registration |
| |
| ```rust |
| use paimon_datafusion::register_vector_search; |
| |
| register_vector_search(&ctx, catalog.clone(), "default"); |
| ``` |
| |
| ### Usage |
| |
| ```sql |
| SELECT * FROM vector_search('table_name', 'column_name', 'query_vector_json', limit) |
| ``` |
| |
| | Argument | Type | Description | |
| |---|---|---| |
| | `table_name` | STRING | Table name, fully qualified (`catalog.db.table`) or short form | |
| | `column_name` | STRING | The vector column to search | |
| | `query_vector_json` | STRING | Query vector as a JSON array of floats | |
| | `limit` | INT | Maximum number of results (top-k) | |
| |
| Example: |
| |
| ```sql |
| SELECT * FROM vector_search('paimon.my_db.items', 'embedding', '[1.0, 0.0, 0.0, 0.0]', 10); |
| ``` |
| |
| The function performs ANN search across all Lumina vector index files for the target column, merges results, and returns the top-k rows ordered by relevance score. If no matching index is found, an empty result is returned. |
| |
| ### Supported Metrics |
| |
| The distance metric is configured at index creation time via table options: |
| |
| | Metric | Description | |
| |---|---| |
| | `inner_product` | Inner product (default) | |
| | `cosine` | Cosine similarity | |
| | `l2` | Euclidean (L2) distance | |
| |
| ### Vector Index Options |
| |
| Vector index behavior is configured via table options prefixed with `lumina.`: |
| |
| | Option | Description | |
| |---|---| |
| | `lumina.dimension` | Vector dimension | |
| | `lumina.metric` | Distance metric (`inner_product`, `cosine`, `l2`) | |
| | `lumina.index-type` | Index type (default: `diskann`) | |
| |
| ### Environment |
| |
| The Lumina native library must be available at runtime. Set the `LUMINA_LIB_PATH` environment variable to the path of the shared library, or place it in the platform default location. |
| |
| ## Full-Text Search |
| |
| Paimon supports full-text search via the Tantivy search engine. The `full_text_search` table-valued function is registered as a UDTF on the DataFusion session context. |
| |
| > **Note:** Full-text search requires the `fulltext` feature flag to be enabled on both `paimon` and `paimon-datafusion` crates. |
| |
| ```toml |
| [dependencies] |
| paimon = { version = "0.1.0", features = ["fulltext"] } |
| paimon-datafusion = { version = "0.1.0", features = ["fulltext"] } |
| ``` |
| |
| ### Registration |
| |
| ```rust |
| use paimon_datafusion::register_full_text_search; |
| |
| register_full_text_search(&ctx, catalog.clone(), "default"); |
| ``` |
| |
| ### Usage |
| |
| ```sql |
| SELECT * FROM full_text_search('table_name', 'column_name', 'query_text', limit) |
| ``` |
| |
| | Argument | Type | Description | |
| |---|---|---| |
| | `table_name` | STRING | Table name, fully qualified (`catalog.db.table`) or short form | |
| | `column_name` | STRING | The text column to search | |
| | `query_text` | STRING | Search query (Tantivy query syntax) | |
| | `limit` | INT | Maximum number of results (top-k) | |
| |
| Example: |
| |
| ```sql |
| SELECT * FROM full_text_search('paimon.my_db.docs', 'content', 'paimon search', 10); |
| ``` |
| |
| The function searches across all Tantivy full-text index files for the target column, merges results by relevance score, and returns the top-k matching rows. If no matching index is found, an empty result is returned. |
| |
| ## Time Travel |
| |
| Paimon supports time travel queries to read historical data. |
| |
| ### By Snapshot ID |
| |
| ```sql |
| SELECT * FROM paimon.default.my_table VERSION AS OF 1; |
| ``` |
| |
| ### By Tag Name |
| |
| Use a quoted tag name with `VERSION AS OF`: |
| |
| ```sql |
| SELECT * FROM paimon.default.my_table VERSION AS OF 'my_tag'; |
| ``` |
| |
| Resolution order: first checks if a tag with that name exists, then tries to parse it as a snapshot ID. |
| |
| ### By Timestamp |
| |
| Read data as of a specific point in time. The format is `YYYY-MM-DD HH:MM:SS`: |
| |
| ```sql |
| SELECT * FROM paimon.default.my_table TIMESTAMP AS OF '2024-01-01 00:00:00'; |
| ``` |
| |
| This finds the latest snapshot whose commit time is less than or equal to the given timestamp. The timestamp is interpreted in the local timezone. |
| |
| ## Dynamic Options (SET / RESET) |
| |
| Use `SET` to configure session-scoped Paimon dynamic options that apply to subsequent table loads: |
| |
| ```sql |
| -- Set an option |
| SET 'paimon.scan.version' = '1'; |
| |
| -- Reset an option |
| RESET 'paimon.scan.version'; |
| ``` |
| |
| Options prefixed with `paimon.` are handled by Paimon; all others are delegated to DataFusion. Dynamic options are applied at table load time via `table.copy_with_options()`. |
| |
| Example — enable BLOB descriptor mode: |
| |
| ```sql |
| SET 'paimon.blob-as-descriptor' = 'true'; |
| SELECT * FROM paimon.my_db.assets; |
| RESET 'paimon.blob-as-descriptor'; |
| ``` |
| |
| ## Temporary Tables |
| |
| You can register in-memory temporary tables under any catalog. Temporary tables exist only for the lifetime of the `SQLContext` instance and are automatically cleaned up when the context is dropped. |
| |
| The table name accepts flexible references, similar to DataFusion: |
| - `"my_table"` — uses the current catalog and current database |
| - `"database.my_table"` — uses the current catalog with the specified database |
| - `"catalog.database.my_table"` — fully qualified |
| |
| ### register_temp_table |
| |
| Register any `Arc<dyn TableProvider>` as a temporary table (including `MemTable`, `ViewTable`, custom providers, etc.): |
| |
| ```rust |
| use datafusion::arrow::array::Int32Array; |
| use datafusion::arrow::datatypes::{DataType as ArrowDataType, Field, Schema}; |
| use datafusion::arrow::record_batch::RecordBatch; |
| use datafusion::datasource::MemTable; |
| |
| let schema = Arc::new(Schema::new(vec![ |
| Field::new("id", ArrowDataType::Int32, false), |
| Field::new("name", ArrowDataType::Utf8, true), |
| ])); |
| let batch = RecordBatch::try_new( |
| schema.clone(), |
| vec![ |
| Arc::new(Int32Array::from(vec![1, 2, 3])), |
| Arc::new(StringArray::from(vec!["alice", "bob", "carol"])), |
| ], |
| )?; |
| |
| // Register a MemTable as a temp table |
| let mem_table = Arc::new(MemTable::try_new(schema.clone(), vec![vec![batch.clone()]])?); |
| ctx.register_temp_table("paimon.my_db.users", mem_table)?; |
| let df = ctx.sql("SELECT * FROM paimon.my_db.users WHERE id > 1").await?; |
| df.show().await?; |
| |
| // Register a ViewTable as a temp table |
| use datafusion::datasource::ViewTable; |
| let view_table = Arc::new(ViewTable::new(logical_plan, Some(query_sql))); |
| ctx.register_temp_table("paimon.my_db.my_view", view_table)?; |
| ``` |
| |
| ### CREATE TEMPORARY TABLE |
| |
| You can also create temporary tables directly from SQL. See the [DDL section](#create-temporary-table) for details. |
| |
| ```sql |
| CREATE TEMPORARY TABLE paimon.my_db.source (id INT, name STRING) AS SELECT * FROM (VALUES (1, 'alice'), (2, 'bob')) AS t(id, name); |
| ``` |
| |
| ### CREATE TEMPORARY VIEW |
| |
| Create a temporary view directly from SQL. See the [DDL section](#create-temporary-view) for details. |
| |
| ```sql |
| CREATE TEMPORARY VIEW paimon.my_db.active_users AS SELECT * FROM paimon.my_db.users WHERE id > 0; |
| ``` |
| |
| ### Deregister |
| |
| Use `deregister_temp_table` to remove a temporary table or view programmatically, or use the `DROP TEMPORARY TABLE` / `DROP TEMPORARY VIEW` SQL statements (see the [DDL section](#drop-temporary-table--drop-temporary-view)): |
| |
| ```rust |
| ctx.deregister_temp_table("paimon.my_db.users")?; |
| ``` |
| |
| Multiple temporary tables can share the same database — the database is created automatically on first use: |
| |
| ```rust |
| let mem_a = Arc::new(MemTable::try_new(schema_a, vec![vec![batch_a]])?); |
| let mem_b = Arc::new(MemTable::try_new(schema_b, vec![vec![batch_b]])?); |
| ctx.register_temp_table("my_db.table_a", mem_a)?; |
| ctx.register_temp_table("my_db.table_b", mem_b)?; |
| |
| // Join two temp tables |
| let df = ctx.sql("SELECT * FROM paimon.my_db.table_a JOIN paimon.my_db.table_b ON a.id = b.id").await?; |
| ``` |
| |
| ## System Tables |
| |
| Access table metadata via the `$` syntax. |
| |
| ### $options |
| |
| View all configuration options for a table: |
| |
| ```sql |
| SELECT key, value FROM paimon.default.my_table$options; |
| ``` |
| |
| Returns two columns: `key` (STRING) and `value` (STRING). |
| |
| ### $schemas |
| |
| View the schema history of a table: |
| |
| ```sql |
| SELECT * FROM paimon.default.my_table$schemas; |
| ``` |
| |
| Columns: |
| |
| | Column | Type | Description | |
| |---|---|---| |
| | `schema_id` | BIGINT | Schema ID | |
| | `fields` | STRING | Field definitions (JSON) | |
| | `partition_keys` | STRING | Partition keys (JSON) | |
| | `primary_keys` | STRING | Primary keys (JSON) | |
| | `options` | STRING | Table options (JSON) | |
| | `comment` | STRING | Comment | |
| | `update_time` | TIMESTAMP | Update time | |
| |
| ### $snapshots |
| |
| View the snapshot history of a table: |
| |
| ```sql |
| SELECT * FROM paimon.default.my_table$snapshots; |
| ``` |
| |
| Columns: |
| |
| | Column | Type | Description | |
| |---|---|---| |
| | `snapshot_id` | BIGINT | Snapshot ID | |
| | `schema_id` | BIGINT | Schema ID | |
| | `commit_user` | STRING | Commit user | |
| | `commit_identifier` | BIGINT | Commit identifier | |
| | `commit_kind` | STRING | `APPEND` / `COMPACT` / `OVERWRITE` / `ANALYZE` | |
| | `commit_time` | TIMESTAMP | Commit time | |
| | `base_manifest_list` | STRING | Base manifest list file | |
| | `delta_manifest_list` | STRING | Delta manifest list file | |
| | `changelog_manifest_list` | STRING | Changelog manifest list file | |
| | `total_record_count` | BIGINT | Total record count | |
| | `delta_record_count` | BIGINT | Delta record count | |
| | `changelog_record_count` | BIGINT | Changelog record count | |
| | `watermark` | BIGINT | Watermark | |
| | `next_row_id` | BIGINT | Next row id | |
| |
| ### $tags |
| |
| View all named tags of a table: |
| |
| ```sql |
| SELECT * FROM paimon.default.my_table$tags; |
| ``` |
| |
| Columns: |
| |
| | Column | Type | Description | |
| |---|---|---| |
| | `tag_name` | STRING | Tag name | |
| | `snapshot_id` | BIGINT | Snapshot ID | |
| | `schema_id` | BIGINT | Schema ID | |
| | `commit_time` | TIMESTAMP | Commit time | |
| | `record_count` | BIGINT | Record count | |
| | `create_time` | TIMESTAMP | Tag creation time | |
| | `time_retained` | STRING | Retention duration | |
| |
| ### $manifests |
| |
| View manifest files of the latest snapshot: |
| |
| ```sql |
| SELECT * FROM paimon.default.my_table$manifests; |
| ``` |
| |
| Columns: |
| |
| | Column | Type | Description | |
| |---|---|---| |
| | `file_name` | STRING | Manifest file name | |
| | `file_size` | BIGINT | File size in bytes | |
| | `num_added_files` | BIGINT | Number of added data files | |
| | `num_deleted_files` | BIGINT | Number of deleted data files | |
| | `schema_id` | BIGINT | Schema ID | |
| | `min_partition_stats` | STRING | Minimum partition stats, formatted as a Java row cast string | |
| | `max_partition_stats` | STRING | Maximum partition stats, formatted as a Java row cast string | |
| | `min_row_id` | BIGINT | Minimum row id covered (when row tracking is enabled) | |
| | `max_row_id` | BIGINT | Maximum row id covered (when row tracking is enabled) | |
| |
| ### Branch References |
| |
| System tables support branch syntax: |
| |
| ```sql |
| SELECT * FROM paimon.default.my_table$branch_main$options; |
| ``` |
| |
| ## Table Options |
| |
| Set via `WITH ('key' = 'value')` at table creation time, or dynamically via `SET`. |
| |
| ### Bucket Configuration |
| |
| | Option | Description | |
| |---|---| |
| | `'bucket' = 'N'` | Fixed N buckets (e.g. 1, 2, 4) | |
| | `'bucket' = '-1'` | Dynamic bucket mode (HASH index) | |
| | `'bucket' = '-2'` | Postpone bucket mode (deferred assignment) | |
| | `'bucket-key' = 'col'` | Explicit bucket key column | |
| |
| ### Merge Engine |
| |
| | Option | Description | |
| |---|---| |
| | `'merge-engine' = 'deduplicate'` | Deduplicate engine (default for PK tables), last write wins | |
| | `'merge-engine' = 'first-row'` | Keeps the first written row | |
| |
| ### Other Options |
| |
| | Option | Description | |
| |---|---| |
| | `'sequence.field' = 'col'` | Sequence field used to determine which record wins during deduplication | |
| | `'data-evolution.enabled' = 'true'` | Enable data evolution (partial-column writes, row-level UPDATE/MERGE) | |
| | `'deletion-vectors.enabled' = 'true'` | Enable deletion vectors | |
| | `'cross-partition-update.enabled' = 'true'` | Allow cross-partition updates | |
| | `'changelog-producer' = 'input'` | Changelog producer (PK tables with input mode reject writes) | |
| |
| ## Full Example |
| |
| ```rust |
| use std::sync::Arc; |
| use paimon::{CatalogOptions, FileSystemCatalog, Options}; |
| use paimon_datafusion::SQLContext; |
| |
| #[tokio::main] |
| async fn main() -> Result<(), Box<dyn std::error::Error>> { |
| // Create catalog |
| let mut options = Options::new(); |
| options.set(CatalogOptions::WAREHOUSE, "file:///tmp/paimon-warehouse"); |
| let catalog = Arc::new(FileSystemCatalog::new(options)?); |
| |
| // Create SQL context and register catalog |
| let mut ctx = SQLContext::new(); |
| ctx.register_catalog("paimon", catalog)?; |
| |
| // Create database and table |
| ctx.sql("CREATE SCHEMA paimon.my_db").await?; |
| ctx.sql( |
| "CREATE TABLE paimon.my_db.users ( |
| id INT NOT NULL, |
| name STRING, |
| PRIMARY KEY (id) |
| ) WITH ('bucket' = '1')" |
| ).await?; |
| |
| // Insert data |
| ctx.sql("INSERT INTO paimon.my_db.users VALUES (1, 'alice'), (2, 'bob')") |
| .await?.collect().await?; |
| |
| // Query |
| let df = ctx.sql("SELECT * FROM paimon.my_db.users ORDER BY id").await?; |
| df.show().await?; |
| |
| Ok(()) |
| } |
| ``` |