| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| # Architecture |
| |
| ## Overview |
| |
| Apache Paimon Rust is organized as a Cargo workspace with multiple crates, each responsible for a distinct layer of functionality. |
| |
| ## Crate Structure |
| |
| ### `crates/paimon` — Core Library |
| |
| The core crate implements the Paimon table format, including: |
| |
| - **Catalog** — Catalog client for discovering and managing databases and tables |
| - **Table** — Table abstraction for reading Paimon tables |
| - **Snapshot & Manifest** — Reading snapshot and manifest metadata |
| - **Schema** — Table schema management and evolution |
| - **File IO** — Abstraction layer for storage backends (local filesystem, S3) |
| - **File Format** — Parquet file reading and writing via Apache Arrow |
| |
| ### `crates/integrations/datafusion` — DataFusion Integration |
| |
| Provides a `TableProvider` implementation that allows querying Paimon tables using [Apache DataFusion](https://datafusion.apache.org/)'s SQL engine. |
| |
| ## Data Model |
| |
| Paimon organizes data in a layered structure: |
| |
| ``` |
| Catalog |
| └── Database |
| └── Table |
| ├── Schema |
| └── Snapshot |
| └── Manifest |
| └── Data Files (Parquet) |
| ``` |
| |
| - **Catalog** manages databases and tables, accessed via REST API |
| - **Snapshot** represents a consistent view of a table at a point in time |
| - **Manifest** lists the data files that belong to a snapshot |
| - **Data Files** store the actual data in Parquet format |