<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Getting Started
## Installation
Add `paimon` to your `Cargo.toml`:
```toml
[dependencies]
paimon = "0.0.0"
tokio = { version = "1", features = ["full"] }
```
By default, the `storage-fs` (local filesystem) and `storage-memory` (in-memory) backends are enabled. To use additional storage backends, enable the corresponding feature flags:
```toml
[dependencies]
paimon = { version = "0.0.0", features = ["storage-s3"] }
```
Available storage features:
| Feature          | Backend           |
|------------------|-------------------|
| `storage-fs`     | Local filesystem  |
| `storage-memory` | In-memory         |
| `storage-s3`     | Amazon S3         |
| `storage-oss`    | Alibaba Cloud OSS |
| `storage-all`    | All of the above  |
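If the default backends are not needed, Cargo's standard `default-features` key can in principle switch them off. The fragment below is a sketch, assuming `storage-fs` and `storage-memory` are ordinary default features and that the crate builds without them. It enables only the S3 and OSS backends.

```toml
[dependencies]
# Assumption: the default backends are plain Cargo default features,
# so `default-features = false` drops `storage-fs` and `storage-memory`.
paimon = { version = "0.0.0", default-features = false, features = ["storage-s3", "storage-oss"] }
```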
## Catalog Management
Paimon supports multiple catalog types. The `CatalogFactory` provides a unified way to create catalogs based on configuration options.
### Create a Catalog
The `CatalogFactory` selects the catalog type from the `metastore` option. When the option is not set, it creates a `filesystem` catalog:
```rust
use paimon::{CatalogFactory, CatalogOptions, Options};
// Local filesystem (no credentials needed)
let mut options = Options::new();
options.set(CatalogOptions::WAREHOUSE, "/path/to/warehouse");
let catalog = CatalogFactory::create(options).await?;
// Amazon S3
let mut options = Options::new();
options.set(CatalogOptions::WAREHOUSE, "s3://bucket/warehouse");
options.set("s3.access-key-id", "your-access-key-id");
options.set("s3.secret-access-key", "your-secret-access-key");
options.set("s3.region", "us-east-1");
let catalog = CatalogFactory::create(options).await?;
// Alibaba Cloud OSS
let mut options = Options::new();
options.set(CatalogOptions::WAREHOUSE, "oss://bucket/warehouse");
options.set("fs.oss.accessKeyId", "your-access-key-id");
options.set("fs.oss.accessKeySecret", "your-access-key-secret");
options.set("fs.oss.endpoint", "oss-cn-hangzhou.aliyuncs.com");
let catalog = CatalogFactory::create(options).await?;
// REST catalog
let mut options = Options::new();
options.set(CatalogOptions::METASTORE, "rest");
options.set(CatalogOptions::URI, "http://localhost:8080");
options.set(CatalogOptions::WAREHOUSE, "my_warehouse");
let catalog = CatalogFactory::create(options).await?;
```
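The S3 example above hardcodes placeholder credentials. A minimal sketch of reading them from the environment instead might look like the following. The helpers `s3_options_from` and `s3_options_from_env` are hypothetical and not part of the `paimon` crate, and the environment variable names follow the usual AWS convention as an assumption. Only the `s3.*` option keys are taken from the example.

```rust
/// Option keys used in the S3 example, paired with the environment variable
/// each one is read from. The variable names follow the usual AWS convention
/// and are an assumption of this sketch.
const S3_ENV_MAPPING: [(&str, &str); 3] = [
    ("s3.access-key-id", "AWS_ACCESS_KEY_ID"),
    ("s3.secret-access-key", "AWS_SECRET_ACCESS_KEY"),
    ("s3.region", "AWS_REGION"),
];

/// Hypothetical helper (not part of the paimon crate): resolves every S3
/// option through `lookup` and fails on the first missing variable.
pub fn s3_options_from<F>(lookup: F) -> Result<Vec<(&'static str, String)>, String>
where
    F: Fn(&str) -> Option<String>,
{
    S3_ENV_MAPPING
        .iter()
        .map(|&(key, var)| {
            lookup(var)
                .map(|value| (key, value))
                .ok_or_else(|| format!("missing environment variable {var}"))
        })
        .collect()
}

/// Convenience wrapper that reads from the real process environment.
pub fn s3_options_from_env() -> Result<Vec<(&'static str, String)>, String> {
    s3_options_from(|var| std::env::var(var).ok())
}
```

Each returned pair could then be applied with `options.set`, in the same way as the literal strings in the S3 example. Taking the lookup as a closure keeps the helper testable without touching the process environment.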
Supported metastore types:
| Metastore Type | Description                          |
|----------------|--------------------------------------|
| `filesystem`   | Local or remote filesystem (default) |
| `rest`         | REST catalog server                  |
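To make the dispatch rule concrete, here is a standalone sketch of how a factory could map the `metastore` option to one of the catalog kinds in the table. It is an illustration only, not the crate's actual `CatalogFactory` code: `CatalogKind` and `resolve_catalog_kind` are hypothetical names, and the options are modelled as a plain `HashMap`.

```rust
use std::collections::HashMap;

/// Catalog kinds listed in the table above (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum CatalogKind {
    Filesystem,
    Rest,
}

/// Hypothetical illustration of dispatching on the `metastore` option.
/// This is not the crate's real factory logic.
pub fn resolve_catalog_kind(options: &HashMap<String, String>) -> Result<CatalogKind, String> {
    match options.get("metastore").map(String::as_str) {
        // No explicit metastore: fall back to the documented default.
        None | Some("filesystem") => Ok(CatalogKind::Filesystem),
        Some("rest") => Ok(CatalogKind::Rest),
        // Anything else is rejected rather than silently defaulted.
        Some(other) => Err(format!("unsupported metastore type: {other}")),
    }
}
```

Rejecting unknown values instead of falling back to the default is a design choice of this sketch. It surfaces typos in the option value early.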
### Manage Databases
```rust
use paimon::Catalog; // import the trait
use std::collections::HashMap;
// Create a database (arguments: name, ignore_if_exists, properties)
catalog.create_database("my_db", false, HashMap::new()).await?;
// List databases
let databases = catalog.list_databases().await?;
// Drop a database (arguments: name, ignore_if_not_exists, cascade);
// cascade = true also drops all tables inside
catalog.drop_database("my_db", false, true).await?;
```
### Manage Tables
```rust
use paimon::catalog::Identifier;
use paimon::spec::{DataType, IntType, VarCharType, Schema};
// Define a schema
let schema = Schema::builder()
    .column("id", DataType::Int(IntType::new()))
    .column("name", DataType::VarChar(VarCharType::string_type()))
    .build()?;
// Create a table (third argument: ignore_if_exists)
let identifier = Identifier::new("my_db", "my_table");
catalog.create_table(&identifier, schema, false).await?;
// List tables in a database
let tables = catalog.list_tables("my_db").await?;
// Get a table handle
let table = catalog.get_table(&identifier).await?;
```
## Reading a Table
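Paimon Rust uses a scan-then-read pattern: first scan the table to produce splits, then read data from those splits as Arrow `RecordBatch` streams. The example below uses `StreamExt` from the `futures` crate, so `futures` must also be listed in `Cargo.toml`.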
```rust
use futures::StreamExt;
use paimon::catalog::Identifier;
// Get a table from the catalog
let table = catalog.get_table(&Identifier::new("my_db", "my_table")).await?;
// Create a read builder
let read_builder = table.new_read_builder();
// Step 1: Scan — produces a Plan containing DataSplits
let plan = {
    let scan = read_builder.new_scan();
    scan.plan().await?
};
// Step 2: Read — consumes splits and returns Arrow RecordBatches
let reader = read_builder.new_read()?;
let mut stream = reader.to_arrow(plan.splits())?;
while let Some(batch) = stream.next().await {
    let batch = batch?;
    println!("RecordBatch: {batch:#?}");
}
```
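The separation between the two steps can be illustrated with a small standard-library-only model. This is a conceptual sketch, not the crate's types: `Split`, `Plan`, `plan`, and `read` are hypothetical stand-ins, and rows are modelled as plain integers rather than Arrow batches.

```rust
/// A unit of work produced by planning; stands in for a data split.
#[derive(Debug, Clone, PartialEq)]
pub struct Split {
    pub rows: Vec<i32>,
}

/// Result of the scan step: the list of splits to read.
pub struct Plan {
    splits: Vec<Split>,
}

impl Plan {
    pub fn splits(&self) -> &[Split] {
        &self.splits
    }
}

/// Step 1: planning only decides what to read. Here it groups row ids
/// into fixed-size splits. `rows_per_split` must be non-zero.
pub fn plan(row_ids: &[i32], rows_per_split: usize) -> Plan {
    let splits = row_ids
        .chunks(rows_per_split)
        .map(|chunk| Split { rows: chunk.to_vec() })
        .collect();
    Plan { splits }
}

/// Step 2: reading lazily turns each split into one batch of rows.
pub fn read<'a>(splits: &'a [Split]) -> impl Iterator<Item = Vec<i32>> + 'a {
    splits.iter().map(|split| split.rows.clone())
}
```

Because the plan is plain data, the splits could be inspected, filtered, or handed to different workers before any reading happens, which is the usual motivation for keeping the two steps apart.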
## Building from Source
```bash
git clone https://github.com/apache/paimon-rust.git
cd paimon-rust
cargo build
```
## Running Tests
```bash
# Unit tests
cargo test
# Integration tests (requires Docker)
make docker-up
cargo test -p integration_tests
make docker-down
```