Paimon C++ is a high-performance C++ implementation of Apache Paimon. Paimon C++ aims to provide a native, high-performance and extensible implementation that allows native engines to access the Paimon datalake format with maximum efficiency.
Writing is divided into two stages: first write record batches and prepare the commit messages, then commit those messages to the table:
    std::string table_path = "/tmp/paimon/my.db/test_table/";

    // Stage 1: write record batches and prepare commit messages
    WriteContextBuilder context_builder(table_path, "commit_user");
    PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<WriteContext> write_context,
                           context_builder.AddOption(Options::TARGET_FILE_SIZE, "1024mb")
                               .AddOption(Options::FILE_SYSTEM, "local")
                               .Finish());
    PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<FileStoreWrite> file_store_write,
                           FileStoreWrite::Create(std::move(write_context)));

    ::ArrowArray arrow_array;
    // prepare your arrow array
    // ...
    RecordBatchBuilder batch_builder(&arrow_array);
    batch_builder.SetPartition({{"col1", "20240813"}, {"col2", "23"}}).SetBucket(1);
    PAIMON_ASSIGN_OR_RAISE(std::shared_ptr<RecordBatch> batch, batch_builder.Finish());
    PAIMON_RETURN_NOT_OK(file_store_write->Write(batch));
    PAIMON_ASSIGN_OR_RAISE(std::vector<std::shared_ptr<CommitMessage>> commit_messages,
                           file_store_write->PrepareCommit());

    // Stage 2: commit the prepared messages
    CommitContextBuilder commit_context_builder(table_path, "commit_user");
    PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<CommitContext> commit_context,
                           commit_context_builder.AddOption(Options::MANIFEST_TARGET_FILE_SIZE, "8mb")
                               .AddOption(Options::FILE_SYSTEM, "local")
                               .IgnoreEmptyCommit(false)
                               .Finish());
    PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<FileStoreCommit> commit,
                           FileStoreCommit::Create(std::move(commit_context)));
    PAIMON_RETURN_NOT_OK(commit->Commit(commit_messages));
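To make the split between the two write stages concrete, here is a minimal, self-contained sketch of the pattern. It is not the Paimon C++ implementation: `DemoCommitMessage`, `DemoWriter`, and `DemoCommitter` are hypothetical stand-ins, the file naming is invented, and the idea that a commit makes all files visible together under a new snapshot id is an assumption used for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; none of these types are part of the Paimon C++ API.
struct DemoCommitMessage {
    std::string partition;  // partition the data file belongs to
    int32_t bucket;         // bucket the data file belongs to
    std::string file_name;  // data file produced in stage one (invented naming scheme)
};

// Stage one: buffers batches and turns them into data files plus commit messages.
class DemoWriter {
 public:
    void Write(const std::string& partition, int32_t bucket, int64_t num_rows) {
        assert(bucket >= 0);  // this sketch only accepts non-negative buckets
        pending_.push_back({partition, bucket, num_rows});
    }

    // Flushes the buffered batches. Nothing is visible to readers yet.
    std::vector<DemoCommitMessage> PrepareCommit() {
        std::vector<DemoCommitMessage> messages;
        for (const auto& p : pending_) {
            messages.push_back(
                {p.partition, p.bucket, "data-" + std::to_string(next_file_id_++) + ".bin"});
        }
        pending_.clear();
        return messages;
    }

 private:
    struct Pending {
        std::string partition;
        int32_t bucket;
        int64_t num_rows;
    };
    std::vector<Pending> pending_;
    int64_t next_file_id_ = 0;
};

// Stage two: publishes all prepared messages together as one new snapshot.
class DemoCommitter {
 public:
    // Returns the id of the snapshot created, or -1 when there is nothing to commit.
    int64_t Commit(const std::vector<DemoCommitMessage>& messages) {
        if (messages.empty()) {
            return -1;
        }
        visible_files_.insert(visible_files_.end(), messages.begin(), messages.end());
        return ++latest_snapshot_id_;
    }

    std::size_t VisibleFileCount() const { return visible_files_.size(); }

 private:
    std::vector<DemoCommitMessage> visible_files_;
    int64_t latest_snapshot_id_ = 0;
};
```

In this sketch, calling `Write` twice and then `PrepareCommit` yields two messages while `VisibleFileCount()` stays at zero; only `Commit` makes both files visible at once. Keeping the stages separate means the writer and the committer can be different objects, which mirrors the separate `FileStoreWrite` and `FileStoreCommit` objects in the example above.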
Reading is divided into two stages: first scan the table to create a plan, then read the splits in that plan:
    std::string table_path = "/tmp/paimon/my.db/test_table/";

    // Stage 1: scan the table and create a plan
    ScanContextBuilder context_builder(table_path);
    // prepare predicate if needed
    std::shared_ptr<Predicate> predicate =
        PredicateBuilder::GreaterThan(/*field_index=*/0, /*field_name=*/"f0",
                                      /*field_type=*/FieldType::INT, Literal(10));
    PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<ScanContext> scan_context,
                           context_builder.SetPredicate(predicate)
                               .AddOption(Options::SCAN_SNAPSHOT_ID, "2")
                               .AddOption(Options::FILE_SYSTEM, "local")
                               .Finish());
    PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<TableScan> table_scan,
                           TableScan::Create(std::move(scan_context)));
    PAIMON_ASSIGN_OR_RAISE(std::shared_ptr<Plan> plan, table_scan->CreatePlan());

    // Stage 2: read the splits in the plan
    ReadContextBuilder read_context_builder(table_path, /*schema_id=*/0);
    PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<ReadContext> read_context,
                           read_context_builder.SetReadSchema({"f0", "f1"})
                               .SetPredicate(predicate)
                               .AddOption(Options::FILE_SYSTEM, "local")
                               .EnablePrefetch(true)
                               .Finish());
    PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<TableRead> table_read,
                           TableRead::Create(std::move(read_context)));
    PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<BatchReader> batch_reader,
                           table_read->CreateReader(plan->Splits()));

    while (true) {
        PAIMON_ASSIGN_OR_RAISE(BatchReader::ReadBatch read_batch, batch_reader->NextBatch());
        if (BatchReader::IsEofBatch(read_batch)) {
            break;
        }
        auto& [c_array, c_schema] = read_batch;
        // process the arrow array
        auto arrow_result = arrow::ImportArray(c_array.get(), c_schema.get());
    }
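The plan-then-read flow can be sketched without the library as follows. This is an illustration under stated assumptions, not the Paimon C++ API: `DemoSplit`, `DemoCreatePlan`, `DemoBatchReader`, and `DemoReadAll` are hypothetical names, the per-file maximum used to skip files during planning is an assumed statistic that the example above does not describe, and `std::nullopt` stands in for the EOF batch that `BatchReader::IsEofBatch` detects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not the Paimon C++ API.
struct DemoSplit {
    int32_t max_f0;           // largest f0 value in the file (assumed statistic)
    std::vector<int32_t> f0;  // f0 values stored in the file
};

// Stage one: planning keeps only the splits that may contain rows with f0 > bound.
inline std::vector<DemoSplit> DemoCreatePlan(const std::vector<DemoSplit>& files, int32_t bound) {
    std::vector<DemoSplit> plan;
    for (const auto& split : files) {
        if (split.max_f0 > bound) {
            plan.push_back(split);
        }
    }
    return plan;
}

// Stage two: the reader yields one filtered batch per split, then std::nullopt as EOF.
class DemoBatchReader {
 public:
    DemoBatchReader(std::vector<DemoSplit> splits, int32_t bound)
        : splits_(std::move(splits)), bound_(bound) {}

    std::optional<std::vector<int32_t>> NextBatch() {
        if (next_ >= splits_.size()) {
            return std::nullopt;  // EOF marker in this sketch
        }
        std::vector<int32_t> batch;
        for (int32_t value : splits_[next_].f0) {
            if (value > bound_) {
                batch.push_back(value);
            }
        }
        ++next_;
        return batch;
    }

 private:
    std::vector<DemoSplit> splits_;
    int32_t bound_;
    std::size_t next_ = 0;
};

// Drains the reader with the same loop shape as the example above: read until EOF.
inline std::vector<int32_t> DemoReadAll(const std::vector<DemoSplit>& plan, int32_t bound) {
    DemoBatchReader reader(plan, bound);
    std::vector<int32_t> rows;
    while (true) {
        std::optional<std::vector<int32_t>> batch = reader.NextBatch();
        if (!batch.has_value()) {
            break;
        }
        for (int32_t value : *batch) {
            assert(value > bound);  // every returned row satisfies the predicate
            rows.push_back(value);
        }
    }
    return rows;
}
```

The sketch applies the predicate in both stages: planning uses it to skip whole files, and the reader uses it again to filter rows. That is one plausible reason the example above passes the same `predicate` to both `ScanContextBuilder` and `ReadContextBuilder`.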
    $ mkdir build
    $ cd build
    $ cmake ..
    $ make
Paimon-cpp is an active open-source project and we welcome people who want to contribute or share good ideas! Before contributing, you are encouraged to check out our documentation.
If you have suggestions, feedback, want to report a bug or request a feature, please open an issue. Pull requests are also very welcome!
We value respectful and open collaboration, and appreciate everyone who helps make paimon-cpp better. Thank you for your support!
Install the Python package pre-commit, then run pre-commit install once.

    pip install pre-commit
    pre-commit install

This sets up a Git pre-commit hook that runs on each commit and reports any linting problems. To run all hooks on all files, use pre-commit run -a.
We provide Dev Container configuration file templates.
To use a Dev Container as your development environment, follow the steps below, then select Dev Containers: Reopen in Container from VS Code's Command Palette.
    cd .devcontainer
    cp Dockerfile.template Dockerfile
    cp devcontainer.json.template devcontainer.json
If you make improvements that could benefit all developers, please update the template files and submit a pull request.
Licensed under the Apache License, Version 2.0