| --- |
| title: "Cpp API" |
| weight: 6 |
| type: docs |
| aliases: |
| - /api/cpp-api.html |
| --- |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| # Cpp API |
| |
| Paimon C++ is a high-performance C++ implementation of Apache Paimon. Paimon C++ aims to provide a native, |
| high-performance and extensible implementation that allows native engines to access the Paimon datalake |
| format with maximum efficiency. |
| |
| ## Environment Settings |
| |
| [Paimon C++](https://github.com/alibaba/paimon-cpp.git) is currently governed under Alibaba open source |
| community. You can checkout the [document](https://alibaba.github.io/paimon-cpp/getting_started.html) |
| for more details about envinroment settings. |
| |
| ```sh |
| git clone https://github.com/alibaba/paimon-cpp.git |
| cd paimon-cpp |
| mkdir build-release |
| cd build-release |
| cmake .. |
| make -j8 # if you have 8 CPU cores, otherwise adjust |
| make install |
| ``` |
| |
| ## Create Catalog |
| |
| Before coming into contact with the Table, you need to create a Catalog. |
| |
| ```c++ |
| #include "paimon/catalog/catalog.h" |
| |
| // Note that keys and values are all string |
| std::map<std::string, std::string> options; |
| PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::Catalog> catalog, |
| paimon::Catalog::Create(root_path, options)); |
| ``` |
| |
| Current C++ Paimon only supports filesystem catalog. In the future, we will support REST catalog. |
| See [Catalog]({{< ref "concepts/catalog" >}}). |
| |
| You can use the catalog to create table for writing data. |
| |
| ## Create Database |
| |
| Table is located in a database. If you want to create table in a new database, you should create it. |
| |
| ```c++ |
| PAIMON_RETURN_NOT_OK(catalog->CreateDatabase('database_name', options, /*ignore_if_exists=*/false)); |
| ``` |
| |
| ## Create Table |
| |
| Table schema contains fields definition, partition keys, primary keys, table options. |
| The field definition is described by `Arrow::Schema`. All arguments except fields definition are optional. |
| |
| for example: |
| |
| ```c++ |
| arrow::FieldVector fields = { |
| arrow::field("f0", arrow::utf8()), |
| arrow::field("f1", arrow::int32()), |
| arrow::field("f2", arrow::int32()), |
| arrow::field("f3", arrow::float64()), |
| }; |
| std::shared_ptr<arrow::Schema> schema = arrow::schema(fields); |
| ::ArrowSchema arrow_schema; |
| arrow::Status arrow_status = arrow::ExportSchema(*schema, &arrow_schema); |
| if (!arrow_status.ok()) { |
| return paimon::Status::Invalid(arrow_status.message()); |
| } |
| PAIMON_RETURN_NOT_OK(catalog->CreateTable(paimon::Identifier(db_name, table_name), |
| &arrow_schema, |
| /*partition_keys=*/{}, |
| /*primary_keys=*/{}, options, |
| /*ignore_if_exists=*/false)); |
| ``` |
| |
| See [Data Types](https://alibaba.github.io/paimon-cpp/user_guide/data_types.html) for all supported |
| `arrow-to-paimon` data types mapping. |
| |
| ## Batch Write |
| |
| Paimon table write is Two-Phase Commit, you can write many times, but once committed, no more data can be written. |
| C++ Paimon uses Apache Arrow as [in-memory format], check out [document](https://alibaba.github.io/paimon-cpp/user_guide/arrow.html) |
| for more details. |
| |
| for example: |
| ```c++ |
| arrow::Result<std::shared_ptr<arrow::StructArray>> PrepareData(const arrow::FieldVector& fields) { |
| arrow::StringBuilder f0_builder; |
| arrow::Int32Builder f1_builder; |
| arrow::Int32Builder f2_builder; |
| arrow::DoubleBuilder f3_builder; |
| |
| std::vector<std::tuple<std::string, int, int, double>> data = { |
| {"Alice", 1, 0, 11.0}, {"Bob", 1, 1, 12.1}, {"Cathy", 1, 2, 13.2}}; |
| |
| for (const auto& row : data) { |
| ARROW_RETURN_NOT_OK(f0_builder.Append(std::get<0>(row))); |
| ARROW_RETURN_NOT_OK(f1_builder.Append(std::get<1>(row))); |
| ARROW_RETURN_NOT_OK(f2_builder.Append(std::get<2>(row))); |
| ARROW_RETURN_NOT_OK(f3_builder.Append(std::get<3>(row))); |
| } |
| |
| std::shared_ptr<arrow::Array> f0_array, f1_array, f2_array, f3_array; |
| ARROW_RETURN_NOT_OK(f0_builder.Finish(&f0_array)); |
| ARROW_RETURN_NOT_OK(f1_builder.Finish(&f1_array)); |
| ARROW_RETURN_NOT_OK(f2_builder.Finish(&f2_array)); |
| ARROW_RETURN_NOT_OK(f3_builder.Finish(&f3_array)); |
| |
| std::vector<std::shared_ptr<arrow::Array>> children = {f0_array, f1_array, f2_array, f3_array}; |
| auto struct_type = arrow::struct_(fields); |
| return std::make_shared<arrow::StructArray>(struct_type, f0_array->length(), children); |
| } |
| ``` |
| |
| ```c++ |
| std::string table_path = root_path + "/" + db_name + ".db/" + table_name; |
| std::string commit_user = "some_commit_user"; |
| // write |
| paimon::WriteContextBuilder context_builder(table_path, commit_user); |
| PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::WriteContext> write_context, |
| context_builder.SetOptions(options).Finish()); |
| PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::FileStoreWrite> writer, |
| paimon::FileStoreWrite::Create(std::move(write_context))); |
| // prepare data |
| auto struct_array = PrepareData(fields); |
| if (!struct_array.ok()) { |
| return paimon::Status::Invalid(struct_array.status().ToString()); |
| } |
| ::ArrowArray arrow_array; |
| arrow_status = arrow::ExportArray(*struct_array.ValueUnsafe(), &arrow_array); |
| if (!arrow_status.ok()) { |
| return paimon::Status::Invalid(arrow_status.message()); |
| } |
| paimon::RecordBatchBuilder batch_builder(&arrow_array); |
| PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::RecordBatch> record_batch, |
| batch_builder.Finish()); |
| PAIMON_RETURN_NOT_OK(writer->Write(std::move(record_batch))); |
| PAIMON_ASSIGN_OR_RAISE(std::vector<std::shared_ptr<paimon::CommitMessage>> commit_message, |
| writer->PrepareCommit()); |
| |
| // commit |
| paimon::CommitContextBuilder commit_context_builder(table_path, commit_user); |
| PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::CommitContext> commit_context, |
| commit_context_builder.SetOptions(options).Finish()); |
| PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::FileStoreCommit> committer, |
| paimon::FileStoreCommit::Create(std::move(commit_context))); |
| PAIMON_RETURN_NOT_OK(committer->Commit(commit_message)); |
| ``` |
| |
| ## Batch Read |
| |
| ### Predicate pushdown |
| |
| A `ReadContextBuilder` is used to pass context to reader, push down and filter is done by reader. |
| |
| ```c++ |
| ReadContextBuilder read_context_builder(table_path); |
| ``` |
| |
| You can use `PredicateBuilder` to build filters and pushdown them by `ReadContextBuilder`: |
| |
| ```c++ |
| # Example filter: 'f3' > 12.0 OR 'f1' == 1 |
| PAIMON_ASSIGN_OR_RAISE( |
| auto predicate, |
| PredicateBuilder::Or( |
| {PredicateBuilder::GreaterThan(/*field_index=*/3, /*field_name=*/"f3", |
| FieldType::DOUBLE, Literal(static_cast<double>(12.0))), |
| PredicateBuilder::Equal(/*field_index=*/1, /*field_name=*/"f1", FieldType::INT, |
| Literal(1))})); |
| ReadContextBuilder read_context_builder(table_path); |
| read_context_builder.SetPredicate(predicate).EnablePredicateFilter(true); |
| ``` |
| |
| You can also pushdown projection by `ReadContextBuilder`: |
| |
| ```c++ |
| # select f3 and f2 columns |
| read_context_builder.SetReadSchema({"f3", "f1", "f2"}); |
| ``` |
| |
| ### Generate Splits |
| |
| Then you can step into Scan Plan stage to get `splits`: |
| |
| ```c++ |
| // scan |
| paimon::ScanContextBuilder scan_context_builder(table_path); |
| PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::ScanContext> scan_context, |
| scan_context_builder.SetOptions(options).Finish()); |
| PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::TableScan> scanner, |
| paimon::TableScan::Create(std::move(scan_context))); |
| PAIMON_ASSIGN_OR_RAISE(std::shared_ptr<paimon::Plan> plan, scanner->CreatePlan()); |
| auto splits = plan->Splits(); |
| ``` |
| |
| Finally, you can read data from the `splits` to arrow format. |
| |
| ### Read Apache Arrow |
| |
| This requires `C++ Arrow` to be installed. |
| |
| ```c++ |
| PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::ReadContext> read_context, |
| read_context_builder.SetOptions(options).Finish()); |
| PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::TableRead> table_read, |
| paimon::TableRead::Create(std::move(read_context))); |
| PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::BatchReader> batch_reader, |
| table_read->CreateReader(splits)); |
| arrow::ArrayVector result_array_vector; |
| while (true) { |
| PAIMON_ASSIGN_OR_RAISE(paimon::BatchReader::ReadBatch batch, batch_reader->NextBatch()); |
| if (paimon::BatchReader::IsEofBatch(batch)) { |
| break; |
| } |
| auto& [c_array, c_schema] = batch; |
| auto arrow_result = arrow::ImportArray(c_array.get(), c_schema.get()); |
| if (!arrow_result.ok()) { |
| return paimon::Status::Invalid(arrow_result.status().ToString()); |
| } |
| auto result_array = arrow_result.ValueUnsafe(); |
| result_array_vector.push_back(result_array); |
| } |
| auto chunk_result = arrow::ChunkedArray::Make(result_array_vector); |
| if (!chunk_result.ok()) { |
| return paimon::Status::Invalid(chunk_result.status().ToString()); |
| } |
| ``` |
| |
| ## Documentation |
| |
| For more information, See [C++ Paimon Documentation](https://alibaba.github.io/paimon-cpp/index.html). |