---
title: "Cpp API"
weight: 6
type: docs
aliases:
- /api/cpp-api.html
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Cpp API
Paimon C++ is a high-performance C++ implementation of Apache Paimon. Paimon C++ aims to provide a native,
high-performance and extensible implementation that allows native engines to access the Paimon datalake
format with maximum efficiency.
## Environment Settings
[Paimon C++](https://github.com/alibaba/paimon-cpp.git) is currently governed under the Alibaba open source
community. You can check out the [documentation](https://alibaba.github.io/paimon-cpp/getting_started.html)
for more details about environment settings.
```sh
git clone https://github.com/alibaba/paimon-cpp.git
cd paimon-cpp
mkdir build-release
cd build-release
cmake ..
make -j8 # if you have 8 CPU cores, otherwise adjust
make install
```
## Create Catalog
Before working with a table, you need to create a catalog.
```c++
#include "paimon/catalog/catalog.h"
// Note that keys and values are all string
std::map<std::string, std::string> options;
PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::Catalog> catalog,
paimon::Catalog::Create(root_path, options));
```
C++ Paimon currently supports only the filesystem catalog. REST catalog support is planned for the future.
See [Catalog]({{< ref "concepts/catalog" >}}).
You can use the catalog to create a table for writing data.
## Create Database
A table is located in a database. If you want to create a table in a new database, create the database first.
```c++
PAIMON_RETURN_NOT_OK(catalog->CreateDatabase("database_name", options, /*ignore_if_exists=*/false));
```
## Create Table
A table schema contains field definitions, partition keys, primary keys and table options.
The field definitions are described by `arrow::Schema`. All arguments except the field definitions are optional.
For example:
```c++
arrow::FieldVector fields = {
arrow::field("f0", arrow::utf8()),
arrow::field("f1", arrow::int32()),
arrow::field("f2", arrow::int32()),
arrow::field("f3", arrow::float64()),
};
std::shared_ptr<arrow::Schema> schema = arrow::schema(fields);
::ArrowSchema arrow_schema;
arrow::Status arrow_status = arrow::ExportSchema(*schema, &arrow_schema);
if (!arrow_status.ok()) {
return paimon::Status::Invalid(arrow_status.message());
}
PAIMON_RETURN_NOT_OK(catalog->CreateTable(paimon::Identifier(db_name, table_name),
&arrow_schema,
/*partition_keys=*/{},
/*primary_keys=*/{}, options,
/*ignore_if_exists=*/false));
```
See [Data Types](https://alibaba.github.io/paimon-cpp/user_guide/data_types.html) for all supported
`arrow-to-paimon` data types mapping.
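
The write, scan and read examples on this page address a table by a filesystem path of the form `root_path + "/" + db_name + ".db/" + table_name`. As a minimal sketch, that concatenation can be wrapped in a small helper. `MakeTablePath` is a hypothetical name introduced here for illustration and is not part of paimon-cpp; the trailing-slash handling is likewise an assumption.

```c++
#include <cassert>
#include <string>

// Hypothetical helper (not a paimon-cpp API): builds the table path in the
// form used by the examples on this page: <root_path>/<db_name>.db/<table_name>.
std::string MakeTablePath(const std::string& root_path, const std::string& db_name,
                          const std::string& table_name) {
    assert(!db_name.empty() && !table_name.empty());
    std::string path = root_path;
    // Avoid a doubled separator when root_path already ends with '/'.
    if (!path.empty() && path.back() != '/') {
        path += '/';
    }
    return path + db_name + ".db/" + table_name;
}
```

For instance, `MakeTablePath("/tmp/warehouse", "my_db", "my_table")` yields `/tmp/warehouse/my_db.db/my_table`.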
## Batch Write
Writing to a Paimon table is a two-phase commit: you can write many times, but once the write is committed, no more data can be written.
C++ Paimon uses Apache Arrow as its in-memory format. Check out the [document](https://alibaba.github.io/paimon-cpp/user_guide/arrow.html)
for more details.
For example:
```c++
arrow::Result<std::shared_ptr<arrow::StructArray>> PrepareData(const arrow::FieldVector& fields) {
arrow::StringBuilder f0_builder;
arrow::Int32Builder f1_builder;
arrow::Int32Builder f2_builder;
arrow::DoubleBuilder f3_builder;
std::vector<std::tuple<std::string, int, int, double>> data = {
{"Alice", 1, 0, 11.0}, {"Bob", 1, 1, 12.1}, {"Cathy", 1, 2, 13.2}};
for (const auto& row : data) {
ARROW_RETURN_NOT_OK(f0_builder.Append(std::get<0>(row)));
ARROW_RETURN_NOT_OK(f1_builder.Append(std::get<1>(row)));
ARROW_RETURN_NOT_OK(f2_builder.Append(std::get<2>(row)));
ARROW_RETURN_NOT_OK(f3_builder.Append(std::get<3>(row)));
}
std::shared_ptr<arrow::Array> f0_array, f1_array, f2_array, f3_array;
ARROW_RETURN_NOT_OK(f0_builder.Finish(&f0_array));
ARROW_RETURN_NOT_OK(f1_builder.Finish(&f1_array));
ARROW_RETURN_NOT_OK(f2_builder.Finish(&f2_array));
ARROW_RETURN_NOT_OK(f3_builder.Finish(&f3_array));
std::vector<std::shared_ptr<arrow::Array>> children = {f0_array, f1_array, f2_array, f3_array};
auto struct_type = arrow::struct_(fields);
return std::make_shared<arrow::StructArray>(struct_type, f0_array->length(), children);
}
```
```c++
std::string table_path = root_path + "/" + db_name + ".db/" + table_name;
std::string commit_user = "some_commit_user";
// write
paimon::WriteContextBuilder context_builder(table_path, commit_user);
PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::WriteContext> write_context,
context_builder.SetOptions(options).Finish());
PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::FileStoreWrite> writer,
paimon::FileStoreWrite::Create(std::move(write_context)));
// prepare data
auto struct_array = PrepareData(fields);
if (!struct_array.ok()) {
return paimon::Status::Invalid(struct_array.status().ToString());
}
::ArrowArray arrow_array;
arrow_status = arrow::ExportArray(*struct_array.ValueUnsafe(), &arrow_array);
if (!arrow_status.ok()) {
return paimon::Status::Invalid(arrow_status.message());
}
paimon::RecordBatchBuilder batch_builder(&arrow_array);
PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::RecordBatch> record_batch,
batch_builder.Finish());
PAIMON_RETURN_NOT_OK(writer->Write(std::move(record_batch)));
PAIMON_ASSIGN_OR_RAISE(std::vector<std::shared_ptr<paimon::CommitMessage>> commit_message,
writer->PrepareCommit());
// commit
paimon::CommitContextBuilder commit_context_builder(table_path, commit_user);
PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::CommitContext> commit_context,
commit_context_builder.SetOptions(options).Finish());
PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::FileStoreCommit> committer,
paimon::FileStoreCommit::Create(std::move(commit_context)));
PAIMON_RETURN_NOT_OK(committer->Commit(commit_message));
```
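
The write-then-commit flow above can be pictured with a toy in-memory model. This is only a sketch of the two-phase idea (stage data, then publish it), not how paimon-cpp is implemented. All `Toy*` names are hypothetical, and rejecting writes after a commit is a simplification of the rule that no more data can be written once committed.

```c++
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a commit message: the rows staged by the writer.
struct ToyCommitMessage {
    std::vector<std::string> staged_rows;
};

// Toy table: only committed rows are visible to readers.
struct ToyTable {
    std::vector<std::string> visible_rows;
};

class ToyBatchWrite {
 public:
    explicit ToyBatchWrite(ToyTable* table) : table_(table) {}

    // Stage a row. Returns false once the write has been committed.
    bool Write(const std::string& row) {
        if (committed_) return false;
        staged_.push_back(row);
        return true;
    }

    // Phase 1: hand over everything staged so far; nothing is visible yet.
    ToyCommitMessage PrepareCommit() {
        ToyCommitMessage message;
        message.staged_rows = staged_;
        staged_.clear();
        return message;
    }

    // Phase 2: publish the staged rows and seal this write.
    void Commit(const ToyCommitMessage& message) {
        assert(!committed_);  // a batch write is committed once
        table_->visible_rows.insert(table_->visible_rows.end(),
                                    message.staged_rows.begin(),
                                    message.staged_rows.end());
        committed_ = true;
    }

 private:
    ToyTable* table_;
    std::vector<std::string> staged_;
    bool committed_ = false;
};
```

In this model, rows written before `Commit` stay invisible in `ToyTable` until the commit message is applied, which mirrors the separation between `FileStoreWrite` and `FileStoreCommit` in the example above.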
## Batch Read
### Predicate pushdown
A `ReadContextBuilder` passes context to the reader. Pushdown and filtering are done by the reader.
```c++
ReadContextBuilder read_context_builder(table_path);
```
You can use `PredicateBuilder` to build filters and push them down through `ReadContextBuilder`:
```c++
// Example filter: 'f3' > 12.0 OR 'f1' == 1
PAIMON_ASSIGN_OR_RAISE(
auto predicate,
PredicateBuilder::Or(
{PredicateBuilder::GreaterThan(/*field_index=*/3, /*field_name=*/"f3",
FieldType::DOUBLE, Literal(static_cast<double>(12.0))),
PredicateBuilder::Equal(/*field_index=*/1, /*field_name=*/"f1", FieldType::INT,
Literal(1))}));
ReadContextBuilder read_context_builder(table_path);
read_context_builder.SetPredicate(predicate).EnablePredicateFilter(true);
```
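
To make the meaning of the example filter concrete, the sketch below restates `f3 > 12.0 OR f1 == 1` in plain C++ and applies it to the three rows written in the Batch Write example. The `Row` struct and the function names are hypothetical illustrations. They do not use or describe the `PredicateBuilder` implementation.

```c++
#include <cassert>
#include <string>
#include <vector>

// Row layout of the example table: f0 (string), f1 (int32), f2 (int32), f3 (double).
struct Row {
    std::string f0;
    int f1;
    int f2;
    double f3;
};

// Plain C++ restatement of the example filter: f3 > 12.0 OR f1 == 1.
bool MatchesExampleFilter(const Row& row) {
    return row.f3 > 12.0 || row.f1 == 1;
}

// Returns the f0 values of all rows that satisfy the filter.
std::vector<std::string> FilterNames(const std::vector<Row>& rows) {
    std::vector<std::string> names;
    for (const Row& row : rows) {
        if (MatchesExampleFilter(row)) {
            names.push_back(row.f0);
        }
    }
    return names;
}

// Usage check against the three rows from the Batch Write example.
void CheckExampleFilter() {
    std::vector<Row> rows = {
        {"Alice", 1, 0, 11.0}, {"Bob", 1, 1, 12.1}, {"Cathy", 1, 2, 13.2}};
    // Every sample row has f1 == 1, so the OR filter keeps all three.
    assert(FilterNames(rows).size() == 3);
}
```

Because every sample row has `f1 == 1`, the filter keeps all three rows. A row with `f1 == 2` and `f3 == 10.0` would be dropped.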
You can also push down a projection through `ReadContextBuilder`:
```c++
// select the f3, f1 and f2 columns
read_context_builder.SetReadSchema({"f3", "f1", "f2"});
```
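
The `field_index` values in the predicate example (3 for `f3`, 1 for `f1`) match the column positions in the table schema declared under Create Table. As a sketch, projected column names can be resolved to those positions as follows. `ResolveProjection` is a hypothetical helper for illustration, not a paimon-cpp function, and it makes no claim about how the reader orders the returned columns.

```c++
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: maps each projected column name to its position in the
// table schema. Returns false (leaving *indices empty) if a name is unknown.
bool ResolveProjection(const std::vector<std::string>& schema_fields,
                       const std::vector<std::string>& projected,
                       std::vector<int>* indices) {
    assert(indices != nullptr);
    indices->clear();
    for (const std::string& name : projected) {
        int found = -1;
        for (int i = 0; i < static_cast<int>(schema_fields.size()); ++i) {
            if (schema_fields[i] == name) {
                found = i;
                break;
            }
        }
        if (found < 0) {
            indices->clear();
            return false;
        }
        indices->push_back(found);
    }
    return true;
}
```

With the schema `{"f0", "f1", "f2", "f3"}` and the projection `{"f3", "f1", "f2"}`, the resolved positions are 3, 1 and 2.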
### Generate Splits
Then you can move on to the scan plan stage to get `splits`:
```c++
// scan
paimon::ScanContextBuilder scan_context_builder(table_path);
PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::ScanContext> scan_context,
scan_context_builder.SetOptions(options).Finish());
PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::TableScan> scanner,
paimon::TableScan::Create(std::move(scan_context)));
PAIMON_ASSIGN_OR_RAISE(std::shared_ptr<paimon::Plan> plan, scanner->CreatePlan());
auto splits = plan->Splits();
```
Finally, you can read data from the `splits` into Arrow format.
### Read Apache Arrow
This requires the Apache Arrow C++ library to be installed.
```c++
PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::ReadContext> read_context,
read_context_builder.SetOptions(options).Finish());
PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::TableRead> table_read,
paimon::TableRead::Create(std::move(read_context)));
PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<paimon::BatchReader> batch_reader,
table_read->CreateReader(splits));
arrow::ArrayVector result_array_vector;
while (true) {
PAIMON_ASSIGN_OR_RAISE(paimon::BatchReader::ReadBatch batch, batch_reader->NextBatch());
if (paimon::BatchReader::IsEofBatch(batch)) {
break;
}
auto& [c_array, c_schema] = batch;
auto arrow_result = arrow::ImportArray(c_array.get(), c_schema.get());
if (!arrow_result.ok()) {
return paimon::Status::Invalid(arrow_result.status().ToString());
}
auto result_array = arrow_result.ValueUnsafe();
result_array_vector.push_back(result_array);
}
auto chunk_result = arrow::ChunkedArray::Make(result_array_vector);
if (!chunk_result.ok()) {
return paimon::Status::Invalid(chunk_result.status().ToString());
}
```
## Documentation
For more information, see the [C++ Paimon Documentation](https://alibaba.github.io/paimon-cpp/index.html).