| <!-- |
| ~ Licensed to the Apache Software Foundation (ASF) under one |
| ~ or more contributor license agreements. See the NOTICE file |
| ~ distributed with this work for additional information |
| ~ regarding copyright ownership. The ASF licenses this file |
| ~ to you under the Apache License, Version 2.0 (the |
| ~ "License"); you may not use this file except in compliance |
| ~ with the License. You may obtain a copy of the License at |
| ~ |
| ~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~ |
| ~ Unless required by applicable law or agreed to in writing, |
| ~ software distributed under the License is distributed on an |
| ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| ~ KIND, either express or implied. See the License for the |
| ~ specific language governing permissions and limitations |
| ~ under the License. |
| --> |
| |
| # Paimon C++ Code Style Guide |
| |
| This document defines the coding conventions for the paimon-cpp project. All pull requests are expected to follow these rules. Automated tooling (clang-format, clang-tidy, cpplint, pre-commit) enforces many of them. |
| |
| --- |
| |
| ## Language Standard |
| |
| - **C++17** is the target standard. |
| - Do **not** use C++20 or later features. |
| - Only the **x86_64** architecture is currently supported. |
| |
| --- |
| |
| ## Formatting |
| |
| Formatting is based on **Google C++ Style** with the following overrides (defined in `.clang-format`): |
| |
| | Setting | Value | |
| |---------|-------| |
| | `ColumnLimit` | 100 | |
| | `IndentWidth` | 4 | |
| | `AccessModifierOffset` | -3 | |
| | `AllowShortFunctionsOnASingleLine` | Empty | |
| |
| --- |
| |
| ## Naming Conventions |
| |
| | Category | Style | Example | |
| |----------|-------|---------| |
| | Class / Struct | PascalCase | `TableScan`, `FileUtils` | |
| | Method | PascalCase | `CreatePlan()`, `ReadFile()` | |
| | Member variable | snake_case with trailing `_` | `schema_id_`, `commit_user_` | |
| | Local variable / parameter | snake_case | `table_path`, `json_str` | |
| | Constant (new code) | `k` prefix + PascalCase | `kFieldVersion`, `kMaxRetryCount` | |
| | Constant (existing code) | Match surrounding style | `FIELD_VERSION` if that is the local convention | |
| | Namespace | lowercase | `paimon`, `paimon::test` | |
| | File name | snake_case | `table_scan.cpp`, `file_utils.h` | |
| | Test file | `*_test.cpp` next to source | `table_scan_test.cpp` | |
| |
| --- |
| |
| ## File Layout |
| |
| ### License Header |
| |
| Every file **must** start with the Apache 2.0 license header. |
| |
| ```cpp |
| /* |
| * Licensed to the Apache Software Foundation (ASF) under one |
| * or more contributor license agreements. See the NOTICE file |
| * distributed with this work for additional information |
| * regarding copyright ownership. The ASF licenses this file |
| * to you under the Apache License, Version 2.0 (the |
| * "License"); you may not use this file except in compliance |
| * with the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, |
| * software distributed under the License is distributed on an |
| * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| * KIND, either express or implied. See the License for the |
| * specific language governing permissions and limitations |
| * under the License. |
| */ |
| ``` |
| |
| ### Header Guard |
| |
| Use `#pragma once`. Do **not** use `#ifndef` / `#define` guards. |
| |
| ### Include Order |
| |
| Group includes in the following order, separated by blank lines: |
| |
| 1. **Corresponding header** — e.g. `snapshot.cpp` includes `paimon/core/snapshot.h` first. |
| 2. **C standard headers** — `<cstdint>`, `<cstring>`, etc. |
| 3. **C++ standard headers** — `<string>`, `<vector>`, `<memory>`, etc. |
| 4. **Third-party headers** — `fmt/format.h`, `rapidjson/document.h`, etc. |
| 5. **Project headers** — `paimon/status.h`, `paimon/result.h`, etc. |
| |
| ### Namespace |
| |
| All code lives inside `namespace paimon { }`. Close with a comment: |
| |
| ```cpp |
| namespace paimon { |
| // ... |
| } // namespace paimon |
| ``` |
| |
| Test code uses `namespace paimon::test`. |
| |
| --- |
| |
| ## Error Handling |
| |
| ### `Status` / `Result<T>` — No Exceptions |
| |
| All fallible functions return `Status` (no value) or `Result<T>` (with value). **Exceptions are not used** for error propagation in production code. |
| |
| ### Error Propagation Macros |
| |
| | Macro | Purpose | |
| |-------|---------| |
| | `PAIMON_RETURN_NOT_OK(status)` | Propagate a `Status` error | |
| | `PAIMON_ASSIGN_OR_RAISE(lhs, rexpr)` | Extract value from `Result<T>`, return on error | |
| | `PAIMON_RETURN_NOT_OK_FROM_ARROW(arrow_status)` | Bridge Apache Arrow `Status` | |
| | `PAIMON_ASSIGN_OR_RAISE_FROM_ARROW(lhs, rexpr)` | Bridge Apache Arrow `Result` | |
| |
| ### `PAIMON_ASSIGN_OR_RAISE` and `PAIMON_ASSIGN_OR_RAISE_FROM_ARROW` — Always Use Explicit Types |
| |
| ```cpp |
| // ✅ Good — explicit type |
| PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<TableScan> scan, |
| TableScan::Create(std::move(ctx))); |
| |
| // ❌ Bad — auto |
| PAIMON_ASSIGN_OR_RAISE(auto scan, TableScan::Create(std::move(ctx))); |
| ``` |
| |
| If you need a `shared_ptr` from a function returning `unique_ptr`, write the target type directly — the macro already moves: |
| |
| ```cpp |
| PAIMON_ASSIGN_OR_RAISE(std::shared_ptr<T> ret, FuncReturningUniquePtr()); |
| ``` |
| |
| --- |
| |
| ## Class Design |
| |
| ### Factory Pattern — `static Create()` + Private Constructor |
| |
| When construction can fail, use a factory method: |
| |
| ```cpp |
| class TableScan { |
| public: |
| static Result<std::unique_ptr<TableScan>> Create(std::unique_ptr<ScanContext> context); |
| ~TableScan() = default; |
| |
| private: |
| explicit TableScan(std::unique_ptr<ScanContext> context); |
| }; |
| ``` |
| |
| - `Create()` returns `Result<std::unique_ptr<T>>`. |
| - The private constructor does only trivial member assignment. |
| - All fallible initialization goes in `Create()`, **after** construction. |
| - If construction cannot fail (POD, data-only classes), a public constructor is fine. |
| |
| ### Utility Classes |
| |
| Static-only utility classes delete their constructors and deconstructors: |
| |
| ```cpp |
| class StringUtils { |
| public: |
| StringUtils() = delete; |
| ~StringUtils() = delete; |
| |
| static std::vector<std::string> Split(const std::string& s, char delim); |
| // ... |
| }; |
| ``` |
| |
| ### Interface Classes |
| |
| Pure-virtual base classes use a virtual default destructor: |
| |
| ```cpp |
| virtual ~ClassName() = default; |
| ``` |
| |
| --- |
| |
| ## Smart Pointers & Memory |
| |
| - Prefer `std::unique_ptr` for sole ownership. |
| - Use `std::shared_ptr` only when shared ownership is genuinely needed. |
| - **No raw `new` / `delete`** except inside factory `Create()` methods that call a private constructor. |
| - Use `std::optional<T>` for optional values. |
| - Use `ScopeGuard` (in `src/paimon/common/utils/`) for RAII cleanup. |
| |
| --- |
| |
| ## Comments & Documentation |
| |
| - Use `///` (Doxygen) for classes and public methods. |
| - Use `@param` and `@return` for parameters and return values. |
| - Do **not** comment obvious code. |
| - Inline comments go **above** the line, not at the end. |
| |
| ```cpp |
| /// Create a scan plan from the current snapshot. |
| /// |
| /// @param snapshot_id The snapshot to scan. |
| /// @return A Plan on success, or an error Status. |
| Result<std::shared_ptr<Plan>> CreatePlan(int64_t snapshot_id); |
| ``` |
| |
| ### TODO Comments |
| |
| TODOs are allowed but **must** include an author tag: |
| |
| ```cpp |
| // TODO(alice): Support changelog manifest list size validation |
| ``` |
| |
| --- |
| |
| ## String Formatting |
| |
| Use `fmt::format()` for string formatting. Do **not** concatenate with `std::to_string()` + `+`. |
| |
| ```cpp |
| // ✅ Good |
| auto msg = fmt::format("Snapshot {} not found in {}", snapshot_id, path); |
| |
| // ❌ Bad |
| auto msg = "Snapshot " + std::to_string(snapshot_id) + " not found in " + path; |
| ``` |
| |
| --- |
| |
| ## Public API Visibility |
| |
| Mark all public API symbols with `PAIMON_EXPORT`: |
| |
| ```cpp |
| class PAIMON_EXPORT Timestamp { ... }; |
| ``` |
| |
| --- |
| |
| ## Reuse Existing Utilities |
| |
| Before writing new helper code, check whether the project already provides what you need. The tables below list the most commonly used utilities. |
| |
| ### General Utilities (`src/paimon/common/utils/`) |
| |
| | Class | Purpose | |
| |-------|---------| |
| | `StringUtils` | Split, Replace, StartsWith, EndsWith, Trim, etc. | |
| | `ObjectUtils` | Collection helpers (ContainsAll, etc.) | |
| | `DateTimeUtils` | Date / time conversions | |
| | `RapidJsonUtil` | JSON serialization / deserialization | |
| | `StreamUtils` | Stream read / write helpers | |
| | `OptionsUtils` | Configuration parsing | |
| | `PathUtil` | Path joining and manipulation | |
| | `Preconditions` | Precondition checks | |
| | `ScopeGuard` | RAII cleanup guard | |
| | `BloomFilter` / `BloomFilter64` | Bloom filters | |
| | `MurmurHashUtils` | Hashing | |
| | `CRC32C` | Checksums | |
| | `BitSet` | Bit set | |
| | `RoaringBitmap32` / `RoaringBitmap64` | Roaring bitmaps | |
| | `SerializationUtils` | Serialization helpers | |
| | `DataConverterUtils` | Data conversion | |
| | `DecimalUtils` | Decimal arithmetic | |
| | `FieldTypeUtils` | Field type helpers | |
| | `BinPacking` | Bin-packing algorithm | |
| | `ConcurrentHashMap` | Thread-safe hash map | |
| | `ThreadsafeQueue` | Thread-safe queue | |
| | `LinkedHashMap` | Insertion-ordered hash map | |
| | `LongCounter` | Atomic counter | |
| | `UUID` | UUID generation | |
| |
| ### Arrow Utilities (`src/paimon/common/utils/arrow/`) |
| |
| | File | Purpose | |
| |------|---------| |
| | `ArrowUtils` | Arrow format helpers | |
| | `MemUtils` | Arrow memory helpers | |
| | `status_utils.h` | Arrow ↔ Paimon status bridge macros | |
| |
| ### Core Utilities (`src/paimon/core/utils/`) |
| |
| | Class | Purpose | |
| |-------|---------| |
| | `FileUtils` | File listing, version file operations | |
| | `FileStorePathFactory` | Path factory for data / manifest files | |
| | `SnapshotManager` | Snapshot management | |
| | `TagManager` | Tag management | |
| | `BranchManager` | Branch management | |
| | `PartitionPathUtils` | Partition path helpers | |
| | `FieldMapping` | Field mapping | |
| | `FieldsComparator` | Field comparison | |
| | `PrimaryKeyTableUtils` | Primary-key table helpers | |
| |
| ### Concurrency (`src/paimon/common/executor/`) |
| |
| | Function | Purpose | |
| |----------|---------| |
| | `CollectAll()` | Collect results from multiple futures | |
| | `Wait()` | Wait for multiple void futures | |
| |
| ### Testing Utilities (`src/paimon/testing/utils/`) |
| |
| | Class | Purpose | |
| |-------|---------| |
| | `TestHelper` | End-to-end helper: create tables, write & commit data, scan & read results | |
| | `DataGenerator` | Generate `RecordBatch` data, split by partition/bucket, extract partial rows | |
| | `BinaryRowGenerator` | Generate `BinaryRow`, `InternalRow`, `SimpleStats`, and `BinaryArray` for tests | |
| | `KeyValueChecker` | Assert expected vs. actual `KeyValue` results in primary-key table tests | |
| | `ReadResultCollector` | Collect `KeyValue` or Arrow `ChunkedArray` results from a `BatchReader` | |
| | `DictArrayConverter` | Convert Arrow dictionary-encoded arrays to plain arrays | |
| | `UniqueTestDirectory` | RAII helper that creates a unique temp directory and cleans it up on destruction | |
| | `TimezoneGuard` | RAII guard that sets `TZ` environment variable and restores it on destruction | |
| | `IOExceptionHelper` | Inject I/O errors via `IOHook` for fault-injection testing | |
| | `testharness.h` | Test macros: `ASSERT_OK`, `ASSERT_NOK`, `ASSERT_NOK_WITH_MSG`, `ASSERT_RAISES` | |
| |
| ### Mock Objects (`src/paimon/testing/mock/`) |
| |
| | Class | Purpose | |
| |-------|---------| |
| | `MockFileSystem` / `MockFileSystemFactory` | Mock file system and its factory for unit tests | |
| | `MockFileFormat` / `MockFileFormatFactory` | Mock file format and its factory | |
| | `MockFormatReaderBuilder` / `MockFormatWriterBuilder` | Mock reader/writer builders | |
| | `MockFormatWriter` / `MockFileBatchReader` | Mock format writer and batch reader | |
| | `MockIndexPathFactory` | Mock index path factory | |
| | `MockKeyValueDataFileRecordReader` | Mock key-value data file reader holding in-memory data | |
| | `MockStatsExtractor` | Mock statistics extractor | |
| |
| --- |
| |
| ## Testing |
| |
| - Test files live **next to** the source file they test, named `*_test.cpp`. |
| - Use **Google Test** (`gtest`). |
| - Test classes go in `namespace paimon::test`. |
| - Use project test macros: `ASSERT_OK`, `ASSERT_NOK`, `ASSERT_NOK_WITH_MSG`. |
| - **Prefer `ASSERT_*` over `EXPECT_*`** — `ASSERT_*` stops the test immediately on failure, preventing cascading errors. |
| - **Exception**: In non-void helper functions, use `EXPECT_*` because `ASSERT_*` expands to `return;` which is incompatible with non-void return types. |
| |
| --- |
| |
| ## Tooling |
| |
| ### Pre-commit Hooks (Recommended) |
| |
| The easiest way to stay compliant is to install [pre-commit](https://pre-commit.com/): |
| |
| ```bash |
| pip install pre-commit |
| pre-commit install |
| ``` |
| |
| This installs Git hooks that automatically run on every commit: |
| |
| | Hook | What it checks | |
| |------|----------------| |
| | `clang-format` | C++ formatting (`.clang-format`) | |
| | `cpplint` | Google C++ lint rules | |
| | `cmake-format` | CMake file formatting (`.cmake-format.py`) | |
| | `codespell` | Spelling errors | |
| | `trailing-whitespace` | Trailing whitespace | |
| | `end-of-file-fixer` | Missing newline at end of file | |
| | `check-added-large-files` | Files > 5 MB | |
| |
| To run all hooks on all files manually: |
| |
| ```bash |
| pre-commit run --all-files |
| ``` |
| |
| ### clang-tidy |
| |
| Static analysis is configured in `.clang-tidy`. Key enabled check groups: |
| |
| - `bugprone-*` — Common bug patterns |
| - `google-*` — Google style checks |
| - `modernize-*` — C++ modernization suggestions |
| - `clang-analyzer-*` — Clang static analyzer |
| |
| ### Building & Testing |
| |
| ```bash |
| # Configure (from project root) |
| mkdir -p build && cd build |
| cmake ../ \ |
| -DCMAKE_BUILD_TYPE=debug \ |
| -DPAIMON_BUILD_TESTS=ON |
| |
| # Build |
| make -j$(nproc) |
| |
| # Run a specific test suite |
| ./debug/paimon-core-test --gtest_filter="CoreOptionsTest*" |
| ``` |
| |
| --- |
| |
| ## Pull Request Checklist |
| |
| Before submitting a PR, please verify: |
| |
| - [ ] Code compiles without warnings under `-Wall` (enabled by default in the build system). |
| - [ ] `pre-commit run --all-files` passes (or individual tools: clang-format, cpplint, codespell). |
| - [ ] New / modified public APIs are marked with `PAIMON_EXPORT`. |
| - [ ] Every new file has the Apache 2.0 license header. |
| - [ ] Error handling uses `Status` / `Result<T>` — no exceptions. |
| - [ ] New utility code checks for existing helpers before reinventing. |
| - [ ] Tests are added or updated for the changed functionality. |
| - [ ] `ASSERT_*` is preferred over `EXPECT_*` in tests. |
| - [ ] No raw `new` / `delete` outside factory methods. |
| - [ ] PR description follows the [template](.github/PULL_REQUEST_TEMPLATE.md). |