AGENTS.md

This file provides comprehensive guidance to AI coding agents when working with the Apache Fory codebase.

Core Principles

While working on Fory, please remember:

  • Performance First: Performance is the top priority. Never introduce code that reduces performance without explicit justification.
  • English Only: Always use English in code, comments, and documentation.
  • Meaningful Comments: Only add comments when the code's behavior is difficult to understand or when documenting complex algorithms.
  • Focused Testing: Only add tests that verify internal behaviors or fix specific bugs; don't create unnecessary tests unless requested.
  • Git-Tracked Files: When reading code, skip files that are not tracked by git by default, unless you generated them yourself.
  • Cross-Language Consistency: Maintain consistency across language implementations while respecting language-specific idioms.
  • GraalVM Support via Fory Codegen: When building a GraalVM native image, use Fory codegen to generate the serializers. Do not use GraalVM reflection-related configuration except for JDK proxies.
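
The Git-Tracked Files principle above can be followed with `git ls-files`, which lists only files in the index. The sketch below is one possible way to do it; `list_tracked_files` is a hypothetical helper name, not something the Fory repository provides.

```shell
# List only git-tracked files under a repository directory, optionally
# filtered by a pathspec such as '*.java'. Untracked and ignored files
# (build output, IDE files) never appear in the result.
# NOTE: `list_tracked_files` is a hypothetical helper, not part of Fory.
list_tracked_files() {
  dir="$1"
  pattern="${2:-}"
  if [ -n "$pattern" ]; then
    git -C "$dir" ls-files -- "$pattern"
  else
    git -C "$dir" ls-files
  fi
}
```

For example, `list_tracked_files . '*.java'` run from the repository root prints the tracked Java sources and nothing else.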

Build and Development Commands

Java Development

  • All maven commands must be executed within the java directory.
  • All changes to java must pass the code style check and tests.
  • Fory java needs JDK 17+ installed.
  • Wildcard imports (the '.*' form) are not allowed.
# Clean the build
mvn -T16 clean

# Build
mvn -T16 package

# Install
mvn -T16 install -DskipTests

# Code format check
mvn -T16 spotless:check

# Code format
mvn -T16 spotless:apply

# Code style check
mvn -T16 checkstyle:check

# Run tests
mvn -T16 test

# Run specific tests
mvn -T16 test -Dtest=org.apache.fory.TestClass#testMethod
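
The rule against '.*' imports can be checked quickly before running the full style check. The function below is a minimal sketch: `find_wildcard_imports` is a hypothetical name, and the regular expression is an assumption about what counts as a wildcard import (it also matches `import static ... .*;`). It does not replace `mvn -T16 checkstyle:check`.

```shell
# Print every wildcard import found in .java files under a directory.
# Returns 1 if at least one is found, 0 if the tree is clean.
# NOTE: hypothetical helper; the regex is an assumption, not the
# project's checkstyle rule.
find_wildcard_imports() {
  dir="$1"
  if grep -rnE --include='*.java' \
      '^[[:space:]]*import[[:space:]]+(static[[:space:]]+)?[A-Za-z_][A-Za-z0-9_.]*\.\*[[:space:]]*;' \
      "$dir"; then
    return 1
  fi
  return 0
}
```

Running `find_wildcard_imports fory-core/src` from the java directory prints matching lines with file names and line numbers.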

C++ Development

  • All commands must be executed within the cpp directory.
  • Fory C++ uses C++17; do not use features from newer C++ standards.
  • After updating code, run clang-format to format it.
  • When invoking a method that returns Result, always use FORY_TRY unless in a control flow context.
  • Wrap error checks with FORY_PREDICT_FALSE for branch prediction optimization.
  • Continue on error for trivial errors; only return early for critical errors like buffer overflow.
  • Put private methods last in the class definition, just before private fields.
# Build C++ library
bazel build //cpp/...

# Build Cython extensions (replace X.Y with your Python version, e.g., 3.10)
bazel build //:cp_fory_so --@rules_python//python/config_settings:python_version=X.Y

# Run tests
bazel test $(bazel query //cpp/...)

# Run serialization tests
bazel test $(bazel query //cpp/fory/serialization/...)

# Run specific test
bazel test //cpp/fory/util:buffer_test

# format c++ code
clang-format -i $file

Run C++ xlang tests:

cd java
mvn -T16 install -DskipTests
cd fory-core
FORY_CPP_JAVA_CI=1 ENABLE_FORY_DEBUG_OUTPUT=1 mvn -T16 test -Dtest=org.apache.fory.xlang.CPPXlangTest

Python Development

  • All commands must be executed within the python directory.
  • All changes to python must pass the code style check and tests.
  • When running tests, you can use the ENABLE_FORY_CYTHON_SERIALIZATION environment variable to enable or disable cython serialization.
  • When debugging protocol related issues, you should use ENABLE_FORY_CYTHON_SERIALIZATION=0 first to verify the behavior.
  • Fory python needs cpython 3.8+ installed although some modules such as fory-core use java8.
# clean build
rm -rf build dist .pytest_cache
bazel clean --expunge

# Code format
ruff format .
ruff check --fix .

# Install
pip install -v -e .

# Build native extension when cython code changed (replace X.Y with your Python version)
bazel build //:cp_fory_so --@rules_python//python/config_settings:python_version=X.Y --config=x86_64 # For x86_64
bazel build //:cp_fory_so --@rules_python//python/config_settings:python_version=X.Y --copt=-fsigned-char # For arm64 and aarch64

# Run tests without cython
ENABLE_FORY_CYTHON_SERIALIZATION=0 pytest -v -s .
# Run tests with cython
ENABLE_FORY_CYTHON_SERIALIZATION=1 pytest -v -s .
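
The two architecture-specific build commands above differ only in one extra flag, so the choice can be derived from `uname -m`. This is a sketch under that assumption; `fory_bazel_arch_flag` is a hypothetical helper name, and only the architectures named in the commands above are handled.

```shell
# Map a machine architecture (as printed by `uname -m`) to the extra
# Bazel flag used in the build commands above.
# NOTE: hypothetical helper, not part of the Fory build scripts.
fory_bazel_arch_flag() {
  case "$1" in
    x86_64)        echo "--config=x86_64" ;;
    arm64|aarch64) echo "--copt=-fsigned-char" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Usage (replace X.Y with your Python version):
#   bazel build //:cp_fory_so \
#     --@rules_python//python/config_settings:python_version=X.Y \
#     "$(fory_bazel_arch_flag "$(uname -m)")"
```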

Run Python xlang tests:

cd java
mvn -T16 install -DskipTests
cd fory-core
# disable fory cython for faster debugging
FORY_PYTHON_JAVA_CI=1 ENABLE_FORY_CYTHON_SERIALIZATION=0 mvn -T16 test -Dtest=org.apache.fory.xlang.PythonXlangTest
# enable fory cython
FORY_PYTHON_JAVA_CI=1 ENABLE_FORY_CYTHON_SERIALIZATION=1 ENABLE_FORY_DEBUG_OUTPUT=1 mvn -T16 test -Dtest=org.apache.fory.xlang.PythonXlangTest

Golang Development

  • All commands must be executed within the go/fory directory.
  • All changes to go must pass the format check and tests.
  • Go implementation focuses on reflection-based and codegen-based serialization.
# Format code
go fmt ./...

# Run tests
go test -v ./...

# Run tests with race detection
go test -race -v ./...

# Build
go build

# Generate code (if using go:generate)
go generate ./...

Run Go xlang tests:

cd java
mvn -T16 install -DskipTests
cd fory-core
FORY_GO_JAVA_CI=1 ENABLE_FORY_DEBUG_OUTPUT=1 mvn test -Dtest=org.apache.fory.xlang.GoXlangTest

Rust Development

  • All cargo commands must be executed within the rust directory.
  • All changes to rust must pass the clippy check and tests.
  • Set RUST_BACKTRACE=1 FORY_PANIC_ON_ERROR=1 when debugging Rust tests to get a backtrace.
  • Add -- --nocapture to the cargo test command when debugging tests.
  • Do not set FORY_PANIC_ON_ERROR=1 when running the full Rust test suite to check that all tests pass: some tests assert on the Error content and fail if the error panics instead.
# Check code
cargo check

# Build
cargo build

# Run linter for all targets and features
cargo clippy --all-targets --all-features -- -D warnings

# Run tests (requires test features)
cargo test --features tests

# run specific test
cargo test -p tests --test $test_file $test_method

# run specific test under subdirectory
cargo test --test mod $dir$::$test_file::$test_method

# debug specific test under subdirectory and get backtrace
RUST_BACKTRACE=1 FORY_PANIC_ON_ERROR=1 ENABLE_FORY_DEBUG_OUTPUT=1 cargo test --test mod $dir$::$test_file::$test_method -- --nocapture

# inspect generated code by fory derive macro
cargo expand --test mod $mod$::$file$ > expanded.rs

# Format code
cargo fmt

# Check formatting
cargo fmt --check

# Build documentation
cargo doc --lib --no-deps --all-features

# Run benchmarks
cd $project_dir/benchmarks/rust_benchmark
cargo bench
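
The environment-variable rules for Rust tests are easy to mix up: the debug variables belong on a single-test run, not on the full suite. The sketch below prints the command for each case instead of running it; `fory_rust_test_cmd` is a hypothetical helper name, and the test filter follows the `cargo test --test mod ...` form shown above.

```shell
# Print (not run) the cargo command for the two cases described above.
#   no argument  -> full test run, without FORY_PANIC_ON_ERROR
#   test filter  -> debug run with backtrace and uncaptured output
# NOTE: hypothetical helper, not part of the Fory repository.
fory_rust_test_cmd() {
  if [ -z "${1:-}" ]; then
    echo "cargo test --features tests"
  else
    echo "RUST_BACKTRACE=1 FORY_PANIC_ON_ERROR=1 cargo test --test mod $1 -- --nocapture"
  fi
}
```

From the rust directory, `eval "$(fory_rust_test_cmd)"` runs the whole suite, while passing a filter such as a module path produces the debug form.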

Run Rust xlang tests:

cd java
mvn -T16 install -DskipTests
cd fory-core
FORY_RUST_JAVA_CI=1 ENABLE_FORY_DEBUG_OUTPUT=1 mvn test -Dtest=org.apache.fory.xlang.RustXlangTest

JavaScript/TypeScript Development

  • All commands must be executed within the javascript directory.
  • Uses npm/yarn for package management.
# Install dependencies
npm install

# Run tests
node ./node_modules/.bin/jest --ci --reporters=default --reporters=jest-junit

# Lint code
git ls-files -- '*.ts' | xargs -P 5 node ./node_modules/.bin/eslint

Dart Development

  • All commands must be executed within the dart directory.
  • Uses pub for package management.
# First, generate necessary code
dart run build_runner build

# Run all tests
dart test

# Analyze code and apply fixes
dart analyze
dart fix --dry-run
dart fix --apply

Kotlin Development

  • All maven commands must be executed within the kotlin directory.
  • Kotlin implementation provides extra serializers for kotlin types.
  • Kotlin implementation is built on Fory Java. Install the Java libraries first with cd ../java && mvn -T16 install -DskipTests. If the Java code has not changed since the last install, you can skip this step.
# Build
mvn clean package

# Run tests
mvn test

Scala Development

  • All commands must be executed within the scala directory.
  • Scala implementation provides extra serializers for Scala types.
  • Scala implementation is built on Fory Java. Install the Java libraries first with cd ../java && mvn -T16 install -DskipTests. If the Java code has not changed since the last install, you can skip this step.
# Build with sbt
sbt compile

# Run tests
sbt test

# Format code
sbt scalafmt

Integration Tests

  • All commands must be executed within the integration_tests directory.
  • For Java-related integration tests, install the Java libraries first with cd ../java && mvn -T16 install -DskipTests. If the Java code has not changed since the last install, you can skip this step.
it_dir=$(pwd)
# Run graalvm tests
cd $it_dir/graalvm_tests && mvn -T16 -DskipTests=true -Pnative package && target/main

# Run latest_jdk_tests
cd $it_dir/latest_jdk_tests && mvn -T16 test

# Run JDK compatibility tests
cd $it_dir/jdk_compatibility_tests && mvn -T16 test

# Run JPMS tests
cd $it_dir/jpms_tests && mvn -T16 test

# Run Python benchmarks
cd $it_dir/cpython_benchmark && pip install -r requirements.txt && python benchmark.py

Documentation and Formatting

  • Markdown Formatting: When updating markdown documentation, use prettier --write $file to format.
  • API Documentation: When updating important public APIs, update documentation under docs/.
  • Protocol Specifications: docs/specification/** contains Fory protocol specifications. Read these documents carefully before making protocol changes.
  • User Guides: docs/guide/** contains user guides for different features and languages.

Repository Structure Understanding

Git Repository

Apache Fory is an open-source project hosted on GitHub at https://github.com/apache/fory . Contributors always fork the repository and open a pull request to propose changes, so the origin remote points to the contributor's fork rather than the official repository.

Key Directories

  • docs/: Documentation, specifications, and guides

    • docs/specification/: Protocol specifications (critical for understanding)
    • docs/guide/: User guides and development guides
    • docs/benchmarks/: Performance benchmarks documentation
  • Language Implementations:

    • java/: Java implementation (maven-based, multi-module)
    • python/: Python implementation (pip/setuptools + bazel)
    • cpp/: C++ implementation (bazel-based)
    • go/: Go implementation (go modules)
    • rust/: Rust implementation (cargo-based)
    • javascript/: JavaScript/TypeScript implementation (npm-based)
    • dart/: Dart implementation (pub-based)
    • kotlin/: Kotlin implementation (maven-based)
    • scala/: Scala implementation (sbt-based)
  • Testing and CI:

    • integration_tests/: Cross-language integration tests
    • .github/workflows/: GitHub Actions CI/CD workflows
    • ci/: CI scripts and configurations
  • Build Configuration:

    • BUILD, WORKSPACE: Bazel configuration
    • .bazelrc, .bazelversion: Bazel settings
    • Various pom.xml, package.json, Cargo.toml, etc.

Important Files

  • AGENTS.md: This file - AI coding guidance
  • CLAUDE.md: Claude Code specific instructions
  • CONTRIBUTING.md: Contribution guidelines
  • README.md: Project overview and quick start
  • .gitignore: Git ignore patterns (includes build dirs)
  • licenserc.toml: License header configuration

Architecture Overview

Apache Fory is a blazingly-fast multi-language serialization framework that revolutionizes data exchange between systems and languages. By leveraging JIT compilation, code generation and zero-copy techniques, Fory delivers up to 170x faster performance compared to other serialization frameworks while being extremely easy to use.

Binary Protocols

Fory uses binary protocols for efficient serialization and deserialization. It defines and implements multiple binary protocols for different scenarios:

  • xlang serialization format:
    • Serializes any object across languages automatically, with no IDL definition, schema compilation, or object to/from protocol conversion.
    • Supports optional shared and circular references, with no duplicate data or recursion errors.
    • Supports object polymorphism.
  • Row format: A cache-friendly binary random access format, supports skipping serialization and partial serialization, and can convert to column-format automatically.
  • Java serialization format: Highly-optimized and drop-in replacement for Java serialization.
  • Python serialization format: Highly-optimized and drop-in replacement for Python pickle, which is an extension built upon xlang serialization format.

docs/specification/ contains the specifications for the Fory protocol. Read those documents carefully, think hard, and make sure you understand them before changing code or documentation.

Core Structure

Fory serialization is implemented independently in each language to minimize object memory layout interoperability overhead, object allocation, and memory access cost, and thus maximize performance. No code is shared between languages, except that Fory Python reuses code from Fory C++.

Java

  • fory-core: Java library implementing the core object graph serialization

    • java/fory-core/src/main/java/org/apache/fory/Fory.java: main serialization entry point
    • java/fory-core/src/main/java/org/apache/fory/resolver/TypeResolver.java: type resolution and serializer dispatch
    • java/fory-core/src/main/java/org/apache/fory/resolver/RefResolver.java: class for resolving shared/circular references when ref tracking is enabled
    • java/fory-core/src/main/java/org/apache/fory/serializer: serializers for each supported type
    • java/fory-core/src/main/java/org/apache/fory/codegen: code generators, provide expression abstraction and compile expression tree to java code and byte code
    • java/fory-core/src/main/java/org/apache/fory/builder: build expression tree for serialization to generate serialization code
    • java/fory-core/src/main/java/org/apache/fory/reflect: reflection utilities
    • java/fory-core/src/main/java/org/apache/fory/type: java generics and type inference utilities
    • java/fory-core/src/main/java/org/apache/fory/util: utility classes
  • fory-format: Java library implementing the core row format encoding and decoding

    • java/fory-format/src/main/java/org/apache/fory/format/row: row format data structures
    • java/fory-format/src/main/java/org/apache/fory/format/encoder: generate row format encoder and decoder to encode/decode objects to/from row format
    • java/fory-format/src/main/java/org/apache/fory/format/type: type inference for row format
    • java/fory-format/src/main/java/org/apache/fory/format/vectorized: interoperation with apache arrow columnar format
  • fory-extensions: extension libraries for java, including:

    • Protobuf serializers for fory java native object graph protocol.
    • Meta compression based on zstd
  • fory-simd: SIMD-accelerated serialization and deserialization based on java vector API

    • java/fory-simd/src/main/java/org/apache/fory/util: SIMD utilities
    • java/fory-simd/src/main/java/org/apache/fory/serializer: SIMD accelerated serializers
  • fory-test-core: Core test utilities and data generators

  • testsuite: Complex test suite for issues reported by users and hard to reproduce using simple test cases

  • benchmark: Benchmark suite based on jmh

Bazel

The bazel directory provides build support for Fory C++ and Cython:

  • bazel/cython_library.bzl: pyx_library rule for building Cython extensions

Dependencies are managed via MODULE.bazel using bzlmod (Bazel 8+).

C++

  • cpp/fory/row: Row format data structures
  • cpp/fory/meta: Compile-time reflection utilities for extracting struct field information.
  • cpp/fory/encoder: Row format encoder and decoder
  • cpp/fory/util: Common utilities
    • cpp/fory/util/buffer.h: Buffer for reading and writing data
    • cpp/fory/util/bit_util.h: utilities for bit manipulation
    • cpp/fory/util/string_util.h: String utilities
    • cpp/fory/util/status.h: Status code for error handling

Python

Fory python has two implementations for the protocol:

  • Python mode: Pure Python implementation of the xlang serialization format, used for debugging and testing only. Enable it by setting the ENABLE_FORY_CYTHON_SERIALIZATION=0 environment variable.
  • Cython mode: Cython implementation of the xlang serialization format, which is used by default and performs better than pure Python. Enable it by setting the ENABLE_FORY_CYTHON_SERIALIZATION=1 environment variable.
  • Python mode and Cython mode share some code to reduce duplication.

Code structure:

  • python/pyfory/serialization.pyx: Core serialization logic and entry point for cython mode based on xlang serialization format
  • python/pyfory/_fory.py: Serialization entry point for pure python mode based on xlang serialization format
  • python/pyfory/_registry.py: Type registry, resolution and serializer dispatch for pure python mode, which is also used by cython mode. Cython mode uses a cache to reduce calls into this module.
  • python/pyfory/serializer.py: Serializers for non-internal types
  • python/pyfory/includes: Cython headers for c++ functions and classes.
  • python/pyfory/resolver.py: resolving shared/circular references when ref tracking is enabled in pure python mode
  • python/pyfory/format: Fory row format encoding and decoding, arrow columnar format interoperation
  • python/pyfory/_util.pyx: Buffer for reading/writing data, string utilities. Used by serialization.pyx and python/pyfory/format at the same time.

Go

Fory go provides reflection-based and codegen-based serialization and deserialization.

  • go/fory/fory.go: serialization entry point
  • go/fory/resolver.go: resolving shared/circular references when ref tracking is enabled
  • go/fory/type.go: type system and type resolution, serializer dispatch
  • go/fory/slice.go: serializers for slice type
  • go/fory/map.go: serializers for map type
  • go/fory/set.go: serializers for set type
  • go/fory/struct.go: serializers for struct type
  • go/fory/string.go: serializers for string type
  • go/fory/buffer.go: Buffer for reading/writing data
  • go/fory/codegen: code generator invoked by go:generate to produce serialization code that speeds up serialization.
  • go/fory/meta: Meta string compression

Rust

Fory rust provides macro-based serialization and deserialization. Fory rust consists of:

  • fory: Main library entry point
    • rust/fory/src/lib.rs: main library entry point to export API to users
  • fory-core: Core library for serialization and deserialization
    • rust/fory-core/src/fory.rs: main serialization entry point
    • rust/fory-core/src/resolver/type_resolver.rs: type resolution and registration
    • rust/fory-core/src/resolver/metastring_resolver.rs: resolver for meta string
    • rust/fory-core/src/resolver/context.rs: context for reading/writing
    • rust/fory-core/src/buffer.rs: buffer for reading/writing data
    • rust/fory-core/src/meta: meta string compression, type meta encoding
    • rust/fory-core/src/serializer: serializers for each supported type
    • rust/fory-core/src/row: row format encoding and decoding
  • fory-derive: Rust macro-based codegen for serialization and deserialization
    • rust/fory-derive/src/object: macro for serializing/deserializing structs
    • rust/fory-derive/src/fory_row: macro for encoding/decoding row format

Integration Tests

integration_tests contains integration tests with the following modules:

  • cpython_benchmark: benchmark suite for fory python
  • graalvm_tests: test suite for fory java on graalvm.
    • Note that Fory uses codegen rather than reflection to support GraalVM and does not use reflect-config.json for serialization. This is its core advantage over GraalVM JDK serialization.
  • jdk_compatibility_tests: test suite for fory serialization compatibility between multiple JDK versions
  • latest_jdk_tests: test suite for jdk17+ versions

Key Development Guidelines

Performance Guidelines

  • Performance First: Never introduce code that reduces performance without explicit justification
  • Zero-Copy: Leverage zero-copy techniques when possible
  • JIT Compilation: Consider JIT compilation opportunities
  • Memory Layout: Optimize for cache-friendly memory access patterns

Code Quality

  • Public APIs: Must be well-documented and easy to understand
  • Error Handling: Implement comprehensive error handling with meaningful messages
  • Type Safety: Use strong typing and generics appropriately
  • Null Safety: Handle null values appropriately for each language

Cross-Language Considerations

  • Protocol Compatibility: Ensure serialization compatibility across languages
  • Type Mapping: Understand type mapping between languages (see docs/specification/xlang_type_mapping.md)
  • Endianness: Handle byte order correctly for cross-platform compatibility
  • Version Compatibility: Maintain backward compatibility when possible

Testing Strategy

  • Unit Tests: Focus on internal behavior verification
  • Integration Tests: Use integration_tests/ for cross-language compatibility
  • Language Alignment and Protocol Compatibility: Run test_cross_language.py to verify language and protocol alignment
  • Performance Tests: Include benchmarks for performance-critical changes

Documentation Requirements

  • API Changes: Update relevant documentation in docs/
  • Protocol Changes: Update specifications in docs/specification/
  • Examples: Provide working examples for new features
  • Migration Guides: Document breaking changes and migration paths

Development Workflow

Before Making Changes

  1. Read Specifications: Review relevant docs in docs/specification/
  2. Understand Architecture: Study the language-specific implementation structure
  3. Check Existing Tests: Look at existing test patterns and coverage
  4. Review Related Issues: Check GitHub issues for context

Making Changes

  1. Follow Language Conventions: Respect each language's idioms and patterns
  2. Maintain Performance: Profile performance-critical changes
  3. Add Tests: Include appropriate tests for new functionality
  4. Update Documentation: Update docs for API changes
  5. Format Code: Use language-specific formatters before committing
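
The "Format Code" step depends on the file type. As a rough sketch, the function below prints the formatter command this guide lists for a given file, chosen by extension. `fory_format_cmd` is a hypothetical helper name; the `.cc` extension for C++ sources and passing a single file to `ruff format` are assumptions. Each printed command still has to be run from the language directory named in the corresponding section.

```shell
# Print the formatter command listed in this guide for a file, by extension.
# NOTE: hypothetical helper; the extension mapping is an assumption.
fory_format_cmd() {
  case "$1" in
    *.java)   echo "mvn -T16 spotless:apply" ;;
    *.cc|*.h) echo "clang-format -i $1" ;;
    *.py)     echo "ruff format $1" ;;
    *.go)     echo "go fmt ./..." ;;
    *.rs)     echo "cargo fmt" ;;
    *.scala)  echo "sbt scalafmt" ;;
    *.md)     echo "prettier --write $1" ;;
    *) echo "no formatter listed for: $1" >&2; return 1 ;;
  esac
}
```

For instance, `fory_format_cmd README.md` prints `prettier --write README.md`.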

Debugging Guidelines

Protocol Issues

  • Use Python Mode: Set ENABLE_FORY_CYTHON_SERIALIZATION=0 for debugging
  • Check Specifications: Refer to protocol specs in docs/specification/
  • Cross-Language Testing: Use integration tests to verify compatibility
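
The xlang test commands in this guide follow one pattern: a per-language CI flag plus a Java test class. The sketch below prints the command for a given language; `fory_xlang_test_cmd` is a hypothetical helper name. It uses `-T16` for every language, although the Go and Rust commands in this guide omit it, and it does not set ENABLE_FORY_CYTHON_SERIALIZATION, which you can export separately for Python.

```shell
# Print the mvn command for the Java-driven xlang test of one language.
# Run the printed command from java/fory-core after installing fory java.
# NOTE: hypothetical helper, not part of the Fory repository.
fory_xlang_test_cmd() {
  case "$1" in
    cpp)    flag=FORY_CPP_JAVA_CI;    cls=CPPXlangTest ;;
    python) flag=FORY_PYTHON_JAVA_CI; cls=PythonXlangTest ;;
    go)     flag=FORY_GO_JAVA_CI;     cls=GoXlangTest ;;
    rust)   flag=FORY_RUST_JAVA_CI;   cls=RustXlangTest ;;
    *) echo "unknown language: $1" >&2; return 1 ;;
  esac
  echo "$flag=1 ENABLE_FORY_DEBUG_OUTPUT=1 mvn -T16 test -Dtest=org.apache.fory.xlang.$cls"
}
```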

Performance Issues

  • Profile First: Use appropriate profilers for each language
  • Memory Analysis: Check for memory leaks and allocation patterns

Build Issues

  • Clean Builds: Use language-specific clean commands
  • Dependency Issues: Check version compatibility
  • Bazel Issues: Use bazel clean --expunge for deep cleaning

CI/CD Understanding

GitHub Actions Workflows

  • ci.yml: Main CI workflow for all languages
  • build-native-*.yml: macOS/Windows Python wheel build workflows
  • build-containerized-*.yml: Containerized python wheel build workflows for linux
  • lint.yml: Code formatting and linting
  • pr-lint.yml: PR-specific checks

Fixing GitHub CI Errors

Use the GitHub CLI (gh) to inspect and fix CI failures:

# List all checks for a PR and their status
gh pr checks <PR_NUMBER> --repo apache/fory

# View failed job logs (get job ID from pr checks output)
gh run view <RUN_ID> --repo apache/fory --job <JOB_ID> --log-failed

# View full job logs
gh run view <RUN_ID> --repo apache/fory --job <JOB_ID> --log

# Example workflow for fixing CI errors:
# 1. List checks to find failing jobs
gh pr checks 2942 --repo apache/fory

# 2. Get the failed job logs (RUN_ID and JOB_ID from step 1)
gh run view 19735911308 --repo apache/fory --job 56547673283 --log-failed

# 3. Fix the issues based on error messages
# 4. Commit and push fixes

Common CI failures and fixes:

  • Code Style Check: Run formatters (clang-format, prettier, spotless:apply, etc.)
  • Markdown Lint: Run prettier --write <file> for markdown files
  • C++ Build Errors: Check for missing dependencies or header includes
  • Test Failures: Run tests locally to reproduce and fix

Commit Message Format

Use conventional commits with language scope:

feat(java): add codegen support for xlang serialization
fix(rust): fix collection header when collection is empty
docs(python): add docs for xlang serialization
refactor(java): unify serialization exceptions hierarchy
perf(cpp): optimize buffer allocation in encoder
test(integration): add cross-language reference cycle tests
ci: update build matrix for latest JDK versions
chore(deps): update guava dependency to 32.0.0
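
A commit subject can be checked against this format before committing. The function below is a minimal sketch: `is_valid_commit_subject` is a hypothetical helper name, the list of types is taken only from the examples above, and treating the scope as optional is an assumption based on the `ci:` example.

```shell
# Return 0 if the subject matches "type(scope): summary" or "type: summary".
# NOTE: hypothetical helper; the allowed types are only those shown in the
# examples above and may not be the project's complete list.
is_valid_commit_subject() {
  printf '%s\n' "$1" |
    grep -Eq '^(feat|fix|docs|refactor|perf|test|ci|chore)(\([a-z0-9_-]+\))?: .+'
}
```

For example, `is_valid_commit_subject "perf(cpp): optimize buffer allocation in encoder"` succeeds, while a subject without a type prefix fails.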