| commit | 2be808d953d346d297e0582bc3dc9e1bd0624805 | |
|---|---|---|
| author | Shawn Yang <shawn.ck.yang@gmail.com> | Thu Jan 01 11:27:42 2026 +0800 |
| committer | GitHub <noreply@github.com> | Thu Jan 01 08:57:42 2026 +0530 |
| tree | 3d381a0b8c38285133c8da92951c168a2d24a7ab | |
| parent | d6f646139521c8e822534a75bc40c803c9aca5cd | |
feat(java/python/rust/go/c++): xlang nullable/ref alignment (#3104)

## Why?

Cross-language serialization requires consistent handling of nullable fields and reference tracking across all language implementations. Previously, there were inconsistencies in:

- Field sorting order for nullable vs non-nullable fields
- Handling of `std::optional` / `Optional` types during serialization
- TypeDef encoding/decoding for field nullability metadata
- MetaCompressor configuration not being passed through in cython mode

## What does this PR do?

### Core Changes

1. **Unified Field Sorting Order** (Java, C++, Go, Rust, Python)
   - Fixed numeric field sorter to use type_id descending order to match Java's implementation
   - Ensures consistent field order across all languages for schema compatibility
2. **Nullable Field Xlang Tests**
   - Added comprehensive nullable field tests for SCHEMA_CONSISTENT and COMPATIBLE modes
   - New test structs: `NullableComprehensiveSchemaConsistent` (type_id=401) and `NullableComprehensiveCompatible` (type_id=402)
   - Tests cover all primitive types, boxed types, and reference types (String, List, Set, Map)
   - Enabled tests for C++, Python, Go, and Rust
3. **C++ Improvements**
   - Fixed `std::optional` serializer to properly propagate `has_generics` flag
   - Added `NullableComprehensiveSchemaConsistent` and `NullableComprehensiveCompatible` structs
   - Implemented nullable field test handlers
4. **Python Improvements**
   - Added `NoOpMetaCompressor` for testing without compression
   - Added `meta_compressor` parameter to `Fory` and `TypeResolver` constructors
   - Fixed cython mode to properly pass `meta_compressor` parameter
   - Updated `NullableComprehensiveCompatible` to use `Optional` for all nullable fields
   - Fixed field name resolution with smart fallback lookup (snake_case/camelCase)
5. **Go Improvements**
   - Added nullable field test support
   - Fixed field ordering for xlang compatibility
6. **Rust Improvements**
   - Added nullable field test handlers
   - Fixed field sorting consistency
7. **Java Improvements**
   - Refactored `ObjectSerializer` for better nullable/ref tracking handling
   - Fixed `StringUtils.lowerUnderscoreToLowerCamelCase` off-by-one bug
   - Added custom test overrides for C++ and Python that properly handle null values

### Language-Specific Null Handling

- **C++** uses `std::optional<T>` - properly preserves null values
- **Python** uses `Optional[T]` - properly preserves null values
- **Rust** sends default values for nullable fields (different behavior)
- **Go** handles nullable fields with proper nil checks

## Related issues

#1017 #2982 #2906

## Does this PR introduce any user-facing change?

- [x] Does this PR introduce any public API change?
  - Python: Added `meta_compressor` parameter to `Fory` constructor
- [ ] Does this PR introduce any binary protocol compatibility change?

## Benchmark

N/A
Apache Fory™ is a blazingly fast multi-language serialization framework powered by JIT compilation, zero-copy techniques, and advanced code generation, achieving up to 170x performance improvement while maintaining simplicity and ease of use.
[!IMPORTANT] Apache Fory™ was previously named Apache Fury. For versions before 0.11, use “fury” instead of “fory” in package names, imports, and dependencies. See the Fury Docs for how to use Fury in older versions.
Apache Fory™ delivers exceptional performance through advanced optimization techniques:
The xlang serialization format enables seamless data exchange across programming languages:
A cache-friendly row format optimized for analytics workloads:
Enterprise-grade security and compatibility:
Apache Fory™ implements multiple binary protocols optimized for different scenarios:
| Protocol | Use Case | Key Features |
|---|---|---|
| Xlang Serialization | Cross-language object exchange | Automatic serialization, references, polymorphism |
| Java Serialization | High-performance Java-only | Drop-in JDK serialization replacement, 100x faster |
| Row Format | Analytics and data processing | Zero-copy random access, Arrow compatibility |
| Python Native | Python-specific serialization | Pickle/cloudpickle replacement with better performance |
All protocols share the same optimized codebase, allowing improvements in one protocol to benefit others.
Note: Different serialization frameworks excel in different scenarios. Benchmark results are for reference only. For your specific use case, conduct benchmarks with appropriate configurations and workloads.
The following benchmarks compare Fory against popular Java serialization frameworks. Charts labeled “compatible” show schema evolution mode with forward/backward compatibility enabled, while others show schema consistent mode where class schemas must match.
Test Classes:
- Struct: Class with 100 primitive fields
- MediaContent: Class from jvm-serializers
- Sample: Class from Kryo benchmark

Serialization Throughput:
Deserialization Throughput:
Important: Fory's runtime code generation requires proper warm-up for performance measurement:
For additional benchmarks covering type forward/backward compatibility, off-heap support, and zero-copy serialization, see Java Benchmarks.
Fory Rust demonstrates competitive performance compared to other Rust serialization frameworks.
For more detailed benchmarks and methodology, see Rust Benchmarks.
Fory C++ demonstrates competitive performance compared to the Protobuf C++ serialization framework.
For more detailed benchmarks and methodology, see C++ Benchmarks.
Java:
```xml
<dependency>
  <groupId>org.apache.fory</groupId>
  <artifactId>fory-core</artifactId>
  <version>0.14.1</version>
</dependency>
```
Snapshots are available from https://repository.apache.org/snapshots/ (version 0.14.0-SNAPSHOT).
Scala:
```sbt
// Scala 2.13
libraryDependencies += "org.apache.fory" % "fory-scala_2.13" % "0.14.1"

// Scala 3
libraryDependencies += "org.apache.fory" % "fory-scala_3" % "0.14.1"
```
Kotlin:
```xml
<dependency>
  <groupId>org.apache.fory</groupId>
  <artifactId>fory-kotlin</artifactId>
  <version>0.14.1</version>
</dependency>
```
Python:
```bash
pip install pyfory

# With row format support
pip install pyfory[format]
```
Rust:
```toml
[dependencies]
fory = "0.14"
```
C++:
Fory C++ supports both CMake and Bazel build systems. See C++ Installation Guide for detailed instructions.
Golang:
```bash
go get github.com/apache/fory/go/fory
```
This section provides quick examples for getting started with Apache Fory™. For comprehensive guides, see the Documentation.
Always use native mode when working with a single language. Native mode delivers optimal performance by avoiding the type metadata overhead required for cross-language compatibility. Xlang mode introduces additional metadata encoding costs and restricts serialization to types that are common across all supported languages. Language-specific types will be rejected during serialization in xlang-mode.
When you don't need cross-language support, use Java mode for optimal performance.
```java
import org.apache.fory.*;
import org.apache.fory.config.*;

public class Example {
  public static class Person {
    String name;
    int age;
  }

  public static void main(String[] args) {
    // Create Fory instance - should be reused across serializations
    BaseFory fory = Fory.builder()
      .withLanguage(Language.JAVA)
      .requireClassRegistration(true)
      // replace `build` with `buildThreadSafeFory` for Thread-Safe Usage
      .build();
    // Register your classes (required when class registration is enabled)
    // Registration order must be consistent if id is not specified
    fory.register(Person.class);
    // Serialize
    Person person = new Person();
    person.name = "chaokunyang";
    person.age = 28;
    byte[] bytes = fory.serialize(person);
    Person result = (Person) fory.deserialize(bytes);
    System.out.println(result.name + " " + result.age); // Output: chaokunyang 28
  }
}
```
For detailed Java usage including compatibility modes, compression, and advanced features, see Java Serialization Guide and java/README.md.
Python native mode provides a high-performance drop-in replacement for pickle/cloudpickle with better speed and compatibility.
```python
from dataclasses import dataclass
import pyfory

@dataclass
class Person:
    name: str
    age: pyfory.int32

# Create Fory instance - should be reused across serializations
fory = pyfory.Fory()
# Register your classes (required when class registration is enabled)
fory.register_type(Person)
person = Person(name="chaokunyang", age=28)
data = fory.serialize(person)
result = fory.deserialize(data)
print(result.name, result.age)  # Output: chaokunyang 28
```
For detailed Python usage including type hints, compatibility modes, and advanced features, see Python Guide.
Rust native mode provides compile-time code generation via derive macros for high-performance serialization without runtime overhead.
```rust
use fory::{Fory, ForyObject};

#[derive(ForyObject, Debug, PartialEq)]
struct Person {
    name: String,
    age: i32,
}

fn main() -> Result<(), fory::Error> {
    // Create Fory instance - should be reused across serializations
    let mut fory = Fory::default();
    // Register your structs (required when class registration is enabled)
    fory.register::<Person>(1);
    let person = Person {
        name: "chaokunyang".to_string(),
        age: 28,
    };
    let bytes = fory.serialize(&person);
    let result: Person = fory.deserialize(&bytes)?;
    println!("{} {}", result.name, result.age); // Output: chaokunyang 28
    Ok(())
}
```
For detailed Rust usage including collections, references, and custom serializers, see Rust Guide.
C++ native mode provides compile-time reflection via the FORY_STRUCT macro for efficient serialization with zero runtime overhead.
```cpp
#include <cstdint>
#include <iostream>
#include <string>

#include "fory/serialization/fory.h"

using namespace fory::serialization;

struct Person {
  std::string name;
  int32_t age;
};
FORY_STRUCT(Person, name, age);

int main() {
  // Create Fory instance - should be reused across serializations
  auto fory = Fory::builder().build();
  // Register your structs (required when class registration is enabled)
  fory.register_struct<Person>(1);
  Person person{"chaokunyang", 28};
  auto bytes = fory.serialize(person).value();
  auto result = fory.deserialize<Person>(bytes).value();
  std::cout << result.name << " " << result.age << std::endl; // Output: chaokunyang 28
  return 0;
}
```
For detailed C++ usage including collections, smart pointers, and error handling, see C++ Guide.
Scala native mode provides optimized serialization for Scala-specific types including case classes, collections, and Option types.
```scala
import org.apache.fory.Fory
import org.apache.fory.config.Language
import org.apache.fory.serializer.scala.ScalaSerializers

case class Person(name: String, age: Int)

object Example {
  def main(args: Array[String]): Unit = {
    // Create Fory instance - should be reused across serializations
    val fory = Fory.builder()
      .withLanguage(Language.JAVA)
      .requireClassRegistration(true)
      .build()
    // Register Scala serializers for Scala-specific types
    ScalaSerializers.registerSerializers(fory)
    // Register your case classes
    fory.register(classOf[Person])
    val bytes = fory.serialize(Person("chaokunyang", 28))
    val result = fory.deserialize(bytes).asInstanceOf[Person]
    println(s"${result.name} ${result.age}") // Output: chaokunyang 28
  }
}
```
For detailed Scala usage including collection serialization and integration patterns, see Scala Guide.
Kotlin native mode provides optimized serialization for Kotlin-specific types including data classes, nullable types, and Kotlin collections.
```kotlin
import org.apache.fory.Fory
import org.apache.fory.config.Language
import org.apache.fory.serializer.kotlin.KotlinSerializers

data class Person(val name: String, val age: Int)

fun main() {
    // Create Fory instance - should be reused across serializations
    val fory = Fory.builder()
        .withLanguage(Language.JAVA)
        .requireClassRegistration(true)
        .build()
    // Register Kotlin serializers for Kotlin-specific types
    KotlinSerializers.registerSerializers(fory)
    // Register your data classes
    fory.register(Person::class.java)
    val bytes = fory.serialize(Person("chaokunyang", 28))
    val result = fory.deserialize(bytes) as Person
    println("${result.name} ${result.age}") // Output: chaokunyang 28
}
```
For detailed Kotlin usage including null safety and default value support, see kotlin/README.md.
Only use xlang mode when you need cross-language data exchange. Xlang mode adds type metadata overhead for cross-language compatibility and only supports types that can be mapped across all languages. For single-language use cases, always prefer native mode for better performance.
The following examples demonstrate serializing a Person object across Java and Rust. For other languages (Python, Go, JavaScript, etc.), simply set the language mode to XLANG and follow the same pattern.
Java
```java
import org.apache.fory.*;
import org.apache.fory.config.*;

public class XlangExample {
  public record Person(String name, int age) {}

  public static void main(String[] args) {
    // Create Fory instance with XLANG mode
    Fory fory = Fory.builder()
      .withLanguage(Language.XLANG)
      .build();
    // Register with cross-language type id/name
    fory.register(Person.class, 1);
    // fory.register(Person.class, "example.Person");
    Person person = new Person("chaokunyang", 28);
    byte[] bytes = fory.serialize(person);
    // bytes can be deserialized by Rust, Python, Go, or other languages
    Person result = (Person) fory.deserialize(bytes);
    System.out.println(result.name + " " + result.age); // Output: chaokunyang 28
  }
}
```
Rust
```rust
use fory::{Fory, ForyObject};

#[derive(ForyObject, Debug)]
struct Person {
    name: String,
    age: i32,
}

fn main() -> Result<(), fory::Error> {
    let mut fory = Fory::default();
    fory.register::<Person>(1)?;
    // fory.register_by_name::<Person>("example.Person")?;
    let person = Person {
        name: "chaokunyang".to_string(),
        age: 28,
    };
    let bytes = fory.serialize(&person);
    // bytes can be deserialized by Java, Python, Go, or other languages
    let result: Person = fory.deserialize(&bytes)?;
    println!("{} {}", result.name, result.age); // Output: chaokunyang 28
    Ok(())
}
```
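The two programs above interoperate only because both register `Person` under the same identifier. The trade-off between numeric IDs and names can be illustrated with a small, purely hypothetical registry. `TypeRegistry`, `register_id`, `register_name`, and `tag_for` are illustrative names, not part of any Fory API, and the byte layout is an assumption chosen for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class TypeRegistry:
    """Hypothetical registry used only to illustrate ID vs name registration."""

    by_id: dict = field(default_factory=dict)
    by_name: dict = field(default_factory=dict)

    def register_id(self, cls, type_id: int) -> None:
        # Numeric IDs are compact, but every peer must agree on them
        # and no two types may share one.
        existing = self.by_id.get(type_id)
        if existing is not None and existing is not cls:
            raise ValueError(f"type id {type_id} already used by {existing.__name__}")
        self.by_id[type_id] = cls

    def register_name(self, cls, name: str) -> None:
        # Namespaced names rarely collide, at the cost of more bytes.
        existing = self.by_name.get(name)
        if existing is not None and existing is not cls:
            raise ValueError(f"type name {name!r} already used by {existing.__name__}")
        self.by_name[name] = cls

    def tag_for(self, cls) -> bytes:
        # Bytes a writer in this sketch would emit to identify `cls`.
        for type_id, registered in self.by_id.items():
            if registered is cls:
                return type_id.to_bytes(2, "little")
        for name, registered in self.by_name.items():
            if registered is cls:
                return name.encode("utf-8")
        raise KeyError(cls.__name__)


class Person: ...
class Order: ...


registry = TypeRegistry()
registry.register_id(Person, 1)
registry.register_name(Order, "example.Order")
print(len(registry.tag_for(Person)), len(registry.tag_for(Order)))
```

In this sketch the ID tag is a fixed two bytes while the name tag grows with the name, which mirrors the compactness versus flexibility trade-off, not the actual wire encoding.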
Key Points for Cross-Language Serialization:
- Use `Language.XLANG` mode in all languages
- By ID (`fory.register(Person.class, 1)`): Faster serialization, more compact encoding, but requires coordination to avoid ID conflicts
- By name (`fory.register(Person.class, "example.Person")`): More flexible, less prone to conflicts, easier to manage across teams, but slightly larger encoding

For examples with circular references, shared references, and polymorphism across languages, see:
Row format provides zero-copy random access to serialized data, making it ideal for analytics workloads and data processing pipelines.
```java
import org.apache.fory.format.*;
import java.util.*;
import java.util.stream.*;

public class Bar {
  String f1;
  List<Long> f2;
}

public class Foo {
  int f1;
  List<Integer> f2;
  Map<String, Integer> f3;
  List<Bar> f4;
}

RowEncoder<Foo> encoder = Encoders.bean(Foo.class);
Foo foo = new Foo();
foo.f1 = 10;
foo.f2 = IntStream.range(0, 1000000).boxed().collect(Collectors.toList());
foo.f3 = IntStream.range(0, 1000000).boxed().collect(Collectors.toMap(i -> "k"+i, i -> i));
List<Bar> bars = new ArrayList<>(1000000);
for (int i = 0; i < 1000000; i++) {
  Bar bar = new Bar();
  bar.f1 = "s" + i;
  bar.f2 = LongStream.range(0, 10).boxed().collect(Collectors.toList());
  bars.add(bar);
}
foo.f4 = bars;

// Serialize to row format (can be zero-copy read by Python)
BinaryRow binaryRow = encoder.toRow(foo);
// Deserialize entire object
Foo newFoo = encoder.fromRow(binaryRow);

// Zero-copy access to nested fields without full deserialization
BinaryArray binaryArray2 = binaryRow.getArray(1);  // Access f2 field
BinaryArray binaryArray4 = binaryRow.getArray(3);  // Access f4 field
BinaryRow barStruct = binaryArray4.getStruct(10);  // Access 11th Bar element
long value = barStruct.getArray(1).getInt64(5);    // Access nested value

// Partial deserialization
RowEncoder<Bar> barEncoder = Encoders.bean(Bar.class);
Bar newBar = barEncoder.fromRow(barStruct);
Bar newBar2 = barEncoder.fromRow(binaryArray4.getStruct(20));
```
```python
from dataclasses import dataclass
from typing import List, Dict
import pyarrow as pa
import pyfory

@dataclass
class Bar:
    f1: str
    f2: List[pa.int64]

@dataclass
class Foo:
    f1: pa.int32
    f2: List[pa.int32]
    f3: Dict[str, pa.int32]
    f4: List[Bar]

encoder = pyfory.encoder(Foo)
foo = Foo(
    f1=10,
    f2=list(range(1000_000)),
    f3={f"k{i}": i for i in range(1000_000)},
    f4=[Bar(f1=f"s{i}", f2=list(range(10))) for i in range(1000_000)]
)

# Serialize to row format
binary: bytes = encoder.to_row(foo).to_bytes()

# Zero-copy random access without full deserialization
foo_row = pyfory.RowData(encoder.schema, binary)
print(foo_row.f2[100000])        # Access element directly
print(foo_row.f4[100000].f1)     # Access nested field
print(foo_row.f4[200000].f2[5])  # Access deeply nested field
```
For more details on row format, see Row Format Specification.
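To make the idea of zero-copy random access concrete, here is a minimal stdlib-only sketch. It assumes a toy fixed-width layout (an 8-byte count followed by 8-byte little-endian slots), which is not Fory's actual row format. The helper names `encode_int64_array` and `get_int64` are hypothetical.

```python
import struct


def encode_int64_array(values):
    # Toy layout: 8-byte element count, then one 8-byte slot per value.
    return struct.pack(f"<q{len(values)}q", len(values), *values)


def get_int64(buf, index):
    # Read a single element by computing its offset; no other element is decoded.
    view = memoryview(buf)  # wraps the buffer without copying it
    (count,) = struct.unpack_from("<q", view, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    (value,) = struct.unpack_from("<q", view, 8 + 8 * index)
    return value


data = encode_int64_array(list(range(1000)))
print(get_int64(data, 500))  # prints 500
```

Because every slot has a fixed width, the cost of reading one element does not depend on the size of the array, which is the property that makes row-oriented layouts attractive for analytics.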
| Guide | Description | Source | Website |
|---|---|---|---|
| Java Serialization | Comprehensive guide for Java serialization | java | 📖 View |
| Python | Python-specific features and usage | python | 📖 View |
| Rust | Rust implementation and patterns | rust | 📖 View |
| C++ | C++ implementation and patterns | cpp | 📖 View |
| Scala | Scala integration and best practices | scala | 📖 View |
| Cross-Language Serialization | Multi-language object exchange | xlang | 📖 View |
| GraalVM | Native image support and AOT compilation | graalvm_guide.md | 📖 View |
| Development | Building and contributing to Fory | DEVELOPMENT.md | 📖 View |
| Specification | Description | Source | Website |
|---|---|---|---|
| Xlang Serialization | Cross-language binary protocol | xlang_serialization_spec.md | 📖 View |
| Java Serialization | Java-optimized protocol | java_serialization_spec.md | 📖 View |
| Row Format | Row-based binary format | row_format_spec.md | 📖 View |
| Type Mapping | Cross-language type conversion | xlang_type_mapping.md | 📖 View |
Apache Fory™ supports class schema forward/backward compatibility across Java, Python, Rust, and Golang, enabling seamless schema evolution in production systems without requiring coordinated upgrades across all services. Fory provides two schema compatibility modes:
Schema Consistent Mode (Default): Assumes identical class schemas between serialization and deserialization peers. This mode offers minimal serialization overhead, smallest data size, and fastest performance: ideal for stable schemas or controlled environments.
Compatible Mode: Supports independent schema evolution with forward and backward compatibility. This mode enables field addition/deletion, limited type evolution, and graceful handling of schema mismatches. Enable using withCompatibleMode(CompatibleMode.COMPATIBLE) in Java, compatible=True in Python, compatible_mode(true) in Rust, or NewFory(true) in Go.
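The behavior expected from a compatible reader (dropping fields it does not know, defaulting fields the writer did not send) can be sketched without any serialization library. The example below uses a plain `dict` as a stand-in for a decoded payload with field names attached. `PersonV2` and `read_compatible` are hypothetical names, and this is not how Fory encodes or resolves field metadata.

```python
from dataclasses import dataclass, fields


@dataclass
class PersonV2:
    name: str = ""
    age: int = 0
    email: str = ""  # field added in the newer schema


def read_compatible(cls, payload: dict):
    # Keep only fields the local schema declares; anything the peer added is skipped.
    known = {f.name for f in fields(cls)}
    kept = {key: value for key, value in payload.items() if key in known}
    # Fields missing from the payload fall back to the dataclass defaults.
    return cls(**kept)


from_older_writer = {"name": "chaokunyang", "age": 28}
from_newer_writer = {"name": "chaokunyang", "age": 28, "email": "a@example.com", "nickname": "ck"}

print(read_compatible(PersonV2, from_older_writer))
print(read_compatible(PersonV2, from_newer_writer))
```

Schema consistent mode corresponds to skipping this reconciliation entirely, which is why it is smaller and faster but requires both peers to share the same class definition.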
Current Status: Binary compatibility is not guaranteed between Fory major releases as the protocol continues to evolve. However, compatibility is guaranteed between minor versions (e.g., 0.13.x).
Recommendations:
Future: Binary compatibility will be guaranteed starting from Fory 1.0 release.
Serialization security varies by protocol:
Dynamic serialization can deserialize arbitrary types, which may introduce risks. For example, deserialization may invoke an init constructor or an equals/hashCode method; if the method body contains malicious code, the system will be at risk.
Fory enables class registration by default for dynamic protocols, allowing only trusted registered types. Do not disable class registration unless you can ensure your environment is secure.
If class registration is disabled, you are responsible for serialization security. You should implement and configure a customized ClassChecker or DeserializationPolicy for fine-grained security control.
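The allow-list principle behind class registration can be shown with Python's standard `pickle` module, whose documentation describes restricting globals by overriding `Unpickler.find_class`. This is an analogy for the registration check, not Fory's implementation. `ALLOWED`, `AllowListUnpickler`, and `safe_loads` are names chosen for this sketch.

```python
import io
import pickle
from collections import OrderedDict
from fractions import Fraction

# Only types listed here may be reconstructed during deserialization.
ALLOWED = {("collections", "OrderedDict")}


class AllowListUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream asks for a class or function by name.
        if (module, name) not in ALLOWED:
            raise pickle.UnpicklingError(f"{module}.{name} is not registered")
        return super().find_class(module, name)


def safe_loads(data: bytes):
    return AllowListUnpickler(io.BytesIO(data)).load()


print(safe_loads(pickle.dumps(OrderedDict(a=1))))  # registered type is accepted

try:
    safe_loads(pickle.dumps(Fraction(1, 3)))  # unregistered type is rejected
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

Rejecting unknown types before any of their code runs is what prevents a crafted payload from triggering constructors or methods of classes the application never intended to deserialize.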
To report security vulnerabilities in Apache Fory™, please follow the ASF vulnerability reporting process.
We welcome contributions! Please read our Contributing Guide to get started.
Ways to Contribute:
See Development Guide for build instructions and development workflow.
Apache Fory™ is licensed under the Apache License 2.0.