---
title: Java Serialization Format
sidebar_position: 1
id: java_serialization_spec
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the “License”); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0
---
This document specifies the Apache Fory Java native binary format: the format used by Java when withXlang(false) is configured. The format is optimized for Java object graphs, Java collection implementations, Java primitive arrays, Java class registration, Java serialization hooks, and optional schema evolution.
Java native mode and xlang mode share low-level building blocks such as little-endian numeric payloads, variable-length integer encodings, reference flags, meta string encodings, and TypeDef/ClassDef concepts. They are different wire formats. In Java native mode, only the scalar type IDs from BOOL through STRING are shared with xlang. Collection, map, struct, array, enum, and native Java implementation type IDs are Java native IDs unless this document explicitly says otherwise.
See Xlang Serialization Format for the cross-language format.
A Java native stream contains one header byte followed by one or more root objects. Each root object is encoded as a normal object slot:
| header | root_0 | root_1 | ... |

root:

| reference flag | [type metadata] | [value payload] |
All multi-byte fixed-width values are little endian. A big-endian Java runtime must still write and read little-endian payloads.
The stream is stateful. Type metadata, class definitions, and object references are assigned indexes as they are first encountered and may be referenced later in the same stream.
The header is a single byte:
| bits 7..2 reserved | bit 1 out-of-band | bit 0 xlang |
- xlang must be 0 for Java native mode.
- out-of-band is 1 when a BufferCallback is configured.
- Reserved bits must be 0.
- Java native mode does not write a language ID after the header.
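As a rough illustration, the header byte can be packed and checked with a few bit masks. This is a minimal Python sketch, not Fory's implementation: the function names are hypothetical, and rejecting non-zero reserved bits on read is a conservative choice assumed here.

```python
XLANG_BIT = 0b0000_0001        # bit 0: must be 0 in Java native mode
OUT_OF_BAND_BIT = 0b0000_0010  # bit 1: set when a BufferCallback is configured
RESERVED_MASK = 0b1111_1100    # bits 7..2: reserved


def write_java_native_header(out_of_band: bool) -> int:
    """Return the header byte for a Java native stream (xlang bit left clear)."""
    return OUT_OF_BAND_BIT if out_of_band else 0


def read_java_native_header(header: int) -> bool:
    """Validate a header byte and return whether out-of-band buffers are in use."""
    if header & XLANG_BIT:
        raise ValueError("xlang bit set: not a Java native stream")
    if header & RESERVED_MASK:
        raise ValueError("reserved header bits are not zero")
    # No language ID follows the header in Java native mode.
    return bool(header & OUT_OF_BAND_BIT)
```

Because no language ID follows, the first root object's reference flag starts at the second byte of the stream.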
Objects, nullable fields, and reference-tracked fields use the standard Fory reference slot. The first byte is signed:
| Flag | Byte | Payload that follows |
|---|---|---|
| NULL_FLAG | -3 | No payload. The slot value is null. |
| REF_FLAG | -2 | varuint32 reference ID of an earlier object. |
| NOT_NULL_VALUE_FLAG | -1 | Value payload. No reference ID is assigned for this occurrence. |
| REF_VALUE_FLAG | 0 | Value payload. Assign the next reference ID before reading data. |
When reference tracking is disabled for a slot, writers use only NULL_FLAG and NOT_NULL_VALUE_FLAG.
Primitive field fast paths do not wrap non-null primitive values in a reference slot. Boxed primitives and other nullable values use the slot selected by field metadata.
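The slot rules above can be sketched as a small writer that emits the signed flag byte (as its two's-complement byte, so -3 becomes 0xFD) and, for back-references, a varuint32 reference ID. The class and method names are hypothetical, the varuint32 helper assumes a LEB128-style layout, and primitive fast paths are out of scope.

```python
from typing import Any, Dict, Tuple

NULL_FLAG = -3
REF_FLAG = -2
NOT_NULL_VALUE_FLAG = -1
REF_VALUE_FLAG = 0


def write_varuint32(value: int) -> bytes:
    """Assumed LEB128-style varuint32: 7 bits per byte, low group first."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


class RefSlotWriter:
    """Hypothetical writer for the leading bytes of a reference slot."""

    def __init__(self, track_refs: bool) -> None:
        self.track_refs = track_refs
        # id(obj) -> (reference ID, obj); obj is kept alive so id() stays unique.
        self.seen: Dict[int, Tuple[int, Any]] = {}

    def write_slot(self, obj: Any, buf: bytearray) -> bool:
        """Write the flag (and reference ID, if any).

        Returns True when the caller must write the value payload next.
        """
        if obj is None:
            buf.append(NULL_FLAG & 0xFF)
            return False
        if not self.track_refs:
            buf.append(NOT_NULL_VALUE_FLAG & 0xFF)
            return True
        entry = self.seen.get(id(obj))
        if entry is not None:
            buf.append(REF_FLAG & 0xFF)
            buf += write_varuint32(entry[0])
            return False
        # First occurrence: assign the next reference ID, then write the value.
        self.seen[id(obj)] = (len(self.seen), obj)
        buf.append(REF_VALUE_FLAG & 0xFF)
        return True
```

A reader mirrors this: on REF_VALUE_FLAG it reserves the next ID before reading the payload, which lets a value that refers back to itself resolve to the partially read object.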
Dynamic object slots write type metadata before the value payload. Type metadata identifies the serializer and, when needed, carries class names or ClassDef metadata.
| varuint32 type_id | [type-specific metadata] |
Registered Java classes, Java native built-ins, and Fory internal serializers use numeric type IDs. Unregistered classes or classes registered by name carry name metadata. Schema-evolution classes may carry a ClassDef.
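Type IDs, reference IDs, lengths, and many headers in this format are varuint32 values, so a codec for them is a useful building block. The sketch below assumes the common LEB128-style layout (7 payload bits per byte, least-significant group first, high bit as the continuation flag); treat it as an illustration rather than Fory's exact implementation.

```python
from typing import Tuple


def write_varuint32(value: int) -> bytes:
    """Encode an unsigned 32-bit value as 1..5 bytes, low 7-bit group first."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value out of uint32 range")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # continuation bit set
        value >>= 7
    out.append(value)  # final group, continuation bit clear
    return bytes(out)


def read_varuint32(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a varuint32 starting at data[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        b = data[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if b < 0x80:
            return result, pos
        shift += 7
        if shift > 28:
            raise ValueError("varuint32 longer than 5 bytes")
```

Under this layout, every type ID below 128, including all Java native built-ins (69..98), fits in a single byte.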
| Range | Meaning |
|---|---|
| 0 | UNKNOWN, used in metadata for dynamic or object-typed positions. |
| 1..21 | Shared scalar IDs from BOOL through STRING. |
| 22..63 | Reserved in Java native mode for the xlang internal ID range. |
| 64..68 | Reserved for future Java native internal IDs. |
| 69..98 | Java native built-ins listed below. |
| 99+ | User and runtime class IDs assigned by the Java ClassResolver. |
The shared scalar IDs are:
| ID | Name | Java value domain |
|---|---|---|
| 1 | BOOL | boolean values in xlang metadata |
| 2 | INT8 | signed 8-bit integer metadata |
| 3 | INT16 | signed 16-bit integer metadata |
| 4 | INT32 | fixed-width signed 32-bit metadata |
| 5 | VARINT32 | variable-width signed 32-bit metadata |
| 6 | INT64 | fixed-width signed 64-bit metadata |
| 7 | VARINT64 | variable-width signed 64-bit metadata |
| 8 | TAGGED_INT64 | tagged signed 64-bit metadata |
| 9 | UINT8 | unsigned 8-bit metadata |
| 10 | UINT16 | unsigned 16-bit metadata |
| 11 | UINT32 | fixed-width unsigned 32-bit metadata |
| 12 | VAR_UINT32 | variable-width unsigned 32-bit metadata |
| 13 | UINT64 | fixed-width unsigned 64-bit metadata |
| 14 | VAR_UINT64 | variable-width unsigned 64-bit metadata |
| 15 | TAGGED_UINT64 | tagged unsigned 64-bit metadata |
| 16 | FLOAT8 | reserved 8-bit float metadata |
| 17 | FLOAT16 | half precision float metadata |
| 18 | BFLOAT16 | bfloat16 metadata |
| 19 | FLOAT32 | 32-bit floating point metadata |
| 20 | FLOAT64 | 64-bit floating point metadata |
| 21 | STRING | Java String |
Java native built-ins start at ID 69:
| ID | Name | Java type or serializer owner |
|---|---|---|
| 69 | VOID_ID | java.lang.Void |
| 70 | CHAR_ID | java.lang.Character |
| 71 | PRIMITIVE_VOID_ID | void |
| 72 | PRIMITIVE_BOOL_ID | boolean |
| 73 | PRIMITIVE_INT8_ID | byte |
| 74 | PRIMITIVE_CHAR_ID | char |
| 75 | PRIMITIVE_INT16_ID | short |
| 76 | PRIMITIVE_INT32_ID | int |
| 77 | PRIMITIVE_FLOAT32_ID | float |
| 78 | PRIMITIVE_INT64_ID | long |
| 79 | PRIMITIVE_FLOAT64_ID | double |
| 80 | PRIMITIVE_BOOLEAN_ARRAY_ID | boolean[] |
| 81 | PRIMITIVE_BYTE_ARRAY_ID | byte[] |
| 82 | PRIMITIVE_CHAR_ARRAY_ID | char[] |
| 83 | PRIMITIVE_SHORT_ARRAY_ID | short[] |
| 84 | PRIMITIVE_INT_ARRAY_ID | int[] |
| 85 | PRIMITIVE_FLOAT_ARRAY_ID | float[] |
| 86 | PRIMITIVE_LONG_ARRAY_ID | long[] |
| 87 | PRIMITIVE_DOUBLE_ARRAY_ID | double[] |
| 88 | STRING_ARRAY_ID | String[] |
| 89 | OBJECT_ARRAY_ID | Object[] and object array serializers |
| 90 | ARRAYLIST_ID | java.util.ArrayList |
| 91 | HASHMAP_ID | java.util.HashMap |
| 92 | HASHSET_ID | java.util.HashSet |
| 93 | CLASS_ID | java.lang.Class |
| 94 | EMPTY_OBJECT_ID | Empty-object serializer |
| 95 | LAMBDA_STUB_ID | Lambda replacement type ID |
| 96 | JDK_PROXY_STUB_ID | JDK proxy replacement type ID |
| 97 | REPLACE_STUB_ID | writeReplace/readResolve type ID |
| 98 | NONEXISTENT_META_SHARED_ID | Unknown-class marker type ID |
Java native mode supports three class identity forms:
Class registration is the fastest and most compact form. Name-based forms are used when stable names are required or class registration is disabled.
When meta sharing is enabled, class metadata is written once and referenced by a stream-local index:
| varuint32 marker | [class definition bytes if new] |

marker = (index << 1) | flag

- flag = 0: new definition; class definition bytes follow.
- flag = 1: reference to an earlier definition.
Indexes are assigned in first-use order.
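Marker assignment can be sketched as a per-stream table keyed by class. The class MetaShareWriter, its method, and decode_marker are hypothetical names; only the marker arithmetic and first-use ordering come from the layout above.

```python
from typing import Dict, Hashable, Tuple


class MetaShareWriter:
    """Assigns stream-local class-definition indexes in first-use order."""

    def __init__(self) -> None:
        self.index_of: Dict[Hashable, int] = {}

    def marker_for(self, class_key: Hashable) -> Tuple[int, bool]:
        """Return (marker, is_new); when is_new, definition bytes must follow."""
        index = self.index_of.get(class_key)
        if index is not None:
            return (index << 1) | 1, False  # flag = 1: earlier definition
        index = len(self.index_of)
        self.index_of[class_key] = index
        return index << 1, True  # flag = 0: new definition follows


def decode_marker(marker: int) -> Tuple[int, int]:
    """Split a marker into (index, flag)."""
    return marker >> 1, marker & 1
```

A reader keeps a list of definitions in the same order, appending when the flag is 0 and looking up by index when the flag is 1.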
Java native mode has two object schema modes.
Schema-consistent mode is used when compatible mode is disabled. The writer and reader must have matching fields and field order. No per-object ClassDef is required for ordinary registered classes. Field values are written directly in protocol order.
Compatible mode writes ClassDef metadata for struct-like classes. Readers match local fields against remote ClassDef fields by identifier, read matching fields, and skip unknown fields using the remote field type metadata. Compatible mode is the Java native schema-evolution path.
Java native object serializers use the same deterministic field-order categories as the current xlang protocol:
Primitive groups keep the primitive comparator:
Non-primitive fields sort directly by field identifier. Non-primitive type ID, serializer kind, collection kind, map kind, and Java implementation class do not participate in field order.
Field identifiers are selected as follows:
- If a field declares @ForyField(id = ...), that numeric ID is the field identifier.
- An ID of -1 means no explicit ID and is ignored for identifier selection.

Identifier comparison is:
Generated serializers may keep separate internal descriptor groups for primitive, collection, map, built-in, and user-defined serializers so they can emit specialized fast paths. Those internal groups are an implementation detail and must not change wire field order.
Compatible mode and meta sharing encode Java class definitions as TypeDef records. A TypeDef has an 8-byte header followed by class metadata bytes:
| 8-byte header | [varuint32 extra_size] | class metadata bytes |
Header bits:
| 52-bit hash | 3 reserved bits | 1 compress bit | 8 size bits |
- size: the lower 8 bits. If the value is 0xff, read extra_size as varuint32 and add it to 0xff.
- compress: set when class metadata bytes are compressed by the configured meta compressor.
- reserved: must be zero.
- hash: 52 bits derived from MurmurHash3 x64_128 with seed 47 over class_metadata_bytes || header_low12_le, where header_low12_le is the low 12 header bits encoded as two little-endian bytes with the upper four bits of the second byte clear. Take lane 0 of the MurmurHash3 result, left-shift it by 12 with signed 64-bit wraparound, apply signed absolute value, and mask with 0xfffffffffffff000.

Class metadata bytes:

| root_kind_and_layer_count | class_layer_0 | class_layer_1 | ... |

class_layer:

| varuint32 class_header | [registered type IDs or names] | field_info... |
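The 8-byte TypeDef header packing can be sketched as follows. The MurmurHash3 x64_128 computation itself is not shown: lane0 is assumed to be the first 64-bit lane of a seed-47 hash over hash_input(...), supplied by some external implementation. All function names are hypothetical, and the varuint32 helper assumes a LEB128-style layout.

```python
from typing import Tuple

MASK64 = (1 << 64) - 1
HASH_MASK = 0xFFFFFFFFFFFFF000


def write_varuint32(value: int) -> bytes:
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def low12_bits(meta_len: int, compressed: bool) -> int:
    """size (bits 0..7, saturating at 0xff) | compress (bit 8); bits 9..11 stay zero."""
    return min(meta_len, 0xFF) | (int(compressed) << 8)


def hash_input(meta: bytes, low12: int) -> bytes:
    """class_metadata_bytes || header_low12_le (two little-endian bytes)."""
    return meta + low12.to_bytes(2, "little")


def hash_bits(lane0: int) -> int:
    """Map MurmurHash3 lane 0 to the 52-bit hash field in bits 63..12."""
    shifted = (lane0 << 12) & MASK64
    if shifted >= 1 << 63:  # reinterpret as signed 64-bit
        shifted -= 1 << 64
    # Python's abs(-2**63) is 2**63, the same bit pattern Java's
    # Math.abs(Long.MIN_VALUE) leaves, so the masked result matches.
    return abs(shifted) & HASH_MASK


def typedef_header(lane0: int, meta_len: int, compressed: bool) -> Tuple[int, bytes]:
    """Return the 64-bit header and its encoding (8 bytes plus optional extra_size)."""
    header = hash_bits(lane0) | low12_bits(meta_len, compressed)
    encoded = header.to_bytes(8, "little")
    if meta_len >= 0xFF:
        encoded += write_varuint32(meta_len - 0xFF)  # reader adds this to 0xff
    return header, encoded
```

Note that the size and compress bits are fixed before hashing because they are part of the hash input.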
root_kind_and_layer_count stores the root TypeDef kind in the high four bits and (num_layers - 1) in the low four bits. If the low four bits are 0b1111, read an extra varuint32 and add it to 15.
Root kind codes:
| Code | Kind |
|---|---|
| 0 | STRUCT |
| 1 | COMPATIBLE_STRUCT |
| 2 | NAMED_STRUCT |
| 3 | NAMED_COMPATIBLE_STRUCT |
| 4 | ENUM |
| 5 | NAMED_ENUM |
| 6 | EXT |
| 7 | NAMED_EXT |
| 8 | TYPED_UNION |
| 9 | NAMED_UNION |
| 10-14 | Reserved |
| 15 | Extended-kind escape, rejected until defined |
class_header = (num_fields << 1) | registered_flag.
- If registered_flag == 1, write the class type ID as one byte. For user-registered ENUM, STRUCT, COMPATIBLE_STRUCT, EXT, and TYPED_UNION, write the user type ID as varuint32.
- If registered_flag == 0, write namespace and type name as meta strings.

Class layers are encoded from parent to leaf. Field lists inside each layer use the field order defined above.
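The two small headers at the start of the class metadata can be sketched as below. The kind codes come from the table above; the function names are hypothetical and the varuint32 helper assumes a LEB128-style layout.

```python
ROOT_KIND_CODES = {
    "STRUCT": 0, "COMPATIBLE_STRUCT": 1, "NAMED_STRUCT": 2,
    "NAMED_COMPATIBLE_STRUCT": 3, "ENUM": 4, "NAMED_ENUM": 5,
    "EXT": 6, "NAMED_EXT": 7, "TYPED_UNION": 8, "NAMED_UNION": 9,
}


def write_varuint32(value: int) -> bytes:
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def root_kind_and_layer_count(kind: str, num_layers: int) -> bytes:
    """High nibble: root kind code; low nibble: num_layers - 1, escaped at 15."""
    if num_layers < 1:
        raise ValueError("a TypeDef has at least one class layer")
    extra = num_layers - 1
    low = min(extra, 0b1111)
    out = bytes([(ROOT_KIND_CODES[kind] << 4) | low])
    if low == 0b1111:
        out += write_varuint32(extra - 15)  # reader adds this to 15
    return out


def class_header(num_fields: int, registered: bool) -> bytes:
    """varuint32 of (num_fields << 1) | registered_flag."""
    return write_varuint32((num_fields << 1) | int(registered))
```

For example, a three-layer STRUCT hierarchy encodes its first byte as 0x02, and a registered layer with five fields starts with 0x0b.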
Each field is encoded as:
| field_header | [extended_name_or_id_size] | [field name bytes] | field_type |
field_header bits:
| Bits | Meaning |
|---|---|
| 0 | trackingRef |
| 1 | nullable |
| 2..3 | field name encoding |
| 4..6 | encoded name length minus one, or compact tag ID |
| 7 | reserved, must be zero |
Field name encodings:
| Code | Encoding |
|---|---|
| 0 | UTF-8 |
| 1 | all-to-lower special encoding |
| 2 | lower/upper/digit special encoding |
| 3 | tag ID; field name bytes are omitted |
For name encodings, bits 4..6 store encoded_length - 1 when it is less than 7. If the value is 7, read an extra varuint32 and add it to 7.
For tag ID encoding, bits 4..6 store the numeric field ID when it is less than 7. If the value is 7, read an extra varuint32 and add it to 7. Field IDs must be non-negative. Duplicate field IDs in one TypeDef are invalid.
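The field_header rules can be sketched as a single encoder. It covers only the header byte and the optional extended size or ID; the field name bytes and field type that follow are left out. Function and constant names are hypothetical, and the varuint32 helper assumes a LEB128-style layout.

```python
NAME_ENCODINGS = {"UTF8": 0, "ALL_TO_LOWER": 1, "LOWER_UPPER_DIGIT": 2}
TAG_ID = 3


def write_varuint32(value: int) -> bytes:
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def field_header(tracking_ref: bool, nullable: bool, encoding: int,
                 size_or_id: int) -> bytes:
    """Encode field_header plus the optional extended size or ID.

    size_or_id is the encoded name length for name encodings (0..2)
    and the numeric field ID for TAG_ID (3).
    """
    if encoding == TAG_ID:
        if size_or_id < 0:
            raise ValueError("field IDs must be non-negative")
        value = size_or_id
    else:
        if size_or_id < 1:
            raise ValueError("encoded field names are at least one byte")
        value = size_or_id - 1
    small = min(value, 7)
    header = (int(tracking_ref)
              | (int(nullable) << 1)
              | (encoding << 2)
              | (small << 4))  # bit 7 stays zero
    out = bytes([header])
    if small == 7:
        out += write_varuint32(value - 7)
    return out
```

A nullable field with tag ID 2 fits in one byte (0x2e), while tag ID 10 needs the escape value 7 plus an extra varuint32 of 3.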
Field types describe how compatible readers read or skip the field payload. Top-level field types write only the type tag. Nested field types store nullable and trackingRef in the low bits:
nested_field_type_header = (type_tag << 2) | (nullable << 1) | trackingRef
Type tags:
| Tag | Field type | Payload |
|---|---|---|
| 0 | Object/dynamic | none |
| 1 | Map | key field type, value field type |
| 2 | Collection/List/Set | element field type |
| 3 | Java array | dimensions, component field type |
| 4 | Enum | none |
| 5+ | Registered or built-in type | tag - 5 is the type ID |
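To show how the tags nest, the sketch below flattens a hypothetical field type tree into the integers described above: the top-level entry is the bare tag, and every nested entry uses nested_field_type_header. The FieldType class is an assumption of this sketch, and the byte width of each integer (for example, whether each is a varuint32) is not modeled.

```python
from dataclasses import dataclass, field
from typing import List

KIND_TAGS = {"object": 0, "map": 1, "collection": 2, "array": 3, "enum": 4}


@dataclass
class FieldType:
    """Hypothetical in-memory field type tree."""
    kind: str                 # a KIND_TAGS key, or "id" for a registered/built-in type
    type_id: int = 0          # used when kind == "id"
    nullable: bool = False
    tracking_ref: bool = False
    dimensions: int = 0       # used when kind == "array"
    children: List["FieldType"] = field(default_factory=list)


def type_tag(ft: FieldType) -> int:
    return ft.type_id + 5 if ft.kind == "id" else KIND_TAGS[ft.kind]


def encode_field_type(ft: FieldType, nested: bool = False) -> List[int]:
    """Flatten a field type into header integers (byte widths not modeled)."""
    tag = type_tag(ft)
    if nested:
        head = (tag << 2) | (int(ft.nullable) << 1) | int(ft.tracking_ref)
    else:
        head = tag  # top-level field types write only the tag
    out = [head]
    if ft.kind == "array":
        out.append(ft.dimensions)
    for child in ft.children:  # map: key then value; collection/array: element
        out.extend(encode_field_type(child, nested=True))
    return out
```

With STRING (type ID 21, tag 26) as the element type, a List<String> field with nullable elements flattens to [2, 106].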
Namespaces, type names, and field names use the meta string encodings defined by the xlang specification. A meta string header stores the byte length and encoding kind; extended lengths are written as varuint32.
Package and namespace names use UTF-8, all-to-lower special encoding, or lower/upper/digit special encoding. Type names use UTF-8, lower/upper/digit special encoding, first-to-lower special encoding, or all-to-lower special encoding. Field names use the field-info encoding table above.
Primitive values are written without type metadata when the field serializer is known statically:
| Java type | Payload |
|---|---|
| boolean | one byte: 0 or 1 |
| byte | one signed byte |
| char | two-byte UTF-16 code unit, little endian |
| short | two-byte signed integer, little endian |
| int | fixed int32 little endian, or ZigZag varint32 when configured |
| long | fixed int64 little endian, ZigZag varint64, or tagged int64 when configured |
| float | IEEE 754 binary32, little endian |
| double | IEEE 754 binary64, little endian |
Boxed primitives use the same value payload after the selected null/reference slot.
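The primitive payload table can be sketched with Python's struct module, whose "<" prefix forces little-endian layout. The ZigZag mapping used here is the standard (v << 1) ^ (v >> 31) form and the varint is assumed to be LEB128-style; the tagged int64 encoding is not covered. Function names are hypothetical.

```python
import struct


def write_varuint(value: int) -> bytes:
    """Assumed LEB128-style unsigned varint (up to 5 bytes for 32-bit, 10 for 64-bit)."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def zigzag32(v: int) -> int:
    return ((v << 1) ^ (v >> 31)) & 0xFFFFFFFF


def zigzag64(v: int) -> int:
    return ((v << 1) ^ (v >> 63)) & 0xFFFFFFFFFFFFFFFF


def encode_primitive(kind: str, value, compress: bool = False) -> bytes:
    """Encode one Java primitive payload (tagged int64 is not covered)."""
    if kind == "boolean":
        return b"\x01" if value else b"\x00"
    if kind == "byte":
        return struct.pack("<b", value)
    if kind == "char":
        return struct.pack("<H", ord(value))  # one UTF-16 code unit
    if kind == "short":
        return struct.pack("<h", value)
    if kind == "int":
        return write_varuint(zigzag32(value)) if compress else struct.pack("<i", value)
    if kind == "long":
        return write_varuint(zigzag64(value)) if compress else struct.pack("<q", value)
    if kind == "float":
        return struct.pack("<f", value)
    if kind == "double":
        return struct.pack("<d", value)
    raise ValueError(f"not a Java primitive: {kind}")
```

ZigZag keeps small negative numbers short: -1 becomes a single byte 0x01 instead of four 0xff bytes.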
Java strings are encoded as:
| varuint36_small7 header | bytes |

header = (num_bytes << 2) | coder
coder values:
| Value | Encoding |
|---|---|
| 0 | Latin-1 |
| 1 | UTF-16 little endian |
| 2 | UTF-8 |
num_bytes is the byte length of the encoded payload.
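A string writer can be sketched as follows. The coder choice here (Latin-1 when every character fits in one byte, otherwise UTF-16 little endian) is an assumption of the sketch; it never produces the UTF-8 coder, and when a real writer selects UTF-8 is not described in this section. The header varint is assumed to share the LEB128-style layout for the values shown.

```python
LATIN1, UTF16, UTF8 = 0, 1, 2


def write_varuint(value: int) -> bytes:
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_java_string(s: str) -> bytes:
    """Encode a string as header + bytes, preferring Latin-1 when possible."""
    if all(ord(ch) <= 0xFF for ch in s):
        coder, payload = LATIN1, s.encode("latin-1")
    else:
        coder, payload = UTF16, s.encode("utf-16-le")
    header = (len(payload) << 2) | coder  # num_bytes, not character count
    return write_varuint(header) + payload
```

For "€" (U+20AC) the payload is two bytes, so the header is (2 << 2) | 1 = 9.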
Enum value payload depends on configuration:
- … varuint32.
- @ForyEnumId mode writes the configured non-negative enum tag as varuint32.

@ForyEnumId may be declared on enum constants, on one integer field, or on one zero-argument integer getter, according to the Java API contract. Duplicate or negative enum tags are invalid.
Primitive arrays write a length prefix and contiguous little-endian element payload:
| varuint32 byte_length | raw element bytes |
Compressed int[] and long[] arrays use element count followed by compressed elements:
int[] compressed: | varuint32 length | varint32... |

long[] compressed: | varuint32 length | varint64 or tagged_int64... |
byte[] is the binary serializer and writes varuint32 length followed by raw bytes.
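The difference between the two int[] forms (a byte-length prefix for raw arrays, an element-count prefix for compressed arrays) can be sketched as below. The compressed elements are assumed to use the same ZigZag varint32 as int fields, the varint is assumed LEB128-style, and the function names are hypothetical.

```python
import struct
from typing import Sequence


def write_varuint(value: int) -> bytes:
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def zigzag32(v: int) -> int:
    return ((v << 1) ^ (v >> 31)) & 0xFFFFFFFF


def encode_int_array(values: Sequence[int], compressed: bool = False) -> bytes:
    if compressed:
        out = bytearray(write_varuint(len(values)))  # element count
        for v in values:
            out += write_varuint(zigzag32(v))
        return bytes(out)
    raw = struct.pack(f"<{len(values)}i", *values)  # little-endian int32s
    return write_varuint(len(raw)) + raw  # byte length, not element count


def encode_byte_array(data: bytes) -> bytes:
    return write_varuint(len(data)) + data
```

For [1, 2] the raw form starts with 8 (bytes), while the compressed form of [1, -1] starts with 2 (elements).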
Object arrays write the array length and an element type mode:
| varuint32_small7 ((length << 1) | monomorphic_flag) | [shared element class metadata] | element slots... |
- If monomorphic_flag == 1, all non-null elements use the same element serializer, and the shared element class metadata is written once.
- If monomorphic_flag == 0, each non-null element writes its own type metadata.

Each nullable or reference-tracked element is still represented by a reference slot before its element payload.
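The object array header can be sketched by treating a Python type as a stand-in for a Java element serializer. Treating an empty or all-null array as non-monomorphic is a choice of this sketch, not something the format above specifies; the function name is hypothetical.

```python
from typing import Any, Sequence, Tuple


def object_array_header(elements: Sequence[Any]) -> Tuple[int, bool]:
    """Return ((length << 1) | monomorphic_flag, monomorphic) for an Object[]-like list."""
    element_types = {type(e) for e in elements if e is not None}
    monomorphic = len(element_types) == 1
    return (len(elements) << 1) | int(monomorphic), monomorphic
```

Nulls do not break monomorphism: ["a", "b", None] still yields flag 1 because nulls are carried by their reference slots.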
Java collection serializers write collection size, element flags, optional shared element type metadata, and element payloads:
| varuint32_small7 size | elements_header | [element type metadata] | elements... |
elements_header bits:
| Bit | Meaning |
|---|---|
| 0 | Element reference tracking is enabled |
| 1 | At least one element may be null |
| 2 | Declared element type is used |
| 3 | All non-null elements share one type |
When all non-null elements share a type and the declared element type is not used, the shared element type metadata is written once before element payloads. Otherwise each non-null element writes its own type metadata. Null and reference flags follow the reference-slot rules.
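A sketch of how a writer might derive elements_header and decide whether to emit shared element type metadata. Two simplifications are assumed here: bit 1 is set only when a null is actually present, and the declared type is "used" only when every non-null element has exactly that type. Names are hypothetical.

```python
from typing import Any, Optional, Sequence, Tuple

TRACKING_REF = 1 << 0   # bit 0: element reference tracking enabled
HAS_NULL = 1 << 1       # bit 1: at least one element may be null
DECLARED_TYPE = 1 << 2  # bit 2: declared element type is used
SAME_TYPE = 1 << 3      # bit 3: all non-null elements share one type


def elements_header(elements: Sequence[Any], track_refs: bool,
                    declared_type: Optional[type] = None) -> Tuple[int, bool]:
    """Return (elements_header, write_shared_type_metadata_once)."""
    header = TRACKING_REF if track_refs else 0
    if any(e is None for e in elements):
        header |= HAS_NULL
    types = {type(e) for e in elements if e is not None}
    # Assumption: use the declared type only when every non-null element has it.
    if declared_type is not None and types <= {declared_type}:
        header |= DECLARED_TYPE
    if len(types) == 1:
        header |= SAME_TYPE
    shared_meta_once = bool(header & SAME_TYPE) and not (header & DECLARED_TYPE)
    return header, shared_meta_once
```

When the declared type is used, the reader already knows the element type, so no shared metadata is written even though bit 3 is set.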
Specialized serializers for supported JDK collection subclasses write subclass-owned field layers before the element payload:
| varuint32_small7 size | [comparator reference for sorted/priority collections] | varuint32_small7 num_class_layers | class_layer_fields... | elements_header | [element type metadata] | elements... |
num_class_layers is the exact number of subclass field layers encoded in the payload. Readers must reject a payload whose layer count does not match the local serializer because the value payload does not carry enough layer identity to skip a mismatched subclass layout.
Maps write entry count followed by one or more chunks. Each chunk groups entries with compatible key and value metadata:
| varuint32_small7 size | chunk... |
Non-null chunks:
| header | uint8 chunk_size | [key type metadata] | [value type metadata] | entries... |
chunk_size is in 1..255.
header bits:
| Bit | Meaning |
|---|---|
| 0 | Key reference tracking is enabled |
| 1 | Chunk may contain null keys |
| 2 | Declared key type is used |
| 3 | Value reference tracking is enabled |
| 4 | Chunk may contain null values |
| 5 | Declared value type is used |
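Chunking can be sketched as grouping consecutive non-null entries that share key and value types into chunks of 1..255 entries, with any entry holding a null key or value emitted on its own. Grouping by Python type and starting a new chunk whenever the type pair changes is a simplification assumed here; the real serializer's policy, the type metadata, and the special-chunk header bytes are not modeled. Names are hypothetical.

```python
from typing import Any, List, Tuple

MAX_CHUNK_SIZE = 255
Entry = Tuple[Any, Any]


def chunk_header(key_ref=False, key_null=False, key_declared=False,
                 value_ref=False, value_null=False, value_declared=False) -> int:
    """Pack the six header bits in the order of the table above."""
    return (int(key_ref) | (int(key_null) << 1) | (int(key_declared) << 2)
            | (int(value_ref) << 3) | (int(value_null) << 4)
            | (int(value_declared) << 5))


def split_into_chunks(entries: List[Entry]) -> List[Tuple[str, List[Entry]]]:
    """Return ("chunk", entries) groups and ("special", [entry]) singletons."""
    chunks: List[Tuple[str, List[Entry]]] = []
    current: List[Entry] = []
    current_types = None
    for key, value in entries:
        if key is None or value is None:
            if current:
                chunks.append(("chunk", current))
                current = []
            chunks.append(("special", [(key, value)]))
            continue
        types = (type(key), type(value))
        if current and (types != current_types or len(current) == MAX_CHUNK_SIZE):
            chunks.append(("chunk", current))
            current = []
        current.append((key, value))
        current_types = types
    if current:
        chunks.append(("chunk", current))
    return chunks


def chunk_prefix(header: int, chunk: List[Entry]) -> bytes:
    """Header byte followed by the uint8 chunk_size."""
    if not 1 <= len(chunk) <= MAX_CHUNK_SIZE:
        raise ValueError("chunk_size must be in 1..255")
    return bytes([header, len(chunk)])
```

Because chunk_size is a single byte, a homogeneous map of 300 entries needs at least two chunks (255 + 45).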
Null key or null value entries are encoded as single-entry special chunks without a chunk_size byte:
- … KV_NULL header only.

EnumMap writes its entry count, key enum class metadata, and then its normal map entry payload:
| varuint32_small7 size | key enum class metadata | chunk... |
The key enum class metadata is present even when size is zero so an empty EnumMap can be reconstructed with the original key type.
Specialized serializers for supported JDK map subclasses write subclass-owned field layers before entry chunks:
| varuint32_small7 size | [comparator reference for sorted maps] | varuint32_small7 num_class_layers | class_layer_fields... | chunk... |
Readers must reject mismatched num_class_layers for the same reason as collection subclasses.
Java native mode has serializers for selected JDK wrappers and views:
- … Mode 0 writes visible elements as a collection payload. Mode 1 writes view offset, size, and source list reference.
- Collections.newSetFromMap writes the backing map payload.

Android and JVM implementations may choose different concrete public backing types for wrapper payloads, but the serializer-owned payload modes above define the wire shape.
Struct-like object payloads contain field values in protocol field order. The selected serializer owns the exact field fast path:
| field_0 payload | field_1 payload | ... |
For each field, field metadata decides whether the field writes a primitive payload directly, a nullable slot, a reference-tracked slot, type metadata, or a specialized collection/map/array payload.
Compatible-mode readers use the remote ClassDef field list to map fields by identifier. Unknown fields are skipped using their remote field type metadata.
Generated serializers may split large generated methods and hoist serializers, field offsets, collection metadata, or map metadata. Those generated-code decisions must preserve the same object payload order.
Throwable serializers preserve standard Java throwable state and subclass-owned fields:
| stack_trace_ref | cause_ref | message_ref | varuint32 suppressed_count | suppressed_ref... | varuint32 extra_field_count | extra_field_name/value... | varuint32_small7 num_class_layers | class_layer_fields... |
extra_field_count is reserved for serializer-owned extension fields and is currently written as zero. num_class_layers must match the local throwable serializer layout on read.
Java native mode supports serializer-owned handling for Java object replacement and Java serialization hooks:
- writeReplace/readResolve values use replacement metadata and payloads owned by the replacement serializer.

These serializers still obey the stream header, reference slot, and type metadata rules in this document.
When meta sharing is enabled and a reader does not have a local class for a remote ClassDef, Java may materialize an unknown-class value using NONEXISTENT_META_SHARED_ID. The value stores enough field data to preserve and copy the unknown value according to the unknown-class serializer. It does not make the unknown Java class available to user code.
When the header out-of-band bit is set, serializers may write references to external buffers instead of writing all bytes inline. The callback defines the external buffer transport. The main stream remains a valid Fory stream containing references to those buffers at serializer-owned payload positions.