title: Java Serialization Format sidebar_position: 1 id: java_serialization_spec license: | Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Apache Fory Java serialization is a dynamic binary format for Java object graphs. It supports shared references, circular references, polymorphism, and optional schema evolution. The format is stream friendly: shared type metadata is written inline when needed and there is no meta start offset.
The Java native format is an extension of the xlang wire format and reuses the same core framing and encodings; see docs/specification/xlang_serialization_spec.md for the shared baseline.
Overall layout:
| fory header | object ref meta | object type meta | object value data |
All data is encoded in little endian byte order. When running on a big endian platform, array serializers swap byte order on write/read so the on-wire layout remains little endian.
Java native serialization writes a one byte bitmap header. The header layout mirrors the xlang bitmap and uses the same flag bits.
| 5 bits | 1 bit | 1 bit | 1 bit | +--------------+-------+-------+-------+ | reserved | oob | xlang | null |
BufferCallback is not null, 0 otherwise.The header is always a single byte; no language ID is written.
Reference tracking uses the same flags as the xlang specification.
| Flag | Byte Value | Description |
|---|---|---|
| NULL FLAG | -3 | Object is null. No further bytes are written for this object. |
| REF FLAG | -2 | Object was already serialized. Followed by unsigned varint32 reference ID. |
| NOT_NULL VALUE FLAG | -1 | Object is non-null but reference tracking is disabled for this type. Object data follows immediately. |
| REF VALUE FLAG | 0 | Object is referencable and this is its first occurrence. Object data follows. Assigns next reference ID. |
When reference tracking is disabled globally or for a specific field/type, only NULL FLAG and NOT_NULL VALUE FLAG are used.
Java native serialization uses the unified type ID layout shared with xlang:
full_type_id = (user_type_id << 8) | internal_type_id
internal_type_id is the low 8 bits describing the kind (enum/struct/ext, named variants, or a built-in type).user_type_id is the numeric registration ID (0-based) for user-defined enum/struct/ext types.NAMED_* internal IDs and carry names in metadata rather than embedding a user ID.Java native mode shares the xlang internal IDs for all values below 64. IDs 0~56 are defined by the xlang spec, while 57~63 are reserved for future internal use. These IDs are stable across languages.
See the internal type ID table in Xlang Serialization Format. Java shares all IDs < 64, with 57~63 reserved for future internal use.
Java native serialization assigns Java-specific built-ins starting at Types.BOUND + 5 (Types.BOUND is 64; 5 IDs are reserved for future use). Type IDs in 0~56 are shared with xlang; 57~63 are reserved; 64+ are only valid in Java native mode.
| Type ID | Name | Description |
|---|---|---|
| 69 | VOID_ID | java.lang.Void |
| 70 | CHAR_ID | java.lang.Character |
| 71 | PRIMITIVE_VOID_ID | void |
| 72 | PRIMITIVE_BOOL_ID | boolean |
| 73 | PRIMITIVE_INT8_ID | byte |
| 74 | PRIMITIVE_CHAR_ID | char |
| 75 | PRIMITIVE_INT16_ID | short |
| 76 | PRIMITIVE_INT32_ID | int |
| 77 | PRIMITIVE_FLOAT32_ID | float |
| 78 | PRIMITIVE_INT64_ID | long |
| 79 | PRIMITIVE_FLOAT64_ID | double |
| 80 | PRIMITIVE_BOOLEAN_ARRAY_ID | boolean[] |
| 81 | PRIMITIVE_BYTE_ARRAY_ID | byte[] |
| 82 | PRIMITIVE_CHAR_ARRAY_ID | char[] |
| 83 | PRIMITIVE_SHORT_ARRAY_ID | short[] |
| 84 | PRIMITIVE_INT_ARRAY_ID | int[] |
| 85 | PRIMITIVE_FLOAT_ARRAY_ID | float[] |
| 86 | PRIMITIVE_LONG_ARRAY_ID | long[] |
| 87 | PRIMITIVE_DOUBLE_ARRAY_ID | double[] |
| 88 | STRING_ARRAY_ID | String[] |
| 89 | OBJECT_ARRAY_ID | Object[] |
| 90 | ARRAYLIST_ID | java.util.ArrayList |
| 91 | HASHMAP_ID | java.util.HashMap |
| 92 | HASHSET_ID | java.util.HashSet |
| 93 | CLASS_ID | java.lang.Class |
| 94 | EMPTY_OBJECT_ID | empty object stub |
| 95 | LAMBDA_STUB_ID | lambda stub |
| 96 | JDK_PROXY_STUB_ID | JDK proxy stub |
| 97 | REPLACE_STUB_ID | writeReplace/readResolve stub |
| 98 | NONEXISTENT_META_SHARED_ID | meta-shared unknown class stub |
User-defined enum/struct/ext types can be registered by numeric ID or by name.
full_type_id = (user_id << 8) | internal_type_id.Named type selection rules for unregistered types:
Every value is written with a type ID followed by optional type metadata:
type_id using varuint32 small7 encoding.NAMED_ENUM, NAMED_STRUCT, NAMED_EXT, NAMED_COMPATIBLE_STRUCT:COMPATIBLE_STRUCT:When meta share is enabled, Java uses the streaming shared meta protocol and writes TypeDef bytes inline on first use.
| varuint32: index_marker | [class def bytes if new] | index_marker = (index << 1) | flag flag = 1 -> reference flag = 0 -> new type
flag == 1, this is a reference to a previously written type. No class def bytes follow.flag == 0, this is a new type definition and class def bytes are written inline.The index is assigned sequentially in the order types are first encountered.
Java native serialization supports two schema modes:
STRUCT or NAMED_STRUCT for user-defined classes.COMPATIBLE_STRUCT or NAMED_COMPATIBLE_STRUCT.ClassDef is the schema evolution metadata encoded for compatible structs. It is written inline when shared meta is enabled, or referenced by index when already seen.
| 8 bytes header | [varuint32 extra size] | class meta bytes |
Header layout (lower bits on the right):
| 50-bit hash | 4 bits reserved | 1 bit compress | 1 bit has_fields_meta | 8-bit size |
Class meta encodes a linearized class hierarchy (from parent to leaf) and field metadata:
| num_classes | class_layer_0 | class_layer_1 | ... | class_layer: | num_fields << 1 | registered_flag | [type_id if registered] | | namespace | type_name | field_infos |
num_classes stores (num_layers - 1) in a single byte.0b1111, read an extra varuint32 small7 and add it.num_classes + 1.registered_flag is 1 if the class is registered by numeric ID.Each field uses a compact header followed by its name bytes (omitted when TAG_ID is used) and its type info:
| field_header | [field_name_bytes] | field_type |
field_header bits:
Field name encoding:
If length is extended (size==7), an extra varuint32 small7 storing (len-1) - 7 follows.
Field types are encoded with a type tag and optional nested type info. For nested types, the header includes nullable/trackingRef flags in the low bits. Top-level field types use the tag only (no flags).
Type tags:
| Tag | Field type |
|---|---|
| 0 | Object (ObjectFieldType) |
| 1 | Map (MapFieldType) |
| 2 | Collection/List/Set (CollectionFieldType) |
| 3 | Array (ArrayFieldType) |
| 4 | Enum (EnumFieldType) |
| 5+ | Registered type (RegisteredFieldType) |
Encoding rules:
5 + type_id.For nested types, nullable/trackingRef flags are stored in the low bits of the header as (type_tag << 2) | (nullable << 1) | tracking_ref.
Namespace, type names, and field names use the same meta string encodings as the xlang spec.
Header format:
| 6 bits size | 2 bits encoding |
(size - 63) as varuint32 small7.Encodings:
Field name encoding is described in the ClassDef field header section. When using TAG_ID, the field name bytes are omitted and the tag ID is stored in the size field.
See the xlang specification for encoding algorithms and tables: docs/specification/xlang_serialization_spec.md#meta-string.
This section describes the byte layouts for common built-in serializers used in Java native serialization. Custom serializers (EXT) may define additional formats but must still follow the reference and type meta rules described above.
compressInt is enabled.longEncoding=VARINT.longEncoding=TAGGED.Varint encodings follow the xlang spec: docs/specification/xlang_serialization_spec.md#unsigned-varint32.
Strings are encoded as:
| varuint36_small: (num_bytes << 2) | coder | string bytes |
UTF16 is encoded as little endian 2-byte code units.
serializeEnumByName is enabled: write enum name as a meta string.Primitive byte arrays are encoded as:
| varuint32: num_bytes | raw bytes |
Primitive arrays use writePrimitiveArrayWithSize unless compression is enabled:
| varuint32: byte_length | raw bytes |
compressIntArray: int[] encoded as | varuint32: length | varint32... |.compressLongArray: long[] encoded as | varuint32: length | varint64/tagged... |.Object arrays encode length and a monomorphic flag:
| varuint32_small7: (length << 1) | mono_flag |
mono_flag == 1, all elements share a known component serializer. Each element uses ref flags and the component serializer writes the value.mono_flag == 0, each element uses ref flags and writes its own class info and data.Collections encode length and a one-byte elements header:
| varuint32_small7: length | elements_header | [elem_class_info] | elements... |
elements_header bits (see CollectionFlags):
If IS_SAME_TYPE is set and IS_DECL_ELEMENT_TYPE is not set, the element class info is written once before the elements. Element values then follow with either ref flags (if TRACKING_REF) or per-element null flags (if HAS_NULL).
If IS_SAME_TYPE is not set, each element is written with its own class info and data (and optionally ref flags).
Maps encode entry count and then a sequence of chunks. Each chunk groups entries that share key and value types.
| varuint32_small7: size | chunk_1 | chunk_2 | ... | chunk (non-null entries): | header | chunk_size | [key_class_info] | [value_class_info] | entries... |
header bits (see MapFlags):
If KEY_DECL_TYPE or VALUE_DECL_TYPE is unset, the corresponding class info is written once at the start of the chunk. chunk_size is a single byte (1..255) and MAX_CHUNK_SIZE is 255.
Entries with null key or null value are encoded as special single-entry chunks without a chunk_size byte:
NULL_KEY_VALUE_DECL_TYPE* flags, then value payloadNULL_VALUE_KEY_DECL_TYPE* flags, then key payloadKV_NULL header onlyThese chunks always represent exactly one entry.
Object values are encoded as:
| ref meta | type meta | field data |
Field data is written by the serializer selected by the class info. For standard object serialization:
DescriptorGrouper order: primitives, boxed primitives, built-ins, collections, maps, then other fields, with names sorted within each category.MetaSharedSerializer uses ClassDef field metadata to read and skip unknown fields.Extension types are encoded by their registered serializer. Type meta is still written before the value as described above. The serializer is responsible for the value layout.
When a BufferCallback is provided, the oob flag is set in the header and serializers may emit buffer references instead of inline bytes (for example, large primitive arrays). The out-of-band buffer protocol is specific to the callback implementation; the main stream only contains references to those buffers.