Data is serialized in little-endian order overall. If byte swapping is costly, the byte order is encoded as a flag in the data.
The overall format is:
| fury header | object ref meta | object class meta | object value data |
The Fury header consists of one byte:
| reserved 4 bits | oob | xlang | endian | null |
The oob flag is set when `BufferCallback` is not null and unset otherwise.
If meta share mode is enabled, an uncompressed little-endian 4-byte value is appended to indicate the start offset of the meta data.
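As a minimal sketch of how such a one-byte header could be packed, the class below assumes that the rightmost field in the layout (`null`) is the lowest bit and that the four reserved bits stay zero. The class name, constant names, and method names are hypothetical and are not part of Fury's API.

```java
/** Hypothetical helper that packs the four header flags into one byte. */
final class FuryHeaderSketch {
    // Assumed bit positions: the rightmost field in the layout is the lowest bit.
    static final int NULL_FLAG = 1;        // 0b0001
    static final int ENDIAN_FLAG = 1 << 1; // 0b0010
    static final int XLANG_FLAG = 1 << 2;  // 0b0100
    static final int OOB_FLAG = 1 << 3;    // 0b1000

    static byte encode(boolean nullFlag, boolean endianFlag, boolean xlangFlag, boolean oobFlag) {
        int header = 0;
        if (nullFlag) header |= NULL_FLAG;
        if (endianFlag) header |= ENDIAN_FLAG;
        if (xlangFlag) header |= XLANG_FLAG;
        if (oobFlag) header |= OOB_FLAG;
        return (byte) header; // the upper 4 bits remain reserved (zero)
    }

    static boolean isSet(byte header, int flag) {
        return (header & flag) != 0;
    }
}
```

A reader would test individual bits with `isSet(header, FuryHeaderSketch.OOB_FLAG)` before deciding how to interpret the rest of the stream.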
Reference tracking handles whether the object is null and whether to track references for the object, by writing the corresponding flags and maintaining internal state.
Reference flags:
| Flag | Byte Value | Description |
|---|---|---|
| NULL FLAG | -3 | This flag indicates that the object is a null value. We don't use another byte to indicate REF, so that we can save one byte. |
| REF FLAG | -2 | This flag indicates that the object was already written; Fury writes an unsigned ref id instead of serializing it again. |
| NOT_NULL VALUE FLAG | -1 | This flag indicates that the object is a non-null value and Fury doesn't track references for this type of object. |
| REF VALUE FLAG | 0 | This flag indicates that the object is referenceable and is encountered for the first time. |
When reference tracking is disabled globally, for some types only, or for some types in some contexts such as a particular field of a class, only NULL FLAG and NOT_NULL VALUE FLAG are used.
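The flag selection described above can be sketched as follows. This is an illustration only: the class and method names are hypothetical, object identity is tracked with an `IdentityHashMap`, and the ref id is written as a single byte for brevity rather than in Fury's unsigned int format.

```java
import java.io.ByteArrayOutputStream;
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical writer that emits the reference flags from the table above. */
final class RefWriterSketch {
    static final byte NULL_FLAG = -3;
    static final byte REF_FLAG = -2;
    static final byte NOT_NULL_VALUE_FLAG = -1;
    static final byte REF_VALUE_FLAG = 0;

    // Identity-based map: two equal but distinct objects get distinct ids.
    private final Map<Object, Integer> writtenIds = new IdentityHashMap<>();

    /** Writes the flag for obj; returns true if the caller must still serialize the value. */
    boolean writeRefOrNull(ByteArrayOutputStream out, Object obj, boolean trackRef) {
        if (obj == null) {
            out.write(NULL_FLAG);
            return false;
        }
        if (!trackRef) {
            out.write(NOT_NULL_VALUE_FLAG);
            return true;
        }
        Integer id = writtenIds.get(obj);
        if (id != null) {
            out.write(REF_FLAG);
            out.write(id.intValue()); // simplification: one byte instead of an unsigned varint
            return false;
        }
        writtenIds.put(obj, writtenIds.size());
        out.write(REF_VALUE_FLAG);
        return true;
    }
}
```

With tracking disabled, only the first two branches are ever taken, which matches the statement that only NULL FLAG and NOT_NULL VALUE FLAG are used in that case.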
Depending on whether meta share mode is enabled, Fury will write class meta differently.
If schema consistent mode is enabled globally or enabled for current class, class meta will be written as follows:
When the class is written by id, it is written as `class_id << 1` using the Fury unsigned int format.
Otherwise, `0b1` is written first. This lowest bit is `1`, which differs from the first bit of an encoded class id, which is `0`. Fury can use this information to determine whether to read the class by class id.
If schema evolution mode is enabled globally or enabled for the current class, class meta will be written as follows:
This mode forbids streaming writes, since Fury needs to look back and update the offset after the whole object graph has been written and meta collection has finished. TODO: there is a plan to streamline meta writing, but work has not started yet.
Enumerated strings are mainly used to encode class names and field names. The format consists of a header and a binary.
The header is written in little-endian order. Fury can read this flag first to determine how to deserialize the data.
If the string hasn't been written before, the data will be written as follows:
| unsigned int: string binary size + 1bit: not written before | 61bits: murmur hash + 3 bits encoding flags | string binary |
| Encoding Flag | Pattern | Encoding Action |
|---|---|---|
| 0 | every char is in a-z._$\| | LOWER_SPECIAL |
| 1 | every char is in a-z._$ except that the first char is upper case | replace the first upper case char with its lower case form, then use LOWER_SPECIAL |
| 2 | every char is in a-zA-Z._$ | replace every upper case char with \| + its lower case form, then use LOWER_SPECIAL; use this encoding if it's smaller than Encoding 3 |
| 3 | every char is in a-zA-Z._$ | use LOWER_UPPER_DIGIT_SPECIAL encoding if it's smaller than Encoding 2 |
| 4 | any utf-8 char | use UTF-8 encoding |
If the string has been written before, the data will be written as follows:
| unsigned int: written string id + 1bit: written before |
String binary encoding:
| Algorithm | Pattern | Description |
|---|---|---|
| LOWER_SPECIAL | a-z._$\| | every char is written using 5 bits, a-z: 0b00000~0b11001, ._$\|: 0b11010~0b11101 |
| LOWER_UPPER_DIGIT_SPECIAL | a-zA-Z0~9._$ | every char is written using 6 bits, a-z: 0b00000~0b11001, A-Z: 0b11010~0b110011, 0~9: 0b110100~0b111101, ._$: 0b111110~0b1000000 |
| UTF-8 | any chars | UTF-8 encoding |
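To make the 5-bit LOWER_SPECIAL mapping concrete, here is a minimal sketch that packs and unpacks a string using the code points from the table. The bit packing order (most significant bit first within each byte) and the explicit `charCount` parameter on decode are assumptions made for this example; the class and method names are hypothetical.

```java
/** Hypothetical LOWER_SPECIAL codec: a-z map to 0..25, and . _ $ | map to 26..29. */
final class LowerSpecialSketch {
    private static final String SPECIALS = "._$|";

    static int code(char c) {
        if (c >= 'a' && c <= 'z') return c - 'a';
        int i = SPECIALS.indexOf(c);
        if (i < 0) throw new IllegalArgumentException("unsupported char: " + c);
        return 26 + i;
    }

    static char decodeChar(int code) {
        if (code < 0 || code > 29) throw new IllegalArgumentException("invalid code: " + code);
        return code < 26 ? (char) ('a' + code) : SPECIALS.charAt(code - 26);
    }

    /** Packs each char into 5 bits, most significant bit first (assumed order). */
    static byte[] encode(String s) {
        byte[] out = new byte[(s.length() * 5 + 7) / 8];
        int bitPos = 0;
        for (int i = 0; i < s.length(); i++) {
            int code = code(s.charAt(i));
            for (int b = 4; b >= 0; b--, bitPos++) {
                if (((code >> b) & 1) != 0) {
                    out[bitPos >> 3] |= (byte) (0x80 >> (bitPos & 7));
                }
            }
        }
        return out;
    }

    /** Reads charCount 5-bit codes back; the count is passed in to ignore padding bits. */
    static String decode(byte[] data, int charCount) {
        StringBuilder sb = new StringBuilder(charCount);
        int bitPos = 0;
        for (int i = 0; i < charCount; i++) {
            int code = 0;
            for (int b = 0; b < 5; b++, bitPos++) {
                int bit = (data[bitPos >> 3] >> (7 - (bitPos & 7))) & 1;
                code = (code << 1) | bit;
            }
            sb.append(decodeChar(code));
        }
        return sb.toString();
    }
}
```

Under these assumptions a 7-character name such as `io.fury` needs 35 bits, so it fits in 5 bytes instead of the 7 bytes a one-byte-per-char encoding would use.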
- Bool: 0 for `false`, 1 for `true`.
- Unsigned int: if the first bit of a byte is set, i.e. `b & 0x80 == 0x80`, then the next byte should be read, until the first bit of the next byte is unset.
- Signed int: convert the number with the `(v << 1) ^ (v >> 31)` ZigZag algorithm, then encode it as an unsigned int.
- Unsigned long: if the first bit of a byte is set, i.e. `b & 0x80 == 0x80`, then the next byte should be read, until the first bit is unset.
- Signed long, fixed layouts: `| little-endian: ((int) value) << 1 |` or `| 0b1 | little-endian 8bytes long |`.
- Signed long, variable length: convert the number with the `(v << 1) ^ (v >> 63)` ZigZag algorithm to reduce the cost of small negative numbers, then encode it as an unsigned long.
- Float: convert with `Float.floatToRawIntBits`, then write as binary in little-endian order.
- Double: convert with `Double.doubleToRawLongBits`, then write as binary in little-endian order.

String format:
- Header: 0 for latin, 1 for utf-16, 2 for utf-8.
- Binary: encoded using latin/utf-16/utf-8.

Which encoding to choose:
- When latin is detected at runtime, i.e. the string is a latin string, latin encoding is used; otherwise utf-16 is used.
- When the `coder` in the `String` object is used for encoding, latin/utf-16 will be used.
- If the string is encoded by utf-8, then Fury will use utf-8 to decode the data. Currently Fury doesn't enable utf-8 encoding by default for Java. Cross-language string serialization in Fury uses utf-8 by default.

All collection serializers must extend `io.fury.serializer.collection.CollectionSerializer`.
Format:
length(unsigned varint) | collection header | elements header | elements data
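The length prefix in this layout uses the unsigned variable-length integer format, in which the top bit of each byte signals that another byte follows. The sketch below illustrates that format together with the ZigZag step used for signed ints. It assumes 7-bit groups are emitted lowest group first, which the text does not state explicitly; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical varint helpers; the group order (low 7 bits first) is an assumption. */
final class VarIntSketch {
    /** Encodes value as 1 to 5 bytes; the top bit of each byte means "more bytes follow". */
    static byte[] writeUnsigned(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7; // unsigned shift so negative inputs terminate
        }
        out.write(value);
        return out.toByteArray();
    }

    /** Reads bytes until one has its top bit unset, i.e. (b & 0x80) == 0. */
    static int readUnsigned(byte[] data) {
        int result = 0;
        int shift = 0;
        for (byte b : data) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
        }
        return result;
    }

    /** Maps small negative numbers to small unsigned numbers: 0, -1, 1, -2 become 0, 1, 2, 3. */
    static int zigZagEncode(int v) {
        return (v << 1) ^ (v >> 31);
    }

    static int zigZagDecode(int n) {
        return (n >>> 1) ^ -(n & 1);
    }
}
```

A signed int would be written as `writeUnsigned(zigZagEncode(v))`, so that `-1` occupies one byte rather than five.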
- For ArrayList/LinkedArrayList/HashSet/LinkedHashSet, the collection header will be empty.
- For TreeSet, the collection header will be the Comparator.
- For subclasses of ArrayList, the collection header may be extra object field info.

In most cases, all collection elements have the same type and are not null, so the elements header encodes this homogeneous information to avoid the cost of writing it for every element. Specifically, four kinds of information are encoded by the elements header, each using one bit of the header: `0b1`, `0b10`, `0b100`, and `0b1000`. If ref tracking is enabled for the element type, the `0b10` flag is invalid.
By default, all bits are unset, which means that no element tracks refs, all elements have the same type, no element is null, and the actual element type is the declared type in the custom class field.
Based on the elements header, the serialization of elements data may skip ref flag/null flag/element class info.
io.fury.serializer.collection.CollectionSerializer#write/read can be taken as an example.
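As a rough illustration of how an elements header could be derived from a collection, the sketch below scans the elements once and sets one bit per property. The assignment of meanings to the bits `0b1`, `0b10`, `0b100`, and `0b1000` is an assumption made for this example, and all names are hypothetical; this is not the code in `CollectionSerializer`.

```java
import java.util.List;

/** Hypothetical elements-header computation; the bit meanings below are assumed. */
final class ElementsHeaderSketch {
    static final int TRACKING_REF = 0b1;     // assumed: elements track refs
    static final int HAS_NULL = 0b10;        // assumed: some element is null
    static final int NOT_DECL_TYPE = 0b100;  // assumed: element type differs from the declared type
    static final int NOT_SAME_TYPE = 0b1000; // assumed: elements have different types

    static int compute(List<?> elements, Class<?> declaredType, boolean trackRef) {
        int header = trackRef ? TRACKING_REF : 0;
        Class<?> first = null;
        for (Object e : elements) {
            if (e == null) {
                // The null bit is not used when ref tracking already covers nulls.
                if (!trackRef) header |= HAS_NULL;
                continue;
            }
            if (first == null) {
                first = e.getClass();
            } else if (first != e.getClass()) {
                header |= NOT_SAME_TYPE;
            }
            if (e.getClass() != declaredType) header |= NOT_DECL_TYPE;
        }
        return header;
    }
}
```

When the result is `0`, a writer following this scheme could skip the ref flag, the null flag, and the per-element class info entirely, which is the common case the header is designed to make cheap.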
Enum are serialized as an