| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| https://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "https://forrest.apache.org/dtd/document-v20.dtd" [ |
| <!ENTITY % avro-entities PUBLIC "-//Apache//ENTITIES Avro//EN" |
| "../../../../build/avro.ent"> |
| %avro-entities; |
| ]> |
| <document> |
| <header> |
| <title>Apache Avro™ &AvroVersion; Specification</title> |
| </header> |
| <body> |
| |
| <section id="preamble"> |
| <title>Introduction</title> |
| |
| <p>This document defines Apache Avro. It is intended to be the |
| authoritative specification. Implementations of Avro must |
| adhere to this document. |
| </p> |
| |
| </section> |
| |
| <section id="schemas"> |
| <title>Schema Declaration</title> |
| <p>A Schema is represented in <a href="ext:json">JSON</a> by one of:</p> |
| <ul> |
| <li>A JSON string, naming a defined type.</li> |
| |
| <li>A JSON object, of the form: |
| |
| <source>{"type": "<em>typeName</em>" ...<em>attributes</em>...}</source> |
| |
| where <em>typeName</em> is either a primitive or derived |
| type name, as defined below. Attributes not defined in this |
| document are permitted as metadata, but must not affect |
| the format of serialized data. |
| </li> |
| <li>A JSON array, representing a union of embedded types.</li> |
| </ul> |
| |
| <section id="schema_primitive"> |
| <title>Primitive Types</title> |
| <p>The set of primitive type names is:</p> |
| <ul> |
| <li><code>null</code>: no value</li> |
| <li><code>boolean</code>: a binary value</li> |
| <li><code>int</code>: 32-bit signed integer</li> |
| <li><code>long</code>: 64-bit signed integer</li> |
| <li><code>float</code>: single precision (32-bit) IEEE 754 floating-point number</li> |
| <li><code>double</code>: double precision (64-bit) IEEE 754 floating-point number</li> |
| <li><code>bytes</code>: sequence of 8-bit unsigned bytes</li> |
| <li><code>string</code>: unicode character sequence</li> |
| </ul> |
| |
| <p>Primitive types have no specified attributes.</p> |
| |
| <p>Primitive type names are also defined type names. Thus, for |
| example, the schema "string" is equivalent to:</p> |
| |
| <source>{"type": "string"}</source> |
| |
| </section> |
| |
| <section id="schema_complex"> |
| <title>Complex Types</title> |
| |
| <p>Avro supports six kinds of complex types: records, enums, |
| arrays, maps, unions and fixed.</p> |
| |
| <section id="schema_record"> |
| <title>Records</title> |
| |
<p>Records use the type name "record" and support the following attributes:</p>
<ul>
  <li><code>name</code>: a JSON string providing the name
  of the record (required).</li>
  <li><code>namespace</code>: a JSON string that qualifies the name (optional).</li>
  <li><code>doc</code>: a JSON string providing documentation to the
  user of this schema (optional).</li>
  <li><code>aliases</code>: a JSON array of strings, providing
  alternate names for this record (optional).</li>
| <li><code>fields</code>: a JSON array, listing fields (required). |
| Each field is a JSON object with the following attributes: |
| <ul> |
<li><code>name</code>: a JSON string providing the name
of the field (required).</li>
<li><code>doc</code>: a JSON string describing this field
for users (optional).</li>
<li><code>type</code>: a <a href="#schemas">schema</a>, as defined above (required).</li>
<li><code>default</code>: A default value for this
| field, used when reading instances that lack this |
| field (optional). Permitted values depend on the |
| field's schema type, according to the table below. |
| Default values for union fields correspond to the |
| first schema in the union. Default values for bytes |
| and fixed fields are JSON strings, where Unicode |
| code points 0-255 are mapped to unsigned 8-bit byte |
| values 0-255. |
| <table class="right"> |
| <caption>field default values</caption> |
| <tr><th>avro type</th><th>json type</th><th>example</th></tr> |
| <tr><td>null</td><td>null</td><td>null</td></tr> |
| <tr><td>boolean</td><td>boolean</td><td>true</td></tr> |
| <tr><td>int,long</td><td>integer</td><td>1</td></tr> |
| <tr><td>float,double</td><td>number</td><td>1.1</td></tr> |
| <tr><td>bytes</td><td>string</td><td>"\u00FF"</td></tr> |
| <tr><td>string</td><td>string</td><td>"foo"</td></tr> |
| <tr><td>record</td><td>object</td><td>{"a": 1}</td></tr> |
| <tr><td>enum</td><td>string</td><td>"FOO"</td></tr> |
| <tr><td>array</td><td>array</td><td>[1]</td></tr> |
| <tr><td>map</td><td>object</td><td>{"a": 1}</td></tr> |
| <tr><td>fixed</td><td>string</td><td>"\u00ff"</td></tr> |
| </table> |
| </li> |
<li><code>order</code>: specifies how this field
impacts sort ordering of this record (optional).
Valid values are "ascending" (the default),
"descending", or "ignore". For more details on how
this is used, see the <a href="#order">sort
order</a> section below.</li>
<li><code>aliases</code>: a JSON array of strings, providing
alternate names for this field (optional).</li>
| </ul> |
| </li> |
| </ul> |
| |
| <p>For example, a linked-list of 64-bit values may be defined with:</p> |
| <source> |
| { |
| "type": "record", |
| "name": "LongList", |
| "aliases": ["LinkedLongs"], // old name for this |
| "fields" : [ |
| {"name": "value", "type": "long"}, // each element has a long |
| {"name": "next", "type": ["null", "LongList"]} // optional next element |
| ] |
| } |
| </source> |
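<p>The required attributes above can be checked mechanically. The following Python sketch (using a hypothetical <code>check_record</code> helper, not part of any Avro library) validates the LongList example; the explanatory <code>//</code> comments are removed because comments are not valid JSON:</p>

```python
import json

# The LongList example, with the // comments removed (not valid JSON).
LONG_LIST = json.loads("""
{
  "type": "record",
  "name": "LongList",
  "aliases": ["LinkedLongs"],
  "fields": [
    {"name": "value", "type": "long"},
    {"name": "next", "type": ["null", "LongList"]}
  ]
}
""")

def check_record(schema):
    """Hypothetical helper: verify the record attributes this section requires."""
    if schema.get("type") != "record":
        raise ValueError("not a record schema")
    if not isinstance(schema.get("name"), str):
        raise ValueError("a record requires a name")
    fields = schema.get("fields")
    if not isinstance(fields, list):
        raise ValueError("a record requires a fields array")
    for field in fields:
        if not isinstance(field.get("name"), str):
            raise ValueError("every field requires a name")
        if "type" not in field:
            raise ValueError("every field requires a type")
    return True

print(check_record(LONG_LIST))  # -> True
```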
| </section> |
| |
| <section> |
| <title>Enums</title> |
| |
| <p>Enums use the type name "enum" and support the following |
| attributes:</p> |
| <ul> |
<li><code>name</code>: a JSON string providing the name
of the enum (required).</li>
<li><code>namespace</code>: a JSON string that qualifies the name (optional).</li>
<li><code>aliases</code>: a JSON array of strings, providing
alternate names for this enum (optional).</li>
<li><code>doc</code>: a JSON string providing documentation to the
user of this schema (optional).</li>
| <li><code>symbols</code>: a JSON array, listing symbols, |
| as JSON strings (required). All symbols in an enum must |
| be unique; duplicates are prohibited. Every symbol must |
| match the regular expression <code>[A-Za-z_][A-Za-z0-9_]*</code> |
| (the same requirement as for <a href="#names">names</a>).</li> |
| <li><code>default</code>: A default value for this |
| enumeration, used during resolution when the reader |
| encounters a symbol from the writer that isn't defined |
| in the reader's schema (optional). The value provided |
| here must be a JSON string that's a member of |
| the <code>symbols</code> array. |
| See documentation on schema resolution for how this gets |
| used.</li> |
| </ul> |
| <p>For example, playing card suits might be defined with:</p> |
| <source> |
| { |
| "type": "enum", |
| "name": "Suit", |
| "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"] |
| } |
| </source> |
| </section> |
| |
| <section> |
| <title>Arrays</title> |
| <p>Arrays use the type name <code>"array"</code> and support |
| a single attribute:</p> |
| <ul> |
| <li><code>items</code>: the schema of the array's items.</li> |
| </ul> |
| <p>For example, an array of strings is declared |
| with:</p> |
| <source> |
| { |
| "type": "array", |
| "items" : "string", |
| "default": [] |
| } |
| </source> |
| </section> |
| |
| <section> |
| <title>Maps</title> |
| <p>Maps use the type name <code>"map"</code> and support |
| one attribute:</p> |
| <ul> |
| <li><code>values</code>: the schema of the map's values.</li> |
| </ul> |
| <p>Map keys are assumed to be strings.</p> |
| <p>For example, a map from string to long is declared |
| with:</p> |
| <source> |
| { |
| "type": "map", |
| "items" : "long", |
| "default": {} |
| } |
| </source> |
| </section> |
| |
| <section> |
| <title>Unions</title> |
| <p>Unions, as mentioned above, are represented using JSON |
| arrays. For example, <code>["null", "string"]</code> |
| declares a schema which may be either a null or string.</p> |
| <p>(Note that when a <a href="#schema_record">default |
| value</a> is specified for a record field whose type is a |
| union, the type of the default value must match the |
| <em>first</em> element of the union. Thus, for unions |
| containing "null", the "null" is usually listed first, since |
| the default value of such unions is typically null.)</p> |
| <p>Unions may not contain more than one schema with the same |
| type, except for the named types record, fixed and enum. For |
| example, unions containing two array types or two map types |
| are not permitted, but two types with different names are |
| permitted. (Names permit efficient resolution when reading |
| and writing unions.)</p> |
| <p>Unions may not immediately contain other unions.</p> |
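<p>These two rules lend themselves to a short check. A minimal Python sketch (a hypothetical checker, treating each branch as either a type-name string or a JSON object):</p>

```python
NAMED = ("record", "enum", "fixed")

def check_union(branches):
    """Hypothetical checker for the two union rules above."""
    seen = set()
    for branch in branches:
        if isinstance(branch, list):
            raise ValueError("unions may not immediately contain other unions")
        type_name = branch["type"] if isinstance(branch, dict) else branch
        # Named types are distinguished by name; all others by type name alone.
        if isinstance(branch, dict) and type_name in NAMED:
            key = branch["name"]
        else:
            key = type_name
        if key in seen:
            raise ValueError("duplicate branch in union: " + key)
        seen.add(key)
    return True

check_union(["null", "string"])                 # allowed
check_union([{"type": "record", "name": "A", "fields": []},
             {"type": "record", "name": "B", "fields": []}])  # allowed: distinct names
```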
| </section> |
| |
| <section> |
| <title>Fixed</title> |
<p>Fixed uses the type name <code>"fixed"</code> and supports
the following attributes:</p>
<ul>
  <li><code>name</code>: a string naming this fixed (required).</li>
  <li><code>namespace</code>: a string that qualifies the name (optional).</li>
  <li><code>aliases</code>: a JSON array of strings, providing
  alternate names for this fixed (optional).</li>
| <li><code>size</code>: an integer, specifying the number |
| of bytes per value (required).</li> |
| </ul> |
<p>For example, a 16-byte quantity may be declared with:</p>
| <source>{"type": "fixed", "size": 16, "name": "md5"}</source> |
| </section> |
| |
| |
| </section> <!-- end complex types --> |
| |
| <section id="names"> |
| <title>Names</title> |
<p>Records, enums and fixed are named types. Each has
a <em>fullname</em> that is composed of two parts:
a <em>name</em> and a <em>namespace</em>. Equality of names
| is defined on the fullname.</p> |
| <p>The name portion of a fullname, record field names, and |
| enum symbols must:</p> |
| <ul> |
| <li>start with <code>[A-Za-z_]</code></li> |
| <li>subsequently contain only <code>[A-Za-z0-9_]</code></li> |
| </ul> |
| <p>A namespace is a dot-separated sequence of such names. |
| The empty string may also be used as a namespace to indicate the |
| null namespace. |
| Equality of names (including field names and enum symbols) |
| as well as fullnames is case-sensitive.</p> |
| <p>In record, enum and fixed definitions, the fullname is |
| determined in one of the following ways:</p> |
| <ul> |
| <li>A name and namespace are both specified. For example, |
| one might use <code>"name": "X", "namespace": |
| "org.foo"</code> to indicate the |
| fullname <code>org.foo.X</code>.</li> |
| <li>A fullname is specified. If the name specified contains |
| a dot, then it is assumed to be a fullname, and any |
| namespace also specified is ignored. For example, |
| use <code>"name": "org.foo.X"</code> to indicate the |
| fullname <code>org.foo.X</code>.</li> |
| <li>A name only is specified, i.e., a name that contains no |
| dots. In this case the namespace is taken from the most |
| tightly enclosing schema or protocol. For example, |
| if <code>"name": "X"</code> is specified, and this occurs |
| within a field of the record definition |
| of <code>org.foo.Y</code>, then the fullname |
| is <code>org.foo.X</code>. If there is no enclosing |
| namespace then the null namespace is used.</li> |
| </ul> |
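<p>The three rules above can be sketched as a small resolution function. This is an illustrative Python helper (hypothetical, not part of any Avro library), not normative:</p>

```python
def fullname(name, namespace=None, enclosing=None):
    """Resolve a fullname per the three rules above (hypothetical helper)."""
    if "." in name:                # a dotted name is already a fullname;
        return name                # any namespace attribute is ignored
    if namespace is not None:
        # the empty string explicitly selects the null namespace
        return name if namespace == "" else namespace + "." + name
    if enclosing:                  # inherit the most tightly enclosing namespace
        return enclosing + "." + name
    return name                    # no enclosing namespace: null namespace

print(fullname("X", namespace="org.foo"))      # -> org.foo.X
print(fullname("org.foo.X", namespace="bar"))  # -> org.foo.X
print(fullname("X", enclosing="org.foo"))      # -> org.foo.X
```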
| <p>References to previously defined names are as in the latter |
| two cases above: if they contain a dot they are a fullname, if |
| they do not contain a dot, the namespace is the namespace of |
| the enclosing definition.</p> |
| <p>Primitive type names have no namespace and their names may |
| not be defined in any namespace.</p> |
| <p> A schema or protocol may not contain multiple definitions |
| of a fullname. Further, a name must be defined before it is |
| used ("before" in the depth-first, left-to-right traversal of |
| the JSON parse tree, where the <code>types</code> attribute of |
| a protocol is always deemed to come "before" the |
| <code>messages</code> attribute.) |
| </p> |
| </section> |
| |
| <section> |
| <title>Aliases</title> |
<p>Named types and fields may have aliases. An implementation
may optionally use aliases to map a writer's schema to the
reader's. This facilitates both schema evolution and the
processing of disparate datasets.</p>
| <p>Aliases function by re-writing the writer's schema using |
| aliases from the reader's schema. For example, if the |
| writer's schema was named "Foo" and the reader's schema is |
| named "Bar" and has an alias of "Foo", then the implementation |
| would act as though "Foo" were named "Bar" when reading. |
| Similarly, if data was written as a record with a field named |
| "x" and is read as a record with a field named "y" with alias |
| "x", then the implementation would act as though "x" were |
| named "y" when reading.</p> |
<p>A type alias may be specified either as a fully
namespace-qualified name, or relative to the namespace of the
name it is an alias for. For example, if a type named "a.b" has
| aliases of "c" and "x.y", then the fully qualified names of |
| its aliases are "a.c" and "x.y".</p> |
| </section> |
| |
| </section> <!-- end schemas --> |
| |
| <section> |
| <title>Data Serialization and Deserialization</title> |
| |
| <p>Binary encoded Avro data does not include type information or |
| field names. The benefit is that the serialized data is small, but |
| as a result a schema must always be used in order to read Avro data |
| correctly. The best way to ensure that the schema is structurally |
| identical to the one used to write the data is to use the exact same |
| schema.</p> |
| |
| <p>Therefore, files or systems that store Avro data should always |
| include the writer's schema for that data. Avro-based remote procedure |
| call (RPC) systems must also guarantee that remote recipients of data |
| have a copy of the schema used to write that data. In general, it is |
| advisable that any reader of Avro data should use a schema that is |
| the same (as defined more fully in |
| <a href="#Parsing+Canonical+Form+for+Schemas">Parsing Canonical Form for |
| Schemas</a>) as the schema that was used to write the data in order to |
| deserialize it correctly. Deserializing data into a newer schema is |
| accomplished by specifying an additional schema, the results of which are |
| described in <a href="#Schema+Resolution">Schema Resolution</a>.</p> |
| |
| <p>In general, both serialization and deserialization proceed as a |
| depth-first, left-to-right traversal of the schema, serializing or |
| deserializing primitive types as they are encountered. Therefore, it is |
| possible, though not advisable, to read Avro data with a schema that |
| does not have the same Parsing Canonical Form as the schema with which |
| the data was written. In order for this to work, the serialized primitive |
| values must be compatible, in order value by value, with the items in the |
| deserialization schema. For example, int and long are always serialized |
| the same way, so an int could be deserialized as a long. Since the |
compatibility of two schemas depends on both the data and the
serialization format (e.g., binary is more permissive than JSON because
JSON includes field names, and a long that is too large will overflow
an int), it is simpler and more reliable to use schemas with identical
Parsing Canonical Form.</p>
| |
| <section> |
| <title>Encodings</title> |
| <p>Avro specifies two serialization encodings: binary and |
| JSON. Most applications will use the binary encoding, as it |
| is smaller and faster. But, for debugging and web-based |
| applications, the JSON encoding may sometimes be |
| appropriate.</p> |
| </section> |
| |
| <section id="binary_encoding"> |
| <title>Binary Encoding</title> |
<p>Binary encoding does not include field names, self-contained
information about the types of individual bytes, or field or
record separators. Therefore readers are wholly reliant on
the schema used when the data was encoded.</p>
| |
| <section id="binary_encode_primitive"> |
| <title>Primitive Types</title> |
| <p>Primitive types are encoded in binary as follows:</p> |
| <ul> |
| <li><code>null</code> is written as zero bytes.</li> |
| <li>a <code>boolean</code> is written as a single byte whose |
| value is either <code>0</code> (false) or <code>1</code> |
| (true).</li> |
| <li><code>int</code> and <code>long</code> values are written |
| using <a href="ext:vint">variable-length</a> |
| <a href="ext:zigzag">zig-zag</a> coding. Some examples: |
| <table class="right"> |
| <tr><th>value</th><th>hex</th></tr> |
| <tr><td><code> 0</code></td><td><code>00</code></td></tr> |
| <tr><td><code>-1</code></td><td><code>01</code></td></tr> |
| <tr><td><code> 1</code></td><td><code>02</code></td></tr> |
| <tr><td><code>-2</code></td><td><code>03</code></td></tr> |
| <tr><td><code> 2</code></td><td><code>04</code></td></tr> |
| <tr><td colspan="2"><code>...</code></td></tr> |
| <tr><td><code>-64</code></td><td><code>7f</code></td></tr> |
| <tr><td><code> 64</code></td><td><code> 80 01</code></td></tr> |
| <tr><td colspan="2"><code>...</code></td></tr> |
| </table> |
| </li> |
| <li>a <code>float</code> is written as 4 bytes. The float is |
| converted into a 32-bit integer using a method equivalent |
| to <a href="https://java.sun.com/javase/6/docs/api/java/lang/Float.html#floatToIntBits%28float%29">Java's floatToIntBits</a> and then encoded |
| in little-endian format.</li> |
| <li>a <code>double</code> is written as 8 bytes. The double |
| is converted into a 64-bit integer using a method equivalent |
| to <a href="https://java.sun.com/javase/6/docs/api/java/lang/Double.html#doubleToLongBits%28double%29">Java's |
| doubleToLongBits</a> and then encoded in little-endian |
| format.</li> |
| <li><code>bytes</code> are encoded as |
| a <code>long</code> followed by that many bytes of data. |
| </li> |
| <li>a <code>string</code> is encoded as |
| a <code>long</code> followed by that many bytes of UTF-8 |
| encoded character data. |
| <p>For example, the three-character string "foo" would |
| be encoded as the long value 3 (encoded as |
| hex <code>06</code>) followed by the UTF-8 encoding of |
| 'f', 'o', and 'o' (the hex bytes <code>66 6f |
| 6f</code>): |
| </p> |
| <source>06 66 6f 6f</source> |
| </li> |
| </ul> |
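<p>As a non-normative illustration, the variable-length zig-zag coding and the string encoding above can be sketched in a few lines of Python; this reproduces the table entries and the "foo" example:</p>

```python
def zigzag_encode(n: int) -> bytes:
    """Encode a signed integer with variable-length zig-zag coding."""
    z = (n << 1) ^ (n >> 63)           # zig-zag: 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_string(s: str) -> bytes:
    """A string: zig-zag length prefix followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_encode(len(data)) + data

print(zigzag_encode(-64).hex())        # -> 7f
print(zigzag_encode(64).hex(" "))      # -> 80 01
print(encode_string("foo").hex(" "))   # -> 06 66 6f 6f
```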
| |
| </section> |
| |
| |
| <section id="binary_encode_complex"> |
| <title>Complex Types</title> |
| <p>Complex types are encoded in binary as follows:</p> |
| |
| <section id="record_encoding"> |
| <title>Records</title> |
| <p>A record is encoded by encoding the values of its |
| fields in the order that they are declared. In other |
| words, a record is encoded as just the concatenation of |
| the encodings of its fields. Field values are encoded per |
| their schema.</p> |
<p>For example, consider the record schema</p>
| <source> |
| { |
| "type": "record", |
| "name": "test", |
| "fields" : [ |
| {"name": "a", "type": "long"}, |
| {"name": "b", "type": "string"} |
| ] |
| } |
| </source> |
| <p>An instance of this record whose <code>a</code> field has |
| value 27 (encoded as hex <code>36</code>) and |
| whose <code>b</code> field has value "foo" (encoded as hex |
| bytes <code>06 66 6f 6f</code>), would be encoded simply |
| as the concatenation of these, namely the hex byte |
| sequence:</p> |
| <source>36 06 66 6f 6f</source> |
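<p>This concatenation can be reproduced directly. A Python sketch (the zig-zag helper is repeated so the snippet stands alone):</p>

```python
def zigzag_encode(n: int) -> bytes:
    """Variable-length zig-zag coding for int/long values."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return zigzag_encode(len(data)) + data

# The record is just the concatenation of its field encodings,
# in declaration order: a = 27, then b = "foo".
record = zigzag_encode(27) + encode_string("foo")
print(record.hex(" "))  # -> 36 06 66 6f 6f
```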
| </section> |
| |
| <section id="enum_encoding"> |
| <title>Enums</title> |
<p>An enum is encoded by an <code>int</code>, representing
the zero-based position of the symbol in the schema.</p>
| <p>For example, consider the enum:</p> |
| <source> |
| {"type": "enum", "name": "Foo", "symbols": ["A", "B", "C", "D"] } |
| </source> |
<p>This would be encoded by an <code>int</code> between
zero and three, with zero indicating "A" and three indicating
"D".</p>
| </section> |
| |
| |
| <section id="array_encoding"> |
| <title>Arrays</title> |
| <p>Arrays are encoded as a series of <em>blocks</em>. |
| Each block consists of a <code>long</code> <em>count</em> |
| value, followed by that many array items. A block with |
| count zero indicates the end of the array. Each item is |
| encoded per the array's item schema.</p> |
| |
| <p>If a block's count is negative, its absolute value is used, |
| and the count is followed immediately by a <code>long</code> |
| block <em>size</em> indicating the number of bytes in the |
| block. This block size permits fast skipping through data, |
| e.g., when projecting a record to a subset of its fields.</p> |
| |
<p>For example, with the array schema</p>
<source>{"type": "array", "items": "long"}</source>
<p>an array containing the items 3 and 27 could be encoded
as the long value 2 (encoded as hex <code>04</code>), followed by
the long values 3 and 27 (encoded as hex <code>06 36</code>), and
terminated by zero:</p>
| <source>04 06 36 00</source> |
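<p>A non-normative Python sketch of the single-block case, reproducing the encoding above (the zig-zag helper is repeated so the snippet stands alone):</p>

```python
def zigzag_encode(n: int) -> bytes:
    """Variable-length zig-zag coding for int/long values."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_long_array(items):
    """Single-block encoding of an array of longs, per the rules above."""
    out = bytearray()
    if items:
        out += zigzag_encode(len(items))  # block count
        for v in items:
            out += zigzag_encode(v)       # each item per the item schema
    out += zigzag_encode(0)               # a count of zero ends the array
    return bytes(out)

print(encode_long_array([3, 27]).hex(" "))  # -> 04 06 36 00
```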
| |
| <p>The blocked representation permits one to read and write |
| arrays larger than can be buffered in memory, since one can |
| start writing items without knowing the full length of the |
| array.</p> |
| |
| </section> |
| |
| <section id="map_encoding"> |
| <title>Maps</title> |
| <p>Maps are encoded as a series of <em>blocks</em>. Each |
| block consists of a <code>long</code> <em>count</em> |
| value, followed by that many key/value pairs. A block |
| with count zero indicates the end of the map. Each item |
| is encoded per the map's value schema.</p> |
| |
| <p>If a block's count is negative, its absolute value is used, |
| and the count is followed immediately by a <code>long</code> |
| block <em>size</em> indicating the number of bytes in the |
| block. This block size permits fast skipping through data, |
| e.g., when projecting a record to a subset of its fields.</p> |
| |
| <p>The blocked representation permits one to read and write |
| maps larger than can be buffered in memory, since one can |
| start writing items without knowing the full length of the |
| map.</p> |
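<p>The map rules mirror the array rules, with key/value pairs in place of items. As an illustrative (not normative) example, a single-block encoding of the map <code>{"a": 1}</code> with long values, sketched in Python:</p>

```python
def zigzag_encode(n: int) -> bytes:
    """Variable-length zig-zag coding for int/long values."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return zigzag_encode(len(data)) + data

def encode_long_map(entries):
    """Single-block encoding of a map from string to long."""
    out = bytearray()
    if entries:
        out += zigzag_encode(len(entries))  # block count
        for key, value in entries.items():
            out += encode_string(key)       # map keys are strings
            out += zigzag_encode(value)     # values per the value schema
    out += zigzag_encode(0)                 # a count of zero ends the map
    return bytes(out)

print(encode_long_map({"a": 1}).hex(" "))  # -> 02 02 61 02 00
```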
| |
| </section> |
| |
| <section id="union_encoding"> |
| <title>Unions</title> |
| <p>A union is encoded by first writing an <code>int</code> |
| value indicating the zero-based position within the |
| union of the schema of its value. The value is then |
| encoded per the indicated schema within the union.</p> |
| <p>For example, the union |
| schema <code>["null","string"]</code> would encode:</p> |
| <ul> |
| <li><code>null</code> as zero (the index of "null" in the union): |
| <source>00</source></li> |
| <li>the string <code>"a"</code> as one (the index of |
| "string" in the union, encoded as hex <code>02</code>), |
| followed by the serialized string: |
| <source>02 02 61</source></li> |
| </ul> |
<p><em>NOTE</em>: Currently, for the C/C++ implementations, the
position is practically an <code>int</code>, but theoretically a
<code>long</code>. In reality, we don't expect unions with 215M
members.</p>
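<p>The two example encodings above can be reproduced with a short Python sketch (non-normative; helpers repeated so the snippet stands alone):</p>

```python
def zigzag_encode(n: int) -> bytes:
    """Variable-length zig-zag coding for int/long values."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return zigzag_encode(len(data)) + data

BRANCHES = ["null", "string"]

def encode_union(value):
    """Encode a value of the ["null", "string"] union: branch index, then value."""
    if value is None:
        return zigzag_encode(BRANCHES.index("null"))  # null needs no further bytes
    return zigzag_encode(BRANCHES.index("string")) + encode_string(value)

print(encode_union(None).hex())    # -> 00
print(encode_union("a").hex(" "))  # -> 02 02 61
```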
| </section> |
| |
| <section id="fixed_encoding"> |
| <title>Fixed</title> |
| <p>Fixed instances are encoded using the number of bytes |
| declared in the schema.</p> |
| </section> |
| |
| </section> <!-- end complex types --> |
| |
| </section> |
| |
| <section id="json_encoding"> |
| <title>JSON Encoding</title> |
| |
| <p>Except for unions, the JSON encoding is the same as is used |
| to encode <a href="#schema_record">field default |
| values</a>.</p> |
| |
| <p>The value of a union is encoded in JSON as follows:</p> |
| |
| <ul> |
| <li>if its type is <code>null</code>, then it is encoded as |
| a JSON null;</li> |
| <li>otherwise it is encoded as a JSON object with one |
| name/value pair whose name is the type's name and whose |
| value is the recursively encoded value. For Avro's named |
| types (record, fixed or enum) the user-specified name is |
| used, for other types the type name is used.</li> |
| </ul> |
| |
| <p>For example, the union |
| schema <code>["null","string","Foo"]</code>, where Foo is a |
| record name, would encode:</p> |
| <ul> |
| <li><code>null</code> as <code>null</code>;</li> |
| <li>the string <code>"a"</code> as |
| <code>{"string": "a"}</code>; and</li> |
| <li>a Foo instance as <code>{"Foo": {...}}</code>, |
| where <code>{...}</code> indicates the JSON encoding of a |
| Foo instance.</li> |
| </ul> |
| |
| <p>Note that the original schema is still required to correctly |
| process JSON-encoded data. For example, the JSON encoding does not |
| distinguish between <code>int</code> |
| and <code>long</code>, <code>float</code> |
| and <code>double</code>, records and maps, enums and strings, |
| etc.</p> |
| |
| </section> |
| |
| <section id="single_object_encoding"> |
| <title>Single-object encoding</title> |
| |
| <p>In some situations a single Avro serialized object is to be stored for a |
| longer period of time. One very common example is storing Avro records |
| for several weeks in an <a href="https://kafka.apache.org/">Apache Kafka</a> topic.</p> |
<p>In the period after a schema change, such a system will contain
records written with different schemas, so a reader needs to know which
schema was used to write each record in order to support schema
evolution correctly. In most cases the schema itself is too large to
include with every message, so this binary wrapper format supports the
use case more effectively.</p>
| |
| <section id="single_object_encoding_spec"> |
| <title>Single object encoding specification</title> |
| <p>Single Avro objects are encoded as follows:</p> |
| <ol> |
| <li>A two-byte marker, <code>C3 01</code>, to show that the message is Avro and uses this single-record format (version 1).</li> |
| <li>The 8-byte little-endian CRC-64-AVRO <a href="#schema_fingerprints">fingerprint</a> of the object's schema</li> |
| <li>The Avro object encoded using <a href="#binary_encoding">Avro's binary encoding</a></li> |
| </ol> |
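<p>A Python sketch of this framing. The fingerprint code assumes the standard table-driven CRC-64-AVRO implementation; the schema fingerprints section referenced above is authoritative for that algorithm:</p>

```python
EMPTY = 0xC15D213AA4D7A795  # CRC-64-AVRO initial value / polynomial

def _make_table():
    table = []
    for i in range(256):
        fp = i
        for _ in range(8):
            fp = (fp >> 1) ^ (EMPTY & -(fp & 1))
        table.append(fp)
    return table

_TABLE = _make_table()

def crc64_avro(data: bytes) -> int:
    """64-bit fingerprint of a schema's JSON text (assumed CRC-64-AVRO)."""
    fp = EMPTY
    for byte in data:
        fp = (fp >> 8) ^ _TABLE[(fp ^ byte) & 0xFF]
    return fp

def single_object(schema_json: str, body: bytes) -> bytes:
    """Frame one binary-encoded Avro object in the single-object format."""
    header = b"\xC3\x01"  # two-byte marker: Avro single-record format, version 1
    fingerprint = crc64_avro(schema_json.encode("utf-8"))
    return header + fingerprint.to_bytes(8, "little") + body

# body: the binary encoding of the string "foo" under the schema "string"
msg = single_object('"string"', b"\x06foo")
assert msg[:2] == b"\xC3\x01" and len(msg) == 2 + 8 + 4
```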
| </section> |
| |
| <p>Implementations use the 2-byte marker to determine whether a payload is Avro. |
| This check helps avoid expensive lookups that resolve the schema from a |
| fingerprint, when the message is not an encoded Avro payload.</p> |
| |
| </section> |
| |
| </section> |
| |
| <section id="order"> |
| <title>Sort Order</title> |
| |
| <p>Avro defines a standard sort order for data. This permits |
| data written by one system to be efficiently sorted by another |
| system. This can be an important optimization, as sort order |
| comparisons are sometimes the most frequent per-object |
| operation. Note also that Avro binary-encoded data can be |
| efficiently ordered without deserializing it to objects.</p> |
| |
| <p>Data items may only be compared if they have identical |
| schemas. Pairwise comparisons are implemented recursively |
| with a depth-first, left-to-right traversal of the schema. |
| The first mismatch encountered determines the order of the |
| items.</p> |
| |
| <p>Two items with the same schema are compared according to the |
| following rules.</p> |
| <ul> |
| <li><code>null</code> data is always equal.</li> |
| <li><code>boolean</code> data is ordered with false before true.</li> |
| <li><code>int</code>, <code>long</code>, <code>float</code> |
| and <code>double</code> data is ordered by ascending numeric |
| value.</li> |
| <li><code>bytes</code> and <code>fixed</code> data are |
| compared lexicographically by unsigned 8-bit values.</li> |
| <li><code>string</code> data is compared lexicographically by |
| Unicode code point. Note that since UTF-8 is used as the |
| binary encoding for strings, sorting of bytes and string |
| binary data is identical.</li> |
| <li><code>array</code> data is compared lexicographically by |
| element.</li> |
| <li><code>enum</code> data is ordered by the symbol's position |
| in the enum schema. For example, an enum whose symbols are |
| <code>["z", "a"]</code> would sort <code>"z"</code> values |
| before <code>"a"</code> values.</li> |
| <li><code>union</code> data is first ordered by the branch |
| within the union, and, within that, by the type of the |
| branch. For example, an <code>["int", "string"]</code> |
| union would order all int values before all string values, |
| with the ints and strings themselves ordered as defined |
| above.</li> |
| <li><code>record</code> data is ordered lexicographically by |
| field. If a field specifies that its order is: |
| <ul> |
| <li><code>"ascending"</code>, then the order of its values |
| is unaltered.</li> |
| <li><code>"descending"</code>, then the order of its values |
| is reversed.</li> |
| <li><code>"ignore"</code>, then its values are ignored |
| when sorting.</li> |
| </ul> |
| </li> |
| <li><code>map</code> data may not be compared. It is an error |
| to attempt to compare data containing maps unless those maps |
| are in an <code>"order":"ignore"</code> record field. |
| </li> |
| </ul> |
| </section> |
| |
| <section> |
| <title>Object Container Files</title> |
| <p>Avro includes a simple object container file format. A file |
| has a schema, and all objects stored in the file must be written |
| according to that schema, using binary encoding. Objects are |
stored in blocks that may be compressed. Synchronization markers
| are used between blocks to permit efficient splitting of files |
| for MapReduce processing.</p> |
| |
| <p>Files may include arbitrary user-specified metadata.</p> |
| |
| <p>A file consists of:</p> |
| <ul> |
| <li>A <em>file header</em>, followed by</li> |
| <li>one or more <em>file data blocks</em>.</li> |
| </ul> |
| |
| <p>A file header consists of:</p> |
| <ul> |
<li>Four bytes, ASCII 'O', 'b', 'j', followed by the byte <code>1</code>.</li>
| <li><em>file metadata</em>, including the schema.</li> |
| <li>The 16-byte, randomly-generated sync marker for this file.</li> |
| </ul> |
| |
| <p>File metadata is written as if defined by the following <a |
| href="#map_encoding">map</a> schema:</p> |
| <source>{"type": "map", "values": "bytes"}</source> |
| |
| <p>All metadata properties that start with "avro." are reserved. |
| The following file metadata properties are currently used:</p> |
| <ul> |
| <li><strong>avro.schema</strong> contains the schema of objects |
| stored in the file, as JSON data (required).</li> |
| <li><strong>avro.codec</strong> the name of the compression codec |
| used to compress blocks, as a string. Implementations |
| are required to support the following codecs: "null" and "deflate". |
| If codec is absent, it is assumed to be "null". The codecs |
| are described with more detail below.</li> |
| </ul> |
| |
| <p>A file header is thus described by the following schema:</p> |
| <source> |
| {"type": "record", "name": "org.apache.avro.file.Header", |
| "fields" : [ |
| {"name": "magic", "type": {"type": "fixed", "name": "Magic", "size": 4}}, |
| {"name": "meta", "type": {"type": "map", "values": "bytes"}}, |
| {"name": "sync", "type": {"type": "fixed", "name": "Sync", "size": 16}}, |
| ] |
| } |
| </source> |
| |
| <p>A file data block consists of:</p> |
| <ul> |
| <li>A long indicating the count of objects in this block.</li> |
| <li>A long indicating the size in bytes of the serialized objects |
| in the current block, after any codec is applied</li> |
| <li>The serialized objects. If a codec is specified, this is |
| compressed by that codec.</li> |
| <li>The file's 16-byte sync marker.</li> |
| </ul> |
      <p>Thus, each block's binary data can be efficiently extracted or skipped without
      deserializing the contents.  The combination of block sizes, object counts, and
      sync markers enables detection of corrupt blocks and helps ensure data integrity.</p>
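<p>The block layout above can be sketched as follows (non-normative Python; the helper names are illustrative). Note that the payload is returned as opaque bytes, so a reader can skip blocks without deserializing objects:</p>

```python
def read_long(buf: bytes, pos: int):
    """Decode a zigzag/variable-length Avro long; return (value, new_pos)."""
    shift = acc = 0
    while True:
        b = buf[pos]
        pos += 1
        acc |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1), pos

def scan_blocks(body: bytes, sync: bytes):
    """Yield (object_count, raw_payload) per block, checking sync markers.
    Payloads are never deserialized, so blocks can be skipped cheaply."""
    pos = 0
    while pos < len(body):
        count, pos = read_long(body, pos)
        size, pos = read_long(body, pos)
        payload = body[pos:pos + size]
        pos += size
        if body[pos:pos + 16] != sync:
            raise ValueError("corrupt block: sync marker mismatch")
        pos += 16
        yield count, payload

# one block: count=2 (zigzag 0x04), size=3 (zigzag 0x06), 3 payload bytes, sync
sync = b"S" * 16
body = b"\x04\x06abc" + sync
assert list(scan_blocks(body, sync)) == [(2, b"abc")]
```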
| <section> |
| <title>Required Codecs</title> |
| <section> |
| <title>null</title> |
| <p>The "null" codec simply passes through data uncompressed.</p> |
| </section> |
| |
| <section> |
| <title>deflate</title> |
| <p>The "deflate" codec writes the data block using the |
| deflate algorithm as specified in |
| <a href="https://www.isi.edu/in-notes/rfc1951.txt">RFC 1951</a>, |
| and typically implemented using the zlib library. Note that this |
| format (unlike the "zlib format" in RFC 1950) does not have a |
| checksum. |
| </p> |
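<p>For example, Python's standard <code>zlib</code> module produces a raw RFC 1951 stream (no zlib header, no checksum) when a negative window size is used; a sketch:</p>

```python
import zlib

def deflate_compress(data: bytes) -> bytes:
    # wbits = -15 selects a raw deflate stream (RFC 1951): no zlib
    # header and no trailing checksum, matching the "deflate" codec
    c = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=-15)
    return c.compress(data) + c.flush()

def deflate_decompress(data: bytes) -> bytes:
    return zlib.decompress(data, wbits=-15)
```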
| </section> |
| </section> |
| <section> |
| <title>Optional Codecs</title> |
| <section> |
| <title>bzip2</title> |
| <p>The "bzip2" codec uses the <a href="https://sourceware.org/bzip2/">bzip2</a> |
| compression library.</p> |
| </section> |
| |
| <section> |
| <title>snappy</title> |
| <p>The "snappy" codec uses |
| Google's <a href="https://code.google.com/p/snappy/">Snappy</a> |
| compression library. Each compressed block is followed |
| by the 4-byte, big-endian CRC32 checksum of the |
| uncompressed data in the block.</p> |
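<p>The Snappy compression itself requires a third-party library (for example the <code>python-snappy</code> package, which is an assumption here, not part of this specification), so the following non-normative sketch shows only the checksum framing defined above:</p>

```python
import struct
import zlib

def wrap_snappy_block(compressed: bytes, uncompressed: bytes) -> bytes:
    """Append the 4-byte, big-endian CRC32 of the *uncompressed* data."""
    return compressed + struct.pack(">I", zlib.crc32(uncompressed) & 0xFFFFFFFF)

def check_snappy_block(block: bytes, uncompressed: bytes) -> bytes:
    """Verify the trailing CRC32 and return the compressed payload.
    In practice a reader decompresses first, then verifies the result."""
    payload, (crc,) = block[:-4], struct.unpack(">I", block[-4:])
    if crc != zlib.crc32(uncompressed) & 0xFFFFFFFF:
        raise ValueError("corrupt block: CRC32 mismatch")
    return payload
```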
| </section> |
| |
| <section> |
| <title>xz</title> |
| <p>The "xz" codec uses the <a href="https://tukaani.org/xz/">XZ</a> |
| compression library.</p> |
| </section> |
| |
| <section> |
| <title>zstandard</title> |
| <p>The "zstandard" codec uses |
| Facebook's <a href="https://facebook.github.io/zstd/">Zstandard</a> |
| compression library.</p> |
| </section> |
| </section> |
| </section> |
| |
| <section> |
| <title>Protocol Declaration</title> |
| <p>Avro protocols describe RPC interfaces. Like schemas, they are |
| defined with JSON text.</p> |
| |
| <p>A protocol is a JSON object with the following attributes:</p> |
| <ul> |
| <li><em>protocol</em>, a string, the name of the protocol |
| (required);</li> |
| <li><em>namespace</em>, an optional string that qualifies the name;</li> |
| <li><em>doc</em>, an optional string describing this protocol;</li> |
| <li><em>types</em>, an optional list of definitions of named types |
| (records, enums, fixed and errors). An error definition is |
| just like a record definition except it uses "error" instead |
| of "record". Note that forward references to named types |
| are not permitted.</li> |
| <li><em>messages</em>, an optional JSON object whose keys are |
| message names and whose values are objects whose attributes |
| are described below. No two messages may have the same |
| name.</li> |
| </ul> |
| <p>The name and namespace qualification rules defined for schema objects |
| apply to protocols as well.</p> |
| |
| <section> |
| <title>Messages</title> |
| <p>A message has attributes:</p> |
| <ul> |
| <li>a <em>doc</em>, an optional description of the message,</li> |
| <li>a <em>request</em>, a list of named, |
| typed <em>parameter</em> schemas (this has the same form |
| as the fields of a record declaration);</li> |
| <li>a <em>response</em> schema; </li> |
| <li>an optional union of declared <em>error</em> schemas. |
| The <em>effective</em> union has <code>"string"</code> |
| prepended to the declared union, to permit transmission of |
| undeclared "system" errors. For example, if the declared |
| error union is <code>["AccessError"]</code>, then the |
| effective union is <code>["string", "AccessError"]</code>. |
| When no errors are declared, the effective error union |
| is <code>["string"]</code>. Errors are serialized using |
| the effective union; however, a protocol's JSON |
| declaration contains only the declared union. |
| </li> |
| <li>an optional <em>one-way</em> boolean parameter.</li> |
| </ul> |
| <p>A request parameter list is processed equivalently to an |
| anonymous record. Since record field lists may vary between |
| reader and writer, request parameters may also differ |
| between the caller and responder, and such differences are |
| resolved in the same manner as record field differences.</p> |
| <p>The one-way parameter may only be true when the response type |
| is <code>"null"</code> and no errors are listed.</p> |
| </section> |
| <section> |
| <title>Sample Protocol</title> |
| <p>For example, one may define a simple HelloWorld protocol with:</p> |
| <source> |
| { |
| "namespace": "com.acme", |
| "protocol": "HelloWorld", |
| "doc": "Protocol Greetings", |
| |
| "types": [ |
| {"name": "Greeting", "type": "record", "fields": [ |
| {"name": "message", "type": "string"}]}, |
| {"name": "Curse", "type": "error", "fields": [ |
| {"name": "message", "type": "string"}]} |
| ], |
| |
| "messages": { |
| "hello": { |
| "doc": "Say hello.", |
| "request": [{"name": "greeting", "type": "Greeting" }], |
| "response": "Greeting", |
| "errors": ["Curse"] |
| } |
| } |
| } |
| </source> |
| </section> |
| </section> |
| |
| <section> |
| <title>Protocol Wire Format</title> |
| |
| <section> |
| <title>Message Transport</title> |
| <p>Messages may be transmitted via |
| different <em>transport</em> mechanisms.</p> |
| |
| <p>To the transport, a <em>message</em> is an opaque byte sequence.</p> |
| |
| <p>A transport is a system that supports:</p> |
| <ul> |
| <li><strong>transmission of request messages</strong> |
| </li> |
| <li><strong>receipt of corresponding response messages</strong> |
| <p>Servers may send a response message back to the client |
| corresponding to a request message. The mechanism of |
            correspondence is transport-specific.  For example, in
| HTTP it is implicit, since HTTP directly supports requests |
| and responses. But a transport that multiplexes many |
| client threads over a single socket would need to tag |
| messages with unique identifiers.</p> |
| </li> |
| </ul> |
| |
| <p>Transports may be either <em>stateless</em> |
| or <em>stateful</em>. In a stateless transport, messaging |
| assumes no established connection state, while stateful |
| transports establish connections that may be used for multiple |
| messages. This distinction is discussed further in |
| the <a href="#handshake">handshake</a> section below.</p> |
| |
| <section> |
| <title>HTTP as Transport</title> |
| <p>When |
| <a href="https://www.w3.org/Protocols/rfc2616/rfc2616.html">HTTP</a> |
| is used as a transport, each Avro message exchange is an |
| HTTP request/response pair. All messages of an Avro |
| protocol should share a single URL at an HTTP server. |
| Other protocols may also use that URL. Both normal and |
| error Avro response messages should use the 200 (OK) |
| response code. The chunked encoding may be used for |
          requests and responses, but, regardless, the Avro request
| and response are the entire content of an HTTP request and |
| response. The HTTP Content-Type of requests and responses |
| should be specified as "avro/binary". Requests should be |
| made using the POST method.</p> |
| <p>HTTP is used by Avro as a stateless transport.</p> |
| </section> |
| </section> |
| |
| <section> |
| <title>Message Framing</title> |
| <p>Avro messages are <em>framed</em> as a list of buffers.</p> |
| <p>Framing is a layer between messages and the transport. |
| It exists to optimize certain operations.</p> |
| |
| <p>The format of framed message data is:</p> |
| <ul> |
| <li>a series of <em>buffers</em>, where each buffer consists of: |
| <ul> |
| <li>a four-byte, big-endian <em>buffer length</em>, followed by</li> |
| <li>that many bytes of <em>buffer data</em>.</li> |
| </ul> |
| </li> |
| <li>A message is always terminated by a zero-length buffer.</li> |
| </ul> |
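<p>The format above can be sketched in Python as follows (non-normative; the function names and the 8&#x202F;KB default buffer size are illustrative choices, not mandated by this specification):</p>

```python
import struct

def frame_message(message: bytes, buffer_size: int = 8192) -> bytes:
    """Split a message into length-prefixed buffers plus a zero-length terminator."""
    out = bytearray()
    for i in range(0, len(message), buffer_size):
        chunk = message[i:i + buffer_size]
        out += struct.pack(">I", len(chunk))   # four-byte, big-endian length
        out += chunk
    out += struct.pack(">I", 0)                # empty buffer ends the message
    return bytes(out)

def unframe_message(data: bytes) -> bytes:
    pos, out = 0, bytearray()
    while True:
        (n,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if n == 0:
            return bytes(out)
        out += data[pos:pos + n]
        pos += n
```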
| |
| <p>Framing is transparent to request and response message |
          formats (described below).  Any message may be presented as a
          single buffer or as multiple buffers.</p>
| |
| <p>Framing can permit readers to more efficiently get |
| different buffers from different sources and for writers to |
| more efficiently store different buffers to different |
| destinations. In particular, it can reduce the number of |
| times large binary objects are copied. For example, if an RPC |
| parameter consists of a megabyte of file data, that data can |
| be copied directly to a socket from a file descriptor, and, on |
| the other end, it could be written directly to a file |
| descriptor, never entering user space.</p> |
| |
          <p>A simple, recommended framing policy is for writers to
| create a new segment whenever a single binary object is |
| written that is larger than a normal output buffer. Small |
| objects are then appended in buffers, while larger objects are |
| written as their own buffers. When a reader then tries to |
| read a large object the runtime can hand it an entire buffer |
| directly, without having to copy it.</p> |
| </section> |
| |
| <section id="handshake"> |
| <title>Handshake</title> |
| |
| <p>The purpose of the handshake is to ensure that the client |
| and the server have each other's protocol definition, so that |
| the client can correctly deserialize responses, and the server |
| can correctly deserialize requests. Both clients and servers |
| should maintain a cache of recently seen protocols, so that, |
| in most cases, a handshake will be completed without extra |
| round-trip network exchanges or the transmission of full |
| protocol text.</p> |
| |
| <p>RPC requests and responses may not be processed until a |
| handshake has been completed. With a stateless transport, all |
| requests and responses are prefixed by handshakes. With a |
| stateful transport, handshakes are only attached to requests |
| and responses until a successful handshake response has been |
| returned over a connection. After this, request and response |
| payloads are sent without handshakes for the lifetime of that |
| connection.</p> |
| |
| <p>The handshake process uses the following record schemas:</p> |
| |
| <source> |
| { |
| "type": "record", |
| "name": "HandshakeRequest", "namespace":"org.apache.avro.ipc", |
| "fields": [ |
| {"name": "clientHash", |
| "type": {"type": "fixed", "name": "MD5", "size": 16}}, |
| {"name": "clientProtocol", "type": ["null", "string"]}, |
| {"name": "serverHash", "type": "MD5"}, |
| {"name": "meta", "type": ["null", {"type": "map", "values": "bytes"}]} |
| ] |
| } |
| { |
| "type": "record", |
| "name": "HandshakeResponse", "namespace": "org.apache.avro.ipc", |
| "fields": [ |
| {"name": "match", |
| "type": {"type": "enum", "name": "HandshakeMatch", |
| "symbols": ["BOTH", "CLIENT", "NONE"]}}, |
| {"name": "serverProtocol", |
| "type": ["null", "string"]}, |
| {"name": "serverHash", |
| "type": ["null", {"type": "fixed", "name": "MD5", "size": 16}]}, |
| {"name": "meta", |
| "type": ["null", {"type": "map", "values": "bytes"}]} |
| ] |
| } |
| </source> |
| |
| <ul> |
| <li>A client first prefixes each request with |
| a <code>HandshakeRequest</code> containing just the hash of |
| its protocol and of the server's protocol |
| (<code>clientHash!=null, clientProtocol=null, |
| serverHash!=null</code>), where the hashes are 128-bit MD5 |
| hashes of the JSON protocol text. If a client has never |
| connected to a given server, it sends its hash as a guess of |
| the server's hash, otherwise it sends the hash that it |
| previously obtained from this server.</li> |
| |
| <li>The server responds with |
| a <code>HandshakeResponse</code> containing one of: |
| <ul> |
| <li><code>match=BOTH, serverProtocol=null, |
| serverHash=null</code> if the client sent the valid hash |
| of the server's protocol and the server knows what |
| protocol corresponds to the client's hash. In this case, |
| the request is complete and the response data |
| immediately follows the HandshakeResponse.</li> |
| |
| <li><code>match=CLIENT, serverProtocol!=null, |
| serverHash!=null</code> if the server has previously |
| seen the client's protocol, but the client sent an |
| incorrect hash of the server's protocol. The request is |
| complete and the response data immediately follows the |
| HandshakeResponse. The client must use the returned |
| protocol to process the response and should also cache |
| that protocol and its hash for future interactions with |
| this server.</li> |
| |
| <li><code>match=NONE</code> if the server has not |
| previously seen the client's protocol. |
              The <code>serverHash</code>
              and <code>serverProtocol</code> may also be non-null if
              the client's hash of the server's protocol was incorrect.
| |
| <p>In this case the client must then re-submit its request |
| with its protocol text (<code>clientHash!=null, |
| clientProtocol!=null, serverHash!=null</code>) and the |
| server should respond with a successful match |
| (<code>match=BOTH, serverProtocol=null, |
| serverHash=null</code>) as above.</p> |
| </li> |
| </ul> |
| </li> |
| </ul> |
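<p>The hashes exchanged above are plain MD5 digests of the protocol's JSON text, matching the 16-byte <code>fixed</code> MD5 schema in the handshake records. A minimal sketch (the function name is illustrative):</p>

```python
import hashlib

def protocol_hash(protocol_json_text: str) -> bytes:
    """128-bit MD5 of a protocol's JSON text, as carried in the
    clientHash and serverHash fields of the handshake records."""
    return hashlib.md5(protocol_json_text.encode("utf-8")).digest()

h = protocol_hash('{"protocol": "HelloWorld", "namespace": "com.acme"}')
assert len(h) == 16   # fits the fixed("MD5", size=16) schema
```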
| |
| <p>The <code>meta</code> field is reserved for future |
| handshake enhancements.</p> |
| |
| </section> |
| |
| <section> |
| <title>Call Format</title> |
| <p>A <em>call</em> consists of a request message paired with |
| its resulting response or error message. Requests and |
| responses contain extensible metadata, and both kinds of |
| messages are framed as described above.</p> |
| |
| <p>The format of a call request is:</p> |
| <ul> |
| <li><em>request metadata</em>, a map with values of |
| type <code>bytes</code></li> |
| <li>the <em>message name</em>, an Avro string, |
| followed by</li> |
| <li>the message <em>parameters</em>. Parameters are |
| serialized according to the message's request |
| declaration.</li> |
| </ul> |
| |
        <p>When the empty string is used as a message name, a server
        should ignore the parameters and return an empty response.  A
| client may use this to ping a server or to perform a handshake |
| without sending a protocol message.</p> |
| |
| <p>When a message is declared one-way and a stateful |
| connection has been established by a successful handshake |
| response, no response data is sent. Otherwise the format of |
| the call response is:</p> |
| <ul> |
| <li><em>response metadata</em>, a map with values of |
| type <code>bytes</code></li> |
| <li>a one-byte <em>error flag</em> boolean, followed by either: |
| <ul> |
| <li>if the error flag is false, the message <em>response</em>, |
| serialized per the message's response schema.</li> |
| <li>if the error flag is true, the <em>error</em>, |
| serialized per the message's effective error union |
| schema.</li> |
| </ul> |
| </li> |
| </ul> |
| </section> |
| |
| </section> |
| |
| <section> |
| <title>Schema Resolution</title> |
| |
| <p>A reader of Avro data, whether from an RPC or a file, can |
| always parse that data because the original schema must be |
| provided along with the data. However, the reader may be |
| programmed to read data into a different schema. |
| For example, if the data was written with a different version |
| of the software than it is read, then fields may have been |
| added or removed from records. This section specifies how such |
| schema differences should be resolved.</p> |
| |
| <p>We refer to the schema used to write the data as |
      the <em>writer's</em> schema, and the schema that the
      application expects as the <em>reader's</em> schema.  Differences
| between these should be resolved as follows:</p> |
| |
| <ul> |
| <li><p>It is an error if the two schemas do not <em>match</em>.</p> |
| <p>To match, one of the following must hold:</p> |
| <ul> |
| <li>both schemas are arrays whose item types match</li> |
| <li>both schemas are maps whose value types match</li> |
| <li>both schemas are enums whose (unqualified) names match</li> |
          <li>both schemas are fixed types whose sizes and (unqualified) names match</li>
| <li>both schemas are records with the same (unqualified) name</li> |
| <li>either schema is a union</li> |
          <li>both schemas have the same primitive type</li>
| <li>the writer's schema may be <em>promoted</em> to the |
| reader's as follows: |
| <ul> |
| <li>int is promotable to long, float, or double</li> |
| <li>long is promotable to float or double</li> |
| <li>float is promotable to double</li> |
| <li>string is promotable to bytes</li> |
| <li>bytes is promotable to string</li> |
| </ul> |
| </li> |
| </ul> |
| </li> |
| |
| <li><strong>if both are records:</strong> |
| <ul> |
| <li>the ordering of fields may be different: fields are |
| matched by name.</li> |
| |
| <li>schemas for fields with the same name in both records |
| are resolved recursively.</li> |
| |
| <li>if the writer's record contains a field with a name |
| not present in the reader's record, the writer's value |
| for that field is ignored.</li> |
| |
| <li>if the reader's record schema has a field that |
| contains a default value, and writer's schema does not |
| have a field with the same name, then the reader should |
| use the default value from its field.</li> |
| |
| <li>if the reader's record schema has a field with no |
| default value, and writer's schema does not have a field |
| with the same name, an error is signalled.</li> |
| </ul> |
| </li> |
| |
| <li><strong>if both are enums:</strong> |
| <p>if the writer's symbol is not present in the reader's |
| enum and the reader has a <code>default</code> value, then |
| that value is used, otherwise an error is signalled.</p> |
| </li> |
| |
| <li><strong>if both are arrays:</strong> |
| <p>This resolution algorithm is applied recursively to the reader's and |
| writer's array item schemas.</p> |
| </li> |
| |
| <li><strong>if both are maps:</strong> |
| <p>This resolution algorithm is applied recursively to the reader's and |
| writer's value schemas.</p> |
| </li> |
| |
| <li><strong>if both are unions:</strong> |
| <p>The first schema in the reader's union that matches the |
| selected writer's union schema is recursively resolved |
        against it.  If none match, an error is signalled.</p>
| </li> |
| |
| <li><strong>if reader's is a union, but writer's is not</strong> |
| <p>The first schema in the reader's union that matches the |
| writer's schema is recursively resolved against it. If none |
| match, an error is signalled.</p> |
| </li> |
| |
| <li><strong>if writer's is a union, but reader's is not</strong> |
| <p>If the reader's schema matches the selected writer's schema, |
| it is recursively resolved against it. If they do not |
| match, an error is signalled.</p> |
| </li> |
| |
| </ul> |
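<p>The primitive matching rules above, including the one-way promotions, can be sketched as a small lookup table (non-normative Python; the names are illustrative):</p>

```python
# writer-to-reader promotions from the matching rules above
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}

def primitives_match(writer: str, reader: str) -> bool:
    """True if a writer's primitive schema resolves against the reader's."""
    return writer == reader or reader in PROMOTIONS.get(writer, set())

assert primitives_match("int", "double")       # promotable
assert not primitives_match("double", "int")   # promotion is one-way
```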
| |
| <p>A schema's "doc" fields are ignored for the purposes of schema resolution. Hence, |
| the "doc" portion of a schema may be dropped at serialization.</p> |
| |
| </section> |
| |
| <section> |
| <title>Parsing Canonical Form for Schemas</title> |
| |
| <p>One of the defining characteristics of Avro is that a reader |
| must use the schema used by the writer of the data in |
| order to know how to read the data. This assumption results in a data |
| format that's compact and also amenable to many forms of schema |
| evolution. However, the specification so far has not defined |
| what it means for the reader to have the "same" schema as the |
| writer. Does the schema need to be textually identical? Well, |
| clearly adding or removing some whitespace to a JSON expression |
| does not change its meaning. At the same time, reordering the |
| fields of records clearly <em>does</em> change the meaning. So |
| what does it mean for a reader to have "the same" schema as a |
| writer?</p> |
| |
| <p><em>Parsing Canonical Form</em> is a transformation of a |
    writer's schema that lets us define what it means for two
| schemas to be "the same" for the purpose of reading data written |
| against the schema. It is called <em>Parsing</em> Canonical Form |
| because the transformations strip away parts of the schema, like |
| "doc" attributes, that are irrelevant to readers trying to parse |
| incoming data. It is called <em>Canonical Form</em> because the |
| transformations normalize the JSON text (such as the order of |
| attributes) in a way that eliminates unimportant differences |
| between schemas. If the Parsing Canonical Forms of two |
| different schemas are textually equal, then those schemas are |
| "the same" as far as any reader is concerned, i.e., there is no |
| serialized data that would allow a reader to distinguish data |
| generated by a writer using one of the original schemas from |
    data generated by a writer using the other original schema.
| (We sketch a proof of this property in a companion |
| document.)</p> |
| |
| <p>The next subsection specifies the transformations that define |
| Parsing Canonical Form. But with a well-defined canonical form, |
| it can be convenient to go one step further, transforming these |
| canonical forms into simple integers ("fingerprints") that can |
| be used to uniquely identify schemas. The subsection after next |
| recommends some standard practices for generating such |
| fingerprints.</p> |
| |
| <section> |
| <title>Transforming into Parsing Canonical Form</title> |
| |
| <p>Assuming an input schema (in JSON form) that's already |
| UTF-8 text for a <em>valid</em> Avro schema (including all |
| quotes as required by JSON), the following transformations |
| will produce its Parsing Canonical Form:</p> |
| <ul> |
| <li> [PRIMITIVES] Convert primitive schemas to their simple |
| form (e.g., <code>int</code> instead of |
| <code>{"type":"int"}</code>).</li> |
| |
| <li> [FULLNAMES] Replace short names with fullnames, using |
| applicable namespaces to do so. Then eliminate |
| <code>namespace</code> attributes, which are now redundant.</li> |
| |
| <li> [STRIP] Keep only attributes that are relevant to |
| parsing data, which are: <code>type</code>, |
| <code>name</code>, <code>fields</code>, |
| <code>symbols</code>, <code>items</code>, |
| <code>values</code>, <code>size</code>. Strip all others |
| (e.g., <code>doc</code> and <code>aliases</code>).</li> |
| |
| <li> [ORDER] Order the appearance of fields of JSON objects |
| as follows: <code>name</code>, <code>type</code>, |
| <code>fields</code>, <code>symbols</code>, |
| <code>items</code>, <code>values</code>, <code>size</code>. |
| For example, if an object has <code>type</code>, |
| <code>name</code>, and <code>size</code> fields, then the |
| <code>name</code> field should appear first, followed by the |
| <code>type</code> and then the <code>size</code> fields.</li> |
| |
| <li> [STRINGS] For all JSON string literals in the schema |
| text, replace any escaped characters (e.g., \uXXXX escapes) |
| with their UTF-8 equivalents.</li> |
| |
| <li> [INTEGERS] Eliminate quotes around and any leading |
| zeros in front of JSON integer literals (which appear in the |
| <code>size</code> attributes of <code>fixed</code> schemas).</li> |
| |
| <li> [WHITESPACE] Eliminate all whitespace in JSON outside of string literals.</li> |
| </ul> |
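<p>A partial, non-normative sketch of these transformations follows. It assumes the input is already parsed JSON (so [STRINGS] and [INTEGERS] are handled by the JSON parser) and that names are already fullnames (the [FULLNAMES] step is omitted for brevity); all function names are illustrative:</p>

```python
import json

# [ORDER]: the required attribute order for JSON objects
RELEVANT = ("name", "type", "fields", "symbols", "items", "values", "size")
RECURSIVE = ("type", "fields", "items", "values")

def canonicalize(schema):
    if isinstance(schema, str):        # a fullname or primitive name
        return schema
    if isinstance(schema, list):       # a union, or a list of record fields
        return [canonicalize(s) for s in schema]
    kept = {k: schema[k] for k in RELEVANT if k in schema}      # [STRIP]
    if set(kept) == {"type"} and isinstance(kept["type"], str):
        return kept["type"]                                     # [PRIMITIVES]
    return {k: canonicalize(v) if k in RECURSIVE else v         # [ORDER]
            for k, v in kept.items()}

def parsing_canonical_form(schema) -> str:
    # dict insertion order follows RELEVANT, which json.dumps preserves;
    # compact separators implement [WHITESPACE]
    return json.dumps(canonicalize(schema), separators=(",", ":"))

s = {"type": "record", "name": "test.X", "doc": "d",
     "fields": [{"name": "a", "type": {"type": "long"}, "doc": "x"}]}
assert parsing_canonical_form(s) == \
    '{"name":"test.X","type":"record","fields":[{"name":"a","type":"long"}]}'
```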
| </section> |
| |
| <section id="schema_fingerprints"> |
| <title>Schema Fingerprints</title> |
| |
| <p>"[A] fingerprinting algorithm is a procedure that maps an |
| arbitrarily large data item (such as a computer file) to a |
| much shorter bit string, its <em>fingerprint,</em> that |
| uniquely identifies the original data for all practical |
| purposes" (quoted from [<a |
| href="https://en.wikipedia.org/wiki/Fingerprint_(computing)">Wikipedia</a>]). |
| In the Avro context, fingerprints of Parsing Canonical Form |
| can be useful in a number of applications; for example, to |
| cache encoder and decoder objects, to tag data items with a |
| short substitute for the writer's full schema, and to quickly |
| negotiate common-case schemas between readers and writers.</p> |
| |
| <p>In designing fingerprinting algorithms, there is a |
| fundamental trade-off between the length of the fingerprint |
| and the probability of collisions. To help application |
| designers find appropriate points within this trade-off space, |
| while encouraging interoperability and ease of implementation, |
| we recommend using one of the following three algorithms when |
| fingerprinting Avro schemas:</p> |
| |
| <ul> |
| <li> When applications can tolerate longer fingerprints, we |
| recommend using the <a |
| href="https://en.wikipedia.org/wiki/SHA-2">SHA-256 digest |
| algorithm</a> to generate 256-bit fingerprints of Parsing |
| Canonical Forms. Most languages today have SHA-256 |
| implementations in their libraries.</li> |
| |
| <li> At the opposite extreme, the smallest fingerprint we |
| recommend is a 64-bit <a |
| href="https://en.wikipedia.org/wiki/Rabin_fingerprint">Rabin |
| fingerprint</a>. Below, we provide pseudo-code for this |
| algorithm that can be easily translated into any programming |
| language. 64-bit fingerprints should guarantee uniqueness |
| for schema caches of up to a million entries (for such a |
| cache, the chance of a collision is 3E-8). We don't |
        recommend shorter fingerprints, as the chance of collisions
        is too great (for example, with 32-bit fingerprints, a cache
| with as few as 100,000 schemas has a 50% chance of having a |
| collision).</li> |
| |
| <li>Between these two extremes, we recommend using the <a |
| href="https://en.wikipedia.org/wiki/MD5">MD5 message |
| digest</a> to generate 128-bit fingerprints. These make |
| sense only where very large numbers of schemas are being |
| manipulated (tens of millions); otherwise, 64-bit |
| fingerprints should be sufficient. As with SHA-256, MD5 |
| implementations are found in most libraries today.</li> |
| </ul> |
| |
| <p> These fingerprints are <em>not</em> meant to provide any |
| security guarantees, even the longer SHA-256-based ones. Most |
| Avro applications should be surrounded by security measures |
| that prevent attackers from writing random data and otherwise |
| interfering with the consumers of schemas. We recommend that |
| these surrounding mechanisms be used to prevent collision and |
| pre-image attacks (i.e., "forgery") on schema fingerprints, |
| rather than relying on the security properties of the |
| fingerprints themselves.</p> |
| |
| <p>Rabin fingerprints are <a |
| href="https://en.wikipedia.org/wiki/Cyclic_redundancy_check">cyclic |
| redundancy checks</a> computed using irreducible polynomials. |
| In the style of the Appendix of <a |
| href="https://www.ietf.org/rfc/rfc1952.txt">RFC 1952</a> |
| (pg 10), which defines the CRC-32 algorithm, here's our |
| definition of the 64-bit AVRO fingerprinting algorithm:</p> |
| |
| <source> |
| long fingerprint64(byte[] buf) { |
| if (FP_TABLE == null) initFPTable(); |
| long fp = EMPTY; |
| for (int i = 0; i < buf.length; i++) |
| fp = (fp >>> 8) ^ FP_TABLE[(int)(fp ^ buf[i]) & 0xff]; |
| return fp; |
| } |
| |
| static long EMPTY = 0xc15d213aa4d7a795L; |
| static long[] FP_TABLE = null; |
| |
| void initFPTable() { |
| FP_TABLE = new long[256]; |
| for (int i = 0; i < 256; i++) { |
| long fp = i; |
| for (int j = 0; j < 8; j++) |
| fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L)); |
| FP_TABLE[i] = fp; |
| } |
| } |
| </source> |
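<p>For reference, the same algorithm translates directly into Python (a sketch, not an official implementation; Python's unbounded non-negative integers make the unsigned shifts straightforward, so no extra masking of <code>fp</code> is needed):</p>

```python
EMPTY = 0xc15d213aa4d7a795

def _init_fp_table():
    table = []
    for i in range(256):
        fp = i
        for _ in range(8):
            # -(fp & 1) is 0 or -1, so the AND keeps or clears EMPTY,
            # mirroring the (EMPTY & -(fp & 1L)) trick in the Java version
            fp = (fp >> 1) ^ (EMPTY & -(fp & 1))
        table.append(fp)
    return table

FP_TABLE = _init_fp_table()

def fingerprint64(buf: bytes) -> int:
    fp = EMPTY
    for b in buf:
        fp = (fp >> 8) ^ FP_TABLE[(fp ^ b) & 0xFF]
    return fp
```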
| |
| <p>Readers interested in the mathematics behind this |
| algorithm may want to read |
| <a href="https://books.google.com/books?id=XD9iAwAAQBAJ&pg=PA319" |
| >Chapter 14 of the Second Edition of <em>Hacker's Delight</em></a>. |
| (Unlike RFC-1952 and the book chapter, we prepend |
| a single one bit to messages. We do this because CRCs ignore |
| leading zero bits, which can be problematic. Our code |
| prepends a one-bit by initializing fingerprints using |
| <code>EMPTY</code>, rather than initializing using zero as in |
| RFC-1952 and the book chapter.)</p> |
| </section> |
| </section> |
| |
| <section> |
| <title>Logical Types</title> |
| |
| <p>A logical type is an Avro primitive or complex type with extra attributes to |
| represent a derived type. The attribute <code>logicalType</code> must |
| always be present for a logical type, and is a string with the name of one of |
| the logical types listed later in this section. Other attributes may be defined |
| for particular logical types.</p> |
| |
| <p>A logical type is always serialized using its underlying Avro type so |
| that values are encoded in exactly the same way as the equivalent Avro |
| type that does not have a <code>logicalType</code> attribute. Language |
| implementations may choose to represent logical types with an |
| appropriate native type, although this is not required.</p> |
| |
| <p>Language implementations must ignore unknown logical types when |
| reading, and should use the underlying Avro type. If a logical type is |
| invalid, for example a decimal with scale greater than its precision, |
| then implementations should ignore the logical type and use the |
| underlying Avro type.</p> |
| |
| <section> |
| <title>Decimal</title> |
| <p>The <code>decimal</code> logical type represents an arbitrary-precision signed |
| decimal number of the form <em>unscaled × 10<sup>-scale</sup></em>.</p> |
| |
| <p>A <code>decimal</code> logical type annotates Avro |
| <code>bytes</code> or <code>fixed</code> types. The byte array must |
| contain the two's-complement representation of the unscaled integer |
| value in big-endian byte order. The scale is fixed, and is specified |
| using an attribute.</p> |
| |
| <p>The following attributes are supported:</p> |
| <ul> |
| <li><code>scale</code>, a JSON integer representing the scale |
| (optional). If not specified the scale is 0.</li> |
| <li><code>precision</code>, a JSON integer representing the (maximum) |
| precision of decimals stored in this type (required).</li> |
| </ul> |
| |
| <p>For example, the following schema represents decimal numbers with a |
| maximum precision of 4 and a scale of 2:</p> |
| <source> |
| { |
| "type": "bytes", |
| "logicalType": "decimal", |
| "precision": 4, |
| "scale": 2 |
| } |
| </source> |
| |
      <p>Precision must be a positive integer. If the
| underlying type is a <code>fixed</code>, then the precision is |
| limited by its size. An array of length <code>n</code> can store at |
| most <em>floor(log_10(2<sup>8 × n - 1</sup> - 1))</em> |
| base-10 digits of precision.</p> |
| |
| <p>Scale must be zero or a positive integer less than or equal to the |
| precision.</p> |
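<p>The unscaled-value encoding can be sketched with Python's arbitrary-precision integers (non-normative; the helper name is illustrative, and the sketch assumes the value's fractional digits do not exceed the declared scale):</p>

```python
from decimal import Decimal

def encode_decimal(value: Decimal, scale: int) -> bytes:
    """Minimal two's-complement, big-endian bytes of the unscaled value."""
    unscaled = int(value.scaleb(scale))                  # value * 10**scale
    length = max(1, (unscaled.bit_length() + 8) // 8)    # room for the sign bit
    return unscaled.to_bytes(length, "big", signed=True)

assert encode_decimal(Decimal("3.14"), 2) == b"\x01\x3a"   # unscaled = 314
assert encode_decimal(Decimal("-0.01"), 2) == b"\xff"      # unscaled = -1
```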
| |
| <p>For the purposes of schema resolution, two schemas that are |
| <code>decimal</code> logical types <em>match</em> if their scales and |
| precisions match.</p> |
| |
| </section> |
| |
| <section> |
| <title>UUID</title> |
| <p> |
        The <code>uuid</code> logical type represents a randomly generated universally unique identifier (UUID).
| </p> |
| |
| <p> |
        A <code>uuid</code> logical type annotates an Avro <code>string</code>. The string must conform to <a href="https://www.ietf.org/rfc/rfc4122.txt">RFC-4122</a>.
| </p> |
| </section> |
| |
| <section> |
| <title>Date</title> |
| <p> |
| The <code>date</code> logical type represents a date within the calendar, with no reference to a particular time zone or time of day. |
| </p> |
| <p> |
| A <code>date</code> logical type annotates an Avro <code>int</code>, where the int stores the number of days from the unix epoch, 1 January 1970 (ISO calendar). |
| </p> |
| <p>The following schema represents a date:</p> |
| <source> |
| { |
| "type": "int", |
| "logicalType": "date" |
| } |
| </source> |
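<p>The day count maps directly onto Python's <code>datetime.date</code> arithmetic (a non-normative sketch; the function names are illustrative):</p>

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def encode_date(d: date) -> int:
    """The int carried by the 'date' logical type: days since the Unix epoch."""
    return (d - EPOCH).days

def decode_date(days: int) -> date:
    return EPOCH + timedelta(days=days)

assert encode_date(date(2000, 1, 1)) == 10957   # dates before 1970 are negative
```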
| </section> |
| |
| <section> |
| <title>Time (millisecond precision)</title> |
| <p> |
| The <code>time-millis</code> logical type represents a time of day, with no reference to a particular calendar, time zone or date, with a precision of one millisecond. |
| </p> |
| <p> |
| A <code>time-millis</code> logical type annotates an Avro <code>int</code>, where the int stores the number of milliseconds after midnight, 00:00:00.000. |
| </p> |
| </section> |
| |
| <section> |
| <title>Time (microsecond precision)</title> |
| <p> |
| The <code>time-micros</code> logical type represents a time of day, with no reference to a particular calendar, time zone or date, with a precision of one microsecond. |
| </p> |
| <p> |
| A <code>time-micros</code> logical type annotates an Avro <code>long</code>, where the long stores the number of microseconds after midnight, 00:00:00.000000. |
| </p> |
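| <p>For example, the following schema represents a time of day with microsecond precision:</p> |
| <source> |
| { |
| "type": "long", |
| "logicalType": "time-micros" |
| } |
| </source> |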
| </section> |
| |
| <section> |
| <title>Timestamp (millisecond precision)</title> |
| <p> |
| The <code>timestamp-millis</code> logical type represents an instant on the global timeline, independent of a particular time zone or calendar, with a precision of one millisecond. |
| Please note that time zone information gets lost in this process. Upon reading a value back, we can only reconstruct the instant, but not the original representation. |
| In practice, such timestamps are typically displayed to users in their local time zones, therefore they may be displayed differently depending on the execution environment. |
| </p> |
| <p> |
| A <code>timestamp-millis</code> logical type annotates an Avro <code>long</code>, where the long stores the number of milliseconds from the unix epoch, 1 January 1970 00:00:00.000 UTC. |
| </p> |
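| <p>For example, the following schema represents a timestamp with millisecond precision:</p> |
| <source> |
| { |
| "type": "long", |
| "logicalType": "timestamp-millis" |
| } |
| </source> |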
| </section> |
| |
| <section> |
| <title>Timestamp (microsecond precision)</title> |
| <p> |
| The <code>timestamp-micros</code> logical type represents an instant on the global timeline, independent of a particular time zone or calendar, with a precision of one microsecond. |
| Please note that time zone information gets lost in this process. Upon reading a value back, we can only reconstruct the instant, but not the original representation. |
| In practice, such timestamps are typically displayed to users in their local time zones, therefore they may be displayed differently depending on the execution environment. |
| </p> |
| <p> |
| A <code>timestamp-micros</code> logical type annotates an Avro <code>long</code>, where the long stores the number of microseconds from the unix epoch, 1 January 1970 00:00:00.000000 UTC. |
| </p> |
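| <p>For example, the following schema represents a timestamp with microsecond precision:</p> |
| <source> |
| { |
| "type": "long", |
| "logicalType": "timestamp-micros" |
| } |
| </source> |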
| </section> |
| |
| <section> |
| <title>Local timestamp (millisecond precision)</title> |
| <p> |
| The <code>local-timestamp-millis</code> logical type represents a timestamp in a local timezone, regardless of what specific time zone is considered local, with a precision of one millisecond. |
| </p> |
| <p> |
| A <code>local-timestamp-millis</code> logical type annotates an Avro <code>long</code>, where the long stores the number of milliseconds from 1 January 1970 00:00:00.000 in local time. |
| </p> |
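| <p>For example, the following schema represents a local timestamp with millisecond precision:</p> |
| <source> |
| { |
| "type": "long", |
| "logicalType": "local-timestamp-millis" |
| } |
| </source> |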
| </section> |
| |
| <section> |
| <title>Local timestamp (microsecond precision)</title> |
| <p> |
| The <code>local-timestamp-micros</code> logical type represents a timestamp in a local timezone, regardless of what specific time zone is considered local, with a precision of one microsecond. |
| </p> |
| <p> |
| A <code>local-timestamp-micros</code> logical type annotates an Avro <code>long</code>, where the long stores the number of microseconds from 1 January 1970 00:00:00.000000 in local time. |
| </p> |
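| <p>For example, the following schema represents a local timestamp with microsecond precision:</p> |
| <source> |
| { |
| "type": "long", |
| "logicalType": "local-timestamp-micros" |
| } |
| </source> |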
| </section> |
| |
| <section> |
| <title>Duration</title> |
| <p> |
| The <code>duration</code> logical type represents an amount of time defined by a number of months, days and milliseconds. This is not equivalent to a number of milliseconds, because, depending on the moment in time from which the duration is measured, the number of days in the month and number of milliseconds in a day may differ. Other standard periods such as years, quarters, hours and minutes can be expressed through these basic periods. |
| </p> |
| <p> |
| A <code>duration</code> logical type annotates an Avro <code>fixed</code> type of size 12, which stores three little-endian unsigned integers that represent durations at different granularities of time. The first stores a number in months, the second stores a number in days, and the third stores a number in milliseconds. |
| </p> |
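| <p>For example, the following schema represents a duration (the |
| <code>fixed</code> name shown here is arbitrary and chosen only for illustration):</p> |
| <source> |
| { |
| "type": "fixed", |
| "size": 12, |
| "name": "duration_value", |
| "logicalType": "duration" |
| } |
| </source> |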
| </section> |
| |
| </section> |
| |
| <p><em>Apache Avro, Avro, Apache, and the Avro and Apache logos are |
| trademarks of The Apache Software Foundation.</em></p> |
| |
| </body> |
| </document> |