doc/src/content/xdocs/spec.xml - avro - Git at Google

 <?xml version="1.0" encoding="UTF-8"?>
 <!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
 -->
 <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd" [
   <!ENTITY % avro-entities PUBLIC "-//Apache//ENTITIES Avro//EN"
 	   "../../../../build/avro.ent">
   %avro-entities;
 ]>
 <document>
   <header>
     <title>Apache Avro&#153; &AvroVersion; Specification</title>
   </header>
   <body>

     <section id="preamble">
       <title>Introduction</title>

       <p>This document defines Apache Avro.  It is intended to be the
         authoritative specification. Implementations of Avro must
         adhere to this document.
       </p>

     </section>

     <section id="schemas">
       <title>Schema Declaration</title>
       <p>A Schema is represented in <a href="ext:json">JSON</a> by one of:</p>
       <ul>
         <li>A JSON string, naming a defined type.</li>

         <li>A JSON object, of the form:

           <source>{"type": "<em>typeName</em>" ...<em>attributes</em>...}</source>

           where <em>typeName</em> is either a primitive or derived
           type name, as defined below.  Attributes not defined in this
           document are permitted as metadata, but must not affect
           the format of serialized data.
           </li>
         <li>A JSON array, representing a union of embedded types.</li>
       </ul>

       <section id="schema_primitive">
         <title>Primitive Types</title>
         <p>The set of primitive type names is:</p>
         <ul>
           <li><code>null</code>: no value</li>
           <li><code>boolean</code>: a binary value</li>
           <li><code>int</code>: 32-bit signed integer</li>
           <li><code>long</code>: 64-bit signed integer</li>
           <li><code>float</code>: single precision (32-bit) IEEE 754 floating-point number</li>
           <li><code>double</code>: double precision (64-bit) IEEE 754 floating-point number</li>
           <li><code>bytes</code>: sequence of 8-bit unsigned bytes</li>
           <li><code>string</code>: unicode character sequence</li>
         </ul>

         <p>Primitive types have no specified attributes.</p>

         <p>Primitive type names are also defined type names.  Thus, for
           example, the schema "string" is equivalent to:</p>

         <source>{"type": "string"}</source>

       </section>

       <section id="schema_complex">
         <title>Complex Types</title>

         <p>Avro supports six kinds of complex types: records, enums,
         arrays, maps, unions and fixed.</p>

         <section id="schema_record">
           <title>Records</title>

 	  <p>Records use the type name "record" and support three attributes:</p>
 	  <ul>
 	    <li><code>name</code>: a JSON string providing the name
 	    of the record (required).</li>
 	    <li><em>namespace</em>, a JSON string that qualifies the name;</li>
 	    <li><code>doc</code>: a JSON string providing documentation to the
 	    user of this schema (optional).</li>
 	    <li><code>aliases:</code> a JSON array of strings, providing
 	      alternate names for this record (optional).</li>
 	    <li><code>fields</code>: a JSON array, listing fields (required).
 	    Each field is a JSON object with the following attributes:
 	      <ul>
 		<li><code>name</code>: a JSON string providing the name
 		  of the field (required), and </li>
 		<li><code>doc</code>: a JSON string describing this field
                   for users (optional).</li>
 		<li><code>type:</code> A JSON object defining a schema, or
 		  a JSON string naming a record definition
 		  (required).</li>
 		<li><code>default:</code> A default value for this
 		  field, used when reading instances that lack this
 		  field (optional).  Permitted values depend on the
 		  field's schema type, according to the table below.
 		  Default values for union fields correspond to the
 		  first schema in the union. Default values for bytes
 		  and fixed fields are JSON strings, where Unicode
 		  code points 0-255 are mapped to unsigned 8-bit byte
 		  values 0-255.
 		  <table class="right">
 		    <caption>field default values</caption>
 		    <tr><th>avro type</th><th>json type</th><th>example</th></tr>
 		    <tr><td>null</td><td>null</td><td>null</td></tr>
 		    <tr><td>boolean</td><td>boolean</td><td>true</td></tr>
 		    <tr><td>int,long</td><td>integer</td><td>1</td></tr>
 		    <tr><td>float,double</td><td>number</td><td>1.1</td></tr>
 		    <tr><td>bytes</td><td>string</td><td>"\u00FF"</td></tr>
 		    <tr><td>string</td><td>string</td><td>"foo"</td></tr>
 		    <tr><td>record</td><td>object</td><td>{"a": 1}</td></tr>
 		    <tr><td>enum</td><td>string</td><td>"FOO"</td></tr>
 		    <tr><td>array</td><td>array</td><td>[1]</td></tr>
 		    <tr><td>map</td><td>object</td><td>{"a": 1}</td></tr>
 		    <tr><td>fixed</td><td>string</td><td>"\u00ff"</td></tr>
 		  </table>
 		</li>
 		<li><code>order:</code> specifies how this field
 		  impacts sort ordering of this record (optional).
 		  Valid values are "ascending" (the default),
 		  "descending", or "ignore".  For more details on how
 		  this is used, see the the <a href="#order">sort
 		  order</a> section below.</li>
 		<li><code>aliases:</code> a JSON array of strings, providing
 		  alternate names for this field (optional).</li>
 	      </ul>
 	    </li>
 	  </ul>

 	  <p>For example, a linked-list of 64-bit values may be defined with:</p>
 	  <source>
 {
   "type": "record",
   "name": "LongList",
   "aliases": ["LinkedLongs"],                      // old name for this
   "fields" : [
     {"name": "value", "type": "long"},             // each element has a long
     {"name": "next", "type": ["LongList", "null"]} // optional next element
   ]
 }
 	  </source>
 	</section>

         <section>
           <title>Enums</title>

 	  <p>Enums use the type name "enum" and support the following
 	  attributes:</p>
 	  <ul>
 	    <li><code>name</code>: a JSON string providing the name
 	    of the enum (required).</li>
 	    <li><em>namespace</em>, a JSON string that qualifies the name;</li>
 	    <li><code>aliases:</code> a JSON array of strings, providing
 	      alternate names for this enum (optional).</li>
 	    <li><code>doc</code>: a JSON string providing documentation to the
 	    user of this schema (optional).</li>
 	    <li><code>symbols</code>: a JSON array, listing symbols,
 	    as JSON strings (required).  All symbols in an enum must
 	    be unique; duplicates are prohibited.</li>
 	  </ul>
 	  <p>For example, playing card suits might be defined with:</p>
 	  <source>
 { "type": "enum",
   "name": "Suit",
   "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]
 }
 	  </source>
 	</section>

         <section>
           <title>Arrays</title>
           <p>Arrays use the type name <code>"array"</code> and support
           a single attribute:</p>
 	  <ul>
             <li><code>items</code>: the schema of the array's items.</li>
 	  </ul>
 	  <p>For example, an array of strings is declared
 	  with:</p>
 	  <source>{"type": "array", "items": "string"}</source>
 	</section>

         <section>
           <title>Maps</title>
           <p>Maps use the type name <code>"map"</code> and support
           one attribute:</p>
 	  <ul>
             <li><code>values</code>: the schema of the map's values.</li>
 	  </ul>
 	  <p>Map keys are assumed to be strings.</p>
 	  <p>For example, a map from string to long is declared
 	  with:</p>
 	  <source>{"type": "map", "values": "long"}</source>
 	</section>

         <section>
           <title>Unions</title>
           <p>Unions, as mentioned above, are represented using JSON
           arrays.  For example, <code>["string", "null"]</code>
           declares a schema which may be either a string or null.</p>
 	  <p>Unions may not contain more than one schema with the same
 	  type, except for the named types record, fixed and enum.  For
 	  example, unions containing two array types or two map types
 	  are not permitted, but two types with different names are
 	  permitted.  (Names permit efficient resolution when reading
 	  and writing unions.)</p>
 	  <p>Unions may not immediately contain other unions.</p>
         </section>

         <section>
           <title>Fixed</title>
           <p>Fixed uses the type name <code>"fixed"</code> and supports
           two attributes:</p>
 	  <ul>
 	    <li><code>name</code>: a string naming this fixed (required).</li>
 	    <li><em>namespace</em>, a string that qualifies the name;</li>
 	    <li><code>aliases:</code> a JSON array of strings, providing
 	      alternate names for this enum (optional).</li>
             <li><code>size</code>: an integer, specifying the number
             of bytes per value (required).</li>
 	  </ul>
 	  <p>For example, 16-byte quantity may be declared with:</p>
 	  <source>{"type": "fixed", "size": 16, "name": "md5"}</source>
 	</section>


       </section> <!-- end complex types -->

       <section>
 	<title>Names</title>
         <p>Record, enums and fixed are named types.  Each has
           a <em>fullname</em> that is composed of two parts;
           a <em>name</em> and a <em>namespace</em>.  Equality of names
           is defined on the fullname.</p>
 	<p>The name portion of a fullname, and record field names must:</p>
 	<ul>
           <li>start with <code>[A-Za-z_]</code></li>
           <li>subsequently contain only <code>[A-Za-z0-9_]</code></li>
 	</ul>
         <p>A namespace is a dot-separated sequence of such names.</p>
         <p>In record, enum and fixed definitions, the fullname is
         determined in one of the following ways:</p>
 	<ul>
 	  <li>A name and namespace are both specified.  For example,
 	  one might use <code>"name": "X", "namespace":
 	  "org.foo"</code> to indicate the
 	  fullname <code>org.foo.X</code>.</li>
 	  <li>A fullname is specified.  If the name specified contains
 	  a dot, then it is assumed to be a fullname, and any
 	  namespace also specified is ignored.  For example,
 	  use <code>"name": "org.foo.X"</code> to indicate the
 	  fullname <code>org.foo.X</code>.</li>
 	  <li>A name only is specified, i.e., a name that contains no
 	  dots.  In this case the namespace is taken from the most
 	  tightly enclosing schema or protocol.  For example,
 	  if <code>"name": "X"</code> is specified, and this occurs
 	  within a field of the record definition
 	  of <code>org.foo.Y</code>, then the fullname
 	  is <code>org.foo.X</code>.</li>
 	</ul>
 	<p>References to previously defined names are as in the latter
 	two cases above: if they contain a dot they are a fullname, if
 	they do not contain a dot, the namespace is the namespace of
 	the enclosing definition.</p>
 	<p>Primitive type names have no namespace and their names may
 	not be defined in any namespace.  A schema may only contain
 	multiple definitions of a fullname if the definitions are
 	equivalent.</p>
       </section>

       <section>
 	<title>Aliases</title>
 	<p>Named types and fields may have aliases.  An implementation
         may optionally use aliases to map a writer's schema to the
         reader's.  This faciliates both schema evolution as well as
         processing disparate datasets.</p>
 	<p>Aliases function by re-writing the writer's schema using
         aliases from the reader's schema.  For example, if the
         writer's schema was named "Foo" and the reader's schema is
         named "Bar" and has an alias of "Foo", then the implementation
         would act as though "Foo" were named "Bar" when reading.
         Similarly, if data was written as a record with a field named
         "x" and is read as a record with a field named "y" with alias
         "x", then the implementation would act as though "x" were
         named "y" when reading.</p>
 	<p>A type alias may be specified either as a fully
         namespace-qualified, or relative to the namespace of the name
         it is an alias for.  For example, if a type named "a.b" has
         aliases of "c" and "x.y", then the fully qualified names of
         its aliases are "a.c" and "x.y".</p>
       </section>

     </section> <!-- end schemas -->

     <section>
       <title>Data Serialization</title>

       <p>Avro data is always serialized with its schema.  Files that
 	store Avro data should always also include the schema for that
 	data in the same file.  Avro-based remote procedure call (RPC)
 	systems must also guarantee that remote recipients of data
 	have a copy of the schema used to write that data.</p>

       <p>Because the schema used to write data is always available
 	when the data is read, Avro data itself is not tagged with
 	type information.  The schema is required to parse data.</p>

       <p>In general, both serialization and deserialization proceed as
       a depth-first, left-to-right traversal of the schema,
       serializing primitive types as they are encountered.</p>

       <section>
 	<title>Encodings</title>
 	<p>Avro specifies two serialization encodings: binary and
 	  JSON.  Most applications will use the binary encoding, as it
 	  is smaller and faster.  But, for debugging and web-based
 	  applications, the JSON encoding may sometimes be
 	  appropriate.</p>
       </section>

       <section id="binary_encoding">
         <title>Binary Encoding</title>

 	<section id="binary_encode_primitive">
           <title>Primitive Types</title>
           <p>Primitive types are encoded in binary as follows:</p>
           <ul>
             <li><code>null</code> is written as zero bytes.</li>
             <li>a <code>boolean</code> is written as a single byte whose
               value is either <code>0</code> (false) or <code>1</code>
               (true).</li>
             <li><code>int</code> and <code>long</code> values are written
               using <a href="ext:vint">variable-length</a>
 	      <a href="ext:zigzag">zig-zag</a> coding.  Some examples:
 	      <table class="right">
 		<tr><th>value</th><th>hex</th></tr>
 		<tr><td><code> 0</code></td><td><code>00</code></td></tr>
 		<tr><td><code>-1</code></td><td><code>01</code></td></tr>
 		<tr><td><code> 1</code></td><td><code>02</code></td></tr>
 		<tr><td><code>-2</code></td><td><code>03</code></td></tr>
 		<tr><td><code> 2</code></td><td><code>04</code></td></tr>
 		<tr><td colspan="2"><code>...</code></td></tr>
 		<tr><td><code>-64</code></td><td><code>7f</code></td></tr>
 		<tr><td><code> 64</code></td><td><code>&nbsp;80 01</code></td></tr>
 		<tr><td colspan="2"><code>...</code></td></tr>
 	      </table>
 	    </li>
             <li>a <code>float</code> is written as 4 bytes. The float is
               converted into a 32-bit integer using a method equivalent
               to <a href="http://java.sun.com/javase/6/docs/api/java/lang/Float.html#floatToIntBits%28float%29">Java's floatToIntBits</a> and then encoded
               in little-endian format.</li>
             <li>a <code>double</code> is written as 8 bytes. The double
               is converted into a 64-bit integer using a method equivalent
               to <a href="http://java.sun.com/javase/6/docs/api/java/lang/Double.html#doubleToLongBits%28double%29">Java's
 		doubleToLongBits</a> and then encoded in little-endian
               format.</li>
             <li><code>bytes</code> are encoded as
               a <code>long</code> followed by that many bytes of data.
             </li>
             <li>a <code>string</code> is encoded as
               a <code>long</code> followed by that many bytes of UTF-8
               encoded character data.
               <p>For example, the three-character string "foo" would
               be encoded as the long value 3 (encoded as
               hex <code>06</code>) followed by the UTF-8 encoding of
               'f', 'o', and 'o' (the hex bytes <code>66 6f
               6f</code>):
               </p>
               <source>06 66 6f 6f</source>
             </li>
           </ul>

 	</section>


 	<section id="binary_encode_complex">
           <title>Complex Types</title>
           <p>Complex types are encoded in binary as follows:</p>

           <section>
             <title>Records</title>
 	    <p>A record is encoded by encoding the values of its
 	      fields in the order that they are declared.  In other
 	      words, a record is encoded as just the concatenation of
 	      the encodings of its fields.  Field values are encoded per
 	      their schema.</p>
 	    <p>For example, the record schema</p>
 	    <source>
 	      {
 	      "type": "record",
 	      "name": "test",
 	      "fields" : [
 	      {"name": "a", "type": "long"},
 	      {"name": "b", "type": "string"}
 	      ]
 	      }
 	    </source>
 	    <p>An instance of this record whose <code>a</code> field has
 	      value 27 (encoded as hex <code>36</code>) and
 	      whose <code>b</code> field has value "foo" (encoded as hex
 	      bytes <code>06 66 6f 6f</code>), would be encoded simply
 	      as the concatenation of these, namely the hex byte
 	      sequence:</p>
 	    <source>36 06 66 6f 6f</source>
 	  </section>

           <section>
             <title>Enums</title>
             <p>An enum is encoded by a <code>int</code>, representing
               the zero-based position of the symbol in the schema.</p>
 	    <p>For example, consider the enum:</p>
 	    <source>
 	      {"type": "enum", "name": "Foo", "symbols": ["A", "B", "C", "D"] }
 	    </source>
 	    <p>This would be encoded by an <code>int</code> between
 	      zero and three, with zero indicating "A", and 3 indicating
 	      "D".</p>
 	  </section>


           <section>
             <title>Arrays</title>
             <p>Arrays are encoded as a series of <em>blocks</em>.
               Each block consists of a <code>long</code> <em>count</em>
               value, followed by that many array items.  A block with
               count zero indicates the end of the array.  Each item is
               encoded per the array's item schema.</p>

             <p>If a block's count is negative, its absolute value is used,
               and the count is followed immediately by a <code>long</code>
               block <em>size</em> indicating the number of bytes in the
               block.  This block size permits fast skipping through data,
               e.g., when projecting a record to a subset of its fields.</p>

             <p>For example, the array schema</p>
             <source>{"type": "array", "items": "long"}</source>
             <p>an array containing the items 3 and 27 could be encoded
               as the long value 2 (encoded as hex 04) followed by long
               values 3 and 27 (encoded as hex <code>06 36</code>)
               terminated by zero:</p>
             <source>04 06 36 00</source>

             <p>The blocked representation permits one to read and write
               arrays larger than can be buffered in memory, since one can
               start writing items without knowing the full length of the
               array.</p>

           </section>

 	  <section>
             <title>Maps</title>
             <p>Maps are encoded as a series of <em>blocks</em>.  Each
               block consists of a <code>long</code> <em>count</em>
               value, followed by that many key/value pairs.  A block
               with count zero indicates the end of the map.  Each item
               is encoded per the map's value schema.</p>

             <p>If a block's count is negative, its absolute value is used,
               and the count is followed immediately by a <code>long</code>
               block <em>size</em> indicating the number of bytes in the
               block.  This block size permits fast skipping through data,
               e.g., when projecting a record to a subset of its fields.</p>

             <p>The blocked representation permits one to read and write
               maps larger than can be buffered in memory, since one can
               start writing items without knowing the full length of the
               map.</p>

 	  </section>

           <section>
             <title>Unions</title>
             <p>A union is encoded by first writing a <code>long</code>
               value indicating the zero-based position within the
               union of the schema of its value.  The value is then
               encoded per the indicated schema within the union.</p>
             <p>For example, the union
               schema <code>["string","null"]</code> would encode:</p>
             <ul>
               <li><code>null</code> as the integer 1 (the index of
                 "null" in the union, encoded as
                 hex <code>02</code>): <source>02</source></li>
               <li>the string <code>"a"</code> as zero (the index of
                 "string" in the union), followed by the serialized string:
                 <source>00 02 61</source></li>
             </ul>
           </section>

           <section>
             <title>Fixed</title>
             <p>Fixed instances are encoded using the number of bytes
               declared in the schema.</p>
           </section>

         </section> <!-- end complex types -->

       </section>

       <section id="json_encoding">
         <title>JSON Encoding</title>

         <p>Except for unions, the JSON encoding is the same as is used
         to encode <a href="#schema_record">field default
         values</a>.</p>

         <p>The value of a union is encoded in JSON as follows:</p>

         <ul>
           <li>if its type is <code>null</code>, then it is encoded as
           a JSON null;</li>
           <li>otherwise it is encoded as a JSON object with one
           name/value pair whose name is the type's name and whose
           value is the recursively encoded value.  For Avro's named
           types (record, fixed or enum) the user-specified name is
           used, for other types the type name is used.</li>
         </ul>

         <p>For example, the union
           schema <code>["null","string","Foo"]</code>, where Foo is a
           record name, would encode:</p>
         <ul>
           <li><code>null</code> as <code>null</code>;</li>
           <li>the string <code>"a"</code> as
             <code>{"string": "a"}</code>; and</li>
           <li>a Foo instance as <code>{"Foo": {...}}</code>,
           where <code>{...}</code> indicates the JSON encoding of a
           Foo instance.</li>
         </ul>

         <p>Note that a schema is still required to correctly process
         JSON-encoded data.  For example, the JSON encoding does not
         distinguish between <code>int</code>
         and <code>long</code>, <code>float</code>
         and <code>double</code>, records and maps, enums and strings,
         etc.</p>

       </section>

     </section>

     <section id="order">
       <title>Sort Order</title>

       <p>Avro defines a standard sort order for data.  This permits
         data written by one system to be efficiently sorted by another
         system.  This can be an important optimization, as sort order
         comparisons are sometimes the most frequent per-object
         operation.  Note also that Avro binary-encoded data can be
         efficiently ordered without deserializing it to objects.</p>

       <p>Data items may only be compared if they have identical
         schemas.  Pairwise comparisons are implemented recursively
         with a depth-first, left-to-right traversal of the schema.
         The first mismatch encountered determines the order of the
         items.</p>

       <p>Two items with the same schema are compared according to the
         following rules.</p>
       <ul>
         <li><code>null</code> data is always equal.</li>
         <li><code>boolean</code> data is ordered with false before true.</li>
         <li><code>int</code>, <code>long</code>, <code>float</code>
           and <code>double</code> data is ordered by ascending numeric
           value.</li>
         <li><code>bytes</code> and <code>fixed</code> data are
           compared lexicographically by unsigned 8-bit values.</li>
         <li><code>string</code> data is compared lexicographically by
           Unicode code point.  Note that since UTF-8 is used as the
           binary encoding for strings, sorting of bytes and string
           binary data is identical.</li>
         <li><code>array</code> data is compared lexicographically by
           element.</li>
         <li><code>enum</code> data is ordered by the symbol's position
           in the enum schema.  For example, an enum whose symbols are
           <code>["z", "a"]</code> would sort <code>"z"</code> values
           before <code>"a"</code> values.</li>
         <li><code>union</code> data is first ordered by the branch
           within the union, and, within that, by the type of the
           branch.  For example, an <code>["int", "string"]</code>
           union would order all int values before all string values,
           with the ints and strings themselves ordered as defined
           above.</li>
         <li><code>record</code> data is ordered lexicographically by
           field.  If a field specifies that its order is:
           <ul>
             <li><code>"ascending"</code>, then the order of its values
               is unaltered.</li>
             <li><code>"descending"</code>, then the order of its values
               is reversed.</li>
             <li><code>"ignore"</code>, then its values are ignored
               when sorting.</li>
           </ul>
         </li>
         <li><code>map</code> data may not be compared.  It is an error
           to attempt to compare data containing maps unless those maps
           are in an <code>"order":"ignore"</code> record field.
         </li>
       </ul>
     </section>

     <section>
       <title>Object Container Files</title>
       <p>Avro includes a simple object container file format.  A file
       has a schema, and all objects stored in the file must be written
       according to that schema, using binary encoding.  Objects are
       stored in blocks that may be compressed.  Syncronization markers
       are used between blocks to permit efficient splitting of files
       for MapReduce processing.</p>

       <p>Files may include arbitrary user-specified metadata.</p>

       <p>A file consists of:</p>
       <ul>
         <li>A <em>file header</em>, followed by</li>
         <li>one or more <em>file data blocks</em>.</li>
       </ul>

       <p>A file header consists of:</p>
       <ul>
         <li>Four bytes, ASCII 'O', 'b', 'j', followed by 1.</li>
         <li><em>file metadata</em>, including the schema.</li>
         <li>The 16-byte, randomly-generated sync marker for this file.</li>
       </ul>

       <p>File metadata consists of:</p>
       <ul>
         <li>A long indicating the number of metadata key/value pairs.</li>
         <li>For each pair, a string key and bytes value.</li>
       </ul>

       <p>All metadata properties that start with "avro." are reserved.
       The following file metadata properties are currently used:</p>
       <ul>
         <li><strong>avro.schema</strong> contains the schema of objects
         stored in the file, as JSON data (required).</li>
         <li><strong>avro.codec</strong> the name of the compression codec
         used to compress blocks, as a string.  Implementations
         are required to support the following codecs: "null" and "deflate".
         If codec is absent, it is assumed to be "null".  The codecs
         are described with more detail below.</li>
       </ul>

       <p>A file header is thus described by the following schema:</p>
       <source>
 {"type": "record", "name": "org.apache.avro.file.Header",
  "fields" : [
    {"name": "magic", "type": {"type": "fixed", "name": "Magic", "size": 4}},
    {"name": "meta", "type": {"type": "map", "values": "bytes"}},
    {"name": "sync", "type": {"type": "fixed", "name": "Sync", "size": 16}},
   ]
 }
       </source>

       <p>A file data block consists of:</p>
       <ul>
         <li>A long indicating the count of objects in this block.</li>
         <li>A long indicating the size in bytes of the serialized objects
         in the current block, after any codec is applied</li>
         <li>The serialized objects.  If a codec is specified, this is
         compressed by that codec.</li>
         <li>The file's 16-byte sync marker.</li>
       </ul>
           <p>Thus, each block's binary data can be efficiently extracted or skipped without
           deserializing the contents.  The combination of block size, object counts, and
           sync markers enable detection of corrupt blocks and help ensure data integrity.</p>
       <section>
       <title>Required Codecs</title>
         <section>
         <title>null</title>
         <p>The "null" codec simply passes through data uncompressed.</p>
         </section>

         <section>
         <title>deflate</title>
         <p>The "deflate" codec writes the data block using the
         deflate algorithm as specified in
         <a href="http://www.isi.edu/in-notes/rfc1951.txt">RFC 1951</a>,
         and typically implemented using the zlib library.  Note that this
         format (unlike the "zlib format" in RFC 1950) does not have a
         checksum.
         </p>
         </section>
       </section>

     </section>

     <section>
       <title>Protocol Declaration</title>
       <p>Avro protocols describe RPC interfaces.  Like schemas, they are
       defined with JSON text.</p>

       <p>A protocol is a JSON object with the following attributes:</p>
       <ul>
         <li><em>protocol</em>, a string, the name of the protocol
         (required);</li>
         <li><em>namespace</em>, an optional string that qualifies the name;</li>
         <li><em>doc</em>, an optional string describing this protocol;</li>
         <li><em>types</em>, an optional list of definitions of named types
           (records, enums, fixed and errors).  An error definition is
           just like a record definition except it uses "error" instead
           of "record".  Note that forward references to named types
           are not permitted.</li>
         <li><em>messages</em>, an optional JSON object whose keys are
           message names and whose values are objects whose attributes
           are described below.  No two messages may have the same
           name.</li>
       </ul>
       <p>The name and namespace qualification rules defined for schema objects
 	apply to protocols as well.</p>

       <section>
         <title>Messages</title>
         <p>A message has attributes:</p>
         <ul>
           <li>a <em>doc</em>, an optional description of the message,</li>
           <li>a <em>request</em>, a list of named,
             typed <em>parameter</em> schemas (this has the same form
             as the fields of a record declaration);</li>
           <li>a <em>response</em> schema; </li>
           <li>an optional union of declared <em>error</em> schemas.
 	    The <em>effective</em> union has <code>"string"</code>
 	    prepended to the declared union, to permit transmission of
 	    undeclared "system" errors.  For example, if the declared
 	    error union is <code>["AccessError"]</code>, then the
 	    effective union is <code>["string", "AccessError"]</code>.
 	    When no errors are declared, the effective error union
 	    is <code>["string"]</code>.  Errors are serialized using
 	    the effective union; however, a protocol's JSON
 	    declaration contains only the declared union.
 	  </li>
           <li>an optional <em>one-way</em> boolean parameter.</li>
         </ul>
         <p>A request parameter list is processed equivalently to an
           anonymous record.  Since record field lists may vary between
           reader and writer, request parameters may also differ
           between the caller and responder, and such differences are
           resolved in the same manner as record field differences.</p>
 	<p>The one-way parameter may only be true when the response type
 	  is <code>"null"</code> and no errors are listed.</p>
       </section>
       <section>
         <title>Sample Protocol</title>
         <p>For example, one may define a simple HelloWorld protocol with:</p>
         <source>
 {
   "namespace": "com.acme",
   "protocol": "HelloWorld",
   "doc": "Protocol Greetings",

   "types": [
     {"name": "Greeting", "type": "record", "fields": [
       {"name": "message", "type": "string"}]},
     {"name": "Curse", "type": "error", "fields": [
       {"name": "message", "type": "string"}]}
   ],

   "messages": {
     "hello": {
       "doc": "Say hello.",
       "request": [{"name": "greeting", "type": "Greeting" }],
       "response": "Greeting",
       "errors": ["Curse"]
     }
   }
 }
         </source>
       </section>
     </section>

     <section>
       <title>Protocol Wire Format</title>

       <section>
         <title>Message Transport</title>
         <p>Messages may be transmitted via
         different <em>transport</em> mechanisms.</p>

         <p>To the transport, a <em>message</em> is an opaque byte sequence.</p>

         <p>A transport is a system that supports:</p>
         <ul>
           <li><strong>transmission of request messages</strong>
           </li>
           <li><strong>receipt of corresponding response messages</strong>
             <p>Servers may send a response message back to the client
             corresponding to a request message.  The mechanism of
             correspondance is transport-specific.  For example, in
             HTTP it is implicit, since HTTP directly supports requests
             and responses.  But a transport that multiplexes many
             client threads over a single socket would need to tag
             messages with unique identifiers.</p>
           </li>
         </ul>

 	<p>Transports may be either <em>stateless</em>
         or <em>stateful</em>.  In a stateless transport, messaging
         assumes no established connection state, while stateful
         transports establish connections that may be used for multiple
         messages.  This distinction is discussed further in
         the <a href="#handshake">handshake</a> section below.</p>

         <section>
           <title>HTTP as Transport</title>
           <p>When
             <a href="http://www.w3.org/Protocols/rfc2616/rfc2616.html">HTTP</a>
             is used as a transport, each Avro message exchange is an
             HTTP request/response pair.  All messages of an Avro
             protocol should share a single URL at an HTTP server.
             Other protocols may also use that URL.  Both normal and
             error Avro response messages should use the 200 (OK)
             response code.  The chunked encoding may be used for
             requests and responses, but, regardless the Avro request
             and response are the entire content of an HTTP request and
             response.  The HTTP Content-Type of requests and responses
             should be specified as "avro/binary".  Requests should be
             made using the POST method.</p>
 	  <p>HTTP is used by Avro as a stateless transport.</p>
         </section>
       </section>

       <section>
         <title>Message Framing</title>
         <p>Avro messages are <em>framed</em> as a list of buffers.</p>
         <p>Framing is a layer between messages and the transport.
         It exists to optimize certain operations.</p>

         <p>The format of framed message data is:</p>
         <ul>
           <li>a series of <em>buffers</em>, where each buffer consists of:
             <ul>
               <li>a four-byte, big-endian <em>buffer length</em>, followed by</li>
               <li>that many bytes of <em>buffer data</em>.</li>
             </ul>
           </li>
           <li>A message is always terminated by a zero-lenghted buffer.</li>
         </ul>

         <p>Framing is transparent to request and response message
         formats (described below).  Any message may be presented as a
         single or multiple buffers.</p>

         <p>Framing can permit readers to more efficiently get
         different buffers from different sources and for writers to
         more efficiently store different buffers to different
         destinations.  In particular, it can reduce the number of
         times large binary objects are copied.  For example, if an RPC
         parameter consists of a megabyte of file data, that data can
         be copied directly to a socket from a file descriptor, and, on
         the other end, it could be written directly to a file
         descriptor, never entering user space.</p>

         <p>A simple, recommended, framing policy is for writers to
         create a new segment whenever a single binary object is
         written that is larger than a normal output buffer.  Small
         objects are then appended in buffers, while larger objects are
         written as their own buffers.  When a reader then tries to
         read a large object the runtime can hand it an entire buffer
         directly, without having to copy it.</p>
       </section>

       <section id="handshake">
         <title>Handshake</title>

 	<p>The purpose of the handshake is to ensure that the client
         and the server have each other's protocol definition, so that
         the client can correctly deserialize responses, and the server
         can correctly deserialize requests.  Both clients and servers
         should maintain a cache of recently seen protocols, so that,
         in most cases, a handshake will be completed without extra
         round-trip network exchanges or the transmission of full
         protocol text.</p>

         <p>RPC requests and responses may not be processed until a
         handshake has been completed.  With a stateless transport, all
         requests and responses are prefixed by handshakes.  With a
         stateful transport, handshakes are only attached to requests
         and responses until a successful handshake response has been
         returned over a connection.  After this, request and response
         payloads are sent without handshakes for the lifetime of that
         connection.</p>

         <p>The handshake process uses the following record schemas:</p>

         <source>
 {
   "type": "record",
   "name": "HandshakeRequest", "namespace":"org.apache.avro.ipc",
   "fields": [
     {"name": "clientHash",
      "type": {"type": "fixed", "name": "MD5", "size": 16}},
     {"name": "clientProtocol", "type": ["null", "string"]},
     {"name": "serverHash", "type": "MD5"},
     {"name": "meta", "type": ["null", {"type": "map", "values": "bytes"}]}
   ]
 }
 {
   "type": "record",
   "name": "HandshakeResponse", "namespace": "org.apache.avro.ipc",
   "fields": [
     {"name": "match",
      "type": {"type": "enum", "name": "HandshakeMatch",
               "symbols": ["BOTH", "CLIENT", "NONE"]}},
     {"name": "serverProtocol",
      "type": ["null", "string"]},
     {"name": "serverHash",
      "type": ["null", {"type": "fixed", "name": "MD5", "size": 16}]},
     {"name": "meta",
      "type": ["null", {"type": "map", "values": "bytes"}]}
   ]
 }
         </source>

         <ul>
           <li>A client first prefixes each request with
           a <code>HandshakeRequest</code> containing just the hash of
           its protocol and of the server's protocol
           (<code>clientHash!=null, clientProtocol=null,
           serverHash!=null</code>), where the hashes are 128-bit MD5
           hashes of the JSON protocol text. If a client has never
           connected to a given server, it sends its hash as a guess of
           the server's hash, otherwise it sends the hash that it
           previously obtained from this server.</li>

           <li>The server responds with
           a <code>HandshakeResponse</code> containing one of:
             <ul>
               <li><code>match=BOTH, serverProtocol=null,
               serverHash=null</code> if the client sent the valid hash
               of the server's protocol and the server knows what
               protocol corresponds to the client's hash. In this case,
               the request is complete and the response data
               immediately follows the HandshakeResponse.</li>

               <li><code>match=CLIENT, serverProtocol!=null,
               serverHash!=null</code> if the server has previously
               seen the client's protocol, but the client sent an
               incorrect hash of the server's protocol. The request is
               complete and the response data immediately follows the
               HandshakeResponse. The client must use the returned
               protocol to process the response and should also cache
               that protocol and its hash for future interactions with
               this server.</li>

               <li><code>match=NONE</code> if the server has not
               previously seen the client's protocol.
               The <code>serverHash</code>
               and <code>serverProtocol</code> may also be non-null if
               the server's protocol hash was incorrect.

               <p>In this case the client must then re-submit its request
               with its protocol text (<code>clientHash!=null,
               clientProtocol!=null, serverHash!=null</code>) and the
               server should respond with a successful match
               (<code>match=BOTH, serverProtocol=null,
               serverHash=null</code>) as above.</p>
               </li>
             </ul>
           </li>
         </ul>

         <p>The <code>meta</code> field is reserved for future
         handshake enhancements.</p>

       </section>

       <section>
         <title>Call Format</title>
         <p>A <em>call</em> consists of a request message paired with
         its resulting response or error message.  Requests and
         responses contain extensible metadata, and both kinds of
         messages are framed as described above.</p>

         <p>The format of a call request is:</p>
         <ul>
           <li><em>request metadata</em>, a map with values of
           type <code>bytes</code></li>
           <li>the <em>message name</em>, an Avro string,
           followed by</li>
           <li>the message <em>parameters</em>.  Parameters are
           serialized according to the message's request
           declaration.</li>
         </ul>


         <p>When a message is declared one-way and a stateful
         connection has been established by a successful handshake
         response, no response data is sent.  Otherwise the format of
         the call response is:</p>
         <ul>
           <li><em>response metadata</em>, a map with values of
           type <code>bytes</code></li>
           <li>a one-byte <em>error flag</em> boolean, followed by either:
             <ul>
               <li>if the error flag is false, the message <em>response</em>,
                 serialized per the message's response schema.</li>
               <li>if the error flag is true, the <em>error</em>,
               serialized per the message's effective error union
               schema.</li>
             </ul>
           </li>
         </ul>
       </section>

     </section>

     <section>
       <title>Schema Resolution</title>

       <p>A reader of Avro data, whether from an RPC or a file, can
         always parse that data because its schema is provided.  But
         that schema may not be exactly the schema that was expected.
         For example, if the data was written with a different version
         of the software than it is read, then records may have had
         fields added or removed.  This section specifies how such
         schema differences should be resolved.</p>

       <p>We call the schema used to write the data as
         the <em>writer's</em> schema, and the schema that the
         application expects the <em>reader's</em> schema.  Differences
         between these should be resolved as follows:</p>

       <ul>
         <li><p>It is an error if the two schemas do not <em>match</em>.</p>
           <p>To match, one of the following must hold:</p>
           <ul>
             <li>both schemas are arrays whose item types match</li>
             <li>both schemas are maps whose value types match</li>
             <li>both schemas are enums whose names match</li>
             <li>both schemas are fixed whose sizes and names match</li>
             <li>both schemas are records with the same name</li>
             <li>either schema is a union</li>
             <li>both schemas have same primitive type</li>
             <li>the writer's schema may be <em>promoted</em> to the
               reader's as follows:
               <ul>
                 <li>int is promotable to long, float, or double</li>
                 <li>long is promotable to float or double</li>
                 <li>float is promotable to double</li>
                 </ul>
             </li>
           </ul>
         </li>

         <li><strong>if both are records:</strong>
           <ul>
             <li>the ordering of fields may be different: fields are
               matched by name.</li>

             <li>schemas for fields with the same name in both records
               are resolved recursively.</li>

             <li>if the writer's record contains a field with a name
               not present in the reader's record, the writer's value
               for that field is ignored.</li>

             <li>if the reader's record schema has a field that
               contains a default value, and writer's schema does not
               have a field with the same name, then the reader should
               use the default value from its field.</li>

             <li>if the reader's record schema has a field with no
               default value, and writer's schema does not have a field
               with the same name, an error is signalled.</li>
           </ul>
         </li>

         <li><strong>if both are enums:</strong>
           <p>if the writer's symbol is not present in the reader's
             enum, then an error is signalled.</p>
         </li>

         <li><strong>if both are arrays:</strong>
           <p>This resolution algorithm is applied recursively to the reader's and
             writer's array item schemas.</p>
         </li>

         <li><strong>if both are maps:</strong>
           <p>This resolution algorithm is applied recursively to the reader's and
             writer's value schemas.</p>
         </li>

         <li><strong>if both are unions:</strong>
           <p>The first schema in the reader's union that matches the
             selected writer's union schema is recursively resolved
             against it.  if none match, an error is signalled.</p>
         </li>

         <li><strong>if reader's is a union, but writer's is not</strong>
           <p>The first schema in the reader's union that matches the
             writer's schema is recursively resolved against it.  If none
             match, an error is signalled.</p>
         </li>

         <li><strong>if writer's is a union, but reader's is not</strong>
           <p>If the reader's schema matches the selected writer's schema,
             it is recursively resolved against it.  If they do not
             match, an error is signalled.</p>
         </li>

       </ul>

       <p>A schema's "doc" fields are ignored for the purposes of schema resolution.  Hence,
         the "doc" portion of a schema may be dropped at serialization.</p>

     </section>

   <p><em>Apache Avro, Avro, Apache, and the Avro and Apache logos are
    trademarks of The Apache Software Foundation.</em></p>

   </body>
 </document>