
 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
 <html>
 <head>
 <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
 <meta content="Apache Forrest" name="Generator">
 <meta name="Forrest-version" content="0.9">
 <meta name="Forrest-skin-name" content="pelt">
 <title>Apache Avro&trade; 1.10.1 Getting Started (Java)</title>
 <link type="text/css" href="skin/basic.css" rel="stylesheet">
 <link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
 <link media="print" type="text/css" href="skin/print.css" rel="stylesheet">
 <link type="text/css" href="skin/profile.css" rel="stylesheet">
 <script src="skin/getBlank.js" language="javascript" type="text/javascript"></script><script src="skin/getMenu.js" language="javascript" type="text/javascript"></script><script src="skin/fontsize.js" language="javascript" type="text/javascript"></script>
 <link rel="shortcut icon" href="images/favicon.ico">
 </head>
 <body onload="init()">
 <script type="text/javascript">ndeSetTextSize();</script>
 <div id="top">
 <!--+
     |breadtrail
     +-->
 <div class="breadtrail">
 <a href="https://www.apache.org/">Apache</a> &gt; <a href="https://avro.apache.org/">Avro</a> &gt; <a href="https://avro.apache.org/">Avro</a><script src="skin/breadcrumbs.js" language="JavaScript" type="text/javascript"></script>
 </div>
 <!--+
     |header
     +-->
 <div class="header">
 <!--+
     |start group logo
     +-->
 <div class="grouplogo">
 <a href="https://www.apache.org/"><img class="logoImage" alt="Apache" src="images/apache_feather.gif" title="The Apache Software Foundation"></a>
 </div>
 <!--+
     |end group logo
     +-->
 <!--+
     |start Project Logo
     +-->
 <div class="projectlogo">
 <a href="https://avro.apache.org/"><img class="logoImage" alt="Avro" src="images/avro-logo.png" title="Serialization System"></a>
 </div>
 <!--+
     |end Project Logo
     +-->
 <!--+
     |start Search
     +-->
 <div class="searchbox">
 <form action="http://www.google.com/search" method="get" class="roundtopsmall">
 <input value="avro.apache.org" name="sitesearch" type="hidden"><input onFocus="getBlank (this, 'Search the site with google');" size="25" name="q" id="query" type="text" value="Search the site with google">&nbsp;
                     <input name="Search" value="Search" type="submit">
 </form>
 </div>
 <!--+
     |end search
     +-->
 <!--+
     |start Tabs
     +-->
 <ul id="tabs">
 <li>
 <a class="unselected" href="https://avro.apache.org/">Project</a>
 </li>
 <li>
 <a class="unselected" href="https://cwiki.apache.org/confluence/display/AVRO/Index">Wiki</a>
 </li>
 <li class="current">
 <a class="selected" href="index.html">Avro 1.10.1 Documentation</a>
 </li>
 </ul>
 <!--+
     |end Tabs
     +-->
 </div>
 </div>
 <div id="main">
 <div id="publishedStrip">
 <!--+
     |start Subtabs
     +-->
 <div id="level2tabs"></div>
 <!--+
     |end Endtabs
     +-->
 <script type="text/javascript"><!--
 document.write("Last Published: " + document.lastModified);
 //  --></script>
 </div>
 <!--+
     |breadtrail
     +-->
 <div class="breadtrail">

              &nbsp;
            </div>
 <!--+
     |start Menu, mainarea
     +-->
 <!--+
     |start Menu
     +-->
 <div id="menu">
 <div onclick="SwitchMenu('menu_selected_1.1', 'skin/')" id="menu_selected_1.1Title" class="menutitle" style="background-image: url('skin/images/chapter_open.gif');">Documentation</div>
 <div id="menu_selected_1.1" class="selectedmenuitemgroup" style="display: block;">
 <div class="menuitem">
 <a href="index.html">Overview</a>
 </div>
 <div class="menupage">
 <div class="menupagetitle">Getting started (Java)</div>
 </div>
 <div class="menuitem">
 <a href="gettingstartedpython.html">Getting started (Python)</a>
 </div>
 <div class="menuitem">
 <a href="spec.html">Specification</a>
 </div>
 <div class="menuitem">
 <a href="trevni/spec.html">Trevni</a>
 </div>
 <div class="menuitem">
 <a href="api/java/index.html">Java API</a>
 </div>
 <div class="menuitem">
 <a href="api/c/index.html">C API</a>
 </div>
 <div class="menuitem">
 <a href="api/cpp/html/index.html">C++ API</a>
 </div>
 <div class="menuitem">
 <a href="api/csharp/html/index.html">C# API</a>
 </div>
 <div class="menuitem">
 <a href="mr.html">MapReduce guide</a>
 </div>
 <div class="menuitem">
 <a href="idl.html">IDL language</a>
 </div>
 <div class="menuitem">
 <a href="sasl.html">SASL profile</a>
 </div>
 <div class="menuitem">
 <a href="https://cwiki.apache.org/confluence/display/AVRO/Index">Wiki</a>
 </div>
 <div class="menuitem">
 <a href="https://cwiki.apache.org/confluence/display/AVRO/FAQ">FAQ</a>
 </div>
 </div>
 <div id="credit"></div>
 <div id="roundbottom">
 <img style="display: none" class="corner" height="15" width="15" alt="" src="skin/images/rc-b-l-15-1body-2menu-3menu.png"></div>
 <!--+
   |alternative credits
   +-->
 <div id="credit2"></div>
 </div>
 <!--+
     |end Menu
     +-->
 <!--+
     |start content
     +-->
 <div id="content">
 <div title="Portable Document Format" class="pdflink">
 <a class="dida" href="gettingstartedjava.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif" class="skin"><br>
         PDF</a>
 </div>
 <h1>Apache Avro&trade; 1.10.1 Getting Started (Java)</h1>
 <div id="front-matter">
 <div id="minitoc-area">
 <ul class="minitoc">
 <li>
 <a href="#download_install">Download</a>
 </li>
 <li>
 <a href="#Defining+a+schema">Defining a schema</a>
 </li>
 <li>
 <a href="#Serializing+and+deserializing+with+code+generation">Serializing and deserializing with code generation</a>
 <ul class="minitoc">
 <li>
 <a href="#Compiling+the+schema">Compiling the schema</a>
 </li>
 <li>
 <a href="#Creating+Users">Creating Users</a>
 </li>
 <li>
 <a href="#Serializing">Serializing</a>
 </li>
 <li>
 <a href="#Deserializing">Deserializing</a>
 </li>
 <li>
 <a href="#Compiling+and+running+the+example+code">Compiling and running the example code</a>
 </li>
 <li>
 <a href="#Beta+feature%3A+Generating+faster+code">Beta feature: Generating faster code</a>
 </li>
 </ul>
 </li>
 <li>
 <a href="#Serializing+and+deserializing+without+code+generation">Serializing and deserializing without code generation</a>
 <ul class="minitoc">
 <li>
 <a href="#Creating+users">Creating users</a>
 </li>
 <li>
 <a href="#Serializing-N101F7">Serializing</a>
 </li>
 <li>
 <a href="#Deserializing-N10220">Deserializing</a>
 </li>
 <li>
 <a href="#Compiling+and+running+the+example+code-N10269">Compiling and running the example code</a>
 </li>
 </ul>
 </li>
 </ul>
 </div>
 </div>

 <p>
       This is a short guide for getting started with Apache Avro&trade; using
       Java.  This guide only covers using Avro for data serialization; see
       Patrick Hunt's <a href="https://github.com/phunt/avro-rpc-quickstart">Avro
       RPC Quick Start</a> for a good introduction to using Avro for RPC.
     </p>

 <a name="download_install"></a>
 <h2 class="h3">Download</h2>
 <div class="section">
 <p>
         Avro implementations for C, C++, C#, Java, PHP, Python, and Ruby can be
         downloaded from the <a href="https://avro.apache.org/releases.html">Apache Avro&trade;
         Releases</a> page.  This guide uses Avro 1.10.1, the latest
         version at the time of writing.  For the examples in this guide,
         download <em>avro-1.10.1.jar</em> and
         <em>avro-tools-1.10.1.jar</em>.
       </p>
 <p>
         Alternatively, if you are using Maven, add the following dependency to
         your POM:
       </p>
 <pre class="code">
 &lt;dependency&gt;
   &lt;groupId&gt;org.apache.avro&lt;/groupId&gt;
   &lt;artifactId&gt;avro&lt;/artifactId&gt;
   &lt;version&gt;1.10.1&lt;/version&gt;
 &lt;/dependency&gt;
       </pre>
 <p>
         As well as the Avro Maven plugin (for performing code generation):
       </p>
 <pre class="code">
 &lt;plugin&gt;
   &lt;groupId&gt;org.apache.avro&lt;/groupId&gt;
   &lt;artifactId&gt;avro-maven-plugin&lt;/artifactId&gt;
   &lt;version&gt;1.10.1&lt;/version&gt;
   &lt;executions&gt;
     &lt;execution&gt;
       &lt;phase&gt;generate-sources&lt;/phase&gt;
       &lt;goals&gt;
         &lt;goal&gt;schema&lt;/goal&gt;
       &lt;/goals&gt;
       &lt;configuration&gt;
         &lt;sourceDirectory&gt;${project.basedir}/src/main/avro/&lt;/sourceDirectory&gt;
         &lt;outputDirectory&gt;${project.basedir}/src/main/java/&lt;/outputDirectory&gt;
       &lt;/configuration&gt;
     &lt;/execution&gt;
   &lt;/executions&gt;
 &lt;/plugin&gt;
 &lt;plugin&gt;
   &lt;groupId&gt;org.apache.maven.plugins&lt;/groupId&gt;
   &lt;artifactId&gt;maven-compiler-plugin&lt;/artifactId&gt;
   &lt;configuration&gt;
     &lt;source&gt;1.8&lt;/source&gt;
     &lt;target&gt;1.8&lt;/target&gt;
   &lt;/configuration&gt;
 &lt;/plugin&gt;
       </pre>
 <p>
         You may also build the required Avro jars from source.  Building Avro is
         beyond the scope of this guide; see the <a href="https://cwiki.apache.org/AVRO/Build+Documentation">Build
         Documentation</a> page in the wiki for more information.
       </p>
 </div>


 <a name="Defining+a+schema"></a>
 <h2 class="h3">Defining a schema</h2>
 <div class="section">
 <p>
         Avro schemas are defined using JSON.  Schemas are composed of <a href="spec.html#schema_primitive">primitive types</a>
         (<span class="codefrag">null</span>, <span class="codefrag">boolean</span>, <span class="codefrag">int</span>,
         <span class="codefrag">long</span>, <span class="codefrag">float</span>, <span class="codefrag">double</span>,
         <span class="codefrag">bytes</span>, and <span class="codefrag">string</span>) and <a href="spec.html#schema_complex">complex types</a> (<span class="codefrag">record</span>,
         <span class="codefrag">enum</span>, <span class="codefrag">array</span>, <span class="codefrag">map</span>,
         <span class="codefrag">union</span>, and <span class="codefrag">fixed</span>).  You can learn more about
         Avro schemas and types from the specification, but for now let's start
         with a simple schema example, <em>user.avsc</em>:
       </p>
 <pre class="code">
 {"namespace": "example.avro",
  "type": "record",
  "name": "User",
  "fields": [
      {"name": "name", "type": "string"},
      {"name": "favorite_number",  "type": ["int", "null"]},
      {"name": "favorite_color", "type": ["string", "null"]}
  ]
 }
       </pre>
 <p>
         This schema defines a record representing a hypothetical user.  (Note
         that a schema file can only contain a single schema definition.)  At
         minimum, a record definition must include its type (<span class="codefrag">"type":
         "record"</span>), a name (<span class="codefrag">"name": "User"</span>), and fields, in
         this case <span class="codefrag">name</span>, <span class="codefrag">favorite_number</span>, and
         <span class="codefrag">favorite_color</span>.  We also define a namespace
         (<span class="codefrag">"namespace": "example.avro"</span>), which together with the name
         attribute defines the "full name" of the schema
         (<span class="codefrag">example.avro.User</span> in this case).

       </p>
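 <p>
         As a minimal sketch of how the full name is formed, the following
         stand-alone Java snippet joins a namespace and a name.  The
         <span class="codefrag">FullNameSketch</span> class and its
         <span class="codefrag">fullName</span> method are hypothetical helpers
         written for this illustration, not part of the Avro API, and the sketch
         only covers the simple case where the name itself contains no dots.
       </p>

```java
// Illustrative helper (not part of the Avro API): derives a schema's
// full name from its "namespace" and "name" attributes.
class FullNameSketch {

    static String fullName(String namespace, String name) {
        // With no namespace, the name stands on its own.
        if (namespace == null || namespace.isEmpty()) {
            return name;
        }
        // Otherwise the namespace and the name are joined with a dot.
        return namespace + "." + name;
    }

    public static void main(String[] args) {
        // Values taken from user.avsc above.
        System.out.println(fullName("example.avro", "User")); // example.avro.User
    }
}
```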
 <p>
         Fields are defined via an array of objects, each of which defines a name
         and type (other attributes are optional, see the <a href="spec.html#schema_record">record specification</a> for more
         details).  The type attribute of a field is another schema object, which
         can be either a primitive or complex type.  For example, the
         <span class="codefrag">name</span> field of our User schema is the primitive type
         <span class="codefrag">string</span>, whereas the <span class="codefrag">favorite_number</span> and
         <span class="codefrag">favorite_color</span> fields are both <span class="codefrag">union</span>s,
         represented by JSON arrays.  <span class="codefrag">union</span>s are a complex type that
         can be any of the types listed in the array; e.g.,
         <span class="codefrag">favorite_number</span> can either be an <span class="codefrag">int</span> or
         <span class="codefrag">null</span>, essentially making it an optional field.
       </p>
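 <p>
         The union rule can be pictured with a toy model.  The sketch below does
         not use the Avro library; <span class="codefrag">UnionSketch</span>,
         <span class="codefrag">typeNameOf</span>, and
         <span class="codefrag">accepts</span> are hypothetical names, and the
         mapping from Java values to Avro type names is an assumption that
         covers only the three types used in <em>user.avsc</em>.
       </p>

```java
// Toy model of the union rule described above; it does not use the Avro library.
class UnionSketch {

    // Maps a Java value to the Avro type name this sketch assumes for it.
    static String typeNameOf(Object value) {
        if (value == null) return "null";
        if (value instanceof Integer) return "int";
        if (value instanceof CharSequence) return "string";
        return "unsupported";
    }

    // A value fits a union when its type matches any branch of the union.
    static boolean accepts(String[] branches, Object value) {
        String typeName = typeNameOf(value);
        for (String branch : branches) {
            if (branch.equals(typeName)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // The type of favorite_number in user.avsc.
        String[] favoriteNumber = {"int", "null"};
        System.out.println(accepts(favoriteNumber, 256));   // true
        System.out.println(accepts(favoriteNumber, null));  // true
        System.out.println(accepts(favoriteNumber, "red")); // false
    }
}
```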
 </div>


 <a name="Serializing+and+deserializing+with+code+generation"></a>
 <h2 class="h3">Serializing and deserializing with code generation</h2>
 <div class="section">
 <a name="Compiling+the+schema"></a>
 <h3 class="h4">Compiling the schema</h3>
 <p>
           Code generation allows us to automatically create classes based on our
           previously-defined schema.  Once we have defined the relevant classes,
           there is no need to use the schema directly in our programs.  We use the
           avro-tools jar to generate code as follows:
         </p>
 <pre class="code">
 java -jar /path/to/avro-tools-1.10.1.jar compile schema &lt;schema file&gt; &lt;destination&gt;
         </pre>
 <p>
           This will generate the appropriate source files in a package based on
           the schema's namespace in the provided destination folder.  For
           instance, to generate a <span class="codefrag">User</span> class in package
           <span class="codefrag">example.avro</span> from the schema defined above, run
         </p>
 <pre class="code">
 java -jar /path/to/avro-tools-1.10.1.jar compile schema user.avsc .
         </pre>
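 <p>
           To make the namespace-to-package mapping concrete, the following
           sketch computes where the generated source file is expected to land
           under the destination folder.
           <span class="codefrag">GeneratedPathSketch</span> and
           <span class="codefrag">generatedSource</span> are hypothetical names
           used only for this illustration; the code does not invoke the schema
           compiler.
         </p>

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative helper (not part of avro-tools): predicts the location of
// the source file generated for a schema.
class GeneratedPathSketch {

    static Path generatedSource(String destination, String namespace, String name) {
        // The namespace becomes the Java package, so each dot in it
        // becomes one directory level below the destination folder.
        String packageDir = namespace.replace('.', '/');
        return Paths.get(destination, packageDir, name + ".java");
    }

    public static void main(String[] args) {
        // Matches the command above: user.avsc compiled into the current directory.
        System.out.println(generatedSource(".", "example.avro", "User"));
    }
}
```

 <p>
           For the command above, the path consists of the destination
           (<span class="codefrag">.</span>), the package directories
           <span class="codefrag">example</span> and
           <span class="codefrag">avro</span>, and the file
           <span class="codefrag">User.java</span>.
         </p>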
 <p>
           Note that if you are using the Avro Maven plugin, there is no need to
           manually invoke the schema compiler; the plugin automatically
           performs code generation on any .avsc files present in the configured
           source directory.
         </p>
 <a name="Creating+Users"></a>
 <h3 class="h4">Creating Users</h3>
 <p>
           Now that we've completed the code generation, let's create some
           <span class="codefrag">User</span>s, serialize them to a data file on disk, and then
           read back the file and deserialize the <span class="codefrag">User</span> objects.
         </p>
 <p>
           First let's create some <span class="codefrag">User</span>s and set their fields.
         </p>
 <pre class="code">
 User user1 = new User();
 user1.setName("Alyssa");
 user1.setFavoriteNumber(256);
 // Leave favorite color null

 // Alternate constructor
 User user2 = new User("Ben", 7, "red");

 // Construct via builder
 User user3 = User.newBuilder()
              .setName("Charlie")
              .setFavoriteColor("blue")
              .setFavoriteNumber(null)
              .build();
         </pre>
 <p>
           As shown in this example, Avro objects can be created either by
           invoking a constructor directly or by using a builder.  Unlike
           constructors, builders will automatically set any default values
           specified in the schema.  Additionally, builders validate the data as
           it is set, whereas objects constructed directly will not cause an error
           until the object is serialized.  However, using constructors directly
           generally offers better performance, as builders create a copy of the
           data structure before it is written.
         </p>
 <p>
           Note that we do not set <span class="codefrag">user1</span>'s favorite color. Since
           that field is of type <span class="codefrag">["string", "null"]</span>, we can either
           set it to a <span class="codefrag">string</span> or leave it <span class="codefrag">null</span>; it is
           essentially optional.  Similarly, we set <span class="codefrag">user3</span>'s favorite
           number to null (using a builder requires setting all fields, even if
           they are null).
         </p>
 <a name="Serializing"></a>
 <h3 class="h4">Serializing</h3>
 <p>
         Now let's serialize our <span class="codefrag">User</span>s to disk.
       </p>
 <pre class="code">
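 import java.io.File;
 import org.apache.avro.file.DataFileWriter;
 import org.apache.avro.io.DatumWriter;
 import org.apache.avro.specific.SpecificDatumWriter;

 // Serialize user1, user2 and user3 to disk
 File file = new File("users.avro");
 DatumWriter&lt;User&gt; userDatumWriter = new SpecificDatumWriter&lt;User&gt;(User.class);
 DataFileWriter&lt;User&gt; dataFileWriter = new DataFileWriter&lt;User&gt;(userDatumWriter);
 dataFileWriter.create(user1.getSchema(), file);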
 dataFileWriter.append(user1);
 dataFileWriter.append(user2);
 dataFileWriter.append(user3);
 dataFileWriter.close();
       </pre>
 <p>
         We create a <span class="codefrag">DatumWriter</span>, which converts Java objects into
         an in-memory serialized format.  The <span class="codefrag">SpecificDatumWriter</span>
         class is used with generated classes and extracts the schema from the
         specified generated type.
       </p>
 <p>
         Next we create a <span class="codefrag">DataFileWriter</span>, which writes the
         serialized records, as well as the schema, to the file specified in the
         <span class="codefrag">dataFileWriter.create</span> call.  We write our users to the file
         via calls to the <span class="codefrag">dataFileWriter.append</span> method.  When we are
         done writing, we close the data file.
       </p>
 <a name="Deserializing"></a>
 <h3 class="h4">Deserializing</h3>
 <p>
           Finally, let's deserialize the data file we just created.
         </p>
 <pre class="code">
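 import org.apache.avro.file.DataFileReader;
 import org.apache.avro.io.DatumReader;
 import org.apache.avro.specific.SpecificDatumReader;

 // Deserialize Users from disk; file is the users.avro File written during serialization
 DatumReader&lt;User&gt; userDatumReader = new SpecificDatumReader&lt;User&gt;(User.class);
 DataFileReader&lt;User&gt; dataFileReader = new DataFileReader&lt;User&gt;(file, userDatumReader);
 User user = null;
 while (dataFileReader.hasNext()) {
   // Reuse user object by passing it to next(). This saves us from
   // allocating and garbage collecting many objects for files with
   // many items.
   user = dataFileReader.next(user);
   System.out.println(user);
 }
 // Release the underlying file once all records have been read
 dataFileReader.close();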
         </pre>
 <p>
           This snippet will output:
         </p>
 <pre class="code">
 {"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
 {"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
 {"name": "Charlie", "favorite_number": null, "favorite_color": "blue"}
         </pre>
 <p>
           Deserializing is very similar to serializing.  We create a
           <span class="codefrag">SpecificDatumReader</span>, analogous to the
           <span class="codefrag">SpecificDatumWriter</span> we used in serialization, which
           converts in-memory serialized items into instances of our generated
           class, in this case <span class="codefrag">User</span>.  We pass the
           <span class="codefrag">DatumReader</span> and the previously created <span class="codefrag">File</span>
           to a <span class="codefrag">DataFileReader</span>, analogous to the
           <span class="codefrag">DataFileWriter</span>, which reads both the schema used by the
           writer as well as the data from the file on disk. The data will be
           read using the writer's schema included in the file and the
           schema provided by the reader, in this case the <span class="codefrag">User</span>
           class.  The writer's schema is needed to know the order in which
           fields were written, while the reader's schema is needed to know what
           fields are expected and how to fill in default values for fields
           added since the file was written.  If there are differences between
           the two schemas, they are resolved according to the
           <a href="spec.html#Schema+Resolution">Schema Resolution</a>
           specification.
         </p>
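 <p>
           The roles of the two schemas can be pictured with a much-simplified
           toy model that does not use the Avro library.  In the sketch below,
           <span class="codefrag">ResolutionSketch</span> and
           <span class="codefrag">resolve</span> are hypothetical names, and
           <span class="codefrag">favorite_food</span> is an invented field
           standing in for one added to the reader's schema after the file was
           written.  The sketch assumes every reader field has a default slot;
           real resolution follows the Schema Resolution rules, which are
           stricter than this.
         </p>

```java
import java.util.Arrays;

// Toy model of reader/writer schema resolution; not the Avro implementation.
class ResolutionSketch {

    // Returns values in the reader's field order. Fields the writer did not
    // write keep the reader's default; fields unknown to the reader are ignored.
    static Object[] resolve(String[] writerFields, Object[] writtenValues,
                            String[] readerFields, Object[] readerDefaults) {
        Object[] result = new Object[readerFields.length];
        for (int i = 0; i != readerFields.length; i++) {
            result[i] = readerDefaults[i];
            for (int j = 0; j != writerFields.length; j++) {
                if (writerFields[j].equals(readerFields[i])) {
                    result[i] = writtenValues[j];
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] writerFields = {"name", "favorite_number"};
        Object[] written = {"Alyssa", 256};
        // favorite_food is hypothetical: added to the reader's schema later.
        String[] readerFields = {"name", "favorite_number", "favorite_food"};
        Object[] defaults = {null, null, "unknown"};
        System.out.println(Arrays.toString(
                resolve(writerFields, written, readerFields, defaults)));
        // prints [Alyssa, 256, unknown]
    }
}
```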
 <p>
           Next we use the <span class="codefrag">DataFileReader</span> to iterate through the
           serialized <span class="codefrag">User</span>s and print the deserialized object to
           stdout.  Note how we perform the iteration: we create a single
           <span class="codefrag">User</span> object which we store the current deserialized user
           in, and pass this record object to every call of
           <span class="codefrag">dataFileReader.next</span>.  This is a performance optimization
           that allows the <span class="codefrag">DataFileReader</span> to reuse the same
           <span class="codefrag">User</span> object rather than allocating a new
           <span class="codefrag">User</span> for every iteration, which can be very expensive in
           terms of object allocation and garbage collection if we deserialize a
           large data file.  While this technique is the standard way to iterate
           through a data file, it's also possible to use <span class="codefrag">for (User user :
           dataFileReader)</span> if performance is not a concern.
         </p>
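 <p>
           One consequence of the reuse pattern is worth illustrating: a
           reference kept from an earlier iteration may later show a different
           record, because the same object is overwritten.  The sketch below
           uses toy stand-ins, <span class="codefrag">Rec</span> and
           <span class="codefrag">Reader</span>, which are hypothetical classes
           and not Avro's; it only models the
           <span class="codefrag">next(reuse)</span> behavior described above.
         </p>

```java
// Toy stand-ins, not Avro classes: a mutable record and a reader whose
// next(reuse) overwrites the object it is given.
class ReuseSketch {

    static class Rec {
        String name;
    }

    static class Reader {
        private final String[] names;
        private int pos = 0;

        Reader(String[] names) { this.names = names; }

        boolean hasNext() { return pos != names.length; }

        // Fills the supplied object when there is one; allocates only otherwise.
        Rec next(Rec reuse) {
            Rec rec = (reuse != null) ? reuse : new Rec();
            rec.name = names[pos++];
            return rec;
        }
    }

    public static void main(String[] args) {
        Reader reader = new Reader(new String[] {"Alyssa", "Ben", "Charlie"});
        Rec first = null;
        Rec current = null;
        while (reader.hasNext()) {
            current = reader.next(current);
            if (first == null) first = current; // keeps a reference, not a copy
        }
        // The kept reference now shows the last record read.
        System.out.println(first.name);       // Charlie
        System.out.println(first == current); // true
    }
}
```

 <p>
           If records need to outlive the loop, copying the values of interest
           out of the reused object on each iteration avoids this surprise.
         </p>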
 <a name="Compiling+and+running+the+example+code"></a>
 <h3 class="h4">Compiling and running the example code</h3>
 <p>
           This example code is included as a Maven project in the
           <em>examples/java-example</em> directory in the Avro docs.  From this
           directory, execute the following commands to build and run the
           example:
         </p>
 <pre class="code">
 $ mvn compile # includes code generation via Avro Maven plugin
 $ mvn -q exec:java -Dexec.mainClass=example.SpecificMain
         </pre>
 <a name="Beta+feature%3A+Generating+faster+code"></a>
 <h3 class="h4">Beta feature: Generating faster code</h3>
 <p>
           In this release we have introduced a new approach to
           generating code that speeds up decoding of objects by more
           than 10% and encoding by more than 30% (future performance
           enhancements are underway).  To ensure a smooth introduction
           of this change into production systems, this feature is
           controlled by a feature flag, the system
           property <span class="codefrag">org.apache.avro.specific.use_custom_coders</span>.
           In this first release, this feature is off by default.  To
           turn it on, set the system flag to <span class="codefrag">true</span> at
           runtime.  In the sample above, for example, you could enable
           the faster coders as follows:
         </p>
 <pre class="code">
 $ mvn -q exec:java -Dexec.mainClass=example.SpecificMain \
     -Dorg.apache.avro.specific.use_custom_coders=true
         </pre>
 <p>
           Note that you do <em>not</em> have to recompile your Avro
           schema to have access to this feature.  The feature is
           compiled and built into your code, and you turn it on and
           off at runtime using the feature flag.  As a result, you can
           turn it on during testing, for example, and then off in
           production.  Or you can turn it on in production, and
           quickly turn it off if something breaks.
         </p>
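 <p>
           When toggling the flag between environments, it can help to log its
           value at startup.  The following sketch only reads the system
           property named above with standard JDK calls;
           <span class="codefrag">CoderFlagSketch</span> and
           <span class="codefrag">customCodersRequested</span> are hypothetical
           names, and the sketch makes no claim about when Avro itself reads
           the property.
         </p>

```java
// Illustrative startup check; uses only the JDK, not the Avro library.
class CoderFlagSketch {

    // Property name as given in this guide.
    static final String FLAG = "org.apache.avro.specific.use_custom_coders";

    // Reports whether the flag is currently set to "true" for this JVM.
    static boolean customCodersRequested() {
        // Boolean.getBoolean returns true only when the system property
        // exists and equals "true", ignoring case.
        return Boolean.getBoolean(FLAG);
    }

    public static void main(String[] args) {
        System.out.println(FLAG + "=" + customCodersRequested());
    }
}
```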
 <p>
           We encourage the Avro community to exercise this new feature
           early to help build confidence.  (For those paying
           on-demand for compute resources in the cloud, it can lead
           to meaningful cost savings.)  As confidence builds, we will
           turn this feature on by default, and eventually eliminate
           the feature flag (and the old code).
         </p>
 </div>


 <a name="Serializing+and+deserializing+without+code+generation"></a>
 <h2 class="h3">Serializing and deserializing without code generation</h2>
 <div class="section">
 <p>
         Data in Avro is always stored with its corresponding schema, meaning we
         can always read a serialized item regardless of whether we know the
         schema ahead of time.  This allows us to perform serialization and
         deserialization without code generation.
       </p>
 <p>
         Let's go over the same example as in the previous section, but without
         using code generation: we'll create some users, serialize them to a data
         file on disk, and then read back the file and deserialize the user
         objects.
       </p>
 <a name="Creating+users"></a>
 <h3 class="h4">Creating users</h3>
 <p>
           First, we use a <span class="codefrag">Parser</span> to read our schema definition and
           create a <span class="codefrag">Schema</span> object.
         </p>
 <pre class="code">
 Schema schema = new Schema.Parser().parse(new File("user.avsc"));
         </pre>
 <p>
           Using this schema, let's create some users.
         </p>
 <pre class="code">
 GenericRecord user1 = new GenericData.Record(schema);
 user1.put("name", "Alyssa");
 user1.put("favorite_number", 256);
 // Leave favorite color null

 GenericRecord user2 = new GenericData.Record(schema);
 user2.put("name", "Ben");
 user2.put("favorite_number", 7);
 user2.put("favorite_color", "red");
         </pre>
 <p>
           Since we're not using code generation, we use
           <span class="codefrag">GenericRecord</span>s to represent users.
           <span class="codefrag">GenericRecord</span> uses the schema to verify that we only
           specify valid fields.  If we try to set a non-existent field (e.g.,
           <span class="codefrag">user1.put("favorite_animal", "cat")</span>), we'll get an
           <span class="codefrag">AvroRuntimeException</span> when we run the program.
         </p>
 <p>
           Note that we do not set <span class="codefrag">user1</span>'s favorite color.  Since
           that field is of type <span class="codefrag">["string", "null"]</span>, we can either
           set it to a <span class="codefrag">string</span> or leave it <span class="codefrag">null</span>; it is
           essentially optional.
         </p>
 <a name="Serializing-N101F7"></a>
 <h3 class="h4">Serializing</h3>
 <p>
           Now that we've created our user objects, serializing and deserializing
           them is almost identical to the example above which uses code
           generation.  The main difference is that we use generic instead of
           specific readers and writers.
         </p>
 <p>
           First we'll serialize our users to a data file on disk.
         </p>
 <pre class="code">
 // Serialize user1 and user2 to disk
 File file = new File("users.avro");
 DatumWriter&lt;GenericRecord&gt; datumWriter = new GenericDatumWriter&lt;GenericRecord&gt;(schema);
 DataFileWriter&lt;GenericRecord&gt; dataFileWriter = new DataFileWriter&lt;GenericRecord&gt;(datumWriter);
 dataFileWriter.create(schema, file);
 dataFileWriter.append(user1);
 dataFileWriter.append(user2);
 dataFileWriter.close();
         </pre>
 <p>
           We create a <span class="codefrag">DatumWriter</span>, which converts Java objects into
           an in-memory serialized format.  Since we are not using code
           generation, we create a <span class="codefrag">GenericDatumWriter</span>.  It requires
           the schema both to determine how to write the
           <span class="codefrag">GenericRecord</span>s and to verify that all non-nullable fields
           are present.
         </p>
 <p>
           As in the code generation example, we also create a
           <span class="codefrag">DataFileWriter</span>, which writes the serialized records, as
           well as the schema, to the file specified in the
           <span class="codefrag">dataFileWriter.create</span> call.  We write our users to the
           file via calls to the <span class="codefrag">dataFileWriter.append</span> method.  When
           we are done writing, we close the data file.
         </p>
 <a name="Deserializing-N10220"></a>
 <h3 class="h4">Deserializing</h3>
 <p>
           Finally, we'll deserialize the data file we just created.
         </p>
 <pre class="code">
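 import org.apache.avro.file.DataFileReader;
 import org.apache.avro.generic.GenericDatumReader;
 import org.apache.avro.generic.GenericRecord;
 import org.apache.avro.io.DatumReader;

 // Deserialize users from disk
 DatumReader&lt;GenericRecord&gt; datumReader = new GenericDatumReader&lt;GenericRecord&gt;(schema);
 DataFileReader&lt;GenericRecord&gt; dataFileReader = new DataFileReader&lt;GenericRecord&gt;(file, datumReader);
 GenericRecord user = null;
 while (dataFileReader.hasNext()) {
   // Reuse user object by passing it to next(). This saves us from
   // allocating and garbage collecting many objects for files with
   // many items.
   user = dataFileReader.next(user);
   System.out.println(user);
 }
 // Release the underlying file once all records have been read
 dataFileReader.close();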
         </pre>
 <p>This outputs:</p>
 <pre class="code">
 {"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
 {"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
         </pre>
 <p>
           Deserializing is very similar to serializing.  We create a
           <span class="codefrag">GenericDatumReader</span>, analogous to the
           <span class="codefrag">GenericDatumWriter</span> we used in serialization, which
           converts in-memory serialized items into <span class="codefrag">GenericRecords</span>.
           We pass the <span class="codefrag">DatumReader</span> and the previously created
           <span class="codefrag">File</span> to a <span class="codefrag">DataFileReader</span>, analogous to the
           <span class="codefrag">DataFileWriter</span>, which reads both the schema used by the
           writer as well as the data from the file on disk. The data will be
           read using the writer's schema included in the file, and the reader's
           schema provided to the <span class="codefrag">GenericDatumReader</span>.  The writer's
           schema is needed to know the order in which fields were written,
           while the reader's schema is needed to know what fields are expected
           and how to fill in default values for fields added since the file
           was written.  If there are differences between the two schemas, they
           are resolved according to the
           <a href="spec.html#Schema+Resolution">Schema Resolution</a>
           specification.
         </p>
 <p>
           Next, we use the <span class="codefrag">DataFileReader</span> to iterate through the
           serialized users and print the deserialized object to stdout.  Note
           how we perform the iteration: we create a single
           <span class="codefrag">GenericRecord</span> object which we store the current
           deserialized user in, and pass this record object to every call of
           <span class="codefrag">dataFileReader.next</span>.  This is a performance optimization
           that allows the <span class="codefrag">DataFileReader</span> to reuse the same record
           object rather than allocating a new <span class="codefrag">GenericRecord</span> for
           every iteration, which can be very expensive in terms of object
           allocation and garbage collection if we deserialize a large data file.
           While this technique is the standard way to iterate through a data
           file, it's also possible to use <span class="codefrag">for (GenericRecord user :
           dataFileReader)</span> if performance is not a concern.
         </p>
 <a name="Compiling+and+running+the+example+code-N10269"></a>
 <h3 class="h4">Compiling and running the example code</h3>
 <p>
           This example code is included as a Maven project in the
           <em>examples/java-example</em> directory in the Avro docs.  From this
           directory, execute the following commands to build and run the
           example:
         </p>
 <pre class="code">
 $ mvn compile
 $ mvn -q exec:java -Dexec.mainClass=example.GenericMain
         </pre>
 </div>

 </div>
 <!--+
     |end content
     +-->
 <div class="clearboth">&nbsp;</div>
 </div>
 <div id="footer">
 <!--+
     |start bottomstrip
     +-->
 <div class="lastmodified">
 <script type="text/javascript"><!--
 document.write("Last Published: " + document.lastModified);
 //  --></script>
 </div>
 <div class="copyright">
         Copyright &copy;
          2012 <a href="https://www.apache.org/licenses/">The Apache Software Foundation.</a>
 </div>
 <!--+
     |end bottomstrip
     +-->
 </div>
 </body>
 </html>