<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta content="Apache Forrest" name="Generator">
<meta name="Forrest-version" content="0.9">
<meta name="Forrest-skin-name" content="pelt">
<title>Apache Avro&trade; 1.10.1 Getting Started (Java)</title>
<link type="text/css" href="skin/basic.css" rel="stylesheet">
<link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
<link media="print" type="text/css" href="skin/print.css" rel="stylesheet">
<link type="text/css" href="skin/profile.css" rel="stylesheet">
<script src="skin/getBlank.js" language="javascript" type="text/javascript"></script><script src="skin/getMenu.js" language="javascript" type="text/javascript"></script><script src="skin/fontsize.js" language="javascript" type="text/javascript"></script>
<link rel="shortcut icon" href="images/favicon.ico">
</head>
<body onload="init()">
<script type="text/javascript">ndeSetTextSize();</script>
<div id="top">
<!--+
|breadtrail
+-->
<div class="breadtrail">
<a href="https://www.apache.org/">Apache</a> &gt; <a href="https://avro.apache.org/">Avro</a> &gt; <a href="https://avro.apache.org/">Avro</a><script src="skin/breadcrumbs.js" language="JavaScript" type="text/javascript"></script>
</div>
<!--+
|header
+-->
<div class="header">
<!--+
|start group logo
+-->
<div class="grouplogo">
<a href="https://www.apache.org/"><img class="logoImage" alt="Apache" src="images/apache_feather.gif" title="The Apache Software Foundation"></a>
</div>
<!--+
|end group logo
+-->
<!--+
|start Project Logo
+-->
<div class="projectlogo">
<a href="https://avro.apache.org/"><img class="logoImage" alt="Avro" src="images/avro-logo.png" title="Serialization System"></a>
</div>
<!--+
|end Project Logo
+-->
<!--+
|start Search
+-->
<div class="searchbox">
<form action="http://www.google.com/search" method="get" class="roundtopsmall">
<input value="avro.apache.org" name="sitesearch" type="hidden"><input onFocus="getBlank (this, 'Search the site with google');" size="25" name="q" id="query" type="text" value="Search the site with google">&nbsp;
<input name="Search" value="Search" type="submit">
</form>
</div>
<!--+
|end search
+-->
<!--+
|start Tabs
+-->
<ul id="tabs">
<li>
<a class="unselected" href="https://avro.apache.org/">Project</a>
</li>
<li>
<a class="unselected" href="https://cwiki.apache.org/confluence/display/AVRO/Index">Wiki</a>
</li>
<li class="current">
<a class="selected" href="index.html">Avro 1.10.1 Documentation</a>
</li>
</ul>
<!--+
|end Tabs
+-->
</div>
</div>
<div id="main">
<div id="publishedStrip">
<!--+
|start Subtabs
+-->
<div id="level2tabs"></div>
<!--+
|end Endtabs
+-->
<script type="text/javascript"><!--
document.write("Last Published: " + document.lastModified);
// --></script>
</div>
<!--+
|breadtrail
+-->
<div class="breadtrail">
&nbsp;
</div>
<!--+
|start Menu, mainarea
+-->
<!--+
|start Menu
+-->
<div id="menu">
<div onclick="SwitchMenu('menu_selected_1.1', 'skin/')" id="menu_selected_1.1Title" class="menutitle" style="background-image: url('skin/images/chapter_open.gif');">Documentation</div>
<div id="menu_selected_1.1" class="selectedmenuitemgroup" style="display: block;">
<div class="menuitem">
<a href="index.html">Overview</a>
</div>
<div class="menupage">
<div class="menupagetitle">Getting started (Java)</div>
</div>
<div class="menuitem">
<a href="gettingstartedpython.html">Getting started (Python)</a>
</div>
<div class="menuitem">
<a href="spec.html">Specification</a>
</div>
<div class="menuitem">
<a href="trevni/spec.html">Trevni</a>
</div>
<div class="menuitem">
<a href="api/java/index.html">Java API</a>
</div>
<div class="menuitem">
<a href="api/c/index.html">C API</a>
</div>
<div class="menuitem">
<a href="api/cpp/html/index.html">C++ API</a>
</div>
<div class="menuitem">
<a href="api/csharp/html/index.html">C# API</a>
</div>
<div class="menuitem">
<a href="mr.html">MapReduce guide</a>
</div>
<div class="menuitem">
<a href="idl.html">IDL language</a>
</div>
<div class="menuitem">
<a href="sasl.html">SASL profile</a>
</div>
<div class="menuitem">
<a href="https://cwiki.apache.org/confluence/display/AVRO/Index">Wiki</a>
</div>
<div class="menuitem">
<a href="https://cwiki.apache.org/confluence/display/AVRO/FAQ">FAQ</a>
</div>
</div>
<div id="credit"></div>
<div id="roundbottom">
<img style="display: none" class="corner" height="15" width="15" alt="" src="skin/images/rc-b-l-15-1body-2menu-3menu.png"></div>
<!--+
|alternative credits
+-->
<div id="credit2"></div>
</div>
<!--+
|end Menu
+-->
<!--+
|start content
+-->
<div id="content">
<div title="Portable Document Format" class="pdflink">
<a class="dida" href="gettingstartedjava.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif" class="skin"><br>
PDF</a>
</div>
<h1>Apache Avro&trade; 1.10.1 Getting Started (Java)</h1>
<div id="front-matter">
<div id="minitoc-area">
<ul class="minitoc">
<li>
<a href="#download_install">Download</a>
</li>
<li>
<a href="#Defining+a+schema">Defining a schema</a>
</li>
<li>
<a href="#Serializing+and+deserializing+with+code+generation">Serializing and deserializing with code generation</a>
<ul class="minitoc">
<li>
<a href="#Compiling+the+schema">Compiling the schema</a>
</li>
<li>
<a href="#Creating+Users">Creating Users</a>
</li>
<li>
<a href="#Serializing">Serializing</a>
</li>
<li>
<a href="#Deserializing">Deserializing</a>
</li>
<li>
<a href="#Compiling+and+running+the+example+code">Compiling and running the example code</a>
</li>
<li>
<a href="#Beta+feature%3A+Generating+faster+code">Beta feature: Generating faster code</a>
</li>
</ul>
</li>
<li>
<a href="#Serializing+and+deserializing+without+code+generation">Serializing and deserializing without code generation</a>
<ul class="minitoc">
<li>
<a href="#Creating+users">Creating users</a>
</li>
<li>
<a href="#Serializing-N101F7">Serializing</a>
</li>
<li>
<a href="#Deserializing-N10220">Deserializing</a>
</li>
<li>
<a href="#Compiling+and+running+the+example+code-N10269">Compiling and running the example code</a>
</li>
</ul>
</li>
</ul>
</div>
</div>
<p>
This is a short guide for getting started with Apache Avro&trade; using
Java. This guide only covers using Avro for data serialization; see
Patrick Hunt's <a href="https://github.com/phunt/avro-rpc-quickstart">Avro
RPC Quick Start</a> for a good introduction to using Avro for RPC.
</p>
<a name="download_install"></a>
<h2 class="h3">Download</h2>
<div class="section">
<p>
Avro implementations for C, C++, C#, Java, PHP, Python, and Ruby can be
downloaded from the <a href="https://avro.apache.org/releases.html">Apache Avro&trade;
Releases</a> page. This guide uses Avro 1.10.1, the latest
version at the time of writing. For the examples in this guide,
download <em>avro-1.10.1.jar</em> and
<em>avro-tools-1.10.1.jar</em>.
</p>
<p>
Alternatively, if you are using Maven, add the following dependency to
your POM:
</p>
<pre class="code">
&lt;dependency&gt;
&lt;groupId&gt;org.apache.avro&lt;/groupId&gt;
&lt;artifactId&gt;avro&lt;/artifactId&gt;
&lt;version&gt;1.10.1&lt;/version&gt;
&lt;/dependency&gt;
</pre>
<p>
As well as the Avro Maven plugin (for performing code generation):
</p>
<pre class="code">
&lt;plugin&gt;
&lt;groupId&gt;org.apache.avro&lt;/groupId&gt;
&lt;artifactId&gt;avro-maven-plugin&lt;/artifactId&gt;
&lt;version&gt;1.10.1&lt;/version&gt;
&lt;executions&gt;
&lt;execution&gt;
&lt;phase&gt;generate-sources&lt;/phase&gt;
&lt;goals&gt;
&lt;goal&gt;schema&lt;/goal&gt;
&lt;/goals&gt;
&lt;configuration&gt;
&lt;sourceDirectory&gt;${project.basedir}/src/main/avro/&lt;/sourceDirectory&gt;
&lt;outputDirectory&gt;${project.basedir}/src/main/java/&lt;/outputDirectory&gt;
&lt;/configuration&gt;
&lt;/execution&gt;
&lt;/executions&gt;
&lt;/plugin&gt;
&lt;plugin&gt;
&lt;groupId&gt;org.apache.maven.plugins&lt;/groupId&gt;
&lt;artifactId&gt;maven-compiler-plugin&lt;/artifactId&gt;
&lt;configuration&gt;
&lt;source&gt;1.8&lt;/source&gt;
&lt;target&gt;1.8&lt;/target&gt;
&lt;/configuration&gt;
&lt;/plugin&gt;
</pre>
<p>
You may also build the required Avro jars from source. Building Avro is
beyond the scope of this guide; see the <a href="https://cwiki.apache.org/AVRO/Build+Documentation">Build
Documentation</a> page in the wiki for more information.
</p>
</div>
<a name="Defining+a+schema"></a>
<h2 class="h3">Defining a schema</h2>
<div class="section">
<p>
Avro schemas are defined using JSON. Schemas are composed of <a href="spec.html#schema_primitive">primitive types</a>
(<span class="codefrag">null</span>, <span class="codefrag">boolean</span>, <span class="codefrag">int</span>,
<span class="codefrag">long</span>, <span class="codefrag">float</span>, <span class="codefrag">double</span>,
<span class="codefrag">bytes</span>, and <span class="codefrag">string</span>) and <a href="spec.html#schema_complex">complex types</a> (<span class="codefrag">record</span>,
<span class="codefrag">enum</span>, <span class="codefrag">array</span>, <span class="codefrag">map</span>,
<span class="codefrag">union</span>, and <span class="codefrag">fixed</span>). You can learn more about
Avro schemas and types from the specification, but for now let's start
with a simple schema example, <em>user.avsc</em>:
</p>
<pre class="code">
{"namespace": "example.avro",
"type": "record",
"name": "User",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
</pre>
<p>
This schema defines a record representing a hypothetical user. (Note
that a schema file can only contain a single schema definition.) At
minimum, a record definition must include its type (<span class="codefrag">"type":
"record"</span>), a name (<span class="codefrag">"name": "User"</span>), and fields, in
this case <span class="codefrag">name</span>, <span class="codefrag">favorite_number</span>, and
<span class="codefrag">favorite_color</span>. We also define a namespace
(<span class="codefrag">"namespace": "example.avro"</span>), which together with the name
attribute defines the "full name" of the schema
(<span class="codefrag">example.avro.User</span> in this case).
</p>
<p>
Fields are defined via an array of objects, each of which defines a name
and type (other attributes are optional, see the <a href="spec.html#schema_record">record specification</a> for more
details). The type attribute of a field is another schema object, which
can be either a primitive or complex type. For example, the
<span class="codefrag">name</span> field of our User schema is the primitive type
<span class="codefrag">string</span>, whereas the <span class="codefrag">favorite_number</span> and
<span class="codefrag">favorite_color</span> fields are both <span class="codefrag">union</span>s,
represented by JSON arrays. <span class="codefrag">union</span>s are a complex type that
can be any of the types listed in the array; e.g.,
<span class="codefrag">favorite_number</span> can either be an <span class="codefrag">int</span> or
<span class="codefrag">null</span>, essentially making it an optional field.
</p>
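<p>
The structure described above can also be inspected from Java. The following
is a minimal sketch, assuming the Avro jar is on the classpath; the class name
<span class="codefrag">InspectUserSchema</span> is hypothetical, and the schema is inlined as a
string (rather than read from <em>user.avsc</em>) only to keep the snippet
self-contained. It prints the schema's full name and, for each field, its type
and the branches of any union.
</p>

```java
import org.apache.avro.Schema;

public class InspectUserSchema {
    // The user.avsc schema from this guide, inlined so the sketch is self-contained.
    static final String USER_SCHEMA =
        "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"User\","
      + " \"fields\": ["
      + "  {\"name\": \"name\", \"type\": \"string\"},"
      + "  {\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},"
      + "  {\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}"
      + " ]}";

    static Schema parse() {
        return new Schema.Parser().parse(USER_SCHEMA);
    }

    public static void main(String[] args) {
        Schema schema = parse();
        // The namespace together with the name gives the full name.
        System.out.println(schema.getFullName());
        for (Schema.Field field : schema.getFields()) {
            Schema type = field.schema();
            System.out.println(field.name() + ": " + type.getType());
            if (type.getType() == Schema.Type.UNION) {
                // A union lists its branch schemas in declaration order.
                for (Schema branch : type.getTypes()) {
                    System.out.println("  branch: " + branch.getType());
                }
            }
        }
    }
}
```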
</div>
<a name="Serializing+and+deserializing+with+code+generation"></a>
<h2 class="h3">Serializing and deserializing with code generation</h2>
<div class="section">
<a name="Compiling+the+schema"></a>
<h3 class="h4">Compiling the schema</h3>
<p>
Code generation allows us to automatically create classes based on our
previously-defined schema. Once we have defined the relevant classes,
there is no need to use the schema directly in our programs. We use the
avro-tools jar to generate code as follows:
</p>
<pre class="code">
java -jar /path/to/avro-tools-1.10.1.jar compile schema &lt;schema file&gt; &lt;destination&gt;
</pre>
<p>
This will generate the appropriate source files in a package based on
the schema's namespace in the provided destination folder. For
instance, to generate a <span class="codefrag">User</span> class in package
<span class="codefrag">example.avro</span> from the schema defined above, run
</p>
<pre class="code">
java -jar /path/to/avro-tools-1.10.1.jar compile schema user.avsc .
</pre>
<p>
Note that if you are using the Avro Maven plugin, there is no need to
manually invoke the schema compiler; the plugin automatically
performs code generation on any .avsc files present in the configured
source directory.
</p>
<a name="Creating+Users"></a>
<h3 class="h4">Creating Users</h3>
<p>
Now that we've completed the code generation, let's create some
<span class="codefrag">User</span>s, serialize them to a data file on disk, and then
read back the file and deserialize the <span class="codefrag">User</span> objects.
</p>
<p>
First let's create some <span class="codefrag">User</span>s and set their fields.
</p>
<pre class="code">
User user1 = new User();
user1.setName("Alyssa");
user1.setFavoriteNumber(256);
// Leave favorite color null
// Alternate constructor
User user2 = new User("Ben", 7, "red");
// Construct via builder
User user3 = User.newBuilder()
.setName("Charlie")
.setFavoriteColor("blue")
.setFavoriteNumber(null)
.build();
</pre>
<p>
As shown in this example, Avro objects can be created either by
invoking a constructor directly or by using a builder. Unlike
constructors, builders will automatically set any default values
specified in the schema. Additionally, builders validate the data as
it is set, whereas objects constructed directly will not cause an error
until the object is serialized. However, using constructors directly
generally offers better performance, as builders create a copy of the
data structure before it is written.
</p>
<p>
Note that we do not set <span class="codefrag">user1</span>'s favorite color. Since
that field is of type <span class="codefrag">["string", "null"]</span>, we can either
set it to a <span class="codefrag">string</span> or leave it <span class="codefrag">null</span>; it is
essentially optional. Similarly, we set <span class="codefrag">user3</span>'s favorite
number to null (using a builder requires setting all fields, even if
they are null).
</p>
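<p>
The builder behavior described above (filling in defaults and rejecting
missing values) can be sketched without generated code. This example is an
illustration under stated assumptions, not the generated
<span class="codefrag">User</span> builder itself: it uses
<span class="codefrag">GenericRecordBuilder</span> from the generic API as a stand-in, on the
assumption that it behaves analogously, and it uses a hypothetical variant of
the schema in which <span class="codefrag">favorite_color</span> has a default of
<span class="codefrag">"green"</span>. The class name <span class="codefrag">BuilderSketch</span> is
also hypothetical.
</p>

```java
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;

public class BuilderSketch {
    // Hypothetical variant of the User schema: favorite_color declares a default.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"User\","
      + " \"fields\": ["
      + "  {\"name\": \"name\", \"type\": \"string\"},"
      + "  {\"name\": \"favorite_color\", \"type\": \"string\", \"default\": \"green\"}"
      + " ]}");

    // Sets only the name; the builder supplies the default favorite_color.
    static GenericRecord buildWithDefault(String name) {
        return new GenericRecordBuilder(SCHEMA).set("name", name).build();
    }

    // Returns true if building without the required name field is rejected.
    static boolean missingNameIsRejected() {
        try {
            new GenericRecordBuilder(SCHEMA).build();
            return false;
        } catch (AvroRuntimeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(buildWithDefault("Alyssa"));
        System.out.println("missing name rejected: " + missingNameIsRejected());
    }
}
```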
<a name="Serializing"></a>
<h3 class="h4">Serializing</h3>
<p>
Now let's serialize our <span class="codefrag">User</span>s to disk.
</p>
<pre class="code">
// Serialize user1, user2 and user3 to disk
File file = new File("users.avro");
DatumWriter&lt;User&gt; userDatumWriter = new SpecificDatumWriter&lt;User&gt;(User.class);
DataFileWriter&lt;User&gt; dataFileWriter = new DataFileWriter&lt;User&gt;(userDatumWriter);
dataFileWriter.create(user1.getSchema(), file);
dataFileWriter.append(user1);
dataFileWriter.append(user2);
dataFileWriter.append(user3);
dataFileWriter.close();
</pre>
<p>
We create a <span class="codefrag">DatumWriter</span>, which converts Java objects into
an in-memory serialized format. The <span class="codefrag">SpecificDatumWriter</span>
class is used with generated classes and extracts the schema from the
specified generated type.
</p>
<p>
Next we create a <span class="codefrag">DataFileWriter</span>, which writes the
serialized records, as well as the schema, to the file specified in the
<span class="codefrag">dataFileWriter.create</span> call. We write our users to the file
via calls to the <span class="codefrag">dataFileWriter.append</span> method. When we are
done writing, we close the data file.
</p>
<a name="Deserializing"></a>
<h3 class="h4">Deserializing</h3>
<p>
Finally, let's deserialize the data file we just created.
</p>
<pre class="code">
// Deserialize Users from disk
DatumReader&lt;User&gt; userDatumReader = new SpecificDatumReader&lt;User&gt;(User.class);
DataFileReader&lt;User&gt; dataFileReader = new DataFileReader&lt;User&gt;(file, userDatumReader);
User user = null;
while (dataFileReader.hasNext()) {
// Reuse user object by passing it to next(). This saves us from
// allocating and garbage collecting many objects for files with
// many items.
user = dataFileReader.next(user);
System.out.println(user);
}
</pre>
<p>
This snippet will output:
</p>
<pre class="code">
{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
{"name": "Charlie", "favorite_number": null, "favorite_color": "blue"}
</pre>
<p>
Deserializing is very similar to serializing. We create a
<span class="codefrag">SpecificDatumReader</span>, analogous to the
<span class="codefrag">SpecificDatumWriter</span> we used in serialization, which
converts in-memory serialized items into instances of our generated
class, in this case <span class="codefrag">User</span>. We pass the
<span class="codefrag">DatumReader</span> and the previously created <span class="codefrag">File</span>
to a <span class="codefrag">DataFileReader</span>, analogous to the
<span class="codefrag">DataFileWriter</span>, which reads both the schema used by the
writer as well as the data from the file on disk. The data will be
read using the writer's schema included in the file and the
schema provided by the reader, in this case the <span class="codefrag">User</span>
class. The writer's schema is needed to know the order in which
fields were written, while the reader's schema is needed to know what
fields are expected and how to fill in default values for fields
added since the file was written. If there are differences between
the two schemas, they are resolved according to the
<a href="spec.html#Schema+Resolution">Schema Resolution</a>
specification.
</p>
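<p>
Whether a reader's schema can resolve data written with a given writer's schema
can be checked ahead of time. The following is a minimal sketch, assuming the
Avro jar is on the classpath. It uses
<span class="codefrag">SchemaCompatibility.checkReaderWriterCompatibility</span>, which this
guide does not otherwise cover; the class name
<span class="codefrag">ResolutionCheck</span>, the helper
<span class="codefrag">userSchemaWith</span>, and the added <span class="codefrag">email</span>
field are hypothetical. It performs a check only and does not read any data.
</p>

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;

public class ResolutionCheck {
    // Builds a User-like record schema: the name field plus an optional extra field.
    static Schema userSchemaWith(String extraField) {
        return new Schema.Parser().parse(
            "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"User\","
          + " \"fields\": [{\"name\": \"name\", \"type\": \"string\"}" + extraField + "]}");
    }

    // Reports whether data written with the writer schema can be read with the reader schema.
    static SchemaCompatibilityType check(Schema reader, Schema writer) {
        return SchemaCompatibility.checkReaderWriterCompatibility(reader, writer).getType();
    }

    public static void main(String[] args) {
        Schema writer = userSchemaWith("");
        // Hypothetical field added to the reader's schema after the file was written.
        Schema withDefault = userSchemaWith(
            ", {\"name\": \"email\", \"type\": \"string\", \"default\": \"\"}");
        Schema withoutDefault = userSchemaWith(
            ", {\"name\": \"email\", \"type\": \"string\"}");
        System.out.println(check(withDefault, writer));
        System.out.println(check(withoutDefault, writer));
    }
}
```

<p>
The first check is expected to report <span class="codefrag">COMPATIBLE</span>, because the
reader can fill in the default for the added field, and the second
<span class="codefrag">INCOMPATIBLE</span>, because the reader has no value to use for it.
</p>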
<p>
Next we use the <span class="codefrag">DataFileReader</span> to iterate through the
serialized <span class="codefrag">User</span>s and print the deserialized object to
stdout. Note how we perform the iteration: we create a single
<span class="codefrag">User</span> object in which we store the current deserialized
user, and pass this record object to every call of
<span class="codefrag">dataFileReader.next</span>. This is a performance optimization
that allows the <span class="codefrag">DataFileReader</span> to reuse the same
<span class="codefrag">User</span> object rather than allocating a new
<span class="codefrag">User</span> for every iteration, which can be very expensive in
terms of object allocation and garbage collection if we deserialize a
large data file. While this technique is the standard way to iterate
through a data file, it's also possible to use <span class="codefrag">for (User user :
dataFileReader)</span> if performance is not a concern.
</p>
<a name="Compiling+and+running+the+example+code"></a>
<h3 class="h4">Compiling and running the example code</h3>
<p>
This example code is included as a Maven project in the
<em>examples/java-example</em> directory in the Avro docs. From this
directory, execute the following commands to build and run the
example:
</p>
<pre class="code">
$ mvn compile # includes code generation via Avro Maven plugin
$ mvn -q exec:java -Dexec.mainClass=example.SpecificMain
</pre>
<a name="Beta+feature%3A+Generating+faster+code"></a>
<h3 class="h4">Beta feature: Generating faster code</h3>
<p>
In this release we have introduced a new approach to
generating code that speeds up decoding of objects by more
than 10% and encoding by more than 30% (future performance
enhancements are underway). To ensure a smooth introduction
of this change into production systems, this feature is
controlled by a feature flag, the system
property <span class="codefrag">org.apache.avro.specific.use_custom_coders</span>.
In this first release, this feature is off by default. To
turn it on, set the system flag to <span class="codefrag">true</span> at
runtime. In the sample above, for example, you could enable
the faster coders as follows:
</p>
<pre class="code">
$ mvn -q exec:java -Dexec.mainClass=example.SpecificMain \
-Dorg.apache.avro.specific.use_custom_coders=true
</pre>
<p>
Note that you do <em>not</em> have to recompile your Avro
schema to have access to this feature. The feature is
compiled and built into your code, and you turn it on and
off at runtime using the feature flag. As a result, you can
turn it on during testing, for example, and then off in
production. Or you can turn it on in production, and
quickly turn it off if something breaks.
</p>
<p>
We encourage the Avro community to exercise this new feature
early to help build confidence. (For those paying
on demand for compute resources in the cloud, it can lead
to meaningful cost savings.) As confidence builds, we will
turn this feature on by default, and eventually eliminate
the feature flag (and the old code).
</p>
</div>
<a name="Serializing+and+deserializing+without+code+generation"></a>
<h2 class="h3">Serializing and deserializing without code generation</h2>
<div class="section">
<p>
Data in Avro is always stored with its corresponding schema, meaning we
can always read a serialized item regardless of whether we know the
schema ahead of time. This allows us to perform serialization and
deserialization without code generation.
</p>
<p>
Let's go over the same example as in the previous section, but without
using code generation: we'll create some users, serialize them to a data
file on disk, and then read back the file and deserialize the user
objects.
</p>
<a name="Creating+users"></a>
<h3 class="h4">Creating users</h3>
<p>
First, we use a <span class="codefrag">Parser</span> to read our schema definition and
create a <span class="codefrag">Schema</span> object.
</p>
<pre class="code">
Schema schema = new Schema.Parser().parse(new File("user.avsc"));
</pre>
<p>
Using this schema, let's create some users.
</p>
<pre class="code">
GenericRecord user1 = new GenericData.Record(schema);
user1.put("name", "Alyssa");
user1.put("favorite_number", 256);
// Leave favorite color null
GenericRecord user2 = new GenericData.Record(schema);
user2.put("name", "Ben");
user2.put("favorite_number", 7);
user2.put("favorite_color", "red");
</pre>
<p>
Since we're not using code generation, we use
<span class="codefrag">GenericRecord</span>s to represent users.
<span class="codefrag">GenericRecord</span> uses the schema to verify that we only
specify valid fields. If we try to set a non-existent field (e.g.,
<span class="codefrag">user1.put("favorite_animal", "cat")</span>), we'll get an
<span class="codefrag">AvroRuntimeException</span> when we run the program.
</p>
<p>
Note that we do not set <span class="codefrag">user1</span>'s favorite color. Since
that field is of type <span class="codefrag">["string", "null"]</span>, we can either
set it to a <span class="codefrag">string</span> or leave it <span class="codefrag">null</span>; it is
essentially optional.
</p>
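<p>
The field-name check described above can be sketched as follows. This is a
minimal example, assuming the Avro jar is on the classpath; the class name
<span class="codefrag">UnknownFieldSketch</span> and the helper
<span class="codefrag">putIsRejected</span> are hypothetical, and the schema is inlined as a
string only to keep the snippet self-contained.
</p>

```java
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class UnknownFieldSketch {
    // The user.avsc schema from this guide, inlined so the sketch is self-contained.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"User\","
      + " \"fields\": ["
      + "  {\"name\": \"name\", \"type\": \"string\"},"
      + "  {\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},"
      + "  {\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}"
      + " ]}");

    // Returns true if put() refuses the field name with an AvroRuntimeException.
    static boolean putIsRejected(String fieldName, Object value) {
        GenericRecord user = new GenericData.Record(SCHEMA);
        try {
            user.put(fieldName, value);
            return false;
        } catch (AvroRuntimeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // favorite_color is declared in the schema; favorite_animal is not.
        System.out.println("favorite_color rejected: " + putIsRejected("favorite_color", "red"));
        System.out.println("favorite_animal rejected: " + putIsRejected("favorite_animal", "cat"));
    }
}
```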
<a name="Serializing-N101F7"></a>
<h3 class="h4">Serializing</h3>
<p>
Now that we've created our user objects, serializing and deserializing
them is almost identical to the example above which uses code
generation. The main difference is that we use generic instead of
specific readers and writers.
</p>
<p>
First we'll serialize our users to a data file on disk.
</p>
<pre class="code">
// Serialize user1 and user2 to disk
File file = new File("users.avro");
DatumWriter&lt;GenericRecord&gt; datumWriter = new GenericDatumWriter&lt;GenericRecord&gt;(schema);
DataFileWriter&lt;GenericRecord&gt; dataFileWriter = new DataFileWriter&lt;GenericRecord&gt;(datumWriter);
dataFileWriter.create(schema, file);
dataFileWriter.append(user1);
dataFileWriter.append(user2);
dataFileWriter.close();
</pre>
<p>
We create a <span class="codefrag">DatumWriter</span>, which converts Java objects into
an in-memory serialized format. Since we are not using code
generation, we create a <span class="codefrag">GenericDatumWriter</span>. It requires
the schema both to determine how to write the
<span class="codefrag">GenericRecord</span>s and to verify that all non-nullable fields
are present.
</p>
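<p>
A record can also be checked against its schema before it is handed to a
writer. The sketch below, assuming the Avro jar is on the classpath, uses
<span class="codefrag">GenericData.get().validate</span> for this purpose. Whether this check
matches what <span class="codefrag">GenericDatumWriter</span> does internally in every detail
is an assumption; the class name <span class="codefrag">ValidateSketch</span> and the helper
<span class="codefrag">isValid</span> are hypothetical.
</p>

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class ValidateSketch {
    // The user.avsc schema from this guide, inlined so the sketch is self-contained.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"User\","
      + " \"fields\": ["
      + "  {\"name\": \"name\", \"type\": \"string\"},"
      + "  {\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},"
      + "  {\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}"
      + " ]}");

    // Returns true if every field of the record holds a value its schema allows.
    static boolean isValid(GenericRecord record) {
        return GenericData.get().validate(SCHEMA, record);
    }

    public static void main(String[] args) {
        GenericRecord user = new GenericData.Record(SCHEMA);
        user.put("favorite_number", 256);
        // The non-nullable name field is still unset at this point.
        System.out.println("valid without name: " + isValid(user));
        user.put("name", "Alyssa");
        // favorite_color stays null, which its union type permits.
        System.out.println("valid with name: " + isValid(user));
    }
}
```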
<p>
As in the code generation example, we also create a
<span class="codefrag">DataFileWriter</span>, which writes the serialized records, as
well as the schema, to the file specified in the
<span class="codefrag">dataFileWriter.create</span> call. We write our users to the
file via calls to the <span class="codefrag">dataFileWriter.append</span> method. When
we are done writing, we close the data file.
</p>
<a name="Deserializing-N10220"></a>
<h3 class="h4">Deserializing</h3>
<p>
Finally, we'll deserialize the data file we just created.
</p>
<pre class="code">
// Deserialize users from disk
DatumReader&lt;GenericRecord&gt; datumReader = new GenericDatumReader&lt;GenericRecord&gt;(schema);
DataFileReader&lt;GenericRecord&gt; dataFileReader = new DataFileReader&lt;GenericRecord&gt;(file, datumReader);
GenericRecord user = null;
while (dataFileReader.hasNext()) {
// Reuse user object by passing it to next(). This saves us from
// allocating and garbage collecting many objects for files with
// many items.
user = dataFileReader.next(user);
System.out.println(user);
}
</pre>
<p>This outputs:</p>
<pre class="code">
{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
</pre>
<p>
Deserializing is very similar to serializing. We create a
<span class="codefrag">GenericDatumReader</span>, analogous to the
<span class="codefrag">GenericDatumWriter</span> we used in serialization, which
converts in-memory serialized items into <span class="codefrag">GenericRecord</span>s.
We pass the <span class="codefrag">DatumReader</span> and the previously created
<span class="codefrag">File</span> to a <span class="codefrag">DataFileReader</span>, analogous to the
<span class="codefrag">DataFileWriter</span>, which reads both the schema used by the
writer as well as the data from the file on disk. The data will be
read using the writer's schema included in the file, and the reader's
schema provided to the <span class="codefrag">GenericDatumReader</span>. The writer's
schema is needed to know the order in which fields were written,
while the reader's schema is needed to know what fields are expected
and how to fill in default values for fields added since the file
was written. If there are differences between the two schemas, they
are resolved according to the
<a href="spec.html#Schema+Resolution">Schema Resolution</a>
specification.
</p>
<p>
Next, we use the <span class="codefrag">DataFileReader</span> to iterate through the
serialized users and print the deserialized object to stdout. Note
how we perform the iteration: we create a single
<span class="codefrag">GenericRecord</span> object in which we store the current
deserialized user, and pass this record object to every call of
<span class="codefrag">dataFileReader.next</span>. This is a performance optimization
that allows the <span class="codefrag">DataFileReader</span> to reuse the same record
object rather than allocating a new <span class="codefrag">GenericRecord</span> for
every iteration, which can be very expensive in terms of object
allocation and garbage collection if we deserialize a large data file.
While this technique is the standard way to iterate through a data
file, it's also possible to use <span class="codefrag">for (GenericRecord user :
dataFileReader)</span> if performance is not a concern.
</p>
<a name="Compiling+and+running+the+example+code-N10269"></a>
<h3 class="h4">Compiling and running the example code</h3>
<p>
This example code is included as a Maven project in the
<em>examples/java-example</em> directory in the Avro docs. From this
directory, execute the following commands to build and run the
example:
</p>
<pre class="code">
$ mvn compile
$ mvn -q exec:java -Dexec.mainClass=example.GenericMain
</pre>
</div>
</div>
<!--+
|end content
+-->
<div class="clearboth">&nbsp;</div>
</div>
<div id="footer">
<!--+
|start bottomstrip
+-->
<div class="lastmodified">
<script type="text/javascript"><!--
document.write("Last Published: " + document.lastModified);
// --></script>
</div>
<div class="copyright">
Copyright &copy;
2012 <a href="https://www.apache.org/licenses/">The Apache Software Foundation.</a>
</div>
<!--+
|end bottomstrip
+-->
</div>
</body>
</html>