| <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> |
| <html> |
| <head> |
| <META http-equiv="Content-Type" content="text/html; charset=UTF-8"> |
| <meta content="Apache Forrest" name="Generator"> |
| <meta name="Forrest-version" content="0.9"> |
| <meta name="Forrest-skin-name" content="pelt"> |
| <title>Apache Avro™ 1.10.2 Getting Started (Java)</title> |
| <link type="text/css" href="skin/basic.css" rel="stylesheet"> |
| <link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet"> |
| <link media="print" type="text/css" href="skin/print.css" rel="stylesheet"> |
| <link type="text/css" href="skin/profile.css" rel="stylesheet"> |
| <script src="skin/getBlank.js" language="javascript" type="text/javascript"></script><script src="skin/getMenu.js" language="javascript" type="text/javascript"></script><script src="skin/fontsize.js" language="javascript" type="text/javascript"></script> |
| <link rel="shortcut icon" href="images/favicon.ico"> |
| </head> |
| <body onload="init()"> |
| <script type="text/javascript">ndeSetTextSize();</script> |
| <div id="top"> |
| <!--+ |
| |breadtrail |
| +--> |
| <div class="breadtrail"> |
| <a href="https://www.apache.org/">Apache</a> > <a href="https://avro.apache.org/">Avro</a> > <a href="https://avro.apache.org/">Avro</a><script src="skin/breadcrumbs.js" language="JavaScript" type="text/javascript"></script> |
| </div> |
| <!--+ |
| |header |
| +--> |
| <div class="header"> |
| <!--+ |
| |start group logo |
| +--> |
| <div class="grouplogo"> |
| <a href="https://www.apache.org/"><img class="logoImage" alt="Apache" src="images/apache_feather.gif" title="The Apache Software Foundation"></a> |
| </div> |
| <!--+ |
| |end group logo |
| +--> |
| <!--+ |
| |start Project Logo |
| +--> |
| <div class="projectlogo"> |
| <a href="https://avro.apache.org/"><img class="logoImage" alt="Avro" src="images/avro-logo.png" title="Serialization System"></a> |
| </div> |
| <!--+ |
| |end Project Logo |
| +--> |
| <!--+ |
| |start Search |
| +--> |
| <div class="searchbox"> |
| <form action="http://www.google.com/search" method="get" class="roundtopsmall"> |
| <input value="avro.apache.org" name="sitesearch" type="hidden"><input onFocus="getBlank (this, 'Search the site with google');" size="25" name="q" id="query" type="text" value="Search the site with google"> |
| <input name="Search" value="Search" type="submit"> |
| </form> |
| </div> |
| <!--+ |
| |end search |
| +--> |
| <!--+ |
| |start Tabs |
| +--> |
| <ul id="tabs"> |
| <li> |
| <a class="unselected" href="https://avro.apache.org/">Project</a> |
| </li> |
| <li> |
| <a class="unselected" href="https://cwiki.apache.org/confluence/display/AVRO/Index">Wiki</a> |
| </li> |
| <li class="current"> |
| <a class="selected" href="index.html">Avro 1.10.2 Documentation</a> |
| </li> |
| </ul> |
| <!--+ |
| |end Tabs |
| +--> |
| </div> |
| </div> |
| <div id="main"> |
| <div id="publishedStrip"> |
| <!--+ |
| |start Subtabs |
| +--> |
| <div id="level2tabs"></div> |
| <!--+ |
| |end Endtabs |
| +--> |
| <script type="text/javascript"><!-- |
| document.write("Last Published: " + document.lastModified); |
| // --></script> |
| </div> |
| <!--+ |
| |breadtrail |
| +--> |
| <div class="breadtrail"> |
| |
| |
| </div> |
| <!--+ |
| |start Menu, mainarea |
| +--> |
| <!--+ |
| |start Menu |
| +--> |
| <div id="menu"> |
| <div onclick="SwitchMenu('menu_selected_1.1', 'skin/')" id="menu_selected_1.1Title" class="menutitle" style="background-image: url('skin/images/chapter_open.gif');">Documentation</div> |
| <div id="menu_selected_1.1" class="selectedmenuitemgroup" style="display: block;"> |
| <div class="menuitem"> |
| <a href="index.html">Overview</a> |
| </div> |
| <div class="menupage"> |
| <div class="menupagetitle">Getting started (Java)</div> |
| </div> |
| <div class="menuitem"> |
| <a href="gettingstartedpython.html">Getting started (Python)</a> |
| </div> |
| <div class="menuitem"> |
| <a href="spec.html">Specification</a> |
| </div> |
| <div class="menuitem"> |
| <a href="trevni/spec.html">Trevni</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/java/index.html">Java API</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/c/index.html">C API</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/cpp/html/index.html">C++ API</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/csharp/html/index.html">C# API</a> |
| </div> |
| <div class="menuitem"> |
| <a href="mr.html">MapReduce guide</a> |
| </div> |
| <div class="menuitem"> |
| <a href="idl.html">IDL language</a> |
| </div> |
| <div class="menuitem"> |
| <a href="sasl.html">SASL profile</a> |
| </div> |
| <div class="menuitem"> |
| <a href="https://cwiki.apache.org/confluence/display/AVRO/Index">Wiki</a> |
| </div> |
| <div class="menuitem"> |
| <a href="https://cwiki.apache.org/confluence/display/AVRO/FAQ">FAQ</a> |
| </div> |
| </div> |
| <div id="credit"></div> |
| <div id="roundbottom"> |
| <img style="display: none" class="corner" height="15" width="15" alt="" src="skin/images/rc-b-l-15-1body-2menu-3menu.png"></div> |
| <!--+ |
| |alternative credits |
| +--> |
| <div id="credit2"></div> |
| </div> |
| <!--+ |
| |end Menu |
| +--> |
| <!--+ |
| |start content |
| +--> |
| <div id="content"> |
| <div title="Portable Document Format" class="pdflink"> |
| <a class="dida" href="gettingstartedjava.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif" class="skin"><br> |
| PDF</a> |
| </div> |
| <h1>Apache Avro™ 1.10.2 Getting Started (Java)</h1> |
| <div id="front-matter"> |
| <div id="minitoc-area"> |
| <ul class="minitoc"> |
| <li> |
| <a href="#download_install">Download</a> |
| </li> |
| <li> |
| <a href="#Defining+a+schema">Defining a schema</a> |
| </li> |
| <li> |
| <a href="#Serializing+and+deserializing+with+code+generation">Serializing and deserializing with code generation</a> |
| <ul class="minitoc"> |
| <li> |
| <a href="#Compiling+the+schema">Compiling the schema</a> |
| </li> |
| <li> |
| <a href="#Creating+Users">Creating Users</a> |
| </li> |
| <li> |
| <a href="#Serializing">Serializing</a> |
| </li> |
| <li> |
| <a href="#Deserializing">Deserializing</a> |
| </li> |
| <li> |
| <a href="#Compiling+and+running+the+example+code">Compiling and running the example code</a> |
| </li> |
| <li> |
| <a href="#Beta+feature%3A+Generating+faster+code">Beta feature: Generating faster code</a> |
| </li> |
| </ul> |
| </li> |
| <li> |
| <a href="#Serializing+and+deserializing+without+code+generation">Serializing and deserializing without code generation</a> |
| <ul class="minitoc"> |
| <li> |
| <a href="#Creating+users">Creating users</a> |
| </li> |
| <li> |
| <a href="#Serializing-N101F7">Serializing</a> |
| </li> |
| <li> |
| <a href="#Deserializing-N10220">Deserializing</a> |
| </li> |
| <li> |
| <a href="#Compiling+and+running+the+example+code-N10269">Compiling and running the example code</a> |
| </li> |
| </ul> |
| </li> |
| </ul> |
| </div> |
| </div> |
| |
| <p> |
| This is a short guide for getting started with Apache Avro™ using |
| Java. This guide only covers using Avro for data serialization; see |
| Patrick Hunt's <a href="https://github.com/phunt/avro-rpc-quickstart">Avro |
| RPC Quick Start</a> for a good introduction to using Avro for RPC. |
| </p> |
| |
| <a name="download_install"></a> |
| <h2 class="h3">Download</h2> |
| <div class="section"> |
| <p> |
| Avro implementations for C, C++, C#, Java, PHP, Python, and Ruby can be |
| downloaded from the <a href="https://avro.apache.org/releases.html">Apache Avro™ |
| Releases</a> page. This guide uses Avro 1.10.2, the latest |
| version at the time of writing. For the examples in this guide, |
| download <em>avro-1.10.2.jar</em> and |
| <em>avro-tools-1.10.2.jar</em>. |
| </p> |
| <p> |
| Alternatively, if you are using Maven, add the following dependency to |
| your POM: |
| </p> |
| <pre class="code"> |
| <dependency> |
| <groupId>org.apache.avro</groupId> |
| <artifactId>avro</artifactId> |
| <version>1.10.2</version> |
| </dependency> |
| </pre> |
| <p> |
| As well as the Avro Maven plugin (for performing code generation): |
| </p> |
| <pre class="code"> |
| <plugin> |
| <groupId>org.apache.avro</groupId> |
| <artifactId>avro-maven-plugin</artifactId> |
| <version>1.10.2</version> |
| <executions> |
| <execution> |
| <phase>generate-sources</phase> |
| <goals> |
| <goal>schema</goal> |
| </goals> |
| <configuration> |
| <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory> |
| <outputDirectory>${project.basedir}/src/main/java/</outputDirectory> |
| </configuration> |
| </execution> |
| </executions> |
| </plugin> |
| <plugin> |
| <groupId>org.apache.maven.plugins</groupId> |
| <artifactId>maven-compiler-plugin</artifactId> |
| <configuration> |
| <source>1.8</source> |
| <target>1.8</target> |
| </configuration> |
| </plugin> |
| </pre> |
| <p> |
| You may also build the required Avro jars from source. Building Avro is |
| beyond the scope of this guide; see the <a href="https://cwiki.apache.org/AVRO/Build+Documentation">Build |
| Documentation</a> page in the wiki for more information. |
| </p> |
| </div> |
| |
| |
| <a name="Defining+a+schema"></a> |
| <h2 class="h3">Defining a schema</h2> |
| <div class="section"> |
| <p> |
| Avro schemas are defined using JSON. Schemas are composed of <a href="spec.html#schema_primitive">primitive types</a> |
| (<span class="codefrag">null</span>, <span class="codefrag">boolean</span>, <span class="codefrag">int</span>, |
| <span class="codefrag">long</span>, <span class="codefrag">float</span>, <span class="codefrag">double</span>, |
| <span class="codefrag">bytes</span>, and <span class="codefrag">string</span>) and <a href="spec.html#schema_complex">complex types</a> (<span class="codefrag">record</span>, |
| <span class="codefrag">enum</span>, <span class="codefrag">array</span>, <span class="codefrag">map</span>, |
| <span class="codefrag">union</span>, and <span class="codefrag">fixed</span>). You can learn more about |
| Avro schemas and types from the specification, but for now let's start |
| with a simple schema example, <em>user.avsc</em>: |
| </p> |
| <pre class="code"> |
| {"namespace": "example.avro", |
| "type": "record", |
| "name": "User", |
| "fields": [ |
| {"name": "name", "type": "string"}, |
| {"name": "favorite_number", "type": ["int", "null"]}, |
| {"name": "favorite_color", "type": ["string", "null"]} |
| ] |
| } |
| </pre> |
| <p> |
| This schema defines a record representing a hypothetical user. (Note |
| that a schema file can only contain a single schema definition.) At |
| minimum, a record definition must include its type (<span class="codefrag">"type": |
| "record"</span>), a name (<span class="codefrag">"name": "User"</span>), and fields, in |
| this case <span class="codefrag">name</span>, <span class="codefrag">favorite_number</span>, and |
| <span class="codefrag">favorite_color</span>. We also define a namespace |
| (<span class="codefrag">"namespace": "example.avro"</span>), which together with the name |
| attribute defines the "full name" of the schema |
| (<span class="codefrag">example.avro.User</span> in this case). |
| |
| </p> |
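| <p> |
| The way a namespace and a name combine into a full name can be sketched in a few lines of plain Java. |
| The <span class="codefrag">FullNames</span> class below is a hypothetical helper, not part of the Avro API; |
| it assumes the naming rule from the specification, under which a name that already contains a dot is |
| treated as a full name and any separate namespace is ignored. |
| </p> |

```java
// Hypothetical helper for illustration only; not part of the Avro library.
final class FullNames {
    private FullNames() {}

    /** Combines a namespace and a name into a full name. */
    static String fullName(String namespace, String name) {
        // A name that already contains a dot is taken to be a full name.
        if (name.contains(".")) {
            return name;
        }
        // Without a namespace, the full name is just the name.
        if (namespace == null || namespace.isEmpty()) {
            return name;
        }
        return namespace + "." + name;
    }

    public static void main(String[] args) {
        System.out.println(fullName("example.avro", "User")); // prints example.avro.User
    }
}
```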
| <p> |
| Fields are defined via an array of objects, each of which defines a name |
| and type (other attributes are optional, see the <a href="spec.html#schema_record">record specification</a> for more |
| details). The type attribute of a field is another schema object, which |
| can be either a primitive or complex type. For example, the |
| <span class="codefrag">name</span> field of our User schema is the primitive type |
| <span class="codefrag">string</span>, whereas the <span class="codefrag">favorite_number</span> and |
| <span class="codefrag">favorite_color</span> fields are both <span class="codefrag">union</span>s, |
| represented by JSON arrays. <span class="codefrag">union</span>s are a complex type that |
| can be any of the types listed in the array; e.g., |
| <span class="codefrag">favorite_number</span> can either be an <span class="codefrag">int</span> or |
| <span class="codefrag">null</span>, essentially making it an optional field. |
| </p> |
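| <p> |
| To make the idea of a union concrete, here is a minimal plain-Java sketch that picks the branch of a |
| union matching a given value. <span class="codefrag">UnionSketch</span> and its methods are hypothetical |
| names used only for illustration, and the sketch handles just the three types that appear in |
| <em>user.avsc</em>; it is not how Avro itself resolves unions. |
| </p> |

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of union matching; not Avro's implementation.
final class UnionSketch {
    private UnionSketch() {}

    /** Returns the first branch of the union that accepts the value, or throws. */
    static String resolveBranch(List<String> branches, Object value) {
        for (String branch : branches) {
            if (accepts(branch, value)) {
                return branch;
            }
        }
        throw new IllegalArgumentException(
            "Value " + value + " matches no branch of " + branches);
    }

    // Only the types used in user.avsc are modelled here.
    private static boolean accepts(String branch, Object value) {
        switch (branch) {
            case "null":   return value == null;
            case "int":    return value instanceof Integer;
            case "string": return value instanceof CharSequence;
            default:       return false;
        }
    }

    public static void main(String[] args) {
        List<String> favoriteNumber = Arrays.asList("int", "null");
        System.out.println(resolveBranch(favoriteNumber, 256));  // prints int
        System.out.println(resolveBranch(favoriteNumber, null)); // prints null
    }
}
```

| <p> |
| Because <span class="codefrag">null</span> is one of the branches, leaving the field unset is as valid as |
| supplying an <span class="codefrag">int</span>, which is what makes the field behave as optional. |
| </p> |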
| </div> |
| |
| |
| <a name="Serializing+and+deserializing+with+code+generation"></a> |
| <h2 class="h3">Serializing and deserializing with code generation</h2> |
| <div class="section"> |
| <a name="Compiling+the+schema"></a> |
| <h3 class="h4">Compiling the schema</h3> |
| <p> |
| Code generation allows us to automatically create classes based on our |
| previously-defined schema. Once we have defined the relevant classes, |
| there is no need to use the schema directly in our programs. We use the |
| avro-tools jar to generate code as follows: |
| </p> |
| <pre class="code"> |
| java -jar /path/to/avro-tools-1.10.2.jar compile schema <schema file> <destination> |
| </pre> |
| <p> |
| This will generate the appropriate source files in a package based on |
| the schema's namespace in the provided destination folder. For |
| instance, to generate a <span class="codefrag">User</span> class in package |
| <span class="codefrag">example.avro</span> from the schema defined above, run |
| </p> |
| <pre class="code"> |
| java -jar /path/to/avro-tools-1.10.2.jar compile schema user.avsc . |
| </pre> |
| <p> |
| Note that if you are using the Avro Maven plugin, there is no need to |
| manually invoke the schema compiler; the plugin automatically |
| performs code generation on any .avsc files present in the configured |
| source directory. |
| </p> |
| <a name="Creating+Users"></a> |
| <h3 class="h4">Creating Users</h3> |
| <p> |
| Now that we've completed the code generation, let's create some |
| <span class="codefrag">User</span>s, serialize them to a data file on disk, and then |
| read back the file and deserialize the <span class="codefrag">User</span> objects. |
| </p> |
| <p> |
| First let's create some <span class="codefrag">User</span>s and set their fields. |
| </p> |
| <pre class="code"> |
| User user1 = new User(); |
| user1.setName("Alyssa"); |
| user1.setFavoriteNumber(256); |
| // Leave favorite color null |
| |
| // Alternate constructor |
| User user2 = new User("Ben", 7, "red"); |
| |
| // Construct via builder |
| User user3 = User.newBuilder() |
| .setName("Charlie") |
| .setFavoriteColor("blue") |
| .setFavoriteNumber(null) |
| .build(); |
| </pre> |
| <p> |
| As shown in this example, Avro objects can be created either by |
| invoking a constructor directly or by using a builder. Unlike |
| constructors, builders will automatically set any default values |
| specified in the schema. Additionally, builders validate the data as |
| it is set, whereas objects constructed directly will not cause an error |
| until the object is serialized. However, using constructors directly |
| generally offers better performance, as builders create a copy of the |
| data structure before it is written. |
| </p> |
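| <p> |
| The trade-off between constructors and builders can be illustrated without Avro. The sketch below uses |
| a hypothetical <span class="codefrag">SketchPerson</span> class with an assumed default of |
| <span class="codefrag">"unknown"</span> for its <span class="codefrag">country</span> field. It throws |
| standard Java exceptions rather than Avro's own, so treat it as an analogy for the behavior described |
| above, not as the generated code. |
| </p> |

```java
// Hypothetical class for illustration; generated Avro classes differ in detail.
final class SketchPerson {
    String name;     // required, no default
    String country;  // assumed to have the default "unknown"

    // Direct construction: no defaults are applied and nothing is validated.
    SketchPerson() {}

    static Builder newBuilder() {
        return new Builder();
    }

    static final class Builder {
        private String name;
        private String country;
        private boolean countrySet;

        Builder setName(String name) {
            // Validation happens as the value is set.
            if (name == null) {
                throw new IllegalArgumentException("name must not be null");
            }
            this.name = name;
            return this;
        }

        Builder setCountry(String country) {
            this.country = country;
            this.countrySet = true;
            return this;
        }

        SketchPerson build() {
            if (name == null) {
                throw new IllegalStateException("name has no default and was never set");
            }
            // The builder copies its state into a new object, filling in defaults.
            SketchPerson person = new SketchPerson();
            person.name = name;
            person.country = countrySet ? country : "unknown";
            return person;
        }
    }
}
```

| <p> |
| Here <span class="codefrag">SketchPerson.newBuilder().setName("Alyssa").build()</span> yields a |
| <span class="codefrag">country</span> of <span class="codefrag">"unknown"</span>, while |
| <span class="codefrag">new SketchPerson()</span> leaves both fields <span class="codefrag">null</span> |
| and reports nothing until something later inspects the object. |
| </p> |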
| <p> |
| Note that we do not set <span class="codefrag">user1</span>'s favorite color. Since |
| that field is of type <span class="codefrag">["string", "null"]</span>, we can either |
| set it to a <span class="codefrag">string</span> or leave it <span class="codefrag">null</span>; it is |
| essentially optional. Similarly, we set <span class="codefrag">user3</span>'s favorite |
| number to null (using a builder requires setting all fields, even if |
| they are null). |
| </p> |
| <a name="Serializing"></a> |
| <h3 class="h4">Serializing</h3> |
| <p> |
| Now let's serialize our <span class="codefrag">User</span>s to disk. |
| </p> |
| <pre class="code"> |
| // Serialize user1, user2 and user3 to disk |
| File file = new File("users.avro"); |
| DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class); |
| DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter); |
| dataFileWriter.create(user1.getSchema(), file); |
| dataFileWriter.append(user1); |
| dataFileWriter.append(user2); |
| dataFileWriter.append(user3); |
| dataFileWriter.close(); |
| </pre> |
| <p> |
| We create a <span class="codefrag">DatumWriter</span>, which converts Java objects into |
| an in-memory serialized format. The <span class="codefrag">SpecificDatumWriter</span> |
| class is used with generated classes and extracts the schema from the |
| specified generated type. |
| </p> |
| <p> |
| Next we create a <span class="codefrag">DataFileWriter</span>, which writes the |
| serialized records, as well as the schema, to the file specified in the |
| <span class="codefrag">dataFileWriter.create</span> call. We write our users to the file |
| via calls to the <span class="codefrag">dataFileWriter.append</span> method. When we are |
| done writing, we close the data file. |
| </p> |
| <a name="Deserializing"></a> |
| <h3 class="h4">Deserializing</h3> |
| <p> |
| Finally, let's deserialize the data file we just created. |
| </p> |
| <pre class="code"> |
| // Deserialize Users from disk |
| DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class); |
| DataFileReader<User> dataFileReader = new DataFileReader<User>(file, userDatumReader); |
| User user = null; |
| while (dataFileReader.hasNext()) { |
| // Reuse user object by passing it to next(). This saves us from |
| // allocating and garbage collecting many objects for files with |
| // many items. |
| user = dataFileReader.next(user); |
| System.out.println(user); |
| } |
| </pre> |
| <p> |
| This snippet will output: |
| </p> |
| <pre class="code"> |
| {"name": "Alyssa", "favorite_number": 256, "favorite_color": null} |
| {"name": "Ben", "favorite_number": 7, "favorite_color": "red"} |
| {"name": "Charlie", "favorite_number": null, "favorite_color": "blue"} |
| </pre> |
| <p> |
| Deserializing is very similar to serializing. We create a |
| <span class="codefrag">SpecificDatumReader</span>, analogous to the |
| <span class="codefrag">SpecificDatumWriter</span> we used in serialization, which |
| converts in-memory serialized items into instances of our generated |
| class, in this case <span class="codefrag">User</span>. We pass the |
| <span class="codefrag">DatumReader</span> and the previously created <span class="codefrag">File</span> |
| to a <span class="codefrag">DataFileReader</span>, analogous to the |
| <span class="codefrag">DataFileWriter</span>, which reads both the schema used by the |
| writer as well as the data from the file on disk. The data will be |
| read using the writer's schema included in the file and the |
| schema provided by the reader, in this case the <span class="codefrag">User</span> |
| class. The writer's schema is needed to know the order in which |
| fields were written, while the reader's schema is needed to know what |
| fields are expected and how to fill in default values for fields |
| added since the file was written. If there are differences between |
| the two schemas, they are resolved according to the |
| <a href="spec.html#Schema+Resolution">Schema Resolution</a> |
| specification. |
| </p> |
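| <p> |
| The division of labor between the two schemas can be sketched in plain Java. In the hypothetical |
| <span class="codefrag">ResolutionSketch</span> below, the writer's field order labels the serialized |
| values, and the reader's field list decides which fields appear in the result and which default fills a |
| gap. This is a deliberately simplified model: it assumes every reader field has a default and ignores |
| type promotion and the other rules in the Schema Resolution specification. |
| </p> |

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of schema resolution; not Avro's implementation.
final class ResolutionSketch {
    private ResolutionSketch() {}

    /**
     * @param writerFields   field names in the order the writer serialized them
     * @param values         serialized values, in that same order
     * @param readerDefaults reader field name mapped to its default value
     */
    static Map<String, Object> resolve(List<String> writerFields,
                                       List<Object> values,
                                       Map<String, Object> readerDefaults) {
        // Step 1: the writer's schema tells us which value belongs to which field.
        Map<String, Object> written = new LinkedHashMap<>();
        for (int i = 0; i < writerFields.size(); i++) {
            written.put(writerFields.get(i), values.get(i));
        }
        // Step 2: the reader's schema decides which fields we expect. Fields the
        // writer did not know about get the reader's default; writer fields the
        // reader does not list are dropped.
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerDefaults.entrySet()) {
            String name = field.getKey();
            result.put(name, written.containsKey(name) ? written.get(name) : field.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> readerDefaults = new LinkedHashMap<>();
        readerDefaults.put("name", null);
        readerDefaults.put("favorite_number", null);
        readerDefaults.put("favorite_color", "green"); // added after the file was written
        System.out.println(resolve(
            Arrays.asList("name", "favorite_number"),
            Arrays.<Object>asList("Alyssa", 256),
            readerDefaults));
    }
}
```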
| <p> |
| Next we use the <span class="codefrag">DataFileReader</span> to iterate through the |
| serialized <span class="codefrag">User</span>s and print the deserialized object to |
| stdout. Note how we perform the iteration: we create a single |
| <span class="codefrag">User</span> object in which we store the current deserialized |
| user, and pass this record object to every call of |
| <span class="codefrag">dataFileReader.next</span>. This is a performance optimization |
| that allows the <span class="codefrag">DataFileReader</span> to reuse the same |
| <span class="codefrag">User</span> object rather than allocating a new |
| <span class="codefrag">User</span> for every iteration, which can be very expensive in |
| terms of object allocation and garbage collection if we deserialize a |
| large data file. While this technique is the standard way to iterate |
| through a data file, it's also possible to use <span class="codefrag">for (User user : |
| dataFileReader)</span> if performance is not a concern. |
| </p> |
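| <p> |
| The reuse pattern itself is independent of Avro. The following plain-Java sketch uses hypothetical |
| <span class="codefrag">Row</span> and <span class="codefrag">RowReader</span> classes to show what passing |
| the previous object back into <span class="codefrag">next</span> achieves: the reader overwrites the |
| object it is given and allocates a new one only when it receives <span class="codefrag">null</span>. |
| </p> |

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical classes illustrating the next(reuse) pattern; not Avro code.
final class ReuseSketch {
    private ReuseSketch() {}

    /** A mutable record that can be refilled on each iteration. */
    static final class Row {
        String name;
    }

    static final class RowReader {
        private final Iterator<String> source;

        RowReader(Iterator<String> source) {
            this.source = source;
        }

        boolean hasNext() {
            return source.hasNext();
        }

        /** Fills and returns {@code reuse} if it is non-null; otherwise allocates a new Row. */
        Row next(Row reuse) {
            Row row = (reuse != null) ? reuse : new Row();
            row.name = source.next();
            return row;
        }
    }

    public static void main(String[] args) {
        RowReader reader = new RowReader(Arrays.asList("Alyssa", "Ben", "Charlie").iterator());
        Row row = null; // the first call allocates, later calls reuse
        while (reader.hasNext()) {
            row = reader.next(row);
            System.out.println(row.name);
        }
    }
}
```

| <p> |
| One consequence of this pattern is that the object is overwritten on every iteration, so code that needs |
| to keep a record beyond the current pass of the loop should copy it rather than hold on to the reference. |
| </p> |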
| <a name="Compiling+and+running+the+example+code"></a> |
| <h3 class="h4">Compiling and running the example code</h3> |
| <p> |
| This example code is included as a Maven project in the |
| <em>examples/java-example</em> directory in the Avro docs. From this |
| directory, execute the following commands to build and run the |
| example: |
| </p> |
| <pre class="code"> |
| $ mvn compile # includes code generation via Avro Maven plugin |
| $ mvn -q exec:java -Dexec.mainClass=example.SpecificMain |
| </pre> |
| <a name="Beta+feature%3A+Generating+faster+code"></a> |
| <h3 class="h4">Beta feature: Generating faster code</h3> |
| <p> |
| In this release we have introduced a new approach to |
| generating code that speeds up decoding of objects by more |
| than 10% and encoding by more than 30% (future performance |
| enhancements are underway). To ensure a smooth introduction |
| of this change into production systems, this feature is |
| controlled by a feature flag, the system |
| property <span class="codefrag">org.apache.avro.specific.use_custom_coders</span>. |
| In this first release, this feature is off by default. To |
| turn it on, set the system property to <span class="codefrag">true</span> at |
| runtime. In the sample above, for example, you could enable |
| the faster coders as follows: |
| </p> |
| <pre class="code"> |
| $ mvn -q exec:java -Dexec.mainClass=example.SpecificMain \ |
| -Dorg.apache.avro.specific.use_custom_coders=true |
| </pre> |
| <p> |
| Note that you do <em>not</em> have to recompile your Avro |
| schema to have access to this feature. The feature is |
| compiled and built into your code, and you turn it on and |
| off at runtime using the feature flag. As a result, you can |
| turn it on during testing, for example, and then off in |
| production. Or you can turn it on in production, and |
| quickly turn it off if something breaks. |
| </p> |
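| <p> |
| Since the flag is an ordinary JVM system property, its current value can be inspected from Java. The |
| hypothetical <span class="codefrag">CoderFlagSketch</span> class below only reads the property, treating a |
| missing value as <span class="codefrag">false</span>; it makes no claim about how or when Avro itself |
| consults the property. We assume here that the property should be in place before any Avro classes are |
| used, which passing <span class="codefrag">-D</span> on the command line guarantees. |
| </p> |

```java
// Hypothetical helper that reports whether the feature flag has been set.
final class CoderFlagSketch {
    static final String FLAG = "org.apache.avro.specific.use_custom_coders";

    private CoderFlagSketch() {}

    /** Returns true only if the system property is present and equal to "true" (case-insensitive). */
    static boolean customCodersRequested() {
        return Boolean.parseBoolean(System.getProperty(FLAG, "false"));
    }

    public static void main(String[] args) {
        System.out.println(FLAG + " = " + customCodersRequested());
    }
}
```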
| <p> |
| We encourage the Avro community to exercise this new feature |
| early to help build confidence. (For those paying |
| on-demand for compute resources in the cloud, it can lead |
| to meaningful cost savings.) As confidence builds, we will |
| turn this feature on by default, and eventually eliminate |
| the feature flag (and the old code). |
| </p> |
| </div> |
| |
| |
| <a name="Serializing+and+deserializing+without+code+generation"></a> |
| <h2 class="h3">Serializing and deserializing without code generation</h2> |
| <div class="section"> |
| <p> |
| Data in Avro is always stored with its corresponding schema, meaning we |
| can always read a serialized item regardless of whether we know the |
| schema ahead of time. This allows us to perform serialization and |
| deserialization without code generation. |
| </p> |
| <p> |
| Let's go over the same example as in the previous section, but without |
| using code generation: we'll create some users, serialize them to a data |
| file on disk, and then read back the file and deserialize the user |
| objects. |
| </p> |
| <a name="Creating+users"></a> |
| <h3 class="h4">Creating users</h3> |
| <p> |
| First, we use a <span class="codefrag">Parser</span> to read our schema definition and |
| create a <span class="codefrag">Schema</span> object. |
| </p> |
| <pre class="code"> |
| Schema schema = new Schema.Parser().parse(new File("user.avsc")); |
| </pre> |
| <p> |
| Using this schema, let's create some users. |
| </p> |
| <pre class="code"> |
| GenericRecord user1 = new GenericData.Record(schema); |
| user1.put("name", "Alyssa"); |
| user1.put("favorite_number", 256); |
| // Leave favorite color null |
| |
| GenericRecord user2 = new GenericData.Record(schema); |
| user2.put("name", "Ben"); |
| user2.put("favorite_number", 7); |
| user2.put("favorite_color", "red"); |
| </pre> |
| <p> |
| Since we're not using code generation, we use |
| <span class="codefrag">GenericRecord</span>s to represent users. |
| <span class="codefrag">GenericRecord</span> uses the schema to verify that we only |
| specify valid fields. If we try to set a non-existent field (e.g., |
| <span class="codefrag">user1.put("favorite_animal", "cat")</span>), we'll get an |
| <span class="codefrag">AvroRuntimeException</span> when we run the program. |
| </p> |
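| <p> |
| The field-name check can be pictured with a small plain-Java analogy. The hypothetical |
| <span class="codefrag">CheckedRecordSketch</span> below keeps the set of field names taken from a schema |
| and rejects any <span class="codefrag">put</span> for a name outside that set. It throws |
| <span class="codefrag">IllegalArgumentException</span> where Avro would throw |
| <span class="codefrag">AvroRuntimeException</span>. |
| </p> |

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical analogy for a schema-checked record; not Avro's GenericData.Record.
final class CheckedRecordSketch {
    private final Set<String> fieldNames;
    private final Map<String, Object> values = new HashMap<>();

    CheckedRecordSketch(String... fieldNames) {
        this.fieldNames = new LinkedHashSet<>(Arrays.asList(fieldNames));
    }

    /** Stores a value, rejecting names that the schema does not define. */
    void put(String field, Object value) {
        if (!fieldNames.contains(field)) {
            throw new IllegalArgumentException("Not a valid schema field: " + field);
        }
        values.put(field, value);
    }

    /** Returns the stored value, or null if the field was never set. */
    Object get(String field) {
        return values.get(field);
    }

    public static void main(String[] args) {
        CheckedRecordSketch user = new CheckedRecordSketch("name", "favorite_number", "favorite_color");
        user.put("name", "Alyssa");
        try {
            user.put("favorite_animal", "cat");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints Not a valid schema field: favorite_animal
        }
    }
}
```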
| <p> |
| Note that we do not set <span class="codefrag">user1</span>'s favorite color. Since |
| that field is of type <span class="codefrag">["string", "null"]</span>, we can either |
| set it to a <span class="codefrag">string</span> or leave it <span class="codefrag">null</span>; it is |
| essentially optional. |
| </p> |
| <a name="Serializing-N101F7"></a> |
| <h3 class="h4">Serializing</h3> |
| <p> |
| Now that we've created our user objects, serializing and deserializing |
| them is almost identical to the example above which uses code |
| generation. The main difference is that we use generic instead of |
| specific readers and writers. |
| </p> |
| <p> |
| First we'll serialize our users to a data file on disk. |
| </p> |
| <pre class="code"> |
| // Serialize user1 and user2 to disk |
| File file = new File("users.avro"); |
| DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema); |
| DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter); |
| dataFileWriter.create(schema, file); |
| dataFileWriter.append(user1); |
| dataFileWriter.append(user2); |
| dataFileWriter.close(); |
| </pre> |
| <p> |
| We create a <span class="codefrag">DatumWriter</span>, which converts Java objects into |
| an in-memory serialized format. Since we are not using code |
| generation, we create a <span class="codefrag">GenericDatumWriter</span>. It requires |
| the schema both to determine how to write the |
| <span class="codefrag">GenericRecord</span>s and to verify that all non-nullable fields |
| are present. |
| </p> |
| <p> |
| As in the code generation example, we also create a |
| <span class="codefrag">DataFileWriter</span>, which writes the serialized records, as |
| well as the schema, to the file specified in the |
| <span class="codefrag">dataFileWriter.create</span> call. We write our users to the |
| file via calls to the <span class="codefrag">dataFileWriter.append</span> method. When |
| we are done writing, we close the data file. |
| </p> |
| <a name="Deserializing-N10220"></a> |
| <h3 class="h4">Deserializing</h3> |
| <p> |
| Finally, we'll deserialize the data file we just created. |
| </p> |
| <pre class="code"> |
| // Deserialize users from disk |
| DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema); |
| DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(file, datumReader); |
| GenericRecord user = null; |
| while (dataFileReader.hasNext()) { |
| // Reuse user object by passing it to next(). This saves us from |
| // allocating and garbage collecting many objects for files with |
| // many items. |
| user = dataFileReader.next(user); |
| System.out.println(user); |
| } |
| </pre> |
| <p>This outputs:</p> |
| <pre class="code"> |
| {"name": "Alyssa", "favorite_number": 256, "favorite_color": null} |
| {"name": "Ben", "favorite_number": 7, "favorite_color": "red"} |
| </pre> |
| <p> |
| Deserializing is very similar to serializing. We create a |
| <span class="codefrag">GenericDatumReader</span>, analogous to the |
| <span class="codefrag">GenericDatumWriter</span> we used in serialization, which |
| converts in-memory serialized items into <span class="codefrag">GenericRecord</span>s. |
| We pass the <span class="codefrag">DatumReader</span> and the previously created |
| <span class="codefrag">File</span> to a <span class="codefrag">DataFileReader</span>, analogous to the |
| <span class="codefrag">DataFileWriter</span>, which reads both the schema used by the |
| writer as well as the data from the file on disk. The data will be |
| read using the writer's schema included in the file, and the reader's |
| schema provided to the <span class="codefrag">GenericDatumReader</span>. The writer's |
| schema is needed to know the order in which fields were written, |
| while the reader's schema is needed to know what fields are expected |
| and how to fill in default values for fields added since the file |
| was written. If there are differences between the two schemas, they |
| are resolved according to the |
| <a href="spec.html#Schema+Resolution">Schema Resolution</a> |
| specification. |
| </p> |
| <p> |
| Next, we use the <span class="codefrag">DataFileReader</span> to iterate through the |
| serialized users and print the deserialized object to stdout. Note |
| how we perform the iteration: we create a single |
| <span class="codefrag">GenericRecord</span> object in which we store the current |
| deserialized user, and pass this record object to every call of |
| <span class="codefrag">dataFileReader.next</span>. This is a performance optimization |
| that allows the <span class="codefrag">DataFileReader</span> to reuse the same record |
| object rather than allocating a new <span class="codefrag">GenericRecord</span> for |
| every iteration, which can be very expensive in terms of object |
| allocation and garbage collection if we deserialize a large data file. |
| While this technique is the standard way to iterate through a data |
| file, it's also possible to use <span class="codefrag">for (GenericRecord user : |
| dataFileReader)</span> if performance is not a concern. |
| </p> |
| <a name="Compiling+and+running+the+example+code-N10269"></a> |
| <h3 class="h4">Compiling and running the example code</h3> |
| <p> |
| This example code is included as a Maven project in the |
| <em>examples/java-example</em> directory in the Avro docs. From this |
| directory, execute the following commands to build and run the |
| example: |
| </p> |
| <pre class="code"> |
| $ mvn compile |
| $ mvn -q exec:java -Dexec.mainClass=example.GenericMain |
| </pre> |
| </div> |
| |
| </div> |
| <!--+ |
| |end content |
| +--> |
| <div class="clearboth"> </div> |
| </div> |
| <div id="footer"> |
| <!--+ |
| |start bottomstrip |
| +--> |
| <div class="lastmodified"> |
| <script type="text/javascript"><!-- |
| document.write("Last Published: " + document.lastModified); |
| // --></script> |
| </div> |
| <div class="copyright"> |
| Copyright © |
| 2012 <a href="https://www.apache.org/licenses/">The Apache Software Foundation.</a> |
| </div> |
| <!--+ |
| |end bottomstrip |
| +--> |
| </div> |
| </body> |
| </html> |