<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN"
"https://forrest.apache.org/dtd/document-v20.dtd" [
<!ENTITY % avro-entities PUBLIC "-//Apache//ENTITIES Avro//EN"
"../../../../build/avro.ent">
%avro-entities;
]>
<document>
<header>
<title>Apache Avro&#153; &AvroVersion; Getting Started (Java)</title>
</header>
<body>
<p>
This is a short guide for getting started with Apache Avro&#153; using
Java. This guide only covers using Avro for data serialization; see
Patrick Hunt's <a href="https://github.com/phunt/avro-rpc-quickstart">Avro
RPC Quick Start</a> for a good introduction to using Avro for RPC.
</p>
<section id="download_install">
<title>Download</title>
<p>
Avro implementations for C, C++, C#, Java, PHP, Python, and Ruby can be
downloaded from the <a
href="https://avro.apache.org/releases.html">Apache Avro&#153;
Releases</a> page. This guide uses Avro &AvroVersion;, the latest
version at the time of writing. For the examples in this guide,
download <em>avro-&AvroVersion;.jar</em> and
<em>avro-tools-&AvroVersion;.jar</em>.
</p>
<p>
Alternatively, if you are using Maven, add the following dependency to
your POM:
</p>
<source>
&#60;dependency&#62;
  &#60;groupId&#62;org.apache.avro&#60;/groupId&#62;
  &#60;artifactId&#62;avro&#60;/artifactId&#62;
  &#60;version&#62;&AvroVersion;&#60;/version&#62;
&#60;/dependency&#62;
</source>
<p>
As well as the Avro Maven plugin (for performing code generation):
</p>
<source>
&#60;plugin&#62;
  &#60;groupId&#62;org.apache.avro&#60;/groupId&#62;
  &#60;artifactId&#62;avro-maven-plugin&#60;/artifactId&#62;
  &#60;version&#62;&AvroVersion;&#60;/version&#62;
  &#60;executions&#62;
    &#60;execution&#62;
      &#60;phase&#62;generate-sources&#60;/phase&#62;
      &#60;goals&#62;
        &#60;goal&#62;schema&#60;/goal&#62;
      &#60;/goals&#62;
      &#60;configuration&#62;
        &#60;sourceDirectory&#62;${project.basedir}/src/main/avro/&#60;/sourceDirectory&#62;
        &#60;outputDirectory&#62;${project.basedir}/src/main/java/&#60;/outputDirectory&#62;
      &#60;/configuration&#62;
    &#60;/execution&#62;
  &#60;/executions&#62;
&#60;/plugin&#62;
&#60;plugin&#62;
  &#60;groupId&#62;org.apache.maven.plugins&#60;/groupId&#62;
  &#60;artifactId&#62;maven-compiler-plugin&#60;/artifactId&#62;
  &#60;configuration&#62;
    &#60;source&#62;1.8&#60;/source&#62;
    &#60;target&#62;1.8&#60;/target&#62;
  &#60;/configuration&#62;
&#60;/plugin&#62;
</source>
<p>
You may also build the required Avro jars from source. Building Avro is
beyond the scope of this guide; see the <a
href="https://cwiki.apache.org/AVRO/Build+Documentation">Build
Documentation</a> page in the wiki for more information.
</p>
</section>
<section>
<title>Defining a schema</title>
<p>
Avro schemas are defined using JSON. Schemas are composed of <a
href="spec.html#schema_primitive">primitive types</a>
(<code>null</code>, <code>boolean</code>, <code>int</code>,
<code>long</code>, <code>float</code>, <code>double</code>,
<code>bytes</code>, and <code>string</code>) and <a
href="spec.html#schema_complex">complex types</a> (<code>record</code>,
<code>enum</code>, <code>array</code>, <code>map</code>,
<code>union</code>, and <code>fixed</code>). You can learn more about
Avro schemas and types from the specification, but for now let's start
with a simple schema example, <em>user.avsc</em>:
</p>
<source>
{"namespace": "example.avro",
"type": "record",
"name": "User",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
</source>
<p>
This schema defines a record representing a hypothetical user. (Note
that a schema file can only contain a single schema definition.) At
minimum, a record definition must include its type (<code>"type":
"record"</code>), a name (<code>"name": "User"</code>), and fields, in
this case <code>name</code>, <code>favorite_number</code>, and
<code>favorite_color</code>. We also define a namespace
(<code>"namespace": "example.avro"</code>), which together with the name
attribute defines the "full name" of the schema
(<code>example.avro.User</code> in this case).
</p>
<p>
Fields are defined via an array of objects, each of which defines a name
and type (other attributes are optional, see the <a
href="spec.html#schema_record">record specification</a> for more
details). The type attribute of a field is another schema object, which
can be either a primitive or complex type. For example, the
<code>name</code> field of our User schema is the primitive type
<code>string</code>, whereas the <code>favorite_number</code> and
<code>favorite_color</code> fields are both <code>union</code>s,
represented by JSON arrays. <code>union</code>s are a complex type that
can be any of the types listed in the array; e.g.,
<code>favorite_number</code> can either be an <code>int</code> or
<code>null</code>, essentially making it an optional field.
</p>
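      <p>
        If you also want a default value for an optional field, the Avro
        specification requires the default to match the first branch of the
        union, so an optional field that defaults to <code>null</code> is
        usually written with <code>"null"</code> listed first. As a hedged
        illustration (this variant is not part of <em>user.avsc</em> above):
      </p>
      <source>
{"name": "favorite_number", "type": ["null", "int"], "default": null}
      </source>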
</section>
<section>
<title>Serializing and deserializing with code generation</title>
<section>
<title>Compiling the schema</title>
<p>
Code generation allows us to automatically create classes based on our
previously-defined schema. Once we have defined the relevant classes,
there is no need to use the schema directly in our programs. We use the
avro-tools jar to generate code as follows:
</p>
<source>
java -jar /path/to/avro-tools-&AvroVersion;.jar compile schema &#60;schema file&#62; &#60;destination&#62;
</source>
<p>
This will generate the appropriate source files in a package based on
the schema's namespace in the provided destination folder. For
instance, to generate a <code>User</code> class in package
<code>example.avro</code> from the schema defined above, run
</p>
<source>
java -jar /path/to/avro-tools-&AvroVersion;.jar compile schema user.avsc .
</source>
<p>
Note that if you are using the Avro Maven plugin, there is no need to
manually invoke the schema compiler; the plugin automatically
performs code generation on any .avsc files present in the configured
source directory.
</p>
</section>
<section>
<title>Creating Users</title>
<p>
Now that we've completed the code generation, let's create some
<code>User</code>s, serialize them to a data file on disk, and then
read back the file and deserialize the <code>User</code> objects.
</p>
<p>
First let's create some <code>User</code>s and set their fields.
</p>
<source>
User user1 = new User();
user1.setName("Alyssa");
user1.setFavoriteNumber(256);
// Leave favorite color null
// Alternate constructor
User user2 = new User("Ben", 7, "red");
// Construct via builder
User user3 = User.newBuilder()
             .setName("Charlie")
             .setFavoriteColor("blue")
             .setFavoriteNumber(null)
             .build();
</source>
<p>
As shown in this example, Avro objects can be created either by
invoking a constructor directly or by using a builder. Unlike
constructors, builders will automatically set any default values
specified in the schema. Additionally, builders validate the data as
it is set, whereas objects constructed directly will not cause an error
until the object is serialized. However, using constructors directly
generally offers better performance, as builders create a copy of the
data structure before it is written.
</p>
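        <p>
          For example, because builders validate eagerly, leaving out a
          required field fails at <code>build()</code> time rather than at
          serialization time. A minimal sketch (not part of the example
          project):
        </p>
        <source>
// The "name" field has no default, so the builder rejects the record as
// soon as build() is called, long before anything is written to disk.
try {
  User incomplete = User.newBuilder()
      .setFavoriteNumber(1)
      .build();
} catch (org.apache.avro.AvroRuntimeException e) {
  System.err.println("Builder rejected incomplete record: " + e.getMessage());
}
        </source>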
<p>
Note that we do not set <code>user1</code>'s favorite color. Since
that field is of type <code>["string", "null"]</code>, we can either
set it to a <code>string</code> or leave it <code>null</code>; it is
essentially optional. Similarly, we set <code>user3</code>'s favorite
number to null (using a builder requires setting all fields, even if
they are null).
</p>
</section>
<section>
<title>Serializing</title>
<p>
Now let's serialize our <code>User</code>s to disk.
</p>
<source>
// Serialize user1, user2 and user3 to disk
DatumWriter&#60;User&#62; userDatumWriter = new SpecificDatumWriter&#60;User&#62;(User.class);
DataFileWriter&#60;User&#62; dataFileWriter = new DataFileWriter&#60;User&#62;(userDatumWriter);
dataFileWriter.create(user1.getSchema(), new File("users.avro"));
dataFileWriter.append(user1);
dataFileWriter.append(user2);
dataFileWriter.append(user3);
dataFileWriter.close();
</source>
<p>
We create a <code>DatumWriter</code>, which converts Java objects into
an in-memory serialized format. The <code>SpecificDatumWriter</code>
class is used with generated classes and extracts the schema from the
specified generated type.
</p>
<p>
Next we create a <code>DataFileWriter</code>, which writes the
serialized records, as well as the schema, to the file specified in the
<code>dataFileWriter.create</code> call. We write our users to the file
via calls to the <code>dataFileWriter.append</code> method. When we are
done writing, we close the data file.
</p>
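        <p>
          <code>DataFileWriter</code> can also compress the blocks it writes.
          As an optional, hedged sketch (not shown in the example project),
          you could enable the built-in deflate codec by calling
          <code>setCodec</code> before <code>create</code>:
        </p>
        <source>
// Optional: compress data blocks with the built-in deflate codec.
// Requires org.apache.avro.file.CodecFactory; setCodec must be called
// before create().
dataFileWriter.setCodec(CodecFactory.deflateCodec(9));
dataFileWriter.create(user1.getSchema(), new File("users.avro"));
        </source>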
</section>
<section>
<title>Deserializing</title>
<p>
Finally, let's deserialize the data file we just created.
</p>
<source>
// Deserialize Users from disk
File file = new File("users.avro");
DatumReader&#60;User&#62; userDatumReader = new SpecificDatumReader&#60;User&#62;(User.class);
DataFileReader&#60;User&#62; dataFileReader = new DataFileReader&#60;User&#62;(file, userDatumReader);
User user = null;
while (dataFileReader.hasNext()) {
  // Reuse user object by passing it to next(). This saves us from
  // allocating and garbage collecting many objects for files with
  // many items.
  user = dataFileReader.next(user);
  System.out.println(user);
}
</source>
<p>
This snippet will output:
</p>
<source>
{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
{"name": "Charlie", "favorite_number": null, "favorite_color": "blue"}
</source>
<p>
Deserializing is very similar to serializing. We create a
<code>SpecificDatumReader</code>, analogous to the
<code>SpecificDatumWriter</code> we used in serialization, which
converts in-memory serialized items into instances of our generated
class, in this case <code>User</code>. We pass the
<code>DatumReader</code> and the previously created <code>File</code>
to a <code>DataFileReader</code>, analogous to the
<code>DataFileWriter</code>, which reads both the schema used by the
writer as well as the data from the file on disk. The data will be
read using the writer's schema included in the file and the
schema provided by the reader, in this case the <code>User</code>
class. The writer's schema is needed to know the order in which
fields were written, while the reader's schema is needed to know what
fields are expected and how to fill in default values for fields
added since the file was written. If there are differences between
the two schemas, they are resolved according to the
<a href="spec.html#Schema+Resolution">Schema Resolution</a>
specification.
</p>
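        <p>
          For instance, a reader schema that adds a field with a default can
          still read files written with the original schema; the new field is
          simply filled from its default. A hedged illustration (the
          <code>country</code> field is hypothetical and not part of
          <em>user.avsc</em>):
        </p>
        <source>
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]},
     {"name": "country", "type": "string", "default": "unknown"}
 ]
}
        </source>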
<p>
Next we use the <code>DataFileReader</code> to iterate through the
serialized <code>User</code>s and print the deserialized object to
stdout. Note how we perform the iteration: we create a single
<code>User</code> object which we store the current deserialized user
in, and pass this record object to every call of
<code>dataFileReader.next</code>. This is a performance optimization
that allows the <code>DataFileReader</code> to reuse the same
<code>User</code> object rather than allocating a new
<code>User</code> for every iteration, which can be very expensive in
terms of object allocation and garbage collection if we deserialize a
large data file. While this technique is the standard way to iterate
through a data file, it's also possible to use <code>for (User user :
dataFileReader)</code> if performance is not a concern.
</p>
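        <p>
          For reference, the simpler iteration style mentioned above looks
          like this (a hedged sketch; it allocates a new <code>User</code>
          per record, which is fine when performance is not a concern):
        </p>
        <source>
// Simpler iteration without object reuse; DataFileReader is Iterable and Closeable.
try (DataFileReader&#60;User&#62; reader =
         new DataFileReader&#60;User&#62;(new File("users.avro"), new SpecificDatumReader&#60;User&#62;(User.class))) {
  for (User u : reader) {
    System.out.println(u);
  }
}
        </source>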
</section>
<section>
<title>Compiling and running the example code</title>
<p>
This example code is included as a Maven project in the
<em>examples/java-example</em> directory in the Avro docs. From this
directory, execute the following commands to build and run the
example:
</p>
<source>
$ mvn compile # includes code generation via Avro Maven plugin
$ mvn -q exec:java -Dexec.mainClass=example.SpecificMain
</source>
</section>
<section>
<title>Beta feature: Generating faster code</title>
<p>
In this release we have introduced a new approach to
generating code that speeds up decoding of objects by more
than 10% and encoding by more than 30% (future performance
enhancements are underway). To ensure a smooth introduction
of this change into production systems, this feature is
controlled by a feature flag, the system
property <code>org.apache.avro.specific.use_custom_coders</code>.
In this first release, this feature is off by default. To
turn it on, set the system property to <code>true</code> at
runtime. In the sample above, for example, you could enable
the faster coders as follows:
</p>
<source>
$ mvn -q exec:java -Dexec.mainClass=example.SpecificMain \
-Dorg.apache.avro.specific.use_custom_coders=true
</source>
<p>
Note that you do <em>not</em> have to recompile your Avro
schema to have access to this feature. The feature is
compiled and built into your code, and you turn it on and
off at runtime using the feature flag. As a result, you can
turn it on during testing, for example, and then off in
production. Or you can turn it on in production, and
quickly turn it off if something breaks.
</p>
<p>
We encourage the Avro community to exercise this new feature
early to help build confidence. (For those paying
on-demand for compute resources in the cloud, it can lead
to meaningful cost savings.) As confidence builds, we will
turn this feature on by default, and eventually eliminate
the feature flag (and the old code).
</p>
</section>
</section>
<section>
<title>Serializing and deserializing without code generation</title>
<p>
Data in Avro is always stored with its corresponding schema, meaning we
can always read a serialized item regardless of whether we know the
schema ahead of time. This allows us to perform serialization and
deserialization without code generation.
</p>
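      <p>
        As a quick, hedged illustration of this point (not part of the
        example project), you can open an Avro data file without supplying
        any schema at all and recover the schema that was stored alongside
        the data:
      </p>
      <source>
// The file embeds the writer's schema, so no schema needs to be supplied here.
DatumReader&#60;GenericRecord&#62; anyReader = new GenericDatumReader&#60;GenericRecord&#62;();
DataFileReader&#60;GenericRecord&#62; anyFileReader =
    new DataFileReader&#60;GenericRecord&#62;(new File("users.avro"), anyReader);
System.out.println(anyFileReader.getSchema());  // schema recovered from the file
anyFileReader.close();
      </source>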
<p>
Let's go over the same example as in the previous section, but without
using code generation: we'll create some users, serialize them to a data
file on disk, and then read back the file and deserialize the user
objects.
</p>
<section>
<title>Creating users</title>
<p>
First, we use a <code>Parser</code> to read our schema definition and
create a <code>Schema</code> object.
</p>
<source>
Schema schema = new Schema.Parser().parse(new File("user.avsc"));
</source>
<p>
Using this schema, let's create some users.
</p>
<source>
GenericRecord user1 = new GenericData.Record(schema);
user1.put("name", "Alyssa");
user1.put("favorite_number", 256);
// Leave favorite color null
GenericRecord user2 = new GenericData.Record(schema);
user2.put("name", "Ben");
user2.put("favorite_number", 7);
user2.put("favorite_color", "red");
</source>
<p>
Since we're not using code generation, we use
<code>GenericRecord</code>s to represent users.
<code>GenericRecord</code> uses the schema to verify that we only
specify valid fields. If we try to set a non-existent field (e.g.,
<code>user1.put("favorite_animal", "cat")</code>), we'll get an
<code>AvroRuntimeException</code> when we run the program.
</p>
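        <p>
          A minimal sketch of that failure mode (the field name is
          hypothetical and not part of the example project):
        </p>
        <source>
// GenericData.Record checks field names against the schema, so an unknown
// field is rejected immediately with an AvroRuntimeException.
try {
  user1.put("favorite_animal", "cat");
} catch (org.apache.avro.AvroRuntimeException e) {
  System.err.println("Rejected unknown field: " + e.getMessage());
}
        </source>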
<p>
Note that we do not set <code>user1</code>'s favorite color. Since
that field is of type <code>["string", "null"]</code>, we can either
set it to a <code>string</code> or leave it <code>null</code>; it is
essentially optional.
</p>
</section>
<section>
<title>Serializing</title>
<p>
Now that we've created our user objects, serializing and deserializing
them is almost identical to the example above which uses code
generation. The main difference is that we use generic instead of
specific readers and writers.
</p>
<p>
First we'll serialize our users to a data file on disk.
</p>
<source>
// Serialize user1 and user2 to disk
File file = new File("users.avro");
DatumWriter&#60;GenericRecord&#62; datumWriter = new GenericDatumWriter&#60;GenericRecord&#62;(schema);
DataFileWriter&#60;GenericRecord&#62; dataFileWriter = new DataFileWriter&#60;GenericRecord&#62;(datumWriter);
dataFileWriter.create(schema, file);
dataFileWriter.append(user1);
dataFileWriter.append(user2);
dataFileWriter.close();
</source>
<p>
We create a <code>DatumWriter</code>, which converts Java objects into
an in-memory serialized format. Since we are not using code
generation, we create a <code>GenericDatumWriter</code>. It requires
the schema both to determine how to write the
<code>GenericRecord</code>s and to verify that all non-nullable fields
are present.
</p>
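        <p>
          For example, a record that never sets its non-nullable
          <code>name</code> field is accepted when constructed but rejected
          when written. A hedged sketch (the exact exception type may vary by
          Avro version):
        </p>
        <source>
GenericRecord incomplete = new GenericData.Record(schema);
incomplete.put("favorite_number", 3);   // "name" (a non-nullable string) is never set
try {
  // This append (done before dataFileWriter.close()) fails when the record
  // is serialized against the schema, not when the record was built.
  dataFileWriter.append(incomplete);
} catch (RuntimeException e) {
  System.err.println("Write rejected: " + e.getMessage());
}
        </source>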
<p>
As in the code generation example, we also create a
<code>DataFileWriter</code>, which writes the serialized records, as
well as the schema, to the file specified in the
<code>dataFileWriter.create</code> call. We write our users to the
file via calls to the <code>dataFileWriter.append</code> method. When
we are done writing, we close the data file.
</p>
</section>
<section>
<title>Deserializing</title>
<p>
Finally, we'll deserialize the data file we just created.
</p>
<source>
// Deserialize users from disk
DatumReader&#60;GenericRecord&#62; datumReader = new GenericDatumReader&#60;GenericRecord&#62;(schema);
DataFileReader&#60;GenericRecord&#62; dataFileReader = new DataFileReader&#60;GenericRecord&#62;(file, datumReader);
GenericRecord user = null;
while (dataFileReader.hasNext()) {
  // Reuse user object by passing it to next(). This saves us from
  // allocating and garbage collecting many objects for files with
  // many items.
  user = dataFileReader.next(user);
  System.out.println(user);
}
</source>
<p>This outputs:</p>
<source>
{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
</source>
<p>
Deserializing is very similar to serializing. We create a
<code>GenericDatumReader</code>, analogous to the
<code>GenericDatumWriter</code> we used in serialization, which
converts in-memory serialized items into <code>GenericRecords</code>.
We pass the <code>DatumReader</code> and the previously created
<code>File</code> to a <code>DataFileReader</code>, analogous to the
<code>DataFileWriter</code>, which reads both the schema used by the
writer as well as the data from the file on disk. The data will be
read using the writer's schema included in the file, and the reader's
schema provided to the <code>GenericDatumReader</code>. The writer's
schema is needed to know the order in which fields were written,
while the reader's schema is needed to know what fields are expected
and how to fill in default values for fields added since the file
was written. If there are differences between the two schemas, they
are resolved according to the
<a href="spec.html#Schema+Resolution">Schema Resolution</a>
specification.
</p>
<p>
Next, we use the <code>DataFileReader</code> to iterate through the
serialized users and print the deserialized object to stdout. Note
how we perform the iteration: we create a single
<code>GenericRecord</code> object which we store the current
deserialized user in, and pass this record object to every call of
<code>dataFileReader.next</code>. This is a performance optimization
that allows the <code>DataFileReader</code> to reuse the same record
object rather than allocating a new <code>GenericRecord</code> for
every iteration, which can be very expensive in terms of object
allocation and garbage collection if we deserialize a large data file.
While this technique is the standard way to iterate through a data
file, it's also possible to use <code>for (GenericRecord user :
dataFileReader)</code> if performance is not a concern.
</p>
</section>
<section>
<title>Compiling and running the example code</title>
<p>
This example code is included as a Maven project in the
<em>examples/java-example</em> directory in the Avro docs. From this
directory, execute the following commands to build and run the
example:
</p>
<source>
$ mvn compile
$ mvn -q exec:java -Dexec.mainClass=example.GenericMain
</source>
</section>
</section>
</body>
</document>