 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
 <!--
    Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements.  See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to You under the Apache License, Version 2.0
    (the "License"); you may not use this file except in compliance with
    the License.  You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
 -->
 <html>
 <head>
    <title>Avro</title>
 </head>
 <body>Avro is a data serialization system.

   <h2>Overview</h2>

   <p>Avro provides:
     <ul>
       <li>Rich data structures.
       <li>A compact, fast, binary data format.
       <li>A container file, to store persistent data.
       <li>Remote procedure call (RPC).
       <li>Simple integration with dynamic languages.  Code generation
       is not required to read or write data files, nor to use or
       implement RPC protocols.  Code generation is an optional
       optimization, only worth implementing for statically typed
       languages.
     </ul>

   <h2>Schemas</h2>

   <p>Avro relies on <i>{@link org.apache.avro.Schema schemas}</i>.
   When Avro data is read, the schema used when writing it is always
   present.  This permits each datum to be written with no per-value
   overheads, making serialization both fast and small.  This also
   facilitates use with dynamic, scripting languages, since data,
   together with its schema, is fully self-describing.
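
   <p>To make the "no per-value overheads" point concrete, here is a
   minimal sketch of untagged encoding, not Avro's actual encoder
   classes.  It assumes a hypothetical record with a string field
   <code>name</code> followed by a long field <code>age</code>, and
   follows the Avro binary encoding's use of zig-zag variable-length
   integers and length-prefixed UTF-8 strings.  The class and method
   names are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; these are not classes from the Avro library. */
final class UntaggedEncodingSketch {

    /** Writes a long as a zig-zag, variable-length integer (7 bits per byte). */
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);              // zig-zag: small magnitudes stay small
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));   // high bit set: more bytes follow
            z >>>= 7;
        }
        out.write((int) z);
    }

    /** Writes a string as its UTF-8 byte length followed by the bytes. */
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /**
     * Encodes the hypothetical record {name: string, age: long}.
     * Field values are written back to back in schema order; no field
     * names, field IDs, or type tags are emitted, because the reader
     * is assumed to have the writer's schema.
     */
    static byte[] encodeUser(String name, long age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeLong(out, age);
        return out.toByteArray();
    }
}
```

   <p>Under these assumptions, <code>encodeUser("Al", 30)</code> yields
   four bytes: one length byte, two UTF-8 bytes, and one byte for the
   age.  Decoding them is only possible with the schema in hand, which
   is why Avro always keeps the writer's schema available to readers.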

   <p>When Avro data is stored in a {@link
   org.apache.avro.file.DataFileWriter file}, its schema is stored with
   it, so that files may be processed later by any program.  If the
   program reading the data expects a different schema, the difference
   can be easily resolved, since both schemas are present.
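
   <p>The resolution described above can be sketched as a lookup by
   field name.  This is a simplified illustration, not Avro's resolver:
   the <code>ReaderField</code> type and <code>resolve</code> method
   are hypothetical, records are modelled as plain lists and maps, and
   type promotion and nested schemas are ignored.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; these are not classes from the Avro library. */
final class NameResolutionSketch {

    /** A field of the reader's (expected) schema, with an optional default. */
    static final class ReaderField {
        final String name;
        final boolean hasDefault;
        final Object defaultValue;

        ReaderField(String name, boolean hasDefault, Object defaultValue) {
            this.name = name;
            this.hasDefault = hasDefault;
            this.defaultValue = defaultValue;
        }
    }

    /**
     * Builds the record the reader expects from values written with the
     * writer's schema. Fields are matched by name: extra writer fields are
     * dropped, and missing fields fall back to the reader's default.
     */
    static Map<String, Object> resolve(List<String> writerFieldNames,
                                       List<Object> writerValues,
                                       List<ReaderField> readerFields) {
        // Index the written values by the writer's field names.
        Map<String, Object> written = new LinkedHashMap<>();
        for (int i = 0; i < writerFieldNames.size(); i++) {
            written.put(writerFieldNames.get(i), writerValues.get(i));
        }

        // Walk the reader's fields in the reader's order.
        Map<String, Object> result = new LinkedHashMap<>();
        for (ReaderField f : readerFields) {
            if (written.containsKey(f.name)) {
                result.put(f.name, written.get(f.name));
            } else if (f.hasDefault) {
                result.put(f.name, f.defaultValue);
            } else {
                throw new IllegalStateException(
                    "Reader field '" + f.name + "' is missing and has no default");
            }
        }
        return result;
    }
}
```

   <p>Because matching is by name, neither side needs manually assigned
   field numbers; the only requirement in this sketch is that a field
   absent from the written data has a default in the reader's schema.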

   <p>When Avro is used in {@link org.apache.avro.ipc RPC}, the client
     and server exchange schemas in the connection handshake.  (This
     can be optimized so that, for most calls, no schemas are actually
     transmitted.)  Since the client and server each have the other's
     full schema, correspondence between same-named fields, missing
     fields, extra fields, etc. can all be easily resolved.

   <p>Avro schemas are defined
   with <a href="http://www.json.org/">JSON</a>.  This facilitates
   implementation in languages that already have JSON libraries.

   <h2>Comparison with other systems</h2>

   <p>Avro provides functionality similar to systems such
   as <a href="http://incubator.apache.org/thrift/">Thrift</a>,
   <a href="http://code.google.com/protobuf/">Protocol Buffers</a>,
   etc.  Avro differs from these systems in the following fundamental
   aspects.
   <ul>
     <li><i>Dynamic typing</i>: Avro does not require that code be
     generated.  Data is always accompanied by a schema that permits
     full processing of that data without code generation, static
     datatypes, etc.  This facilitates construction of generic
     data-processing systems and languages.
     <li><i>Untagged data</i>: Since the schema is present when data is
     read, considerably less type information need be encoded with
     data, resulting in smaller serialization size.</li>
     <li><i>No manually-assigned field IDs</i>: When a schema changes,
     both the old and new schema are always present when processing
     data, so differences may be resolved symbolically, using field
     names.
   </ul>

 </body>
 </html>