| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <html> |
| <head> |
| <title>Avro</title> |
| </head> |
| <body>Avro is a data serialization system. |
| |
| <h2>Overview</h2> |
| |
| <p>Avro provides: |
| <ul> |
| <li>Rich data structures. |
| <li>A compact, fast, binary data format. |
| <li>A container file, to store persistent data. |
| <li>Remote procedure call (RPC). |
| <li>Simple integration with dynamic languages. Code generation |
| is not required to read or write data files nor to use or |
| implement RPC protocols. Code generation as an optional |
| optimization, only worth implementing for statically typed |
| languages. |
| </ul> |
| |
| <h2>Schemas</h2> |
| |
| <p>Avro relies on <i>{@link org.apache.avro.Schema schemas}</i>. |
| When Avro data is read, the schema used when writing it is always |
| present. This permits each datum to be written with no per-value |
| overheads, making serialization both fast and small. This also |
| facilitates use with dynamic, scripting languages, since data, |
| together with its schema, is fully self-describing. |
| |
| <p>When Avro data is stored in a {@link |
| org.apache.avro.file.DataFileWriter file}, its schema is stored with |
| it, so that files may be processed later by any program. If the |
| program reading the data expects a different schema this can be |
| easily resolved, since both schemas are present. |
| |
| <p>When Avro is used in {@link org.apache.avro.ipc RPC}, the client |
| and server exchange schemas in the connection handshake. (This |
| can be optimized so that, for most calls, no schemas are actually |
| transmitted.) Since both client and server both have the other's |
| full schema, correspondence between same named fields, missing |
| fields, extra fields, etc. can all be easily resolved. |
| |
| <p>Avro schemas are defined with |
| with <a href="http://www.json.org/">JSON</a> . This facilitates |
| implementation in languages that already have JSON libraries. |
| |
| <h2>Comparison with other systems</h2> |
| |
| Avro provides functionality similar to systems such |
| as <a href="http://incubator.apache.org/thrift/">Thrift</a>, |
| <a href="http://code.google.com/protobuf/">Protocol Buffers</a>, |
| etc. Avro differs from these systems in the following fundamental |
| aspects. |
| <ul> |
| <li><i>Dynamic typing</i>: Avro does not require that code be |
| generated. Data is always accompanied by a schema that permits |
| full processing of that data without code generation, static |
| datatypes, etc. This facilitates construction of generic |
| data-processing systems and languages. |
| <li><i>Untagged data</i>: Since the schema is present when data is |
| read, considerably less type information need be encoded with |
| data, resulting in smaller serialization size.</li> |
| <li><i>No manually-assigned field IDs</i>: When a schema changes, |
| both the old and new schema are always present when processing |
| data, so differences may be resolved symbolically, using field |
| names. |
| </ul> |
| |
| </body> |
| </html> |