 <html>

 <!--
    Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements.  See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to You under the Apache License, Version 2.0
    (the "License"); you may not use this file except in compliance with
    the License.  You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
 -->

 <body>
 Run <a href="http://hadoop.apache.org/">Hadoop</a> MapReduce jobs over
 Avro data, with map and reduce functions written in Java.

 <p>Avro data files do not contain key/value pairs as expected by
   Hadoop's MapReduce API, but rather just a sequence of values.  This
   package therefore provides a layer on top of Hadoop's MapReduce API.</p>
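
 <p>The mismatch between a plain sequence of values and a key/value API can
   be pictured with a small in-memory model. This is only a sketch that uses
   the Java standard library: <code>ValueSequenceModel</code>,
   <code>KeyValue</code> and every method in it are hypothetical names made up
   for illustration, not classes of this package or of Hadoop. It assumes that
   each input datum is handed over in the key slot with an empty value slot,
   and that the job's result is again written out as a plain sequence.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stdlib-only model; none of these types are Avro or Hadoop classes.
final class ValueSequenceModel {

  /** Hypothetical stand-in for a key/value record as MapReduce sees it. */
  static final class KeyValue<K, V> {
    final K key;
    final V value;

    KeyValue(K key, V value) {
      this.key = key;
      this.value = value;
    }
  }

  /** Input side: each datum fills the key slot; the value slot stays empty. */
  static <T> List<KeyValue<T, Void>> wrapInput(List<T> data) {
    List<KeyValue<T, Void>> records = new ArrayList<>();
    for (T datum : data) {
      records.add(new KeyValue<T, Void>(datum, null));
    }
    return records;
  }

  /** Map step: one line of text in, one (word, 1) pair out per word. */
  static List<KeyValue<String, Long>> map(String line) {
    List<KeyValue<String, Long>> pairs = new ArrayList<>();
    for (String word : line.trim().split("\\s+")) {
      if (!word.isEmpty()) {
        pairs.add(new KeyValue<>(word, 1L));
      }
    }
    return pairs;
  }

  /** Shuffle step: group the map output values by key, in sorted key order. */
  static Map<String, List<Long>> shuffle(List<KeyValue<String, Long>> pairs) {
    Map<String, List<Long>> grouped = new TreeMap<>();
    for (KeyValue<String, Long> pair : pairs) {
      grouped.computeIfAbsent(pair.key, k -> new ArrayList<>()).add(pair.value);
    }
    return grouped;
  }

  /** Reduce step plus output side: emit a plain sequence of values again. */
  static List<String> run(List<String> lines) {
    List<KeyValue<String, Long>> mapped = new ArrayList<>();
    for (KeyValue<String, Void> record : wrapInput(lines)) {
      mapped.addAll(map(record.key));
    }
    List<String> output = new ArrayList<>();
    for (Map.Entry<String, List<Long>> entry : shuffle(mapped).entrySet()) {
      long sum = 0;
      for (long count : entry.getValue()) {
        sum += count;
      }
      output.add(entry.getKey() + "=" + sum);
    }
    return output;
  }

  public static void main(String[] args) {
    System.out.println(run(List.of("a b a", "b c")));
  }
}
```

 <p>Only the map output in this model carries real key/value pairs; the
   input and the final output are sequences of single values, which is the
   shape of an Avro data file.</p>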

 <p>In all cases, input and output paths are set and jobs are submitted
   as with standard Hadoop jobs:
  <ul>
    <li>Specify input files with {@link
    org.apache.hadoop.mapred.FileInputFormat#setInputPaths}</li>
    <li>Specify an output directory with {@link
    org.apache.hadoop.mapred.FileOutputFormat#setOutputPath}</li>
    <li>Run your job with {@link org.apache.hadoop.mapred.JobClient#runJob}</li>
  </ul>
 </p>

 <p>For jobs whose input and output are Avro data files:
  <ul>
    <li>Call {@link org.apache.avro.mapred.AvroJob#setInputSchema} and
    {@link org.apache.avro.mapred.AvroJob#setOutputSchema} with your
    job's input and output schemas.</li>
    <li>Subclass {@link org.apache.avro.mapred.AvroMapper} and specify
    this as your job's mapper with {@link
    org.apache.avro.mapred.AvroJob#setMapperClass}.</li>
    <li>Subclass {@link org.apache.avro.mapred.AvroReducer} and specify
    this as your job's reducer and, optionally, its combiner with {@link
    org.apache.avro.mapred.AvroJob#setReducerClass} and {@link
    org.apache.avro.mapred.AvroJob#setCombinerClass}.</li>
  </ul>
 </p>
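
 <p>The steps above might be combined as in the following word-count sketch.
   It needs the Avro mapred and Hadoop jars on the classpath and has not been
   run here. The class names <code>AvroWordCount</code>,
   <code>TokenizeMapper</code> and <code>SumReducer</code>, the choice of a
   string input schema, and the use of <code>args[0]</code> and
   <code>args[1]</code> as input and output paths are assumptions made for
   illustration. The job's output schema is a pair schema, so it is assumed
   to double as the map output schema and no separate intermediate schema is
   set.</p>

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

// Illustrative job: counts words in an Avro data file of strings.
public class AvroWordCount {

  /** Splits each input line into words and emits a (word, 1) pair per word. */
  public static class TokenizeMapper extends AvroMapper<Utf8, Pair<Utf8, Long>> {
    @Override
    public void map(Utf8 line, AvroCollector<Pair<Utf8, Long>> collector, Reporter reporter)
        throws IOException {
      StringTokenizer tokens = new StringTokenizer(line.toString());
      while (tokens.hasMoreTokens()) {
        collector.collect(new Pair<Utf8, Long>(new Utf8(tokens.nextToken()), 1L));
      }
    }
  }

  /** Sums the counts for one word; also usable as the combiner. */
  public static class SumReducer extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
    @Override
    public void reduce(Utf8 word, Iterable<Long> counts,
        AvroCollector<Pair<Utf8, Long>> collector, Reporter reporter) throws IOException {
      long sum = 0;
      for (long count : counts) {
        sum += count;
      }
      collector.collect(new Pair<Utf8, Long>(word, sum));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf job = new JobConf(AvroWordCount.class);
    job.setJobName("avro-wordcount");

    // Input records are strings; output records are (string, long) pairs.
    Schema stringSchema = Schema.create(Schema.Type.STRING);
    Schema longSchema = Schema.create(Schema.Type.LONG);
    AvroJob.setInputSchema(job, stringSchema);
    AvroJob.setOutputSchema(job, Pair.getPairSchema(stringSchema, longSchema));

    AvroJob.setMapperClass(job, TokenizeMapper.class);
    AvroJob.setCombinerClass(job, SumReducer.class);
    AvroJob.setReducerClass(job, SumReducer.class);

    // Paths and submission work as for any other Hadoop job.
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    JobClient.runJob(job);
  }
}
```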

 <p>For jobs whose input is an Avro data file and which use an {@link
   org.apache.avro.mapred.AvroMapper}, but whose reducer is a non-Avro
   {@link org.apache.hadoop.mapred.Reducer} and whose output is a
   non-Avro format:
  <ul>
    <li>Call {@link org.apache.avro.mapred.AvroJob#setInputSchema} with your
    job's input schema.</li>
    <li>Subclass {@link org.apache.avro.mapred.AvroMapper} and specify
    this as your job's mapper with {@link
    org.apache.avro.mapred.AvroJob#setMapperClass}.</li>
    <li>Implement {@link org.apache.hadoop.mapred.Reducer} and specify
    your job's reducer and combiner with {@link
    org.apache.hadoop.mapred.JobConf#setReducerClass} and {@link
    org.apache.hadoop.mapred.JobConf#setCombinerClass}.  The input key
    and value types should be {@link org.apache.avro.mapred.AvroKey} and {@link
    org.apache.avro.mapred.AvroValue}.</li>
    <li>Specify your job's output key and value types with {@link
    org.apache.hadoop.mapred.JobConf#setOutputKeyClass} and {@link
    org.apache.hadoop.mapred.JobConf#setOutputValueClass}.</li>
    <li>Specify your job's output format with {@link
    org.apache.hadoop.mapred.JobConf#setOutputFormat}.</li>
  </ul>
 </p>

 <p>For jobs whose input is a non-Avro data file and which use a
   non-Avro {@link org.apache.hadoop.mapred.Mapper}, but whose reducer
   is an {@link org.apache.avro.mapred.AvroReducer} and whose output is
   an Avro data file:
  <ul>
    <li>Set your input file format with {@link
    org.apache.hadoop.mapred.JobConf#setInputFormat}.</li>
    <li>Implement {@link org.apache.hadoop.mapred.Mapper} and specify
    your job's mapper with {@link
    org.apache.hadoop.mapred.JobConf#setMapperClass}.  The output key
    and value types should be {@link org.apache.avro.mapred.AvroKey} and
    {@link org.apache.avro.mapred.AvroValue}.</li>
    <li>Subclass {@link org.apache.avro.mapred.AvroReducer} and specify
    this as your job's reducer and, optionally, its combiner with {@link
    org.apache.avro.mapred.AvroJob#setReducerClass} and {@link
    org.apache.avro.mapred.AvroJob#setCombinerClass}.</li>
    <li>Call {@link org.apache.avro.mapred.AvroJob#setOutputSchema} with your
    job's output schema.</li>
  </ul>
 </p>

 </body>
 </html>