<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head>
<title>Hadoop 0.17.1 Release Notes</title></head>
<body>
<font face="sans-serif">
<h1>Hadoop 0.17.1 Release Notes</h1>
The bug fixes are listed below.
<ul><a name="changes"></a>
<h2>Changes Since Hadoop 0.17.0</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-2159'>HADOOP-2159</a>] - Namenode stuck in safemode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3442'>HADOOP-3442</a>] - QuickSort may get into unbounded recursion
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3472'>HADOOP-3472</a>] - MapFile.Reader getClosest() function returns incorrect results when before is true
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3475'>HADOOP-3475</a>] - MapOutputBuffer allocates 4x as much space to record capacity as intended
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3477'>HADOOP-3477</a>] - release tar.gz contains duplicate files
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3522'>HADOOP-3522</a>] - ValuesIterator.next() doesn't return a new object, thus failing many equals() tests.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3526'>HADOOP-3526</a>] - contrib/data_join doesn't work
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3550'>HADOOP-3550</a>] - Reduce tasks failing with OOM
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3565'>HADOOP-3565</a>] - JavaSerialization can throw java.io.StreamCorruptedException
</li>
</ul>
</ul>
<h1>Hadoop 0.17.0 Release Notes</h1>
These release notes include new developer- and user-facing incompatibilities, features, and major improvements. The table below is sorted by Component.
<ul><a name="changes"></a>
<h2>Changes Since Hadoop 0.16.4</h2>
<table border="1" width="100%" cellpadding="4">
<tbody><tr>
<td><b>Issue</b></td>
<td><b>Component</b></td>
<td><b>Notes</b></td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2828">HADOOP-2828</a>
</td>
<td>
conf
</td>
<td>
Removed these deprecated methods from
<tt>org.apache.hadoop.conf.Configuration</tt> (a replacement sketch follows the list):<br><tt><ul><li>
public Object getObject(String name) </li><li>
public void setObject(String name, Object value) </li><li>
public Object get(String name, Object defaultValue) </li><li>
public void set(String name, Object value)</li><li>public Iterator entries()
</li></ul></tt>
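<p>
Code that used the removed Object-typed accessors can switch to the String-typed
ones; a minimal sketch (the key name is illustrative):
<pre>
Configuration conf = new Configuration();
conf.set("example.key", "some value");             // replaces setObject(name, value)
String value = conf.get("example.key", "default"); // replaces get(name, defaultValue)
</pre>
</td>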
</tr>
<tr>
<td nowrap>
<a href="https://issues.apache.org/jira/browse/HADOOP-2410">HADOOP-2410</a>
</td>
<td>
contrib/ec2
</td>
<td>
The command <tt>hadoop-ec2
run</tt> has been replaced by <tt>hadoop-ec2 launch-cluster
&lt;group&gt; &lt;number of instances&gt;</tt>, and <tt>hadoop-ec2
start-hadoop</tt> has been removed since Hadoop is now started automatically on instance
startup. See <a href="http://wiki.apache.org/hadoop/AmazonEC2">http://wiki.apache.org/hadoop/AmazonEC2</a>
for details.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2796">HADOOP-2796</a>
</td>
<td>
contrib/hod
</td>
<td>
Added a provision to reliably detect a
failing script's exit code. When a script run via the HOD script option
returns a non-zero exit code, HOD writes a <tt>script.exitcode</tt>
file to the HOD cluster directory. If this file is present, the script
failed with the exit code recorded in the file.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2775">HADOOP-2775</a>
</td>
<td>
contrib/hod
</td>
<td>
Added a unit testing framework based on
PyUnit to HOD. Developers contributing patches to HOD should now
contribute unit tests along with their patches when possible.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3137">HADOOP-3137</a>
</td>
<td>
contrib/hod
</td>
<td>
The HOD version is now the same as the Hadoop version.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2855">HADOOP-2855</a>
</td>
<td>
contrib/hod
</td>
<td>
HOD now handles relative
paths correctly for important HOD options such as the cluster directory,
tarball option, and script file.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2899">HADOOP-2899</a>
</td>
<td>
contrib/hod
</td>
<td>
HOD now cleans up the HOD generated mapred system directory
at cluster deallocation time.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2982">HADOOP-2982</a>
</td>
<td>
contrib/hod
</td>
<td>
The number of free nodes in the cluster
is computed using a better algorithm that filters out inconsistencies in
node status as reported by Torque.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2947">HADOOP-2947</a>
</td>
<td>
contrib/hod
</td>
<td>
The stdout and stderr streams of
daemons are redirected to files that are created under the hadoop log
directory. Users can now send a <tt>kill -3</tt> (SIGQUIT) signal to the daemons to get stack traces
and thread dumps for debugging.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3168">HADOOP-3168</a>
</td>
<td>
contrib/streaming
</td>
<td>
Decreased the frequency of logging
in Hadoop streaming (from every 100 records to every 10,000 records).
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3040">HADOOP-3040</a>
</td>
<td>
contrib/streaming
</td>
<td>
Fixed a critical bug to restore important functionality in Hadoop streaming. If the first character on a line is
the separator, then an empty key is assumed and the whole line is the value.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2820">HADOOP-2820</a>
</td>
<td>
contrib/streaming
</td>
<td>
Removed these deprecated classes: <br><tt><ul><li>org.apache.hadoop.streaming.StreamLineRecordReader</li><li>org.apache.hadoop.streaming.StreamOutputFormat</li><li>org.apache.hadoop.streaming.StreamSequenceRecordReader</li></ul></tt></td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3280">HADOOP-3280</a>
</td>
<td>
contrib/streaming
</td>
<td>
Added the
<tt>mapred.child.ulimit</tt> configuration variable to limit the maximum virtual memory allocated to processes launched by the
Map-Reduce framework. This can be used to control both the Mapper/Reducer
tasks and applications using Hadoop pipes, Hadoop streaming, etc.
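<p>
A minimal sketch of setting the limit from job code (the one-gigabyte value is
illustrative; the value is in kilobytes, following the <tt>ulimit -v</tt> convention):
<pre>
JobConf conf = new JobConf();
// Cap each child process launched for this job at roughly 1 GB of virtual memory.
conf.set("mapred.child.ulimit", "1048576");
</pre>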
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2657">HADOOP-2657</a>
</td>
<td>
dfs
</td>
<td>Added the new API <tt>DFSOutputStream.flush()</tt> to
flush all outstanding data to DataNodes.
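<p>
A minimal usage sketch (the path is illustrative; the call goes through the
<tt>FSDataOutputStream</tt> returned by <tt>FileSystem.create</tt>, which wraps the
underlying <tt>DFSOutputStream</tt>):
<pre>
FileSystem fs = FileSystem.get(new Configuration());
FSDataOutputStream out = fs.create(new Path("/tmp/example"));
out.write("partial record".getBytes());
out.flush(); // push buffered bytes to the DataNodes before the file is closed
out.close();
</pre>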
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2219">HADOOP-2219</a>
</td>
<td>
dfs
</td>
<td>
Added a new <tt>fs -count</tt> command for
counting the number of bytes, files, and directories under a given path. <br>
<br>
Added a new RPC <tt>getContentSummary(String path)</tt> to ClientProtocol.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2559">HADOOP-2559</a>
</td>
<td>
dfs
</td>
<td>
Changed DFS block placement to
allocate the first replica on the local node, the second on a remote rack, and
the third on the same rack as the second.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2758">HADOOP-2758</a>
</td>
<td>
dfs
</td>
<td>
Reduced DataNode CPU usage by about 50% while serving data to clients.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2634">HADOOP-2634</a>
</td>
<td>
dfs
</td>
<td>
Deprecated ClientProtocol's <tt>exists()</tt> method. Use <tt>getFileInfo(String)</tt> instead.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2423">HADOOP-2423</a>
</td>
<td>
dfs
</td>
<td>
Improved <tt>FSDirectory.mkdirs(...)</tt> performance by about 50% as measured by the NNThroughputBenchmark.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3124">HADOOP-3124</a>
</td>
<td>
dfs
</td>
<td>
Made the DataNode socket write timeout configurable; however, the configuration variable is undocumented.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2470">HADOOP-2470</a>
</td>
<td>
dfs
</td>
<td>
Removed the <tt>open()</tt> and <tt>isDir()</tt> methods from ClientProtocol without first deprecating them. <br>
<br>
Removed the deprecated <tt>getContentLength()</tt> from ClientProtocol.<br>
<br>
Deprecated <tt>isDirectory</tt> in DFSClient. Use <tt>getFileStatus()</tt> instead.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2854">HADOOP-2854</a>
</td>
<td>
dfs
</td>
<td>
Removed deprecated method <tt>org.apache.hadoop.ipc.Server.getUserInfo()</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2239">HADOOP-2239</a>
</td>
<td>
dfs
</td>
<td>
Added a new FileSystem, HftpsFileSystem, that allows access to HDFS data over HTTPS.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-771">HADOOP-771</a>
</td>
<td>
dfs
</td>
<td>
Added a new method to the <tt>FileSystem</tt> API, <tt>delete(path, boolean)</tt>,
and deprecated the previous <tt>delete(path)</tt> method.
The new method deletes a directory and its contents recursively only if the boolean is set to true.
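<p>
A minimal sketch (the path is illustrative):
<pre>
FileSystem fs = FileSystem.get(new Configuration());
Path dir = new Path("/tmp/old-output");
fs.delete(dir, true);  // recursive: removes the directory and everything under it
// fs.delete(dir, false) would refuse to remove a non-empty directory
</pre>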
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3239">HADOOP-3239</a>
</td>
<td>
dfs
</td>
<td>
Modified <tt>org.apache.hadoop.dfs.FSDirectory.getFileInfo(String)</tt> to return null when a file is not
found instead of throwing FileNotFoundException.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3091">HADOOP-3091</a>
</td>
<td>
dfs
</td>
<td>
Enhanced the <tt>hadoop dfs -put</tt> command to accept multiple
sources when the destination is a directory.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2192">HADOOP-2192</a>
</td>
<td>
dfs
</td>
<td>
Modified <tt>hadoop dfs -mv</tt> to be closer in functionality to
the Linux <tt>mv</tt> command by removing unnecessary output and returning
an error message when moving non-existent files/directories.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1985">HADOOP-1985</a>
</td>
<td>
dfs <br>
mapred
</td>
<td>
Added rack awareness for map tasks and moved the rack resolution logic to the
NameNode and JobTracker. <p> The administrator can specify a
loadable class via the configuration property <tt>topology.node.switch.mapping.impl</tt>
to implement the rack resolution logic. The class must implement
a method <tt>resolve(List&lt;String&gt; names)</tt>, where <tt>names</tt> is the list of
DNS names/IP addresses to resolve. The return value is a list of
resolved network paths of the form <tt>/foo/rack</tt>, where <tt>rack</tt> is the rack ID
of the node and <tt>foo</tt> is the switch that connects multiple racks, and so on.
The default implementation, <tt>org.apache.hadoop.net.ScriptBasedMapping</tt>, is packaged
with Hadoop and runs a script to perform rack resolution. The
script location is configurable: it is specified by
<tt>topology.script.file.name</tt> and defaults to an empty value. When
the script name is empty, <tt>/default-rack</tt> is returned for all
DNS names/IP addresses. The loadable <tt>topology.node.switch.mapping.impl</tt> class gives
administrators the flexibility to define how their site's node resolution
should happen; a sketch of a custom mapping follows. <br>
For mapred, one can also specify the cache level with respect to the number of
levels in the resolved network path; it defaults to two, which means the
JobTracker will cache tasks at the host level and at the rack level. <br>
Known issue: task caching will not work with levels greater than 2
(beyond racks). This bug is tracked in <a href="https://issues.apache.org/jira/browse/HADOOP-3296">HADOOP-3296</a>.
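<p>
A minimal sketch of a custom mapping, assuming the loadable class implements
<tt>org.apache.hadoop.net.DNSToSwitchMapping</tt> (the rack naming is illustrative):
<pre>
public class ExampleRackMapping implements DNSToSwitchMapping {
  // Return one /switch/rack network path per input DNS name or IP address.
  public List&lt;String&gt; resolve(List&lt;String&gt; names) {
    List&lt;String&gt; paths = new ArrayList&lt;String&gt;(names.size());
    for (String name : names) {
      paths.add("/foo/rack1"); // illustrative: place every node on one rack
    }
    return paths;
  }
}
</pre>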
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2063">HADOOP-2063</a>
</td>
<td>
fs
</td>
<td>
Added a new option <tt>-ignoreCrc</tt> to <tt>fs -get</tt> and <tt>fs -copyToLocal</tt>. The option causes CRC checksums to be
ignored for this command so that corrupt files may be downloaded.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3001">HADOOP-3001</a>
</td>
<td>
fs
</td>
<td>
Added new Map/Reduce framework
counters that track the number of bytes read from and written to HDFS, local,
KFS, and S3 file systems.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2027">HADOOP-2027</a>
</td>
<td>
fs
</td>
<td>
Added a new FileSystem method <tt>getFileBlockLocations</tt> to return the number of bytes in each block of a file
via a single RPC to the NameNode. Deprecated <tt>getFileCacheHints</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2839">HADOOP-2839</a>
</td>
<td>
fs
</td>
<td>
Removed deprecated method <tt>org.apache.hadoop.fs.FileSystem.globPaths()</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2563">HADOOP-2563</a>
</td>
<td>
fs
</td>
<td>
Removed deprecated method <tt>org.apache.hadoop.fs.FileSystem.listPaths()</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1593">HADOOP-1593</a>
</td>
<td>
fs
</td>
<td>
Modified FSShell commands to accept non-default paths. Now you can run commands like <tt>hadoop dfs -ls hdfs://remotehost1:port/path</tt>
and <tt>hadoop dfs -ls hdfs://remotehost2:port/path</tt> without changing your Hadoop configuration.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3048">HADOOP-3048</a>
</td>
<td>
io
</td>
<td>
Added a new API and a default
implementation to convert serialized objects to strings and restore them.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3152">HADOOP-3152</a>
</td>
<td>
io
</td>
<td>
Added a static method
<tt>MapFile.setIndexInterval(Configuration, int interval)</tt> so that Map/Reduce
jobs using <tt>MapFileOutputFormat</tt> can set the index interval.
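<p>
For example (the interval of 128 is illustrative):
<pre>
Configuration conf = new Configuration();
// Write one index entry for every 128 keys added to the MapFile.
MapFile.setIndexInterval(conf, 128);
</pre>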
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3073">HADOOP-3073</a>
</td>
<td>
ipc
</td>
<td>
<tt>SocketOutputStream.close()</tt> now closes the
underlying channel. This increases compatibility with
<tt>java.net.Socket.getOutputStream</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3041">HADOOP-3041</a>
</td>
<td>
mapred
</td>
<td>
Deprecated <tt>JobConf.setOutputPath</tt> and <tt>JobConf.getOutputPath</tt>.<p>
Deprecated <tt>OutputFormatBase</tt>. Added <tt>FileOutputFormat</tt>. Existing output
formats extending <tt>OutputFormatBase</tt> now extend <tt>FileOutputFormat</tt>. <p>
Added the following methods to <tt>FileOutputFormat</tt> (a migration sketch follows the list):
<tt><ul>
<li>public static void setOutputPath(JobConf conf, Path outputDir)
<li>public static Path getOutputPath(JobConf conf)
<li>public static Path getWorkOutputPath(JobConf conf)
<li>static void setWorkOutputPath(JobConf conf, Path outputDir)
</ul></tt>
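<p>
A minimal migration sketch (the output path is illustrative):
<pre>
JobConf conf = new JobConf();
// was: conf.setOutputPath(new Path("/data/out"));
FileOutputFormat.setOutputPath(conf, new Path("/data/out"));
// was: Path out = conf.getOutputPath();
Path out = FileOutputFormat.getOutputPath(conf);
</pre>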
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3204">HADOOP-3204</a>
</td>
<td>
mapred
</td>
<td>
Fixed <tt>ReduceTask.LocalFSMerger</tt> to handle errors and exceptions better. Previously, all
exceptions other than IOException were silently ignored.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1986">HADOOP-1986</a>
</td>
<td>
mapred
</td>
<td>
Programs that implement the raw
<tt>Mapper</tt> or <tt>Reducer</tt> interfaces will need modification to compile with this
release. For example, <p>
<pre>
class MyMapper implements Mapper {
public void map(WritableComparable key, Writable val,
OutputCollector out, Reporter reporter) throws IOException {
// ...
}
// ...
}
</pre>
will need to be changed to refer to the parameterized type. For example: <p>
<pre>
class MyMapper implements Mapper&lt;WritableComparable, Writable, WritableComparable, Writable&gt; {
public void map(WritableComparable key, Writable val,
OutputCollector&lt;WritableComparable, Writable&gt;
out, Reporter reporter) throws IOException {
// ...
}
// ...
}
</pre>
Similarly implementations of the following raw interfaces will need
modification:
<tt><ul>
<li>InputFormat
<li>OutputCollector
<li>OutputFormat
<li>Partitioner
<li>RecordReader
<li>RecordWriter
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-910">HADOOP-910</a>
</td>
<td>
mapred
</td>
<td>
Reducers now perform merges of
shuffle data (both in-memory and on-disk) while fetching map outputs.
Earlier, they merged only the in-memory outputs during the shuffle.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2822">HADOOP-2822</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated classes <tt>org.apache.hadoop.mapred.InputFormatBase</tt>
and <tt>org.apache.hadoop.mapred.PhasedFileSystem</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2817">HADOOP-2817</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated method
<tt>org.apache.hadoop.mapred.ClusterStatus.getMaxTasks()</tt>
and the deprecated configuration property <tt>mapred.tasktracker.tasks.maximum</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2825">HADOOP-2825</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated method
<tt>org.apache.hadoop.mapred.MapOutputLocation.getFile(FileSystem fileSys, Path
localFilename, int reduce, Progressable pingee, int timeout)</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2818">HADOOP-2818</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated methods
<tt>org.apache.hadoop.mapred.Counters.getDisplayName(String counter)</tt> and
<tt>org.apache.hadoop.mapred.Counters.getCounterNames()</tt>.
Undeprecated the method
<tt>org.apache.hadoop.mapred.Counters.getCounter(String counterName)</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2826">HADOOP-2826</a>
</td>
<td>
mapred
</td>
<td>
Changed the signature of the method
<tt>public org.apache.hadoop.streaming.UTF8ByteArrayUtils.readLine(InputStream)</tt> to
<tt>UTF8ByteArrayUtils.readLine(LineReader, Text)</tt>. Since the old
signature was not deprecated first, any code using the old method must be changed
to use the new method.
<p>
Removed the deprecated methods <tt>org.apache.hadoop.mapred.FileSplit.getFile()</tt>
and <tt>org.apache.hadoop.mapred.LineRecordReader.readLine(InputStream in,
OutputStream out)</tt>.
<p>
Made the constructor <tt>org.apache.hadoop.mapred.LineRecordReader.LineReader(InputStream in, Configuration
conf)</tt> public.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2819">HADOOP-2819</a>
</td>
<td>
mapred
</td>
<td>
Removed these deprecated methods from <tt>org.apache.hadoop.mapred.JobConf</tt>:
<tt><ul>
<li>public Class getInputKeyClass()
<li>public void setInputKeyClass(Class theClass)
<li>public Class getInputValueClass()
<li>public void setInputValueClass(Class theClass)
</ul></tt>
and undeprecated these methods:
<tt><ul>
<li>getSpeculativeExecution()
<li>public void setSpeculativeExecution(boolean speculativeExecution)
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3093">HADOOP-3093</a>
</td>
<td>
mapred
</td>
<td>
Added the following public methods to <tt>org.apache.hadoop.conf.Configuration</tt> (a usage sketch follows the list):
<tt><ul>
<li>String[] Configuration.getStrings(String name, String... defaultValue)
<li>void Configuration.setStrings(String name, String... values)
</ul></tt>
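<p>
A minimal usage sketch (the key and values are illustrative):
<pre>
Configuration conf = new Configuration();
conf.setStrings("example.hosts", "host1", "host2");             // stored as a comma-separated value
String[] hosts = conf.getStrings("example.hosts", "localhost"); // varargs default
</pre>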
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2399">HADOOP-2399</a>
</td>
<td>
mapred
</td>
<td>
The key and value objects given
to the Combiner and Reducer are now reused between calls. This is much more
efficient, but users can no longer assume the objects are constant; a sketch follows.
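<p>
For example, a reducer that saves values for later use must now copy them
(the types are illustrative):
<pre>
public void reduce(Text key, Iterator&lt;IntWritable&gt; values,
                   OutputCollector&lt;Text, IntWritable&gt; out, Reporter reporter)
    throws IOException {
  List&lt;IntWritable&gt; saved = new ArrayList&lt;IntWritable&gt;();
  while (values.hasNext()) {
    // Wrong: saved.add(values.next()) would store one reused object many times.
    saved.add(new IntWritable(values.next().get())); // defensive copy
  }
  // ... use the saved copies ...
}
</pre>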
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3162">HADOOP-3162</a>
</td>
<td>
mapred
</td>
<td>
Deprecated the public methods <tt>org.apache.hadoop.mapred.JobConf.setInputPath(Path)</tt> and
<tt>org.apache.hadoop.mapred.JobConf.addInputPath(Path)</tt>.
<p>
Added the following public methods to <tt>org.apache.hadoop.mapred.FileInputFormat</tt>:
<tt><ul>
<li>public static void setInputPaths(JobConf job, Path... paths); <br>
<li>public static void setInputPaths(JobConf job, String commaSeparatedPaths); <br>
<li>public static void addInputPath(JobConf job, Path path); <br>
<li>public static void addInputPaths(JobConf job, String commaSeparatedPaths); <br>
</ul></tt>
Earlier code calling <tt>JobConf.setInputPath(Path)</tt> and <tt>JobConf.addInputPath(Path)</tt>
should now call <tt>FileInputFormat.setInputPaths(JobConf, Path...)</tt> and
<tt>FileInputFormat.addInputPath(Path)</tt> respectively.
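<p>
A minimal migration sketch (the paths are illustrative):
<pre>
JobConf conf = new JobConf();
// was: conf.setInputPath(new Path("/data/in"));
FileInputFormat.setInputPaths(conf, new Path("/data/in"));
FileInputFormat.addInputPath(conf, new Path("/data/extra"));
</pre>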
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2178">HADOOP-2178</a>
</td>
<td>
mapred
</td>
<td>
Provided a new facility to
store job history on DFS. Cluster administrators can now provide either a local FS
location or a DFS location using the configuration property
<tt>mapred.job.history.location</tt> to store job history. History will also
be logged in a user-specified location if the configuration property
<tt>mapred.job.history.user.location</tt> is specified.
<p>
Removed these classes and method:
<tt><ul>
<li>org.apache.hadoop.mapred.DefaultJobHistoryParser.MasterIndex
<li>org.apache.hadoop.mapred.DefaultJobHistoryParser.MasterIndexParseListener
<li>org.apache.hadoop.mapred.DefaultJobHistoryParser.parseMasterIndex
</ul></tt>
<p>
Changed the signature of the public method
<tt>org.apache.hadoop.mapred.DefaultJobHistoryParser.parseJobTasks(File
jobHistoryFile, JobHistory.JobInfo job)</tt> to
<tt>DefaultJobHistoryParser.parseJobTasks(String jobHistoryFile,
JobHistory.JobInfo job, FileSystem fs)</tt>. <p>
Changed the signature of the public method
<tt>org.apache.hadoop.mapred.JobHistory.parseHistory(File path, Listener l)</tt>
to <tt>JobHistory.parseHistoryFromFS(String path, Listener l, FileSystem fs)</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2055">HADOOP-2055</a>
</td>
<td>
mapred
</td>
<td>
Users can now specify which paths to ignore when processing the job input directory
(in addition to filenames that start with "_" and ".").
To do this, two new methods were defined (a filter sketch follows the list):
<tt><ul>
<li>FileInputFormat.setInputPathFilter(JobConf, PathFilter)
<li>FileInputFormat.getInputPathFilter(JobConf)
</ul></tt>
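<p>
For example, a minimal filter that could be registered with the method above
(the <tt>.tmp</tt> suffix is illustrative):
<pre>
public class SkipTmpFiles implements PathFilter {
  // Accept everything except scratch files ending in ".tmp".
  public boolean accept(Path path) {
    return !path.getName().endsWith(".tmp");
  }
}
</pre>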
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2116">HADOOP-2116</a>
</td>
<td>
mapred
</td>
<td>
Restructured the local job directory on the tasktracker. Users are provided with a job-specific shared directory
(<tt>mapred-local/taskTracker/jobcache/$jobid/work</tt>) for use as scratch
space, exposed through the configuration property and system property
<tt>job.local.dir</tt>. The directory <tt>../work</tt> is no longer available from the task's current working directory.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1622">HADOOP-1622</a>
</td>
<td>
mapred
</td>
<td>
Added new command line options for <tt>hadoop jar</tt> command:
<p>
<tt>hadoop jar -files &lt;comma separated list of files&gt; -libjars &lt;comma
separated list of jars&gt; -archives &lt;comma separated list of
archives&gt; </tt>
<p>
where the options have these meanings:
<p>
<ul>
<li><tt>-files</tt> allows you to specify a comma separated list of paths which
will be made available in the current working directory of each task <br>
<li><tt>-libjars</tt> allows you to add jars to the classpaths of the maps and
reduces. <br>
<li><tt>-archives</tt> allows you to pass archives as arguments; they are
unzipped/unjarred, and a link with the name of the jar/zip is created in the
current working directory of tasks.
</ul>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2823">HADOOP-2823</a>
</td>
<td>
record
</td>
<td>
Removed the deprecated methods in
<tt>org.apache.hadoop.record.compiler.generated.SimpleCharStream</tt>:
<tt><ul>
<li>public int getColumn()
<li>public int getLine()
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2551">HADOOP-2551</a>
</td>
<td>
scripts
</td>
<td>
Introduced new environment variables to allow finer grained control of Java options passed to server and
client JVMs. See the new <tt>*_OPTS</tt> variables in <tt>conf/hadoop-env.sh</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3099">HADOOP-3099</a>
</td>
<td>
util
</td>
<td>
Added a new <tt>-p</tt> option to <tt>distcp</tt> for preserving file and directory status:
<pre>
-p[rbugp] Preserve status
r: replication number
b: block size
u: user
g: group
p: permission
</pre>
The <tt>-p</tt> option alone is equivalent to <tt>-prbugp</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2821">HADOOP-2821</a>
</td>
<td>
util
</td>
<td>
Removed the deprecated classes <tt>org.apache.hadoop.util.ShellUtil</tt> and <tt>org.apache.hadoop.util.ToolBase</tt>.
</td>
</tr>
</tbody></table>
</ul>
</body></html>