| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> |
| <html><head> |
| <title>Hadoop 0.17.2 Release Notes</title></head> |
| <body> |
| <font face="sans-serif"> |
| <h1>Hadoop 0.17.2 Release Notes</h1> |
| The bug fixes are listed below. |
| <ul><a name="changes"> |
| <h2>Changes Since Hadoop 0.17.1</h2> |
| <ul> |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3678'>HADOOP-3678</a>] - Avoid spurious exceptions logged at DataNode when clients |
| read from DFS.</li> |
| |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3760'>HADOOP-3760</a>] - Fix a bug with HDFS file close()</li> |
| |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3707'>HADOOP-3707</a>] - NameNode keeps a count of number of blocks scheduled |
| to be written to a datanode and uses it to avoid allocating more |
| blocks than a datanode can hold.</li> |
| |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3681'>HADOOP-3681</a>] - DFSClient can get into an infinite loop while closing |
| a file if there are some errors.</li> |
| |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3002'>HADOOP-3002</a>] - Hold off block removal while in safe mode.</li> |
| |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3685'>HADOOP-3685</a>] - Unbalanced replication target.</li> |
| |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3758'>HADOOP-3758</a>] - Shutdown datanode on version mismatch instead of retrying |
| continuously, preventing excessive logging at the namenode.</li> |
| |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3633'>HADOOP-3633</a>] - Correct exception handling in DataXceiveServer, and throttle |
| the number of xceiver threads in a data-node.</li> |
| |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3370'>HADOOP-3370</a>] - Ensure that the TaskTracker.runningJobs data-structure is |
| correctly cleaned-up on task completion.</li> |
| |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3813'>HADOOP-3813</a>] - Fix task-output clean-up on HDFS to use the recursive |
| FileSystem.delete rather than the FileUtil.fullyDelete.</li> |
| |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3859'>HADOOP-3859</a>] - Allow the maximum number of xceivers in the data node to |
| be configurable.</li> |
| |
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3931'>HADOOP-3931</a>] - Fix a corner case in the map-side sort that counts some values
as too large, causing premature spills to disk and allowing some values
to incorrectly bypass the combiner.</li>
| </ul> |
| </ul> |
| |
| <h1>Hadoop 0.17.1 Release Notes</h1> |
| The bug fixes are listed below. |
| <ul><a name="changes"> |
| <h2>Changes Since Hadoop 0.17.0</h2> |
| <ul> |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-2159'>HADOOP-2159</a>] - Namenode stuck in safemode |
| </li> |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3442'>HADOOP-3442</a>] - QuickSort may get into unbounded recursion |
| </li> |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3472'>HADOOP-3472</a>] - MapFile.Reader getClosest() function returns incorrect results when before is true |
| </li> |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3475'>HADOOP-3475</a>] - MapOutputBuffer allocates 4x as much space to record capacity as intended |
| </li> |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3477'>HADOOP-3477</a>] - release tar.gz contains duplicate files |
| </li> |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3522'>HADOOP-3522</a>] - ValuesIterator.next() doesn't return a new object, thus failing many equals() tests. |
| </li> |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3526'>HADOOP-3526</a>] - contrib/data_join doesn't work |
| </li> |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3550'>HADOOP-3550</a>] - Reduce tasks failing with OOM |
| </li> |
| <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3565'>HADOOP-3565</a>] - JavaSerialization can throw java.io.StreamCorruptedException |
| </li> |
| </ul> |
| </ul> |
| |
| <h1>Hadoop 0.17.0 Release Notes</h1> |
| |
| These release notes include new developer and user facing incompatibilities, features, and major improvements. The table below is sorted by Component. |
| <ul><a name="changes"> |
| <h2>Changes Since Hadoop 0.16.4</h2> |
| <table border="1" width="100%" cellpadding="4"> |
| <tbody><tr> |
| <td><b>Issue</b></td> |
| <td><b>Component</b></td> |
| <td><b>Notes</b></td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2828">HADOOP-2828</a> |
| </td> |
| <td> |
| conf |
| </td> |
| <td> |
| Remove these deprecated methods in |
| <tt>org.apache.hadoop.conf.Configuration</tt>:<br><tt><ul><li> |
| public Object getObject(String name) </li><li> |
| public void setObject(String name, Object value) </li><li> |
| public Object get(String name, Object defaultValue) </li><li> |
| public void set(String name, Object value)</li><li>public Iterator entries() |
| </li></ul></tt></td> |
| </tr> |
| <tr> |
| <td nowrap> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2410">HADOOP-2410</a> |
| </td> |
| <td> |
| contrib/ec2 |
| </td> |
| <td> |
| The command <tt>hadoop-ec2 |
| run</tt> has been replaced by <tt>hadoop-ec2 launch-cluster |
&lt;group&gt; &lt;number of instances&gt;</tt>, and <tt>hadoop-ec2
| start-hadoop</tt> has been removed since Hadoop is started on instance |
| start up. See <a href="http://wiki.apache.org/hadoop/AmazonEC2">http://wiki.apache.org/hadoop/AmazonEC2</a> |
| for details. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2796">HADOOP-2796</a> |
| </td> |
| <td> |
| contrib/hod |
| </td> |
| <td> |
Added a provision to reliably detect a
failing script's exit code. When a script run through the HOD script option
returns a non-zero exit code, look for a <tt>script.exitcode</tt>
file written to the HOD cluster directory. If this file is present, the
script failed with the exit code recorded in the file.
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2775">HADOOP-2775</a> |
| </td> |
| <td> |
| contrib/hod |
| </td> |
| <td> |
Added a unit testing framework based on
| pyunit to HOD. Developers contributing patches to HOD should now |
| contribute unit tests along with the patches when possible. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-3137">HADOOP-3137</a> |
| </td> |
| <td> |
| contrib/hod |
| </td> |
| <td> |
| The HOD version is now the same as the Hadoop version. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2855">HADOOP-2855</a> |
| </td> |
| <td> |
| contrib/hod |
| </td> |
| <td> |
| HOD now handles relative |
| paths correctly for important HOD options such as the cluster directory, |
| tarball option, and script file. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2899">HADOOP-2899</a> |
| </td> |
| <td> |
| contrib/hod |
| </td> |
| <td> |
| HOD now cleans up the HOD generated mapred system directory |
| at cluster deallocation time. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2982">HADOOP-2982</a> |
| </td> |
| <td> |
| contrib/hod |
| </td> |
| <td> |
| The number of free nodes in the cluster |
| is computed using a better algorithm that filters out inconsistencies in |
| node status as reported by Torque. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2947">HADOOP-2947</a> |
| </td> |
| <td> |
| contrib/hod |
| </td> |
| <td> |
| The stdout and stderr streams of |
| daemons are redirected to files that are created under the hadoop log |
directory. Users can now send a <tt>kill -3</tt> (SIGQUIT) signal to the daemons to get stack traces
| and thread dumps for debugging. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-3168">HADOOP-3168</a> |
| </td> |
| <td> |
| contrib/streaming |
| </td> |
| <td> |
| Decreased the frequency of logging |
| in Hadoop streaming (from every 100 records to every 10,000 records). |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-3040">HADOOP-3040</a> |
| </td> |
| <td> |
| contrib/streaming |
| </td> |
| <td> |
Fixed a critical bug in Hadoop streaming's key/value splitting: if the first character on a line is
the separator, an empty key is assumed and the whole line becomes the value.
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2820">HADOOP-2820</a> |
| </td> |
| <td> |
| contrib/streaming |
| </td> |
| <td> |
| Removed these deprecated classes: <br><tt><ul><li>org.apache.hadoop.streaming.StreamLineRecordReader</li><li>org.apache.hadoop.streaming.StreamOutputFormat</li><li>org.apache.hadoop.streaming.StreamSequenceRecordReader</li></ul></tt></td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-3280">HADOOP-3280</a> |
| </td> |
| <td> |
| contrib/streaming |
| </td> |
| <td> |
| Added the |
| <tt>mapred.child.ulimit</tt> configuration variable to limit the maximum virtual memory allocated to processes launched by the |
| Map-Reduce framework. This can be used to control both the Mapper/Reducer |
| tasks and applications using Hadoop pipes, Hadoop streaming etc. |
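<p>
For example, a job could cap the virtual memory of its child processes roughly as
follows (a minimal sketch; the 1 GB figure is only illustrative, and the unit is
assumed to be kilobytes, as with <tt>ulimit -v</tt>):
<pre>
JobConf conf = new JobConf(MyJob.class);       // MyJob is a hypothetical job class
// Limit each child process launched by the framework to roughly 1 GB of virtual memory.
conf.set("mapred.child.ulimit", "1048576");
</pre>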
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2657">HADOOP-2657</a> |
| </td> |
| <td> |
| dfs |
| </td> |
<td>Added the new API <tt>DFSOutputStream.flush()</tt> to
| flush all outstanding data to DataNodes. |
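<p>
A minimal sketch of how a client might use this (assuming the stream returned by
<tt>FileSystem.create</tt> delegates <tt>flush()</tt> to the new <tt>DFSOutputStream.flush()</tt>;
the path is hypothetical):
<pre>
FileSystem fs = FileSystem.get(conf);
FSDataOutputStream out = fs.create(new Path("/logs/example.log"));   // hypothetical path
out.writeBytes("partial record\n");
out.flush();   // push outstanding data to the DataNodes without closing the file
</pre>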
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2219">HADOOP-2219</a> |
| </td> |
| <td> |
| dfs |
| </td> |
| <td> |
| Added a new <tt>fs -count</tt> command for |
| counting the number of bytes, files, and directories under a given path. <br> |
| <br> |
| Added a new RPC <tt>getContentSummary(String path)</tt> to ClientProtocol. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2559">HADOOP-2559</a> |
| </td> |
| <td> |
| dfs |
| </td> |
| <td> |
Changed DFS block placement to
allocate the first replica locally, the second off-rack, and the third
on the same rack as the second.
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2758">HADOOP-2758</a> |
| </td> |
| <td> |
| dfs |
| </td> |
| <td> |
Reduced DataNode CPU usage by approximately 50% while serving data to clients.
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2634">HADOOP-2634</a> |
| </td> |
| <td> |
| dfs |
| </td> |
| <td> |
| Deprecated ClientProtocol's <tt>exists()</tt> method. Use <tt>getFileInfo(String)</tt> instead. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2423">HADOOP-2423</a> |
| </td> |
| <td> |
| dfs |
| </td> |
| <td> |
| Improved <tt>FSDirectory.mkdirs(...)</tt> performance by about 50% as measured by the NNThroughputBenchmark. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-3124">HADOOP-3124</a> |
| </td> |
| <td> |
| dfs |
| </td> |
| <td> |
Made the DataNode socket write timeout configurable; however, the configuration variable is currently undocumented.
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2470">HADOOP-2470</a> |
| </td> |
| <td> |
| dfs |
| </td> |
| <td> |
Removed the <tt>open()</tt> and <tt>isDir()</tt> methods from ClientProtocol without first deprecating them. <br>
| <br> |
Removed the deprecated <tt>getContentLength()</tt> from ClientProtocol.<br>
| <br> |
| Deprecated <tt>isDirectory</tt> in DFSClient. Use <tt>getFileStatus()</tt> instead. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2854">HADOOP-2854</a> |
| </td> |
| <td> |
| dfs |
| </td> |
| <td> |
| Removed deprecated method <tt>org.apache.hadoop.ipc.Server.getUserInfo()</tt>. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2239">HADOOP-2239</a> |
| </td> |
| <td> |
| dfs |
| </td> |
| <td> |
| Added a new FileSystem, HftpsFileSystem, that allows access to HDFS data over HTTPS. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-771">HADOOP-771</a> |
| </td> |
| <td> |
| dfs |
| </td> |
| <td> |
| Added a new method to <tt>FileSystem</tt> API, <tt>delete(path, boolean)</tt>, |
| and deprecated the previous <tt>delete(path)</tt> method. |
The new method deletes files and directories recursively only if the boolean argument is true.
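<p>
For example (a minimal sketch; the path is hypothetical):
<pre>
FileSystem fs = FileSystem.get(conf);
Path tmp = new Path("/user/alice/tmp-output");   // hypothetical path
// Old, now deprecated:  fs.delete(tmp);
// New API: pass true to delete a non-empty directory recursively.
fs.delete(tmp, true);
</pre>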
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-3239">HADOOP-3239</a> |
| </td> |
| <td> |
| dfs |
| </td> |
| <td> |
| Modified <tt>org.apache.hadoop.dfs.FSDirectory.getFileInfo(String)</tt> to return null when a file is not |
| found instead of throwing FileNotFoundException. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-3091">HADOOP-3091</a> |
| </td> |
| <td> |
| dfs |
| </td> |
| <td> |
| Enhanced <tt>hadoop dfs -put</tt> command to accept multiple |
| sources when destination is a directory. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2192">HADOOP-2192</a> |
| </td> |
| <td> |
| dfs |
| </td> |
| <td> |
| Modified <tt>hadoop dfs -mv</tt> to be closer in functionality to |
the Linux <tt>mv</tt> command by removing unnecessary output and returning
an error message when moving non-existent files/directories.
| </td> |
| </tr> |
| <tr> |
| <td> |
<a href="https://issues.apache.org/jira/browse/HADOOP-1985">HADOOP-1985</a>
| </td> |
| <td> |
| dfs <br> |
| mapred |
| </td> |
| <td> |
Added rack awareness for map tasks and moved the rack resolution logic to the
NameNode and JobTracker. <p> The administrator can specify the class implementing
the rack-resolution logic via the <tt>topology.node.switch.mapping.impl</tt> property.
The class must implement a method <tt>resolve(List&lt;String&gt; names)</tt>, where
<tt>names</tt> is the list of DNS names/IP addresses to be resolved. The return value
is a list of resolved network paths of the form <tt>/foo/rack</tt>, where <tt>rack</tt>
is the rackID the node belongs to and <tt>foo</tt> is the switch to which multiple
racks are connected, and so on. The default implementation, packaged with Hadoop, is
<tt>org.apache.hadoop.net.ScriptBasedMapping</tt>, which loads a script to perform
rack resolution. The script location is configurable: it is specified by
<tt>topology.script.file.name</tt> and defaults to an empty script. When the script
name is empty, <tt>/default-rack</tt> is returned for all DNS names/IP addresses.
The pluggable <tt>topology.node.switch.mapping.impl</tt> gives administrators the
flexibility to define how node resolution should happen at their site. <br>
For mapred, one can also specify the level of the cache with respect to the number of
levels in the resolved network path; it defaults to two, which means the JobTracker
caches tasks at the host level and at the rack level. <br>
Known issue: task caching does not work with levels greater than 2
(beyond racks). This bug is tracked in <a href="https://issues.apache.org/jira/browse/HADOOP-3296">HADOOP-3296</a>.
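<p>
A minimal sketch of a custom mapping class (assuming the pluggable interface is
<tt>org.apache.hadoop.net.DNSToSwitchMapping</tt> with the <tt>resolve</tt> method described
above; the two-rack layout is purely illustrative):
<pre>
public class MySwitchMapping implements DNSToSwitchMapping {
  // Return one network path per input name, e.g. /switch1/rack1.
  public List&lt;String&gt; resolve(List&lt;String&gt; names) {
    List&lt;String&gt; paths = new ArrayList&lt;String&gt;(names.size());
    for (String name : names) {
      // Illustrative rule: hosts named rack1-* live on rack1, everything else on rack2.
      paths.add(name.startsWith("rack1-") ? "/switch1/rack1" : "/switch1/rack2");
    }
    return paths;
  }
}
</pre>
The class would then be named in <tt>topology.node.switch.mapping.impl</tt>.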
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2063">HADOOP-2063</a> |
| </td> |
| <td> |
| fs |
| </td> |
| <td> |
| Added a new option <tt>-ignoreCrc</tt> to <tt>fs -get</tt> and <tt>fs -copyToLocal</tt>. The option causes CRC checksums to be |
| ignored for this command so that corrupt files may be downloaded. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-3001">HADOOP-3001</a> |
| </td> |
| <td> |
| fs |
| </td> |
| <td> |
Added new Map/Reduce framework
counters that track the number of bytes read from and written to HDFS, local,
KFS, and S3 file systems.
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2027">HADOOP-2027</a> |
| </td> |
| <td> |
| fs |
| </td> |
| <td> |
| Added a new FileSystem method <tt>getFileBlockLocations</tt> to return the number of bytes in each block in a file |
| via a single rpc to the NameNode. Deprecated <tt>getFileCacheHints</tt>. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2839">HADOOP-2839</a> |
| </td> |
| <td> |
| fs |
| </td> |
| <td> |
| Removed deprecated method <tt>org.apache.hadoop.fs.FileSystem.globPaths()</tt>. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2563">HADOOP-2563</a> |
| </td> |
| <td> |
| fs |
| </td> |
| <td> |
| Removed deprecated method <tt>org.apache.hadoop.fs.FileSystem.listPaths()</tt>. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-1593">HADOOP-1593</a> |
| </td> |
| <td> |
| fs |
| </td> |
| <td> |
Modified FSShell commands to accept non-default paths. Now you can run commands like <tt>hadoop dfs -ls hdfs://remotehost1:port/path</tt>
and <tt>hadoop dfs -ls hdfs://remotehost2:port/path</tt> without changing your Hadoop configuration.
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-3048">HADOOP-3048</a> |
| </td> |
| <td> |
| io |
| </td> |
| <td> |
Added a new API, and a default implementation, for converting
object serializations to strings and restoring them.
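<p>
A hedged sketch of the round trip this enables (assuming the default implementation is
<tt>org.apache.hadoop.io.DefaultStringifier</tt> with static <tt>store</tt>/<tt>load</tt>
helpers; the key name is hypothetical):
<pre>
// Serialize a Writable into the configuration as a string ...
DefaultStringifier.store(conf, new Text("payload"), "my.stored.object");   // hypothetical key
// ... and restore it later, for example inside a task.
Text restored = DefaultStringifier.load(conf, "my.stored.object", Text.class);
</pre>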
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-3152">HADOOP-3152</a> |
| </td> |
| <td> |
| io |
| </td> |
| <td> |
Added a static method
| <tt>MapFile.setIndexInterval(Configuration, int interval)</tt> so that Map/Reduce |
| jobs using <tt>MapFileOutputFormat</tt> can set the index interval. |
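<p>
For example, a job writing MapFiles could request a denser index (a minimal sketch
using the method as stated above; 32 is an arbitrary illustrative interval):
<pre>
JobConf conf = new JobConf(MyJob.class);            // MyJob is a hypothetical job class
conf.setOutputFormat(MapFileOutputFormat.class);
MapFile.setIndexInterval(conf, 32);                 // index every 32nd key
</pre>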
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-3073">HADOOP-3073</a> |
| </td> |
| <td> |
| ipc |
| </td> |
| <td> |
| <tt>SocketOutputStream.close()</tt> now closes the |
underlying channel. This increases compatibility with
| <tt>java.net.Socket.getOutputStream</tt>. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-3041">HADOOP-3041</a> |
| </td> |
| <td> |
| mapred |
| </td> |
| <td> |
| Deprecated <tt>JobConf.setOutputPath</tt> and <tt>JobConf.getOutputPath</tt>.<p> |
| Deprecated <tt>OutputFormatBase</tt>. Added <tt>FileOutputFormat</tt>. Existing output |
| formats extending <tt>OutputFormatBase</tt> now extend <tt>FileOutputFormat</tt>. <p> |
| Added the following methods to <tt>FileOutputFormat</tt>: |
| <tt><ul> |
| <li>public static void setOutputPath(JobConf conf, Path outputDir) |
| <li>public static Path getOutputPath(JobConf conf) |
| <li>public static Path getWorkOutputPath(JobConf conf) |
| <li>static void setWorkOutputPath(JobConf conf, Path outputDir) |
| </ul></tt> |
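<p>
Existing code moves from the JobConf methods to the new static helpers, roughly as
follows (the output path is hypothetical):
<pre>
// Old, now deprecated:  conf.setOutputPath(new Path("/user/alice/out"));
FileOutputFormat.setOutputPath(conf, new Path("/user/alice/out"));   // hypothetical path
Path out = FileOutputFormat.getOutputPath(conf);
</pre>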
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-3204">HADOOP-3204</a> |
| </td> |
| <td> |
| mapred |
| </td> |
| <td> |
Fixed <tt>ReduceTask.LocalFSMerger</tt> to handle errors and exceptions better. Prior to this fix, all
exceptions other than IOException were silently ignored.
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-1986">HADOOP-1986</a> |
| </td> |
| <td> |
| mapred |
| </td> |
| <td> |
| Programs that implement the raw |
| <tt>Mapper</tt> or <tt>Reducer</tt> interfaces will need modification to compile with this |
| release. For example, <p> |
| <pre> |
| class MyMapper implements Mapper { |
| public void map(WritableComparable key, Writable val, |
| OutputCollector out, Reporter reporter) throws IOException { |
| // ... |
| } |
| // ... |
| } |
| </pre> |
| will need to be changed to refer to the parameterized type. For example: <p> |
| <pre> |
class MyMapper implements Mapper&lt;WritableComparable, Writable, WritableComparable, Writable&gt; {
    public void map(WritableComparable key, Writable val,
        OutputCollector&lt;WritableComparable, Writable&gt; out,
        Reporter reporter) throws IOException {
| // ... |
| } |
| // ... |
| } |
| </pre> |
| Similarly implementations of the following raw interfaces will need |
| modification: |
| <tt><ul> |
| <li>InputFormat |
| <li>OutputCollector |
| <li>OutputFormat |
| <li>Partitioner |
| <li>RecordReader |
| <li>RecordWriter |
| </ul></tt> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-910">HADOOP-910</a> |
| </td> |
| <td> |
| mapred |
| </td> |
| <td> |
Reducers now perform merges of
shuffle data (both in-memory and on disk) while fetching map outputs;
earlier, only the in-memory outputs were merged during the shuffle.
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2822">HADOOP-2822</a> |
| </td> |
| <td> |
| mapred |
| </td> |
| <td> |
| Removed the deprecated classes <tt>org.apache.hadoop.mapred.InputFormatBase</tt> |
| and <tt>org.apache.hadoop.mapred.PhasedFileSystem</tt>. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2817">HADOOP-2817</a> |
| </td> |
| <td> |
| mapred |
| </td> |
| <td> |
| Removed the deprecated method |
| <tt>org.apache.hadoop.mapred.ClusterStatus.getMaxTasks()</tt> |
| and the deprecated configuration property <tt>mapred.tasktracker.tasks.maximum</tt>. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2825">HADOOP-2825</a> |
| </td> |
| <td> |
| mapred |
| </td> |
| <td> |
| Removed the deprecated method |
| <tt>org.apache.hadoop.mapred.MapOutputLocation.getFile(FileSystem fileSys, Path |
| localFilename, int reduce, Progressable pingee, int timeout)</tt>. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2818">HADOOP-2818</a> |
| </td> |
| <td> |
| mapred |
| </td> |
| <td> |
| Removed the deprecated methods |
| <tt>org.apache.hadoop.mapred.Counters.getDisplayName(String counter)</tt> and |
| <tt>org.apache.hadoop.mapred.Counters.getCounterNames()</tt>. |
| Undeprecated the method |
| <tt>org.apache.hadoop.mapred.Counters.getCounter(String counterName)</tt>. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2826">HADOOP-2826</a> |
| </td> |
| <td> |
| mapred |
| </td> |
| <td> |
Changed the signature of the method
<tt>public org.apache.hadoop.streaming.UTF8ByteArrayUtils.readLine(InputStream)</tt> to
<tt>UTF8ByteArrayUtils.readLine(LineReader, Text)</tt>. Since the old
| signature is not deprecated, any code using the old method must be changed |
| to use the new method. |
| <p> |
| Removed the deprecated methods <tt>org.apache.hadoop.mapred.FileSplit.getFile()</tt> |
| and <tt>org.apache.hadoop.mapred.LineRecordReader.readLine(InputStream in, |
| OutputStream out)</tt>. |
| <p> |
| Made the constructor <tt>org.apache.hadoop.mapred.LineRecordReader.LineReader(InputStream in, Configuration |
| conf)</tt> public. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2819">HADOOP-2819</a> |
| </td> |
| <td> |
| mapred |
| </td> |
| <td> |
Removed these deprecated methods from <tt>org.apache.hadoop.mapred.JobConf</tt>:
| <tt><ul> |
| <li>public Class getInputKeyClass() |
| <li>public void setInputKeyClass(Class theClass) |
| <li>public Class getInputValueClass() |
| <li>public void setInputValueClass(Class theClass) |
| </ul></tt> |
| and undeprecated these methods: |
| <tt><ul> |
| <li>getSpeculativeExecution() |
| <li>public void setSpeculativeExecution(boolean speculativeExecution) |
| </ul></tt> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-3093">HADOOP-3093</a> |
| </td> |
| <td> |
| mapred |
| </td> |
| <td> |
| Added the following public methods to <tt>org.apache.hadoop.conf.Configuration</tt>: |
| <tt><ul> |
| <li>String[] Configuration.getStrings(String name, String... defaultValue) |
| <li>void Configuration.setStrings(String name, String... values) |
| </ul></tt> |
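<p>
For example (property name and values are hypothetical):
<pre>
Configuration conf = new Configuration();
conf.setStrings("my.hosts", "host1", "host2", "host3");      // stored as a comma-separated list
String[] hosts = conf.getStrings("my.hosts", "localhost");   // varargs default used if unset
</pre>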
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2399">HADOOP-2399</a> |
| </td> |
| <td> |
| mapred |
| </td> |
| <td> |
The key and value objects that are given
to the Combiner and Reducer are now reused between calls. This is much more
efficient, but the user cannot assume the objects remain unchanged across calls; code that needs to retain a key or value must copy it.
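<p>
Reducers that hold on to values across iterations must now copy them explicitly,
for example (a minimal sketch with <tt>Text</tt> values):
<pre>
public void reduce(Text key, Iterator&lt;Text&gt; values,
                   OutputCollector&lt;Text, Text&gt; output, Reporter reporter) throws IOException {
  List&lt;Text&gt; kept = new ArrayList&lt;Text&gt;();
  while (values.hasNext()) {
    // The framework reuses the same Text instance on every call to next(),
    // so take a copy if the value must outlive this iteration.
    kept.add(new Text(values.next()));
  }
  // ... use kept ...
}
</pre>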
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-3162">HADOOP-3162</a> |
| </td> |
| <td> |
| mapred |
| </td> |
| <td> |
| Deprecated the public methods <tt>org.apache.hadoop.mapred.JobConf.setInputPath(Path)</tt> and |
| <tt>org.apache.hadoop.mapred.JobConf.addInputPath(Path)</tt>. |
| <p> |
| Added the following public methods to <tt>org.apache.hadoop.mapred.FileInputFormat</tt>: |
| <tt><ul> |
| <li>public static void setInputPaths(JobConf job, Path... paths); <br> |
| <li>public static void setInputPaths(JobConf job, String commaSeparatedPaths); <br> |
| <li>public static void addInputPath(JobConf job, Path path); <br> |
| <li>public static void addInputPaths(JobConf job, String commaSeparatedPaths); <br> |
| </ul></tt> |
| Earlier code calling <tt>JobConf.setInputPath(Path)</tt> and <tt>JobConf.addInputPath(Path)</tt> |
| should now call <tt>FileInputFormat.setInputPaths(JobConf, Path...)</tt> and |
<tt>FileInputFormat.addInputPath(JobConf, Path)</tt>, respectively.
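<p>
For example (input paths are hypothetical):
<pre>
// Old, now deprecated:
//   conf.setInputPath(new Path("/data/2008-05"));
//   conf.addInputPath(new Path("/data/2008-06"));
FileInputFormat.setInputPaths(conf, new Path("/data/2008-05"));   // hypothetical paths
FileInputFormat.addInputPath(conf, new Path("/data/2008-06"));
</pre>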
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2178">HADOOP-2178</a> |
| </td> |
| <td> |
| mapred |
| </td> |
| <td> |
Provided a new facility to
store job history on DFS. The cluster administrator can now provide either a local-filesystem
location or a DFS location using the configuration property
<tt>mapred.job.history.location</tt> to store job history. History will also
be logged to a user-specified location if the configuration property
<tt>mapred.job.history.user.location</tt> is set.
| <p> |
| Removed these classes and method: |
| <tt><ul> |
| <li>org.apache.hadoop.mapred.DefaultJobHistoryParser.MasterIndex |
| <li>org.apache.hadoop.mapred.DefaultJobHistoryParser.MasterIndexParseListener |
| <li>org.apache.hadoop.mapred.DefaultJobHistoryParser.parseMasterIndex |
| </ul></tt> |
| <p> |
| Changed the signature of the public method |
| <tt>org.apache.hadoop.mapred.DefaultJobHistoryParser.parseJobTasks(File |
| jobHistoryFile, JobHistory.JobInfo job)</tt> to |
| <tt>DefaultJobHistoryParser.parseJobTasks(String jobHistoryFile, |
| JobHistory.JobInfo job, FileSystem fs)</tt>. <p> |
| Changed the signature of the public method |
| <tt>org.apache.hadoop.mapred.JobHistory.parseHistory(File path, Listener l)</tt> |
| to <tt>JobHistory.parseHistoryFromFS(String path, Listener l, FileSystem fs)</tt>. |
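<p>
For example, an administrator could direct job history to DFS, and a user could
additionally request a private copy (both locations are hypothetical and would
normally be set in the cluster configuration rather than in code):
<pre>
conf.set("mapred.job.history.location", "hdfs://namenode:9000/jobhistory");   // hypothetical URI
conf.set("mapred.job.history.user.location", "/user/alice/history");          // hypothetical path
</pre>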
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2055">HADOOP-2055</a> |
| </td> |
| <td> |
| mapred |
| </td> |
| <td> |
| Users are now provided the ability to specify what paths to ignore when processing the job input directory |
| (apart from the filenames that start with "_" and "."). |
| To do this, two new methods were defined: |
| <tt><ul> |
| <li>FileInputFormat.setInputPathFilter(JobConf, PathFilter) |
| <li>FileInputFormat.getInputPathFilter(JobConf) |
| </ul></tt> |
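<p>
A minimal sketch of a filter that skips temporary files (the suffix rule is
illustrative; the filter is registered with the new <tt>setInputPathFilter</tt>
method described above):
<pre>
public class SkipTmpFilter implements PathFilter {
  // Accept everything except files ending in ".tmp".
  public boolean accept(Path path) {
    return !path.getName().endsWith(".tmp");
  }
}
</pre>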
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2116">HADOOP-2116</a> |
| </td> |
| <td> |
| mapred |
| </td> |
| <td> |
| Restructured the local job directory on the tasktracker. Users are provided with a job-specific shared directory |
| (<tt>mapred-local/taskTracker/jobcache/$jobid/work</tt>) for use as scratch |
space, exposed through the configuration property and system property
<tt>job.local.dir</tt>. The directory <tt>../work</tt> is no longer available from the task's current working directory.
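<p>
Tasks that previously relied on <tt>../work</tt> can locate the shared scratch space
like this (a minimal sketch; the file name is hypothetical):
<pre>
// Inside a task, e.g. in Mapper.configure(JobConf job):
String scratch = job.get("job.local.dir");
File shared = new File(scratch, "scratch-data.tmp");   // hypothetical file name
</pre>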
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-1622">HADOOP-1622</a> |
| </td> |
| <td> |
| mapred |
| </td> |
| <td> |
Added new command-line options to the <tt>hadoop jar</tt> command:
<p>
<tt>hadoop jar -files &lt;comma-separated list of files&gt; -libjars &lt;comma-separated
list of jars&gt; -archives &lt;comma-separated list of archives&gt;</tt>
<p>
where the options have these meanings:
<p>
<ul>
<li>The <tt>-files</tt> option lets you specify a comma-separated list of paths that
will be present in the current working directory of your tasks. <br>
<li>The <tt>-libjars</tt> option lets you add jars to the classpaths of the maps and
reduces. <br>
<li>The <tt>-archives</tt> option lets you pass archives as arguments; they are
unzipped/unjarred and a link with the name of the jar/zip is created in the
current working directory of the tasks.
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2823">HADOOP-2823</a> |
| </td> |
| <td> |
| record |
| </td> |
| <td> |
| Removed the deprecated methods in |
| <tt>org.apache.hadoop.record.compiler.generated.SimpleCharStream</tt>: |
| <tt><ul> |
| <li>public int getColumn() |
<li>public int getLine()
| </ul></tt> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2551">HADOOP-2551</a> |
| </td> |
| <td> |
| scripts |
| </td> |
| <td> |
| Introduced new environment variables to allow finer grained control of Java options passed to server and |
| client JVMs. See the new <tt>*_OPTS</tt> variables in <tt>conf/hadoop-env.sh</tt>. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-3099">HADOOP-3099</a> |
| </td> |
| <td> |
| util |
| </td> |
| <td> |
| Added a new <tt>-p</tt> option to <tt>distcp</tt> for preserving file and directory status: |
| <pre> |
| -p[rbugp] Preserve status |
| r: replication number |
| b: block size |
| u: user |
| g: group |
| p: permission |
| </pre> |
The <tt>-p</tt> option alone is equivalent to <tt>-prbugp</tt>.
| </td> |
| </tr> |
| <tr> |
| <td> |
| <a href="https://issues.apache.org/jira/browse/HADOOP-2821">HADOOP-2821</a> |
| </td> |
| <td> |
| util |
| </td> |
| <td> |
| Removed the deprecated classes <tt>org.apache.hadoop.util.ShellUtil</tt> and <tt>org.apache.hadoop.util.ToolBase</tt>. |
| </td> |
| </tr> |
| </tbody></table> |
| |
| </ul> |
| |
| </body></html> |