<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head>
<title>Hadoop 0.17.1 Release Notes</title></head>
<body>
<font face="sans-serif">
<h1>Hadoop 0.17.1 Release Notes</h1>
The bug fixes are listed below.
<ul><a name="changes"></a>
<h2>Changes Since Hadoop 0.17.0</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-2159'>HADOOP-2159</a>] - Namenode stuck in safemode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3442'>HADOOP-3442</a>] - QuickSort may get into unbounded recursion
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3472'>HADOOP-3472</a>] - MapFile.Reader getClosest() function returns incorrect results when before is true
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3475'>HADOOP-3475</a>] - MapOutputBuffer allocates 4x as much space to record capacity as intended
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3477'>HADOOP-3477</a>] - release tar.gz contains duplicate files
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3522'>HADOOP-3522</a>] - ValuesIterator.next() doesn't return a new object, thus failing many equals() tests.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3526'>HADOOP-3526</a>] - contrib/data_join doesn't work
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3550'>HADOOP-3550</a>] - Reduce tasks failing with OOM
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3565'>HADOOP-3565</a>] - JavaSerialization can throw java.io.StreamCorruptedException
</li>
</ul>
</ul>
<h1>Hadoop 0.17.0 Release Notes</h1>
These release notes include new developer- and user-facing incompatibilities, features, and major improvements. The table below is sorted by Component.
<ul><a name="changes"></a>
<h2>Changes Since Hadoop 0.16.4</h2>
<table border="1" width="100%" cellpadding="4">
<tbody><tr>
<td><b>Issue</b></td>
<td><b>Component</b></td>
<td><b>Notes</b></td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2828">HADOOP-2828</a>
</td>
<td>
conf
</td>
<td>
Removed these deprecated methods from
<tt>org.apache.hadoop.conf.Configuration</tt> (a replacement sketch follows the list):<br><tt><ul><li>
public Object getObject(String name) </li><li>
public void setObject(String name, Object value) </li><li>
public Object get(String name, Object defaultValue) </li><li>
public void set(String name, Object value)</li><li>public Iterator entries()
</li></ul></tt>
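<p>
Code that used the removed Object-typed accessors can switch to the String-typed
ones; a minimal sketch (the key name is illustrative):
<pre>
Configuration conf = new Configuration();
conf.set("example.key", "some value");             // replaces setObject(name, value)
String value = conf.get("example.key", "default"); // replaces get(name, defaultValue)
</pre>
</td>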
</tr>
<tr>
<td nowrap>
<a href="https://issues.apache.org/jira/browse/HADOOP-2410">HADOOP-2410</a>
</td>
<td>
contrib/ec2
</td>
<td>
The command <tt>hadoop-ec2
run</tt> has been replaced by <tt>hadoop-ec2 launch-cluster
&lt;group&gt; &lt;number of instances&gt;</tt>, and <tt>hadoop-ec2
start-hadoop</tt> has been removed since Hadoop is now started automatically on instance
startup. See <a href="http://wiki.apache.org/hadoop/AmazonEC2">http://wiki.apache.org/hadoop/AmazonEC2</a>
for details.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2796">HADOOP-2796</a>
</td>
<td>
contrib/hod
</td>
<td>
Added a provision to reliably detect a
failing script's exit code. When a script run via the HOD script option
returns a non-zero exit code, HOD writes a <tt>script.exitcode</tt>
file to the HOD cluster directory. If this file is present, the script
failed with the exit code recorded in the file.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2775">HADOOP-2775</a>
</td>
<td>
contrib/hod
</td>
<td>
Added a unit testing framework based on
PyUnit to HOD. Developers contributing patches to HOD should now
contribute unit tests along with their patches when possible.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3137">HADOOP-3137</a>
</td>
<td>
contrib/hod
</td>
<td>
The HOD version is now the same as the Hadoop version.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2855">HADOOP-2855</a>
</td>
<td>
contrib/hod
</td>
<td>
HOD now handles relative
paths correctly for important HOD options such as the cluster directory,
tarball option, and script file.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2899">HADOOP-2899</a>
</td>
<td>
contrib/hod
</td>
<td>
HOD now cleans up the HOD generated mapred system directory
at cluster deallocation time.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2982">HADOOP-2982</a>
</td>
<td>
contrib/hod
</td>
<td>
The number of free nodes in the cluster
is computed using a better algorithm that filters out inconsistencies in
node status as reported by Torque.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2947">HADOOP-2947</a>
</td>
<td>
contrib/hod
</td>
<td>
The stdout and stderr streams of
daemons are redirected to files that are created under the hadoop log
directory. Users can now send a <tt>kill -3</tt> (SIGQUIT) signal to the daemons to get stack traces
and thread dumps for debugging.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3168">HADOOP-3168</a>
</td>
<td>
contrib/streaming
</td>
<td>
Decreased the frequency of logging
in Hadoop streaming (from every 100 records to every 10,000 records).
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3040">HADOOP-3040</a>
</td>
<td>
contrib/streaming
</td>
<td>
Fixed a critical bug to restore important functionality in Hadoop streaming. If the first character on a line is
the separator, then an empty key is assumed and the whole line is the value.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2820">HADOOP-2820</a>
</td>
<td>
contrib/streaming
</td>
<td>
Removed these deprecated classes: <br><tt><ul><li>org.apache.hadoop.streaming.StreamLineRecordReader</li><li>org.apache.hadoop.streaming.StreamOutputFormat</li><li>org.apache.hadoop.streaming.StreamSequenceRecordReader</li></ul></tt></td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3280">HADOOP-3280</a>
</td>
<td>
contrib/streaming
</td>
<td>
Added the
<tt>mapred.child.ulimit</tt> configuration variable to limit the maximum virtual memory allocated to processes launched by the
Map-Reduce framework. This can be used to control both the Mapper/Reducer
tasks and applications using Hadoop pipes, Hadoop streaming, etc.
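<p>
A minimal sketch of setting the limit from job code (the one-gigabyte value is
illustrative; the value is in kilobytes, following the <tt>ulimit -v</tt> convention):
<pre>
JobConf conf = new JobConf();
// Cap each child process launched for this job at roughly 1 GB of virtual memory.
conf.set("mapred.child.ulimit", "1048576");
</pre>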
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2657">HADOOP-2657</a>
</td>
<td>
dfs
</td>
<td>Added the new API <tt>DFSOutputStream.flush()</tt> to
flush all outstanding data to DataNodes.
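<p>
A minimal usage sketch (the path is illustrative; the call goes through the
<tt>FSDataOutputStream</tt> returned by <tt>FileSystem.create</tt>, which wraps the
underlying <tt>DFSOutputStream</tt>):
<pre>
FileSystem fs = FileSystem.get(new Configuration());
FSDataOutputStream out = fs.create(new Path("/tmp/example"));
out.write("partial record".getBytes());
out.flush(); // push buffered bytes to the DataNodes before the file is closed
out.close();
</pre>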
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2219">HADOOP-2219</a>
</td>
<td>
dfs
</td>
<td>
Added a new <tt>fs -count</tt> command for
counting the number of bytes, files, and directories under a given path. <br>
<br>
Added a new RPC <tt>getContentSummary(String path)</tt> to ClientProtocol.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2559">HADOOP-2559</a>
</td>
<td>
dfs
</td>
<td>
Changed DFS block placement to
allocate the first replica on the local node, the second on a remote rack, and
the third on the same rack as the second.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2758">HADOOP-2758</a>
</td>
<td>
dfs
</td>
<td>
Reduced DataNode CPU usage by about 50% while serving data to clients.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2634">HADOOP-2634</a>
</td>
<td>
dfs
</td>
<td>
Deprecated ClientProtocol's <tt>exists()</tt> method. Use <tt>getFileInfo(String)</tt> instead.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2423">HADOOP-2423</a>
</td>
<td>
dfs
</td>
<td>
Improved <tt>FSDirectory.mkdirs(...)</tt> performance by about 50% as measured by the NNThroughputBenchmark.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3124">HADOOP-3124</a>
</td>
<td>
dfs
</td>
<td>
Made the DataNode socket write timeout configurable; however, the configuration variable is undocumented.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2470">HADOOP-2470</a>
</td>
<td>
dfs
</td>
<td>
Removed the <tt>open()</tt> and <tt>isDir()</tt> methods from ClientProtocol without first deprecating them. <br>
<br>
Removed the deprecated <tt>getContentLength()</tt> from ClientProtocol.<br>
<br>
Deprecated <tt>isDirectory</tt> in DFSClient. Use <tt>getFileStatus()</tt> instead.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2854">HADOOP-2854</a>
</td>
<td>
dfs
</td>
<td>
Removed deprecated method <tt>org.apache.hadoop.ipc.Server.getUserInfo()</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2239">HADOOP-2239</a>
</td>
<td>
dfs
</td>
<td>
Added a new FileSystem, HftpsFileSystem, that allows access to HDFS data over HTTPS.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-771">HADOOP-771</a>
</td>
<td>
dfs
</td>
<td>
Added a new method to the <tt>FileSystem</tt> API, <tt>delete(path, boolean)</tt>,
and deprecated the previous <tt>delete(path)</tt> method.
The new method deletes a directory and its contents recursively only if the boolean is set to true.
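<p>
A minimal sketch (the path is illustrative):
<pre>
FileSystem fs = FileSystem.get(new Configuration());
Path dir = new Path("/tmp/old-output");
fs.delete(dir, true);  // recursive: removes the directory and everything under it
// fs.delete(dir, false) would refuse to remove a non-empty directory
</pre>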
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3239">HADOOP-3239</a>
</td>
<td>
dfs
</td>
<td>
Modified <tt>org.apache.hadoop.dfs.FSDirectory.getFileInfo(String)</tt> to return null when a file is not
found instead of throwing FileNotFoundException.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3091">HADOOP-3091</a>
</td>
<td>
dfs
</td>
<td>
Enhanced the <tt>hadoop dfs -put</tt> command to accept multiple
sources when the destination is a directory.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2192">HADOOP-2192</a>
</td>
<td>
dfs
</td>
<td>
Modified <tt>hadoop dfs -mv</tt> to be closer in functionality to
the Linux <tt>mv</tt> command by removing unnecessary output and returning
an error message when moving non-existent files/directories.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1985">HADOOP-1985</a>
</td>
<td>
dfs <br>
mapred
</td>
<td>
Added rack awareness for map tasks and moved the rack resolution logic to the
NameNode and JobTracker. <p> The administrator can specify a
loadable class via the configuration property <tt>topology.node.switch.mapping.impl</tt>
to implement the rack resolution logic. The class must implement
a method <tt>resolve(List&lt;String&gt; names)</tt>, where <tt>names</tt> is the list of
DNS names/IP addresses to resolve. The return value is a list of
resolved network paths of the form <tt>/foo/rack</tt>, where <tt>rack</tt> is the rack ID
of the node and <tt>foo</tt> is the switch that connects multiple racks, and so on.
The default implementation, <tt>org.apache.hadoop.net.ScriptBasedMapping</tt>, is packaged
with Hadoop and runs a script to perform rack resolution. The
script location is configurable: it is specified by
<tt>topology.script.file.name</tt> and defaults to an empty value. When
the script name is empty, <tt>/default-rack</tt> is returned for all
DNS names/IP addresses. The loadable <tt>topology.node.switch.mapping.impl</tt> class gives
administrators the flexibility to define how their site's node resolution
should happen; a sketch of a custom mapping follows. <br>
For mapred, one can also specify the cache level with respect to the number of
levels in the resolved network path; it defaults to two, which means the
JobTracker will cache tasks at the host level and at the rack level. <br>
Known issue: task caching will not work with levels greater than 2
(beyond racks). This bug is tracked in <a href="https://issues.apache.org/jira/browse/HADOOP-3296">HADOOP-3296</a>.
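<p>
A minimal sketch of a custom mapping, assuming the loadable class implements
<tt>org.apache.hadoop.net.DNSToSwitchMapping</tt> (the rack naming is illustrative):
<pre>
public class ExampleRackMapping implements DNSToSwitchMapping {
  // Return one /switch/rack network path per input DNS name or IP address.
  public List&lt;String&gt; resolve(List&lt;String&gt; names) {
    List&lt;String&gt; paths = new ArrayList&lt;String&gt;(names.size());
    for (String name : names) {
      paths.add("/foo/rack1"); // illustrative: place every node on one rack
    }
    return paths;
  }
}
</pre>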
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2063">HADOOP-2063</a>
</td>
<td>
fs
</td>
<td>
Added a new option <tt>-ignoreCrc</tt> to <tt>fs -get</tt> and <tt>fs -copyToLocal</tt>. The option causes CRC checksums to be
ignored for this command so that corrupt files may be downloaded.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3001">HADOOP-3001</a>
</td>
<td>
fs
</td>
<td>
Added new Map/Reduce framework
counters that track the number of bytes read from and written to HDFS, local,
KFS, and S3 file systems.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2027">HADOOP-2027</a>
</td>
<td>
fs
</td>
<td>
Added a new FileSystem method <tt>getFileBlockLocations</tt> to return the number of bytes in each block of a file
via a single RPC to the NameNode. Deprecated <tt>getFileCacheHints</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2839">HADOOP-2839</a>
</td>
<td>
fs
</td>
<td>
Removed deprecated method <tt>org.apache.hadoop.fs.FileSystem.globPaths()</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2563">HADOOP-2563</a>
</td>
<td>
fs
</td>
<td>
Removed deprecated method <tt>org.apache.hadoop.fs.FileSystem.listPaths()</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1593">HADOOP-1593</a>
</td>
<td>
fs
</td>
<td>
Modified FSShell commands to accept non-default paths. Now you can run commands like <tt>hadoop dfs -ls hdfs://remotehost1:port/path</tt>
and <tt>hadoop dfs -ls hdfs://remotehost2:port/path</tt> without changing your Hadoop configuration.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3048">HADOOP-3048</a>
</td>
<td>
io
</td>
<td>
Added a new API and a default
implementation to convert serialized objects to strings and restore them.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3152">HADOOP-3152</a>
</td>
<td>
io
</td>
<td>
Added a static method
<tt>MapFile.setIndexInterval(Configuration, int interval)</tt> so that Map/Reduce
jobs using <tt>MapFileOutputFormat</tt> can set the index interval.
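<p>
For example (the interval of 128 is illustrative):
<pre>
Configuration conf = new Configuration();
// Write one index entry for every 128 keys added to the MapFile.
MapFile.setIndexInterval(conf, 128);
</pre>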
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3073">HADOOP-3073</a>
</td>
<td>
ipc
</td>
<td>
<tt>SocketOutputStream.close()</tt> now closes the
underlying channel. This increases compatibility with
<tt>java.net.Socket.getOutputStream</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3041">HADOOP-3041</a>
</td>
<td>
mapred
</td>
<td>
Deprecated <tt>JobConf.setOutputPath</tt> and <tt>JobConf.getOutputPath</tt>.<p>
Deprecated <tt>OutputFormatBase</tt>. Added <tt>FileOutputFormat</tt>. Existing output
formats extending <tt>OutputFormatBase</tt> now extend <tt>FileOutputFormat</tt>. <p>
Added the following methods to <tt>FileOutputFormat</tt> (a migration sketch follows the list):
<tt><ul>
<li>public static void setOutputPath(JobConf conf, Path outputDir)
<li>public static Path getOutputPath(JobConf conf)
<li>public static Path getWorkOutputPath(JobConf conf)
<li>static void setWorkOutputPath(JobConf conf, Path outputDir)
</ul></tt>
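<p>
A minimal migration sketch (the output path is illustrative):
<pre>
JobConf conf = new JobConf();
// was: conf.setOutputPath(new Path("/data/out"));
FileOutputFormat.setOutputPath(conf, new Path("/data/out"));
// was: Path out = conf.getOutputPath();
Path out = FileOutputFormat.getOutputPath(conf);
</pre>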
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3204">HADOOP-3204</a>
</td>
<td>
mapred
</td>
<td>
Fixed <tt>ReduceTask.LocalFSMerger</tt> to handle errors and exceptions better. Previously, all
exceptions other than IOException were silently ignored.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1986">HADOOP-1986</a>
</td>
<td>
mapred
</td>
<td>
Programs that implement the raw
<tt>Mapper</tt> or <tt>Reducer</tt> interfaces will need modification to compile with this
release. For example, <p>
<pre>
class MyMapper implements Mapper {
public void map(WritableComparable key, Writable val,
OutputCollector out, Reporter reporter) throws IOException {
// ...
}
// ...
}
</pre>
will need to be changed to refer to the parameterized type. For example: <p>
<pre>
class MyMapper implements Mapper&lt;WritableComparable, Writable, WritableComparable, Writable&gt; {
public void map(WritableComparable key, Writable val,
OutputCollector&lt;WritableComparable, Writable&gt;
out, Reporter reporter) throws IOException {
// ...
}
// ...
}
</pre>
Similarly implementations of the following raw interfaces will need
modification:
<tt><ul>
<li>InputFormat
<li>OutputCollector
<li>OutputFormat
<li>Partitioner
<li>RecordReader
<li>RecordWriter
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-910">HADOOP-910</a>
</td>
<td>
mapred
</td>
<td>
Reducers now perform merges of
shuffle data (both in-memory and on-disk) while fetching map outputs.
Earlier, they merged only the in-memory outputs during the shuffle.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2822">HADOOP-2822</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated classes <tt>org.apache.hadoop.mapred.InputFormatBase</tt>
and <tt>org.apache.hadoop.mapred.PhasedFileSystem</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2817">HADOOP-2817</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated method
<tt>org.apache.hadoop.mapred.ClusterStatus.getMaxTasks()</tt>
and the deprecated configuration property <tt>mapred.tasktracker.tasks.maximum</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2825">HADOOP-2825</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated method
<tt>org.apache.hadoop.mapred.MapOutputLocation.getFile(FileSystem fileSys, Path
localFilename, int reduce, Progressable pingee, int timeout)</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2818">HADOOP-2818</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated methods
<tt>org.apache.hadoop.mapred.Counters.getDisplayName(String counter)</tt> and
<tt>org.apache.hadoop.mapred.Counters.getCounterNames()</tt>.
Undeprecated the method
<tt>org.apache.hadoop.mapred.Counters.getCounter(String counterName)</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2826">HADOOP-2826</a>
</td>
<td>
mapred
</td>
<td>
Changed the signature of the method
<tt>public org.apache.hadoop.streaming.UTF8ByteArrayUtils.readLine(InputStream)</tt> to
<tt>UTF8ByteArrayUtils.readLine(LineReader, Text)</tt>. Since the old
signature was not deprecated first, any code using the old method must be changed
to use the new method.
<p>
Removed the deprecated methods <tt>org.apache.hadoop.mapred.FileSplit.getFile()</tt>
and <tt>org.apache.hadoop.mapred.LineRecordReader.readLine(InputStream in,
OutputStream out)</tt>.
<p>
Made the constructor <tt>org.apache.hadoop.mapred.LineRecordReader.LineReader(InputStream in, Configuration
conf)</tt> public.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2819">HADOOP-2819</a>
</td>
<td>
mapred
</td>
<td>
Removed these deprecated methods from <tt>org.apache.hadoop.mapred.JobConf</tt>:
<tt><ul>
<li>public Class getInputKeyClass()
<li>public void setInputKeyClass(Class theClass)
<li>public Class getInputValueClass()
<li>public void setInputValueClass(Class theClass)
</ul></tt>
and undeprecated these methods:
<tt><ul>
<li>getSpeculativeExecution()
<li>public void setSpeculativeExecution(boolean speculativeExecution)
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3093">HADOOP-3093</a>
</td>
<td>
mapred
</td>
<td>
Added the following public methods to <tt>org.apache.hadoop.conf.Configuration</tt> (a usage sketch follows the list):
<tt><ul>
<li>String[] Configuration.getStrings(String name, String... defaultValue)
<li>void Configuration.setStrings(String name, String... values)
</ul></tt>
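<p>
A minimal usage sketch (the key and values are illustrative):
<pre>
Configuration conf = new Configuration();
conf.setStrings("example.hosts", "host1", "host2");             // stored as a comma-separated value
String[] hosts = conf.getStrings("example.hosts", "localhost"); // varargs default
</pre>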
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2399">HADOOP-2399</a>
</td>
<td>
mapred
</td>
<td>
The key and value objects given
to the Combiner and Reducer are now reused between calls. This is much more
efficient, but users can no longer assume the objects are constant; a sketch follows.
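<p>
For example, a reducer that saves values for later use must now copy them
(the types are illustrative):
<pre>
public void reduce(Text key, Iterator&lt;IntWritable&gt; values,
                   OutputCollector&lt;Text, IntWritable&gt; out, Reporter reporter)
    throws IOException {
  List&lt;IntWritable&gt; saved = new ArrayList&lt;IntWritable&gt;();
  while (values.hasNext()) {
    // Wrong: saved.add(values.next()) would store one reused object many times.
    saved.add(new IntWritable(values.next().get())); // defensive copy
  }
  // ... use the saved copies ...
}
</pre>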
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3162">HADOOP-3162</a>
</td>
<td>
mapred
</td>
<td>
Deprecated the public methods <tt>org.apache.hadoop.mapred.JobConf.setInputPath(Path)</tt> and
<tt>org.apache.hadoop.mapred.JobConf.addInputPath(Path)</tt>.
<p>
Added the following public methods to <tt>org.apache.hadoop.mapred.FileInputFormat</tt>:
<tt><ul>
<li>public static void setInputPaths(JobConf job, Path... paths); <br>
<li>public static void setInputPaths(JobConf job, String commaSeparatedPaths); <br>
<li>public static void addInputPath(JobConf job, Path path); <br>
<li>public static void addInputPaths(JobConf job, String commaSeparatedPaths); <br>
</ul></tt>
Earlier code calling <tt>JobConf.setInputPath(Path)</tt> and <tt>JobConf.addInputPath(Path)</tt>
should now call <tt>FileInputFormat.setInputPaths(JobConf, Path...)</tt> and
<tt>FileInputFormat.addInputPath(Path)</tt> respectively.
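<p>
A minimal migration sketch (the paths are illustrative):
<pre>
JobConf conf = new JobConf();
// was: conf.setInputPath(new Path("/data/in"));
FileInputFormat.setInputPaths(conf, new Path("/data/in"));
FileInputFormat.addInputPath(conf, new Path("/data/extra"));
</pre>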
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2178">HADOOP-2178</a>
</td>
<td>
mapred
</td>
<td>
Provided a new facility to
store job history on DFS. Cluster administrators can now provide either a local FS
location or a DFS location using the configuration property
<tt>mapred.job.history.location</tt> to store job history. History will also
be logged in a user-specified location if the configuration property
<tt>mapred.job.history.user.location</tt> is specified.
<p>
Removed these classes and method:
<tt><ul>
<li>org.apache.hadoop.mapred.DefaultJobHistoryParser.MasterIndex
<li>org.apache.hadoop.mapred.DefaultJobHistoryParser.MasterIndexParseListener
<li>org.apache.hadoop.mapred.DefaultJobHistoryParser.parseMasterIndex
</ul></tt>
<p>
Changed the signature of the public method
<tt>org.apache.hadoop.mapred.DefaultJobHistoryParser.parseJobTasks(File
jobHistoryFile, JobHistory.JobInfo job)</tt> to
<tt>DefaultJobHistoryParser.parseJobTasks(String jobHistoryFile,
JobHistory.JobInfo job, FileSystem fs)</tt>. <p>
Changed the signature of the public method
<tt>org.apache.hadoop.mapred.JobHistory.parseHistory(File path, Listener l)</tt>
to <tt>JobHistory.parseHistoryFromFS(String path, Listener l, FileSystem fs)</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2055">HADOOP-2055</a>
</td>
<td>
mapred
</td>
<td>
Users can now specify which paths to ignore when processing the job input directory
(in addition to filenames that start with "_" and ".").
To do this, two new methods were defined (a filter sketch follows the list):
<tt><ul>
<li>FileInputFormat.setInputPathFilter(JobConf, PathFilter)
<li>FileInputFormat.getInputPathFilter(JobConf)
</ul></tt>
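<p>
For example, a minimal filter that could be registered with the method above
(the <tt>.tmp</tt> suffix is illustrative):
<pre>
public class SkipTmpFiles implements PathFilter {
  // Accept everything except scratch files ending in ".tmp".
  public boolean accept(Path path) {
    return !path.getName().endsWith(".tmp");
  }
}
</pre>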
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2116">HADOOP-2116</a>
</td>
<td>
mapred
</td>
<td>
Restructured the local job directory on the tasktracker. Users are provided with a job-specific shared directory
(<tt>mapred-local/taskTracker/jobcache/$jobid/work</tt>) for use as scratch
space, exposed through the configuration property and system property
<tt>job.local.dir</tt>. The directory <tt>../work</tt> is no longer available from the task's current working directory.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1622">HADOOP-1622</a>
</td>
<td>
mapred
</td>
<td>
Added new command line options for <tt>hadoop jar</tt> command:
<p>
<tt>hadoop jar -files &lt;comma separated list of files&gt; -libjars &lt;comma
separated list of jars&gt; -archives &lt;comma separated list of
archives&gt; </tt>
<p>
where the options have these meanings:
<p>
<ul>
<li><tt>-files</tt> allows you to specify a comma separated list of paths which
will be made available in the current working directory of each task <br>
<li><tt>-libjars</tt> allows you to add jars to the classpaths of the maps and
reduces. <br>
<li><tt>-archives</tt> allows you to pass archives as arguments; they are
unzipped/unjarred, and a link with the name of the jar/zip is created in the
current working directory of tasks.
</ul>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2823">HADOOP-2823</a>
</td>
<td>
record
</td>
<td>
Removed the deprecated methods in
<tt>org.apache.hadoop.record.compiler.generated.SimpleCharStream</tt>:
<tt><ul>
<li>public int getColumn()
<li>public int getLine()
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2551">HADOOP-2551</a>
</td>
<td>
scripts
</td>
<td>
Introduced new environment variables to allow finer grained control of Java options passed to server and
client JVMs. See the new <tt>*_OPTS</tt> variables in <tt>conf/hadoop-env.sh</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3099">HADOOP-3099</a>
</td>
<td>
util
</td>
<td>
Added a new <tt>-p</tt> option to <tt>distcp</tt> for preserving file and directory status:
<pre>
-p[rbugp] Preserve status
r: replication number
b: block size
u: user
g: group
p: permission
</pre>
The <tt>-p</tt> option alone is equivalent to <tt>-prbugp</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2821">HADOOP-2821</a>
</td>
<td>
util
</td>
<td>
Removed the deprecated classes <tt>org.apache.hadoop.util.ShellUtil</tt> and <tt>org.apache.hadoop.util.ToolBase</tt>.
</td>
</tr>
</tbody></table>
</ul>
</body></html>