<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hadoop Release Notes</title>
<STYLE type="text/css">
H1 {font-family: sans-serif}
H2 {font-family: sans-serif; margin-left: 7mm}
TABLE {margin-left: 7mm}
</STYLE>
<h1>Hadoop Release Notes</h1>
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
<a name="changes"/>
<h2>Changes since Hadoop</h2>
<li> <a href="">MAPREDUCE-2846</a>.
Blocker bug reported by aw and fixed by owen.omalley (task, task-controller, tasktracker)<br>
<b>a small % of all tasks fail with DefaultTaskController</b><br>
<blockquote>Fixed a race condition in writing the log index file that caused tasks to &apos;fail&apos;.</blockquote></li>
<li> <a href="">MAPREDUCE-2804</a>.
Blocker bug reported by aw and fixed by owen.omalley <br>
<b>&quot;Creation of symlink to attempt log dir failed.&quot; message is not useful</b><br>
<blockquote>Removed duplicate chmods of job log dir that were vulnerable to race conditions between tasks. Also improved the messages when the symlinks failed to be created.</blockquote></li>
<li> <a href="">MAPREDUCE-2651</a>.
Major bug reported by bharathm and fixed by bharathm (task-controller)<br>
<b>Race condition in Linux Task Controller for job log directory creation</b><br>
<blockquote>There is a rare race condition in the Linux task controller when concurrent task processes try to create the job log directory at the same time. </blockquote></li>
<li> <a href="">MAPREDUCE-2621</a>.
Minor bug reported by sherri_chen and fixed by sherri_chen <br>
<b>TestCapacityScheduler fails with &quot;Queue &quot;q1&quot; does not exist&quot;</b><br>
<blockquote>{quote}<br>Error Message<br><br>Queue &quot;q1&quot; does not exist<br><br>Stacktrace<br><br> Queue &quot;q1&quot; does not exist<br> at org.apache.hadoop.mapred.JobInProgress.&lt;init&gt;(<br> at org.apache.hadoop.mapred.TestCapacityScheduler$FakeJobInProgress.&lt;init&gt;(<br> at org.apache.hadoop.mapred.TestCapacityScheduler.submitJob(<br> at org.apache.hadoop.mapred.TestCapacityScheduler.submitJob(<br> at org.apache.hadoop.mapred.TestCapacityScheduler.submitJobAndInit(<br> at org.apache.hadoop.mapred.TestCapacityScheduler.testMultiTaskAssignmentInMultipleQueues(<br>{quote}<br><br>An exception is now thrown when the queue name is invalid.</blockquote></li>
<li> <a href="">MAPREDUCE-2558</a>.
Major new feature reported by naisbitt and fixed by naisbitt (jobtracker)<br>
<b>Add queue-level metrics 0.20-security branch</b><br>
<blockquote>We would like to record and present the jobtracker metrics on a per-queue basis.</blockquote></li>
<li> <a href="">MAPREDUCE-2555</a>.
Minor bug reported by tgraves and fixed by tgraves (tasktracker)<br>
<b>JvmInvalidate errors in the gridmix TT logs</b><br>
<blockquote>Many JvmValidate exceptions were observed in the TT logs during a gridmix run:<br><br>2011-04-28 02:00:37,578 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 46121, call<br>statusUpdate(attempt_201104270735_5993_m_003305_0, org.apache.hadoop.mapred.MapTaskStatus@1840a9c,<br>org.apache.hadoop.mapred.JvmContext@1d4ab6b) from error: JvmValidate Failed.<br>Ignoring request from task: attempt_201104270735_5993_m_003305_0, with JvmId:<br>jvm_201104270735_5993_m_103399012gsbl20430: JvmValidate Failed. Ignoring request from task:<br>attempt_201104270735_5993_m_003305_0, with JvmId: jvm_201104270735_5993_m_103399012gsbl20430: --<br> at org.apache.hadoop.ipc.Server$Handler$<br> at Method)<br> at<br> at<br> at org.apache.hadoop.ipc.Server$<br><br></blockquote></li>
<li> <a href="">MAPREDUCE-2529</a>.
Major bug reported by tgraves and fixed by tgraves (tasktracker)<br>
<b>Recognize Jetty bug 1342 and handle it</b><br>
<blockquote>Added two new configuration parameters:<br><br>mapreduce.reduce.shuffle.catch.exception.stack.regex<br>mapreduce.reduce.shuffle.catch.exception.message.regex</blockquote></li>
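The note above only names the two parameters. The following Java sketch shows one plausible way such a pair of regular expressions could be used to recognize a known Jetty failure during the shuffle. The class name, and the assumption that the stack regex is matched against each stack frame while the message regex is matched against the exception message, are illustrative and not taken from the patch.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not Hadoop's code). The two patterns stand in for the values of
// mapreduce.reduce.shuffle.catch.exception.stack.regex and
// mapreduce.reduce.shuffle.catch.exception.message.regex.
final class ShuffleExceptionMatcher {
    private final Pattern stackRegex;   // null: the check is disabled
    private final Pattern messageRegex; // null: any message is accepted

    ShuffleExceptionMatcher(String stackRegex, String messageRegex) {
        this.stackRegex = stackRegex == null ? null : Pattern.compile(stackRegex);
        this.messageRegex = messageRegex == null ? null : Pattern.compile(messageRegex);
    }

    // True if some stack frame matches the stack regex and, when one is configured,
    // the exception message matches the message regex.
    boolean matches(Throwable t) {
        if (stackRegex == null) {
            return false;
        }
        boolean frameMatched = false;
        for (StackTraceElement frame : t.getStackTrace()) {
            if (stackRegex.matcher(frame.toString()).find()) {
                frameMatched = true;
                break;
            }
        }
        if (!frameMatched) {
            return false;
        }
        if (messageRegex == null) {
            return true;
        }
        String message = t.getMessage();
        return message != null && messageRegex.matcher(message).find();
    }
}
```

Leaving the stack regex unset keeps the check disabled, so existing shuffle behavior is unchanged unless an operator opts in.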
<li> <a href="">MAPREDUCE-2524</a>.
Minor improvement reported by tgraves and fixed by tgraves (tasktracker)<br>
<b>Backport trunk heuristics for failing maps when we get fetch failures retrieving map output during shuffle</b><br>
<blockquote>Added a new configuration option: mapreduce.reduce.shuffle.maxfetchfailures, and removed a no longer used option: mapred.reduce.copy.backoff.</blockquote></li>
<li> <a href="">MAPREDUCE-2514</a>.
Trivial bug reported by jeagles and fixed by jeagles (tasktracker)<br>
<b>ReinitTrackerAction class name misspelled RenitTrackerAction in task tracker log</b><br>
<li> <a href="">MAPREDUCE-2495</a>.
Minor improvement reported by revans2 and fixed by revans2 (distributed-cache)<br>
<b>The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason</b><br>
<blockquote>The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but to be more defensive, the thread should be monitored and regularly checked to confirm that it is still alive, so that the distributed cache cannot fill up the entire disk on the node if the thread dies. </blockquote></li>
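A minimal Java sketch of the monitoring described above, assuming a watchdog that is invoked periodically and restarts the cleanup thread if it has died. CleanupWatchdog and its methods are hypothetical names, not the class added by the patch.

```java
// Hypothetical sketch (not Hadoop's code): a watchdog that is called regularly,
// checks whether the cache-cleanup thread is still alive, and starts a
// replacement thread if it has died.
final class CleanupWatchdog {
    private final Runnable cleanupLoop;
    private Thread worker;    // guarded by this
    private int restarts = 0; // guarded by this

    CleanupWatchdog(Runnable cleanupLoop) {
        this.cleanupLoop = cleanupLoop;
    }

    synchronized void start() {
        worker = new Thread(cleanupLoop, "cache-cleanup");
        worker.setDaemon(true); // do not keep the JVM alive just for cleanup
        worker.start();
    }

    // Returns true if the worker had died and a new one was started.
    synchronized boolean checkAndRestart() {
        if (worker != null && worker.isAlive()) {
            return false;
        }
        restarts++;
        start();
        return true;
    }

    synchronized int restartCount() {
        return restarts;
    }

    // Waits up to millis for the current worker to finish (useful in tests).
    void joinWorker(long millis) {
        Thread t;
        synchronized (this) {
            t = worker;
        }
        if (t == null) {
            return;
        }
        try {
            t.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Counting restarts matters in practice: a cleanup thread that keeps dying points to a bug that restarting alone would hide, so a real tracker would also log each restart.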
<li> <a href="">MAPREDUCE-2490</a>.
Trivial improvement reported by jeagles and fixed by jeagles (jobtracker)<br>
<b>Log blacklist debug count</b><br>
<blockquote>Gain some insight into blacklist increments/decrements by enhancing the debug logging</blockquote></li>
<li> <a href="">MAPREDUCE-2479</a>.
Major improvement reported by revans2 and fixed by revans2 (tasktracker)<br>
<b>Backport MAPREDUCE-1568 to hadoop security branch</b><br>
<blockquote>Added mapreduce.tasktracker.distributedcache.checkperiod to the task tracker, which defines the period to wait between checks when cleaning up the distributed cache. The default is 1 minute.</blockquote></li>
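A minimal Java sketch of a cleanup pass driven by such a check period, assuming a java.util.concurrent scheduler rather than the task tracker's own thread. PeriodicCacheCleaner is a hypothetical name, and the period is expressed in milliseconds here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: run a cache-cleanup pass periodically, in the spirit of
// mapreduce.tasktracker.distributedcache.checkperiod.
final class PeriodicCacheCleaner {
    // The release note gives a default of one minute.
    static final long DEFAULT_CHECK_PERIOD_MS = 60_000L;

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Runs cleanupPass immediately, then again checkPeriodMs after each pass ends,
    // so a slow pass never overlaps the next one.
    void start(Runnable cleanupPass, long checkPeriodMs) {
        scheduler.scheduleWithFixedDelay(cleanupPass, 0, checkPeriodMs, TimeUnit.MILLISECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

One caveat of this design: scheduleWithFixedDelay suppresses all later runs if one run throws, which is exactly the kind of silent death MAPREDUCE-2495 guards against, so cleanupPass should catch its own exceptions.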
<li> <a href="">MAPREDUCE-2456</a>.
Trivial improvement reported by naisbitt and fixed by naisbitt (jobtracker)<br>
<b>Show the reducer taskid and map/reduce tasktrackers for &quot;Failed fetch notification #_ for task attempt...&quot; log messages</b><br>
<blockquote>This JIRA is to provide more useful log information for debugging the &quot;Too many fetch-failures&quot; error.<br><br>Looking at the JobTracker node, we see messages like this:<br>&quot;2010-12-14 00:00:06,911 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #8 for task<br>attempt_201011300729_189729_m_007458_0&quot;.<br><br>It would be useful to see which reducer is reporting the error here.<br><br>So, I propose we add the following to these log messages:<br> 1. reduce task ID<br> 2. TaskTracker nodenames for both the mapper and the reducer<br></blockquote></li>
<li> <a href="">MAPREDUCE-2451</a>.
Trivial bug reported by tgraves and fixed by tgraves (jobtracker)<br>
<b>Log the reason string of healthcheck script</b><br>
<blockquote>The information on why a specific TaskTracker got blacklisted is not stored anywhere. The jobtracker web ui will show the detailed reason string until the TT gets unblacklisted. After that it is lost.</blockquote></li>
<li> <a href="">MAPREDUCE-2447</a>.
Minor bug reported by sseth and fixed by sseth <br>
<b>Set JvmContext sooner for a task - MR2429</b><br>
<blockquote>TaskTracker.validateJVM() throws an NPE when setupWorkDir() throws an IOException, because<br>taskFinal.setJvmContext() has not been executed yet.</blockquote></li>
<li> <a href="">MAPREDUCE-2443</a>.
Minor bug reported by sseth and fixed by sseth (test)<br>
<b>Fix FI build - broken after MR-2429</b><br>
<blockquote>src/test/system/aop/org/apache/hadoop/mapred/TaskAspect.aj:72 [warning] advice defined in org.apache.hadoop.mapred.TaskAspect has not been applied [Xlint:adviceDidNotMatch]<br><br>After the fix in MR-2429, the call to ping in TaskAspect needs to be fixed.</blockquote></li>
<li> <a href="">MAPREDUCE-2429</a>.
Major bug reported by acmurthy and fixed by sseth (tasktracker)<br>
<b>Check jvmid during task status report</b><br>
<blockquote>Currently the TT doesn&apos;t check that the jvmid is relevant during communication with the Child via TaskUmbilicalProtocol.</blockquote></li>
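A minimal Java sketch of the check described above, assuming the tracker remembers which JVM each task attempt was launched in. JvmValidator and its methods are hypothetical, and IDs are plain strings rather than Hadoop's ID types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: accept a status report only if it comes from the JVM
// the attempt was launched in.
final class JvmValidator {
    private final Map<String, String> expectedJvmByAttempt = new HashMap<>();

    synchronized void taskLaunched(String attemptId, String jvmId) {
        expectedJvmByAttempt.put(attemptId, jvmId);
    }

    // Returns true if the report should be accepted. An attempt whose JVM is not
    // yet recorded is rejected instead of causing a NullPointerException.
    synchronized boolean validate(String attemptId, String reportingJvmId) {
        String expected = expectedJvmByAttempt.get(attemptId);
        return expected != null && expected.equals(reportingJvmId);
    }
}
```

Rejecting unknown attempts, rather than dereferencing a missing context, is the behavior MAPREDUCE-2447 above needed as well.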
<li> <a href="">MAPREDUCE-2418</a>.
Minor bug reported by sseth and fixed by sseth <br>
<b>Errors not shown in the JobHistory servlet (specifically Counter Limit Exceeded)</b><br>
<blockquote>Job error details, e.g. errors such as &apos;Counter limit exceeded for a job&apos;, are not displayed in the JobHistory servlet. <br>jobdetails.jsp has &apos;Failure Info&apos;, but it is missing from jobdetailshistory.jsp.</blockquote></li>
<li> <a href="">MAPREDUCE-2415</a>.
Major sub-task reported by bharathm and fixed by bharathm (task-controller, tasktracker)<br>
<b>Distribute TaskTracker userlogs onto multiple disks</b><br>
<blockquote>Currently, the TaskTracker userlogs directory is placed under hadoop.log.dir, as &lt;hadoop.log.dir&gt;/userlogs. I am proposing to spread these userlogs across the multiple configured mapred.local.dirs to improve TaskTracker reliability with respect to disk failures. </blockquote></li>
<li> <a href="">MAPREDUCE-2413</a>.
Major sub-task reported by bharathm and fixed by ravidotg (task-controller, tasktracker)<br>
<b>TaskTracker should handle disk failures at both startup and runtime</b><br>
<blockquote>At present, TaskTracker doesn&apos;t handle disk failures properly, either at startup or at runtime.<br><br>(1) Currently TaskTracker doesn&apos;t come up if any of the mapred-local-dirs is on a bad disk. TaskTracker should ignore that particular mapred-local-dir, start up, and use only the remaining good mapred-local-dirs.<br>(2) If a disk goes bad while TaskTracker is running, TaskTracker currently doesn&apos;t do anything special. This results in either<br> (a) TaskTracker continuing to use the bad disk, which causes many task failures and possibly job failures (because multiple TTs have bad disks), and eventually these TTs get graylisted for all jobs. Recovery requires a manual restart of the TT with a mapred-local-dirs configuration that avoids the bad disk; OR<br> (b) The health check script identifying the disk as bad and the TT getting blacklisted. This also requires a manual restart of the TT with a mapred-local-dirs configuration that avoids the bad disk.<br><br>This JIRA makes TaskTracker more fault-tolerant to disk failures by solving (1) and (2): the TT should start as long as at least one of the mapred-local-dirs is on a good disk, and it should adjust its in-memory list of mapred-local-dirs to avoid using the bad ones.<br></blockquote></li>
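The startup half of this proposal can be sketched in Java as below, under the assumption that a directory counts as good if it exists (or can be created) and is writable. LocalDirChecker is a hypothetical name, and detecting disks that fail at runtime is not covered.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keep only the usable local directories and refuse to
// start only if none are left.
final class LocalDirChecker {
    static List<String> goodDirs(String[] configuredDirs) {
        List<String> good = new ArrayList<>();
        for (String dir : configuredDirs) {
            File f = new File(dir);
            // mkdirs() returns false if the directory already exists, so check
            // isDirectory() afterwards instead of relying on its return value.
            f.mkdirs();
            if (f.isDirectory() && f.canWrite()) {
                good.add(dir);
            }
        }
        return good;
    }

    static List<String> checkOrFail(String[] configuredDirs) {
        List<String> good = goodDirs(configuredDirs);
        if (good.isEmpty()) {
            throw new IllegalStateException("all configured local dirs are bad");
        }
        return good;
    }
}
```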
<li> <a href="">MAPREDUCE-2411</a>.
Minor bug reported by dking and fixed by dking <br>
<b>When you submit a job to a queue with no ACLs you get an inscrutable NPE</b><br>
<blockquote>With this patch, that case is checked for and a message is printed in the logs, so you find out about it at submission time.</blockquote></li>
<li> <a href="">MAPREDUCE-2409</a>.
Major bug reported by sseth and fixed by sseth (distributed-cache)<br>
<b>Distributed Cache does not differentiate between file /archive for files with the same path</b><br>
<blockquote>If a &apos;global&apos; file is specified as a &apos;file&apos; by one job, subsequent jobs cannot override this source file to be an &apos;archive&apos; (until the TT cleans up its cache or is restarted).<br>The same holds the other way around, from &apos;archive&apos; to &apos;file&apos;.<br><br>In case of an accidental submission using the wrong type, some of the tasks for the second job will end up seeing the source file as an archive, and others as a file.</blockquote></li>
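One way to keep the two uses apart is to make the type part of the cache key. The Java sketch below assumes a key made of the source URI plus a FILE/ARCHIVE flag; CacheKey is a hypothetical class, not the distributed cache's actual data structure.

```java
import java.util.Objects;

// Hypothetical sketch: the same source path localized as a file and as an
// archive maps to two different cache entries.
final class CacheKey {
    enum Kind { FILE, ARCHIVE }

    final String sourceUri;
    final Kind kind;

    CacheKey(String sourceUri, Kind kind) {
        this.sourceUri = sourceUri;
        this.kind = kind;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CacheKey)) {
            return false;
        }
        CacheKey other = (CacheKey) o;
        return sourceUri.equals(other.sourceUri) && kind == other.kind;
    }

    @Override
    public int hashCode() {
        return Objects.hash(sourceUri, kind);
    }
}
```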
<li> <a href="">MAPREDUCE-2366</a>.
Major bug reported by owen.omalley and fixed by dking (tasktracker)<br>
<b>TaskTracker can&apos;t retrieve stdout and stderr from web UI</b><br>
<blockquote>Fixed a problem where the task browser UI could not retrieve the stdout and stderr output of streaming jobs that abend in the Unix code, in the common case where the containing job doesn&apos;t reuse JVMs.</blockquote></li>
<li> <a href="">MAPREDUCE-2364</a>.
Major bug reported by owen.omalley and fixed by devaraj (tasktracker)<br>
<b>Shouldn&apos;t hold lock on rjob while localizing resources.</b><br>
<blockquote>There is a deadlock while localizing resources on the TaskTracker.</blockquote></li>
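The general pattern behind this fix, sketched in Java under the assumption that localization can be expressed as a single callable: do the slow work without holding the per-job lock, and take the lock only to check and publish the result. JobLocalizer and its fields are hypothetical names.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: slow localization runs outside the per-job lock.
final class JobLocalizer {
    private final Object jobLock = new Object();
    private String localizedDir; // guarded by jobLock

    String localize(Callable<String> slowLocalization) throws Exception {
        synchronized (jobLock) {
            if (localizedDir != null) {
                return localizedDir; // already done by another task
            }
        }
        // Potentially slow (downloads, permission changes). No lock is held here,
        // so other threads that need jobLock are not blocked behind it.
        String dir = slowLocalization.call();
        synchronized (jobLock) {
            if (localizedDir == null) {
                localizedDir = dir; // first finisher wins
            }
            return localizedDir;
        }
    }
}
```

The trade-off is that two tasks of the same job may occasionally both localize; the second result is discarded, which is usually cheaper than blocking every thread that needs the lock.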
<li> <a href="">MAPREDUCE-2362</a>.
Major bug reported by owen.omalley and fixed by roelofs (test)<br>
<b>Unit test failures: TestBadRecords and TestTaskTrackerMemoryManager</b><br>
<blockquote>Fix unit-test failures: TestBadRecords (NPE due to rearranged MapTask code) and TestTaskTrackerMemoryManager (need hostname in output-string pattern).</blockquote></li>
<li> <a href="">MAPREDUCE-2360</a>.
Major bug reported by owen.omalley and fixed by (client)<br>
<b>Pig fails when using non-default FileSystem</b><br>
<blockquote>The job client strips the file system from the user&apos;s job jar, which causes breakage when it isn&apos;t the default file system.</blockquote></li>
<li> <a href="">MAPREDUCE-2359</a>.
Major bug reported by owen.omalley and fixed by ramach <br>
<b>Distributed cache doesn&apos;t use non-default FileSystems correctly</b><br>
<blockquote>We are passing as viewfs:/// in core-site.xml on the Oozie server.<br>The default name node in our configuration is also viewfs:///.<br><br>We are using an hdfs:// path for our application.<br>It gives the following error:<br><br>IllegalArgumentException: Wrong FS:<br>hdfs://nn/user/strat_ci/oozie-oozi/0000002-110217014830452-oozie-oozi-W/hadoop1--map-reduce/map-reduce-launcher.jar,<br>expected: viewfs:/</blockquote></li>
<li> <a href="">MAPREDUCE-2358</a>.
Major bug reported by owen.omalley and fixed by ramach <br>
<b>MapReduce assumes HDFS as the default filesystem</b><br>
<blockquote>MapReduce assumes HDFS is the default file system even when a different default is configured.</blockquote></li>
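Roughly, the common thread in MAPREDUCE-2358, MAPREDUCE-2359 and MAPREDUCE-2360 is that a path should keep its own scheme and be qualified against the configured default file system only when it has none, never against a hard-coded hdfs://. A small Java sketch of that rule using java.net.URI; PathQualifier is a hypothetical name, and only absolute paths are handled.

```java
import java.net.URI;

// Hypothetical sketch: qualify scheme-less absolute paths against the
// configured default file system; leave qualified paths untouched.
final class PathQualifier {
    static URI qualify(String path, URI defaultFs) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p; // e.g. hdfs://nn/... or viewfs:///... stays as given
        }
        if (p.getPath() == null || !p.getPath().startsWith("/")) {
            throw new IllegalArgumentException("expected an absolute path: " + path);
        }
        return defaultFs.resolve(p);
    }
}
```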
<li> <a href="">MAPREDUCE-2357</a>.
Major bug reported by owen.omalley and fixed by vicaya (task)<br>
<b>When extending inputsplit (non-FileSplit), all exceptions are ignored</b><br>
<blockquote>If you&apos;re using a custom RecordReader/InputFormat setup with an<br>InputSplit that does NOT extend FileSplit, any exceptions you throw in your RecordReader.nextKeyValue() function<br>are silently ignored.</blockquote></li>
<li> <a href="">MAPREDUCE-2356</a>.
Major bug reported by owen.omalley and fixed by vicaya <br>
<b>A task succeeded even though there were errors on all attempts.</b><br>
<blockquote>From Luke Lu:<br><br>Here is a summary of why the failed map task was considered &quot;successful&quot; (thanks to Mahadev, Arun and Devaraj<br>for insightful discussions).<br><br>1. The map task was hanging BEFORE being initialized (probably in localization, but it doesn&apos;t matter in this case).<br>Its state was UNASSIGNED.<br><br>2. The JT decided to kill it due to a timeout and scheduled a cleanup task on the same node.<br><br>3. The cleanup task has the same attempt id (by design) but runs in a different JVM. Its initial state is<br>FAILED_UNCLEAN.<br><br>4. The JVM of the original attempt was being killed while proceeding to setupWorkDir, and threw an<br>IllegalStateException in FileSystem.getLocal. This caused taskFinal.taskCleanup to be called in Child, which<br>triggered an NPE because the task was not yet initialized (committer is null). Before the NPE, however, it sent a<br>statusUpdate to the TT, and in tip.reportProgress, the task state (then FAILED_UNCLEAN) was changed to UNASSIGNED.<br><br>5. The cleanup attempt succeeded and reported done to the TT. In tip.reportDone, the isCleanup() check returned false because of<br>the UNASSIGNED state, so the task state was set to SUCCEEDED.<br></blockquote></li>
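The core of the failure in step 4 is a stale progress report moving the attempt out of FAILED_UNCLEAN. A simplified Java sketch of a guard that prevents it, using hypothetical names (AttemptState, AttemptTracker) and a much smaller state set than Hadoop's real task states:

```java
// Hypothetical, simplified attempt states.
enum AttemptState {
    UNASSIGNED, RUNNING, FAILED_UNCLEAN, KILLED_UNCLEAN, SUCCEEDED, FAILED, KILLED;

    boolean isCleanupOrTerminal() {
        return this == FAILED_UNCLEAN || this == KILLED_UNCLEAN
            || this == SUCCEEDED || this == FAILED || this == KILLED;
    }
}

final class AttemptTracker {
    private AttemptState state = AttemptState.UNASSIGNED;

    AttemptState state() {
        return state;
    }

    void markForCleanup() {
        state = AttemptState.FAILED_UNCLEAN;
    }

    // A status update from the original JVM is ignored once cleanup has begun,
    // so it cannot move the attempt back to UNASSIGNED.
    void reportProgress(AttemptState reported) {
        if (state.isCleanupOrTerminal()) {
            return;
        }
        state = reported;
    }

    // When the cleanup attempt reports done, an attempt under cleanup ends as
    // FAILED or KILLED, never SUCCEEDED.
    void reportDone() {
        if (state == AttemptState.FAILED_UNCLEAN) {
            state = AttemptState.FAILED;
        } else if (state == AttemptState.KILLED_UNCLEAN) {
            state = AttemptState.KILLED;
        } else {
            state = AttemptState.SUCCEEDED;
        }
    }
}
```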
<li> <a href="">MAPREDUCE-517</a>.
Critical bug reported by acmurthy and fixed by acmurthy <br>
<b>The capacity-scheduler should assign multiple tasks per heartbeat</b><br>
<blockquote>HADOOP-3136 changed the default o.a.h.mapred.JobQueueTaskScheduler to assign multiple tasks per TaskTracker heartbeat, the capacity-scheduler should do the same.</blockquote></li>
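A minimal Java sketch of multi-assignment per heartbeat, assuming a single FIFO of pending task IDs. HeartbeatAssigner is a hypothetical name; the real capacity scheduler must also respect per-queue capacities and user limits, which are omitted here.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: fill all free slots reported in one heartbeat instead
// of assigning at most one task.
final class HeartbeatAssigner {
    static List<String> assign(Deque<String> pending, int freeSlots) {
        List<String> assigned = new ArrayList<>();
        while (assigned.size() < freeSlots && !pending.isEmpty()) {
            assigned.add(pending.poll());
        }
        return assigned;
    }
}
```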
<li> <a href="">MAPREDUCE-118</a>.
Blocker bug reported by amar_kamat and fixed by amareshwari (client)<br>
<b>Job.getJobID() will always return null</b><br>
<blockquote>JobContext is used for a read-only view of a job&apos;s info, so all the read-only fields in JobContext are set in the constructor. Job extends JobContext. When a Job is created, the job ID is not yet known, so there is no way to set the JobID once the Job has been created. The JobID is obtained only when the JobClient queries the JobTracker for a job ID, which happens later, upon job submission.</blockquote></li>
<li> <a href="">HDFS-2218</a>.
Blocker test reported by mattf and fixed by mattf (contrib/hdfsproxy, test)<br>
<b>Disable TestHdfsProxy.testHdfsProxyInterface in automated test suite for 0.20-security-204 release</b><br>
<blockquote>Test case TestHdfsProxy.testHdfsProxyInterface has been temporarily disabled for this release, due to failure in the Hudson automated test environment.</blockquote></li>
<li> <a href="">HDFS-2057</a>.
Major bug reported by bharathm and fixed by bharathm (data-node)<br>
<b>Wait time to terminate the threads causing unit tests to take longer time</b><br>
<blockquote>As part of the fix for a datanode process hang, this code was introduced in 0.20.204 to clean up all the waiting threads:<br><br>- try {<br>- readPool.awaitTermination(10, TimeUnit.SECONDS);<br>- } catch (InterruptedException e) {<br>-;Exception occured in doStop:&quot; + e.getMessage());<br>- }<br>- readPool.shutdownNow();<br><br>This was clearly meant for production, but all the unit tests use MiniDFSCluster and MiniMRCluster, whose shutdown waits on this code. As a result, unit test run times increased, so this code is removed. <br></blockquote></li>
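One way to keep a graceful wait in production without slowing down mini-cluster tests is to make the wait configurable. The Java sketch below uses a hypothetical PoolStopper helper. It calls shutdown() first because ExecutorService.awaitTermination only returns early after shutdown has been requested; otherwise it waits for the full timeout.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: tests pass waitMs = 0 to skip the graceful wait.
final class PoolStopper {
    // Returns true if the pool terminated within the wait.
    static boolean stop(ExecutorService pool, long waitMs) {
        pool.shutdown(); // stop accepting new work
        boolean terminated = false;
        if (waitMs > 0) {
            try {
                terminated = pool.awaitTermination(waitMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        if (!terminated) {
            pool.shutdownNow(); // interrupt anything still running
        }
        return terminated;
    }
}
```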
<li> <a href="">HDFS-2044</a>.
Major test reported by mattf and fixed by mattf (test)<br>
<b>TestQueueProcessingStatistics failing automatic test due to timing issues</b><br>
<blockquote>The test makes assumptions about timing issues that hold true in workstation environments but not in Hudson auto-test.</blockquote></li>
<li> <a href="">HDFS-2023</a>.
Major bug reported by bharathm and fixed by bharathm (data-node)<br>
<b>Backport of NPE for File.list and File.listFiles</b><br>
<blockquote>Since there are multiple JIRAs in trunk for Common and HDFS, I am creating another JIRA for this issue. <br><br>This patch addresses the following:<br><br>1. Provides FileUtil APIs for list and listFiles that throw IOException when the JDK methods return null. <br>2. Replaces most uses of the JDK File list APIs with the FileUtil APIs. </blockquote></li>
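The helpers described in item 1 can be sketched in Java as follows. File.list() and File.listFiles() return null when the path is not a directory or an I/O error occurs, and the wrappers turn that null into an IOException. SafeList is a hypothetical class name used so as not to imply this is the exact FileUtil code.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch: never hand a null directory listing back to callers.
final class SafeList {
    static String[] list(File dir) throws IOException {
        String[] names = dir.list();
        if (names == null) {
            throw new IOException("Invalid directory or I/O error occurred for dir: " + dir);
        }
        return names;
    }

    static File[] listFiles(File dir) throws IOException {
        File[] files = dir.listFiles();
        if (files == null) {
            throw new IOException("Invalid directory or I/O error occurred for dir: " + dir);
        }
        return files;
    }
}
```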
<li> <a href="">HDFS-1878</a>.
Minor bug reported by mattf and fixed by mattf (name-node)<br>
<b>TestHDFSServerPorts unit test failure - race condition in FSNamesystem.close() causes NullPointerException without serious consequence</b><br>
<blockquote>In 20.204, TestHDFSServerPorts was observed to intermittently throw a NullPointerException. This only happens when FSNamesystem.close() is called, which means system termination for the Namenode, so this is not a serious bug for .204. TestHDFSServerPorts is more likely than normal execution to stimulate the race, because it runs two Namenodes in the same JVM, causing more interleaving and more potential to see a race condition.<br><br>The race is in FSNamesystem.close(), line 566, we have:<br> if (replthread != null) replthread.interrupt();<br> if (replmon != null) replmon = null;<br><br>Since the interrupted replthread is not waited on, there is a potential race condition with replmon being nulled before replthread is dead, but replthread references replmon in computeDatanodeWork() where the NullPointerException occurs.<br><br>The solution is either to wait on replthread or just don&apos;t null replmon. The latter is preferred, since none of the sibling Namenode processing threads are waited on in close().<br><br>I&apos;ll attach a patch for .205.<br></blockquote></li>
<li> <a href="">HDFS-1822</a>.
Blocker bug reported by sureshms and fixed by sureshms (name-node)<br>
<b>Editlog opcodes overlap between 20 security and later releases</b><br>
<blockquote>The same opcodes are used for different operations between, 0.22 and 0.23. This results in failures to load edit logs on later releases, especially during upgrades.</blockquote></li>
<li> <a href="">HDFS-1773</a>.
Minor improvement reported by tanping and fixed by tanping (name-node)<br>
<b>Remove a datanode from cluster if include list is not empty and this datanode is removed from both include and exclude lists</b><br>
<blockquote>Our service engineering team, who operate the clusters on a daily basis, found it confusing that after a datanode is decommissioned, there is no way to make the cluster forget about it, and it always remains in the dead node list.</blockquote></li>
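The rule in the title, sketched in Java under the assumption that the host lists are plain sets of host names (HostListPolicy is hypothetical): a node is forgotten only if an include list is in use and the node appears in neither list.

```java
import java.util.Set;

// Hypothetical sketch of the decision made after the host lists are refreshed.
final class HostListPolicy {
    static boolean shouldForget(Set<String> includes, Set<String> excludes, String host) {
        return !includes.isEmpty()      // an empty include list means "allow all"
            && !includes.contains(host)
            && !excludes.contains(host);
    }
}
```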
<li> <a href="">HDFS-1767</a>.
Major sub-task reported by mattf and fixed by mattf (data-node)<br>
<b>Namenode should ignore non-initial block reports from datanodes when in safemode during startup</b><br>
<blockquote>Consider a large cluster that takes 40 minutes to start up. The datanodes compete to register and send their Initial Block Reports (IBRs) as fast as they can after startup (subject to a small sub-two-minute random delay, which isn&apos;t relevant to this discussion). <br><br>As each datanode succeeds in sending its IBR, it schedules the starting time for its regular cycle of reports, every hour (or other configured value of dfs.blockreport.intervalMsec). In order to spread the reports evenly across the block report interval, each datanode picks a random fraction of that interval, for the starting point of its regular report cycle. For example, if a particular datanode ends up randomly selecting 18 minutes after the hour, then that datanode will send a Block Report at 18 minutes after the hour every hour as long as it remains up. Other datanodes will start their cycles at other randomly selected times. This code is in DataNode.blockReport() and DataNode.scheduleBlockReport().<br><br>The &quot;second Block Report&quot; (2BR), is the start of these hourly reports. The problem is that some of these 2BRs get scheduled sooner rather than later, and actually occur within the startup period. For example, if the cluster takes 40 minutes (2/3 of an hour) to start up, then out of the datanodes that succeed in sending their IBRs during the first 10 minutes, between 1/2 and 2/3 of them will send their 2BR before the 40-minute startup time has completed!<br><br>2BRs sent within the startup time actually compete with the remaining IBRs, and thereby slow down the overall startup process. 
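This can be seen in the following data, which shows the startup process for a 3700-node cluster that took about 17 minutes to finish startup:<br><br>
<table border="1">
<tr><th>minute</th><th>time</th><th>starts</th><th>sum</th><th>regs</th><th>sum</th><th>IBR</th><th>sum</th><th>2nd_BR</th><th>sum</th><th>total_BRs/min</th></tr>
<tr><td>0</td><td>1299799498</td><td>3042</td><td>3042</td><td>1969</td><td>1969</td><td>151</td><td>151</td><td></td><td>0</td><td>151</td></tr>
<tr><td>1</td><td>1299799558</td><td>665</td><td>3707</td><td>1470</td><td>3439</td><td>248</td><td>399</td><td></td><td>0</td><td>248</td></tr>
<tr><td>2</td><td>1299799618</td><td></td><td>3707</td><td>224</td><td>3663</td><td>270</td><td>669</td><td></td><td>0</td><td>270</td></tr>
<tr><td>3</td><td>1299799678</td><td></td><td>3707</td><td>14</td><td>3677</td><td>261</td><td>930</td><td>3</td><td>3</td><td>264</td></tr>
<tr><td>4</td><td>1299799738</td><td></td><td>3707</td><td>23</td><td>3700</td><td>288</td><td>1218</td><td>1</td><td>4</td><td>289</td></tr>
<tr><td>5</td><td>1299799798</td><td></td><td>3707</td><td>7</td><td>3707</td><td>258</td><td>1476</td><td>3</td><td>7</td><td>261</td></tr>
<tr><td>6</td><td>1299799858</td><td></td><td>3707</td><td></td><td>3707</td><td>317</td><td>1793</td><td>4</td><td>11</td><td>321</td></tr>
<tr><td>7</td><td>1299799918</td><td></td><td>3707</td><td></td><td>3707</td><td>292</td><td>2085</td><td>6</td><td>17</td><td>298</td></tr>
<tr><td>8</td><td>1299799978</td><td></td><td>3707</td><td></td><td>3707</td><td>292</td><td>2377</td><td>8</td><td>25</td><td>300</td></tr>
<tr><td>9</td><td>1299800038</td><td></td><td>3707</td><td></td><td>3707</td><td>272</td><td>2649</td><td></td><td>25</td><td>272</td></tr>
<tr><td>10</td><td>1299800098</td><td></td><td>3707</td><td></td><td>3707</td><td>280</td><td>2929</td><td>15</td><td>40</td><td>295</td></tr>
<tr><td>11</td><td>1299800158</td><td></td><td>3707</td><td></td><td>3707</td><td>223</td><td>3152</td><td>14</td><td>54</td><td>237</td></tr>
<tr><td>12</td><td>1299800218</td><td></td><td>3707</td><td></td><td>3707</td><td>143</td><td>3295</td><td></td><td>54</td><td>143</td></tr>
<tr><td>13</td><td>1299800278</td><td></td><td>3707</td><td></td><td>3707</td><td>141</td><td>3436</td><td>20</td><td>74</td><td>161</td></tr>
<tr><td>14</td><td>1299800338</td><td></td><td>3707</td><td></td><td>3707</td><td>195</td><td>3631</td><td>78</td><td>152</td><td>273</td></tr>
<tr><td>15</td><td>1299800398</td><td></td><td>3707</td><td></td><td>3707</td><td>51</td><td>3682</td><td>209</td><td>361</td><td>260</td></tr>
<tr><td>16</td><td>1299800458</td><td></td><td>3707</td><td></td><td>3707</td><td>25</td><td>3707</td><td>369</td><td>730</td><td>394</td></tr>
<tr><td>17</td><td>1299800518</td><td></td><td>3707</td><td></td><td>3707</td><td></td><td>3707</td><td>166</td><td>896</td><td>166</td></tr>
<tr><td>18</td><td>1299800578</td><td></td><td>3707</td><td></td><td>3707</td><td></td><td>3707</td><td>72</td><td>968</td><td>72</td></tr>
<tr><td>19</td><td>1299800638</td><td></td><td>3707</td><td></td><td>3707</td><td></td><td>3707</td><td>67</td><td>1035</td><td>67</td></tr>
<tr><td>20</td><td>1299800698</td><td></td><td>3707</td><td></td><td>3707</td><td></td><td>3707</td><td>75</td><td>1110</td><td>75</td></tr>
<tr><td>21</td><td>1299800758</td><td></td><td>3707</td><td></td><td>3707</td><td></td><td>3707</td><td>71</td><td>1181</td><td>71</td></tr>
<tr><td>22</td><td>1299800818</td><td></td><td>3707</td><td></td><td>3707</td><td></td><td>3707</td><td>67</td><td>1248</td><td>67</td></tr>
<tr><td>23</td><td>1299800878</td><td></td><td>3707</td><td></td><td>3707</td><td></td><td>3707</td><td>62</td><td>1310</td><td>62</td></tr>
<tr><td>24</td><td>1299800938</td><td></td><td>3707</td><td></td><td>3707</td><td></td><td>3707</td><td>56</td><td>1366</td><td>56</td></tr>
<tr><td>25</td><td>1299800998</td><td></td><td>3707</td><td></td><td>3707</td><td></td><td>3707</td><td>60</td><td>1426</td><td>60</td></tr>
</table>
<br>This data was harvested from the startup logs of all the datanodes, and correlated into one-minute buckets. Each row of the table represents the progress during one elapsed minute of clock time. It seems that every cluster startup is different, but this one showed the effect fairly well.<br><br>The &quot;starts&quot; column shows that all the nodes started up within the first 2 minutes, and the &quot;regs&quot; column shows that all succeeded in registering by minute 6. 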
The IBR column shows a sustained rate of Initial Block Report processing of 250-300/minute for the first 10 minutes.<br><br>The question is why, during minutes 11 through 16, the rate of IBR processing slowed down. Why didn&apos;t the startup just finish? In the &quot;2nd_BR&quot; column, we see the rate of 2BRs ramping up as more datanodes complete their IBRs. As the rate increases, they become more effective at competing with the IBRs, and slow down the IBR processing even more. After the IBRs finally finish in minute 16, the rate of 2BRs settles down to a steady ~60-70/minute.<br><br>In order to decrease competition for locks and other resources, to speed up IBR processing during startup, we propose to delay 2BRs until later into the cycle.</blockquote></li>
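The namenode-side rule in the title, sketched in Java with hypothetical names: during startup safemode, only each datanode's first block report is processed, so 2BRs cannot compete with pending IBRs. A real implementation would also need to handle datanodes that re-register.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: drop non-initial block reports while in startup safemode.
final class BlockReportGate {
    private final Set<String> reported = new HashSet<>();
    private boolean startupSafeMode = true;

    void leaveSafeMode() {
        startupSafeMode = false;
    }

    // Returns true if this block report should be processed.
    boolean accept(String datanodeId) {
        boolean first = reported.add(datanodeId); // false if already reported
        return first || !startupSafeMode;
    }
}
```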
<li> <a href="">HDFS-1758</a>.
Minor bug reported by tanping and fixed by tanping (tools)<br>
<b>Web UI JSP pages thread safety issue</b><br>
<blockquote>The JSP pages that the web UI uses are not thread safe. We have observed problems when requesting the Live/Dead/Decommissioning pages from the web UI: the wrong page is displayed. Specifically, requesting the Dead node list page sometimes returns the Live node page, and requesting the Decommissioning page sometimes returns the Dead page.<br><br>The root cause of this problem is that a JSP page is not thread safe by default. When multiple requests come in, each request is assigned to a different thread, and all of these threads access the same instance of the servlet class generated from the JSP page, so any class member variable is shared by all of them. The JSP code in the 0.20 branch, for example dfsnodelist.jsp, has<br>{code}<br>&lt;%!<br> int rowNum = 0;<br> int colNum = 0;<br> String sorterField = null;<br> String sorterOrder = null;<br> String whatNodes = &quot;LIVE&quot;;<br> ...<br>%&gt;<br>{code}<br><br>declared as class variables. (These variables are declared within a &lt;%! code %&gt; declaration, which makes them class members.) Because multiple threads share the same set of class member variables, one request can overwrite another request&apos;s state. <br><br>However, the JSP code refactor in HADOOP-5857 moved all of these class member variables into function local variables, so this bug does not appear in Apache trunk. Hence, we propose a simple fix for this bug on the 0.20 branch alone, specifically branch-0.20-security.<br><br>The simple fix is to add the JSP page directive isThreadSafe=&quot;false&quot; to the affected JSP pages, dfshealth.jsp and dfsnodelist.jsp, so that only one request is processed at a time. 
<br><br>We also evaluated the thread safety of the other JSP pages on trunk, and noticed a potential problem when retrieving statistics from the namenode. For example, dfshealth.jsp makes the call <br>{code}<br>NamenodeJspHelper.getInodeLimitText(fsn);<br>{code}<br>which eventually runs <br><br>{code}<br> static String getInodeLimitText(FSNamesystem fsn) {<br> long inodes = fsn.dir.totalInodes();<br> long blocks = fsn.getBlocksTotal();<br> long maxobjects = fsn.getMaxObjects();<br> ....<br>{code}<br><br>Some of these calls, e.g. dir.totalInodes, are already guarded by the read/write lock, but others are not. As a result, the web UI results are not 100% thread safe. However, after evaluating the pros and cons of adding a giant lock to the JSP pages, we decided not to take FSNamesystem read/write locks in the JSPs.<br><br></blockquote></li>
<li> <a href="">HDFS-1750</a>.
Major bug reported by szetszwo and fixed by szetszwo <br>
<b>fs -ls hftp://file not working</b><br>
<blockquote>{noformat}<br>hadoop dfs -touchz /tmp/file1 # create file. OK<br>hadoop dfs -ls /tmp/file1 # OK<br>hadoop dfs -ls hftp://namenode:50070/tmp/file1 # FAILED: not seeing the file<br>{noformat}</blockquote></li>
<li> <a href="">HDFS-1692</a>.
Major bug reported by bharathm and fixed by bharathm (data-node)<br>
<b>In secure mode, Datanode process doesn&apos;t exit when disks fail.</b><br>
<blockquote>In secure mode, when more disks fail than the number of volumes tolerated, the datanode process doesn&apos;t exit properly; it just hangs even though the shutdown method is called. <br><br></blockquote></li>
<li> <a href="">HDFS-1592</a>.
Major bug reported by bharathm and fixed by bharathm <br>
<b>Datanode startup doesn&apos;t honor volumes.tolerated </b><br>
<blockquote>Datanode startup doesn&apos;t honor volumes.tolerated for hadoop 20 version.</blockquote></li>
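A Java sketch of the startup check that HDFS-1592 and HDFS-1692 concern, assuming the tolerated count comes from a setting such as dfs.datanode.failed.volumes.tolerated and must be smaller than the number of configured volumes. VolumeCheck is a hypothetical name.

```java
// Hypothetical sketch: decide whether the datanode may start (or keep running)
// given the number of failed volumes.
final class VolumeCheck {
    static boolean canStart(int configuredVolumes, int failedVolumes, int tolerated) {
        // Tolerating every configured volume would allow a datanode with no storage.
        if (tolerated < 0 || tolerated >= configuredVolumes) {
            throw new IllegalArgumentException("tolerated must be in [0, configuredVolumes)");
        }
        return failedVolumes <= tolerated;
    }
}
```

When the check fails on a running datanode, HDFS-1692 above is the reminder that the process must actually exit rather than hang.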
<li> <a href="">HDFS-1541</a>.
Major sub-task reported by hairong and fixed by hairong (name-node)<br>
<b>Not marking datanodes dead When namenode in safemode</b><br>
<blockquote>In a big cluster, when the namenode starts up, it takes a long time to process block reports from all datanodes. Because heartbeat processing gets delayed, some datanodes are erroneously marked as dead and later have to register again, wasting time.<br><br>Startup would be faster if dead-node checking were disabled while the namenode is in safemode.</blockquote></li>
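A Java sketch of this proposal, assuming a map of last-heartbeat timestamps and a fixed expiry interval (HeartbeatMonitor and its parameters are hypothetical): the dead-node scan does nothing while the namenode is in safemode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a periodic dead-node scan.
final class HeartbeatMonitor {
    static List<String> deadNodes(Map<String, Long> lastHeartbeatMs, long nowMs,
                                  long expiryMs, boolean inSafeMode) {
        List<String> dead = new ArrayList<>();
        if (inSafeMode) {
            return dead; // heartbeats may only be delayed by block report processing
        }
        for (Map.Entry<String, Long> e : lastHeartbeatMs.entrySet()) {
            if (nowMs - e.getValue() > expiryMs) {
                dead.add(e.getKey());
            }
        }
        return dead;
    }
}
```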
<li> <a href="">HDFS-1445</a>.
Major sub-task reported by mattf and fixed by mattf (data-node)<br>
<b>Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file</b><br>
<blockquote>Batch hardlinking during &quot;upgrade&quot; snapshots, cutting time from approximately 8 minutes per volume to approximately 8 seconds. Validated on both Linux and Windows. Depends on prior integration of the patch for HADOOP-7133.</blockquote></li>
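The batching idea, sketched in Java: group the block files by parent directory so that the expensive link step (for example, one external ln process) runs once per directory instead of once per file. HardLinkBatcher is hypothetical, absolute paths are assumed, and the linking itself is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: directory -> file names to link in a single call.
final class HardLinkBatcher {
    static Map<String, List<String>> byDirectory(List<String> filePaths) {
        Map<String, List<String>> batches = new LinkedHashMap<>(); // keeps input order
        for (String path : filePaths) {
            int slash = path.lastIndexOf('/');
            String dir = slash > 0 ? path.substring(0, slash) : "/";
            String name = path.substring(slash + 1);
            batches.computeIfAbsent(dir, d -> new ArrayList<>()).add(name);
        }
        return batches;
    }
}
```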
<li> <a href="">HDFS-1377</a>.
Blocker bug reported by eli and fixed by eli (name-node)<br>
<b>Quota bug for partial blocks allows quotas to be violated </b><br>
<blockquote>There&apos;s a bug in the quota code that causes quotas not to be respected when a file is not an exact multiple of the block size. Here&apos;s an example:<br><br>{code}<br>$ hadoop fs -mkdir /test<br>$ hadoop dfsadmin -setSpaceQuota 384M /test<br>$ ls dir/ | wc -l # dir contains 101 files<br>101<br>$ du -ms dir # each is 3mb<br>304 dir<br>$ hadoop fs -put dir /test<br>$ hadoop fs -count -q /test<br> none inf 402653184 -550502400 2 101 317718528 hdfs://<br>$ hadoop fs -stat &quot;%o %r&quot; /test/dir/f30<br>134217728 3 # 128mb block size, replication 3<br>{code}<br><br>INodeDirectoryWithQuota caches the number of bytes consumed by its children in {{diskspace}}. The quota adjustment code has a bug that causes {{diskspace}} to get updated incorrectly when a file is not an exact multiple of the block size (the value ends up being negative). <br><br>This causes the quota checking code to think that the files in the directory consume less space than they actually do, so verifyQuota does not throw a QuotaExceededException even when the directory is over quota. However, the bug isn&apos;t visible to users, because {{fs count -q}} reports the numbers generated by INode#getContentSummary, which adds up the sizes of the blocks rather than using the cached INodeDirectoryWithQuota#diskspace value.<br><br>In FSDirectory#addBlock, the disk space consumed is set conservatively to the full block size * the number of replicas:<br><br>{code}<br>updateCount(inodes, inodes.length-1, 0,<br> fileNode.getPreferredBlockSize()*fileNode.getReplication(), true);<br>{code}<br><br>In FSNameSystem#addStoredBlock, we adjust for this conservative estimate by subtracting the difference between the conservative estimate and the number of bytes actually stored:<br><br>{code}<br>//Updated space consumed if required.<br>INodeFile file = (storedBlock != null) ? storedBlock.getINode() : null;<br>long diff = (file == null) ?
0 :<br> (file.getPreferredBlockSize() - storedBlock.getNumBytes());<br><br>if (diff &gt; 0 &amp;&amp; file.isUnderConstruction() &amp;&amp;<br> cursize &lt; storedBlock.getNumBytes()) {<br>...<br> dir.updateSpaceConsumed(path, 0, -diff*file.getReplication());<br>{code}<br><br>We do the same in FSDirectory#replaceNode when completing the file, but at a file granularity (I believe the intent here is to correct for the cases when there&apos;s a failure replicating blocks and recovery). Since oldnode is under construction INodeFile#diskspaceConsumed will use the preferred block size (vs of Block#getNumBytes used by newnode) so we will again subtract out the difference between the full block size and what the number of bytes actually stored was:<br><br>{code}<br>long dsOld = oldnode.diskspaceConsumed();<br>...<br>//check if disk space needs to be updated.<br>long dsNew = 0;<br>if (updateDiskspace &amp;&amp; (dsNew = newnode.diskspaceConsumed()) != dsOld) {<br> try {<br> updateSpaceConsumed(path, 0, dsNew-dsOld);<br>...<br>{code}<br><br>So in the above example we started with diskspace at 384mb (3 * 128mb) and then we subtract 375mb (to reflect only 9mb raw was actually used) twice so for each file the diskspace for the directory is - 366mb (384mb minus 2 * 375mb). Which is why the quota gets negative and yet we can still write more files.<br><br>So a directory with lots of single block files (if you have multiple blocks on the final partial block ends up subtracting from the diskspace used) ends up having a quota that&apos;s way off.<br><br>I think the fix is to in FSDirectory#replaceNode not have the diskspaceConsumed calculations differ when the old and new INode have the same blocks. I&apos;ll work on a patch which also adds a quota test for blocks that are not multiples of the block size and warns in INodeDirectory#computeContentSummary if the computed size does not reflect the cached value.</blockquote></li>
<li> <a href="">HDFS-1258</a>.
Blocker bug reported by atm and fixed by atm (name-node)<br>
<b>Clearing namespace quota on &quot;/&quot; corrupts FS image</b><br>
<blockquote>The HDFS root directory starts out with a default namespace quota of Integer.MAX_VALUE. If you clear this quota (using &quot;hadoop dfsadmin -clrQuota /&quot;), the fsimage gets corrupted immediately. Subsequent 2NN rolls will fail, and the NN will not come back up from a restart.</blockquote></li>
<li> <a href="">HDFS-1189</a>.
Major bug reported by xiaokang and fixed by johnvijoe (name-node)<br>
<b>Quota counts missed between clear quota and set quota</b><br>
<blockquote>HDFS quota counts are missed between a clear-quota operation and a subsequent set-quota operation.<br><br>When a quota is set on a directory, its INodeDirectory is replaced by an INodeDirectoryWithQuota and dir.isQuotaSet() becomes true. When the INodeDirectoryWithQuota is newly created, quota counting is performed. However, when the quota is cleared, the quota value is set to -1 and dir.isQuotaSet() becomes false, while the INodeDirectoryWithQuota is NOT replaced back by an INodeDirectory.<br><br>FSDirectory.updateCount only updates the quota counts of inodes for which isQuotaSet() is true. So after the quota on a directory is cleared, its quota counts are no longer updated, which is reasonable. But when the quota is set again on that directory, quota counting is not performed, so the counts accumulated in the meantime are missed.</blockquote></li>
<li> <a href="">HADOOP-7475</a>.
Blocker bug reported by eyang and fixed by eyang <br>
<b> is broken</b><br>
<blockquote>When running, the system cannot find the templates configuration directory:<br><br>{noformat}<br>cat: /usr/libexec/../templates/conf/core-site.xml: No such file or directory<br>cat: /usr/libexec/../templates/conf/hdfs-site.xml: No such file or directory<br>cat: /usr/libexec/../templates/conf/mapred-site.xml: No such file or directory<br>cat: /usr/libexec/../templates/conf/ No such file or directory<br>chown: cannot access `;: No such file or directory<br>chmod: cannot access `;: No such file or directory<br>cp: cannot stat `*.xml&apos;: No such file or directory<br>cp: cannot stat `;: No such file or directory<br>{noformat}</blockquote></li>
<li> <a href="">HADOOP-7398</a>.
Major new feature reported by owen.omalley and fixed by owen.omalley <br>
<b>create a mechanism to suppress the HADOOP_HOME deprecated warning</b><br>
<blockquote>Create a new mechanism to suppress the warning about HADOOP_HOME deprecation.<br><br>I&apos;ll create a HADOOP_HOME_WARN_SUPPRESS environment variable that suppresses the warning.</blockquote></li>
<li> <a href="">HADOOP-7373</a>.
Major bug reported by owen.omalley and fixed by owen.omalley <br>
<b>Tarball deployment doesn&apos;t work with {start,stop}-{dfs,mapred}</b><br>
<blockquote>The overrides the variable &quot;bin&quot;, which makes the scripts use libexec for hadoop-daemon(s).</blockquote></li>
<li> <a href="">HADOOP-7364</a>.
Major bug reported by tgraves and fixed by tgraves (test)<br>
<b>TestMiniMRDFSCaching fails if is set to something other than build/test</b><br>
<blockquote>TestMiniMRDFSCaching fails if is set to something other than build/test. </blockquote></li>
<li> <a href="">HADOOP-7356</a>.
Blocker bug reported by eyang and fixed by eyang <br>
<b>RPM packages broke bin/hadoop script for hadoop 0.20.205</b><br>
<blockquote> has been moved to libexec for the binary package, but developers prefer to have it in bin. Hadoop shell scripts should be modified to support both scenarios.</blockquote></li>
<li> <a href="">HADOOP-7330</a>.
Major bug reported by vicaya and fixed by vicaya (metrics)<br>
<b>The metrics source mbean implementation should return the attribute value instead of the object</b><br>
<blockquote>The MetricsSourceAdapter#getAttribute in 0.20.203 is returning the attribute object instead of the value.</blockquote></li>
<li> <a href="">HADOOP-7324</a>.
Blocker bug reported by vicaya and fixed by priyomustafi (metrics)<br>
<b>Ganglia plugins for metrics v2</b><br>
<blockquote>Although all metrics in metrics v2 are exposed via the standard JMX mechanisms, most users use Ganglia to collect metrics.</blockquote></li>
<li> <a href="">HADOOP-7277</a>.
Minor improvement reported by naisbitt and fixed by naisbitt (build)<br>
<b>Add Eclipse launch tasks for the 0.20-security branch</b><br>
<blockquote>This adds the Eclipse launchers from HADOOP-5911 to the 0.20 security branch.<br><br>Eclipse has a notion of &quot;run configuration&quot;, which encapsulates what&apos;s needed to run or debug an application. I use this quite a bit to start various Hadoop daemons in debug mode, with breakpoints set, to inspect state and what not.<br><br>This is simply configuration, so no tests are provided. After running &quot;ant eclipse&quot; and refreshing your project, you should see entries in the Run Configurations and Debug Configurations for launching the various Hadoop daemons from within Eclipse. There&apos;s a template for running a specific test, and also templates to run all the tests, the job tracker, and a task tracker. It&apos;s likely that some parameters need to be further tweaked to have the same behavior as &quot;ant test&quot;, but for most tests, this works.<br><br>This also requires a small change to build.xml for the Eclipse classpath.</blockquote></li>
<li> <a href="">HADOOP-7274</a>.
Minor bug reported by jeagles and fixed by jeagles (util)<br>
<b>CLONE - IOUtils.readFully and IOUtils.skipFully have typo in exception creation&apos;s message</b><br>
<blockquote>Same fix as for HADOOP-7057 for the Hadoop security branch<br><br>{noformat}<br> throw new IOException( &quot;Premeture EOF from inputStream&quot;);<br>{noformat}</blockquote></li>
<li> <a href="">HADOOP-7248</a>.
Minor improvement reported by cos and fixed by tgraves (build)<br>
<b>Have a way to automatically update Eclipse .classpath file when new libs are added to the classpath through Ivy for 0.20-* based sources</b><br>
<blockquote>Backport HADOOP-6407 into 0.20 based source trees</blockquote></li>
<li> <a href="">HADOOP-7232</a>.
Blocker bug reported by owen.omalley and fixed by owen.omalley (documentation)<br>
<b>Fix javadoc warnings</b><br>
<blockquote>The javadoc is currently generating 31 warnings.</blockquote></li>
<li> <a href="">HADOOP-7144</a>.
Major new feature reported by vicaya and fixed by revans2 <br>
<b>Expose JMX with something like JMXProxyServlet </b><br>
<blockquote>Much of the Hadoop metrics and status info is available via JMX, especially since 0.20.100 and 0.22+ (HDFS-1318, HADOOP-6728, etc.). For operations staff not familiar with JMX setup, especially JMX with SSL and firewall tunnelling, the usage can be daunting. Using a JMXProxyServlet (a la Tomcat) to translate JMX attributes into JSON output would make a lot of non-Java admins happy.<br><br>We could probably use Tomcat&apos;s JMXProxyServlet code directly, if it already outputs some standard format (JSON, XML, etc.). The code is simple enough to port over and can probably integrate with the common HttpServer as one of the default servlets (maybe /jmx) for the pluggable security.</blockquote></li>
<li> <a href="">HADOOP-6255</a>.
Major new feature reported by owen.omalley and fixed by eyang <br>
<b>Create an rpm integration project</b><br>
<blockquote>Added RPM/DEB packages to build system.</blockquote></li>
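<p>The double subtraction in the partial-block quota bug described above can be checked with plain arithmetic. The sketch below is a hypothetical model, not HDFS code: it assumes a file that fits in a single block smaller than the preferred block size, assumes the FSNamesystem#addStoredBlock correction fires, and mirrors the three updates quoted in the report (FSDirectory#addBlock, FSNamesystem#addStoredBlock, FSDirectory#replaceNode). The class and method names are illustrative only.</p>

```java
/**
 * Hypothetical arithmetic model of the partial-block quota bug (not HDFS code).
 * Assumes a single-block file smaller than the preferred block size, and that
 * the addStoredBlock correction is applied.
 */
public class QuotaDoubleCount {
    static final long MB = 1024L * 1024L;

    /** Net change to the cached directory diskspace per file, as the buggy code computes it. */
    static long buggyDelta(long preferredBlockSize, long fileBytes, int replication) {
        // FSDirectory#addBlock: charge a full block per replica up front.
        long charged = preferredBlockSize * replication;
        // FSNamesystem#addStoredBlock: refund the unused part of the block.
        long refund = (preferredBlockSize - fileBytes) * replication;
        // FSDirectory#replaceNode: the under-construction node still reports the
        // preferred block size, so the same refund is applied a second time.
        long dsOld = preferredBlockSize * replication;
        long dsNew = fileBytes * replication;
        return charged - refund + (dsNew - dsOld);
    }

    /** What the cached value should change by: the raw bytes actually stored. */
    static long expectedDelta(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long buggy = buggyDelta(128 * MB, 3 * MB, 3);
        long expected = expectedDelta(3 * MB, 3);
        System.out.println("buggy delta per file (MB): " + buggy / MB);
        System.out.println("expected delta per file (MB): " + expected / MB);
        // 101 files, as in the report's example directory.
        System.out.println("buggy total for 101 files (MB): " + 101 * buggy / MB);
    }
}
```

<p>With the report&apos;s numbers (128mb blocks, 3mb files, replication 3) the model gives -366mb per file instead of +9mb, and -36966mb for the 101-file directory. A file that exactly fills its block is unaffected in this model, because both the refund and the replaceNode difference are zero.</p>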
<h2>Changes Since Hadoop 0.20.2</h2>
<li> <a href="">HADOOP-7190</a>. Add metrics v1 back for backwards compatibility. (omalley)
<li> <a href="">MAPREDUCE-2360</a>. Remove stripping of scheme, authority from submit dir in
support of viewfs. (cdouglas)
<li> <a href="">MAPREDUCE-2359</a> Use correct file system to access distributed cache objects.
(Krishna Ramachandran)
<li> <a href="">MAPREDUCE-2361</a>. "Fix Distributed Cache is not adding files to class paths
correctly" - Drop the host/scheme/fragment from URI (cdouglas)
<li> <a href="">MAPREDUCE-2362</a>. Fix unit-test failures: TestBadRecords (NPE due to
rearranged MapTask code) and TestTaskTrackerMemoryManager
(need hostname in output-string pattern). (Greg Roelofs, Krishna
<li> <a href="">HDFS-1729</a>. Add statistics logging for better visibility into
startup time costs. (Matt Foley)
<li> <a href="">MAPREDUCE-2363</a>. When a queue is built without any access rights we
explain the problem. (Richard King)
<li> <a href="">MAPREDUCE-1563</a>. TaskDiagnosticInfo may be missed sometime. (Krishna
<li> <a href="">MAPREDUCE-2364</a>. Don't hold the rjob lock while localizing resources. (ddas
via omalley)
<li> <a href="">HDFS-1598</a>. Directory listing on hftp:// does not show
.*.crc files. (szetszwo)
<li> <a href="">MAPREDUCE-2365</a>. New counters for FileInputFormat (BYTES_READ) and
FileOutputFormat (BYTES_WRITTEN).
New counter MAP_OUTPUT_MATERIALIZED_BYTES for compressed MapOutputSize.
(Siddharth Seth)
<li> <a href="">HADOOP-7040</a>. Change DiskErrorException to IOException (boryas)
<li> <a href="">HADOOP-7104</a>. Remove unnecessary DNS reverse lookups from RPC layer
<li> <a href="">MAPREDUCE-2366</a>. Fix a problem where the task browser UI can't retrieve the
stdxxx printouts of streaming jobs that abend in the unix code, in
the common case where the containing job doesn't reuse JVMs.
(Richard King)
<li> <a href="">HADOOP-6977</a>. Herriot daemon clients should vend statistics (cos)
<li> <a href="">HADOOP-6971</a>. Clover build doesn't generate per-test coverage (cos)
<li> <a href="">HADOOP-6879</a>. Provide SSH based (Jsch) remote execution API for system
tests. (cos)
<li> <a href="">MAPREDUCE-2355</a>. Add a configuration knob
mapreduce.tasktracker.outofband.heartbeat.damper that limits out of band
heartbeats (acmurthy)
<li> <a href="">MAPREDUCE-2356</a>. Fix a race-condition that corrupted a task's state on the
JobTracker. (Luke Lu)
<li> <a href="">MAPREDUCE-2357</a>. Always propagate IOExceptions that are thrown by
non-FileInputFormat. (Luke Lu)
<li> <a href="">HADOOP-7163</a>. RPC handles SocketTimeOutException during SASL negotiation.
<li> <a href="">MAPREDUCE-2358</a>. MapReduce assumes the default FileSystem is HDFS.
(Krishna Ramachandran)
<li> <a href="">MAPREDUCE-1904</a>. Reducing locking contention in TaskTracker's
MapOutputServlet LocalDirAllocator. (Rajesh Balamohan via acmurthy)
<li> <a href="">HDFS-1626</a>. Make BLOCK_INVALIDATE_LIMIT configurable. (szetszwo)
<li> <a href="">HDFS-1584</a>. Adds a check for whether relogin is needed to
getDelegationToken in HftpFileSystem. (Kan Zhang via ddas)
<li> <a href="">HADOOP-7115</a>. Reduces the number of calls to getpwuid_r and
getgrgid_r, by implementing a cache in NativeIO. (ddas)
<li> <a href="">HADOOP-6882</a>. An XSS security exploit in jetty-6.1.14. jetty upgraded to
6.1.26. (ddas)
<li> <a href="">MAPREDUCE-2278</a>. Fixes a memory leak in the TaskTracker. (cdouglas)
<li> <a href=" redux">HDFS-1353 redux</a>. Modulate original 1353 to not bump RPC version.
<li> <a href="">MAPREDUCE-2082</a> Race condition in writing the jobtoken password file when
launching pipes jobs (jitendra and ddas)
<a href="">HADOOP-6978</a>. Fixes task log servlet vulnerabilities via symlinks.
(Todd Lipcon and Devaraj Das)
<li> <a href="">MAPREDUCE-2178</a>. Write task initialization to avoid race
conditions leading to privilege escalation and resource leakage by
performing more actions as the user. (Owen O'Malley, Devaraj Das,
Chris Douglas via cdouglas)
<li> <a href="">HDFS-1364</a>. HFTP client should support relogin from keytab
<li> <a href="">HADOOP-6907</a>. Make RPC client to use per-proxy configuration.
(Kan Zhang via ddas)
<li> <a href="">MAPREDUCE-2055</a>. Fix JobTracker to decouple job retirement from copy of
job-history file to HDFS and enhance RetiredJobInfo to carry aggregated
job-counters to prevent a disk roundtrip on job-completion to fetch
counters for the JobClient. (Krishna Ramachandran via acmurthy)
<a href="">HDFS-1353</a>. Remove most of getBlockLocation optimization (jghoman)
<li> <a href="">MAPREDUCE-2023</a>. TestDFSIO read test may not read specified bytes. (htang)
<li> <a href="">HDFS-1340</a>. A null delegation token is appended to the url if security is
disabled when browsing the filesystem. (boryas)
<li> <a href="">HDFS-1352</a>. Fix jsvc.location. (jghoman)
<li> <a href="">HADOOP-6860</a>. 'compile-fault-inject' should never be called directly. (cos)
<li> <a href="">MAPREDUCE-2005</a>. TestDelegationTokenRenewal fails (boryas)
<li> <a href="">MAPREDUCE-2000</a>. Rumen is not able to extract counters for Job history logs
from Hadoop 0.20. (htang)
<li> <a href="">MAPREDUCE-1961</a>. ConcurrentModificationException when shutting down Gridmix.
<li> <a href="">HADOOP-6899</a>. RawLocalFileSystem set working directory does
not work for relative names. (suresh)
<li> <a href="">HDFS-495</a>. New clients should be able to take over files lease if the old
client died. (shv)
<li> <a href="">HADOOP-6728</a>. Re-design and overhaul of the Metrics framework. (Luke Lu via
<li> <a href="">MAPREDUCE-1966</a>. Change blacklisting of tasktrackers on task failures to be
a simple graylist to fingerpoint bad tasktrackers. (Greg Roelofs via
<li> <a href="">HADOOP-6864</a>. Add ability to get netgroups (as returned by getent
netgroup command) using native code (JNI) instead of forking. (Erik Steffl)
<li> <a href="">HDFS-1318</a>. HDFS Namenode and Datanode WebUI information needs to be
accessible programmatically for scripts. (Tanping Wang via suresh)
<li> <a href="">HDFS-1315</a>. Add fsck event to audit log and remove other audit log events
corresponding to FSCK listStatus and open calls. (suresh)
<li> <a href="">MAPREDUCE-1941</a>. Provides access to JobHistory file (raw) with job user/acl
permission. (Srikanth Sundarrajan via ddas)
<li> <a href="">MAPREDUCE-291.</a> Optionally a separate daemon should serve JobHistory.
(Srikanth Sundarrajan via ddas)
<li> <a href="">MAPREDUCE-1936</a>. Make Gridmix3 more customizable (sync changes from trunk).
<li> <a href="">HADOOP-5981</a>. Fix variable substitution during parsing of child environment
variables. (Krishna Ramachandran via acmurthy)
<li> <a href="">MAPREDUCE-339.</a> Greedily schedule failed tasks to cause early job failure.
<li> <a href="">MAPREDUCE-1872</a>. Hardened CapacityScheduler to have comprehensive, coherent
limits on tasks/jobs for jobs/users/queues. Also, added the ability to
refresh queue definitions without the need to restart the JobTracker.
<li> <a href="">HDFS-1161</a>. Make DN minimum valid volumes configurable. (shv)
<li> <a href="">HDFS-457</a>. Reintroduce volume failure tolerance for DataNodes. (shv)
<li> <a href=" Add start time, end time and total time taken for FSCK
to FSCK report">HDFS-1307 Add start time, end time and total time taken for FSCK
to FSCK report</a>. (suresh)
<li> <a href="">MAPREDUCE-1207</a>. Sanitize user environment of map/reduce tasks and allow
admins to set environment and java options. (Krishna Ramachandran via
<li> <a href=" - Add support in HDFS for new statistics added in FileSystem
to track the file system operations (suresh)
<li> HDFS-1301">HDFS-1298 - Add support in HDFS for new statistics added in FileSystem
to track the file system operations (suresh)
<li> HDFS-1301</a>. TestHDFSProxy need to use server side conf for ProxyUser
<li> <a href="">HADOOP-6859</a> - Introduce additional statistics to FileSystem to track
file system operations (suresh)
<li> <a href="">HADOOP-6818</a>. Provides a JNI implementation of Unix Group resolution. The
config should be set to to enable this
implementation. (ddas)
<li> <a href="">MAPREDUCE-1938</a>. Introduces a configuration for putting user classes before
the system classes during job submission and in task launches. Two things
need to be done in order to use this feature -
(1) mapreduce.user.classpath.first : this should be set to true in the
jobconf, and, (2) HADOOP_USER_CLASSPATH_FIRST : this is relevant for job
submissions done using bin/hadoop shell script. HADOOP_USER_CLASSPATH_FIRST
should be defined in the environment with some non-empty value
(like "true"), and then bin/hadoop should be executed. (ddas)
<li> <a href="">HADOOP-6669</a>. Respect compression configuration when creating DefaultCodec
compressors. (Koji Noguchi via cdouglas)
<li> <a href="">HADOOP-6855</a>. Add support for netgroups, as returned by command
getent netgroup. (Erik Steffl)
<li> <a href="">HDFS-599</a>. Allow NameNode to have a seprate port for service requests from
client requests. (Dmytro Molkov via hairong)
<li> <a href="">HDFS-132</a>. Fix namenode to not report files deleted metrics for deletions
done while replaying edits during startup. (shv)
<li> <a href="">MAPREDUCE-1521</a>. Protection against incorrectly configured reduces
<li> <a href="">MAPREDUCE-1936</a>. Make Gridmix3 more customizable. (htang)
<li> <a href="">MAPREDUCE-517.</a> Enhance the CapacityScheduler to assign multiple tasks
per-heartbeat. (acmurthy)
<li> <a href="">MAPREDUCE-323.</a> Re-factor layout of JobHistory files on HDFS to improve
operability. (Dick King via acmurthy)
<li> <a href="">MAPREDUCE-1921</a>. Ensure exceptions during reading of input data in map
tasks are augmented by information about actual input file which caused
the exception. (Krishna Ramachandran via acmurthy)
<li> <a href="">MAPREDUCE-1118</a>. Enhance the JobTracker web-ui to ensure tabular columns
are sortable, also added a /scheduler servlet to CapacityScheduler for
enhanced UI for queue information. (Krishna Ramachandran via acmurthy)
<li> <a href="">HADOOP-5913</a>. Add support for starting/stopping queues. (cdouglas)
<li> <a href="">HADOOP-6835</a>. Add decode support for concatenated gzip files. (Greg Roelofs)
<li> <a href="">HDFS-1158</a>. Revert <a href="">HDFS-457</a>. (shv)
<li> <a href="">MAPREDUCE-1699</a>. Ensure JobHistory isn't disabled for any reason. (Krishna
Ramachandran via acmurthy)
<li> <a href="">MAPREDUCE-1682</a>. Fix speculative execution to ensure tasks are not
scheduled after job failure. (acmurthy)
<li> <a href="">MAPREDUCE-1914</a>. Ensure unique sub-directories for artifacts in the
DistributedCache are cleaned up. (Dick King via acmurthy)
<li> <a href="">HADOOP-6713</a>. Multiple RPC Reader Threads (Bharathm)
<li> <a href="">HDFS-1250</a>. Namenode should reject block reports and block received
requests from dead datanodes (suresh)
<li> <a href="">MAPREDUCE-1863</a>. [Rumen] Null failedMapAttemptCDFs in job traces generated
by Rumen. (htang)
<li> <a href="">MAPREDUCE-1309</a>. Rumen refactory. (htang)
<li> <a href="">HDFS-1114</a>. Implement LightWeightGSet for BlocksMap in order to reduce
NameNode memory footprint. (szetszwo)
<li> <a href="">MAPREDUCE-572.</a> Fixes DistributedCache.checkURIs to throw error if link is
missing for uri in cache archives. (amareshwari)
<li> <a href="">MAPREDUCE-787.</a> Fix JobSubmitter to honor user given symlink in the path.
<li> <a href="">HADOOP-6815</a>. refreshSuperUserGroupsConfiguration should use
server side configuration for the refresh. (boryas)
<li> <a href="">MAPREDUCE-1868</a>. Add a read and connection timeout to JobClient while
pulling tasklogs. (Krishna Ramachandran via acmurthy)
<li> <a href="">HDFS-1119</a>. Introduce a GSet interface to BlocksMap. (szetszwo)
<li> <a href="">MAPREDUCE-1778</a>. Ensure failure to setup CompletedJobStatusStore is not
silently ignored by the JobTracker. (Krishna Ramachandran via acmurthy)
<li> <a href="">MAPREDUCE-1538</a>. Add a limit on the number of artifacts in the
DistributedCache to ensure we clean up aggressively. (Dick King via
<li> <a href="">MAPREDUCE-1850</a>. Add information about the host from which a job is
submitted. (Krishna Ramachandran via acmurthy)
<li> <a href="">HDFS-1110</a>. Reuses objects for commonly used file names in namenode to
reduce the heap usage. (suresh)
<li> <a href="">HADOOP-6810</a>. Extract a subset of tests for smoke (DOA) validation. (cos)
<li> <a href="">HADOOP-6642</a>. Remove debug stmt left from original patch. (cdouglas)
<li> <a href="">HADOOP-6808</a>. Add comments on how to setup File/Ganglia Context for
kerberos metrics (Erik Steffl)
<li> <a href="">HDFS-1061</a>. INodeFile memory optimization. (bharathm)
<li> <a href="">HDFS-1109</a>. HFTP supports filenames that contains the character "+".
(Dmytro Molkov via dhruba, backported by szetszwo)
<li> <a href="">HDFS-1085</a>. Check file length and bytes read when reading a file through
hftp in order to detect failure. (szetszwo)
<li> <a href="">HDFS-1311</a>. Running tests with 'testcase' cause triple execution of the
same test case (cos)
<li> <a href="">HDFS-1150</a>.FIX. Verify datanodes' identities to clients in secure clusters.
Update to patch to improve handling of jsvc source in build.xml (jghoman)
<li> <a href="">HADOOP-6752</a>. Remote cluster control functionality needs JavaDocs
improvement. (Balaji Rajagopalan via cos)
<li> <a href="">MAPREDUCE-1288</a>. Fixes TrackerDistributedCacheManager to take into account
the owner of the localized file in the mapping from cache URIs to
CacheStatus objects. (ddas)
<li> <a href="">MAPREDUCE-1682</a>. Fix speculative execution to ensure tasks are not
scheduled after job failure. (acmurthy)
<li> <a href="">MAPREDUCE-1914</a>. Ensure unique sub-directories for artifacts in the
DistributedCache are cleaned up. (Dick King via acmurthy)
<li> <a href="">MAPREDUCE-1538</a>. Add a limit on the number of artifacts in the
DistributedCache to ensure we cleanup aggressively. (Dick King via
<li> <a href="">MAPREDUCE-1900</a>. Fixes a FS leak that i missed in the earlier patch.
<li> <a href="">MAPREDUCE-1900</a>. Makes JobTracker/TaskTracker close filesystems, created
on behalf of users, when they are no longer needed. (ddas)
<li> <a href="">HADOOP-6832</a>. Add a static user plugin for web auth for external users.
<li> <a href="">HDFS-1007</a>. Fixes a bug in SecurityUtil.buildDTServiceName to do
with handling of null hostname. (omalley)
<li> <a href="">HDFS-1007</a>. makes long running servers using hftp work. Also has some
refactoring in the MR code to do with handling of delegation tokens.
(omalley & ddas)
<li> <a href="">HDFS-1178</a>. The NameNode servlets should not use RPC to connect to the
NameNode. (omalley)
<li> <a href="">MAPREDUCE-1807</a>. Re-factor TestQueueManager. (Richard King via acmurthy)
<li> <a href="">HDFS-1150</a>. Fixes the earlier patch to do logging in the right directory
and also adds facility for monitoring processes (via -Dprocname in the
command line). (Jakob Homan via ddas)
<li> <a href="">HADOOP-6781</a>. security audit log shouldn't have exception in it. (boryas)
<li> <a href="">HADOOP-6776</a>. Fixes the javadoc in UGI.createProxyUser. (ddas)
<li> <a href="">HDFS-1150</a>. building jsvc from source tar. source tar is also checked in.
<li> <a href="">HDFS-1150</a>. Bugfix in the hadoop shell script. (ddas)
<li> <a href="">HDFS-1153</a>. The navigation to /dfsnodelist.jsp with invalid input
parameters produces NPE and HTTP 500 error (rphulari)
<a href="">MAPREDUCE-1664</a>. Bugfix to enable queue administrators of a queue to
view job details of jobs submitted to that queue even though they
are not part of acl-view-job.
<li> <a href="">HDFS-1150</a>. Bugfix to add more knobs to secure datanode starter.
<li> <a href="">HDFS-1157</a>. Modifications introduced by <a href=" are breaking aspect's
bindings (cos)
<li> HDFS-1130">HDFS-1150 are breaking aspect's
bindings (cos)
<li> HDFS-1130</a>. Adds a configuration dfs.cluster.administrators for
controlling access to the default servlets in hdfs. (ddas)
<li> <a href="">HADOOP-6706</a>.FIX. Relogin behavior for RPC clients could be improved
<li> <a href="">HDFS-1150</a>. Verify datanodes' identities to clients in secure clusters.
<li> <a href="">MAPREDUCE-1442</a>. Fixed regex in job-history related to parsing Counter
values. (Luke Lu via acmurthy)
<li> <a href="">HADOOP-6760</a>. WebServer shouldn't increase port number in case of negative
port setting caused by Jetty's race. (cos)
<li> <a href="">HDFS-1146</a>. Javadoc for getDelegationTokenSecretManager in FSNamesystem.
<li> <a href="">HADOOP-6706</a>. Fix on top of the earlier patch. Closes the connection
on a SASL connection failure, and retries again with a new
connection. (ddas)
<li> <a href="">MAPREDUCE-1716</a>. Fix on top of earlier patch for logs truncation a.k.a
<a href="">MAPREDUCE-1100</a>. Addresses log truncation issues when binary data is
written to log files and adds a header to a truncated log file to
inform users of the done trucation.
<li> <a href="">HDFS-1383</a>. Improve the error messages when using hftp://.
<li> <a href="">MAPREDUCE-1744</a>. Fixed DistributedCache apis to take a user-supplied
FileSystem to allow for better proxy behaviour for Oozie. (Richard King)
<li> <a href="">MAPREDUCE-1733</a>. Authentication between pipes processes and java
counterparts. (jitendra)
<li> <a href="">MAPREDUCE-1664</a>. Bugfix on top of the previous patch. (ddas)
<li> <a href="">HDFS-1136</a>. FileChecksumServlets.RedirectServlet doesn't carry forward
the delegation token (boryas)
<li> <a href="">HADOOP-6756</a>. Change value of FS_DEFAULT_NAME_KEY from fs.defaultFS
to which is a correct name for 0.20 (steffl)
<li> <a href="">HADOOP-6756</a>. Document (javadoc comments) and cleanup configuration
keys in (steffl)
<li> <a href="">MAPREDUCE-1759</a>. Exception message for unauthorized user doing killJob,
killTask, setJobPriority needs to be improved. (gravi via vinodkv)
<li> <a href="">HADOOP-6715</a>. AccessControlList.toString() returns empty string when
we set acl to "*". (gravi via vinodkv)
<li> <a href="">HADOOP-6757</a>. NullPointerException for hadoop clients launched from
streaming tasks. (amarrk via vinodkv)
<li> <a href="">HADOOP-6631</a>. FileUtil.fullyDelete() should continue to delete other files
despite failure at any level. (vinodkv)
<li> <a href="">MAPREDUCE-1317</a>. NPE in setHostName in Rumen. (rksingh)
<li> <a href="">MAPREDUCE-1754</a>. Replace mapred.persmissions.supergroup with an acl :
mapreduce.cluster.administrators and <a href="">HADOOP-6748</a>.: Remove
hadoop.cluster.administrators. Contributed by Amareshwari Sriramadasu.
<li> <a href="">HADOOP-6701</a>. Incorrect exit codes for "dfs -chown", "dfs -chgrp"
<li> <a href="">HADOOP-6640</a>. FileSystem.get() does RPC retires within a static
synchronized block. (hairong)
<li> <a href="">HDFS-1006</a>. Removes unnecessary logins from the previous patch. (ddas)
<li> <a href="">HADOOP-6745</a>. adding some java doc to Server.RpcMetrics, UGI (boryas)
<li> <a href="">MAPREDUCE-1707</a>. TaskRunner can get NPE in getting ugi from TaskTracker.
<li> <a href="">HDFS-1104</a>. Fsck triggers full GC on NameNode. (hairong)
<li> <a href="">HADOOP-6332</a>. Large-scale Automated Test Framework (sharad, Sreekanth
Ramakrishnan, et al. via cos)
<li> <a href="">HADOOP-6526</a>. Additional fix for test context on top of existing one. (cos)
<li> <a href="">HADOOP-6710</a>. Symbolic umask for file creation is not conformant with posix.
<li> <a href="">HADOOP-6693</a>. Added metrics to track kerberos login success and failure.
<li> <a href="">MAPREDUCE-1711</a>. Gridmix should provide an option to submit jobs to the same
queues as specified in the trace. (rksing via htang)
<li> <a href="">MAPREDUCE-1687</a>. Stress submission policy does not always stress the
cluster. (htang)
<li> <a href="">MAPREDUCE-1641</a>. Bug-fix to ensure command line options such as
-files/-archives are checked for duplicate artifacts in the
DistributedCache. (Amareshwari Sreeramadasu via acmurthy)
<li> <a href="">MAPREDUCE-1641</a>. Fix DistributedCache to ensure same files cannot be put in
both the archives and files sections. (Richard King via acmurthy)
<li> <a href="">HADOOP-6670</a>. Fixes a testcase issue introduced by the earlier commit
of the <a href="">HADOOP-6670</a> patch. (ddas)
<li> <a href="">MAPREDUCE-1718</a>. Fixes a problem to do with correctly constructing
service name for the delegation token lookup in HftpFileSystem
(borya via ddas)
<li> <a href="">HADOOP-6674</a>. Fixes the earlier patch to handle pings correctly (ddas).
<li> <a href="">MAPREDUCE-1664</a>. Job Acls affect when Queue Acls are set.
(Ravi Gummadi via vinodkv)
<li> <a href="">HADOOP-6718</a>. Fixes a problem to do with clients not closing RPC
connections on a SASL failure. (ddas)
<li> <a href="">MAPREDUCE-1397</a>. NullPointerException observed during task failures.
(Amareshwari Sriramadasu via vinodkv)
<li> <a href="">HADOOP-6670</a>. Use the UserGroupInformation's Subject as the criteria for
equals and hashCode. (omalley)
<li> <a href="">HADOOP-6716</a>. System won't start in non-secure mode when kerb5.conf
( on Mac) is not present. (boryas)
<li> <a href="">MAPREDUCE-1607</a>. Task controller may not set permissions for a
task cleanup attempt's log directory. (Amareshwari Sreeramadasu via
<li> <a href="">MAPREDUCE-1533</a>. JobTracker performance enhancements. (Amar Kamat via
<li> <a href="">MAPREDUCE-1701</a>. AccessControlException while renewing a delegation token
is not correctly handled in the JobTracker. (boryas)
<li> <a href="">HDFS-481</a>. Incremental patch to fix broken unit test in contrib/hdfsproxy
<li> <a href="">HADOOP-6706</a>. Fixes a bug in the earlier version of the same patch (ddas)
<li> <a href="">HDFS-1096</a>. allow dfsadmin/mradmin refresh of superuser proxy group
<li> <a href="">HDFS-1012</a>. Support for cluster specific path entries in ldap for hdfsproxy
(Srikanth Sundarrajan via Nicholas)
<li> <a href="">HDFS-1011</a>. Improve Logging in HDFSProxy to include cluster name associated
with the request (Srikanth Sundarrajan via Nicholas)
<li> <a href="">HDFS-1010</a>. Retrieve group information from UnixUserGroupInformation
instead of LdapEntry (Srikanth Sundarrajan via Nicholas)
<li> <a href="">HDFS-481</a>. Bug fix - hdfsproxy: Stack overflow + Race conditions
(Srikanth Sundarrajan via Nicholas)
<li> <a href="">MAPREDUCE-1657</a>. After task logs directory is deleted, tasklog servlet
displays wrong error message about job ACLs. (Ravi Gummadi via vinodkv)
<li> <a href="">MAPREDUCE-1692</a>. Remove TestStreamedMerge from the streaming tests.
(Amareshwari Sriramadasu and Sreekanth Ramakrishnan via vinodkv)
<li> <a href="">HDFS-1081</a>. Performance regression in
DistributedFileSystem::getFileBlockLocations in secure systems (jhoman)
<li> <a href="">MAPREDUCE-1656</a>. JobStory should provide queue info. (htang)
<li> <a href="">MAPREDUCE-1317</a>. Reducing memory consumption of rumen objects. (htang)
<li> <a href="">MAPREDUCE-1317</a>. Reverting the patch since it caused build failures. (htang)
<li> <a href="">MAPREDUCE-1683</a>. Fixed jobtracker web-ui to correctly display heap-usage.
<li> <a href="">HADOOP-6706</a>. Fixes exception handling for saslConnect. The ideal
solution is to use the Refreshable interface but, as Owen noted in
<a href="">HADOOP-6656</a>, it doesn't seem to work as expected. (ddas)
<li> <a href="">MAPREDUCE-1617</a>. TestBadRecords failed once in our test runs. (Amar
Kamat via vinodkv).
<li> <a href="">MAPREDUCE-587.</a> Stream test TestStreamingExitStatus fails with Out of
Memory. (Amar Kamat via vinodkv).
<li> <a href="">HDFS-1096</a>. Reverting the patch since it caused build failures. (ddas)
<li> <a href="">MAPREDUCE-1317</a>. Reducing memory consumption of rumen objects. (htang)
<li> <a href="">MAPREDUCE-1680</a>. Add a metric to track number of heartbeats processed by the
JobTracker. (Richard King via acmurthy)
<li> <a href="">MAPREDUCE-1683</a>. Removes JNI calls to get jvm current/max heap usage in
ClusterStatus by default. (acmurthy)
<li> <a href="">HADOOP-6687</a>. user object in the subject in UGI should be reused in case
of a relogin. (jitendra)
<li> <a href="">HADOOP-5647</a>. TestJobHistory fails if /tmp/_logs is not writable.
Testcase should not depend on /tmp. (Ravi Gummadi via vinodkv)
<li> <a href="">MAPREDUCE-181.</a> Bug fix for Secure job submission. (Ravi Gummadi via
<li> <a href="">MAPREDUCE-1635</a>. ResourceEstimator does not work after <a href="">MAPREDUCE-842.</a>
(Amareshwari Sriramadasu via vinodkv)
<li> <a href="">MAPREDUCE-1526</a>. Cache the job related information while submitting the
job. (rksingh)
<li> <a href="">HADOOP-6674</a>. Turn off SASL checksums for RPCs. (jitendra via omalley)
<li> <a href="">HADOOP-5958</a>. Replace fork of DF with library call. (cdouglas via omalley)
<li> <a href="">HDFS-999</a>. Secondary namenode should login using kerberos if security
is configured. Bugfix to original patch. (jhoman)
<li> <a href="">MAPREDUCE-1594</a>. Support for SleepJobs in Gridmix (rksingh)
<li> <a href="">HDFS-1007</a>. Fix. ServiceName for delegation token for Hftp has hftp
port and not RPC port.
<li> <a href="">MAPREDUCE-1376</a>. Support for varied user submissions in Gridmix (rksingh)
<li> <a href="">HDFS-1080</a>. SecondaryNameNode image transfer should use the defined
http address rather than local ip address (jhoman)
<li> <a href="">HADOOP-6661</a>. User documentation for UserGroupInformation.doAs for secure
impersonation. (jitendra)
<li> <a href="">MAPREDUCE-1624</a>. Documents the job credentials and associated details
to do with delegation tokens (ddas)
<li> <a href="">HDFS-1036</a>. Documentation for fetchdt for Forrest. (boryas)
<li> <a href="">HDFS-1039</a>. New patch on top of the previous patch. Gets namenode address
from conf. (jitendra)
<li> <a href="">HADOOP-6656</a>. Renew Kerberos TGT when 80% of the renew lifetime has been
used up. (omalley)
<li> <a href="">HADOOP-6653</a>. Protect against NPE in setupSaslConnection when real user is
null. (omalley)
<li> <a href="">HADOOP-6649</a>. Fixes an error in the previously committed patch. (jitendra)
<li> <a href="">HADOOP-6652</a>. ShellBasedUnixGroupsMapping shouldn't have a cache.
<li> <a href="">HADOOP-6649</a>. login object in UGI should be inside the subject
<li> <a href="">HADOOP-6637</a>. Benchmark overhead of RPC session establishment
(shv via jitendra)
<li> <a href="">HADOOP-6648</a>. Credentials must ignore null tokens that can be generated
when using HFTP to talk to insecure clusters. (omalley)
<li> <a href="">HADOOP-6632</a>. Fix on JobTracker to reuse filesystem handles if possible.
<li> <a href="">HADOOP-6647</a>. balancer fails with "is not authorized for protocol
interface NamenodeProtocol" in secure environment (boryas)
<li> <a href="">MAPREDUCE-1612</a>. job conf file is not accessible from job history
web page. (Ravi Gummadi via vinodkv)
<li> <a href="">MAPREDUCE-1611</a>. Refresh nodes and refresh queues don't work with
service authorization enabled. (Amar Kamat via vinodkv)
<li> <a href="">HADOOP-6644</a>. util.Shell getGROUPS_FOR_USER_COMMAND method
name - should use common naming convention (boryas)
<li> <a href="">MAPREDUCE-1609</a>. Fixes a problem with localization of job log
directories when tasktracker is re-initialized that can result
in failed tasks. (Amareshwari Sriramadasu via yhemanth)
<li> <a href="">MAPREDUCE-1610</a>. Update forrest documentation for directory
structure of localized files. (Ravi Gummadi via yhemanth)
<li> <a href="">MAPREDUCE-1532</a>. Fixes a javadoc and an exception message in JobInProgress
when the authenticated user is different from the user in conf. (ddas)
<li> <a href="">MAPREDUCE-1417</a>. Update forrest documentation for private
and public distributed cache files. (Ravi Gummadi via yhemanth)
<li> <a href="">HADOOP-6634</a>. AccessControlList uses full-principal names to verify acls
causing queue-acls to fail (vinodkv)
<li> <a href="">HADOOP-6642</a>. Fix javac, javadoc, findbugs warnings. (chrisdo via acmurthy)
<li> <a href="">HDFS-1044</a>. Cannot submit mapreduce job from secure client to
unsecure server. (boryas)
<li> <a href="">HADOOP-6638</a>. Try to re-login on a failed RPC connection
(expired TGT) only if the subject is loginUser or
proxyUgi.realUser. (boryas)
<li> <a href="">HADOOP-6632</a>. Support for using different Kerberos keys for different
instances of Hadoop services. (jitendra)
<li> <a href="">HADOOP-6526</a>. Need mapping from long principal names to local OS
user names. (jitendra)
<li> <a href="">MAPREDUCE-1604</a>. Update Forrest documentation for job authorization
ACLs. (Amareshwari Sriramadasu via yhemanth)
<li> <a href="">HDFS-1045</a>. In secure clusters, re-login is necessary for https
clients before opening connections (jhoman)
<li> <a href="">HADOOP-6603</a>. Addition to original patch to be explicit
about new method not being for general use. (jhoman)
<li> <a href="">MAPREDUCE-1543</a>. Add audit log messages for job and queue
access control checks. (Amar Kamat via yhemanth)
<li> <a href="">MAPREDUCE-1606</a>. Fixed occasional timeout in TestJobACL. (Ravi Gummadi via
<li><a href="">HADOOP-6633</a>. normalize property names for JT/NN kerberos principal
names in configuration. (boryas)
<li> <a href="">HADOOP-6613</a>. Changes the RPC server so that version is checked first
on an incoming connection. (Kan Zhang via ddas)
<li> <a href="">HADOOP-5592</a>. Fix typo in Streaming doc in reference to GzipCodec.
(Corinne Chandel via tomwhite)
<li> <a href="">MAPREDUCE-813.</a> Updates Streaming and M/R tutorial documents.
(Corinne Chandel via ddas)
<li> <a href="">MAPREDUCE-927.</a> Cleanup of task-logs should happen in TaskTracker instead
of the Child. (Amareshwari Sriramadasu via vinodkv)
<li> <a href="">HDFS-1039</a>. Service should be set in the token in JspHelper.getUGI.
<li> <a href="">MAPREDUCE-1599</a>. MRBench reuses jobConf and credentials therein.
<li> <a href="">MAPREDUCE-1522</a>. FileInputFormat may use the default FileSystem for the
input path. (Tsz Wo (Nicholas), SZE via cdouglas)
<li> <a href="">HDFS-1036</a>. In DelegationTokenFetch pass Configuration object so
getDefaultUri will work correctly.
<li> <a href="">HDFS-1038</a>. In nn_browsedfscontent.jsp fetch delegation token only if
security is enabled. (jitendra)
<li> <a href="">HDFS-1036</a>. in DelegationTokenFetch dfs.getURI returns no port (boryas)
<li> <a href="">HADOOP-6598</a>. Verbose logging from the Group class (one more case)
<li> <a href="">HADOOP-6627</a>. "Bad Connection to FS" message in FSShell should print
message from the exception (boryas)
<li> <a href="">HDFS-1033</a>. In secure clusters, NN and SNN should verify the remote
principal during image and edits transfer. (jhoman)
<li> <a href="">HDFS-1005</a>. Fixes a bug to do with calling the cross-realm API in Fsck
client. (ddas)
<li> <a href="">MAPREDUCE-1422</a>. Fix cleanup of localized job directory to work if files
with non-deletable permissions are created within it.
(Amar Kamat via yhemanth)
<li> <a href="">HDFS-1007</a>. Fixes bugs to do with 20S cluster talking to 20 over
hftp (borya)
<li> <a href="">MAPREDUCE-1566</a>. Fixes bugs in the earlier patch. (ddas)
<li> <a href="">HDFS-992</a>. Fixes a bug in the backport of <a href="">HDFS-992</a>. (jitendra)
<li> <a href="">HADOOP-6598</a>. Remove verbose logging from the Groups class. (borya)
<li> <a href="">HADOOP-6620</a>. NPE if renewer is passed as null in getDelegationToken.
<li> <a href="">HDFS-1023</a>. Second update to the original patch to fix username. (jhoman)
<li> <a href="">MAPREDUCE-1435</a>. Add test cases to already committed patch for this
jira, synchronizing changes with trunk. (yhemanth)
<li> <a href="">HADOOP-6612</a>. Protocols RefreshUserToGroupMappingsProtocol and
RefreshAuthorizationPolicyProtocol authorization settings through
KerberosInfo (boryas)
<li> <a href="">MAPREDUCE-1566</a>. Bugfix for tests on top of the earlier patch. (ddas)
<li> <a href="">MAPREDUCE-1566</a>. Mechanism to import tokens and secrets from a file in to
the submitted job. (omalley)
<li> <a href="">HADOOP-6603</a>. Provide workaround for issue with Kerberos not
resolving cross-realm principal. (kan via jhoman)
<li> <a href="">HDFS-1023</a>. Update to original patch to fix username (jhoman)
<li> <a href="">HDFS-814</a>. Add an api to get the visible length of a
DFSDataInputStream. (hairong)
<li> <a href="">HDFS-1023</a>. Allow http server to start as regular user if https
principal is not defined. (jhoman)
<li> <a href="">HDFS-1022</a>. Merge all three test specs files (common, hdfs, mapred)
into one. (steffl)
<li> <a href="">HDFS-101</a>. DFS write pipeline: DFSClient sometimes does not detect
second datanode failure. (hairong)
<li> <a href="">HDFS-1015</a>. Intermittent failure in TestSecurityTokenEditLog. (jitendra)
<li> <a href="">MAPREDUCE-1550</a>. A bugfix on top of what was committed earlier (ddas).
<li> <a href="">MAPREDUCE-1155</a>. DISABLING THE TestStreamingExitStatus temporarily. (ddas)
<li> <a href="">HDFS-1020</a>. Changes the check for renewer from short name to long name
in the cancel/renew delegation token methods. (jitendra via ddas)
<li> <a href="">HDFS-1019</a>. Fixes values of delegation token parameters in
hdfs-default.xml. (jitendra via ddas)
<li> <a href="">MAPREDUCE-1430</a>. Fixes a backport issue with the earlier patch. (ddas)
<li> <a href="">MAPREDUCE-1559</a>. Fixes a problem in DelegationTokenRenewal class to
do with using the right credentials when talking to the NameNode. (ddas)
<li> <a href="">MAPREDUCE-1550</a>. Fixes a problem to do with creating a filesystem using
the user's UGI in the JobHistory browsing. (ddas)
<li> <a href="">HADOOP-6609</a>. Fix UTF8 to use a thread local DataOutputBuffer instead of
a static that was causing a deadlock in RPC. (omalley)
<li> <a href="">HADOOP-6584</a>. Fix javadoc warnings introduced by original <a href="">HADOOP-6584</a>
patch (jhoman)
<li> <a href="">HDFS-1017</a>. browsedfs jsp should call JspHelper.getUGI rather than using
createRemoteUser(). (jhoman)
<li> <a href="">MAPREDUCE-899.</a> Modified LinuxTaskController to check that task-controller
has the right permissions and ownership before performing any actions.
(Amareshwari Sriramadasu via yhemanth)
<li> <a href="">HDFS-204</a>. Revive number of files listed metrics. (hairong)
<li> <a href="">HADOOP-6569</a>. FsShell#cat should avoid calling unnecessary getFileStatus
before opening a file to read. (hairong)
<li> <a href="">HDFS-1014</a>. Error in reading delegation tokens from edit logs. (jitendra)
<li> <a href="">HDFS-458</a>. Add under-10-min tests from 0.22 to 0.20.1xx, only the tests
that already exist in 0.20.1xx (steffl)
<li> <a href="">MAPREDUCE-1155</a>. Just pulls out the TestStreamingExitStatus part of the
patch from jira (that went to 0.22). (ddas)
<li> <a href="">HADOOP-6600</a>. Fix for branch backport only. Comparison of users should use
equals(). (boryas)
<li> <a href="">HDFS-1006</a>. Fixes NameNode and SecondaryNameNode to use kerberized SSL for
the http communication. (Jakob Homan via ddas)
<li> <a href="">HDFS-1007</a>. Fixes a bug on top of the earlier patch. (ddas)
<li> <a href="">HDFS-1005</a>. Fsck security. Makes it work over kerberized SSL (boryas and
<li> <a href="">HDFS-1007</a>. Makes HFTP and Distcp use kerberized SSL. (ddas)
<li> <a href="">MAPREDUCE-1455</a>. Fixes a testcase in the earlier patch.
(Ravi Gummadi via ddas)
<li> <a href="">HDFS-992</a>. Refactors block access token implementation to conform to the
generic Token interface. (Kan Zhang via ddas)
<li> <a href="">HADOOP-6584</a>. Adds KrbSSL connector for jetty. (Jakob Homan via ddas)
<li> <a href="">HADOOP-6589</a>. Add a framework for better error messages when rpc connections
fail to authenticate. (Kan Zhang via omalley)
<li> <a href="">HADOOP-6600</a>, <a href="">HDFS-1003</a>, <a href="">MAPREDUCE-1539</a>. Mechanism for authorization check
for inter-server protocols. (boryas)
<li> <a href="">HADOOP-6580</a>, <a href="">HDFS-993</a>, <a href="">MAPREDUCE-1516</a>. UGI should contain authentication
<li> Namenode and JT should issue a delegation token only for kerberos
authenticated clients. (jitendra)
<li> <a href="">HDFS-984</a>, <a href="">HADOOP-6573</a>, <a href="">MAPREDUCE-1537</a>. Delegation Tokens should be persisted
in Namenode, and corresponding changes in common and mr. (jitendra)
<li> <a href="">HDFS-994</a>. Provide methods for obtaining delegation token from Namenode for
hftp and other uses. Incorporates <a href="">HADOOP-6594</a>: Update hdfs script to
provide fetchdt tool. (jitendra)
<li> <a href="">HADOOP-6586</a>. Log authentication and authorization failures and successes
<li> <a href="">HDFS-991</a>. Allow use of delegation tokens to authenticate to the
HDFS servlets. (omalley)
<li> <a href="">HADOOP-1849</a>. Add undocumented configuration parameter for per handler
call queue size in IPC Server. (shv)
<li> <a href="">HADOOP-6599</a>. Split existing RpcMetrics into summary metrics in RpcMetrics and
detailed information in RpcDetailedMetrics. (suresh)
<li> <a href="">HDFS-985</a>. HDFS should issue multiple RPCs for listing a large directory.
<li> <a href="">HDFS-1000</a>. Updates libhdfs to use the new UGI. (ddas)
<li> <a href="">MAPREDUCE-1532</a>. Ensures all filesystem operations at the client are done
as the job submitter. Also, changes the renewal to maintain a list of tokens
to renew. (ddas)
<li> <a href="">HADOOP-6596</a>. Add a version field to the serialization of the
AbstractDelegationTokenIdentifier. (omalley)
<li> <a href="">HADOOP-5561</a>. Add javadoc.maxmemory to build.xml to allow larger memory.
(jkhoman via omalley)
<li> <a href="">HADOOP-6579</a>. Add a mechanism for encoding and decoding Tokens into
URL-safe strings. (omalley)
<li> <a href="">MAPREDUCE-1354</a>. Make incremental changes in jobtracker for
improving scalability (acmurthy)
<li> <a href="">HDFS-999</a>. Secondary namenode should login using kerberos if security
is configured. (boryas)
<li> <a href="">MAPREDUCE-1466</a>. Added a private configuration variable
mapreduce.input.num.files, to store number of input files
being processed by M/R job. (Arun Murthy via yhemanth)
<li> <a href="">MAPREDUCE-1403</a>. Save file-sizes of each of the artifacts in
DistributedCache in the JobConf (Arun Murthy via yhemanth)
<li> <a href="">HADOOP-6543</a>. Fixes a compilation problem in the original commit. (ddas)
<li> <a href="">MAPREDUCE-1520</a>. Moves a call to setWorkingDirectory in Child to within
a doAs block. (Amareshwari Sriramadasu via ddas)
<li> <a href="">HADOOP-6543</a>. Allows secure clients to talk to unsecure clusters.
(Kan Zhang via ddas)
<li> <a href="">MAPREDUCE-1505</a>. Delays construction of the job client until it is really
required. (Arun C Murthy via ddas)
<li> <a href="">HADOOP-6549</a>. TestDoAsEffectiveUser should use ip address of the host
for superuser ip check. (jitendra)
<li> <a href="">HDFS-464</a>. Fix memory leaks in libhdfs. (Christian Kunz via suresh)
<li> <a href="">HDFS-946</a>. NameNode should not return full path name when listing a
directory or getting the status of a file. (hairong)
<li> <a href="">MAPREDUCE-1398</a>. Fix TaskLauncher to stop waiting for slots on a TIP
that is killed / failed. (Amareshwari Sriramadasu via yhemanth)
<li> <a href="">MAPREDUCE-1476</a>. Fix the M/R framework to not call commit for special
tasks like job setup/cleanup and task cleanup.
(Amareshwari Sriramadasu via yhemanth)
<li> <a href="">HADOOP-6467</a>. Performance improvement for liststatus on directories in
hadoop archives. (mahadev)
<li> <a href="">HADOOP-6558</a>. archive does not work with distcp -update. (nicholas via
<li> <a href="">HADOOP-6583</a>. Captures authentication and authorization metrics. (ddas)
<li> <a href="">MAPREDUCE-1316</a>. Fixes a memory leak of TaskInProgress instances in
the jobtracker. (Amar Kamat via yhemanth)
<li> <a href="">MAPREDUCE-670.</a> Creates ant target for 10 mins patch test build.
(Jothi Padmanabhan via gkesavan)
<li> <a href="">MAPREDUCE-1430</a>. JobTracker should be able to renew delegation tokens
for the jobs. (boryas)
<li> <a href="">HADOOP-6551</a>, <a href="">HDFS-986</a>, <a href="">MAPREDUCE-1503</a>. Change API for tokens to throw
exceptions instead of returning booleans. (omalley)
<li> <a href="">HADOOP-6545</a>. Changes the Key for the FileSystem to be UGI. (ddas)
<li> <a href="">HADOOP-6572</a>. Makes sure that SASL encryption and push to responder queue
for the RPC response happens atomically. (Kan Zhang via ddas)
<li> <a href="">HDFS-965</a>. Split the HDFS TestDelegationToken into two tests,
one for proxy users and the other for normal users. (jitendra via omalley)
<li> <a href="">HADOOP-6560</a>. HarFileSystem throws NPE for har://hdfs-/foo (nicholas via
<li> <a href="">MAPREDUCE-686.</a> Move TestSpeculativeExecution.Fake* into a separate class
so that it can be used by other tests. (Jothi Padmanabhan via sharad)
<li> <a href="">MAPREDUCE-181.</a> Fixes an issue in the use of the right config. (ddas)
<li> <a href="">MAPREDUCE-1026</a>. Fixes a bug in the backport. (ddas)
<li> <a href="">HADOOP-6559</a>. Makes the RPC client automatically re-login when the SASL
connection setup fails. This applies only to keytab-based logins.
<li> <a href="">HADOOP-2141</a>. Backport changes made in the original JIRA to aid
fast unit tests in Map/Reduce. (Amar Kamat via yhemanth)
<li> <a href="">HADOOP-6382</a>. Import the mavenizable pom file structure and adjust
the build targets and bin scripts. (gkesvan via ltucker)
<li> <a href="">MAPREDUCE-1425</a>. archive throws OutOfMemoryError (mahadev)
<li> <a href="">MAPREDUCE-1399</a>. The archive command shows a null error message. (nicholas)
<li> <a href="">HADOOP-6552</a>. Puts renewTGT=true and useTicketCache=true for the keytab
kerberos options. (ddas)
<li> <a href="">MAPREDUCE-1433</a>. Adds delegation token for MapReduce (ddas)
<li> <a href="">HADOOP-4359</a>. Fixes a bug in the earlier backport. (ddas)
<li> <a href="">HADOOP-6547</a>, <a href="">HDFS-949</a>, <a href="">MAPREDUCE-1470</a>. Move Delegation token into Common
so that we can use it for MapReduce also. It is a combined patch for
common, hdfs and mr. (jitendra)
<li> <a href="">HADOOP-6510</a>, <a href="">HDFS-935</a>, <a href="">MAPREDUCE-1464</a>. Support for doAs to allow
authenticated superuser to impersonate proxy users. It is a combined
patch with compatible fixes in HDFS and MR. (jitendra)
<li> <a href="">MAPREDUCE-1435</a>. Fixes the way symlinks are handled when cleaning up
work directory files. (Ravi Gummadi via yhemanth)
<li> <a href="">MAPREDUCE-6419</a>. Fixes a bug in the backported patch. (ddas)
<li> <a href="">MAPREDUCE-1457</a>. Fixes JobTracker to get the FileSystem object within
getStagingAreaDir within a privileged block. Fixes to use the
appropriate UGIs while getting the TaskUmbilicalProtocol proxy and while
executing the task. Contributed by Jakob Homan. (ddas)
<li> <a href="">MAPREDUCE-1440</a>. Replace the long user name in MapReduce with the local
name. (ddas)
<li> <a href="">HADOOP-6419</a>. Adds SASL based authentication to RPC. Also includes the
<a href="">MAPREDUCE-1335</a> and <a href="">HDFS-933</a> patches. Contributed by Kan Zhang.
<li> <a href="">HADOOP-6538</a>. Sets to simple by default.
<li> <a href="">HDFS-938</a>. Replace calls to UGI.getUserName() with
<li> <a href="">HADOOP-6544</a>. fix ivy settings to include JSON
libs for .20 (boryas)
<li> <a href="">HDFS-907</a>. Add tests for getBlockLocations and totalLoad metrics. (rphulari)
<li> <a href="">HADOOP-6204</a>. Implementing aspects development and fault injection
framework for Hadoop (cos)
<li> <a href="">MAPREDUCE-1432</a>. Adds hooks in the jobtracker and tasktracker
for loading the tokens in the user's ugi. This is required for
the copying of files from HDFS. (Devaraj Das via boryas)
<li> <a href="">MAPREDUCE-1383</a>. Automates fetching of delegation tokens in File*Formats,
Distributed Cache and Distcp. Also, provides a config
mapreduce.job.hdfs-servers that the jobs can populate with a comma
separated list of namenodes. The job client automatically fetches
delegation tokens from those namenodes.
<li> <a href="">HADOOP-6337</a>. Update FilterInitializer class to be more visible
and take a conf for further development. (jhoman)
<li> <a href="">HADOOP-6520</a>. UGI should load tokens from the environment. (jitendra)
<li> <a href="">HADOOP-6517</a>, <a href="">HADOOP-6518</a>. Ability to add/get tokens from
UserGroupInformation &amp; Kerberos login in UGI should honor KRB5CCNAME
<li> <a href="">HADOOP-6299</a>. Reimplement the UserGroupInformation to use the OS
specific and Kerberos JAAS login. (jhoman, ddas, oom)
<li> <a href="">HADOOP-6524</a>. Contrib tests are failing Clover'ed build. (cos)
<li> <a href="">MAPREDUCE-842.</a> Fixing a bug in the earlier version of the patch
related to improper localization of the job token file.
(Ravi Gummadi via yhemanth)
<li> <a href="">HDFS-919</a>. Create test to validate the BlocksVerified metric (Gary Murry
via cos)
<li> <a href="">MAPREDUCE-1186</a>. Modified code in distributed cache to set
permissions only on required set of localized paths.
(Amareshwari Sriramadasu via yhemanth)
<li> <a href="">HDFS-899</a>. Delegation Token Implementation. (Jitendra Nath Pandey)
<li> <a href="">MAPREDUCE-896.</a> Enhance tasktracker to cleanup files that might have
been created by user tasks with non-writable permissions.
(Ravi Gummadi via yhemanth)
<li> <a href="">HADOOP-5879</a>. Read compression level and strategy from Configuration for
gzip compression. (He Yongqiang via cdouglas)
<li> <a href="">HADOOP-6161</a>. Add get/setEnum methods to Configuration. (cdouglas)
<li> <a href="">HADOOP-6382</a>. Mavenize the build.xml targets and update the bin scripts
in preparation for publishing POM files (giri kesavan via ltucker)
<li> <a href="">HDFS-737</a>. Add full path name of the file to the block information and
summary of total number of files, blocks, live and deadnodes to
metasave output. (Jitendra Nath Pandey via suresh)
<li> <a href="">HADOOP-6577</a>. Add hidden configuration option "ipc.server.max.response.size"
to change the maximum size (default 1 MB) above which a large IPC handler
response buffer is reset. (suresh)
<li> <a href="">HADOOP-6521</a>. Fix backward compatibility issue with umask when applications
use deprecated param dfs.umask in configuration or use
FsPermission.setUMask(). (suresh)
<li> <a href="">MAPREDUCE-433.</a> Use more reliable counters in TestReduceFetch.
(Christopher Douglas via ddas)
<li> <a href="">MAPREDUCE-744.</a> Introduces the notion of a public distributed cache.
<li> <a href="">MAPREDUCE-1140</a>. Fix DistributedCache to not decrement reference counts
for unreferenced files in error conditions.
(Amareshwari Sriramadasu via yhemanth)
<li> <a href="">MAPREDUCE-1284</a>. Fix fts_open() call in task-controller that was failing
LinuxTaskController unit tests. (Ravi Gummadi via yhemanth)
<li> <a href="">MAPREDUCE-1098</a>. Fixed the distributed-cache to not do i/o while
holding a global lock.
(Amareshwari Sriramadasu via acmurthy)
<li> <a href="">MAPREDUCE-1338</a>. Introduces the notion of a token cache through which
tokens and secrets can be sent by the Job client to the JobTracker.
(Boris Shkolnik)
<li> <a href="">HADOOP-6495</a>. Identifier should be serialized after the password is created
in the Token constructor. (Jitendra Nath Pandey)
<li> <a href="">HADOOP-6506</a>. Failing tests prevent the rest of test targets from
executing. (cos)
<li> <a href="">HADOOP-5457</a>. Fix to continue to run builds even if contrib test fails.
<li> <a href="">MAPREDUCE-856.</a> Setup secure permissions for distributed cache files.
(Vinod Kumar Vavilapalli via yhemanth)
<li> <a href="">MAPREDUCE-871.</a> Fix ownership of Job/Task local files to have correct
group ownership according to the egid of the tasktracker.
(Vinod Kumar Vavilapalli via yhemanth)
<li> <a href="">MAPREDUCE-476.</a> Extend DistributedCache to work locally (LocalJobRunner).
(Philip Zeyliger via tomwhite)
<li> <a href="">MAPREDUCE-711.</a> Removed Distributed Cache from Common, to move it under
Map/Reduce. (Vinod Kumar Vavilapalli via yhemanth)
<li> <a href="">MAPREDUCE-478.</a> Allow map and reduce jvm parameters, environment
variables and ulimit to be set separately. (acmurthy)
<li> <a href="">MAPREDUCE-842.</a> Setup secure permissions for localized job files,
intermediate outputs and log files on tasktrackers.
(Vinod Kumar Vavilapalli via yhemanth)
<li> <a href="">MAPREDUCE-408.</a> Fixes an assertion problem in TestKillSubProcesses.
(Ravi Gummadi via ddas)
<li> <a href="">HADOOP-4041</a>. IsolationRunner does not work as documented.
(Philip Zeyliger via tomwhite)
<li> <a href="">MAPREDUCE-181.</a> Changes the job submission process to be secure.
(Devaraj Das)
<li> <a href="">HADOOP-5737</a>. Fixes a problem in the way the JobTracker used to talk to
other daemons like the NameNode to get the job's files. Also adds APIs
in the JobTracker to get the FileSystem objects as per the JobTracker's
configuration. (Amar Kamat via ddas)
<li> <a href="">HADOOP-5771</a>. Implements unit tests for LinuxTaskController.
(Sreekanth Ramakrishnan and Vinod Kumar Vavilapalli via yhemanth)
<li> <a href="">HADOOP-4656</a>, <a href="">HDFS-685</a>, <a href="">MAPREDUCE-1083</a>. Use the user-to-groups mapping
service in the NameNode and JobTracker. Combined patch for these 3 jiras
otherwise tests fail. (Jitendra Nath Pandey)
<li> <a href="">MAPREDUCE-1250</a>. Refactor job token to use a common token interface.
(Jitendra Nath Pandey)
<li> <a href="">MAPREDUCE-1026</a>. Shuffle should be secure. (Jitendra Nath Pandey)
<li> <a href="">HADOOP-4268</a>. Permission checking in fsck. (Jitendra Nath Pandey)
<li> <a href="">HADOOP-6415</a>. Adding a common token interface for both job token and
delegation token. (Jitendra Nath Pandey)
<li> <a href="">HADOOP-6367</a>, <a href="">HDFS-764</a>. Moving Access Token implementation from Common to
HDFS. These two jiras must be committed together otherwise build will
fail. (Jitendra Nath Pandey)
<li> <a href="">HDFS-409</a>. Add more access token tests
(Jitendra Nath Pandey)
<li> <a href="">HADOOP-6132</a>. RPC client opens an extra connection for VersionedProtocol.
(Jitendra Nath Pandey)
<li> <a href="">HDFS-445</a>. pread() fails when cached block locations are no longer valid.
(Jitendra Nath Pandey)
<li> <a href="">HDFS-195</a>. Need to handle access token expiration when re-establishing the
pipeline for dfs write. (Jitendra Nath Pandey)
<li> <a href="">HADOOP-6176</a>. Adding a couple private methods to AccessTokenHandler
for testing purposes. (Jitendra Nath Pandey)
<li> <a href="">HADOOP-5824</a>. remove OP_READ_METADATA functionality from Datanode.
(Jitendra Nath Pandey)
<li> <a href="">HADOOP-4359</a>. Access Token: Support for data access authorization
checking on DataNodes. (Jitendra Nath Pandey)
<li> <a href="">MAPREDUCE-1372</a>. Fixed a ConcurrentModificationException in jobtracker.
(Arun C Murthy via yhemanth)
<li> <a href="">MAPREDUCE-1316</a>. Fix jobs' retirement from the JobTracker to prevent memory
leaks via stale references. (Amar Kamat via acmurthy)
<li> <a href="">MAPREDUCE-1342</a>. Fixed deadlock in global blacklisting of tasktrackers.
(Amareshwari Sriramadasu via acmurthy)
<li> <a href="">HADOOP-6460</a>. Reinitializes buffers used for serializing responses in ipc
server on exceeding maximum response size to free up Java heap. (suresh)
<li> <a href="">MAPREDUCE-1100</a>. Truncate user logs to prevent TaskTrackers' disks from
filling up. (Vinod Kumar Vavilapalli via acmurthy)
<li> <a href="">MAPREDUCE-1143</a>. Fix running task counters to be updated correctly
when speculative attempts are running for a TIP.
(Rahul Kumar Singh via yhemanth)
<li> <a href="">HADOOP-6151</a>, 6281, 6285, 6441. Add HTML quoting of the parameters to all
of the servlets to prevent XSS attacks. (omalley)
<li> <a href="">MAPREDUCE-896.</a> Fix bug in earlier implementation to prevent
spurious logging in tasktracker logs for absent file paths.
(Ravi Gummadi via yhemanth)
<li> <a href="">MAPREDUCE-676.</a> Fix Hadoop Vaidya to ensure it works for map-only jobs.
(Suhas Gogate via acmurthy)
<li> <a href="">HADOOP-5582</a>. Fix Hadoop Vaidya to use new Counters in
org.apache.hadoop.mapreduce package. (Suhas Gogate via acmurthy)
<li> <a href="">HDFS-595</a>. umask settings in configuration may now use octal or
symbolic instead of decimal. Update HDFS tests as such. (jghoman)
<li> <a href="">MAPREDUCE-1068</a>. Added a verbose error message when user specifies an
incorrect -file parameter. (Amareshwari Sriramadasu via acmurthy)
<li> <a href="">MAPREDUCE-1171</a>. Allow the read-error notification in shuffle to be
configurable. (Amareshwari Sriramadasu via acmurthy)
<li> <a href="">MAPREDUCE-353.</a> Allow shuffle read and connection timeouts to be
configurable. (Amareshwari Sriramadasu via acmurthy)
<li> <a href="">HDFS-781</a>. Namenode metrics PendingDeletionBlocks is not decremented.
<li> <a href="">MAPREDUCE-1185</a>. Redirect running job url to history url if job is already
retired. (Amareshwari Sriramadasu and Sharad Agarwal via sharad)
<li> <a href="">MAPREDUCE-754.</a> Fix NPE in expiry thread when a TT is lost. (Amar Kamat
via sharad)
<li> <a href="">MAPREDUCE-896.</a> Modify permissions for local files on tasktracker before
deletion so they can be deleted cleanly. (Ravi Gummadi via yhemanth)
<li> <a href="">HADOOP-5771</a>. Implements unit tests for LinuxTaskController.
(Sreekanth Ramakrishnan and Vinod Kumar Vavilapalli via yhemanth)
<li> <a href="">MAPREDUCE-1124</a>. Import Gridmix3 and Rumen. (cdouglas)
<li> <a href="">MAPREDUCE-1063</a>. Document gridmix benchmark. (cdouglas)
<li> <a href="">HDFS-758</a>. Changes to report status of decommissioning on the namenode web
UI. (jitendra)
<li> <a href="">HADOOP-6234</a>. Add new option dfs.umaskmode to set umask in configuration
to use octal or symbolic instead of decimal. (Jakob Homan via suresh)
<li> <a href="">MAPREDUCE-1147</a>. Add map output counters to new API. (Amar Kamat via
<li> <a href="">MAPREDUCE-1182</a>. Fix overflow in reduce causing allocations to exceed the
configured threshold. (cdouglas)
<li> <a href="">HADOOP-4933</a>. Fixes a ConcurrentModificationException problem that shows up
when the history viewer is accessed concurrently.
(Amar Kamat via ddas)
<li> <a href="">MAPREDUCE-1140</a>. Fix DistributedCache to not decrement reference counts for
unreferenced files in error conditions.
(Amareshwari Sriramadasu via yhemanth)
<li> <a href="">HADOOP-6203</a>. When moving to trash fails, the FsShell rm/rmr error message
now indicates that the Trash quota was exceeded and suggests using -skipTrash.
(Boris Shkolnik via suresh)
<li> <a href="">HADOOP-5675</a>. Do not launch a job if DistCp has no work to do. (Tsz Wo
(Nicholas), SZE via cdouglas)
<li> <a href="">HDFS-457</a>. Better handling of volume failure in Data Node storage.
This fix is a port from hdfs-0.22 to common-0.20 by Boris Shkolnik.
Contributed by Erik Steffl.
<li> <a href="">HDFS-625</a>. Fix NullPointerException thrown from ListPathServlet.
Contributed by Suresh Srinivas.
<li> <a href="">HADOOP-6343</a>. Log unexpected throwable object caught in RPC.
Contributed by Jitendra Nath Pandey
<li> <a href="">MAPREDUCE-1186</a>. Fixed DistributedCache to do a recursive chmod on just the
per-cache directory, not all of mapred.local.dir.
(Amareshwari Sriramadasu via acmurthy)
<li> <a href="">MAPREDUCE-1231</a>. Add an option to distcp to ignore checksums when used with
the upgrade option.
(Jothi Padmanabhan via yhemanth)
<li> <a href="">MAPREDUCE-1219</a>. Fixed JobTracker to not collect per-job metrics, thus
easing load on it. (Amareshwari Sriramadasu via acmurthy)
<li> <a href="">HDFS-761</a>. Fix failure to process rename operation from edits log due to
quota verification. (suresh)
<li> <a href="">MAPREDUCE-1196</a>. Fix FileOutputCommitter to use the deprecated cleanupJob
api correctly. (acmurthy)
<li> <a href="">HADOOP-6344</a>. rm and rmr immediately delete files rather than sending
to trash, despite trash being enabled, if a user is over-quota. (jhoman)
<li> <a href="">MAPREDUCE-1160</a>. Reduce verbosity of log lines in some Map/Reduce classes
to avoid filling up jobtracker logs on a busy cluster.
(Ravi Gummadi and Hong Tang via yhemanth)
<li> <a href="">HDFS-587</a>. Add ability to run HDFS with MR test on a non-default queue;
also updated the junit dependency from junit-3.8.1 to junit-4.5 (to make
it possible to use Configured and Tool to process the command line so
that a queue can be specified). Contributed by Erik Steffl.
<li> <a href="">MAPREDUCE-1158</a>. Fix JT running maps and running reduces metrics.
<li> <a href="">MAPREDUCE-947</a>. Fix bug in earlier implementation that was
causing unit tests to fail.
(Ravi Gummadi via yhemanth)
<li> <a href="">MAPREDUCE-1062</a>. Fix MRReliabilityTest to work with retired jobs.
(Contributed by Sreekanth Ramakrishnan)
<li> <a href="">MAPREDUCE-1090</a>. Modified log statement in TaskMemoryManagerThread to
include task attempt id. (yhemanth)
<li> <a href="">MAPREDUCE-1098</a>. Fixed the distributed-cache to not do i/o while
holding a global lock. (Amareshwari Sriramadasu via acmurthy)
<li> <a href="">MAPREDUCE-1048</a>. Add occupied/reserved slot usage summary on
jobtracker UI. (Amareshwari Sriramadasu via sharad)
<li> <a href="">MAPREDUCE-1103</a>. Added more metrics to Jobtracker. (sharad)
<li> <a href="">MAPREDUCE-947</a>. Added commitJob and abortJob apis to OutputCommitter.
Enhanced FileOutputCommitter to create a _SUCCESS file for successful
jobs. (Amar Kamat & Jothi Padmanabhan via acmurthy)
<li> <a href="">MAPREDUCE-1105</a>. Remove the max limit configuration in the capacity scheduler in
favor of a max capacity percentage, thus allowing the limit to exceed
queue capacity. (Rahul Kumar Singh via yhemanth)
<li> <a href="">MAPREDUCE-1086</a>. Set up the Hadoop logging environment for tasks to point to
task-related parameters. (Ravi Gummadi via yhemanth)
<li> <a href="">MAPREDUCE-739</a>. Allow relative paths to be created inside archives.
<li> <a href="">HADOOP-6097</a>. Fix multiple bugs with Hadoop archives. (mahadev)
<li> <a href="">HADOOP-6231</a>. Allow caching of filesystem instances to be disabled on a
per-instance basis (ben slusky via mahadev)
<li> <a href="">MAPREDUCE-826</a>. harchive does not use ToolRunner; harchive returns 0 even
if the job fails with an exception. (koji via mahadev)
<li> <a href="">HDFS-686</a>. NullPointerException is thrown while merging edit log and
image. (hairong)
<li> <a href="">HDFS-709</a>. Fix TestDFSShell failure due to rename bug introduced by
<a href="">HDFS-677</a>. (suresh)
<li> <a href="">HDFS-677</a>. Rename failure when both source and destination quotas are exceeded
results in deletion of the source. (suresh)
<li> <a href="">HADOOP-6284</a>. Add a new parameter, HADOOP_JAVA_PLATFORM_OPTS, that allows setting java command options for
JAVA_PLATFORM. (Koji Noguchi via szetszwo)
<li> <a href="">MAPREDUCE-732</a>. Removed spurious log statements in the node
blacklisting logic. (Sreekanth Ramakrishnan via yhemanth)
<li> <a href="">MAPREDUCE-144</a>. Includes dump of the process tree in task diagnostics when
a task is killed due to exceeding memory limits.
(Vinod Kumar Vavilapalli via yhemanth)
<li> <a href="">MAPREDUCE-979</a>. Fixed JobConf APIs related to memory parameters to
return values of new configuration variables when deprecated
variables are disabled. (Sreekanth Ramakrishnan via yhemanth)
<li> <a href="">MAPREDUCE-277</a>. Makes job history counters available on the job history
viewers. (Jothi Padmanabhan via ddas)
<li> <a href="">HADOOP-5625</a>. Add operation duration to clienttrace. (Lei Xu
via cdouglas)
<li> <a href="">HADOOP-5222</a>. Add offset to datanode clienttrace. (Lei Xu via cdouglas)
<li> <a href="">HADOOP-6218</a>. Adds a feature where TFile can be split by record
sequence number. Contributed by Hong Tang and Raghu Angadi.
<li> <a href="">MAPREDUCE-1088</a>. Changed permissions on JobHistory files on local disk to
0744. Contributed by Arun C. Murthy.
<li> <a href="">HADOOP-6304</a>. Use java.io.File.set{Readable|Writable|Executable} where
possible in RawLocalFileSystem. Contributed by Arun C. Murthy.
<li> <a href="">MAPREDUCE-270</a>. Fix the tasktracker to optionally send an out-of-band
heartbeat on task completion for better job latency. Contributed by
Arun C. Murthy.<br>
Configuration changes:
add mapreduce.tasktracker.outofband.heartbeat
<li> <a href="">MAPREDUCE-1030</a>. Fix capacity-scheduler to assign a map and a reduce task
per heartbeat. Contributed by Rahul K Singh.
<li> <a href="">MAPREDUCE-1028</a>. Fixed number of slots occupied by cleanup tasks to one
irrespective of slot size for the job. Contributed by Ravi Gummadi.
<li> <a href="">MAPREDUCE-964</a>. Fixed start and finish times of TaskStatus to be
consistent, thereby fixing inconsistencies in metering tasks.
Contributed by Sreekanth Ramakrishnan.
<li> <a href="">HADOOP-5976</a>. Add a new command, classpath, to the hadoop
script. Contributed by Owen O'Malley and Gary Murry
<li> <a href="">HADOOP-5784</a>. Makes the number of heartbeats that should arrive
per second at the JobTracker configurable. Contributed by
Amareshwari Sriramadasu.
<li> <a href="">MAPREDUCE-945</a>. Modifies MRBench and TestMapRed to use
ToolRunner so that options such as queue name can be
passed via command line. Contributed by Sreekanth Ramakrishnan.
<li> <a href="">HADOOP-5420</a>. Correct bug in earlier implementation.
Contributed by Arun C. Murthy.
<li> <a href="">HADOOP-5363</a>. Add support for proxying connections to multiple
clusters with different versions to hdfsproxy. Contributed
by Zhiyong Zhang
<li> <a href="">HADOOP-5780</a>. Improve the per-block message printed by -metaSave
in HDFS. (Raghu Angadi)
<li> <a href="">HADOOP-6227</a>. Fix Configuration to allow final parameters to be set
to null and prevent them from being overridden. Contributed by
Amareshwari Sriramadasu.
<li> <a href=" ">MAPREDUCE-430</a>. Added patch supplied by Amar Kamat to allow roll forward
on branch to include externally committed patch.
<li> <a href="">MAPREDUCE-768</a>. Provide an option to dump jobtracker configuration in
JSON format to standard output. Contributed by V.V.Chaitanya
<li> <a href=" ">MAPREDUCE-834</a>. Correct an issue created by merging this issue with the
patch attached to an external Jira.
<li> <a href="">HADOOP-6184</a>. Provide an API to dump Configuration in a JSON format.
Contributed by V.V.Chaitanya Krishna.
<li> <a href=" ">MAPREDUCE-745</a>. Patch added for this issue to allow branch-0.20 to
merge cleanly.
<li> <a href=" ">MAPREDUCE-478</a>. Allow map and reduce JVM parameters, environment
variables and ulimit to be set separately.
<li> <a href=" ">MAPREDUCE-682</a>. Removes reservations on tasktrackers which are blacklisted.
Contributed by Sreekanth Ramakrishnan.
<li> <a href="">HADOOP-5420</a>. Support killing of process groups in LinuxTaskController.
<li> <a href="">HADOOP-5488</a>. Removes the pidfile management for the Task JVM from the
framework and instead passes the PID back and forth between the
TaskTracker and the Task processes. Contributed by Ravi Gummadi.
<li> <a href=" ">MAPREDUCE-467</a>. Provide ability to collect statistics about total tasks and
succeeded tasks in different time windows.
<li> <a href="">MAPREDUCE-817</a>. Add a cache for retired jobs with minimal job
info and provide a way to access the history file URL.
<li> <a href="">MAPREDUCE-814</a>. Provide a way to configure completed job history
files to be on HDFS.
<li> <a href=" ">MAPREDUCE-838</a>. Fixes a problem in the way commit of task outputs
happens. The bug was that even if commit failed, the task would be
declared as successful. Contributed by Amareshwari Sriramadasu.
<li> <a href=" ">MAPREDUCE-809</a>. Fix job-summary logs to correctly record final status of
<li> <a href=" ">MAPREDUCE-740</a>. Log a job-summary at the end of a job, while
allowing it to be configured to use a custom appender if desired.
<li> <a href=" ">MAPREDUCE-771</a>. Fixes a bug which delays normal jobs in favor of
high-ram jobs.
<li> <a href="">HADOOP-5420</a>. Support setsid-based kill in LinuxTaskController.
<li> <a href=" ">MAPREDUCE-733</a>. Fixes a bug where an exception was thrown when a
tasktracker was killed. The exception is now caught and processed, allowing
the rest of the flow to go through.
<li> <a href=" ">MAPREDUCE-734</a>. Fixes a bug which prevented high-RAM jobs from being
removed from the scheduler queue.
<li> <a href=" ">MAPREDUCE-693</a>. Fixes a bug where, if a job is submitted, the
JT is restarted (before job files have been written), and the job
is killed after recovery, the conf files fail to be moved to the
"done" subdirectory.
<li> <a href=" ">MAPREDUCE-722</a>. Fixes a bug where more slots were reserved
for high-RAM job tasks than required.
<li> <a href=" ">MAPREDUCE-683</a>. TestJobTrackerRestart failed because of a stale
filemanager cache (which was created once per JVM). This patch makes
sure that the filemanager is initialized upon every JobHistory.init()
and hence upon every restart. Note that this won't happen in production,
as upon a restart the new jobtracker starts in a new JVM and
hence a new cache is created.
<li> <a href=" ">MAPREDUCE-709</a>. Fixes a bug where node health check script does
not display the correct message on timeout.
<li> <a href=" ">MAPREDUCE-708</a>. Fixes a bug where node health check script does
not refresh the "reason for blacklisting".
<li> <a href=" ">MAPREDUCE-522</a>. Rewrote TestQueueCapacities to make it simpler
and avoid timeout errors.
<li> <a href=" ">MAPREDUCE-532</a>. Provided ability in the capacity scheduler to
limit the number of slots that can be concurrently used per queue
at any given time.
<li> <a href=" ">MAPREDUCE-211</a>. Provides ability to run a health check script on
the tasktracker nodes and blacklist nodes if they are unhealthy.
Contributed by Sreekanth Ramakrishnan.
<li> <a href=" ">MAPREDUCE-516</a>. Remove .orig file included by mistake.
<li> <a href=" ">MAPREDUCE-416</a>. Moves the history file to a "done" folder whenever
a job completes.
<li> <a href="">HADOOP-5980</a>. Previously, tasks spawned off by LinuxTaskController
did not get LD_LIBRARY_PATH in their environment. The tasks will now
get the same LD_LIBRARY_PATH value as when spawned off by
<li> <a href="">HADOOP-5981</a>. This issue completes the feature mentioned in
<a href="">HADOOP-2838</a>. <a href="">HADOOP-2838</a> provided a way to set environment variables in the
child process. This issue provides a way to inherit the tasktracker's environment variables
and append to or reset them. So now X=$X:y will inherit X (if set) and
append y to it.
<li> <a href="">HADOOP-5419</a>. This issue provides an improvement to the
existing M/R framework that lets users know which queues they have
access to, and for what operations. One use case for this is
that currently there is no easy way to know whether the user has access
to submit jobs to a queue until it fails with an access control
<li> <a href="">HADOOP-5643</a>. Added the functionality to refresh the jobtracker's node
list via the command line (bin/hadoop mradmin -refreshNodes). The command
should be run as the jobtracker owner (the jobtracker process owner)
or by a member of the supergroup (mapred.permissions.supergroup).
<li> <a href="">HADOOP-2838</a>. Users can now set environment variables using
mapred.child.env. X=Y sets X to Y, and X=$X:Y appends Y to X
(whose existing value is taken from the tasktracker).
<li> <a href="">HADOOP-5818</a>. Revert the renaming from FSNamesystem.checkSuperuserPrivilege
to checkAccess by <a href="">HADOOP-5643</a>. (Amar Kamat via szetszwo)
<li> <a href="">HADOOP-5801</a>. If the hosts file is changed across a restart, it is now
refreshed upon recovery so that the excluded hosts are lost and the
maps are re-executed. (Amar Kamat via ddas)
<li> <a href="">HADOOP-5643</a>. Adds a way to decommission TaskTrackers
while the JobTracker is running. (Amar Kamat via ddas)
<li> <a href="">HADOOP-5419</a>. Provide a facility to query the Queue ACLs for the
current user. (Rahul Kumar Singh via yhemanth)
<li> <a href="">HADOOP-5733</a>. Add map/reduce slot capacity and blacklisted capacity to
JobTracker metrics. (Sreekanth Ramakrishnan via cdouglas)
<li> <a href="">HADOOP-5738</a>. Split "waiting_tasks" JobTracker metric into waiting maps and
waiting reduces. (Sreekanth Ramakrishnan via cdouglas)
<li> <a href="">HADOOP-4842</a>. Streaming now allows specifying a command for the combiner.
(Amareshwari Sriramadasu via ddas)
<li> <a href="">HADOOP-4490</a>. Provide ability to run tasks as job owners.
(Sreekanth Ramakrishnan via yhemanth)
<li> <a href="">HADOOP-5442</a>. Paginated the job history display and added some search
capabilities. (Amar Kamat via acmurthy)
<li> <a href="">HADOOP-3327</a>. Improves handling of READ_TIMEOUT during map output copying.
(Amareshwari Sriramadasu via ddas)
<li> <a href="">HADOOP-5113</a>. Fixed logcondense to remove files for usernames
beginning with characters specified in the -l option.
(Peeyush Bishnoi via yhemanth)
<li> <a href="">HADOOP-2898</a>. Provide an option to specify a port range for
Hadoop services provisioned by HOD.
(Peeyush Bishnoi via yhemanth)
<li> <a href="">HADOOP-4930</a>. Implement a Linux native executable that can be used to
launch tasks as users. (Sreekanth Ramakrishnan via yhemanth)