<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hadoop 2.4.1 Release Notes</title>
<STYLE type="text/css">
H1 {font-family: sans-serif}
H2 {font-family: sans-serif; margin-left: 7mm}
TABLE {margin-left: 7mm}
</STYLE>
</head>
<body>
<h1>Hadoop 2.4.1 Release Notes</h1>
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
<a name="changes"/>
<h2>Changes since Hadoop 2.4.0</h2>
<ul>
<li> <a href="https://issues.apache.org/jira/browse/YARN-2081">YARN-2081</a>.
Minor bug reported by Hong Zhiguo and fixed by Hong Zhiguo (applications/distributed-shell)<br>
<b>TestDistributedShell fails after YARN-1962</b><br>
<blockquote>java.lang.AssertionError: expected:&lt;1&gt; but was:&lt;0&gt;
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198)</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-2066">YARN-2066</a>.
Minor bug reported by Ted Yu and fixed by Hong Zhiguo <br>
<b>Wrong field is referenced in GetApplicationsRequestPBImpl#mergeLocalToBuilder()</b><br>
<blockquote>{code}
if (this.finish != null) {
builder.setFinishBegin(start.getMinimumLong());
builder.setFinishEnd(start.getMaximumLong());
}
{code}
Inside the if block, finish should be referenced instead of start.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-2053">YARN-2053</a>.
Major sub-task reported by Sumit Mohanty and fixed by Wangda Tan (resourcemanager)<br>
<b>Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts</b><br>
<blockquote>Slider AppMaster restart fails with the following:
{code}
org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-2016">YARN-2016</a>.
Major bug reported by Venkat Ranganathan and fixed by Junping Du (resourcemanager)<br>
<b>Yarn getApplicationRequest start time range is not honored</b><br>
<blockquote>When we query for previous applications by creating an instance of GetApplicationsRequest and setting the start time range and application tag, the start range provided is not honored and all applications with the tag are returned.
Attaching a reproducer.
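The behavior the report expects, where the start-time range and the application tag are applied together, can be sketched as follows. This is only an illustration under stated assumptions: the App class and the inRange and matchingIds helpers are hypothetical stand-ins rather than the YARN GetApplicationsRequest API, and the range is assumed to have a begin that is not after its end.

```java
public class StartRangeFilterSketch {

    // Hypothetical stand-in for an application report; not a YARN class.
    static final class App {
        final String id;
        final String tag;
        final long startTime;

        App(String id, String tag, long startTime) {
            this.id = id;
            this.tag = tag;
            this.startTime = startTime;
        }
    }

    // True when t lies in the closed range [begin, end] (assumes begin is not after end):
    // clamping t into the range leaves it unchanged only if it was already inside.
    static boolean inRange(long t, long begin, long end) {
        return Math.max(begin, Math.min(end, t)) == t;
    }

    // Returns the comma-separated ids of apps that carry the tag AND started within the range.
    static String matchingIds(App[] apps, String tag, long begin, long end) {
        StringBuilder ids = new StringBuilder();
        for (App app : apps) {
            if (!app.tag.equals(tag) || !inRange(app.startTime, begin, end)) {
                continue;
            }
            if (ids.length() != 0) {
                ids.append(',');
            }
            ids.append(app.id);
        }
        return ids.toString();
    }

    public static void main(String[] args) {
        App[] apps = {
            new App("app_1", "etl", 100L),
            new App("app_2", "etl", 500L),
            new App("app_3", "adhoc", 150L),
        };
        // Only app_1 has the tag and a start time inside [50, 200].
        System.out.println(matchingIds(apps, "etl", 50L, 200L));
    }
}
```

In this sketch an application is dropped as soon as either condition fails, so a tag match alone is never enough to return it.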
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1986">YARN-1986</a>.
Critical bug reported by Jon Bringhurst and fixed by Hong Zhiguo <br>
<b>In Fifo Scheduler, node heartbeat in between creating app and attempt causes NPE</b><br>
<blockquote>After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
-After RM was restarted, the job runs without a problem.-
{noformat}
19:11:13,441 FATAL ResourceManager:600 - Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
at java.lang.Thread.run(Thread.java:744)
19:11:13,443 INFO ResourceManager:604 - Exiting, bbye..
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1976">YARN-1976</a>.
Major bug reported by Yesha Vora and fixed by Junping Du <br>
<b>Tracking url missing http protocol for FAILED application</b><br>
<blockquote>Running yarn application -list -appStates FAILED does not print the http scheme in the tracking URL, as it does for FINISHED apps.
{noformat}
-bash-4.1$ yarn application -list -appStates FINISHED,FAILED,KILLED
14/04/15 23:55:07 INFO client.RMProxy: Connecting to ResourceManager at host
Total number of applications (application-types: [] and states: [FINISHED, FAILED, KILLED]):4
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1397598467870_0004 Sleep job MAPREDUCE hrt_qa default FINISHED SUCCEEDED 100% http://host:19888/jobhistory/job/job_1397598467870_0004
application_1397598467870_0003 Sleep job MAPREDUCE hrt_qa default FINISHED SUCCEEDED 100% http://host:19888/jobhistory/job/job_1397598467870_0003
application_1397598467870_0002 Sleep job MAPREDUCE hrt_qa default FAILED FAILED 100% host:8088/cluster/app/application_1397598467870_0002
application_1397598467870_0001 word count MAPREDUCE hrt_qa default FINISHED SUCCEEDED 100% http://host:19888/jobhistory/job/job_1397598467870_0001
{noformat}
It prints only 'host:8088/cluster/app/application_1397598467870_0002' instead of 'http://host:8088/cluster/app/application_1397598467870_0002'. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1975">YARN-1975</a>.
Major bug reported by Nathan Roberts and fixed by Mit Desai (resourcemanager)<br>
<b>Used resources shows escaped html in CapacityScheduler and FairScheduler page</b><br>
<blockquote>Used resources displays as &amp;amp;lt;memory:1111, vCores;&amp;amp;gt; with capacity scheduler
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1962">YARN-1962</a>.
Major sub-task reported by Mohammad Kamrul Islam and fixed by Mohammad Kamrul Islam <br>
<b>Timeline server is enabled by default</b><br>
<blockquote>Since the Timeline server is not yet mature or secured, enabling it by default might create some confusion.
We were experimenting with 2.4.0 and found many connection-refused exceptions in the distributed shell example. We did not run the Timeline server because it is not secured yet.
It is possible to turn it off explicitly through the yarn-site config, but in my opinion that extra step is not worthwhile for a new service at this point.
This JIRA is to turn it off by default.
If there is agreement, I can put up a simple patch for this.
{noformat}
14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server.
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
at com.sun.jersey.api.client.Client.handle(Client.java:648)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)
at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.&lt;init&gt;(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1091)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler$1$1.getOutputStream(URLConnectionClientHandler.java:225)
at com.sun.jersey.api.client.CommittingOutputStream.commitWrite(CommittingOutputStream.java:117)
at com.sun.jersey.api.client.CommittingOutputStream.write(CommittingOutputStream.java:89)
at org.codehaus.jackson.impl.Utf8Generator._flushBuffer(Utf8Generator.java:1754)
at org.codehaus.jackson.impl.Utf8Generator.flush(Utf8Generator.java:1088)
at org.codehaus.jackson.map.ObjectMapper.writeValue(ObjectMapper.java:1354)
at org.codehaus.jackson.jaxrs.JacksonJsonProvider.writeTo(JacksonJsonProvider.java:527)
at com.sun.jersey.api.client.RequestWriter.writeRequestEntity(RequestWriter.java:300)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:204)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
... 9 more
{noformat}
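The proposed default can be pictured with a small sketch. It assumes the switch is the yarn.timeline-service.enabled key and uses java.util.Properties in place of Hadoop's Configuration class. The publish helper and its return strings are hypothetical, and whether a given client checks the flag before posting is an assumption of this sketch, not something the report states.

```java
import java.util.Properties;

public class TimelineToggleSketch {

    // Property name assumed to control the timeline server.
    static final String TIMELINE_ENABLED = "yarn.timeline-service.enabled";

    // Treats the service as disabled unless the key is explicitly set to "true",
    // mirroring the off-by-default behavior this issue proposes.
    static boolean timelineEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(TIMELINE_ENABLED, "false"));
    }

    // Hypothetical client step: skip the post instead of contacting a server that is not running.
    static String publish(Properties conf, String event) {
        if (!timelineEnabled(conf)) {
            return "skipped " + event;
        }
        return "posted " + event;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Key unset: treated as disabled.
        System.out.println(publish(conf, "attempt-start"));

        // Explicitly enabled.
        conf.setProperty(TIMELINE_ENABLED, "true");
        System.out.println(publish(conf, "attempt-start"));
    }
}
```

With the default off, a cluster that never starts the timeline server would not attempt the connection that produces the trace above.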
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1957">YARN-1957</a>.
Major sub-task reported by Carlo Curino and fixed by Carlo Curino (resourcemanager)<br>
<b>ProportionalCapacityPreemptionPolicy handling of corner cases...</b><br>
<blockquote>The current version of ProportionalCapacityPreemptionPolicy should be improved to deal with the following two scenarios:
1) when rebalancing over-capacity allocations, it potentially preempts without considering the maxCapacity constraints of a queue (i.e., possibly preempting more than strictly necessary)
2) a zero-capacity queue is preempted even if there is no demand (consistent with the old use of zero capacity to disable queues)
The proposed patch fixes both issues and introduces a few new test cases.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1947">YARN-1947</a>.
Major test reported by Jian He and fixed by Jian He <br>
<b>TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently</b><br>
<blockquote>java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertTrue(Assert.java:54)
at org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens.testRMDTMasterKeyStateOnRollingMasterKey(TestRMDelegationTokens.java:117)</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1934">YARN-1934</a>.
Blocker bug reported by Rohith and fixed by Karthik Kambatla (resourcemanager)<br>
<b>Potential NPE in ZKRMStateStore caused by handling Disconnected event from ZK.</b><br>
<blockquote>On a ZK Disconnected event, zkClient is set to null. This is very prone to causing an NPE.
{noformat}
case Disconnected:
LOG.info("ZKRMStateStore Session disconnected");
oldZkClient = zkClient;
zkClient = null;
break;
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1933">YARN-1933</a>.
Major bug reported by Jian He and fixed by Jian He <br>
<b>TestAMRestart and TestNodeHealthService failing sometimes on Windows</b><br>
<blockquote>TestNodeHealthService failures:
testNodeHealthScript(org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService) Time elapsed: 1.405 sec &lt;&lt;&lt; ERROR!
java.io.FileNotFoundException: C:\Users\Administrator\Documents\hadoop-common\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService-localDir\failingscript.cmd (The process cannot access the file because it is being used by another process)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:221)
at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:171)
at org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService.writeNodeHealthScriptFile(TestNodeHealthService.java:82)
at org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService.testNodeHealthScript(TestNodeHealthService.java:154)
testNodeHealthScriptShouldRun(org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService) Time elapsed: 0 sec &lt;&lt;&lt; ERROR!
java.io.FileNotFoundException: C:\Users\Administrator\Documents\hadoop-common\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService-localDir\failingscript.cmd (Access is denied)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:221)
at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:171)
at org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService.writeNodeHealthScriptFile(TestNodeHealthService.java:82)
at org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService.testNodeHealthScriptShouldRun(TestNodeHealthService.java:103)
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1932">YARN-1932</a>.
Blocker bug reported by Mit Desai and fixed by Mit Desai <br>
<b>Javascript injection on the job status page</b><br>
<blockquote>Scripts can be injected into the job status page because the diagnostics field is not sanitized. Whatever string is set there shows up on the jobs page as is; i.e., any script commands it contains are executed in the browser of the user who opens the page.
The diagnostics string needs to be escaped so that the scripts do not run.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1931">YARN-1931</a>.
Blocker bug reported by Thomas Graves and fixed by Sandy Ryza (applications)<br>
<b>Private API change in YARN-1824 in 2.4 broke compatibility with previous releases</b><br>
<blockquote>YARN-1824 broke compatibility with previous 2.x releases by changing the APIs in org.apache.hadoop.yarn.util.Apps.{setEnvFromInputString,addToEnvironment}. The old API should be added back in.
This affects any ApplicationMaster that was using this API. It also prevents previously built MapReduce libraries from working with the new YARN release, as MR uses this API. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1929">YARN-1929</a>.
Blocker bug reported by Rohith and fixed by Karthik Kambatla (resourcemanager)<br>
<b>DeadLock in RM when automatic failover is enabled.</b><br>
<blockquote>Deadlock detected in RM when automatic failover is enabled.
{noformat}
Found one Java-level deadlock:
=============================
"Thread-2":
waiting to lock monitor 0x00007fb514303cf0 (object 0x00000000ef153fd0, a org.apache.hadoop.ha.ActiveStandbyElector),
which is held by "main-EventThread"
"main-EventThread":
waiting to lock monitor 0x00007fb514750a48 (object 0x00000000ef154020, a org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService),
which is held by "Thread-2"
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1928">YARN-1928</a>.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>TestAMRMRPCNodeUpdates fails occasionally</b><br>
<blockquote>{code}
junit.framework.AssertionFailedError: expected:&lt;0&gt; but was:&lt;4&gt;
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.failNotEquals(Assert.java:287)
at junit.framework.Assert.assertEquals(Assert.java:67)
at junit.framework.Assert.assertEquals(Assert.java:199)
at junit.framework.Assert.assertEquals(Assert.java:205)
at org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:136)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1926">YARN-1926</a>.
Major bug reported by Varun Vasudev and fixed by Varun Vasudev <br>
<b>DistributedShell unit tests fail on Windows</b><br>
<blockquote>A couple of unit tests for the DistributedShell fail on Windows, specifically testDSShellWithShellScript and testDSRestartWithPreviousRunningContainers. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1924">YARN-1924</a>.
Critical bug reported by Arpit Gupta and fixed by Jian He <br>
<b>STATE_STORE_OP_FAILED happens when ZKRMStateStore tries to update app(attempt) before storing it</b><br>
<blockquote>Noticed on an HA cluster: both RMs shut down with this error. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1920">YARN-1920</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>TestFileSystemApplicationHistoryStore.testMissingApplicationAttemptHistoryData fails in windows</b><br>
<blockquote>Though this was only failing in Windows, after debugging, I realized that the test fails because we are leaking a file-handle in the history service.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1914">YARN-1914</a>.
Major bug reported by Varun Vasudev and fixed by Varun Vasudev <br>
<b>Test TestFSDownload.testDownloadPublicWithStatCache fails on Windows</b><br>
<blockquote>The TestFSDownload.testDownloadPublicWithStatCache test in hadoop-yarn-common consistently fails on Windows environments.
The root cause is that the test checks for execute permission for all users on every ancestor of the target directory. On Windows, by default, the group "Everyone" has no permissions on any directory in the install drive. It's unreasonable to expect this test to pass and we should skip it on Windows.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1910">YARN-1910</a>.
Major bug reported by Xuan Gong and fixed by Xuan Gong <br>
<b>TestAMRMTokens fails on windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1908">YARN-1908</a>.
Major bug reported by Tassapol Athiapinya and fixed by Vinod Kumar Vavilapalli (applications/distributed-shell)<br>
<b>Distributed shell with custom script has permission error.</b><br>
<blockquote>Create test1.sh having "pwd".
Run this command as user1:
hadoop jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar -shell_script test1.sh
The NM runs as the yarn user. An exception is thrown because the yarn user has no permissions on the custom script in the HDFS path. The custom script is created by the distributed shell app.
{code}
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=yarn, access=WRITE, inode="/user/user1/DistributedShell/70":user1:user1:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1907">YARN-1907</a>.
Major bug reported by Mit Desai and fixed by Mit Desai <br>
<b>TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails</b><br>
<blockquote>The test has 10000 containers that it tries to clean up.
The cleanup has a timeout of 20000ms, within which the test sometimes cannot complete the cleanup, resulting in an assertion failure.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1905">YARN-1905</a>.
Trivial test reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager)<br>
<b>TestProcfsBasedProcessTree must only run on Linux.</b><br>
<blockquote>The tests in {{TestProcfsBasedProcessTree}} only make sense on Linux, where the process tree calculations are based on reading the /proc file system. Right now, not all of the individual tests are skipped when the OS is not Linux. This patch will make it consistent.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1903">YARN-1903</a>.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Killing Container on NEW and LOCALIZING will result in exitCode and diagnostics not set</b><br>
<blockquote>The container status after stopping a container is not as expected.
{code}
java.lang.AssertionError: 4:
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:382)
at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:346)
at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1898">YARN-1898</a>.
Major sub-task reported by Yesha Vora and fixed by Xuan Gong (resourcemanager)<br>
<b>Standby RM's conf, stacks, logLevel, metrics, jmx and logs links are redirecting to Active RM</b><br>
<blockquote>The Standby RM links /conf, /stacks, /logLevel, /metrics, and /jmx are redirected to the Active RM.
They should not be redirected to the Active RM.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1892">YARN-1892</a>.
Minor improvement reported by Siddharth Seth and fixed by Jian He (scheduler)<br>
<b>Excessive logging in RM</b><br>
<blockquote>Mostly in the CS I believe
{code}
INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Application application_1395435468498_0011 reserved container container_1395435468498_0011_01_000213 on node host: #containers=5 available=4096 used=20960, currently has 1 at priority 4; currentReservation 4096
{code}
{code}
INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: hive2 usedResources: &lt;memory:20480, vCores:5&gt; clusterResources: &lt;memory:81920, vCores:16&gt; currentCapacity 0.25 required &lt;memory:4096, vCores:1&gt; potentialNewCapacity: 0.255 ( max-capacity: 0.25)
{code}
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1883">YARN-1883</a>.
Major bug reported by Mit Desai and fixed by Mit Desai <br>
<b>TestRMAdminService fails due to inconsistent entries in UserGroups</b><br>
<blockquote>testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider fails with the following error:
{noformat}
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertTrue(Assert.java:54)
at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:421)
at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testOrder(TestRMAdminService.java:104)
{noformat}
Line numbers will be inconsistent because I was running the tests in a particular order, but the line on which the failure occurs is
{code}
Assert.assertTrue(groupBefore.contains("test_group_A")
&amp;&amp; groupBefore.contains("test_group_B")
&amp;&amp; groupBefore.contains("test_group_C") &amp;&amp; groupBefore.size() == 3);
{code}
testRMInitialsWithFileSystemBasedConfigurationProvider() and
testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider()
both call {{MockUnixGroupsMapping.updateGroups();}}, which changes the list of userGroups.
testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() tries to verify the groups before changing them and fails if testRMInitialsWithFileSystemBasedConfigurationProvider() has already run and made the changes.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1861">YARN-1861</a>.
Blocker sub-task reported by Arpit Gupta and fixed by Karthik Kambatla (resourcemanager)<br>
<b>Both RM stuck in standby mode when automatic failover is enabled</b><br>
<blockquote>In our HA tests we noticed that the tests got stuck because both RMs went into standby state and neither became active.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1837">YARN-1837</a>.
Major bug reported by Tsuyoshi OZAWA and fixed by Hong Zhiguo <br>
<b>TestMoveApplication.testMoveRejectedByScheduler randomly fails</b><br>
<blockquote>TestMoveApplication#testMoveRejectedByScheduler fails because of a NullPointerException. It appears to be caused by an unhandled exception on the server side.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1750">YARN-1750</a>.
Major test reported by Ming Ma and fixed by Wangda Tan (nodemanager)<br>
<b>TestNodeStatusUpdater#testNMRegistration is incorrect in test case</b><br>
<blockquote>This test case passes. However, the test output log has
java.lang.AssertionError: Number of applications should only be one! expected:&lt;1&gt; but was:&lt;2&gt;
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker.nodeHeartbeat(TestNodeStatusUpdater.java:267)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:469)
at java.lang.Thread.run(Thread.java:695)
TestNodeStatusUpdater.java has invalid asserts.
} else if (heartBeatID == 3) {
// Checks on the RM end
Assert.assertEquals("Number of applications should only be one!", 1,
appToContainers.size());
Assert.assertEquals("Number of container for the app should be two!",
2, appToContainers.get(appId2).size());
We should fix the assert and add more checks to the test.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1701">YARN-1701</a>.
Major sub-task reported by Gera Shegalov and fixed by Tsuyoshi OZAWA <br>
<b>Improve default paths of timeline store and generic history store</b><br>
<blockquote>When I enable AHS via yarn.ahs.enabled, the app history is still not visible in the AHS web UI. This is due to NullApplicationHistoryStore being set as yarn.resourcemanager.history-writer.class. It would be good to have just one key to enable basic functionality.
yarn.ahs.fs-history-store.uri uses {code}${hadoop.log.dir}{code}, which is a local file system location. However, FileSystemApplicationHistoryStore uses DFS by default. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1696">YARN-1696</a>.
Blocker sub-task reported by Karthik Kambatla and fixed by Tsuyoshi OZAWA (resourcemanager)<br>
<b>Document RM HA</b><br>
<blockquote>Add documentation for RM HA. Marking this a blocker for 2.4 as this is required to call RM HA Stable and ready for public consumption. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1281">YARN-1281</a>.
Major test reported by Karthik Kambatla and fixed by Tsuyoshi OZAWA (resourcemanager)<br>
<b>TestZKRMStateStoreZKClientConnections fails intermittently</b><br>
<blockquote>The test fails intermittently - haven't been able to reproduce the failure deterministically. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1201">YARN-1201</a>.
Minor bug reported by Nemon Lou and fixed by Wangda Tan (resourcemanager)<br>
<b>TestAMAuthorization fails with local hostname cannot be resolved</b><br>
<blockquote>When hostname is 158-1-131-10, TestAMAuthorization fails.
{code}
Running org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.034 sec &lt;&lt;&lt; FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) Time elapsed: 3.952 sec &lt;&lt;&lt; ERROR!
java.lang.NullPointerException: null
at org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) Time elapsed: 3.116 sec &lt;&lt;&lt; ERROR!
java.lang.NullPointerException: null
at org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284)
Results :
Tests in error:
TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
Tests run: 4, Failures: 0, Errors: 2, Skipped: 0
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5843">MAPREDUCE-5843</a>.
Major test reported by Varun Vasudev and fixed by Varun Vasudev <br>
<b>TestMRKeyValueTextInputFormat failing on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5841">MAPREDUCE-5841</a>.
Major bug reported by Sangjin Lee and fixed by Sangjin Lee (mrv2)<br>
<b>uber job doesn't terminate on getting mapred job kill</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5835">MAPREDUCE-5835</a>.
Critical bug reported by Ming Ma and fixed by Ming Ma <br>
<b>Killing Task might cause the job to go to ERROR state</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5833">MAPREDUCE-5833</a>.
Major test reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>TestRMContainerAllocator fails occasionally</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5832">MAPREDUCE-5832</a>.
Major bug reported by Jian He and fixed by Vinod Kumar Vavilapalli <br>
<b>Few tests in TestJobClient fail on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5830">MAPREDUCE-5830</a>.
Blocker bug reported by Jason Lowe and fixed by Akira AJISAKA <br>
<b>HostUtil.getTaskLogUrl is not backwards binary compatible with 2.3</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5828">MAPREDUCE-5828</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>TestMapReduceJobControl fails on JDK 7 + Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5827">MAPREDUCE-5827</a>.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>TestSpeculativeExecutionWithMRApp fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5826">MAPREDUCE-5826</a>.
Major bug reported by Varun Vasudev and fixed by Varun Vasudev <br>
<b>TestHistoryServerFileSystemStateStoreService.testTokenStore fails in windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5824">MAPREDUCE-5824</a>.
Major bug reported by Xuan Gong and fixed by Xuan Gong <br>
<b>TestPipesNonJavaInputFormat.testFormat fails in windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5821">MAPREDUCE-5821</a>.
Major bug reported by Todd Lipcon and fixed by Todd Lipcon (performance , task)<br>
<b>IFile merge allocates new byte array for every value</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5818">MAPREDUCE-5818</a>.
Major bug reported by Jian He and fixed by Jian He <br>
<b>hsadmin cmd is missing in mapred.cmd</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5815">MAPREDUCE-5815</a>.
Blocker bug reported by Gera Shegalov and fixed by Akira AJISAKA (client , mrv2)<br>
<b>Fix NPE in TestMRAppMaster</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5714">MAPREDUCE-5714</a>.
Major bug reported by Jinghui Wang and fixed by Jinghui Wang (test)<br>
<b>TestMRAppComponentDependencies causes surefire to exit without saying proper goodbye</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3191">MAPREDUCE-3191</a>.
Trivial bug reported by Todd Lipcon and fixed by Chen He <br>
<b>docs for map output compression incorrectly reference SequenceFile</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6527">HDFS-6527</a>.
Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee <br>
<b>Edit log corruption due to deferred INode removal</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6411">HDFS-6411</a>.
Major bug reported by Zhongyi Xie and fixed by Brandon Li (nfs)<br>
<b>nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user attempts to access it</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6402">HDFS-6402</a>.
Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode)<br>
<b>Suppress findbugs warning for failure to override equals and hashCode in FsAclPermission.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6397">HDFS-6397</a>.
Critical bug reported by Mohammad Kamrul Islam and fixed by Mohammad Kamrul Islam <br>
<b>NN shows inconsistent value in deadnode count </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6362">HDFS-6362</a>.
Blocker bug reported by Arpit Agarwal and fixed by Arpit Agarwal (namenode)<br>
<b>InvalidateBlocks is inconsistent in usage of DatanodeUuid and StorageID</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6361">HDFS-6361</a>.
Major bug reported by Yongjun Zhang and fixed by Yongjun Zhang (nfs)<br>
<b>TestIdUserGroup.testUserUpdateSetting failed due to out of range nfsnobody Id</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6340">HDFS-6340</a>.
Blocker bug reported by Rahul Singhal and fixed by Rahul Singhal (datanode)<br>
<b>DN can't finalize upgrade</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6329">HDFS-6329</a>.
Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee <br>
<b>WebHdfs does not work if HA is enabled on NN but logical URI is not configured.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6326">HDFS-6326</a>.
Blocker bug reported by Daryn Sharp and fixed by Chris Nauroth (webhdfs)<br>
<b>WebHdfs ACL compatibility is broken</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6325">HDFS-6325</a>.
Major bug reported by Konstantin Shvachko and fixed by Keith Pak (namenode)<br>
<b>Append should fail if the last block has insufficient number of replicas</b><br>
<blockquote>I have committed the fix to the trunk, branch-2, and branch-2.4 respectively. Thanks Keith!</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6313">HDFS-6313</a>.
Blocker bug reported by Daryn Sharp and fixed by Kihwal Lee (webhdfs)<br>
<b>WebHdfs may use the wrong NN when configured for multiple HA NNs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6245">HDFS-6245</a>.
Major bug reported by Arpit Gupta and fixed by Arpit Agarwal <br>
<b>datanode fails to start with a bad disk even when failed volumes is set</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6236">HDFS-6236</a>.
Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode)<br>
<b>ImageServlet should use Time#monotonicNow to measure latency.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6235">HDFS-6235</a>.
Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode , test)<br>
<b>TestFileJournalManager can fail on Windows due to file locking if tests run out of order.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6234">HDFS-6234</a>.
Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (datanode , test)<br>
<b>TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6232">HDFS-6232</a>.
Major bug reported by Stephen Chu and fixed by Akira AJISAKA (tools)<br>
<b>OfflineEditsViewer throws a NPE on edits containing ACL modifications</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6231">HDFS-6231</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (hdfs-client)<br>
<b>DFSClient hangs infinitely if using hedged reads and all eligible datanodes die.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6229">HDFS-6229</a>.
Major bug reported by Jing Zhao and fixed by Jing Zhao (ha)<br>
<b>Race condition in failover can cause RetryCache to fail</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6215">HDFS-6215</a>.
Minor bug reported by Kihwal Lee and fixed by Kihwal Lee <br>
<b>Wrong error message for upgrade</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6209">HDFS-6209</a>.
Minor bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)<br>
<b>Fix flaky test TestValidateConfigurationSettings.testThatDifferentRPCandHttpPortsAreOK</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6208">HDFS-6208</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (datanode)<br>
<b>DataNode caching can leak file descriptors.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6206">HDFS-6206</a>.
Major bug reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze <br>
<b>DFSUtil.substituteForWildcardAddress may throw NPE</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6204">HDFS-6204</a>.
Minor bug reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (test)<br>
<b>TestRBWBlockInvalidation may fail</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6198">HDFS-6198</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (datanode)<br>
<b>DataNode rolling upgrade does not correctly identify current block pool directory and replace with trash on Windows.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6197">HDFS-6197</a>.
Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode)<br>
<b>Rolling upgrade rollback on Windows can fail attempting to rename edit log segment files to a destination that already exists.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6189">HDFS-6189</a>.
Major test reported by Chris Nauroth and fixed by Chris Nauroth (test)<br>
<b>Multiple HDFS tests fail on Windows attempting to use a test root path containing a colon.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4052">HDFS-4052</a>.
Minor improvement reported by Jing Zhao and fixed by Jing Zhao <br>
<b>BlockManager#invalidateWork should print logs outside the lock</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-2882">HDFS-2882</a>.
Major bug reported by Todd Lipcon and fixed by Vinayakumar B (datanode)<br>
<b>DN continues to start up, even if block pool fails to initialize</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10612">HADOOP-10612</a>.
Major bug reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>NFS failed to refresh the user group id mapping table</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10562">HADOOP-10562</a>.
Critical bug reported by Suresh Srinivas and fixed by Suresh Srinivas <br>
<b>Namenode exits on exception without printing stack trace in AbstractDelegationTokenSecretManager</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10527">HADOOP-10527</a>.
Major bug reported by Kihwal Lee and fixed by Kihwal Lee <br>
<b>Fix incorrect return code and allow more retries on EINTR</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10522">HADOOP-10522</a>.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee <br>
<b>JniBasedUnixGroupMapping mishandles errors</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10490">HADOOP-10490</a>.
Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (test)<br>
<b>TestMapFile and TestBloomMapFile leak file descriptors.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10473">HADOOP-10473</a>.
Minor bug reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (test)<br>
<b>TestCallQueueManager is still flaky</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10466">HADOOP-10466</a>.
Minor improvement reported by Nicolas Liochon and fixed by Nicolas Liochon (security)<br>
<b>Lower the log level in UserGroupInformation</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10456">HADOOP-10456</a>.
Major bug reported by Nishkam Ravi and fixed by Nishkam Ravi (conf)<br>
<b>Bug in Configuration.java exposed by Spark (ConcurrentModificationException)</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10455">HADOOP-10455</a>.
Major bug reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (ipc)<br>
<b>When there is an exception, ipc.Server should first check whether it is a terse exception</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8826">HADOOP-8826</a>.
Minor bug reported by Robert Joseph Evans and fixed by Mit Desai <br>
<b>Docs still refer to 0.20.205 as stable line</b><br>
<blockquote></blockquote></li>
</ul>
</body></html>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hadoop 2.4.0 Release Notes</title>
<STYLE type="text/css">
H1 {font-family: sans-serif}
H2 {font-family: sans-serif; margin-left: 7mm}
TABLE {margin-left: 7mm}
</STYLE>
</head>
<body>
<h1>Hadoop 2.4.0 Release Notes</h1>
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
<a name="changes"/>
<h2>Changes since Hadoop 2.3.0</h2>
<ul>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1893">YARN-1893</a>.
Blocker sub-task reported by Xuan Gong and fixed by Xuan Gong (resourcemanager)<br>
<b>Make ApplicationMasterProtocol#allocate AtMostOnce</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1891">YARN-1891</a>.
Minor task reported by Varun Vasudev and fixed by Varun Vasudev <br>
<b>Document NodeManager health-monitoring</b><br>
<blockquote>Start documenting the NodeManager, beginning with health monitoring.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1873">YARN-1873</a>.
Major bug reported by Mit Desai and fixed by Mit Desai <br>
<b>TestDistributedShell#testDSShell fails when the test cases are out of order</b><br>
<blockquote>testDSShell fails when the tests are run in random order. I see a cleanup issue here.
{noformat}
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec &lt;&lt;&lt; FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 44.127 sec &lt;&lt;&lt; FAILURE!
java.lang.AssertionError: expected:&lt;1&gt; but was:&lt;6&gt;
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204)
at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134)
Results :
Failed tests:
TestDistributedShell.testOrder:134-&gt;testDSShell:204 expected:&lt;1&gt; but was:&lt;6&gt;
{noformat}
The line numbers deviate slightly because I was trying to reproduce the error by running the tests in a specific order. The line that causes the assertion failure is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1867">YARN-1867</a>.
Blocker bug reported by Karthik Kambatla and fixed by Vinod Kumar Vavilapalli (resourcemanager)<br>
<b>NPE while fetching apps via the REST API</b><br>
<blockquote>We ran into the following NPE when fetching applications using the REST API:
{noformat}
INTERNAL_SERVER_ERROR
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:123)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:418)
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1866">YARN-1866</a>.
Blocker bug reported by Arpit Gupta and fixed by Jian He <br>
<b>YARN RM fails to load state store with delegation token parsing error</b><br>
<blockquote>In our secure nightlies, we saw exceptions in the RM log where it failed to parse the delegation token.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1863">YARN-1863</a>.
Blocker test reported by Ted Yu and fixed by Xuan Gong <br>
<b>TestRMFailover fails with 'AssertionError: null' </b><br>
<blockquote>This happened in Hadoop-Yarn-trunk - Build # 516 and can be reproduced:
{code}
testWebAppProxyInStandAloneMode(org.apache.hadoop.yarn.client.TestRMFailover) Time elapsed: 5.834 sec &lt;&lt;&lt; FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertTrue(Assert.java:54)
at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
at org.apache.hadoop.yarn.client.TestRMFailover.testWebAppProxyInStandAloneMode(TestRMFailover.java:216)
testEmbeddedWebAppProxy(org.apache.hadoop.yarn.client.TestRMFailover) Time elapsed: 5.341 sec &lt;&lt;&lt; FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertTrue(Assert.java:54)
at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250)
at org.apache.hadoop.yarn.client.TestRMFailover.testEmbeddedWebAppProxy(TestRMFailover.java:241)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1859">YARN-1859</a>.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>WebAppProxyServlet will throw ApplicationNotFoundException if the app is no longer cached in RM</b><br>
<blockquote>WebAppProxyServlet checks for null to determine whether the application was found.
{code}
ApplicationReport applicationReport = getApplicationReport(id);
if(applicationReport == null) {
LOG.warn(req.getRemoteUser()+" Attempting to access "+id+
" that was not found");
{code}
However, WebAppProxyServlet calls AppReportFetcher, which in turn calls ClientRMService. When the application is not found, ClientRMService throws ApplicationNotFoundException. Therefore, in WebAppProxyServlet, the following logic to create the tracking URL for a non-cached app is no longer used.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1855">YARN-1855</a>.
Critical test reported by Ted Yu and fixed by Zhijie Shen <br>
<b>TestRMFailover#testRMWebAppRedirect fails in trunk</b><br>
<blockquote>From https://builds.apache.org/job/Hadoop-Yarn-trunk/514/console :
{code}
testRMWebAppRedirect(org.apache.hadoop.yarn.client.TestRMFailover) Time elapsed: 5.39 sec &lt;&lt;&lt; ERROR!
java.lang.NullPointerException: null
at org.apache.hadoop.yarn.client.TestRMFailover.testRMWebAppRedirect(TestRMFailover.java:269)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1854">YARN-1854</a>.
Blocker test reported by Mit Desai and fixed by Rohith <br>
<b>Race condition in TestRMHA#testStartAndTransitions</b><br>
<blockquote>There is a race in the test.
TestRMHA#testStartAndTransitions calls verifyClusterMetrics() immediately after the application is submitted, but QueueMetrics are updated only after the app attempt is scheduled. Calling verifyClusterMetrics() without first verifying that the app attempt is in the Scheduled state causes random test failures.
MockRM.submitApp() returns when the application is in ACCEPTED, but QueueMetrics are updated at the APP_ATTEMPT_ADDED event. There is a high chance of reading the queue metrics before the app attempt is Scheduled.
{noformat}
testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) Time elapsed: 5.883 sec &lt;&lt;&lt; FAILURE!
java.lang.AssertionError: Incorrect value for metric availableMB expected:&lt;2048&gt; but was:&lt;4096&gt;
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)
Results :
Failed tests:
TestRMHA.testStartAndTransitions:160-&gt;verifyClusterMetrics:387-&gt;assertMetric:396 Incorrect value for metric availableMB expected:&lt;2048&gt; but was:&lt;4096&gt;
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1852">YARN-1852</a>.
Major bug reported by Rohith and fixed by Rohith (resourcemanager)<br>
<b>Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs</b><br>
<blockquote>Recovering a failed or killed application throws InvalidStateTransitonException.
These exceptions are logged during application recovery.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1850">YARN-1850</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Make enabling timeline service configurable </b><br>
<blockquote>Like the generic history service, enabling the timeline service should be configurable, in case the timeline server is not up.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1849">YARN-1849</a>.
Blocker bug reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>NPE in ResourceTrackerService#registerNodeManager for UAM</b><br>
<blockquote>While running an UnmanagedAM on a secure cluster, ran into an NPE on failover/restart. This is similar to YARN-1821. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1846">YARN-1846</a>.
Major bug reported by Robert Kanter and fixed by Robert Kanter <br>
<b>TestRM#testNMTokenSentForNormalContainer assumes CapacityScheduler</b><br>
<blockquote>TestRM.testNMTokenSentForNormalContainer assumes the CapacityScheduler is being used and tries to do:
{code:java}
CapacityScheduler cs = (CapacityScheduler) rm.getResourceScheduler();
{code}
This throws a {{ClassCastException}} if you're not using the CapacityScheduler.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1839">YARN-1839</a>.
Critical bug reported by Tassapol Athiapinya and fixed by Jian He (applications , capacityscheduler)<br>
<b>Capacity scheduler preempts an AM out. AM attempt 2 fails to launch task container with SecretManager$InvalidToken: No NMToken sent</b><br>
<blockquote>Use a single-node cluster. Turn on capacity scheduler preemption. Run an MR sleep job as app 1. Take the entire cluster. Run an MR sleep job as app 2. Preempt app 1 out. Wait until app 2 finishes. App 1 AM attempt 2 will start. It won't be able to launch a task container, and the AM logs show this error stack trace:
{code}
2014-03-13 20:13:50,254 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1394741557066_0001_m_000000_1009: Container launch failed for container_1394741557066_0001_02_000021 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for &lt;host&gt;:45454
at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206)
at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.&lt;init&gt;(ContainerManagementProtocolProxy.java:196)
at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
{code}
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1838">YARN-1838</a>.
Major sub-task reported by Srimanth Gunturi and fixed by Billie Rinaldi <br>
<b>Timeline service getEntities API should provide ability to get entities from given id</b><br>
<blockquote>To support pagination, we need the ability to get entities starting from a certain ID via a new parameter called {{fromid}}.
For example, on a page of 10 jobs, our first call will look like
[http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&amp;limit=11]
When the user hits next, we would like to call
[http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&amp;fromid=JID11&amp;limit=11]
and continue likewise for further _Next_ clicks.
On hitting back, we will make similar calls for the previous items
[http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&amp;fromid=JID1&amp;limit=11]
{{fromid}} should be inclusive of the id given.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1833">YARN-1833</a>.
Major bug reported by Mit Desai and fixed by Mit Desai <br>
<b>TestRMAdminService Fails in trunk and branch-2 : Assert Fails due to different count of UserGroups for currentUser()</b><br>
<blockquote>In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed.
{code}
Assert.assertTrue(groupWithInit.size() != groupBefore.size());
{code}
Because the assert uses the default groups for groupWithInit (which in my case are users, sshusers and wheel), it fails when groupWithInit and groupBefore have the same size.
I do not think we need this assert here. Moreover, we already check that groupWithInit does not contain the user groups that are in groupBefore, so removing the assert is unlikely to be harmful.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1830">YARN-1830</a>.
Major bug reported by Karthik Kambatla and fixed by Zhijie Shen (resourcemanager)<br>
<b>TestRMRestart.testQueueMetricsOnRMRestart failure</b><br>
<blockquote>TestRMRestart.testQueueMetricsOnRMRestart fails intermittently as follows (reported on YARN-1815):
{noformat}
java.lang.AssertionError: expected:&lt;37&gt; but was:&lt;38&gt;
...
at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.assertQueueMetrics(TestRMRestart.java:1728)
at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1682)
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1824">YARN-1824</a>.
Major bug reported by Jian He and fixed by Jian He <br>
<b>Make Windows client work with Linux/Unix cluster</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1821">YARN-1821</a>.
Blocker sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>NPE on registerNodeManager if the request has containers for UnmanagedAMs</b><br>
<blockquote>On RM restart (or failover), the NM re-registers with the RM. If it was running containers for Unmanaged AMs, it runs into the following NPE:
{noformat}
Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.registerNodeManager(ResourceTrackerService.java:213)
at org.apache.hadoop.yarn.server.api.impl.pb.service.ResourceTrackerPBServiceImpl.registerNodeManager(ResourceTrackerPBServiceImpl.java:54)
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1816">YARN-1816</a>.
Major sub-task reported by Arpit Gupta and fixed by Jian He <br>
<b>Succeeded application remains in accepted after RM restart</b><br>
<blockquote>{code}
2014-03-10 18:07:31,944|beaver.machine|INFO|Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
2014-03-10 18:07:31,945|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCE hrt_qa default ACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008
2014-03-10 18:08:02,125|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
2014-03-10 18:08:03,198|beaver.machine|INFO|14/03/10 18:08:03 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2014-03-10 18:08:03,238|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1
2014-03-10 18:08:03,239|beaver.machine|INFO|Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
2014-03-10 18:08:03,239|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCE hrt_qa default ACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008
2014-03-10 18:08:33,390|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
2014-03-10 18:08:34,437|beaver.machine|INFO|14/03/10 18:08:34 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2014-03-10 18:08:34,477|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1
2014-03-10 18:08:34,477|beaver.machine|INFO|Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
2014-03-10 18:08:34,478|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCE hrt_qa default ACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008
2014-03-10 18:09:04,628|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
2014-03-10 18:09:05,688|beaver.machine|INFO|14/03/10 18:09:05 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2014-03-10 18:09:05,728|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1
2014-03-10 18:09:05,728|beaver.machine|INFO|Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
2014-03-10 18:09:05,729|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCE hrt_qa default ACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008
2014-03-10 18:09:35,879|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
2014-03-10 18:09:36,951|beaver.machine|INFO|14/03/10 18:09:36 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2014-03-10 18:09:36,992|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1
2014-03-10 18:09:36,993|beaver.machine|INFO|Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
2014-03-10 18:09:36,993|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCE hrt_qa default ACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008
2014-03-10 18:10:07,142|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
2014-03-10 18:10:08,201|beaver.machine|INFO|14/03/10 18:10:08 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2014-03-10 18:10:08,242|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1
2014-03-10 18:10:08,242|beaver.machine|INFO|Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
2014-03-10 18:10:08,242|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCE hrt_qa default ACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008
2014-03-10 18:10:38,392|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
2014-03-10 18:10:39,443|beaver.machine|INFO|14/03/10 18:10:39 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2014-03-10 18:10:39,484|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1
2014-03-10 18:10:39,484|beaver.machine|INFO|Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
2014-03-10 18:10:39,485|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCE hrt_qa default ACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1812">YARN-1812</a>.
Major sub-task reported by Yesha Vora and fixed by Jian He <br>
<b>Job stays in PREP state for long time after RM Restarts</b><br>
<blockquote>Steps followed:
1) Start a sort job with 80 maps and 5 reducers.
2) Restart the ResourceManager when 60 maps and 0 reducers have finished.
3) Wait for the job to come out of the PREP state.
The job does not come out of the PREP state after 7-8 minutes.
After waiting for 7-8 minutes, the test kills the job.
However, a sort job should not take this long to come out of the PREP state.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1811">YARN-1811</a>.
Major sub-task reported by Robert Kanter and fixed by Robert Kanter (resourcemanager)<br>
<b>RM HA: AM link broken if the AM is on nodes other than RM</b><br>
<blockquote>When using RM HA, if you click on the "Application Master" link in the RM web UI while the job is running, you get an Error 500:
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1800">YARN-1800</a>.
Critical sub-task reported by Paul Isaychuk and fixed by Varun Vasudev (nodemanager)<br>
<b>YARN NodeManager with java.util.concurrent.RejectedExecutionException</b><br>
<blockquote>Noticed this in tests running on an Apache Hadoop 2.2 cluster:
{code}
2014-01-23 01:30:28,575 INFO localizer.LocalizedResource (LocalizedResource.java:handle(196)) - Resource hdfs://colo-2:8020/user/fertrist/oozie-oozi/0000605-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar transitioned from INIT to DOWNLOADING
2014-01-23 01:30:28,575 INFO localizer.LocalizedResource (LocalizedResource.java:handle(196)) - Resource hdfs://colo-2:8020/user/fertrist/.staging/job_1389742077466_0396/job.splitmetainfo transitioned from INIT to DOWNLOADING
2014-01-23 01:30:28,575 INFO localizer.LocalizedResource (LocalizedResource.java:handle(196)) - Resource hdfs://colo-2:8020/user/fertrist/.staging/job_1389742077466_0396/job.split transitioned from INIT to DOWNLOADING
2014-01-23 01:30:28,575 INFO localizer.LocalizedResource (LocalizedResource.java:handle(196)) - Resource hdfs://colo-2:8020/user/fertrist/.staging/job_1389742077466_0396/job.xml transitioned from INIT to DOWNLOADING
2014-01-23 01:30:28,576 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:addResource(651)) - Downloading public rsrc:{ hdfs://colo-2:8020/user/fertrist/oozie-oozi/0000605-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar, 1390440627435, FILE, null }
2014-01-23 01:30:28,576 FATAL event.AsyncDispatcher (AsyncDispatcher.java:dispatch(141)) - Error in dispatcher thread
java.util.concurrent.RejectedExecutionException
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:152)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:678)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:583)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:525)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
at java.lang.Thread.run(Thread.java:662)
2014-01-23 01:30:28,577 INFO event.AsyncDispatcher (AsyncDispatcher.java:dispatch(144)) - Exiting, bbye..
2014-01-23 01:30:28,596 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@0.0.0.0:50060
2014-01-23 01:30:28,597 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(328)) - Applications still running : [application_1389742077466_0396]
2014-01-23 01:30:28,597 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(336)) - Wa
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1793">YARN-1793</a>.
Critical bug reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>yarn application -kill doesn't kill UnmanagedAMs</b><br>
<blockquote>Trying to kill an Unmanaged AM through the CLI (yarn application -kill &lt;id&gt;) logs a success, but doesn't actually kill the AM or reclaim the containers allocated to it.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1789">YARN-1789</a>.
Minor improvement reported by Akira AJISAKA and fixed by Tsuyoshi OZAWA (resourcemanager)<br>
<b>ApplicationSummary does not escape newlines in the app name</b><br>
<blockquote>YARN-side of MAPREDUCE-5778.
ApplicationSummary is not escaping newlines in the app name. This can result in an application summary log entry that spans multiple lines when users are expecting one-app-per-line output.</blockquote></li>
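<li> A minimal sketch of the kind of escaping YARN-1789 describes, assuming the goal is simply one summary entry per line. The class and method names (AppNameEscaper, escape) are hypothetical and are not taken from the ResourceManager source; the actual fix may escape additional characters.<br>

```java
// Hypothetical helper for illustration only; not the ApplicationSummary code.
final class AppNameEscaper {
    private AppNameEscaper() {}

    /**
     * Replaces carriage returns and line feeds with the visible two-character
     * sequences \r and \n, so a logged app name cannot span multiple lines.
     * A null name is rendered as an empty string.
     */
    static String escape(String appName) {
        if (appName == null) {
            return "";
        }
        return appName.replace("\r", "\\r").replace("\n", "\\n");
    }
}
```

This sketch does not escape existing backslashes, so the escaped form cannot be reversed unambiguously; that is acceptable if the only requirement is one-app-per-line output.</li>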
<li> <a href="https://issues.apache.org/jira/browse/YARN-1788">YARN-1788</a>.
Critical bug reported by Tassapol Athiapinya and fixed by Varun Vasudev (resourcemanager)<br>
<b>AppsCompleted/AppsKilled metric is incorrect when MR job is killed with yarn application -kill</b><br>
<blockquote>Run MR sleep job. Kill the application in RUNNING state. Observe RM metrics.
Expecting AppsCompleted = 0/AppsKilled = 1
Actual is AppsCompleted = 1/AppsKilled = 0</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1787">YARN-1787</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>yarn applicationattempt/container print wrong usage information</b><br>
<blockquote>yarn applicationattempt prints:
{code}
Invalid Command Usage :
usage: application
-appStates &lt;States&gt; Works with -list to filter applications
based on input comma-separated list of
application states. The valid application
state can be one of the following:
ALL,NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUN
NING,FINISHED,FAILED,KILLED
-appTypes &lt;Types&gt; Works with -list to filter applications
based on input comma-separated list of
application types.
-help Displays help for all commands.
-kill &lt;Application ID&gt; Kills the application.
-list &lt;arg&gt; List application attempts for aplication
from AHS.
-movetoqueue &lt;Application ID&gt; Moves the application to a different
queue.
-queue &lt;Queue Name&gt; Works with the movetoqueue command to
specify which queue to move an
application to.
-status &lt;Application ID&gt; Prints the status of the application.
{code}
yarn container prints:
{code}
Invalid Command Usage :
usage: application
-appStates &lt;States&gt; Works with -list to filter applications
based on input comma-separated list of
application states. The valid application
state can be one of the following:
ALL,NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUN
NING,FINISHED,FAILED,KILLED
-appTypes &lt;Types&gt; Works with -list to filter applications
based on input comma-separated list of
application types.
-help Displays help for all commands.
-kill &lt;Application ID&gt; Kills the application.
-list &lt;arg&gt; List application attempts for aplication
from AHS.
-movetoqueue &lt;Application ID&gt; Moves the application to a different
queue.
-queue &lt;Queue Name&gt; Works with the movetoqueue command to
specify which queue to move an
application to.
-status &lt;Application ID&gt; Prints the status of the application.
{code}
Both commands print irrelevant yarn application usage information.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1785">YARN-1785</a>.
Major bug reported by bc Wong and fixed by bc Wong <br>
<b>FairScheduler treats app lookup failures as ERRORs</b><br>
<blockquote>When invoking the /ws/v1/cluster/apps endpoint, the RM eventually reaches RMAppImpl#createAndGetApplicationReport, which calls RMAppAttemptImpl#getApplicationResourceUsageReport, which looks up the app in the scheduler. The app may or may not exist there, so FairScheduler shouldn't log an error for every lookup failure:
{noformat}
2014-02-17 08:23:21,240 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1392419715319_0135_000001
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1783">YARN-1783</a>.
Critical bug reported by Arpit Gupta and fixed by Jian He <br>
<b>yarn application does not make any progress even when no other application is running when RM is being restarted in the background</b><br>
<blockquote>Noticed that during HA tests, some tests took over 3 hours to run when the test failed.
Looking at the logs, I see the application made no progress for a very long time. However, the application log from YARN shows that it actually ran in 5 minutes.
I am seeing the same behavior when the RM was being restarted in the background and when both the RM and AM were being restarted. This does not happen for all applications, but a few hit it in the nightly run.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1781">YARN-1781</a>.
Major sub-task reported by Varun Vasudev and fixed by Varun Vasudev (nodemanager)<br>
<b>NM should allow users to specify max disk utilization for local disks</b><br>
<blockquote>This is related to YARN-257 (possibly a sub-task of it). Currently, the NM does not detect full disks and allows them to be used by containers, leading to repeated failures. YARN-257 deals with graceful handling of full disks. This ticket is only about detection of full disks by the disk health checkers.
The NM should allow users to set a maximum disk utilization for local disks and mark disks as bad once they exceed that utilization. At the very least, the NM should detect full disks.
</blockquote></li>
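<li> The utilization check requested in YARN-1781 can be sketched as follows. This is an illustration under stated assumptions, not the NodeManager's disk health checker: the class and method names (DiskUtilizationCheck, usedPercent, isOverLimit) and the percentage-based threshold are hypothetical, and the sketch relies only on the standard java.io.File space queries.<br>

```java
import java.io.File;

// Hypothetical sketch; names and threshold semantics are assumptions.
final class DiskUtilizationCheck {
    private DiskUtilizationCheck() {}

    /**
     * Returns the used share of a partition as a percentage (0 to 100).
     * A reported total of 0 bytes is treated as fully utilized, because
     * File.getTotalSpace() returns 0 when the path does not name a partition.
     */
    static float usedPercent(long totalBytes, long usableBytes) {
        if (totalBytes == 0) {
            return 100.0f;
        }
        return 100.0f * (totalBytes - usableBytes) / totalBytes;
    }

    /** True if the partition holding dir is above the configured limit. */
    static boolean isOverLimit(File dir, float maxUtilizationPercent) {
        return usedPercent(dir.getTotalSpace(), dir.getUsableSpace())
                > maxUtilizationPercent;
    }
}
```

A health checker built this way would mark a local directory as bad whenever isOverLimit returns true, and a limit of 100 would flag only directories whose partition cannot be queried.</li>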
<li> <a href="https://issues.apache.org/jira/browse/YARN-1780">YARN-1780</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Improve logging in timeline service</b><br>
<blockquote>It's difficult to trace whether the client has successfully posted the entity to the timeline service or not.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1776">YARN-1776</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>renewDelegationToken should survive RM failover</b><br>
<blockquote>When a delegation token is renewed, two RMStateStore operations happen: 1) removing the old DT, and 2) storing the new DT. If the RM fails between the two, there will be a problem.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1775">YARN-1775</a>.
Major sub-task reported by Rajesh Balamohan and fixed by Rajesh Balamohan (nodemanager)<br>
<b>Create SMAPBasedProcessTree to get PSS information</b><br>
<blockquote>Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1774">YARN-1774</a>.
Blocker bug reported by Anubhav Dhoot and fixed by Anubhav Dhoot (resourcemanager)<br>
<b>FS: Submitting to non-leaf queue throws NPE</b><br>
<blockquote>If you create a hierarchy of queues and assign a job to a parent queue, FairScheduler quits with an NPE.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1771">YARN-1771</a>.
Critical improvement reported by Sangjin Lee and fixed by Sangjin Lee (nodemanager)<br>
<b>many getFileStatus calls made from node manager for localizing a public distributed cache resource</b><br>
<blockquote>We're observing that the getFileStatus calls are putting a fair amount of load on the name node as part of checking the public-ness for localizing a resource that belongs in the public cache.
We see 7 getFileStatus calls made for each of these resources. We should look into reducing the number of calls to the name node. One example:
{noformat}
2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348 ...
2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724 ...
2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp ...
2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/ ...
2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
2014-02-27 18:07:27,355 INFO audit: ... cmd=open src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1768">YARN-1768</a>.
Minor bug reported by Hitesh Shah and fixed by Tsuyoshi OZAWA (client)<br>
<b>yarn kill non-existent application is too verbose</b><br>
<blockquote>Instead of catching ApplicationNotFound and logging a simple app not found message, the whole stack trace is logged.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1766">YARN-1766</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.</b><br>
<blockquote>Right now, we have FileSystemBasedConfigurationProvider, which lets users upload configurations to a remote file system so that different RMs can share the same configurations. During initialization, the RM loads the configurations from the remote file system. So when the RM initializes its services, it should use the loaded configurations instead of the bootstrap configurations.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1765">YARN-1765</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Write test cases to verify that killApplication API works in RM HA</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1764">YARN-1764</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Handle RM fail overs after the submitApplication call.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1761">YARN-1761</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1760">YARN-1760</a>.
Trivial bug reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>TestRMAdminService assumes CapacityScheduler</b><br>
<blockquote>YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler.
{noformat}
java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115)
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1758">YARN-1758</a>.
Blocker bug reported by Hitesh Shah and fixed by Xuan Gong <br>
<b>MiniYARNCluster broken post YARN-1666</b><br>
<blockquote>NPE seen when trying to use MiniYARNCluster</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1752">YARN-1752</a>.
Major bug reported by Jian He and fixed by Rohith <br>
<b>Unexpected Unregistered event at Attempt Launched state</b><br>
<blockquote>{code}
2014-02-21 14:56:03,453 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: UNREGISTERED at LAUNCHED
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:647)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:103)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:714)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:695)
{code}
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1749">YARN-1749</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Review AHS configs and sync them up with the timeline-service configs</b><br>
<blockquote>We need to:
1. Review the configuration names and default values
2. Combine the two store class configurations
Some other thoughts:
1. Maybe we don't need null implementation of ApplicationHistoryStore any more
2. Maybe if yarn.ahs.enabled = false, we should stop AHS web server returning historic information</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1748">YARN-1748</a>.
Blocker bug reported by Sravya Tirukkovalur and fixed by Sravya Tirukkovalur <br>
<b>hadoop-yarn-server-tests packages core-site.xml breaking downstream tests</b><br>
<blockquote>Jars should not package config files, as this might come into the classpaths of clients causing the clients to break.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1742">YARN-1742</a>.
Trivial bug reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)<br>
<b>Fix javadoc of parameter DEFAULT_NM_MIN_HEALTHY_DISKS_FRACTION</b><br>
<blockquote>In YarnConfiguration.java,
{code}
/**
* By default, at least 5% of disks are to be healthy to say that the node
* is healthy in terms of disks.
*/
public static final float DEFAULT_NM_MIN_HEALTHY_DISKS_FRACTION
= 0.25F;
{code}
25% is the correct value.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1734">YARN-1734</a>.
Critical sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>RM should get the updated Configurations when it transits from Standby to Active</b><br>
<blockquote>Currently, we have ConfigurationProvider, which supports LocalConfiguration and FileSystemBasedConfiguration. When HA and FileSystemBasedConfiguration are both enabled, the RM cannot get the updated configurations when it transitions from Standby to Active.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1732">YARN-1732</a>.
Major sub-task reported by Billie Rinaldi and fixed by Billie Rinaldi <br>
<b>Change types of related entities and primary filters in ATSEntity</b><br>
<blockquote>The current types Map&lt;String, List&lt;String&gt;&gt; relatedEntities and Map&lt;String, Object&gt; primaryFilters have issues. The List&lt;String&gt; value of the related entities map could contain multiple identical strings, which doesn't make sense. A more serious issue is that we cannot allow primary filter values to be overwritten, because otherwise we will be unable to find those primary filter entries when we want to delete an entity (without doing a nearly full scan).
I propose changing related entities to Map&lt;String, Set&lt;String&gt;&gt; and primary filters to Map&lt;String, Set&lt;Object&gt;&gt;. The basic methods to add primary filters and related entities are of the form add(key, value) and will not need to change.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1730">YARN-1730</a>.
Major sub-task reported by Billie Rinaldi and fixed by Billie Rinaldi <br>
<b>Leveldb timeline store needs simple write locking</b><br>
<blockquote>Although the leveldb writes are performed atomically in a batch, a start time for the entity needs to be identified before each write. Thus a per-entity write lock should be acquired.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1729">YARN-1729</a>.
Major sub-task reported by Billie Rinaldi and fixed by Billie Rinaldi <br>
<b>TimelineWebServices always passes primary and secondary filters as strings</b><br>
<blockquote>Primary and secondary filter values can be arbitrary JSON-compatible objects. The web services should determine whether the filters specified as query parameters are objects or strings before passing them to the store.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1724">YARN-1724</a>.
Critical bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Race condition in Fair Scheduler when continuous scheduling is turned on </b><br>
<blockquote>If nodes' resource allocations change during
Collections.sort(nodeIdList, nodeAvailableResourceComparator);
we'll hit:
java.lang.IllegalArgumentException: Comparison method violates its general contract!</blockquote></li>
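<li> One common way to avoid this class of failure is to sort against a frozen copy of the values rather than live, concurrently mutated state. The sketch below illustrates that idea only; it is not the change made for YARN-1724, and the names (SnapshotSort, sortByAvailable) and the use of plain arrays in place of scheduler node objects are hypothetical.<br>

```java
import java.util.Arrays;

// Hypothetical sketch; not FairScheduler code.
final class SnapshotSort {
    private SnapshotSort() {}

    /**
     * Returns node indices ordered by descending available memory.
     * The comparator reads only from a private copy, so changes made to
     * liveAvailableMb by another thread during the sort cannot make the
     * comparison inconsistent.
     */
    static Integer[] sortByAvailable(int[] liveAvailableMb) {
        final int[] snapshot = liveAvailableMb.clone();
        Integer[] order = new Integer[snapshot.length];
        Arrays.setAll(order, i -> i);
        Arrays.sort(order, (a, b) -> Integer.compare(snapshot[b], snapshot[a]));
        return order;
    }
}
```

The trade-off is that the resulting order reflects availability at the moment of the copy, which may already be slightly stale when scheduling decisions are made. Holding a lock for the duration of the sort is the alternative.</li>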
<li> <a href="https://issues.apache.org/jira/browse/YARN-1721">YARN-1721</a>.
Critical bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>When moving app between queues in Fair Scheduler, grab lock on FSSchedulerApp</b><br>
<blockquote>FairScheduler.moveApplication should grab the lock on FSSchedulerApp, so that allocate() can't modify it at the same time.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1719">YARN-1719</a>.
Major sub-task reported by Billie Rinaldi and fixed by Billie Rinaldi <br>
<b>ATSWebServices produces jersey warnings</b><br>
<blockquote>These don't appear to affect how the web services work, but the following warnings are logged:
{noformat}
WARNING: The following warnings have been detected with resource and/or provider classes:
WARNING: A sub-resource method, public org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.ATSWebServices$AboutInfo org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.ATSWebServices.about(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse), with URI template, "/", is treated as a resource method
WARNING: A sub-resource method, public org.apache.hadoop.yarn.api.records.apptimeline.ATSPutErrors org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.ATSWebServices.postEntities(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse,org.apache.hadoop.yarn.api.records.apptimeline.ATSEntities), with URI template, "/", is treated as a resource method
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1717">YARN-1717</a>.
Major sub-task reported by Billie Rinaldi and fixed by Billie Rinaldi <br>
<b>Enable offline deletion of entries in leveldb timeline store</b><br>
<blockquote>The leveldb timeline store implementation needs the following:
* better documentation of its internal structures
* internal changes to enable deleting entities
** never overwrite existing primary filter entries
** add hidden reverse pointers to related entities</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1706">YARN-1706</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Create an utility function to dump timeline records to json </b><br>
<blockquote>For verification and logging purposes.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1704">YARN-1704</a>.
Blocker sub-task reported by Billie Rinaldi and fixed by Billie Rinaldi <br>
<b>Review LICENSE and NOTICE to reflect new levelDB releated libraries being used</b><br>
<blockquote>Make any changes necessary in LICENSE and NOTICE related to dependencies introduced by the application timeline store.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1698">YARN-1698</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Replace MemoryApplicationTimelineStore with LeveldbApplicationTimelineStore as default</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1697">YARN-1697</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)<br>
<b>NodeManager reports negative running containers</b><br>
<blockquote>We're seeing the NodeManager metrics report a negative number of running containers.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1692">YARN-1692</a>.
Major bug reported by Sangjin Lee and fixed by Sangjin Lee (scheduler)<br>
<b>ConcurrentModificationException in fair scheduler AppSchedulable</b><br>
<blockquote>We saw a ConcurrentModificationException thrown in the fair scheduler:
{noformat}
2014-02-07 01:40:01,978 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Exception in fair scheduler UpdateThread
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(HashMap.java:926)
at java.util.HashMap$ValueIterator.next(HashMap.java:954)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.updateDemand(AppSchedulable.java:85)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.updateDemand(FSLeafQueue.java:125)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.updateDemand(FSParentQueue.java:82)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:217)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:195)
at java.lang.Thread.run(Thread.java:724)
{noformat}
The map returned by FSSchedulerApp.getResourceRequests() is iterated over without proper synchronization.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1690">YARN-1690</a>.
Major sub-task reported by Mayank Bansal and fixed by Mayank Bansal <br>
<b>Sending timeline entities+events from Distributed shell </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1689">YARN-1689</a>.
Critical bug reported by Deepesh Khandelwal and fixed by Vinod Kumar Vavilapalli (resourcemanager)<br>
<b>RMAppAttempt is not killed when RMApp is at ACCEPTED</b><br>
<blockquote>When running some Hive on Tez jobs, the RM after a while gets into an unusable state where no jobs run. In the RM log I see the following exception:
{code}
2014-02-04 20:28:08,553 WARN ipc.Server (Server.java:run(1978)) - IPC Server handler 0 on 8030, call org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.registerApplicationMaster from 172.18.145.156:40474 Call#0 Retry#0: error: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getTransferredContainers(AbstractYarnScheduler.java:48)
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:278)
at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:90)
at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
......
2014-02-04 20:28:08,544 ERROR rmapp.RMAppImpl (RMAppImpl.java:handle(626)) - Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_REGISTERED at KILLED
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:624)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:81)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:656)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:640)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:662)
2014-02-04 20:28:08,549 INFO resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(140)) - USER=hrt_qa IP=172.18.145.156 OPERATION=Kill Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1391543307203_0001
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1687">YARN-1687</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Refactoring timeline classes to remove "app" related words</b><br>
<blockquote>Remove ATS prefix, change package name, fix javadoc and so on</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1686">YARN-1686</a>.
Major bug reported by Rohith and fixed by Rohith (nodemanager)<br>
<b>NodeManager.resyncWithRM() does not handle exceptions, which causes the NodeManager to hang.</b><br>
<blockquote>During NodeManager startup, if registration with the ResourceManager throws an exception, the NodeManager shuts down.
Consider the case where NM-1 is registered with the RM and the RM issues a Resync to the NM. "resyncWithRM" starts a new thread that does not handle exceptions, so if any exception is thrown during the RESYNC event, that thread is lost and the NodeManager hangs.</blockquote></li>
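<li> <b>Illustrative sketch for YARN-1686 (hypothetical names, not the committed patch)</b><br>
<blockquote>The hang described above comes from a worker thread that dies on an uncaught exception. One way to avoid it, sketched minimally in Java, is to catch the exception inside the thread and record that a shutdown is needed. ResyncSketch, Registration and shutdownRequested are assumptions made for illustration; they are not Hadoop classes or fields, and the actual fix may differ.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a NodeManager; not the Hadoop class. */
final class ResyncSketch {
  /** Set when the resync thread fails, so the owner can shut down cleanly. */
  final AtomicBoolean shutdownRequested = new AtomicBoolean(false);

  /** Hypothetical re-registration step; an implementation may throw. */
  interface Registration {
    void reRegister() throws Exception;
  }

  /** Runs the resync on its own thread and never lets an exception vanish. */
  Thread resync(final Registration registration) {
    Thread t = new Thread(new Runnable() {
      @Override
      public void run() {
        try {
          registration.reRegister();
        } catch (Exception e) {
          // Without this catch the thread would die silently and the
          // node would appear hung; here the failure is recorded instead.
          shutdownRequested.set(true);
        }
      }
    }, "resync-sketch");
    t.start();
    return t;
  }
}
```

A caller could check shutdownRequested once the thread ends and stop the service instead of leaving it in a hung state.</blockquote></li>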
<li> <a href="https://issues.apache.org/jira/browse/YARN-1685">YARN-1685</a>.
Major sub-task reported by Mayank Bansal and fixed by Zhijie Shen <br>
<b>Bugs around log URL</b><br>
<blockquote>1. Log URL should be different when the container is running and finished
2. Null case needs to be handled
3. The way of constructing log URL should be corrected</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1684">YARN-1684</a>.
Major sub-task reported by Billie Rinaldi and fixed by Billie Rinaldi <br>
<b>Fix history server heap size in yarn script</b><br>
<blockquote>The yarn script currently has the following:
{noformat}
if [ "$YARN_RESOURCEMANAGER_HEAPSIZE" != "" ]; then
JAVA_HEAP_MAX="-Xmx""$YARN_HISTORYSERVER_HEAPSIZE""m"
fi
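{noformat}
The condition tests YARN_RESOURCEMANAGER_HEAPSIZE, while the heap size is taken from YARN_HISTORYSERVER_HEAPSIZE; the test should use the history server variable as well.</blockquote></li>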
<li> <a href="https://issues.apache.org/jira/browse/YARN-1676">YARN-1676</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Make admin refreshUserToGroupsMappings of configuration work across RM failover</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1673">YARN-1673</a>.
Blocker bug reported by Tassapol Athiapinya and fixed by Mayank Bansal (client)<br>
<b>Valid yarn kill application prints out help message.</b><br>
<blockquote>yarn application -kill &lt;application ID&gt;
used to work previously. In 2.4.0 it prints the help message and does not kill the application.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1672">YARN-1672</a>.
Trivial bug reported by Karthik Kambatla and fixed by Naren Koneru (nodemanager)<br>
<b>YarnConfiguration is missing a default for yarn.nodemanager.log.retain-seconds</b><br>
<blockquote>YarnConfiguration is missing a default for yarn.nodemanager.log.retain-seconds</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1670">YARN-1670</a>.
Critical bug reported by Thomas Graves and fixed by Mit Desai <br>
<b>aggregated log writer can write more log data than it says is the log length</b><br>
<blockquote>We have seen exceptions when using 'yarn logs' to read log files.
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
We traced it down to the reader trying to read the file type of the next file while its position is still inside log data from the previous file. The log length was written as a certain size, but the log data was actually longer than that.
The write() routine in LogValue first writes the logfile length, but when it then writes the log itself it simply reads to the end of the file. This is a race condition: if someone is still writing to the file when it is aggregated, the length written can be too small.
We should have the write() routine stop once it has written the length it declared. It would be nice if we could somehow tell the user the log might be truncated, but I'm not sure of a good way to do this.
We also noticed a bug in readAContainerLogsForALogType: it uses an int for curRead where it should use a long.
while (len != -1 &amp;&amp; curRead &lt; fileLength) {
This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits.</blockquote></li>
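<li> <b>Illustrative sketch for YARN-1670 (hypothetical names, not the committed patch)</b><br>
<blockquote>The suggestion above, stopping write() once the declared length has been written, can be sketched as a bounded copy. BoundedCopySketch and copyDeclaredLength are hypothetical names and are not part of AggregatedLogFormat. The sketch assumes a non-negative declared length and uses a long counter, as the report recommends for curRead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper; not part of AggregatedLogFormat. */
final class BoundedCopySketch {
  private BoundedCopySketch() {}

  /**
   * Copies at most declaredLength bytes from in to out and returns the
   * number of bytes copied. Bytes appended to the source after the length
   * was recorded are left unread.
   */
  static long copyDeclaredLength(InputStream in, OutputStream out,
      long declaredLength) throws IOException {
    byte[] buf = new byte[4096];
    long copied = 0L;  // long, so very large logs do not overflow the counter
    while (copied != declaredLength) {
      // Never ask for more than what is still owed to the declared length.
      int want = (int) Math.min(buf.length, declaredLength - copied);
      int len = in.read(buf, 0, want);
      if (len == -1) {
        break;  // source ended early; the caller sees a short count
      }
      out.write(buf, 0, len);
      copied += len;
    }
    return copied;
  }
}
```

Because the read size is capped by the remaining declared length, data written to the log after the length was recorded cannot spill into the next entry of the aggregated file.</blockquote></li>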
<li> <a href="https://issues.apache.org/jira/browse/YARN-1669">YARN-1669</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Make admin refreshServiceAcls work across RM failover</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1668">YARN-1668</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Make admin refreshAdminAcls work across RM failover</b><br>
<blockquote>Change the handling of admin-acls to be available across RM failover by making use of a remote configuration-provider
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1667">YARN-1667</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Make admin refreshSuperUserGroupsConfiguration work across RM failover</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1666">YARN-1666</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Make admin refreshNodes work across RM failover</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1665">YARN-1665</a>.
Major sub-task reported by Arpit Gupta and fixed by Xuan Gong (resourcemanager)<br>
<b>Set better defaults for HA configs for automatic failover</b><br>
<blockquote>In order to enable HA (automatic failover) I had to set the following configs
{code}
&lt;property&gt;
&lt;name&gt;yarn.resourcemanager.ha.enabled&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.resourcemanager.ha.automatic-failover.enabled&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.resourcemanager.ha.automatic-failover.embedded&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
{code}
I believe the user should just have to set yarn.resourcemanager.ha.enabled=true and the rest should be set as defaults. Basically automatic failover should be the default.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1661">YARN-1661</a>.
Major bug reported by Tassapol Athiapinya and fixed by Vinod Kumar Vavilapalli (applications/distributed-shell)<br>
<b>AppMaster logs says failing even if an application does succeed.</b><br>
<blockquote>Run:
/usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar &lt;distributed shell jar&gt; -shell_command ls
Open the AM logs. The last line indicates AM failure even though the container logs show the expected ls output.
{code}
2014-01-24 21:45:29,592 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:finish(599)) - Application completed. Signalling finish to RM
2014-01-24 21:45:29,612 INFO [main] impl.AMRMClientImpl (AMRMClientImpl.java:unregisterApplicationMaster(315)) - Waiting for application to be successfully unregistered.
2014-01-24 21:45:29,816 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:main(267)) - Application Master failed. exiting
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1660">YARN-1660</a>.
Major sub-task reported by Arpit Gupta and fixed by Xuan Gong (resourcemanager)<br>
<b>add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting all the various host:port properties for RM</b><br>
<blockquote>Currently the user has to specify all the various host:port properties for RM. We should follow the pattern that we do for non HA setup where we can specify yarn.resourcemanager.hostname.rm-id and the defaults are used for all other affected properties.</blockquote></li>
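<li> <b>Illustrative sketch for YARN-1660 (hypothetical names, not the committed patch)</b><br>
<blockquote>The idea above, deriving every host:port property from a single per-RM hostname, can be sketched as follows. RmAddressSketch is a hypothetical helper and not part of YarnConfiguration. The service suffixes and port numbers are the usual YARN defaults as assumed here; check yarn-default.xml for the release in use.

```java
/** Hypothetical helper; not part of YarnConfiguration. */
final class RmAddressSketch {
  private RmAddressSketch() {}

  /** Assumed default port for each RM service address property. */
  static int defaultPort(String service) {
    switch (service) {
      case "address": return 8032;
      case "scheduler.address": return 8030;
      case "resource-tracker.address": return 8031;
      case "admin.address": return 8033;
      case "webapp.address": return 8088;
      default:
        throw new IllegalArgumentException("unknown RM service: " + service);
    }
  }

  /** Builds host:port for one service from the single configured hostname. */
  static String derive(String hostname, String service) {
    return hostname + ":" + defaultPort(service);
  }
}
```

With this approach a value configured for yarn.resourcemanager.hostname.rm-id would be enough, and an explicitly configured host:port property would only be needed to override a default.</blockquote></li>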
<li> <a href="https://issues.apache.org/jira/browse/YARN-1659">YARN-1659</a>.
Major sub-task reported by Billie Rinaldi and fixed by Billie Rinaldi <br>
<b>Define the ApplicationTimelineStore store as an abstraction for implementing different storage impls for storing timeline information</b><br>
<blockquote>These will be used by the ApplicationTimelineStore interface. The web services will convert the store-facing objects to the user-facing objects.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1658">YARN-1658</a>.
Major sub-task reported by Cindy Li and fixed by Cindy Li <br>
<b>Webservice should redirect to active RM when HA is enabled.</b><br>
<blockquote>When HA is enabled, web service to standby RM should be redirected to the active RM. This is a related Jira to YARN-1525.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1641">YARN-1641</a>.
Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>ZK store should attempt a write periodically to ensure it is still Active</b><br>
<blockquote>Fencing in ZK store kicks in when the RM tries to write something to the store. If the RM doesn't write anything to the store, it doesn't get fenced and can continue to assume it is the Active.
By periodically writing a file (say, once every RM_ZK_TIMEOUT_MS), we can ensure it gets fenced.</blockquote></li>
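<li> <b>Illustrative sketch for YARN-1641 (hypothetical names, not the committed patch)</b><br>
<blockquote>The periodic check described above can be sketched as a single probe step that a timer would call at a fixed interval. ActiveCheckSketch, Store and writeProbe are hypothetical names, not the ZK store API. The sketch assumes that a fenced RM sees its write fail with an exception.

```java
/** Hypothetical model of an RM that verifies it is still Active. */
final class ActiveCheckSketch {
  /** Hypothetical store; writeProbe is assumed to throw once fenced. */
  interface Store {
    void writeProbe() throws Exception;
  }

  private volatile boolean active = true;

  boolean isActive() {
    return active;
  }

  /** One periodic check; a scheduler would call this at a fixed interval. */
  void checkOnce(Store store) {
    try {
      store.writeProbe();
    } catch (Exception fenced) {
      // The write was refused, so this RM must stop assuming it is Active.
      active = false;
    }
  }
}
```

Keeping the probe as a plain method leaves the choice of timer and interval to the caller.</blockquote></li>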
<li> <a href="https://issues.apache.org/jira/browse/YARN-1640">YARN-1640</a>.
Blocker sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Manual Failover does not work in secure clusters</b><br>
<blockquote>NodeManager gets rejected after manually making one RM active.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1639">YARN-1639</a>.
Major sub-task reported by Arpit Gupta and fixed by Xuan Gong (resourcemanager)<br>
<b>YARN RM HA requires different configs on different RM hosts</b><br>
<blockquote>We need to set yarn.resourcemanager.ha.id to rm1 or rm2 depending on which RM is to be the first or the second.
This means we have different configs on different RM nodes. This is unlike HDFS HA, where the same configs are pushed to both NNs. It would be better to have the same setup for RM, as this would make installation and management easier.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1637">YARN-1637</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen <br>
<b>Implement a client library for java users to post entities+events</b><br>
<blockquote>This is a wrapper around the web-service to facilitate easy posting of entity+event data to the time-line server.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1636">YARN-1636</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen <br>
<b>Implement timeline related web-services inside AHS for storing and retrieving entities+events</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1635">YARN-1635</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Billie Rinaldi <br>
<b>Implement a Leveldb based ApplicationTimelineStore</b><br>
<blockquote>As per the design doc, we need a levelDB + local-filesystem based implementation to start with and for small deployments.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1634">YARN-1634</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen <br>
<b>Define an in-memory implementation of ApplicationTimelineStore</b><br>
<blockquote>As per the design doc, the store needs to be pluggable. We need a base interface, and an in-memory implementation for testing.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1633">YARN-1633</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen <br>
<b>Define user-faced entity, entity-info and event objects</b><br>
<blockquote>Define the core objects of the application-timeline effort.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1632">YARN-1632</a>.
Minor bug reported by Chen He and fixed by Chen He <br>
<b>TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package</b><br>
<blockquote>ApplicationMasterService is under org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test file TestApplicationMasterService is placed under org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice package which only contains one file (TestApplicationMasterService). </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1625">YARN-1625</a>.
Trivial sub-task reported by Shinichi Yamashita and fixed by Shinichi Yamashita <br>
<b>mvn apache-rat:check outputs warning message in YARN-321 branch</b><br>
<blockquote>When I ran dev-support/test-patch.sh, the following message was output.
{code}
mvn apache-rat:check -DHadoopPatchProcess &gt; /tmp/patchReleaseAuditOutput.txt 2&gt;&amp;1
There appear to be 1 release audit warnings after applying the patch.
{code}
{code}
!????? /home/sinchii/git/YARN-321-test/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/applicationhistory/.keep
Lines that start with ????? in the release audit report indicate files that do not have an Apache license header.
{code}
To avoid the release audit warning, pom.xml should be fixed.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1617">YARN-1617</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Remove ancient comment and surround LOG.debug in AppSchedulingInfo.allocate</b><br>
<blockquote>{code}
synchronized private void allocate(Container container) {
// Update consumption and track allocations
//TODO: fixme sharad
/* try {
store.storeContainer(container);
} catch (IOException ie) {
// TODO fix this. we shouldnt ignore
}*/
LOG.debug("allocate: applicationId=" + applicationId + " container="
+ container.getId() + " host="
+ container.getNodeId().toString());
}
{code}
</blockquote></li>
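<li> <b>Illustrative sketch for YARN-1617 (hypothetical names, not the committed patch)</b><br>
<blockquote>A minimal sketch of the cleanup, assuming that "surround LOG.debug" in the title means wrapping the call in a debug-level check so the message string is only built when it will be logged. java.util.logging stands in for Hadoop's logging API here, and AllocateLogSketch, setDebug and messagesBuilt are hypothetical names added for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for AppSchedulingInfo; not the Hadoop class. */
final class AllocateLogSketch {
  private static final Logger LOG = Logger.getLogger("AllocateLogSketch");

  /** Instrumentation for this sketch: counts messages actually built. */
  static int messagesBuilt = 0;

  private AllocateLogSketch() {}

  /** Switches the stand-in logger between debug (FINE) and INFO level. */
  static void setDebug(boolean on) {
    LOG.setLevel(on ? Level.FINE : Level.INFO);
  }

  /** The stale commented-out block is gone; the debug call is guarded. */
  static synchronized void allocate(String applicationId,
      String containerId, String host) {
    if (LOG.isLoggable(Level.FINE)) {
      messagesBuilt++;
      LOG.fine("allocate: applicationId=" + applicationId
          + " container=" + containerId + " host=" + host);
    }
  }
}
```

The guard matters on hot paths such as allocation, where the string concatenation would otherwise run on every call even with debug logging turned off.</blockquote></li>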
<li> <a href="https://issues.apache.org/jira/browse/YARN-1613">YARN-1613</a>.
Major sub-task reported by Zhijie Shen and fixed by Akira AJISAKA <br>
<b>Fix config name YARN_HISTORY_SERVICE_ENABLED</b><br>
<blockquote>YARN_HISTORY_SERVICE_ENABLED property name is "yarn.ahs..enabled", which is wrong.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1611">YARN-1611</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Make admin refresh of capacity scheduler configuration work across RM failover</b><br>
<blockquote>Currently, if we do refresh* for a standby RM, it will fail over to the current active RM and do the refresh* based on the local configuration file of the active RM. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1605">YARN-1605</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>Fix formatting issues with new module in YARN-321 branch</b><br>
<blockquote>There are a bunch of formatting issues. I'm restricting myself to a sweep of all the files in the new module.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1597">YARN-1597</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>FindBugs warnings on YARN-321 branch</b><br>
<blockquote>There are a bunch of findBugs warnings on YARN-321 branch.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1596">YARN-1596</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>Javadoc failures on YARN-321 branch</b><br>
<blockquote>There are some javadoc issues on YARN-321 branch.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1595">YARN-1595</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>Test failures on YARN-321 branch</b><br>
<blockquote>mvn test doesn't pass on YARN-321 branch anymore.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1594">YARN-1594</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>YARN-321 branch needs to be updated after YARN-888 pom changes</b><br>
<blockquote>YARN-888 changed the pom structure, so the latest merge to trunk breaks the YARN-321 branch.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1591">YARN-1591</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Tsuyoshi OZAWA <br>
<b>TestResourceTrackerService fails randomly on trunk</b><br>
<blockquote>As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621&amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621.
It's failing randomly on trunk on my local box too </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1590">YARN-1590</a>.
Major bug reported by Mohammad Kamrul Islam and fixed by Mohammad Kamrul Islam (resourcemanager)<br>
<b>_HOST doesn't expand properly for RM, NM, ProxyServer and JHS</b><br>
<blockquote>_HOST is not properly substituted when we use a VIP address. Currently it always uses the host name of the machine and disregards the VIP address. This mainly affects the RPC services of the RM, NM, WebProxy, and JHS. It appears to work fine for webservice authentication.
On the other hand, the same substitution works fine for the NN and SNN, in RPC as well as webservice.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1588">YARN-1588</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Rebind NM tokens for previous attempt's running containers to the new attempt</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1587">YARN-1587</a>.
Major sub-task reported by Mayank Bansal and fixed by Vinod Kumar Vavilapalli <br>
<b>[YARN-321] Merge Patch for YARN-321</b><br>
<blockquote>Merge Patch</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1578">YARN-1578</a>.
Major sub-task reported by Shinichi Yamashita and fixed by Shinichi Yamashita <br>
<b>Fix how to read history file in FileSystemApplicationHistoryStore</b><br>
<blockquote>I ran a PiEstimator job on a Hadoop cluster with YARN-321 applied.
After the job ended, I accessed the Web UI of the HistoryServer and it displayed "500". The HistoryServer daemon log showed the following.
{code}
2014-01-09 13:31:12,227 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /applicationhistory/appattempt/appattempt_1389146249925_0008_000001
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
(snip...)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.mergeContainerHistoryData(FileSystemApplicationHistoryStore.java:696)
at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getContainers(FileSystemApplicationHistoryStore.java:429)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainers(ApplicationHistoryManagerImpl.java:201)
at org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:110)
(snip...)
{code}
I confirmed from the ApplicationHistory file that there was a container that had not finished.
According to the ResourceManager daemon log, the ResourceManager reserved this container but did not allocate it.
The problem occurs when FileSystemApplicationHistoryStore reads container information that has no finish data in the history file.
To handle the case where there is no finish data, we should fix how FileSystemApplicationHistoryStore reads the history file.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1577">YARN-1577</a>.
Blocker sub-task reported by Jian He and fixed by Jian He <br>
<b>Unmanaged AM is broken because of YARN-1493</b><br>
<blockquote>Today the unmanaged AM client waits for the app state to be Accepted before launching the AM. This is broken since YARN-1493 changed the RM to start the attempt after the application is Accepted. We may need to introduce an attempt state report that the client can rely on to query the attempt state and decide when to launch the unmanaged AM.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1570">YARN-1570</a>.
Minor improvement reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)<br>
<b>Formatting the lines within 80 chars in YarnCommands.apt.vm</b><br>
<blockquote>In YarnCommands.apt.vm, there are some lines longer than 80 characters.
For example:
{code}
Yarn commands are invoked by the bin/yarn script. Running the yarn script without any arguments prints the description for all commands.
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1566">YARN-1566</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Change distributed-shell to retain containers from previous AppAttempt</b><br>
<blockquote>Change distributed-shell to reuse previous AM's running containers when AM is restarting. It can also be made configurable whether to enable this feature or not.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1555">YARN-1555</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>[YARN-321] Failing tests in org.apache.hadoop.yarn.server.applicationhistoryservice.*</b><br>
<blockquote>Several tests are failing on the latest YARN-321 branch.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1553">YARN-1553</a>.
Major bug reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Do not use HttpConfig.isSecure() in YARN</b><br>
<blockquote>HDFS-5305 and related JIRAs decided that each individual project will have its own configuration for http policy. {{HttpConfig.isSecure}} is a global static method which no longer fits the design. The same functionality should be moved into the YARN code base.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1536">YARN-1536</a>.
Minor improvement reported by Karthik Kambatla and fixed by Anubhav Dhoot (resourcemanager)<br>
<b>Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead</b><br>
<blockquote>Both ResourceManager and RMContext have methods to access the secret managers, and it should be safe (cleaner) to get rid of the ResourceManager methods.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1534">YARN-1534</a>.
Major sub-task reported by Shinichi Yamashita and fixed by Shinichi Yamashita <br>
<b>TestAHSWebApp failed in YARN-321 branch</b><br>
<blockquote>I ran the following command and confirmed that TestAHSWebApp fails.
{code}
[sinchii@hdX YARN-321-test]$ mvn clean test -Dtest=org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.*
{code}
{code}
Running org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.492 sec - in org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices
Running org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebApp
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.193 sec &lt;&lt;&lt; FAILURE! - in org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebApp
initializationError(org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebApp) Time elapsed: 0.016 sec &lt;&lt;&lt; ERROR!
java.lang.Exception: Test class should have exactly one public zero-argument constructor
at org.junit.runners.BlockJUnit4ClassRunner.validateZeroArgConstructor(BlockJUnit4ClassRunner.java:144)
at org.junit.runners.BlockJUnit4ClassRunner.validateConstructor(BlockJUnit4ClassRunner.java:121)
at org.junit.runners.BlockJUnit4ClassRunner.collectInitializationErrors(BlockJUnit4ClassRunner.java:101)
at org.junit.runners.ParentRunner.validate(ParentRunner.java:344)
(*snip*)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1531">YARN-1531</a>.
Major bug reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)<br>
<b>True up yarn command documentation</b><br>
<blockquote>Some options are missing from the Yarn Command document.
For example, "yarn rmadmin" command options are as follows:
{code}
Usage: yarn rmadmin
-refreshQueues
-refreshNodes
-refreshSuperUserGroupsConfiguration
-refreshUserToGroupsMappings
-refreshAdminAcls
-refreshServiceAcl
-getGroups [username]
-help [cmd]
-transitionToActive &lt;serviceId&gt;
-transitionToStandby &lt;serviceId&gt;
-failover [--forcefence] [--forceactive] &lt;serviceId&gt; &lt;serviceId&gt;
-getServiceState &lt;serviceId&gt;
-checkHealth &lt;serviceId&gt;
{code}
But some of the new options such as "-getGroups", "-transitionToActive", and "-transitionToStandby" are not documented.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1528">YARN-1528</a>.
Blocker bug reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>Allow setting auth for ZK connections</b><br>
<blockquote>ZK store and embedded election allow setting ZK-acls but not auth information</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1525">YARN-1525</a>.
Major sub-task reported by Xuan Gong and fixed by Cindy Li <br>
<b>Web UI should redirect to active RM when HA is enabled.</b><br>
<blockquote>When failover happens, web UI should redirect to the current active rm.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1521">YARN-1521</a>.
Blocker sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation</b><br>
<blockquote>YARN-1028 added automatic failover to RMProxy. This JIRA is to identify whether we need to add the idempotent annotation and which methods can be marked as idempotent.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1512">YARN-1512</a>.
Major improvement reported by Arun C Murthy and fixed by Arun C Murthy <br>
<b>Enhance CS to decouple scheduling from node heartbeats</b><br>
<blockquote>Enhance CS to decouple scheduling from node heartbeats; a prototype has improved latency significantly.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1493">YARN-1493</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Schedulers don't recognize apps separately from app-attempts</b><br>
<blockquote>Today, scheduler is tied to attempt only.
We need to separate app-level handling logic in scheduler. We can add new app-level events to the scheduler and separate the app-level logic out. This is good for work-preserving AM restart, RM restart, and also needed for differentiating app-level metrics and attempt-level metrics.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1490">YARN-1490</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Jian He <br>
<b>RM should optionally not kill all containers when an ApplicationMaster exits</b><br>
<blockquote>This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1470">YARN-1470</a>.
Major bug reported by Sandy Ryza and fixed by Anubhav Dhoot <br>
<b>Add audience annotation to MiniYARNCluster</b><br>
<blockquote>We should make it clear whether this is a public interface.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1461">YARN-1461</a>.
Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>RM API and RM changes to handle tags for running jobs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1459">YARN-1459</a>.
Major sub-task reported by Karthik Kambatla and fixed by Xuan Gong (resourcemanager)<br>
<b>RM services should depend on ConfigurationProvider during startup too</b><br>
<blockquote>YARN-1667, YARN-1668, YARN-1669 already changed RM to depend on a configuration provider so as to be able to refresh many configuration files across RM fail-over. The dependency on the configuration-provider by the RM should happen at its boot up time too.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1452">YARN-1452</a>.
Major task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Document the usage of the generic application history and the timeline data service</b><br>
<blockquote>We need to write documentation to guide users, covering topics such as command line tools, configurations and REST APIs.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1444">YARN-1444</a>.
Blocker bug reported by Robert Grandl and fixed by Wangda Tan (client , resourcemanager)<br>
<b>RM crashes when node resource request sent without corresponding off-switch request</b><br>
<blockquote>I tried to force reducers to execute on certain nodes. For reduce tasks, I changed RMContainerRequestor#addResourceRequest(req.priority, ResourceRequest.ANY, req.capability) to RMContainerRequestor#addResourceRequest(req.priority, HOST_NAME, req.capability).
However, this change led to RM crashes when reducers need to be assigned, with the following exception:
FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:841)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:640)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:554)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:695)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:739)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:549)
at java.lang.Thread.run(Thread.java:722)
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1428">YARN-1428</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>RM cannot write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state</b><br>
<blockquote>ApplicationFinishData and ApplicationAttemptFinishData are written in the final transitions of RMApp/RMAppAttempt respectively. However, in those transitions, getState() does not return the state that RMApp/RMAppAttempt is about to enter, but the prior one.</blockquote></li>
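<li>Sketch for YARN-1428 (illustrative only). A minimal Java example of the behaviour described above, assuming the transition body runs before the state field is updated. The class FinalStateSketch and all of its members are hypothetical stand-ins, not the RMApp/RMAppAttempt code.
<pre>
```java
// Hypothetical, simplified state machine; these are not the RMApp/RMAppAttempt classes.
final class FinalStateSketch {
    enum State { RUNNING, FINISHED }

    private State state = State.RUNNING;
    private State recordedFromGetState; // what a history writer sees via getState()
    private State recordedFromTarget;   // what it sees when handed the target state

    State getState() {
        return state;
    }

    // Assumption: the transition body runs before the state field is updated,
    // so getState() still reports the prior state while the transition executes.
    void transitionTo(State target) {
        recordedFromGetState = getState(); // prior state (RUNNING)
        recordedFromTarget = target;       // state being entered (FINISHED)
        state = target;
    }

    State recordedFromGetState() {
        return recordedFromGetState;
    }

    State recordedFromTarget() {
        return recordedFromTarget;
    }
}
```
</pre>
In this sketch, a writer that needs the final state has to be handed the target state explicitly rather than calling getState() inside the transition.</li>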
<li> <a href="https://issues.apache.org/jira/browse/YARN-1417">YARN-1417</a>.
Blocker bug reported by Omkar Vinit Joshi and fixed by Jian He <br>
<b>RM may issue expired container tokens to AM while issuing new containers.</b><br>
<blockquote>Today we create a new container token when we create a container in the RM as part of the scheduling cycle. However, that container may get either reserved or assigned. If the container gets reserved and remains in the reserved state for longer than the container token expiry interval, the RM will end up issuing a container with an expired token.</blockquote></li>
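<li>Sketch for YARN-1417 (illustrative only). A minimal Java model of the timing problem described above. The class name, method names and the 10-minute interval are assumptions made for illustration; this is not the RM code and not necessarily how the issue was fixed.
<pre>
```java
// Hypothetical model of token timing; the names and the interval are assumptions.
final class ContainerTokenTimingSketch {
    static final long TOKEN_EXPIRY_INTERVAL_MS = 600_000L; // assumed value

    // Expiry timestamp of a token created at createdAtMs.
    static long expiryOf(long createdAtMs) {
        return createdAtMs + TOKEN_EXPIRY_INTERVAL_MS;
    }

    static boolean isExpired(long expiryMs, long nowMs) {
        return nowMs > expiryMs;
    }

    // Token created when the container object is created (it may only be reserved).
    static boolean expiredWhenCreatedAtReservation(long reservedAtMs, long assignedAtMs) {
        return isExpired(expiryOf(reservedAtMs), assignedAtMs);
    }

    // Token created only at the moment the container is actually assigned.
    static boolean expiredWhenCreatedAtAssignment(long assignedAtMs) {
        return isExpired(expiryOf(assignedAtMs), assignedAtMs);
    }
}
```
</pre>
With these assumed numbers, a container reserved at time 0 and assigned 700,000 ms later carries an already expired token, while a token created at assignment time does not.</li>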
<li> <a href="https://issues.apache.org/jira/browse/YARN-1410">YARN-1410</a>.
Major sub-task reported by Bikas Saha and fixed by Xuan Gong <br>
<b>Handle RM fails over after getApplicationID() and before submitApplication().</b><br>
<blockquote>App submission involves
1) creating appId
2) using that appId to submit an ApplicationSubmissionContext to the user.
The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM.
Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side.
The same may happen for other 2 step client API operations.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1398">YARN-1398</a>.
Blocker bug reported by Sunil G and fixed by Vinod Kumar Vavilapalli (resourcemanager)<br>
<b>Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedContainer call</b><br>
<blockquote>getQueueInfo in parentQueue will call child.getQueueInfo().
This will try to acquire the leaf queue lock while holding the parent queue lock.
If, at the same time, a completedContainer call arrives and acquires the LeafQueue lock, it will wait for ParentQueue's completedContainer call.
The locks are not acquired in a consistent order, which can lead to deadlock.
JCarder reports this as a potential deadlock scenario.
</blockquote></li>
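<li>Sketch for YARN-1398 (illustrative only). One common way to avoid this class of deadlock is to take the two locks in the same order on every path. The Java example below shows that idea with hypothetical names (QueueLockOrderSketch, parentLock, leafLock); it is not the CapacityScheduler code and not necessarily the fix that was applied.
<pre>
```java
// Hypothetical, simplified stand-in for a parent/leaf queue pair.
final class QueueLockOrderSketch {
    private final Object parentLock = new Object();
    private final Object leafLock = new Object();
    private int completed = 0;

    // Read path: parent lock first, then leaf lock.
    int getQueueInfo() {
        synchronized (parentLock) {
            synchronized (leafLock) {
                return completed;
            }
        }
    }

    // Update path: the same order, so the two paths cannot wait on each other in a cycle.
    void completedContainer() {
        synchronized (parentLock) {
            synchronized (leafLock) {
                completed++;
            }
        }
    }

    // Runs both paths concurrently and returns the final count.
    static int runDemo(int calls) {
        QueueLockOrderSketch queue = new QueueLockOrderSketch();
        Thread reader = new Thread(() -> {
            for (int i = 0; i < calls; i++) {
                queue.getQueueInfo();
            }
        });
        Thread writer = new Thread(() -> {
            for (int i = 0; i < calls; i++) {
                queue.completedContainer();
            }
        });
        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
        return queue.getQueueInfo();
    }
}
```
</pre>
If completedContainer() instead took leafLock before parentLock, the two threads could each hold one lock and wait for the other, which is the scenario described above.</li>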
<li> <a href="https://issues.apache.org/jira/browse/YARN-1389">YARN-1389</a>.
Major sub-task reported by Mayank Bansal and fixed by Mayank Bansal <br>
<b>ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs</b><br>
<blockquote>As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports of *finished* application attempts and containers, we should do the same for ApplicationClientProtocol, which will return the reports of *running* attempts and containers.
Later on, we can improve YarnClient to direct the query of running instance to ApplicationClientProtocol, while that of finished instance to ApplicationHistoryProtocol, making it transparent to the users.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1379">YARN-1379</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>[YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170</b><br>
<blockquote>Found this while merging YARN-321 to the latest branch-2. Without this, compilation fails.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1345">YARN-1345</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Removing FINAL_SAVING from YarnApplicationAttemptState</b><br>
<blockquote>Whenever YARN-891 is done, we need to add the mapping of RMAppAttemptState.FINAL_SAVING -&gt; YarnApplicationAttemptState.FINAL_SAVING in RMServerUtils#createApplicationAttemptState</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1301">YARN-1301</a>.
Minor bug reported by Zhijie Shen and fixed by Tsuyoshi OZAWA <br>
<b>Need to log the blacklist additions/removals when YarnSchedule#allocate</b><br>
<blockquote>Without this log, it is hard to tell whether the blacklist has been updated on the scheduler side.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1285">YARN-1285</a>.
Major bug reported by Zhijie Shen and fixed by Kenji Kikushima <br>
<b>Inconsistency of default "yarn.acl.enable" value</b><br>
<blockquote>In yarn-default.xml, "yarn.acl.enable" is true while in YarnConfiguration, DEFAULT_YARN_ACL_ENABLE is false.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1266">YARN-1266</a>.
Major sub-task reported by Mayank Bansal and fixed by Mayank Bansal <br>
<b>Implement PB service and client wrappers for ApplicationHistoryProtocol</b><br>
<blockquote>Adding ApplicationHistoryProtocolPBService to make the web apps work, and changing yarn to run AHS as a separate process.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1242">YARN-1242</a>.
Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal <br>
<b>Script changes to start AHS as an individual process</b><br>
<blockquote>Add the command in yarn and yarn.cmd to start and stop AHS</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1206">YARN-1206</a>.
Blocker bug reported by Jian He and fixed by Rohith <br>
<b>AM container log link broken on NM web page even though local container logs are available</b><br>
<blockquote>With log aggregation disabled, when container is running, its logs link works properly, but after the application is finished, the link shows 'Container does not exist.'</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1191">YARN-1191</a>.
Major sub-task reported by Mayank Bansal and fixed by Mayank Bansal <br>
<b>[YARN-321] Update artifact versions for application history service</b><br>
<blockquote>Compilation is failing for the YARN-321 branch.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1171">YARN-1171</a>.
Major improvement reported by Sandy Ryza and fixed by Naren Koneru (documentation , scheduler)<br>
<b>Add default queue properties to Fair Scheduler documentation </b><br>
<blockquote>The Fair Scheduler doc is missing the following properties.
- defaultMinSharePreemptionTimeout
- queueMaxAppsDefault</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1166">YARN-1166</a>.
Blocker bug reported by Srimanth Gunturi and fixed by Zhijie Shen (resourcemanager)<br>
<b>YARN 'appsFailed' metric should be of type 'counter'</b><br>
<blockquote>Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of type 'gauge', which means the exact value will be reported.
All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) are of type 'counter', meaning Ganglia will use slope to provide deltas between time-points.
To be consistent, AppsFailed metric should also be of type 'counter'. </blockquote></li>
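<li>Sketch for YARN-1166 (illustrative only). A minimal Java example of what "deltas between time-points" means for a cumulative counter. The class CounterVsGaugeSketch and its method are hypothetical and do not use the Hadoop metrics or Ganglia APIs.
<pre>
```java
// Hypothetical helper; it only illustrates how a monitoring system can
// derive per-interval deltas from successive samples of a cumulative counter.
final class CounterVsGaugeSketch {
    // Returns sample[i] - sample[i - 1] for each pair of successive samples.
    static long[] deltas(long[] cumulativeSamples) {
        if (cumulativeSamples.length == 0) {
            return new long[0];
        }
        long[] result = new long[cumulativeSamples.length - 1];
        for (int i = 1; i < cumulativeSamples.length; i++) {
            result[i - 1] = cumulativeSamples[i] - cumulativeSamples[i - 1];
        }
        return result;
    }
}
```
</pre>
For example, cumulative samples of 0, 2, 2 and 5 failed apps correspond to deltas of 2, 0 and 3, whereas a gauge would report the sampled values themselves.</li>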
<li> <a href="https://issues.apache.org/jira/browse/YARN-1123">YARN-1123</a>.
Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal <br>
<b>[YARN-321] Adding ContainerReport and Protobuf implementation</b><br>
<blockquote>Like YARN-978, we need some client-oriented class to expose the container history info. Neither Container nor RMContainer is the right one.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1071">YARN-1071</a>.
Major bug reported by Srimanth Gunturi and fixed by Jian He (resourcemanager)<br>
<b>ResourceManager's decommissioned and lost node count is 0 after restart</b><br>
<blockquote>I had 6 nodes in a cluster with 2 NMs stopped. Then I put a host into YARN's {{yarn.resourcemanager.nodes.exclude-path}}. After running {{yarn rmadmin -refreshNodes}}, RM's JMX correctly showed decommissioned node count:
{noformat}
"NumActiveNMs" : 3,
"NumDecommissionedNMs" : 1,
"NumLostNMs" : 2,
"NumUnhealthyNMs" : 0,
"NumRebootedNMs" : 0
{noformat}
After restarting RM, the counts were shown as below in JMX.
{noformat}
"NumActiveNMs" : 3,
"NumDecommissionedNMs" : 0,
"NumLostNMs" : 0,
"NumUnhealthyNMs" : 0,
"NumRebootedNMs" : 0
{noformat}
Notice that the lost and decommissioned NM counts are both 0.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1041">YARN-1041</a>.
Major sub-task reported by Steve Loughran and fixed by Jian He (resourcemanager)<br>
<b>Protocol changes for RM to bind and notify a restarted AM of existing containers</b><br>
<blockquote>For long lived containers we don't want the AM to be a SPOF.
When the RM restarts a (failed) AM, it should be given the list of containers it had already been allocated. The AM should then be able to contact the NMs to get details on them. NMs would also need to do any binding of the containers needed to handle a moved/restarted AM.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1023">YARN-1023</a>.
Major sub-task reported by Devaraj K and fixed by Zhijie Shen <br>
<b>[YARN-321] Webservices REST API's support for Application History</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1017">YARN-1017</a>.
Blocker sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>Document RM Restart feature</b><br>
<blockquote>This should give users a general idea about how RM Restart works and how to use RM Restart</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1007">YARN-1007</a>.
Major sub-task reported by Devaraj K and fixed by Mayank Bansal <br>
<b>[YARN-321] Enhance History Reader interface for Containers</b><br>
<blockquote>If we want to show the containers used by an application/app attempt, we need two more APIs which return a collection of ContainerHistoryData for an application id and an application attempt id, something like below.
{code:xml}
Collection&lt;ContainerHistoryData&gt; getContainers(
ApplicationAttemptId appAttemptId);
Collection&lt;ContainerHistoryData&gt; getContainers(ApplicationId appId);
{code}
{code:xml}
/**
* This method returns {@link Container} for specified {@link ContainerId}.
*
* @param {@link ContainerId}
* @return {@link Container} for ContainerId
*/
ContainerHistoryData getAMContainer(ContainerId containerId);
{code}
In the above API, we need to change the argument to application attempt id, or we can remove this API, because every attempt history data has a master container id field; using the master container id, the history data can be obtained using the API below if it takes a container id as its argument.
{code:xml}
/**
* This method returns {@link ContainerHistoryData} for specified
* {@link ApplicationAttemptId}.
*
* @param {@link ApplicationAttemptId}
* @return {@link ContainerHistoryData} for ApplicationAttemptId
*/
ContainerHistoryData getContainer(ApplicationAttemptId appAttemptId);
{code}
Here an application attempt can use a number of containers, but we cannot choose which container history data to return. This API's argument also needs to be changed to take a container id instead of an app attempt id.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-987">YARN-987</a>.
Major sub-task reported by Mayank Bansal and fixed by Mayank Bansal <br>
<b>Adding ApplicationHistoryManager responsible for exposing reports to all clients</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-986">YARN-986</a>.
Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Karthik Kambatla <br>
<b>RM DT token service should have service addresses of both RMs</b><br>
<blockquote>Previously: YARN should use cluster-id as token service address
This needs to be done to support non-ip based fail over of RM. Once the server sets the token service address to be this generic ClusterId/ServiceId, clients can translate it to appropriate final IP and then be able to select tokens via TokenSelectors.
Some workarounds for other related issues were put in place at YARN-945.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-984">YARN-984</a>.
Major sub-task reported by Devaraj K and fixed by Devaraj K <br>
<b>[YARN-321] Move classes from applicationhistoryservice.records.pb.impl package to applicationhistoryservice.records.impl.pb</b><br>
<blockquote>While creating an instance of the applicationhistoryservice.records.* pb records, a ClassNotFoundException is thrown.
{code:xml}
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.applicationhistoryservice.records.impl.pb.ApplicationHistoryDataPBImpl not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1619)
at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:56)
... 49 more
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-979">YARN-979</a>.
Major sub-task reported by Mayank Bansal and fixed by Mayank Bansal <br>
<b>[YARN-321] Add more APIs related to ApplicationAttempt and Container in ApplicationHistoryProtocol</b><br>
<blockquote>ApplicationHistoryProtocol should have the following APIs as well:
* getApplicationAttemptReport
* getApplicationAttempts
* getContainerReport
* getContainers
The corresponding request and response classes need to be added as well.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-978">YARN-978</a>.
Major sub-task reported by Mayank Bansal and fixed by Mayank Bansal <br>
<b>[YARN-321] Adding ApplicationAttemptReport and Protobuf implementation</b><br>
<blockquote>We don't have ApplicationAttemptReport and its Protobuf implementation. This adds them.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-975">YARN-975</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Add a file-system implementation for history-storage</b><br>
<blockquote>HDFS implementation should be a standard persistence strategy of history storage</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-974">YARN-974</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>RMContainer should collect more useful information to be recorded in Application-History</b><br>
<blockquote>To record the history of a container, users may be also interested in the following information:
1. Start Time
2. Stop Time
3. Diagnostic Information
4. URL to the Log File
5. Actually Allocated Resource
6. Actually Assigned Node
These should be remembered during the RMContainer's life cycle.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-967">YARN-967</a>.
Major sub-task reported by Devaraj K and fixed by Mayank Bansal <br>
<b>[YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-962">YARN-962</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Update application_history_service.proto</b><br>
<blockquote>1. Change its name to application_history_client.proto
2. Fix the incorrect proto reference.
3. Correct the dir in pom.xml</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-956">YARN-956</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen <br>
<b>[YARN-321] Add a testable in-memory HistoryStorage </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-955">YARN-955</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Mayank Bansal <br>
<b>[YARN-321] Implementation of ApplicationHistoryProtocol</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-954">YARN-954</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen <br>
<b>[YARN-321] History Service should create the webUI and wire it to HistoryStorage</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-953">YARN-953</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen <br>
<b>[YARN-321] Enable ResourceManager to write history data</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-947">YARN-947</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Defining the history data classes for the implementation of the reading/writing interface</b><br>
<blockquote>We need to define history data classes that have the exact fields to be stored. That way, the implementations don't need to duplicate the logic to extract the required information from RMApp, RMAppAttempt and RMContainer.
We use protobuf to define these classes, so that they can be serialized to and deserialized from bytes, which makes persistence easier.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-935">YARN-935</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>YARN-321 branch is broken due to applicationhistoryserver module's pom.xml</b><br>
<blockquote>The branch was created from branch-2, so hadoop-yarn-server-applicationhistoryserver/pom.xml should use 2.2.0-SNAPSHOT, not 3.0.0-SNAPSHOT. Otherwise, the sub-project cannot be built correctly because of the wrong dependency.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-934">YARN-934</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>HistoryStorage writer interface for Application History Server</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-930">YARN-930</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>Bootstrap ApplicationHistoryService module</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-713">YARN-713</a>.
Critical bug reported by Jason Lowe and fixed by Jian He (resourcemanager)<br>
<b>ResourceManager can exit unexpectedly if DNS is unavailable</b><br>
<blockquote>As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5813">MAPREDUCE-5813</a>.
Blocker bug reported by Gera Shegalov and fixed by Gera Shegalov (mrv2 , task)<br>
<b>YarnChild does not load job.xml with mapreduce.job.classloader=true </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5810">MAPREDUCE-5810</a>.
Major bug reported by Mit Desai and fixed by Akira AJISAKA (contrib/streaming)<br>
<b>TestStreamingTaskLog#testStreamingTaskLogWithHadoopCmd is failing</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5806">MAPREDUCE-5806</a>.
Major bug reported by Eugene Koifman and fixed by Varun Vasudev <br>
<b>Log4j settings in container-log4j.properties cannot be overridden </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5805">MAPREDUCE-5805</a>.
Major bug reported by Fengdong Yu and fixed by Akira AJISAKA (jobhistoryserver)<br>
<b>Unable to parse launch time from job history file</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5795">MAPREDUCE-5795</a>.
Major bug reported by Yesha Vora and fixed by Xuan Gong <br>
<b>Job should be marked as Failed if it is recovered from commit.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5794">MAPREDUCE-5794</a>.
Minor bug reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (test)<br>
<b>SliveMapper always uses default FileSystem.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5791">MAPREDUCE-5791</a>.
Major bug reported by Nikola Vujic and fixed by Nikola Vujic (client)<br>
<b>Shuffle phase is slow in Windows - FadviseFileRegion::transferTo does not read disks efficiently</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5789">MAPREDUCE-5789</a>.
Major bug reported by Rushabh S Shah and fixed by Rushabh S Shah (jobhistoryserver , webapps)<br>
<b>Average Reduce time is incorrect on Job Overview page</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5787">MAPREDUCE-5787</a>.
Critical sub-task reported by Rajesh Balamohan and fixed by Rajesh Balamohan (nodemanager)<br>
<b>Modify ShuffleHandler to support Keep-Alive</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5780">MAPREDUCE-5780</a>.
Minor bug reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (test)<br>
<b>SliveTest always uses default FileSystem</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5778">MAPREDUCE-5778</a>.
Major bug reported by Jason Lowe and fixed by Akira AJISAKA (jobhistoryserver)<br>
<b>JobSummary does not escape newlines in the job name</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5773">MAPREDUCE-5773</a>.
Blocker improvement reported by Gera Shegalov and fixed by Gera Shegalov (mr-am)<br>
<b>Provide dedicated MRAppMaster syslog length limit</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5770">MAPREDUCE-5770</a>.
Major bug reported by Yesha Vora and fixed by Jian He <br>
<b>Redirection from AM-URL is broken with HTTPS_ONLY policy</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5769">MAPREDUCE-5769</a>.
Major bug reported by Rohith and fixed by Rohith <br>
<b>Unregistration to RM should not be called if AM is crashed before registering with RM</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5768">MAPREDUCE-5768</a>.
Major bug reported by Zhijie Shen and fixed by Gera Shegalov <br>
<b>TestMRJobs.testContainerRollingLog fails on trunk</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5766">MAPREDUCE-5766</a>.
Minor bug reported by Ramya Sunil and fixed by Jian He (applicationmaster)<br>
<b>Ping messages from attempts should be moved to DEBUG</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5761">MAPREDUCE-5761</a>.
Trivial improvement reported by Yesha Vora and fixed by Jian He <br>
<b>Add a log message like "encrypted shuffle is ON" in nodemanager logs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5757">MAPREDUCE-5757</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe (client)<br>
<b>ConcurrentModificationException in JobControl.toList</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5754">MAPREDUCE-5754</a>.
Major improvement reported by Gera Shegalov and fixed by Gera Shegalov (jobhistoryserver , mr-am)<br>
<b>Preserve Job diagnostics in history</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5751">MAPREDUCE-5751</a>.
Major bug reported by Sangjin Lee and fixed by Sangjin Lee <br>
<b>MR app master fails to start in some cases if mapreduce.job.classloader is true</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5746">MAPREDUCE-5746</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe (jobhistoryserver)<br>
<b>Job diagnostics can implicate wrong task for a failed job</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5732">MAPREDUCE-5732</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Report proper queue when job has been automatically placed</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5699">MAPREDUCE-5699</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (applicationmaster)<br>
<b>Allow setting tags on MR jobs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5688">MAPREDUCE-5688</a>.
Major bug reported by Mit Desai and fixed by Mit Desai <br>
<b>TestStagingCleanup fails intermittently with JDK7</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5670">MAPREDUCE-5670</a>.
Minor bug reported by Jason Lowe and fixed by Chen He (mrv2)<br>
<b>CombineFileRecordReader should report progress when moving to the next file</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5570">MAPREDUCE-5570</a>.
Major bug reported by Jason Lowe and fixed by Rushabh S Shah (mr-am , mrv2)<br>
<b>Map task attempt with fetch failure has incorrect attempt finish time</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5553">MAPREDUCE-5553</a>.
Minor improvement reported by Paul Han and fixed by Paul Han (applicationmaster)<br>
<b>Add task state filters on Application/MRJob page for MR Application master </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5028">MAPREDUCE-5028</a>.
Critical bug reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>Maps fail when io.sort.mb is set to high value</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4052">MAPREDUCE-4052</a>.
Major bug reported by xieguiming and fixed by Jian He (job submission)<br>
<b>Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-2349">MAPREDUCE-2349</a>.
Major improvement reported by Joydeep Sen Sarma and fixed by Siddharth Seth (task)<br>
<b>speed up list[located]status calls from input formats</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6166">HDFS-6166</a>.
Blocker bug reported by Nathan Roberts and fixed by Nathan Roberts (balancer)<br>
<b>revisit balancer so_timeout </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6163">HDFS-6163</a>.
Minor bug reported by Fengdong Yu and fixed by Fengdong Yu (documentation)<br>
<b>Fix a minor bug in the HA upgrade document</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6157">HDFS-6157</a>.
Major bug reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Fix the entry point of OfflineImageViewer for hdfs.cmd</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6150">HDFS-6150</a>.
Minor improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)<br>
<b>Add inode id information in the logs to make debugging easier</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6140">HDFS-6140</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (webhdfs)<br>
<b>WebHDFS cannot create a file with spaces in the name after HA failover changes.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6138">HDFS-6138</a>.
Minor improvement reported by Sanjay Radia and fixed by Sanjay Radia (documentation)<br>
<b>User Guide for how to use viewfs with federation</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6135">HDFS-6135</a>.
Blocker bug reported by Jing Zhao and fixed by Jing Zhao (journal-node)<br>
<b>In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6131">HDFS-6131</a>.
Major bug reported by Jing Zhao and fixed by Jing Zhao (documentation)<br>
<b>Move HDFSHighAvailabilityWithNFS.apt.vm and HDFSHighAvailabilityWithQJM.apt.vm from Yarn to HDFS</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6130">HDFS-6130</a>.
Blocker bug reported by Fengdong Yu and fixed by Haohui Mai (namenode)<br>
<b>NPE when upgrading namenode from fsimages older than -32</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6129">HDFS-6129</a>.
Minor bug reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (datanode)<br>
<b>When a replica is not found for deletion, do not throw exception.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6127">HDFS-6127</a>.
Major bug reported by Arpit Gupta and fixed by Haohui Mai (ha)<br>
<b>WebHDFS tokens cannot be renewed in HA setup</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6124">HDFS-6124</a>.
Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas <br>
<b>Add final modifier to class members</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6123">HDFS-6123</a>.
Minor improvement reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (datanode)<br>
<b>Improve datanode error messages</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6120">HDFS-6120</a>.
Major improvement reported by Arpit Agarwal and fixed by Arpit Agarwal (namenode)<br>
<b>Fix and improve safe mode log messages</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6117">HDFS-6117</a>.
Minor bug reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)<br>
<b>Print file path information in FileNotFoundException</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6115">HDFS-6115</a>.
Minor bug reported by Vinayakumar B and fixed by Vinayakumar B (datanode)<br>
<b>flush() should be called for every append on block scan verification log</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6107">HDFS-6107</a>.
Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (datanode)<br>
<b>When a block can't be cached due to limited space on the DataNode, that block becomes uncacheable</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6106">HDFS-6106</a>.
Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
<b>Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6105">HDFS-6105</a>.
Major bug reported by Kihwal Lee and fixed by Haohui Mai <br>
<b>NN web UI for DN list loads the same jmx page multiple times.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6102">HDFS-6102</a>.
Blocker bug reported by Andrew Wang and fixed by Andrew Wang (namenode)<br>
<b>Lower the default maximum items per directory to fix PB fsimage loading</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6100">HDFS-6100</a>.
Major bug reported by Arpit Gupta and fixed by Haohui Mai (ha)<br>
<b>DataNodeWebHdfsMethods does not failover in HA mode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6099">HDFS-6099</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode)<br>
<b>HDFS file system limits not enforced on renames.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6097">HDFS-6097</a>.
Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)<br>
<b>zero-copy reads are incorrectly disabled on file offsets above 2GB</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6096">HDFS-6096</a>.
Minor bug reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (test)<br>
<b>TestWebHdfsTokens may timeout</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6094">HDFS-6094</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (namenode)<br>
<b>The same block can be counted twice towards safe mode threshold</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6090">HDFS-6090</a>.
Minor improvement reported by Akira AJISAKA and fixed by Akira AJISAKA (test)<br>
<b>Use MiniDFSCluster.Builder instead of deprecated constructors</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6089">HDFS-6089</a>.
Major bug reported by Arpit Gupta and fixed by Jing Zhao (ha)<br>
<b>Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6086">HDFS-6086</a>.
Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (datanode)<br>
<b>Fix a case where zero-copy or no-checksum reads were not allowed even when the block was cached</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6085">HDFS-6085</a>.
Major improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (namenode)<br>
<b>Improve CacheReplicationMonitor log messages a bit</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6084">HDFS-6084</a>.
Minor improvement reported by Travis Thompson and fixed by Travis Thompson (namenode)<br>
<b>Namenode UI - "Hadoop" logo link shouldn't go to hadoop homepage</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6080">HDFS-6080</a>.
Major improvement reported by Abin Shahab and fixed by Abin Shahab (nfs , performance)<br>
<b>Improve NFS gateway performance by making rtmax and wtmax configurable</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6079">HDFS-6079</a>.
Major bug reported by Andrew Wang and fixed by Andrew Wang (hdfs-client)<br>
<b>Timeout for getFileBlockStorageLocations does not work</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6078">HDFS-6078</a>.
Minor bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)<br>
<b>TestIncrementalBlockReports is flaky</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6077">HDFS-6077</a>.
Major bug reported by Arpit Gupta and fixed by Jing Zhao <br>
<b>running slive with webhdfs on secure HA cluster fails with unknown host exception</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6076">HDFS-6076</a>.
Minor sub-task reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (datanode , test)<br>
<b>SimulatedDataSet should not create DatanodeRegistration with namenode layout version and type</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6072">HDFS-6072</a>.
Major improvement reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Clean up dead code of FSImage</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6071">HDFS-6071</a>.
Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
<b>BlockReaderLocal doesn't return -1 on EOF when doing a zero-length read on a short file</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6070">HDFS-6070</a>.
Trivial improvement reported by Andrew Wang and fixed by Andrew Wang <br>
<b>Cleanup use of ReadStatistics in DFSInputStream</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6069">HDFS-6069</a>.
Trivial improvement reported by Andrew Wang and fixed by Chris Nauroth (namenode)<br>
<b>Quash stack traces when ACLs are disabled</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6068">HDFS-6068</a>.
Major bug reported by Andrew Wang and fixed by sathish (snapshots)<br>
<b>Disallow snapshot names that are also invalid directory names</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6067">HDFS-6067</a>.
Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)<br>
<b>TestPread.testMaxOutHedgedReadPool is flaky</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6065">HDFS-6065</a>.
Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)<br>
<b>HDFS zero-copy reads should return null on EOF when doing ZCR</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6064">HDFS-6064</a>.
Minor bug reported by Vinayakumar B and fixed by Vinayakumar B (datanode)<br>
<b>DFSConfigKeys.DFS_BLOCKREPORT_INTERVAL_MSEC_DEFAULT is not updated with latest block report interval of 6 hrs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6063">HDFS-6063</a>.
Minor bug reported by Colin Patrick McCabe and fixed by Chris Nauroth (test , tools)<br>
<b>TestAclCLI fails intermittently when running test 24: copyFromLocal</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6062">HDFS-6062</a>.
Minor bug reported by Jing Zhao and fixed by Jing Zhao <br>
<b>TestRetryCacheWithHA#testConcat is flaky</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6061">HDFS-6061</a>.
Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (datanode)<br>
<b>Allow dfs.datanode.shared.file.descriptor.path to contain multiple entries and fall back when needed</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6060">HDFS-6060</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (namenode)<br>
<b>NameNode should not check DataNode layout version</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6059">HDFS-6059</a>.
Major bug reported by Akira AJISAKA and fixed by Akira AJISAKA <br>
<b>TestBlockReaderLocal fails if native library is not available</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6058">HDFS-6058</a>.
Major bug reported by Vinayakumar B and fixed by Haohui Mai <br>
<b>Fix TestHDFSCLI failures after HADOOP-8691 change</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6057">HDFS-6057</a>.
Blocker bug reported by Eric Sirianni and fixed by Colin Patrick McCabe (hdfs-client)<br>
<b>DomainSocketWatcher.watcherThread should be marked as daemon thread</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6055">HDFS-6055</a>.
Major improvement reported by Suresh Srinivas and fixed by Chris Nauroth (namenode)<br>
<b>Change default configuration to limit file name length in HDFS</b><br>
<blockquote>The default configuration of HDFS now sets dfs.namenode.fs-limits.max-component-length to 255 for improved interoperability with other file system implementations. This limits each component of a file system path to a maximum of 255 bytes in UTF-8 encoding. Attempts to create new files that violate this rule will fail with an error. Existing files that violate the rule are not affected. Previously, dfs.namenode.fs-limits.max-component-length was set to 0 (ignored). If necessary, the value can be set back to 0 in the cluster's configuration to restore the old behavior.</blockquote></li>
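Because the limit in HDFS-6055 is counted in UTF-8 bytes rather than characters, a name shorter than 255 characters can still be rejected. The following is a minimal sketch of a client-side pre-check, not code from Hadoop: the class <code>ComponentLengthCheck</code> and its methods are hypothetical names, and treating a limit of 0 as "no limit" is an assumption based on the "0 (ignored)" wording above.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of Hadoop) that pre-checks path components. */
final class ComponentLengthCheck {
    /** Default of dfs.namenode.fs-limits.max-component-length described in the note above. */
    static final int DEFAULT_MAX_COMPONENT_LENGTH = 255;

    /**
     * Returns the first path component whose UTF-8 encoding is longer than
     * maxBytes, or null if every component fits.
     * Assumption: a limit of 0 (or less) disables the check.
     */
    static String firstTooLongComponent(String path, int maxBytes) {
        if (maxBytes <= 0) {
            return null;
        }
        for (String component : path.split("/")) {
            int utf8Length = component.getBytes(StandardCharsets.UTF_8).length;
            if (utf8Length > maxBytes) {
                return component;
            }
        }
        return null;
    }

    /** Repeats s count times; written out by hand to stay compatible with older JDKs. */
    static String repeat(String s, int count) {
        StringBuilder sb = new StringBuilder(s.length() * count);
        for (int i = 0; i < count; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 255 ASCII characters encode to exactly 255 bytes.
        String fits = "/user/" + repeat("a", 255);
        // 128 copies of U+00E9 are only 128 characters but 256 bytes in UTF-8.
        String tooLong = "/user/" + repeat("\u00e9", 128);

        System.out.println(firstTooLongComponent(fits, DEFAULT_MAX_COMPONENT_LENGTH) == null);
        System.out.println(firstTooLongComponent(tooLong, DEFAULT_MAX_COMPONENT_LENGTH) != null);
        System.out.println(firstTooLongComponent(tooLong, 0) == null);
    }
}
```

The NameNode remains the authority on whether a name is accepted; a check like this only helps a client fail early with a clearer message.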
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6053">HDFS-6053</a>.
Major bug reported by Jing Zhao and fixed by Jing Zhao (namenode)<br>
<b>Fix TestDecommissioningStatus and TestDecommission in branch-2</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6051">HDFS-6051</a>.
Blocker bug reported by Chris Nauroth and fixed by Colin Patrick McCabe (hdfs-client)<br>
<b>HDFS cannot run on Windows since short-circuit shared memory segment changes.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6047">HDFS-6047</a>.
Major bug reported by stack and fixed by stack <br>
<b>TestPread NPE inside DFSInputStream hedgedFetchBlockByteRange</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6046">HDFS-6046</a>.
Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)<br>
<b>add dfs.client.mmap.enabled</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6044">HDFS-6044</a>.
Minor improvement reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Add property for setting the NFS look up time for users</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6043">HDFS-6043</a>.
Major improvement reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Give HDFS daemons NFS3 and Portmap their own OPTS</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6040">HDFS-6040</a>.
Blocker sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)<br>
<b>fix DFSClient issue without libhadoop.so and some other ShortCircuitShm cleanups</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6039">HDFS-6039</a>.
Major bug reported by Yesha Vora and fixed by Chris Nauroth (namenode)<br>
<b>Uploading a File under a Dir with default acls throws "Duplicated ACLFeature"</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6038">HDFS-6038</a>.
Major sub-task reported by Haohui Mai and fixed by Jing Zhao (journal-node , namenode)<br>
<b>Allow JournalNode to handle editlog produced by new release with future layoutversion</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6033">HDFS-6033</a>.
Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (caching)<br>
<b>PBImageXmlWriter incorrectly handles processing cache directives</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6030">HDFS-6030</a>.
Trivial task reported by Yongjun Zhang and fixed by Yongjun Zhang <br>
<b>Remove an unused constructor in INode.java</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6028">HDFS-6028</a>.
Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode)<br>
<b>Print clearer error message when user attempts to delete required mask entry from ACL.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6025">HDFS-6025</a>.
Minor task reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (build)<br>
<b>Update findbugsExcludeFile.xml</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6018">HDFS-6018</a>.
Trivial improvement reported by Jing Zhao and fixed by Jing Zhao <br>
<b>Exception recorded in LOG when IPCLoggerChannel#close is called</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6008">HDFS-6008</a>.
Minor bug reported by Benoy Antony and fixed by Benoy Antony (namenode)<br>
<b>Namenode dead node link is giving HTTP error 500</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-6006">HDFS-6006</a>.
Trivial improvement reported by Akira AJISAKA and fixed by Akira AJISAKA (namenode)<br>
<b>Remove duplicate code in FSNameSystem#getFileInfo</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5988">HDFS-5988</a>.
Blocker bug reported by Andrew Wang and fixed by Andrew Wang (namenode)<br>
<b>Bad fsimage always generated after upgrade</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5986">HDFS-5986</a>.
Major improvement reported by Suresh Srinivas and fixed by Chris Nauroth (namenode)<br>
<b>Capture the number of blocks pending deletion on namenode webUI</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5982">HDFS-5982</a>.
Critical bug reported by Tassapol Athiapinya and fixed by Jing Zhao (namenode)<br>
<b>Need to update snapshot manager when applying editlog for deleting a snapshottable directory</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5981">HDFS-5981</a>.
Minor bug reported by Haohui Mai and fixed by Haohui Mai (tools)<br>
<b>PBImageXmlWriter generates malformed XML</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5979">HDFS-5979</a>.
Minor improvement reported by Andrew Wang and fixed by Andrew Wang <br>
<b>Typo and logger fix for fsimage PB code</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5973">HDFS-5973</a>.
Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)<br>
<b>add DomainSocket#shutdown method</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5962">HDFS-5962</a>.
Critical bug reported by Kihwal Lee and fixed by Akira AJISAKA <br>
<b>Mtime and atime are not persisted for symbolic links</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5961">HDFS-5961</a>.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee <br>
<b>OIV cannot load fsimages containing a symbolic link</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5959">HDFS-5959</a>.
Minor bug reported by Akira AJISAKA and fixed by Akira AJISAKA <br>
<b>Fix typo at section name in FSImageFormatProtobuf.java</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5956">HDFS-5956</a>.
Major sub-task reported by Akira AJISAKA and fixed by Akira AJISAKA (tools)<br>
<b>A file size is multiplied by the replication factor in 'hdfs oiv -p FileDistribution' option</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5953">HDFS-5953</a>.
Major test reported by Ted Yu and fixed by Akira AJISAKA <br>
<b>TestBlockReaderFactory fails if libhadoop.so has not been built</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5950">HDFS-5950</a>.
Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (datanode , hdfs-client)<br>
<b>The DFSClient and DataNode should use shared memory segments to communicate short-circuit information</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5949">HDFS-5949</a>.
Minor bug reported by Travis Thompson and fixed by Travis Thompson (namenode)<br>
<b>New Namenode UI when trying to download a file, the browser doesn't know the file name</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5948">HDFS-5948</a>.
Major bug reported by Andrew Wang and fixed by Haohui Mai <br>
<b>TestBackupNode flakes with port in use error</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5944">HDFS-5944</a>.
Major bug reported by zhaoyunjiong and fixed by zhaoyunjiong (namenode)<br>
<b>LeaseManager:findLeaseWithPrefixPath can't handle paths like /a/b/ correctly, causing SecondaryNameNode to fail to checkpoint</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5943">HDFS-5943</a>.
Major bug reported by Yesha Vora and fixed by Suresh Srinivas <br>
<b>'dfs.namenode.https-address.ns1' property is not used in federation setup</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5942">HDFS-5942</a>.
Minor sub-task reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation , tools)<br>
<b>Fix javadoc in OfflineImageViewer</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5941">HDFS-5941</a>.
Major bug reported by Haohui Mai and fixed by Haohui Mai (documentation , namenode)<br>
<b>add dfs.namenode.secondary.https-address and dfs.namenode.secondary.https-address in hdfs-default.xml</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5940">HDFS-5940</a>.
Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)<br>
<b>Minor cleanups to ShortCircuitReplica, FsDatasetCache, and DomainSocketWatcher</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5939">HDFS-5939</a>.
Major improvement reported by Yongjun Zhang and fixed by Yongjun Zhang (hdfs-client)<br>
<b>WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5938">HDFS-5938</a>.
Trivial sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)<br>
<b>Make BlockReaderFactory#BlockReaderPeer a static class</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5936">HDFS-5936</a>.
Major test reported by Andrew Wang and fixed by Binglin Chang (namenode , test)<br>
<b>MiniDFSCluster does not clean data left behind by SecondaryNameNode.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5935">HDFS-5935</a>.
Minor improvement reported by Travis Thompson and fixed by Travis Thompson (namenode)<br>
<b>New Namenode UI FS browser should throw smarter error messages</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5934">HDFS-5934</a>.
Minor bug reported by Travis Thompson and fixed by Travis Thompson (namenode)<br>
<b>New Namenode UI back button doesn't work as expected</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5929">HDFS-5929</a>.
Major improvement reported by Siqi Li and fixed by Siqi Li (federation)<br>
<b>Add Block pool % usage to HDFS federated nn page</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5922">HDFS-5922</a>.
Major bug reported by Aaron T. Myers and fixed by Arpit Agarwal (datanode)<br>
<b>DN heartbeat thread can get stuck in tight loop</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5915">HDFS-5915</a>.
Major bug reported by Haohui Mai and fixed by Haohui Mai (namenode)<br>
<b>Refactor FSImageFormatProtobuf to simplify cross section reads</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5913">HDFS-5913</a>.
Minor bug reported by Ted Yu and fixed by Brandon Li (nfs)<br>
<b>Nfs3Utils#getWccAttr() should check attr parameter against null</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5910">HDFS-5910</a>.
Major improvement reported by Benoy Antony and fixed by Benoy Antony (security)<br>
<b>Enhance DataTransferProtocol to allow per-connection choice of encryption/plain-text</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5904">HDFS-5904</a>.
Major bug reported by Mit Desai and fixed by Mit Desai <br>
<b>TestFileStatus fails intermittently on trunk and branch2</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5901">HDFS-5901</a>.
Major bug reported by Vinayakumar B and fixed by Vinayakumar B (namenode)<br>
<b>NameNode new UI doesn't support IE8 and IE9 on Windows 7</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5900">HDFS-5900</a>.
Major bug reported by Tassapol Athiapinya and fixed by Andrew Wang (caching)<br>
<b>Cannot set cache pool limit of "unlimited" via CacheAdmin</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5898">HDFS-5898</a>.
Major sub-task reported by Jing Zhao and fixed by Abin Shahab (nfs)<br>
<b>Allow NFS gateway to login/relogin from its kerberos keytab</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5895">HDFS-5895</a>.
Major bug reported by Tassapol Athiapinya and fixed by Tassapol Athiapinya (tools)<br>
<b>HDFS cacheadmin -listPools has exit_code of 1 when the command returns 0 result.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5893">HDFS-5893</a>.
Major bug reported by Yesha Vora and fixed by Haohui Mai <br>
<b>HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5892">HDFS-5892</a>.
Minor test reported by Ted Yu and fixed by <br>
<b>TestDeleteBlockPool fails in branch-2</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5891">HDFS-5891</a>.
Major bug reported by Haohui Mai and fixed by Haohui Mai (namenode , webhdfs)<br>
<b>webhdfs should not try connecting the DN during redirection</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5886">HDFS-5886</a>.
Major bug reported by Ted Yu and fixed by Brandon Li (nfs)<br>
<b>Potential null pointer dereference in RpcProgramNfs3#readlink()</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5882">HDFS-5882</a>.
Minor test reported by Jimmy Xiang and fixed by Jimmy Xiang <br>
<b>TestAuditLogs is flaky</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5881">HDFS-5881</a>.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee <br>
<b>Fix skip() of the short-circuit local reader (legacy).</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5879">HDFS-5879</a>.
Major bug reported by Gera Shegalov and fixed by Gera Shegalov (test)<br>
<b>Some TestHftpFileSystem tests do not close streams</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5868">HDFS-5868</a>.
Major sub-task reported by Taylor, Buddy and fixed by (datanode)<br>
<b>Make hsync implementation pluggable</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5866">HDFS-5866</a>.
Major sub-task reported by Akira AJISAKA and fixed by Akira AJISAKA (tools)<br>
<b>'-maxSize' and '-step' option fail in OfflineImageViewer</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5859">HDFS-5859</a>.
Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (datanode)<br>
<b>DataNode#checkBlockToken should check block tokens even if security is not enabled</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5857">HDFS-5857</a>.
Major bug reported by Mit Desai and fixed by Mit Desai <br>
<b>TestWebHDFS#testNamenodeRestart fails intermittently with NPE</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5856">HDFS-5856</a>.
Minor bug reported by Josh Elser and fixed by Josh Elser (datanode)<br>
<b>DataNode.checkDiskError might throw NPE</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5847">HDFS-5847</a>.
Major sub-task reported by Haohui Mai and fixed by Jing Zhao <br>
<b>Consolidate INodeReference into a separate section</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5846">HDFS-5846</a>.
Major bug reported by Nikola Vujic and fixed by Nikola Vujic (namenode)<br>
<b>Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5843">HDFS-5843</a>.
Major bug reported by Laurent Goujon and fixed by Laurent Goujon (datanode)<br>
<b>DFSClient.getFileChecksum() throws IOException if checksum is disabled</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5840">HDFS-5840</a>.
Blocker bug reported by Aaron T. Myers and fixed by Jing Zhao (ha , journal-node , namenode)<br>
<b>Follow-up to HDFS-5138 to improve error handling during partial upgrade failures</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5828">HDFS-5828</a>.
Major bug reported by Taylor, Buddy and fixed by Taylor, Buddy (namenode)<br>
<b>BlockPlacementPolicyWithNodeGroup can place multiple replicas on the same node group when dfs.namenode.avoid.write.stale.datanode is true</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5821">HDFS-5821</a>.
Major bug reported by Gera Shegalov and fixed by Gera Shegalov (test)<br>
<b>TestHDFSCLI fails for user names with the dash character</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5810">HDFS-5810</a>.
Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)<br>
<b>Unify mmap cache and short-circuit file descriptor cache</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5807">HDFS-5807</a>.
Major bug reported by Mit Desai and fixed by Chen He (test)<br>
<b>TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5804">HDFS-5804</a>.
Major sub-task reported by Abin Shahab and fixed by Abin Shahab (nfs)<br>
<b>HDFS NFS Gateway fails to mount and proxy when using Kerberos</b><br>
<blockquote>Fixes NFS on Kerberized cluster.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5803">HDFS-5803</a>.
Major bug reported by Mit Desai and fixed by Chen He <br>
<b>TestBalancer.testBalancer0 fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5791">HDFS-5791</a>.
Major bug reported by Brandon Li and fixed by Haohui Mai (test)<br>
<b>TestHttpsFileSystem should use a random port to avoid binding error during testing</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5790">HDFS-5790</a>.
Major bug reported by Todd Lipcon and fixed by Todd Lipcon (namenode , performance)<br>
<b>LeaseManager.findPath is very slow when many leases need recovery</b><br>
<blockquote>Committed to branch-2 and trunk.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5781">HDFS-5781</a>.
Minor improvement reported by Jing Zhao and fixed by Jing Zhao (namenode)<br>
<b>Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5780">HDFS-5780</a>.
Major bug reported by Mit Desai and fixed by Mit Desai <br>
<b>TestRBWBlockInvalidation times out intermittently on branch-2</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5776">HDFS-5776</a>.
Major improvement reported by Liang Xie and fixed by Liang Xie (hdfs-client)<br>
<b>Support 'hedged' reads in DFSClient</b><br>
<blockquote>If a read from a block is slow, the client starts a parallel, 'hedged' read against a different block replica. It takes the result of whichever read returns first and cancels the outstanding read. Hedged reads help rein in outliers, such as the occasional read that takes a long time because it hit a bad patch on the disk.<br>
This feature is off by default. To enable it, set <code>dfs.client.hedged.read.threadpool.size</code> to a positive number. The thread pool size is the number of threads dedicated to running these concurrent hedged reads in your client.<br>
Then set <code>dfs.client.hedged.read.threshold.millis</code> to the number of milliseconds to wait before starting a hedged read. For example, if you set this property to 10 and a read has not returned within 10 milliseconds, the client starts a new read against a different block replica.<br>
This feature emits new metrics:
<ul>
<li>hedgedReadOps</li>
<li>hedgedReadOpsWin: how many times the hedged read 'beat' the original read</li>
<li>hedgedReadOpsInCurThread: how many times a hedged read had to run in the current thread because the pool sized by dfs.client.hedged.read.threadpool.size was already fully in use</li>
</ul></blockquote></li>
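The hedged-read pattern described for HDFS-5776 can be illustrated with plain <code>java.util.concurrent</code> primitives. This is a minimal sketch of the general technique, not the DFSClient implementation: the class <code>HedgedReadSketch</code>, the method <code>hedgedRead</code>, and the replica names are hypothetical, and the two <code>Callable</code> arguments stand in for reads against two different block replicas.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of a hedged read; not taken from DFSClient. */
final class HedgedReadSketch {

    /**
     * Starts the primary read. If it has not finished within thresholdMillis,
     * starts the hedge read as well and returns whichever finishes first.
     * The pool needs at least two threads, otherwise the hedge would queue
     * behind the slow primary. Checked exceptions are wrapped to keep the
     * sketch short.
     */
    static <T> T hedgedRead(ExecutorService pool, Callable<T> primary,
                            Callable<T> hedge, long thresholdMillis) {
        CompletionService<T> completed = new ExecutorCompletionService<>(pool);
        Future<T> first = completed.submit(primary);
        Future<T> second = null;
        try {
            // Wait up to the threshold for the primary read alone.
            Future<T> winner = completed.poll(thresholdMillis, TimeUnit.MILLISECONDS);
            if (winner == null) {
                // Primary is slow: start the hedge and take the first to finish.
                second = completed.submit(hedge);
                winner = completed.take();
            }
            return winner.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("read failed", e.getCause());
        } finally {
            // Cancel whatever is still outstanding; a no-op for finished tasks.
            first.cancel(true);
            if (second != null) {
                second.cancel(true);
            }
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<String> slowReplica = () -> {
                Thread.sleep(500); // simulates a read stuck on a slow disk
                return "replica-1";
            };
            Callable<String> fastReplica = () -> "replica-2";
            System.out.println(hedgedRead(pool, slowReplica, fastReplica, 10));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In this sketch a failure of the first task to complete is reported immediately instead of waiting for the other read; a production client would likely want to fall back to the remaining read in that case.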
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5775">HDFS-5775</a>.
Major improvement reported by Haohui Mai and fixed by Haohui Mai (namenode)<br>
<b>Consolidate the code for serialization in CacheManager</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5768">HDFS-5768</a>.
Major improvement reported by Haohui Mai and fixed by Haohui Mai (namenode)<br>
<b>Consolidate the serialization code in DelegationTokenSecretManager</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5767">HDFS-5767</a>.
Blocker bug reported by Yongjun Zhang and fixed by Yongjun Zhang (nfs)<br>
<b>NFS implementation assumes userName userId mapping to be unique, which is not true sometimes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5759">HDFS-5759</a>.
Major bug reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Web UI does not show up during the period of loading FSImage</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5746">HDFS-5746</a>.
Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (datanode , hdfs-client)<br>
<b>add ShortCircuitSharedMemorySegment</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5742">HDFS-5742</a>.
Minor bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)<br>
<b>DatanodeCluster (mini cluster of DNs) fails to start</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5726">HDFS-5726</a>.
Minor sub-task reported by Jing Zhao and fixed by Jing Zhao (namenode)<br>
<b>Fix compilation error in AbstractINodeDiff for JDK7</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5716">HDFS-5716</a>.
Major bug reported by Haohui Mai and fixed by Haohui Mai (webhdfs)<br>
<b>Allow WebHDFS to use pluggable authentication filter</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5715">HDFS-5715</a>.
Major sub-task reported by Jing Zhao and fixed by Jing Zhao (namenode)<br>
<b>Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5709">HDFS-5709</a>.
Major improvement reported by Andrew Wang and fixed by Andrew Wang (namenode)<br>
<b>Improve NameNode upgrade with existing reserved paths and path components</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5705">HDFS-5705</a>.
Major bug reported by Ted Yu and fixed by Ted Yu (datanode)<br>
<b>TestSecondaryNameNodeUpgrade#testChangeNsIDFails may fail due to ConcurrentModificationException</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5698">HDFS-5698</a>.
Major improvement reported by Haohui Mai and fixed by Haohui Mai (namenode)<br>
<b>Use protobuf to serialize / deserialize FSImage</b><br>
<blockquote>Use protobuf to serialize/deserialize the FSImage.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5672">HDFS-5672</a>.
Major test reported by Ted Yu and fixed by Jing Zhao (namenode)<br>
<b>TestHASafeMode#testSafeBlockTracking fails in trunk</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5647">HDFS-5647</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai (namenode)<br>
<b>Merge INodeDirectory.Feature and INodeFile.Feature</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5638">HDFS-5638</a>.
Major sub-task reported by Chris Nauroth and fixed by Vinayakumar B (hdfs-client)<br>
<b>HDFS implementation of FileContext API for ACLs.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5632">HDFS-5632</a>.
Major sub-task reported by Jing Zhao and fixed by Jing Zhao (namenode)<br>
<b>Add Snapshot feature to INodeDirectory</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5626">HDFS-5626</a>.
Major bug reported by Stephen Chu and fixed by Colin Patrick McCabe (caching)<br>
<b>dfsadmin -report shows incorrect cache values</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5554">HDFS-5554</a>.
Major sub-task reported by Jing Zhao and fixed by Jing Zhao (namenode)<br>
<b>Add Snapshot Feature to INodeFile</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5537">HDFS-5537</a>.
Major sub-task reported by Jing Zhao and fixed by Jing Zhao (namenode , snapshots)<br>
<b>Remove FileWithSnapshot interface</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5535">HDFS-5535</a>.
Major new feature reported by Nathan Roberts and fixed by Tsz Wo Nicholas Sze (datanode , ha , hdfs-client , namenode)<br>
<b>Umbrella jira for improved HDFS rolling upgrades</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5531">HDFS-5531</a>.
Minor sub-task reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (namenode)<br>
<b>Combine the getNsQuota() and getDsQuota() methods in INode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5516">HDFS-5516</a>.
Major bug reported by Chris Nauroth and fixed by Miodrag Radulovic (webhdfs)<br>
<b>WebHDFS does not require user name when anonymous http requests are disallowed.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5492">HDFS-5492</a>.
Minor bug reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)<br>
<b>Port HDFS-2069 (Incorrect default trash interval in the docs) to trunk</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5483">HDFS-5483</a>.
Major sub-task reported by Arpit Agarwal and fixed by Arpit Agarwal (namenode)<br>
<b>NN should gracefully handle multiple block replicas on same DN</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5339">HDFS-5339</a>.
Major bug reported by Stephen Chu and fixed by Haohui Mai (webhdfs)<br>
<b>WebHDFS URI does not accept logical nameservices when security is enabled</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5321">HDFS-5321</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Clean up the HTTP-related configuration in HDFS</b><br>
<blockquote>dfs.http.port and dfs.https.port are removed. Filesystem clients, such as WebHdfsFileSystem, now use fixed default ports (50070 for http and 50470 for https) instead of configurable ones.
To access a file system that runs on a non-default port, specify the port explicitly in the URI.</blockquote></li>
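<li> <b>Sketch of default-port resolution described for HDFS-5321</b><br>
<blockquote>The port rule described for HDFS-5321 can be illustrated with a small Java sketch: use the port given in the URI if there is one, otherwise fall back to the fixed default. This is not the WebHdfsFileSystem code. The class and method names are hypothetical, and mapping the webhdfs scheme to the http default and the swebhdfs scheme to the https default is an assumption made for the example.

```java
import java.net.URI;

public class WebHdfsPortSketch {
    // Fixed defaults quoted in the release note.
    static final int DEFAULT_HTTP_PORT = 50070;
    static final int DEFAULT_HTTPS_PORT = 50470;

    /** Returns the explicit port from the URI, or the fixed default for its scheme. */
    public static int resolvePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        // Assumption: swebhdfs means https; everything else is treated as http.
        return "swebhdfs".equals(uri.getScheme()) ? DEFAULT_HTTPS_PORT : DEFAULT_HTTP_PORT;
    }
}
```

For example, a URI such as webhdfs://nn.example.com:8080/tmp (a made-up host) resolves to port 8080, while the same URI without a port resolves to 50070.</blockquote></li>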
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5318">HDFS-5318</a>.
Major improvement reported by Eric Sirianni and fixed by (namenode)<br>
<b>Support read-only and read-write paths to shared replicas</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5286">HDFS-5286</a>.
Major sub-task reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (namenode)<br>
<b>Flatten INodeDirectory hierarchy: add DirectoryWithQuotaFeature</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5285">HDFS-5285</a>.
Major sub-task reported by Tsz Wo Nicholas Sze and fixed by Jing Zhao (namenode)<br>
<b>Flatten INodeFile hierarchy: Add UnderConstruction Feature</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5244">HDFS-5244</a>.
Major bug reported by Jinghui Wang and fixed by Jinghui Wang (test)<br>
<b>TestNNStorageRetentionManager#testPurgeMultipleDirs fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5167">HDFS-5167</a>.
Minor sub-task reported by Jing Zhao and fixed by Tsuyoshi OZAWA (ha , namenode)<br>
<b>Add metrics about the NameNode retry cache</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5153">HDFS-5153</a>.
Major improvement reported by Arpit Agarwal and fixed by Arpit Agarwal (datanode)<br>
<b>Datanode should send block reports for each storage in a separate message</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5138">HDFS-5138</a>.
Blocker bug reported by Kihwal Lee and fixed by Aaron T. Myers <br>
<b>Support HDFS upgrade in HA</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5064">HDFS-5064</a>.
Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (ha , namenode)<br>
<b>Standby checkpoints should not block concurrent readers</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4911">HDFS-4911</a>.
Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
<b>Reduce PeerCache timeout to be commensurate with dfs.datanode.socket.reuse.keepalive</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4858">HDFS-4858</a>.
Minor bug reported by Jagane Sundar and fixed by Henry Wang (datanode)<br>
<b>HDFS DataNode to NameNode RPC should timeout</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4685">HDFS-4685</a>.
Major new feature reported by Sachin Jose and fixed by Chris Nauroth (hdfs-client , namenode , security)<br>
<b>Implementation of ACLs in HDFS</b><br>
<blockquote>HDFS now supports ACLs (Access Control Lists). ACLs can specify fine-grained file permissions for specific named users or named groups.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4564">HDFS-4564</a>.
Blocker sub-task reported by Daryn Sharp and fixed by Daryn Sharp (webhdfs)<br>
<b>Webhdfs returns incorrect http response codes for denied operations</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4370">HDFS-4370</a>.
Major improvement reported by Konstantin Shvachko and fixed by Chu Tong (datanode)<br>
<b>Fix typo Blanacer in DataNode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4200">HDFS-4200</a>.
Major improvement reported by Suresh Srinivas and fixed by Andrew Wang (datanode)<br>
<b>Reduce the size of synchronized sections in PacketResponder </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3969">HDFS-3969</a>.
Major bug reported by Todd Lipcon and fixed by Todd Lipcon (hdfs-client)<br>
<b>Small bug fixes and improvements for disk locations API</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3405">HDFS-3405</a>.
Major improvement reported by Aaron T. Myers and fixed by Vinayakumar B <br>
<b>Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3128">HDFS-3128</a>.
Minor bug reported by Eli Collins and fixed by Andrew Wang (test)<br>
<b>Unit tests should not use a test root in /tmp</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10450">HADOOP-10450</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (io , native)<br>
<b>Build zlib native code bindings in hadoop.dll for Windows.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10449">HADOOP-10449</a>.
Minor sub-task reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (security)<br>
<b>Fix the javac warnings in the security packages.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10442">HADOOP-10442</a>.
Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee <br>
<b>Group look-up can cause segmentation fault when certain JNI-based mapping module is used.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10441">HADOOP-10441</a>.
Blocker bug reported by Jing Zhao and fixed by Jing Zhao (metrics)<br>
<b>Namenode metric "rpc.RetryCache/NameNodeRetryCache.CacheHit" can't be correctly processed by Ganglia</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10440">HADOOP-10440</a>.
Major bug reported by guodongdong and fixed by guodongdong (fs)<br>
<b>HarFsInputStream of HarFileSystem, when reading data, computing the position has bug</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10437">HADOOP-10437</a>.
Minor sub-task reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (conf , util)<br>
<b>Fix the javac warnings in the conf and the util package</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10425">HADOOP-10425</a>.
Critical bug reported by Brandon Li and fixed by Tsz Wo Nicholas Sze (fs)<br>
<b>Incompatible behavior of LocalFileSystem:getContentSummary</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10423">HADOOP-10423</a>.
Minor improvement reported by Chris Nauroth and fixed by Chris Nauroth (documentation)<br>
<b>Clarify compatibility policy document for combination of new client and old server.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10422">HADOOP-10422</a>.
Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (ipc)<br>
<b>Remove redundant logging of RPC retry attempts.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10407">HADOOP-10407</a>.
Minor sub-task reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (ipc)<br>
<b>Fix the javac warnings in the ipc package.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10399">HADOOP-10399</a>.
Major sub-task reported by Chris Nauroth and fixed by Vinayakumar B (fs)<br>
<b>FileContext API for ACLs.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10395">HADOOP-10395</a>.
Minor bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)<br>
<b>TestCallQueueManager is flaky</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10394">HADOOP-10394</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)<br>
<b>TestAuthenticationFilter is flaky</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10393">HADOOP-10393</a>.
Minor sub-task reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (security)<br>
<b>Fix hadoop-auth javac warnings</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10386">HADOOP-10386</a>.
Minor improvement reported by Arpit Gupta and fixed by Haohui Mai (ha)<br>
<b>Log proxy hostname in various exceptions being thrown in a HA setup</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10383">HADOOP-10383</a>.
Major improvement reported by Enis Soztutar and fixed by Enis Soztutar <br>
<b>InterfaceStability annotations should have RetentionPolicy.RUNTIME</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10379">HADOOP-10379</a>.
Major improvement reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Protect authentication cookies with the HttpOnly and Secure flags</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10374">HADOOP-10374</a>.
Major improvement reported by Enis Soztutar and fixed by Enis Soztutar <br>
<b>InterfaceAudience annotations should have RetentionPolicy.RUNTIME</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10368">HADOOP-10368</a>.
Minor bug reported by Ted Yu and fixed by Tsuyoshi OZAWA (util)<br>
<b>InputStream is not closed in VersionInfo ctor</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10355">HADOOP-10355</a>.
Major bug reported by Akira AJISAKA and fixed by Haohui Mai <br>
<b>TestLoadGenerator#testLoadGenerator fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10353">HADOOP-10353</a>.
Major bug reported by Tudor Scurtu and fixed by Tudor Scurtu (fs)<br>
<b>FsUrlStreamHandlerFactory is not thread safe</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10348">HADOOP-10348</a>.
Major improvement reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Deprecate hadoop.ssl.configuration in branch-2, and remove it in trunk</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10346">HADOOP-10346</a>.
Blocker bug reported by Jason Lowe and fixed by Jason Lowe (security)<br>
<b>Deadlock while logging tokens</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10343">HADOOP-10343</a>.
Minor improvement reported by Arpit Gupta and fixed by Arpit Gupta <br>
<b>Change info to debug log in LossyRetryInvocationHandler</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10338">HADOOP-10338</a>.
Major bug reported by Andrew Wang and fixed by Colin Patrick McCabe <br>
<b>Cannot get the FileStatus of the root inode from the new Globber</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10337">HADOOP-10337</a>.
Major bug reported by Liang Xie and fixed by Liang Xie (metrics)<br>
<b>ConcurrentModificationException from MetricsDynamicMBeanBase.createMBeanInfo()</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10333">HADOOP-10333</a>.
Trivial improvement reported by Ren&#233; Nyffenegger and fixed by Ren&#233; Nyffenegger <br>
<b>Fix grammatical error in overview.html document</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10330">HADOOP-10330</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)<br>
<b>TestFrameDecoder fails if it cannot bind port 12345</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10328">HADOOP-10328</a>.
Major bug reported by Arpit Gupta and fixed by Haohui Mai (tools)<br>
<b>loadGenerator exit code is not reliable</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10327">HADOOP-10327</a>.
Blocker bug reported by Vinayakumar B and fixed by Vinayakumar B (native)<br>
<b>Trunk windows build broken after HDFS-5746</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10326">HADOOP-10326</a>.
Major bug reported by Manuel DE FERRAN and fixed by bc Wong (security)<br>
<b>M/R jobs can not access S3 if Kerberos is enabled</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10320">HADOOP-10320</a>.
Trivial bug reported by Ren&#233; Nyffenegger and fixed by Ren&#233; Nyffenegger (documentation)<br>
<b>Javadoc in InterfaceStability.java lacks final &lt;/ul&gt;</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10314">HADOOP-10314</a>.
Major bug reported by Kihwal Lee and fixed by Rushabh S Shah <br>
<b>The ls command help still shows outdated 0.16 format.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10301">HADOOP-10301</a>.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (security)<br>
<b>AuthenticationFilter should return Forbidden for failed authentication</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10295">HADOOP-10295</a>.
Major improvement reported by Jing Zhao and fixed by Jing Zhao (tools/distcp)<br>
<b>Allow distcp to automatically identify the checksum type of source files and use it for the target</b><br>
<blockquote>Adds a distcp option to preserve the checksum type of the source files. Pass "-pc" on the distcp command line to preserve the checksum type.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10285">HADOOP-10285</a>.
Major sub-task reported by Chris Li and fixed by <br>
<b>Admin interface to swap callqueue at runtime</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10280">HADOOP-10280</a>.
Major sub-task reported by Chris Li and fixed by Chris Li <br>
<b>Make Schedulables return a configurable identity of user or group</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10278">HADOOP-10278</a>.
Major sub-task reported by Chris Li and fixed by Chris Li (ipc)<br>
<b>Refactor to make CallQueue pluggable</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10249">HADOOP-10249</a>.
Major bug reported by Dilli Arumugam and fixed by Dilli Arumugam <br>
<b>LdapGroupsMapping should trim ldap password read from file</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10221">HADOOP-10221</a>.
Major improvement reported by Benoy Antony and fixed by Benoy Antony (security)<br>
<b>Add a plugin to specify SaslProperties for RPC protocol based on connection properties</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10211">HADOOP-10211</a>.
Major improvement reported by Benoy Antony and fixed by Benoy Antony (security)<br>
<b>Enable RPC protocol to negotiate SASL-QOP values between clients and servers</b><br>
<blockquote>The hadoop.rpc.protection configuration property previously supported specifying a single value: one of authentication, integrity or privacy. An unrecognized value was silently assumed to mean authentication. This configuration property now accepts a comma-separated list of any of the 3 values, and unrecognized values are rejected with an error. Existing configurations containing an invalid value must be corrected. If the property is empty or not specified, authentication is assumed. </blockquote></li>
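<li> <b>Sketch of the hadoop.rpc.protection validation described for HADOOP-10211</b><br>
<blockquote>The rules described for HADOOP-10211 (a comma-separated list is accepted, unrecognized values are rejected, and an empty or unset property means authentication) can be illustrated with a self-contained Java sketch. This is not Hadoop's parsing code. The class and method names are hypothetical, and the translation to the Java SASL QOP strings auth, auth-int and auth-conf is included as an assumption about how the setting is consumed.

```java
import java.util.Locale;

public class RpcProtectionSketch {
    /** The three documented values, each paired with a SASL QOP string. */
    public enum Protection {
        AUTHENTICATION("auth"),
        INTEGRITY("auth-int"),
        PRIVACY("auth-conf");

        final String saslQop;

        Protection(String saslQop) {
            this.saslQop = saslQop;
        }
    }

    /**
     * Converts a configured value such as "authentication,privacy" into a
     * comma-separated SASL QOP list. Throws IllegalArgumentException for
     * any unrecognized entry.
     */
    public static String toSaslQop(String configured) {
        // Empty or unset: authentication is assumed.
        if (configured == null || configured.trim().isEmpty()) {
            return Protection.AUTHENTICATION.saslQop;
        }
        StringBuilder qop = new StringBuilder();
        for (String token : configured.split(",")) {
            // valueOf rejects anything that is not one of the three names.
            Protection p = Protection.valueOf(token.trim().toUpperCase(Locale.ROOT));
            if (qop.length() != 0) {
                qop.append(',');
            }
            qop.append(p.saslQop);
        }
        return qop.toString();
    }
}
```

Rejecting unknown entries up front, rather than silently falling back to authentication, is the behaviour change the note calls out: a misspelled value in an existing configuration surfaces as an error instead of quietly weakening protection.</blockquote></li>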
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10191">HADOOP-10191</a>.
Blocker bug reported by Gera Shegalov and fixed by Gera Shegalov (viewfs)<br>
<b>Missing executable permission on viewfs internal dirs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10184">HADOOP-10184</a>.
Major new feature reported by Chris Nauroth and fixed by Chris Nauroth (fs , security)<br>
<b>Hadoop Common changes required to support HDFS ACLs.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10139">HADOOP-10139</a>.
Major improvement reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)<br>
<b>Update and improve the Single Cluster Setup document</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10085">HADOOP-10085</a>.
Blocker bug reported by Karthik Kambatla and fixed by Steve Loughran <br>
<b>CompositeService should allow adding services while being inited</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10070">HADOOP-10070</a>.
Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (security)<br>
<b>RPC client doesn't use per-connection conf to determine server's expected Kerberos principal name</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10015">HADOOP-10015</a>.
Minor bug reported by Haohui Mai and fixed by Nicolas Liochon (security)<br>
<b>UserGroupInformation prints out excessive ERROR warnings</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9525">HADOOP-9525</a>.
Major test reported by Ivan Mitic and fixed by Ivan Mitic (test , util)<br>
<b>Add tests that validate winutils chmod behavior on folders</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9454">HADOOP-9454</a>.
Major improvement reported by Jordan Mendelson and fixed by Akira AJISAKA (fs/s3)<br>
<b>Support multipart uploads for s3native</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8691">HADOOP-8691</a>.
Minor improvement reported by Jason Lowe and fixed by Daryn Sharp (fs)<br>
<b>FsShell can print "Found xxx items" unnecessarily often</b><br>
<blockquote>The `ls` command only prints "Found foo items" once when listing the directories recursively.</blockquote></li>
</ul>
</body></html>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hadoop 2.3.0 Release Notes</title>
<STYLE type="text/css">
H1 {font-family: sans-serif}
H2 {font-family: sans-serif; margin-left: 7mm}
TABLE {margin-left: 7mm}
</STYLE>
</head>
<body>
<h1>Hadoop 2.3.0 Release Notes</h1>
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
<a name="changes"/>
<h2>Changes since Hadoop 2.2.0</h2>
<ul>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1642">YARN-1642</a>.
Blocker sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>RMDTRenewer#getRMClient should use ClientRMProxy</b><br>
<blockquote>RMDTRenewer#getRMClient gets a proxy to the RM in the conf directly instead of going through ClientRMProxy.
{code}
final YarnRPC rpc = YarnRPC.create(conf);
return (ApplicationClientProtocol)rpc.getProxy(ApplicationClientProtocol.class, addr, conf);
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1630">YARN-1630</a>.
Major bug reported by Aditya Acharya and fixed by Aditya Acharya (client)<br>
<b>Introduce timeout for async polling operations in YarnClientImpl</b><br>
<blockquote>I ran an MR2 application that would have been long running, and killed it programmatically using a YarnClient. The app was killed, but the client hung forever. The message that I saw, which spammed the logs, was "Watiting for application application_1389036507624_0018 to be killed."
The RM log indicated that the app had indeed transitioned from RUNNING to KILLED, but for some reason future responses to the RPC to kill the application did not indicate that the app had been terminated.
I tracked this down to YarnClientImpl.java, and though I was unable to reproduce the bug, I wrote a patch to introduce a bound on the number of times that YarnClientImpl retries the RPC before giving up.</blockquote></li>
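<li> <b>Sketch of bounded polling as described for YARN-1630</b><br>
<blockquote>The fix described for YARN-1630 puts a bound on how long the client keeps polling. A minimal Java sketch of that idea follows. It is not the YarnClientImpl patch: the class, interface and parameter names are hypothetical, and it bounds the wait by elapsed time, whereas the description above speaks of bounding the number of retries.

```java
public class BoundedPollSketch {
    /** Hypothetical check, for example "has the application been killed yet?". */
    public interface Condition {
        boolean isMet();
    }

    /**
     * Polls the condition until it is met or the timeout elapses.
     * Returns true if the condition was met, false if the caller gave up.
     */
    public static boolean pollUntil(Condition condition, long timeoutMillis,
                                    long intervalMillis) throws InterruptedException {
        long start = System.nanoTime();
        while (!condition.isMet()) {
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
            if (elapsedMillis >= timeoutMillis) {
                // Give up instead of looping (and logging) forever.
                return false;
            }
            Thread.sleep(intervalMillis);
        }
        return true;
    }
}
```

A caller would typically turn a false return value into an exception or an error message, so that a missing state transition on the server side shows up as a bounded failure rather than a hung client.</blockquote></li>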
<li> <a href="https://issues.apache.org/jira/browse/YARN-1629">YARN-1629</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer</b><br>
<blockquote>This can occur when the second-to-last app in a queue's pending app list is made runnable. The app is pulled out from under the iterator. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1628">YARN-1628</a>.
Major bug reported by Mit Desai and fixed by Vinod Kumar Vavilapalli <br>
<b>TestContainerManagerSecurity fails on trunk</b><br>
<blockquote>The test fails with the following error:
{noformat}
java.lang.IllegalArgumentException: java.net.UnknownHostException: InvalidHost
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145)
at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136)
at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:144)
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1624">YARN-1624</a>.
Major bug reported by Aditya Acharya and fixed by Aditya Acharya (scheduler)<br>
<b>QueuePlacementPolicy format is not easily readable via a JAXB parser</b><br>
<blockquote>The current format for specifying queue placement rules in the fair scheduler allocations file does not lend itself to easy parsing via a JAXB parser. In particular, relying on the tag name to encode information about which rule to use makes it very difficult for an xsd-based JAXB parser to preserve the order of the rules, which is essential.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1623">YARN-1623</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Include queue name in RegisterApplicationMasterResponse</b><br>
<blockquote>This provides the YARN change necessary to support MAPREDUCE-5732.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1618">YARN-1618</a>.
Blocker sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>Fix invalid RMApp transition from NEW to FINAL_SAVING</b><br>
<blockquote>YARN-891 augments the RMStateStore to store information on completed applications. In the process, it adds transitions from NEW to FINAL_SAVING. This leads to the RM trying to update entries in the state-store that do not exist. On ZKRMStateStore, this leads to the RM crashing.
Previous description:
ZKRMStateStore fails to handle updates to znodes that don't exist. For instance, this can happen when an app transitions from NEW to FINAL_SAVING. In these cases, the store should create the missing znode and handle the update.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1616">YARN-1616</a>.
Trivial improvement reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>RMFatalEventDispatcher should log the cause of the event</b><br>
<blockquote>RMFatalEventDispatcher#handle() logs the receipt of an event and its type, but leaves out the cause. The cause captures why the event was raised and would help debugging issues. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1608">YARN-1608</a>.
Trivial bug reported by Karthik Kambatla and fixed by Karthik Kambatla (nodemanager)<br>
<b>LinuxContainerExecutor has a few DEBUG messages at INFO level</b><br>
<blockquote>LCE has a few INFO level log messages meant to be at debug level. In fact, they are logged both at INFO and DEBUG. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1607">YARN-1607</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>TestRM expects the capacity scheduler</b><br>
<blockquote>We should either explicitly set the Capacity Scheduler or make it scheduler-agnostic</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1603">YARN-1603</a>.
Trivial bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Remove two *.orig files which were unexpectedly committed</b><br>
<blockquote>FairScheduler.java.orig and TestFifoScheduler.java.orig</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1601">YARN-1601</a>.
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur <br>
<b>3rd party JARs are missing from hadoop-dist output</b><br>
<blockquote>With the build changes of YARN-888, we are leaving out all 3rd party JARs used directly by YARN under /share/hadoop/yarn/lib/.
We did not notice this when running minicluster because they all happen to be in the classpath from hadoop-common and hadoop-yarn.
As 3rd party JARs are not 'public' interfaces, we cannot rely on them being provided to YARN by common and hdfs (i.e., if common and hdfs stop using a 3rd party dependency that YARN uses, this would break YARN if YARN does not pull that dependency explicitly).
Also, this will break bigtop hadoop build when they move to use branch-2 as they expect to find jars in /share/hadoop/yarn/lib/</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1600">YARN-1600</a>.
Blocker bug reported by Jason Lowe and fixed by Haohui Mai (resourcemanager)<br>
<b>RM does not startup when security is enabled without spnego configured</b><br>
<blockquote>We have a custom auth filter in front of our various UI pages that handles user authentication. However, the RM currently assumes that if security is enabled, the user must also have configured spnego for the RM web pages, which is not true in our case.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1598">YARN-1598</a>.
Critical sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (client , resourcemanager)<br>
<b>HA-related rmadmin commands don't work on a secure cluster</b><br>
<blockquote>The HA-related commands like -getServiceState -checkHealth etc. don't work in a secure cluster.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1579">YARN-1579</a>.
Trivial sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>ActiveRMInfoProto fields should be optional</b><br>
<blockquote>Per discussion on YARN-1568, ActiveRMInfoProto should have optional fields instead of required fields. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1575">YARN-1575</a>.
Critical sub-task reported by Jason Lowe and fixed by Jason Lowe (nodemanager)<br>
<b>Public localizer crashes with "Localized unkown resource"</b><br>
<blockquote>The public localizer can crash with the error:
{noformat}
2014-01-08 14:11:43,212 [Thread-467] ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localized unkonwn resource to java.util.concurrent.FutureTask@852e26
2014-01-08 14:11:43,212 [Thread-467] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1574">YARN-1574</a>.
Blocker sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>RMDispatcher should be reset on transition to standby</b><br>
<blockquote>Currently, we move rmDispatcher out of ActiveService, but we still register event dispatchers such as schedulerDispatcher and RMAppEventDispatcher when we initialize the ActiveService.
Almost every time we transition the RM from Active to Standby, we need to initialize the ActiveService again. That means we register the same event dispatchers again, which causes the same event to be handled several times.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1573">YARN-1573</a>.
Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>ZK store should use a private password for root-node-acls</b><br>
<blockquote>Currently, when HA is enabled, ZK store uses cluster-timestamp as the password for root node ACLs to give the Active RM exclusive access to the store. A more private value like a random number might be better. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1568">YARN-1568</a>.
Trivial task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>Rename clusterid to clusterId in ActiveRMInfoProto </b><br>
<blockquote>YARN-1029 introduces ActiveRMInfoProto - just realized it defines a field clusterid, which is inconsistent with other fields. Better to fix it immediately than leave the inconsistency. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1567">YARN-1567</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>In Fair Scheduler, allow empty queues to change between leaf and parent on allocation file reload</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1560">YARN-1560</a>.
Major test reported by Ted Yu and fixed by Ted Yu <br>
<b>TestYarnClient#testAMMRTokens fails with null AMRM token</b><br>
<blockquote>The following can be reproduced locally:
{code}
testAMMRTokens(org.apache.hadoop.yarn.client.api.impl.TestYarnClient) Time elapsed: 3.341 sec &lt;&lt;&lt; FAILURE!
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:48)
at junit.framework.Assert.assertTrue(Assert.java:20)
at junit.framework.Assert.assertNotNull(Assert.java:218)
at junit.framework.Assert.assertNotNull(Assert.java:211)
at org.apache.hadoop.yarn.client.api.impl.TestYarnClient.testAMMRTokens(TestYarnClient.java:382)
{code}
This test didn't appear in https://builds.apache.org/job/Hadoop-Yarn-trunk/442/consoleFull</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1559">YARN-1559</a>.
Blocker sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>Race between ServerRMProxy and ClientRMProxy setting RMProxy#INSTANCE</b><br>
<blockquote>RMProxy#INSTANCE is a non-final static field and both ServerRMProxy and ClientRMProxy set it. This leads to races, as witnessed on YARN-1482.
Sample trace:
{noformat}
java.lang.IllegalArgumentException: RM does not support this client protocol
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
at org.apache.hadoop.yarn.client.ClientRMProxy.checkAllowedProtocols(ClientRMProxy.java:119)
at org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.init(ConfiguredRMFailoverProxyProvider.java:58)
at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:158)
at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:88)
at org.apache.hadoop.yarn.server.api.ServerRMProxy.createRMProxy(ServerRMProxy.java:56)
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1549">YARN-1549</a>.
Major test reported by Ted Yu and fixed by haosdent <br>
<b>TestUnmanagedAMLauncher#testDSShell fails in trunk</b><br>
<blockquote>The following error is reproducible:
{code}
testDSShell(org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher) Time elapsed: 14.911 sec &lt;&lt;&lt; ERROR!
java.lang.RuntimeException: Failed to receive final expected state in ApplicationReport, CurrentState=RUNNING, ExpectedStates=FINISHED,FAILED,KILLED
at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.monitorApplication(UnmanagedAMLauncher.java:447)
at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.run(UnmanagedAMLauncher.java:352)
at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.testDSShell(TestUnmanagedAMLauncher.java:147)
{code}
See https://builds.apache.org/job/Hadoop-Yarn-trunk/435</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1541">YARN-1541</a>.
Major bug reported by Jian He and fixed by Jian He <br>
<b>Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn&#8217;t get wrong information.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1527">YARN-1527</a>.
Trivial bug reported by Jian He and fixed by Akira AJISAKA <br>
<b>yarn rmadmin command prints wrong usage info:</b><br>
<blockquote>The usage should be: yarn rmadmin, instead of java RMAdmin, and the -refreshQueues should be in the second line.
{code} Usage: java RMAdmin -refreshQueues
-refreshNodes
-refreshSuperUserGroupsConfiguration
-refreshUserToGroupsMappings
-refreshAdminAcls
-refreshServiceAcl
-getGroups [username]
-help [cmd]
-transitionToActive &lt;serviceId&gt;
-transitionToStandby &lt;serviceId&gt;
-failover [--forcefence] [--forceactive] &lt;serviceId&gt; &lt;serviceId&gt;
-getServiceState &lt;serviceId&gt;
-checkHealth &lt;serviceId&gt;
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1523">YARN-1523</a>.
Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla <br>
<b>Use StandbyException instead of RMNotYetReadyException</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1522">YARN-1522</a>.
Major bug reported by Liyin Liang and fixed by Liyin Liang <br>
<b>TestApplicationCleanup.testAppCleanup occasionally fails</b><br>
<blockquote>TestApplicationCleanup is occasionally failing with the error:
{code}
-------------------------------------------------------------------------------
Test set: org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
-------------------------------------------------------------------------------
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.215 sec &lt;&lt;&lt; FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
testAppCleanup(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup) Time elapsed: 5.555 sec &lt;&lt;&lt; FAILURE!
junit.framework.AssertionFailedError: expected:&lt;1&gt; but was:&lt;0&gt;
at org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup.testAppCleanup(TestApplicationCleanup.java:119)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1505">YARN-1505</a>.
Blocker bug reported by Xuan Gong and fixed by Xuan Gong <br>
<b>WebAppProxyServer should not set localhost as YarnConfiguration.PROXY_ADDRESS by itself</b><br>
<blockquote>WebAppProxyServer::startServer() sets YarnConfiguration.PROXY_ADDRESS to localhost:9099 by itself. So, no matter what value we set for YarnConfiguration.PROXY_ADDRESS in the configuration, the proxy server will bind to localhost:9099.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1491">YARN-1491</a>.
Trivial bug reported by Jonathan Eagles and fixed by Chen He <br>
<b>Upgrade JUnit3 TestCase to JUnit 4</b><br>
<blockquote>There are still four references to test classes that extend from junit.framework.TestCase
hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestYarnVersionInfo.java
hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsResourceCalculatorPlugin.java
hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLinuxResourceCalculatorPlugin.java
hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsBasedProcessTree.java
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1485">YARN-1485</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Enabling HA should verify the RM service addresses configurations have been set for every RM Ids defined in RM_HA_IDs</b><br>
<blockquote>After YARN-1325, the YarnConfiguration.RM_HA_IDS will contain multiple RM_Ids. We need to verify that the RM service addresses configurations have been set for all of RM_Ids.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1482">YARN-1482</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Xuan Gong <br>
<b>WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM</b><br>
<blockquote>This way, even if an RM goes to standby mode, we can effect a redirect to the active. And more importantly, users will not suddenly see all their links stop working.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1481">YARN-1481</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>Move internal services logic from AdminService to ResourceManager</b><br>
<blockquote>This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles went there already. Some top level issues
- Not easy to follow RM's service life cycle
-- RM adds only AdminService as its service directly.
-- Other services are added to RM when AdminService's init calls RM.activeServices.init()
- Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1463">YARN-1463</a>.
Major test reported by Ted Yu and fixed by Vinod Kumar Vavilapalli <br>
<b>Tests should avoid starting http-server where possible or creates spnego keytab/principals</b><br>
<blockquote>Here is stack trace:
{code}
testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec &lt;&lt;&lt; ERROR!
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED
at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1454">YARN-1454</a>.
Critical bug reported by Jian He and fixed by Karthik Kambatla <br>
<b>TestRMRestart.testRMDelegationTokenRestoredOnRMRestart is failing intermittently </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1451">YARN-1451</a>.
Minor bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>TestResourceManager relies on the scheduler assigning multiple containers in a single node update</b><br>
<blockquote>TestResourceManager relies on the capacity scheduler.
It relies on a scheduler that assigns multiple containers in a single heartbeat, which not all schedulers do by default. It also relies on schedulers that don't consider CPU capacities. It would be simple to change the test to use multiple heartbeats and increase the vcore capacities of the nodes in the test.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1450">YARN-1450</a>.
Major bug reported by Akira AJISAKA and fixed by Binglin Chang (applications/distributed-shell)<br>
<b>TestUnmanagedAMLauncher#testDSShell fails on trunk</b><br>
<blockquote>TestUnmanagedAMLauncher fails on trunk. The console output is
{code}
Running org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 35.937 sec &lt;&lt;&lt; FAILURE! - in org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher
testDSShell(org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher) Time elapsed: 14.558 sec &lt;&lt;&lt; ERROR!
java.lang.RuntimeException: Failed to receive final expected state in ApplicationReport, CurrentState=ACCEPTED, ExpectedStates=FINISHED,FAILED,KILLED
at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.monitorApplication(UnmanagedAMLauncher.java:447)
at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.run(UnmanagedAMLauncher.java:352)
at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.testDSShell(TestUnmanagedAMLauncher.java:145)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1448">YARN-1448</a>.
Major sub-task reported by Wangda Tan and fixed by Wangda Tan (api , resourcemanager)<br>
<b>AM-RM protocol changes to support container resizing</b><br>
<blockquote>As described in YARN-1197, we need to add APIs in the RM to support:
1) Adding an increase request in AllocateRequest
2) Getting successfully increased/decreased containers from the RM in AllocateResponse</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1447">YARN-1447</a>.
Major sub-task reported by Wangda Tan and fixed by Wangda Tan (api)<br>
<b>Common PB type definitions for container resizing</b><br>
<blockquote>As described in YARN-1197, we need to add some common PB types for container resource changes, like ResourceChangeContext, etc. These types will be used by both the RM and NM protocols.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1446">YARN-1446</a>.
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>Change killing application to wait until state store is done</b><br>
<blockquote>When a user kills an application, the kill should wait until the state store is done saving the killed status of the application. Otherwise, if the RM crashes between the user killing the application and the status being written to the store, the RM will relaunch this application after it restarts.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1435">YARN-1435</a>.
Major bug reported by Tassapol Athiapinya and fixed by Xuan Gong (applications/distributed-shell)<br>
<b>Distributed Shell should not run other commands except "sh", and run the custom script at the same time.</b><br>
<blockquote>Currently, if we want to run a custom script in DS, we can do it like this:
--shell_command sh --shell_script custom_script.sh
But it may be better to separate running shell_command and shell_script</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1425">YARN-1425</a>.
Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>TestRMRestart fails because MockRM.waitForState(AttemptId) uses current attempt instead of the attempt passed as argument</b><br>
<blockquote>TestRMRestart is failing on trunk. Fixing it. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1423">YARN-1423</a>.
Major improvement reported by Sandy Ryza and fixed by Ted Malaska (scheduler)<br>
<b>Support queue placement by secondary group in the Fair Scheduler</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1419">YARN-1419</a>.
Minor bug reported by Jonathan Eagles and fixed by Jonathan Eagles (scheduler)<br>
<b>TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7 </b><br>
<blockquote>QueueMetrics holds its data in a static variable, causing metrics to bleed over from test to test. clearQueueMetrics is to be called for tests that need to measure metrics correctly for a single test. jdk7 comes into play since tests are run out of order, which in that case makes the metrics unreliable.</blockquote></li>
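<li> <b>Illustrative sketch for YARN-1419: resetting static state between tests</b><br>
<blockquote>The static-state problem described in YARN-1419 can be sketched as follows. This is a minimal, hypothetical example and not the QueueMetrics code: the class StaticMetrics and its methods are assumptions made for illustration only. It shows why a counter held in a static field carries over between tests unless it is cleared explicitly.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical metrics holder; the names are illustrative, not the QueueMetrics API.
class StaticMetrics {
  // Static state is shared by every test that runs in the same JVM.
  private static final AtomicInteger appsSubmitted = new AtomicInteger();

  static void submitApp() {
    appsSubmitted.incrementAndGet();
  }

  static int getAppsSubmitted() {
    return appsSubmitted.get();
  }

  // Resets the counter so that counts from earlier tests do not leak in.
  static void clear() {
    appsSubmitted.set(0);
  }
}
```

Calling clear() from each test's setup keeps the observed count independent of the order in which tests run.</blockquote></li>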
<li> <a href="https://issues.apache.org/jira/browse/YARN-1416">YARN-1416</a>.
Major bug reported by Omkar Vinit Joshi and fixed by Jian He <br>
<b>InvalidStateTransitions getting reported in multiple test cases even though they pass</b><br>
<blockquote>It might be worth checking why they are reporting this.
Test cases: TestRMAppTransitions, TestRM
There are a large number of such errors:
can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1411">YARN-1411</a>.
Critical sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>HA config shouldn't affect NodeManager RPC addresses</b><br>
<blockquote>When HA is turned on, {{YarnConfiguration#getSoketAddress()}} fetches rpc-addresses corresponding to the specified rm-id. This should only be for RM rpc-addresses. Other confs, like NM rpc-addresses shouldn't be affected by this.
Currently, the NM address settings in yarn-site.xml aren't reflected in the actual ports.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1409">YARN-1409</a>.
Major bug reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA <br>
<b>NonAggregatingLogHandler can throw RejectedExecutionException</b><br>
<blockquote>This problem is caused by handling APPLICATION_FINISHED events after calling sched.shutdown() in NonAggregatingLogHandler#serviceStop(). org.apache.hadoop.mapred.TestJobCleanup can fail because of a RejectedExecutionException thrown by NonAggregatingLogHandler.
{code}
2013-11-13 10:53:06,970 FATAL [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Error in dispatcher thread
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@d51df63 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@7a20e369[Shutting down, pool size = 4, active threads = 0, queued tasks = 7, completed tasks = 0]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325)
at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler.handle(NonAggregatingLogHandler.java:121)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler.handle(NonAggregatingLogHandler.java:49)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:159)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:95)
at java.lang.Thread.run(Thread.java:724)
{code}</blockquote></li>
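<li> <b>Illustrative sketch for YARN-1409: scheduling on an executor that has been shut down</b><br>
<blockquote>The failure mode in YARN-1409 can be reproduced outside Hadoop. The sketch below is an assumption-laden illustration and not the committed fix: DelayedTaskHandler is a hypothetical class, and catching the exception is only one possible way to tolerate events that arrive after shutdown. It relies on the documented behaviour that a ScheduledThreadPoolExecutor rejects new tasks once it has been shut down.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a handler that schedules delayed work; not Hadoop code.
class DelayedTaskHandler {
  private final ScheduledThreadPoolExecutor sched = new ScheduledThreadPoolExecutor(1);

  // Returns true if the task was accepted, false if the executor was already shut down.
  boolean handle(Runnable task) {
    try {
      sched.schedule(task, 1, TimeUnit.SECONDS);
      return true;
    } catch (RejectedExecutionException e) {
      // The event arrived after stop(); drop it instead of failing the caller.
      return false;
    }
  }

  // Shuts the executor down and cancels tasks that have not started yet.
  void stop() {
    sched.shutdownNow();
  }
}
```

Without the catch block, a call to handle() after stop() would propagate the RejectedExecutionException to the dispatching thread, which matches the trace in the YARN-1409 description.</blockquote></li>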
<li> <a href="https://issues.apache.org/jira/browse/YARN-1407">YARN-1407</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>RM Web UI and REST APIs should uniformly use YarnApplicationState</b><br>
<blockquote>RMAppState isn't a public facing enum like YarnApplicationState, so we shouldn't return values or list filters that come from it. However, some Blocks and AppInfo are still using RMAppState.
It is not 100% clear to me whether or not fixing this would be a backwards-incompatible change. The change would only reduce the set of possible strings that the API returns, so I think not. We have also been changing the contents of RMAppState since 2.2.0, e.g. in YARN-891. It would still be good to fix this ASAP (i.e. for 2.2.1).</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1405">YARN-1405</a>.
Major sub-task reported by Yesha Vora and fixed by Jian He <br>
<b>RM hangs on shutdown if calling system.exit in serviceInit or serviceStart</b><br>
<blockquote>Set yarn.resourcemanager.recovery.enabled=true and pass a local path such as "file:///tmp/MYTMP" to yarn.resourcemanager.fs.state-store.uri.
If the directory /tmp/MYTMP is not readable or writable, the RM should crash and print a "Permission denied" error.
Currently, the RM throws a "java.io.FileNotFoundException: File file:/tmp/MYTMP/FSRMStateRoot/RMDTSecretManagerRoot does not exist" error. The RM logs that it is exiting with status 1, but the RM process does not shut down.
Snapshot of Resource manager log:
2013-09-27 18:31:36,621 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:rollMasterKey(97)) - Rolling master-key for nm-tokens
2013-09-27 18:31:36,694 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(640)) - Failed to load/recover state
java.io.FileNotFoundException: File file:/tmp/MYTMP/FSRMStateRoot/RMDTSecretManagerRoot does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:379)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1478)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1518)
at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:564)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMDTSecretManagerState(FileSystemRMStateStore.java:188)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:112)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:635)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:855)
2013-09-27 18:31:36,697 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1403">YARN-1403</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Separate out configuration loading from QueueManager in the Fair Scheduler</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1401">YARN-1401</a>.
Major bug reported by Gera Shegalov and fixed by Gera Shegalov (nodemanager)<br>
<b>With zero sleep-delay-before-sigkill.ms, no signal is ever sent</b><br>
<blockquote>If you set in yarn-site.xml yarn.nodemanager.sleep-delay-before-sigkill.ms=0 then an unresponsive child JVM is never killed. In MRv1, TT used to immediately SIGKILL in this case. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1400">YARN-1400</a>.
Trivial bug reported by Raja Aluri and fixed by Raja Aluri (resourcemanager)<br>
<b>yarn.cmd uses HADOOP_RESOURCEMANAGER_OPTS. Should be YARN_RESOURCEMANAGER_OPTS.</b><br>
<blockquote>yarn.cmd uses HADOOP_RESOURCEMANAGER_OPTS. Should be YARN_RESOURCEMANAGER_OPTS.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1395">YARN-1395</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (applications/distributed-shell)<br>
<b>Distributed shell application master launched with debug flag can hang waiting for external ls process.</b><br>
<blockquote>Distributed shell launched with the debug flag will run {{ApplicationMaster#dumpOutDebugInfo}}. This method launches an external process to run ls and print the contents of the current working directory. We've seen that this can cause the application master to hang on {{Process#waitFor}}.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1392">YARN-1392</a>.
Major new feature reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Allow sophisticated app-to-queue placement policies in the Fair Scheduler</b><br>
<blockquote>Currently the Fair Scheduler supports app-to-queue placement by username. It would be beneficial to allow more sophisticated policies that rely on primary and secondary groups and fallbacks.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1388">YARN-1388</a>.
Trivial bug reported by Liyin Liang and fixed by Liyin Liang (resourcemanager)<br>
<b>Fair Scheduler page always displays blank fair share</b><br>
<blockquote>YARN-1044 fixed the min/max/used resource display problem in the scheduler page. But "Fair Share" has the same problem and needs to be fixed.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1387">YARN-1387</a>.
Major improvement reported by Karthik Kambatla and fixed by Karthik Kambatla (api)<br>
<b>RMWebServices should use ClientRMService for filtering applications</b><br>
<blockquote>YARN's REST API allows filtering applications. This should be moved to ClientRMService so that the Java API can support the same functionality.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1386">YARN-1386</a>.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)<br>
<b>NodeManager mistakenly loses resources and relocalizes them</b><br>
<blockquote>When a local resource that should already be present is requested again, the nodemanager checks to see if it is still present. However, the method it uses to check for presence is File.exists() as the user of the nodemanager process. If the resource was a private resource localized for another user, it will be localized to a location that is not accessible by the nodemanager user. Therefore File.exists() returns false, the nodemanager mistakenly believes the resource is no longer available, and it proceeds to localize it over and over.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1381">YARN-1381</a>.
Minor bug reported by Ted Yu and fixed by Ted Yu <br>
<b>Same relaxLocality appears twice in exception message of AMRMClientImpl#checkLocalityRelaxationConflict() </b><br>
<blockquote>Here is related code:
{code}
throw new InvalidContainerRequestException("Cannot submit a "
+ "ContainerRequest asking for location " + location
+ " with locality relaxation " + relaxLocality + " when it has "
+ "already been requested with locality relaxation " + relaxLocality);
{code}
The last relaxLocality should be reqs.values().iterator().next().remoteRequest.getRelaxLocality() </blockquote></li>
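<li> <b>Illustrative sketch for YARN-1381: reporting both relaxLocality values</b><br>
<blockquote>The message problem in YARN-1381 can be sketched in isolation. This is a hypothetical helper and not the AMRMClientImpl code: the class LocalityMessages and the parameter names requested and existing are assumptions made for illustration. The point is only that the two values in the message come from two different sources.

```java
// Hypothetical helper; not part of AMRMClientImpl.
class LocalityMessages {
  // requested: relaxation flag on the new request.
  // existing: relaxation flag on the request that was submitted earlier.
  static String conflict(String location, boolean requested, boolean existing) {
    return "Cannot submit a ContainerRequest asking for location " + location
        + " with locality relaxation " + requested + " when it has "
        + "already been requested with locality relaxation " + existing;
  }
}
```

With distinct arguments, the message shows the conflicting values instead of printing the same flag twice.</blockquote></li>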
<li> <a href="https://issues.apache.org/jira/browse/YARN-1378">YARN-1378</a>.
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>Implement a RMStateStore cleaner for deleting application/attempt info</b><br>
<blockquote>Now that we are storing the final state of application/attempt instead of removing application/attempt info on application/attempt completion (YARN-891), we need a separate RMStateStore cleaner for cleaning the application/attempt state.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1374">YARN-1374</a>.
Blocker bug reported by Devaraj K and fixed by Karthik Kambatla (resourcemanager)<br>
<b>Resource Manager fails to start due to ConcurrentModificationException</b><br>
<blockquote>Resource Manager is failing to start with the below ConcurrentModificationException.
{code:xml}
2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: java.util.ConcurrentModificationException
java.util.ConcurrentModificationException
at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioning to standby
2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioned to standby
2013-10-30 20:22:42,378 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
java.util.ConcurrentModificationException
at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
2013-10-30 20:22:42,379 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
************************************************************/
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1358">YARN-1358</a>.
Minor test reported by Chuan Liu and fixed by Chuan Liu (client)<br>
<b>TestYarnCLI fails on Windows due to line endings</b><br>
<blockquote>The unit test fails on Windows because incorrect line endings were used when comparing against the command-line output. The error messages are as follows.
{noformat}
junit.framework.ComparisonFailure: expected:&lt;...argument for options[]
usage: application
...&gt; but was:&lt;...argument for options[
]
usage: application
...&gt;
at junit.framework.Assert.assertEquals(Assert.java:85)
at junit.framework.Assert.assertEquals(Assert.java:91)
at org.apache.hadoop.yarn.client.cli.TestYarnCLI.testMissingArguments(TestYarnCLI.java:878)
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1357">YARN-1357</a>.
Minor test reported by Chuan Liu and fixed by Chuan Liu (nodemanager)<br>
<b>TestContainerLaunch.testContainerEnvVariables fails on Windows</b><br>
<blockquote>This test fails on Windows due to incorrect use of a batch script command. The error messages are as follows.
{noformat}
junit.framework.AssertionFailedError: expected:&lt;java.nio.HeapByteBuffer[pos=0 lim=19 cap=19]&gt; but was:&lt;java.nio.HeapByteBuffer[pos=0 lim=19 cap=19]&gt;
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.failNotEquals(Assert.java:287)
at junit.framework.Assert.assertEquals(Assert.java:67)
at junit.framework.Assert.assertEquals(Assert.java:74)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:508)
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1351">YARN-1351</a>.
Trivial bug reported by Konstantin Weitz and fixed by Konstantin Weitz (resourcemanager)<br>
<b>Invalid string format in Fair Scheduler log warn message</b><br>
<blockquote>While trying to print a warning, two values of the wrong type (Resource instead of int) are passed into a String.format method call, leading to a runtime exception, in the file:
_trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java_.
The warning is intended to be printed whenever the resources don't fit into each other, either because the number of virtual cores or the memory is too small. I changed the %d's into %s's so that the warning contains both the cores and the memory.
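As a minimal illustration of this failure mode, the sketch below formats a non-integer object with %d and with %s. FakeResource and the message text are hypothetical stand-ins chosen for the example; they are not the actual Resource class or the QueueManager warning.

```java
import java.util.IllegalFormatConversionException;

public class FormatWarningSketch {
    // Hypothetical stand-in for a resource type; only toString() matters here.
    static final class FakeResource {
        final int memoryMb;
        final int vcores;

        FakeResource(int memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }

        @Override
        public String toString() {
            return "memory:" + memoryMb + ", vCores:" + vcores;
        }
    }

    // %s accepts any object and renders it through toString().
    static String formatWithS(Object min, Object max) {
        return String.format("min share %s exceeds max share %s", min, max);
    }

    // %d only accepts integral arguments; any other object type fails at runtime.
    static boolean formatWithDThrows(Object min, Object max) {
        try {
            String.format("min share %d exceeds max share %d", min, max);
            return false;
        } catch (IllegalFormatConversionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        FakeResource min = new FakeResource(4096, 4);
        FakeResource max = new FakeResource(2048, 2);
        System.out.println(formatWithS(min, max));
        System.out.println("%d throws: " + formatWithDThrows(min, max));
    }
}
```

Because the compiler does not check format strings against argument types, a mismatch like this only surfaces when the warning path is actually executed.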
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1349">YARN-1349</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (client)<br>
<b>yarn.cmd does not support passthrough to any arbitrary class.</b><br>
<blockquote>The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the predefined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1343">YARN-1343</a>.
Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (resourcemanager)<br>
<b>NodeManager additions/restarts are not reported as node updates in AllocateResponse responses to AMs</b><br>
<blockquote>If a NodeManager joins the cluster or gets restarted, running AMs never receive the node update indicating the Node is running.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1335">YARN-1335</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication</b><br>
<blockquote>FSSchedulerApp and FiCaSchedulerApp use duplicate code in a lot of places. They both extend SchedulerApplication. We can move a lot of this duplicate code into SchedulerApplication.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1333">YARN-1333</a>.
Major improvement reported by Sandy Ryza and fixed by Tsuyoshi OZAWA (scheduler)<br>
<b>Support blacklisting in the Fair Scheduler</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1332">YARN-1332</a>.
Minor improvement reported by Sandy Ryza and fixed by Sebastian Wong <br>
<b>In TestAMRMClient, replace assertTrue with assertEquals where possible</b><br>
<blockquote>TestAMRMClient uses a lot of "assertTrue(amClient.ask.size() == 0)" where "assertEquals(0, amClient.ask.size())" would make it easier to see why it's failing at a glance.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1331">YARN-1331</a>.
Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (client)<br>
<b>yarn.cmd exits with NoClassDefFoundError trying to run rmadmin or logs</b><br>
<blockquote>The yarn shell script was updated so that the rmadmin and logs sub-commands launch {{org.apache.hadoop.yarn.client.cli.RMAdminCLI}} and {{org.apache.hadoop.yarn.client.cli.LogsCLI}}. The yarn.cmd script also needs to be updated so that the commands work on Windows.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1325">YARN-1325</a>.
Major sub-task reported by Tsuyoshi OZAWA and fixed by Xuan Gong (resourcemanager)<br>
<b>Enabling HA should check Configuration contains multiple RMs</b><br>
<blockquote>Currently, RM HA can be enabled without configuring multiple RM ids (YarnConfiguration.RM_HA_IDS). This can lead to incorrect operation. The ResourceManager should verify that more than one RM id is specified in RM-HA-IDs.
One idea is to support a "strict mode" that enforces this check via configuration (e.g. yarn.resourcemanager.ha.strict-mode.enabled).</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1323">YARN-1323</a>.
Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>Set HTTPS webapp address along with other RPC addresses in HAUtil</b><br>
<blockquote>YARN-1232 adds the ability to configure multiple RMs, but missed the HTTPS web app address. We need to add it.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1321">YARN-1321</a>.
Blocker bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (client)<br>
<b>NMTokenCache is a singleton, preventing multiple AMs running in a single JVM from working correctly</b><br>
<blockquote>NMTokenCache is a singleton. Because of this, when multiple AMs run in a single JVM, NMTokens for the same node from different AMs step on each other, and starting containers fails due to mismatched tokens.
The error observed on the client side is something like:
{code}
ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
NMToken for application attempt : appattempt_1382038445650_0002_000001 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_000001
{code}
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1320">YARN-1320</a>.
Major bug reported by Tassapol Athiapinya and fixed by Xuan Gong (applications/distributed-shell)<br>
<b>Custom log4j properties in Distributed shell do not work properly.</b><br>
<blockquote>Distributed shell cannot pick up custom log4j properties (specified with -log_properties). It always uses default log4j properties.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1318">YARN-1318</a>.
Blocker sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>Promote AdminService to an Always-On service and merge in RMHAProtocolService</b><br>
<blockquote>Per discussion in YARN-1068, we want AdminService to handle HA-admin operations in addition to the regular non-HA admin operations. To facilitate this, we need to make AdminService an Always-On service.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1315">YARN-1315</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>TestQueueACLs should also test FairScheduler</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1314">YARN-1314</a>.
Major bug reported by Tassapol Athiapinya and fixed by Xuan Gong (applications/distributed-shell)<br>
<b>Cannot pass more than 1 argument to shell command</b><br>
<blockquote>Distributed shell cannot accept more than one parameter in the argument part.
All of these commands are treated as having one parameter:
/usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar &lt;distributed shell jar&gt; -shell_command echo -shell_args "'"My name" "is Teddy"'"
/usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar &lt;distributed shell jar&gt; -shell_command echo -shell_args "''My name' 'is Teddy''"
/usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar &lt;distributed shell jar&gt; -shell_command echo -shell_args "'My name' 'is Teddy'"</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1311">YARN-1311</a>.
Trivial sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>Fix app specific scheduler-events' names to be app-attempt based</b><br>
<blockquote>Today, APP_ADDED and APP_REMOVED are sent to the scheduler. They are misnomers as schedulers only deal with AppAttempts today. This JIRA is for fixing their names so that we can add App-level events in the near future, notably for work-preserving RM-restart.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1307">YARN-1307</a>.
Major sub-task reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA (resourcemanager)<br>
<b>Rethink znode structure for RM HA</b><br>
<blockquote>Rethinking the znode structure for RM HA has been proposed in several JIRAs (YARN-659, YARN-1222). The motivation for this JIRA is quoted from Bikas' comment in YARN-1222:
{quote}
We should move to creating a node hierarchy for apps such that all znodes for an app are stored under an app znode instead of the app root znode. This will help in removeApplication and also in scaling better on ZK. The earlier code was written this way to ensure create/delete happens under a root znode for fencing. But given that we have moved to multi-operations globally, this isnt required anymore.
{quote}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1306">YARN-1306</a>.
Major bug reported by Wei Yan and fixed by Wei Yan <br>
<b>Clean up hadoop-sls sample-conf according to YARN-1228</b><br>
<blockquote>Move the fair scheduler allocations configuration to fair-scheduler.xml, and move all other scheduler settings to yarn-site.xml.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1305">YARN-1305</a>.
Major sub-task reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA (resourcemanager)<br>
<b>RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException</b><br>
<blockquote>When yarn.resourcemanager.ha.enabled is true, RMHAProtocolService#serviceInit calls HAUtil.setAllRpcAddresses. If the configuration values are null, it just throws IllegalArgumentException.
It is hard to work out which keys are null, so we should catch the exception and log the names of the null keys.
The current log output is as follows:
{code}
2013-10-15 06:24:53,431 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT]
2013-10-15 06:24:54,203 INFO org.apache.hadoop.service.AbstractService: Service RMHAProtocolService failed in state INITED; cause: java.lang.IllegalArgumentException: Property value must not be null
java.lang.IllegalArgumentException: Property value must not be null
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:816)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:798)
at org.apache.hadoop.yarn.conf.HAUtil.setConfValue(HAUtil.java:100)
at org.apache.hadoop.yarn.conf.HAUtil.setAllRpcAddresses(HAUtil.java:105)
at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceInit(RMHAProtocolService.java:60)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:940)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1303">YARN-1303</a>.
Major improvement reported by Tassapol Athiapinya and fixed by Xuan Gong (applications/distributed-shell)<br>
<b>Allow multiple commands separated by ";" in distributed-shell</b><br>
<blockquote>In a shell, we can run "ls; ls" to execute two commands at once.
In distributed shell, this does not work. We should allow it. There are practical use cases that I know of for running multiple commands or setting environment variables before a command.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1300">YARN-1300</a>.
Major bug reported by Ted Yu and fixed by Ted Yu <br>
<b>SLS tests fail because conf puts yarn properties in fair-scheduler.xml</b><br>
<blockquote>I was looking at https://builds.apache.org/job/PreCommit-YARN-Build/2165//testReport/org.apache.hadoop.yarn.sls/TestSLSRunner/testSimulatorRunning/
I am able to reproduce the failure locally.
I found that FairSchedulerConfiguration.getAllocationFile() doesn't read the yarn.scheduler.fair.allocation.file config entry from fair-scheduler.xml
This leads to the following:
{code}
Caused by: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: Bad fair scheduler config file: top-level element not &lt;allocations&gt;
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.reloadAllocs(QueueManager.java:302)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.initialize(QueueManager.java:108)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1145)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1295">YARN-1295</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)<br>
<b>In UnixLocalWrapperScriptBuilder, using bash -c can cause "Text file busy" errors</b><br>
<blockquote>I missed this when working on YARN-1271.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1293">YARN-1293</a>.
Major bug reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA <br>
<b>TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk</b><br>
<blockquote>{quote}
-------------------------------------------------------------------------------
Test set: org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
-------------------------------------------------------------------------------
Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec &lt;&lt;&lt; FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 0.114 sec &lt;&lt;&lt; FAILURE!
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:48)
at junit.framework.Assert.assertTrue(Assert.java:20)
at junit.framework.Assert.assertTrue(Assert.java:27)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
{quote}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1290">YARN-1290</a>.
Major improvement reported by Wei Yan and fixed by Wei Yan <br>
<b>Let continuous scheduling achieve more balanced task assignment</b><br>
<blockquote>Currently, in continuous scheduling (YARN-1010), in each round, the thread iterates over pre-ordered nodes and assigns tasks. This mechanism may overload the first several nodes, while the latter nodes have no tasks.
We should sort all nodes according to their available resources. In each round, always assign tasks to the nodes with larger capacity first, which can balance the load distribution among all nodes.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1288">YARN-1288</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Make Fair Scheduler ACLs more user friendly</b><br>
<blockquote>The Fair Scheduler currently defaults the root queue's acl to empty and all other queues' acl to "*". Now that YARN-1258 enables configuring the root queue, we should reverse this. This will also bring the Fair Scheduler in line with the Capacity Scheduler.
We should also not trim the acl strings, which makes it impossible to only specify groups in an acl.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1284">YARN-1284</a>.
Blocker bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (nodemanager)<br>
<b>LCE: Race condition leaves dangling cgroups entries for killed containers</b><br>
<blockquote>When LCE &amp; cgroups are enabled and a container is killed (in this case by its owning AM, an MRAM), there seems to be a race condition at the OS level between sending the SIGTERM/SIGKILL and the OS completing all necessary cleanup.
LCE code, after sending the SIGTERM/SIGKILL and getting the exitcode, immediately attempts to clean up the cgroups entry for the container. But this is failing with an error like:
{code}
2013-10-07 15:21:24,359 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1381179532433_0016_01_000011 is : 143
2013-10-07 15:21:24,359 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_1381179532433_0016_01_000011 of type UPDATE_DIAGNOSTICS_MSG
2013-10-07 15:21:24,359 DEBUG org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: deleteCgroup: /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_000011
2013-10-07 15:21:24,359 WARN org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: Unable to delete cgroup at: /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_000011
{code}
CgroupsLCEResourcesHandler.clearLimits() has logic to wait 500 ms for AM containers to avoid this problem. It seems this should be done for all containers.
Still, waiting an extra 500 ms seems too expensive.
We should look for a more time-efficient way of doing this, perhaps retrying deleteCgroup() in a loop with a minimal sleep and a timeout until it succeeds.
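One way to express the retry idea is sketched below, assuming a bounded number of delete attempts with a short sleep between them, so that the worst-case wait is roughly maxAttempts times sleepMs. The class name, method name, and parameters are hypothetical; this is an illustration, not the CgroupsLCEResourcesHandler code.

```java
import java.io.File;
import java.nio.file.Files;

public class CgroupDeleteRetrySketch {
    // Tries dir.delete() up to maxAttempts times, sleeping sleepMs after each failure.
    // Returns true as soon as one delete succeeds, false once the attempts run out.
    static boolean deleteWithRetry(File dir, int maxAttempts, long sleepMs)
            throws InterruptedException {
        int attempts = Math.max(0, maxAttempts);
        for (int attempt = 0; attempt != attempts; attempt++) {
            if (dir.delete()) {
                return true;
            }
            // A directory that still has entries cannot be removed yet; wait briefly.
            Thread.sleep(sleepMs);
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // A temp directory stands in for a cgroup directory (assumption for the demo).
        File dir = Files.createTempDirectory("cgroup-sketch").toFile();
        System.out.println("deleted: " + deleteWithRetry(dir, 50, 20L));
    }
}
```

In the common case the first attempt succeeds and no sleep happens at all, which is the advantage over an unconditional fixed wait.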
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1283">YARN-1283</a>.
Major sub-task reported by Yesha Vora and fixed by Omkar Vinit Joshi <br>
<b>Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY</b><br>
<blockquote>After setting yarn.http.policy=HTTPS_ONLY, the job output shows an incorrect "The url to track the job".
Currently, it prints http://RM:&lt;httpsport&gt;/proxy/application_1381162886563_0001/ instead of https://RM:&lt;httpsport&gt;/proxy/application_1381162886563_0001/
http://hostname:8088/proxy/application_1381162886563_0001/ is invalid.
hadoop jar hadoop-mapreduce-client-jobclient-tests.jar sleep -m 1 -r 1
13/10/07 18:39:39 INFO client.RMProxy: Connecting to ResourceManager at hostname/100.00.00.000:8032
13/10/07 18:39:40 INFO mapreduce.JobSubmitter: number of splits:1
13/10/07 18:39:40 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
13/10/07 18:39:40 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.partitioner.class is deprecated. Instead, use mapreduce.job.partitioner.class
13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/10/07 18:39:40 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
13/10/07 18:39:40 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
13/10/07 18:39:40 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/10/07 18:39:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1381162886563_0001
13/10/07 18:39:40 INFO impl.YarnClientImpl: Submitted application application_1381162886563_0001 to ResourceManager at hostname/100.00.00.000:8032
13/10/07 18:39:40 INFO mapreduce.Job: The url to track the job: http://hostname:8088/proxy/application_1381162886563_0001/
13/10/07 18:39:40 INFO mapreduce.Job: Running job: job_1381162886563_0001
13/10/07 18:39:46 INFO mapreduce.Job: Job job_1381162886563_0001 running in uber mode : false
13/10/07 18:39:46 INFO mapreduce.Job: map 0% reduce 0%
13/10/07 18:39:53 INFO mapreduce.Job: map 100% reduce 0%
13/10/07 18:39:58 INFO mapreduce.Job: map 100% reduce 100%
13/10/07 18:39:58 INFO mapreduce.Job: Job job_1381162886563_0001 completed successfully
13/10/07 18:39:58 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=26
FILE: Number of bytes written=177279
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=48
HDFS: Number of bytes written=0
HDFS: Number of read operations=1
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=7136
Total time spent by all reduces in occupied slots (ms)=6062
Map-Reduce Framework
Map input records=1
Map output records=1
Map output bytes=4
Map output materialized bytes=22
Input split bytes=48
Combine input records=0
Combine output records=0
Reduce input groups=1
Reduce shuffle bytes=22
Reduce input records=1
Reduce output records=0
Spilled Records=2
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=60
CPU time spent (ms)=1700
Physical memory (bytes) snapshot=567582720
Virtual memory (bytes) snapshot=4292997120
Total committed heap usage (bytes)=846594048
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1268">YARN-1268</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>TestFairScheduler.testContinuousScheduling is flaky</b><br>
<blockquote>It looks like there's a timeout in it that's causing it to be flaky.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1265">YARN-1265</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>Fair Scheduler chokes on unhealthy node reconnect</b><br>
<blockquote>Only nodes in the RUNNING state are tracked by schedulers. When a node reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if it's not in the RUNNING state. The FairScheduler doesn't guard against this.
I think the best way to fix this is to check to see whether a node is RUNNING before telling the scheduler to remove it.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1259">YARN-1259</a>.
Trivial bug reported by Sandy Ryza and fixed by Robert Kanter (scheduler)<br>
<b>In Fair Scheduler web UI, queue num pending and num active apps switched</b><br>
<blockquote>The values returned in FairSchedulerLeafQueueInfo by numPendingApplications and numActiveApplications should be switched.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1258">YARN-1258</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Allow configuring the Fair Scheduler root queue</b><br>
<blockquote>This would be useful for acls, maxRunningApps, scheduling modes, etc.
The allocation file should be able to accept both:
* An implicit root queue
* A root queue at the top of the hierarchy with all queues under/inside of it</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1253">YARN-1253</a>.
Blocker new feature reported by Alejandro Abdelnur and fixed by Roman Shaposhnik (nodemanager)<br>
<b>Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode</b><br>
<blockquote>When using cgroups we require LCE to be configured in the cluster to start containers.
LCE starts containers as the user that submitted the job. While this works correctly in a secure setup, in a non-secure setup it presents a couple of issues:
* LCE requires all Hadoop users submitting jobs to be Unix users in all nodes
* Because users can impersonate other users, any user would have access to any local file of other users
The second issue is particularly undesirable, as a user could get access to the ssh keys of other users on the nodes or, if there are NFS mounts, to other users' data outside of the cluster.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1241">YARN-1241</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>In Fair Scheduler, maxRunningApps does not work for non-leaf queues</b><br>
<blockquote>Setting the maxRunningApps property on a parent queue should ensure that the total number of running apps across all of its subqueues can't exceed it.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1239">YARN-1239</a>.
Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)<br>
<b>Save version information in the state store</b><br>
<blockquote>When creating root dir for the first time we should write version 1. If root dir exists then we should check that the version in the state store matches the version from config.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1232">YARN-1232</a>.
Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>Configuration to support multiple RMs</b><br>
<blockquote>We should augment the configuration to allow users to specify two RMs and the individual RPC addresses for them.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1222">YARN-1222</a>.
Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla <br>
<b>Make improvements in ZKRMStateStore for fencing</b><br>
<blockquote>Using multi-operations for every ZK interaction.
In every operation, automatically creating/deleting a lock znode that is the child of the root znode. This is to achieve fencing by modifying the create/delete permissions on the root znode.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1210">YARN-1210</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi <br>
<b>During RM restart, RM should start a new attempt only when previous attempt exits for real</b><br>
<blockquote>When the RM recovers, it can wait for existing AMs to contact the RM again and then kill them forcefully before starting a new AM. In the worst case, the RM will start a new AppAttempt after waiting for 10 minutes (the expiry interval). This way we'll minimize multiple AMs racing with each other. This can help with issues in downstream components like Pig, Hive and Oozie during RM restart.
In the meantime, new apps will proceed as usual while existing apps wait for recovery.
This can continue to be useful after work-preserving restart, so that AMs which can properly sync back up with RM can continue to run and those that don't are guaranteed to be killed before starting a new attempt.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1199">YARN-1199</a>.
Major improvement reported by Mit Desai and fixed by Mit Desai <br>
<b>Make NM/RM Versions Available</b><br>
<blockquote>Now that we have the NM and RM versions available, we can display the YARN version of the nodes running in the cluster.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1188">YARN-1188</a>.
Trivial bug reported by Akira AJISAKA and fixed by Tsuyoshi OZAWA <br>
<b>The context of QueueMetrics becomes 'default' when using FairScheduler</b><br>
<blockquote>I found the context of QueueMetrics changed to 'default' from 'yarn' when I was using FairScheduler.
The context should always be 'yarn' by adding an annotation to FSQueueMetrics like below:
{code}
+ @Metrics(context="yarn")
public class FSQueueMetrics extends QueueMetrics {
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1185">YARN-1185</a>.
Major sub-task reported by Jason Lowe and fixed by Omkar Vinit Joshi (resourcemanager)<br>
<b>FileSystemRMStateStore can leave partial files that prevent subsequent recovery</b><br>
<blockquote>FileSystemRMStateStore writes directly to the destination file when storing state. However if the RM were to crash in the middle of the write, the recovery method could encounter a partially-written file and either outright crash during recovery or silently load incomplete state.
To avoid this, the data should be written to a temporary file and renamed to the destination file afterwards.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1183">YARN-1183</a>.
Major bug reported by Andrey Klochkov and fixed by Andrey Klochkov <br>
<b>MiniYARNCluster shutdown takes several minutes intermittently</b><br>
<blockquote>As described in MAPREDUCE-5501, M/R tests sometimes leave MRAppMaster java processes alive for several minutes after successful completion of the corresponding test. A concurrency issue in the MiniYARNCluster shutdown logic leads to this. Sometimes the RM stops before an app master sends its last report, and then the app master keeps retrying for &gt;6 minutes. In some cases this leads to failures in subsequent tests, and it affects the performance of tests because the app masters eat resources.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1182">YARN-1182</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>MiniYARNCluster creates and inits the RM/NM only on start()</b><br>
<blockquote>MiniYARNCluster creates and inits the RM/NM only on start(). It should create and init() during init() itself.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1181">YARN-1181</a>.
Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>Augment MiniYARNCluster to support HA mode</b><br>
<blockquote>MiniYARNHACluster, along the lines of MiniYARNCluster, is needed for end-to-end HA tests.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1180">YARN-1180</a>.
Trivial bug reported by Thomas Graves and fixed by Chen He (capacityscheduler)<br>
<b>Update capacity scheduler docs to include types on the configs</b><br>
<blockquote>The capacity scheduler docs (http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html) don't include types for all the configs. For instance, minimum-user-limit-percent doesn't say it's an Int. It is also the only setting among the Resource Allocation configs that is an Int rather than a float.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1176">YARN-1176</a>.
Critical bug reported by Thomas Graves and fixed by Jonathan Eagles (resourcemanager)<br>
<b>RM web services ClusterMetricsInfo total nodes doesn't include unhealthy nodes</b><br>
<blockquote>In the web services api for the cluster/metrics, the totalNodes reported doesn't include the unhealthy nodes.
this.totalNodes = activeNodes + lostNodes + decommissionedNodes
+ rebootedNodes;</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1172">YARN-1172</a>.
Major sub-task reported by Karthik Kambatla and fixed by Tsuyoshi OZAWA (resourcemanager)<br>
<b>Convert *SecretManagers in the RM to services</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1145">YARN-1145</a>.
Major bug reported by Rohith and fixed by Rohith <br>
<b>Potential file handle leak in aggregated logs web ui</b><br>
<blockquote>If any problem occurs while getting aggregated logs for rendering on the web UI, the LogReader is not closed.
Because the reader is not closed, many connections are left in the CLOSE_WAIT state.
hadoopuser@hadoopuser:&gt; jps
*27909* JobHistoryServer
The DataNode port is 50010. Grepping the netstat output for the DataNode port shows many connections from the JHS in CLOSE_WAIT.
hadoopuser@hadoopuser:&gt; netstat -tanlp |grep 50010
tcp 0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java
tcp 1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java
tcp 1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1138">YARN-1138</a>.
Major bug reported by Yingda Chen and fixed by Chuan Liu (api)<br>
<b>yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows</b><br>
<blockquote>yarn-default.xml has "yarn.application.classpath" entry set to $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/,$HADOOP_COMMON_HOME/share/hadoop/common/lib/,$HADOOP_HDFS_HOME/share/hadoop/hdfs/,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib. It does not work on Windows which needs to be fixed.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1121">YARN-1121</a>.
Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)<br>
<b>RMStateStore should flush all pending store events before closing</b><br>
<blockquote>On serviceStop, it should wait for all internal pending events to drain before stopping.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1119">YARN-1119</a>.
Major test reported by Robert Parker and fixed by Mit Desai (resourcemanager)<br>
<b>Add ClusterMetrics checks to the TestRMNodeTransitions tests</b><br>
<blockquote>YARN-1101 identified an issue where UNHEALTHY nodes could double decrement the active nodes. We should add checks for RUNNING node transitions.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1109">YARN-1109</a>.
Major improvement reported by Sandy Ryza and fixed by haosdent (nodemanager)<br>
<b>Demote NodeManager "Sending out status for container" logs to debug</b><br>
<blockquote>Diagnosing NodeManager and container launch problems is made more difficult by the enormous number of logs like
{code}
Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 18, cluster_timestamp: 1377559361179, }, attemptId: 1, }, id: 1337, }, state: C_RUNNING, diagnostics: "Container killed by the ApplicationMaster.\n", exit_status: -1000
{code}
On an NM with a few containers I am seeing tens of these per second.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1101">YARN-1101</a>.
Major bug reported by Robert Parker and fixed by Robert Parker (resourcemanager)<br>
<b>Active nodes can be decremented below 0</b><br>
<blockquote>The issue is in RMNodeImpl, where both the RUNNING and UNHEALTHY states use the same DeactivateNodeTransition class when transitioning to a deactivated state (LOST, DECOMMISSIONED, REBOOTED). The DeactivateNodeTransition class naturally decrements the active node count; however, in cases where the node has already transitioned to UNHEALTHY, the active count has already been decremented.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1098">YARN-1098</a>.
Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>Separate out RM services into "Always On" and "Active"</b><br>
<blockquote>From the discussion on YARN-1027, it makes sense to separate stateful services from stateless ones. The stateless services can run perennially irrespective of whether the RM is in Active/Standby state, while the stateful services need to be started on transitionToActive() and completely shut down on transitionToStandby().
The external-facing stateless services should respond to the client/AM/NM requests depending on whether the RM is Active/Standby.
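The split described above can be sketched roughly as follows. This is a minimal illustration only: SketchService, FlagService and HaServiceGroups are hypothetical names invented for this example and are not the ResourceManager's actual classes.

```java
// Hypothetical sketch of the "Always On" / "Active" split; not the RM's real API.
interface SketchService {
    void start();
    void stop();
    boolean isRunning();
}

// Trivial service that only records whether it is running.
class FlagService implements SketchService {
    private boolean running;
    public void start() { running = true; }
    public void stop() { running = false; }
    public boolean isRunning() { return running; }
}

class HaServiceGroups {
    private final SketchService[] alwaysOn; // stateless, run in both HA states
    private final SketchService[] active;   // stateful, run only while Active

    HaServiceGroups(SketchService[] alwaysOn, SketchService[] active) {
        this.alwaysOn = alwaysOn;
        this.active = active;
    }

    // Started once, regardless of Active/Standby state.
    void startAlwaysOn() {
        for (SketchService s : alwaysOn) {
            s.start();
        }
    }

    // Stateful services come up only on the transition to Active.
    void transitionToActive() {
        for (SketchService s : active) {
            s.start();
        }
    }

    // Stateful services are shut down completely; always-on ones keep running.
    void transitionToStandby() {
        for (SketchService s : active) {
            s.stop();
        }
    }
}
```

In this sketch a standby instance keeps its always-on services running, so it can still answer requests with a response that reflects its standby state.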
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1068">YARN-1068</a>.
Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>Add admin support for HA operations</b><br>
<blockquote>Support HA admin operations to facilitate transitioning the RM to Active and Standby states.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1060">YARN-1060</a>.
Major bug reported by Sandy Ryza and fixed by Niranjan Singh (scheduler)<br>
<b>Two tests in TestFairScheduler are missing @Test annotation</b><br>
<blockquote>Amazingly, these tests appear to pass with the annotations added.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1053">YARN-1053</a>.
Blocker bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Diagnostic message from ContainerExitEvent is ignored in ContainerImpl</b><br>
<blockquote>If the container launch fails then we send ContainerExitEvent. This event contains exitCode and diagnostic message. Today we are ignoring diagnostic message while handling this event inside ContainerImpl. Fixing it as it is useful in diagnosing the failure.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1044">YARN-1044</a>.
Critical bug reported by Sangjin Lee and fixed by Sangjin Lee (resourcemanager , scheduler)<br>
<b>used/min/max resources do not display info in the scheduler page</b><br>
<blockquote>Go to the scheduler page in RM, and click any queue to display the detailed info. You'll find that none of the resources entries (used, min, or max) would display values.
It is because the values contain brackets ("&lt;" and "&gt;") and are not properly html-escaped.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1033">YARN-1033</a>.
Major sub-task reported by Nemon Lou and fixed by Karthik Kambatla <br>
<b>Expose RM active/standby state to Web UI and REST API</b><br>
<blockquote>Both the active and standby RM shall expose their web servers and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1029">YARN-1029</a>.
Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla <br>
<b>Allow embedding leader election into the RM</b><br>
<blockquote>It should be possible to embed the common ActiveStandbyElector into the RM such that ZooKeeper-based leader election and notification are built in. In conjunction with a ZK state store, this configuration will be a simple deployment option.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1028">YARN-1028</a>.
Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla <br>
<b>Add FailoverProxyProvider like capability to RMProxy</b><br>
<blockquote>RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1027">YARN-1027</a>.
Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla <br>
<b>Implement RMHAProtocolService</b><br>
<blockquote>Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1022">YARN-1022</a>.
Trivial bug reported by Bikas Saha and fixed by haosdent <br>
<b>Unnecessary INFO logs in AMRMClientAsync</b><br>
<blockquote>Logs like the following should be debug or else every legitimate stop causes unnecessary exception traces in the logs.
2013-08-03 20:01:34,459 INFO [AMRM Heartbeater thread] org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl: Heartbeater interrupted
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:249)
2013-08-03 20:01:34,460 INFO [AMRM Callback Handler Thread] org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl: Interrupted while waiting for queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1961)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1996)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:275)</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1021">YARN-1021</a>.
Major new feature reported by Wei Yan and fixed by Wei Yan (scheduler)<br>
<b>Yarn Scheduler Load Simulator</b><br>
<blockquote>The Yarn Scheduler is a fertile area of interest with different implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations have also been made to improve scheduler performance for different scenarios and workloads. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm thoroughly before we deploy it in a production cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time-consuming and costly, and it is also very hard to find a large enough cluster. Hence, a simulator which can predict how well a scheduler algorithm performs for some specific workload would be quite useful.
We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads on a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with a reasonable amount of confidence, thereby aiding rapid innovation.
The simulator will exercise the real Yarn ResourceManager, removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AM heartbeat events from within the same JVM.
To keep track of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler.
The simulator will produce real time metrics while executing, including:
* Resource usage for the whole cluster and each queue, which can be used to configure cluster and queue capacity.
* The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual jobs' turnaround time, throughput, fairness, capacity guarantee, etc).
* Several key metrics of the scheduler algorithm, such as the time cost of each scheduler operation (allocate, handle, etc), which can be used by Hadoop developers to find the code spots and scalability limits.
The simulator will provide real time charts showing the behavior of the scheduler and its performance.
A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use the simulator to simulate the Fair Scheduler and Capacity Scheduler.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1010">YARN-1010</a>.
Critical improvement reported by Alejandro Abdelnur and fixed by Wei Yan (scheduler)<br>
<b>FairScheduler: decouple container scheduling from nodemanager heartbeats</b><br>
<blockquote>Currently scheduling for a node is done when a node heartbeats.
For large clusters where the heartbeat interval is set to several seconds, this delays scheduling of incoming allocations significantly.
We could have a continuous loop scanning all nodes and doing scheduling. If there is availability, AMs will get the allocation in the next heartbeat after the one that placed the request.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-985">YARN-985</a>.
Major improvement reported by Ravi Prakash and fixed by Ravi Prakash (nodemanager)<br>
<b>Nodemanager should log where a resource was localized</b><br>
<blockquote>When a resource is localized, we should log WHERE on the local disk it was localized. This helps in debugging afterwards (e.g. if the disk was to go bad).</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-976">YARN-976</a>.
Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (documentation)<br>
<b>Document the meaning of a virtual core</b><br>
<blockquote>As virtual cores are a somewhat novel concept, it would be helpful to have thorough documentation that clarifies their meaning.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-895">YARN-895</a>.
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>RM crashes if it restarts while the state-store is down</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-891">YARN-891</a>.
Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)<br>
<b>Store completed application information in RM state store</b><br>
<blockquote>Store completed application/attempt info in RMStateStore when the application/attempt completes. This solves problems such as finished applications getting lost after an RM restart, as well as some other races like YARN-1195.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-888">YARN-888</a>.
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur <br>
<b>clean up POM dependencies</b><br>
<blockquote>Intermediate 'pom' modules define dependencies inherited by leaf modules.
This is causing issues in the IntelliJ IDE.
We should normalize the leaf modules as in common, hdfs and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-879">YARN-879</a>.
Major bug reported by Junping Du and fixed by Junping Du <br>
<b>Fix tests w.r.t o.a.h.y.server.resourcemanager.Application</b><br>
<blockquote>getResources() should return a list of containers allocated by the RM. However, it currently returns null directly. Worse, if LOG.debug is enabled, it will definitely cause an NPE.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-819">YARN-819</a>.
Major sub-task reported by Robert Parker and fixed by Robert Parker (nodemanager , resourcemanager)<br>
<b>ResourceManager and NodeManager should check for a minimum allowed version</b><br>
<blockquote>Our use case: during an upgrade on a large cluster, several NodeManagers may not restart with the new version. Once the RM comes back up, those NodeManagers will re-register with the RM without issue.
The NM should report its version to the RM. The RM should have a configuration that can disable the check (default), require a version equal to the RM's (to prevent a config change for each release), require a version equal to or greater than the RM's (to allow NM upgrades), or specify an explicit version or version range.
The RM should also have a configuration for how to treat a mismatch: REJECT or REBOOT the NM.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-807">YARN-807</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>When querying apps by queue, iterating over all apps is inefficient and limiting </b><br>
<blockquote>The question "which apps are in queue x" can be asked via the RM REST APIs, through the ClientRMService, and through the command line. In all these cases, the question is answered by scanning through every RMApp and filtering by the app's queue name.
All schedulers maintain a mapping of queues to applications. I think it would make more sense to ask the schedulers which applications are in a given queue. This is what was done in MR1. This would also have the advantage of allowing a parent queue to return all the applications on leaf queues under it, and allow queue name aliases, as in the way that "root.default" and "default" refer to the same queue in the fair scheduler.
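The alias and parent-queue matching mentioned above could look roughly like the following sketch. QueueNames and its methods are hypothetical helpers written for this note; they are not part of the scheduler API, and the rule that every queue lives under "root" is taken from the "root.default" example in the text.

```java
// Hypothetical helper; illustrates alias and parent-queue matching only.
class QueueNames {
    // Assumption: every queue lives under "root", so "default" is an alias
    // for "root.default".
    static String canonical(String name) {
        return (name.equals("root") || name.startsWith("root.")) ? name : "root." + name;
    }

    // True when 'queue' is 'ancestor' itself or a queue nested below it,
    // comparing canonical names so aliases match.
    static boolean isSameOrDescendant(String queue, String ancestor) {
        String q = canonical(queue);
        String a = canonical(ancestor);
        return q.equals(a) || q.startsWith(a + ".");
    }
}
```

Matching on the canonical name plus a trailing dot keeps "root.ab" from being treated as a child of "root.a".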
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-786">YARN-786</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Expose application resource usage in RM REST API</b><br>
<blockquote>It might be good to require users to explicitly ask for this information, as it's a little more expensive to collect than the other fields in AppInfo.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-764">YARN-764</a>.
Major bug reported by Nemon Lou and fixed by Nemon Lou (resourcemanager)<br>
<b>blank Used Resources on Capacity Scheduler page </b><br>
<blockquote>Even when there are jobs running, used resources is empty on the Capacity Scheduler page for a leaf queue. (I use google-chrome on Windows 7.)
After changing resource.java's toString method by replacing "&lt;&gt;" with "{}", this bug gets fixed.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-709">YARN-709</a>.
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>verify that new jobs submitted with old RM delegation tokens after RM restart are accepted</b><br>
<blockquote>More elaborate test for restoring RM delegation tokens on RM restart.
New jobs with old RM delegation tokens should be accepted by new RM as long as the token is still valid</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-674">YARN-674</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi (resourcemanager)<br>
<b>Slow or failing DelegationToken renewals on submission itself make RM unavailable</b><br>
<blockquote>This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as it may run out of RPC handlers due to blocked client submissions.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-649">YARN-649</a>.
Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)<br>
<b>Make container logs available over HTTP in plain text</b><br>
<blockquote>It would be good to make container logs available over the REST API for MAPREDUCE-4362 and so that they can be accessed programmatically in general.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-584">YARN-584</a>.
Major bug reported by Sandy Ryza and fixed by Harshit Daga (scheduler)<br>
<b>In scheduler web UIs, queues unexpand on refresh</b><br>
<blockquote>In the fair scheduler web UI, you can expand queue information. Refreshing the page causes the expansions to go away, which is annoying for someone who wants to monitor the scheduler page and needs to reopen all the queues they care about each time.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-546">YARN-546</a>.
Major bug reported by Lohit Vijayarenu and fixed by Sandy Ryza (scheduler)<br>
<b>Allow disabling the Fair Scheduler event log</b><br>
<blockquote>Hadoop 1.0 supported an option to turn on/off FairScheduler event logging using mapred.fairscheduler.eventlog.enabled. In Hadoop 2.0, it looks like this option has been removed (or not ported?) which causes event logging to be enabled by default and there is no way to turn it off.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-478">YARN-478</a>.
Major sub-task reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov <br>
<b>fix coverage org.apache.hadoop.yarn.webapp.log</b><br>
<blockquote>fix coverage org.apache.hadoop.yarn.webapp.log
one patch for trunk, branch-2, branch-0.23</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-465">YARN-465</a>.
Major sub-task reported by Aleksey Gorshkov and fixed by Andrey Klochkov <br>
<b>fix coverage org.apache.hadoop.yarn.server.webproxy</b><br>
<blockquote>fix coverage org.apache.hadoop.yarn.server.webproxy
patch YARN-465-trunk.patch for trunk
patch YARN-465-branch-2.patch for branch-2
patch YARN-465-branch-0.23.patch for branch-0.23
There is an issue in branch-0.23: the patch does not create the .keep file.
To fix it, run the following commands:
mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy
touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-461">YARN-461</a>.
Major bug reported by Sandy Ryza and fixed by Wei Yan (resourcemanager)<br>
<b>Fair scheduler should not accept apps with empty string queue name</b><br>
<blockquote>When an app is submitted with "" for the queue, the RMAppManager passes it on like it does with any other string.
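A minimal sketch of the kind of check that would catch this case is shown below. QueueNameCheck is a hypothetical helper for illustration, not the actual fix; treating whitespace-only names as empty and a null name as "not specified" are assumptions of this sketch.

```java
// Hypothetical helper; not the scheduler's actual validation code.
class QueueNameCheck {
    // Returns true for "" and for whitespace-only names. A null name is
    // treated as "no queue specified" and is not flagged here (assumption).
    static boolean isEmptyQueueName(String queueName) {
        return queueName != null && queueName.trim().isEmpty();
    }
}
```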
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-427">YARN-427</a>.
Major sub-task reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov <br>
<b>Coverage fix for org.apache.hadoop.yarn.server.api.*</b><br>
<blockquote>Coverage fix for org.apache.hadoop.yarn.server.api.*
patch YARN-427-trunk.patch for trunk
patch YARN-427-branch-2.patch for branch-2 and branch-0.23</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-425">YARN-425</a>.
Major sub-task reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov <br>
<b>coverage fix for yarn api</b><br>
<blockquote>coverage fix for yarn api
patch YARN-425-trunk-a.patch for trunk
patch YARN-425-branch-2.patch for branch-2
patch YARN-425-branch-0.23.patch for branch-0.23</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-408">YARN-408</a>.
Minor bug reported by Mayank Bansal and fixed by Mayank Bansal (scheduler)<br>
<b>Capacity Scheduler delay scheduling should not be disabled by default</b><br>
<blockquote>Capacity Scheduler delay scheduling should not be disabled by default.
Enabling it by setting it to the number of nodes in one rack.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-353">YARN-353</a>.
Major sub-task reported by Hitesh Shah and fixed by Karthik Kambatla (resourcemanager)<br>
<b>Add Zookeeper-based store implementation for RMStateStore</b><br>
<blockquote>Add a store that writes RM state data to ZK.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-312">YARN-312</a>.
Major sub-task reported by Junping Du and fixed by Junping Du (api)<br>
<b>Add updateNodeResource in ResourceManagerAdministrationProtocol</b><br>
<blockquote>Add a fundamental RPC (ResourceManagerAdministrationProtocol) to support changes to a node's resource. For design details, please refer to the parent JIRA: YARN-291.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-311">YARN-311</a>.
Major sub-task reported by Junping Du and fixed by Junping Du (resourcemanager , scheduler)<br>
<b>Dynamic node resource configuration: core scheduler changes</b><br>
<blockquote>As the first step, we make the resource change on the RM side and expose the admin APIs (admin protocol, CLI, REST and JMX API) later. This JIRA contains only the scheduler changes.
The flow to update a node's resource and make resource scheduling aware of it is:
1. The resource update goes through the admin API to the RM and takes effect on RMNodeImpl.
2. When the next NM heartbeat for updating status comes, the RMNode's resource change is detected and the delta resource is added to the schedulerNode's availableResource before actual scheduling happens.
3. The scheduler does resource allocation according to the new availableResource in SchedulerNode.
For more design details, please refer to the proposal and discussions in the parent JIRA: YARN-291.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-305">YARN-305</a>.
Critical bug reported by Lohit Vijayarenu and fixed by Lohit Vijayarenu (resourcemanager)<br>
<b>Fair scheduler logs too many "Node offered to app:..." messages</b><br>
<blockquote>Running fair scheduler YARN shows that RM has lots of messages like the below.
{noformat}
INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: Node offered to app: application_1357147147433_0002 reserved: false
{noformat}
They don't seem to tell much, and the same line is dumped many times in the RM log. It would be good to have it improved with node information or moved to some other logging level with enough debug information.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-7">YARN-7</a>.
Major sub-task reported by Arun C Murthy and fixed by Junping Du <br>
<b>Add support for DistributedShell to ask for CPUs along with memory</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5744">MAPREDUCE-5744</a>.
Blocker bug reported by Sangjin Lee and fixed by Gera Shegalov <br>
<b>Job hangs because RMContainerAllocator$AssignedRequests.preemptReduce() violates the comparator contract</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5743">MAPREDUCE-5743</a>.
Major bug reported by Ted Yu and fixed by Ted Yu <br>
<b>TestRMContainerAllocator is failing</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5729">MAPREDUCE-5729</a>.
Critical bug reported by Karthik Kambatla and fixed by Karthik Kambatla (mrv2)<br>
<b>mapred job -list throws NPE</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5725">MAPREDUCE-5725</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>TestNetworkedJob relies on the Capacity Scheduler</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5724">MAPREDUCE-5724</a>.
Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (jobhistoryserver)<br>
<b>JobHistoryServer does not start if HDFS is not running</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5723">MAPREDUCE-5723</a>.
Blocker bug reported by Mohammad Kamrul Islam and fixed by Mohammad Kamrul Islam (applicationmaster)<br>
<b>MR AM container log can be truncated or empty</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5694">MAPREDUCE-5694</a>.
Major bug reported by Mohammad Kamrul Islam and fixed by Mohammad Kamrul Islam <br>
<b>MR AM container syslog is empty </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5693">MAPREDUCE-5693</a>.
Major bug reported by Gera Shegalov and fixed by Gera Shegalov (mrv2)<br>
<b>Restore MRv1 behavior for log flush</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5692">MAPREDUCE-5692</a>.
Major improvement reported by Gera Shegalov and fixed by Gera Shegalov (mrv2)<br>
<b>Add explicit diagnostics when a task attempt is killed due to speculative execution</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5689">MAPREDUCE-5689</a>.
Critical bug reported by Lohit Vijayarenu and fixed by Lohit Vijayarenu <br>
<b>MRAppMaster does not preempt reducers when scheduled maps cannot be fulfilled</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5687">MAPREDUCE-5687</a>.
Major test reported by Ted Yu and fixed by Jian He <br>
<b>TestYARNRunner#testResourceMgrDelegate fails with NPE after YARN-1446</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5685">MAPREDUCE-5685</a>.
Blocker bug reported by Yi Song and fixed by Yi Song (client)<br>
<b>getCacheFiles() api doesn't work in WrappedReducer.java due to typo</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5679">MAPREDUCE-5679</a>.
Major bug reported by Liyin Liang and fixed by Liyin Liang <br>
<b>TestJobHistoryParsing has race condition</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5674">MAPREDUCE-5674</a>.
Major bug reported by Chuan Liu and fixed by Chuan Liu (client)<br>
<b>Missing start and finish time in mapred.JobStatus</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5672">MAPREDUCE-5672</a>.
Major improvement reported by Gera Shegalov and fixed by Gera Shegalov (mr-am , mrv2)<br>
<b>Provide optional RollingFileAppender for container log4j (syslog)</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5656">MAPREDUCE-5656</a>.
Critical bug reported by Jason Lowe and fixed by Jason Lowe <br>
<b>bzip2 codec can drop records when reading data in splits</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5650">MAPREDUCE-5650</a>.
Major bug reported by Gera Shegalov and fixed by Gera Shegalov (mrv2)<br>
<b>Job fails when hprof mapreduce.task.profile.map/reduce.params is specified</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5645">MAPREDUCE-5645</a>.
Major bug reported by Jonathan Eagles and fixed by Mit Desai <br>
<b>TestFixedLengthInputFormat fails with native libs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5640">MAPREDUCE-5640</a>.
Trivial improvement reported by Jason Lowe and fixed by Jason Lowe (test)<br>
<b>Rename TestLineRecordReader in jobclient module</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5632">MAPREDUCE-5632</a>.
Major test reported by Ted Yu and fixed by Jonathan Eagles <br>
<b>TestRMContainerAllocator#testUpdatedNodes fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5631">MAPREDUCE-5631</a>.
Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles <br>
<b>TestJobEndNotifier.testNotifyRetries fails with "Should have taken more than 5 seconds" in jdk7</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5625">MAPREDUCE-5625</a>.
Major test reported by Jonathan Eagles and fixed by Mariappan Asokan <br>
<b>TestFixedLengthInputFormat fails in jdk7 environment</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5623">MAPREDUCE-5623</a>.
Major bug reported by Tsuyoshi OZAWA and fixed by Jason Lowe <br>
<b>TestJobCleanup fails because of RejectedExecutionException and NPE.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5616">MAPREDUCE-5616</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (client)<br>
<b>MR Client-AppMaster RPC max retries on socket timeout is too high.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5613">MAPREDUCE-5613</a>.
Major bug reported by Gera Shegalov and fixed by Gera Shegalov (applicationmaster)<br>
<b>DefaultSpeculator holds and checks hashmap that is always empty</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5610">MAPREDUCE-5610</a>.
Major test reported by Jonathan Eagles and fixed by Jonathan Eagles <br>
<b>TestSleepJob fails in jdk7</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5604">MAPREDUCE-5604</a>.
Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (test)<br>
<b>TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max path length</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5601">MAPREDUCE-5601</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>ShuffleHandler fadvises file regions as DONTNEED even when fetch fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5598">MAPREDUCE-5598</a>.
Major bug reported by Robert Kanter and fixed by Robert Kanter (test)<br>
<b>TestUserDefinedCounters.testMapReduceJob is flakey</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5596">MAPREDUCE-5596</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Allow configuring the number of threads used to serve shuffle connections</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5587">MAPREDUCE-5587</a>.
Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles <br>
<b>TestTextOutputFormat fails on JDK7</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5586">MAPREDUCE-5586</a>.
Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles <br>
<b>TestCopyMapper#testCopyFailOnBlockSizeDifference fails when run from hadoop-tools/hadoop-distcp directory</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5585">MAPREDUCE-5585</a>.
Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles <br>
<b>TestCopyCommitter#testNoCommitAction Fails on JDK7</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5569">MAPREDUCE-5569</a>.
Major bug reported by Nathan Roberts and fixed by Nathan Roberts <br>
<b>FloatSplitter is not generating correct splits</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5561">MAPREDUCE-5561</a>.
Critical bug reported by Cindy Li and fixed by Karthik Kambatla <br>
<b>org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl testcase failing on trunk</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5550">MAPREDUCE-5550</a>.
Major bug reported by Vrushali C and fixed by Gera Shegalov <br>
<b>Task Status message (reporter.setStatus) not shown in UI with Hadoop 2.0</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5546">MAPREDUCE-5546</a>.
Major bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>mapred.cmd on Windows sets HADOOP_OPTS incorrectly</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5522">MAPREDUCE-5522</a>.
Minor bug reported by Jinghui Wang and fixed by Jinghui Wang (test)<br>
<b>Incorrectly expect the array of JobQueueInfo returned by o.a.h.mapred.QueueManager#getJobQueueInfos to have a specific order.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5518">MAPREDUCE-5518</a>.
Trivial bug reported by Albert Chu and fixed by Albert Chu (examples)<br>
<b>Fix typo "can't read paritions file"</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5514">MAPREDUCE-5514</a>.
Blocker bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>TestRMContainerAllocator fails on trunk</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5504">MAPREDUCE-5504</a>.
Major bug reported by Thomas Graves and fixed by Kousuke Saruta (client)<br>
<b>mapred queue -info inconsistent with types</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5487">MAPREDUCE-5487</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (performance , task)<br>
<b>In task processes, JobConf is unnecessarily loaded again in Limits</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5484">MAPREDUCE-5484</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (task)<br>
<b>YarnChild unnecessarily loads job conf twice</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5481">MAPREDUCE-5481</a>.
Blocker bug reported by Jason Lowe and fixed by Sandy Ryza (mrv2 , test)<br>
<b>Enable uber jobs to have multiple reducers </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5464">MAPREDUCE-5464</a>.
Major task reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Add analogs of the SLOTS_MILLIS counters that jive with the YARN resource model</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5463">MAPREDUCE-5463</a>.
Major task reported by Sandy Ryza and fixed by Tsuyoshi OZAWA <br>
<b>Deprecate SLOTS_MILLIS counters</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5457">MAPREDUCE-5457</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Add a KeyOnlyTextOutputReader to enable streaming to write out text files without separators</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5451">MAPREDUCE-5451</a>.
Major bug reported by Mostafa Elhemali and fixed by Yingda Chen <br>
<b>MR uses LD_LIBRARY_PATH which doesn't mean anything in Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5431">MAPREDUCE-5431</a>.
Major bug reported by Timothy St. Clair and fixed by Timothy St. Clair (build)<br>
<b>Missing pom dependency in MR-client</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5411">MAPREDUCE-5411</a>.
Major sub-task reported by Ashwin Shankar and fixed by Ashwin Shankar (jobhistoryserver)<br>
<b>Refresh size of loaded job cache on history server</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5409">MAPREDUCE-5409</a>.
Major sub-task reported by Devaraj K and fixed by Gera Shegalov <br>
<b>MRAppMaster throws InvalidStateTransitonException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at KILLED for TaskAttemptImpl</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5404">MAPREDUCE-5404</a>.
Major bug reported by Ted Yu and fixed by Ted Yu (jobhistoryserver)<br>
<b>HSAdminServer does not use ephemeral ports in minicluster mode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5386">MAPREDUCE-5386</a>.
Major sub-task reported by Ashwin Shankar and fixed by Ashwin Shankar (jobhistoryserver)<br>
<b>Ability to refresh history server job retention and job cleaner settings</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5380">MAPREDUCE-5380</a>.
Major bug reported by Stephen Chu and fixed by Stephen Chu <br>
<b>Invalid mapred command should return non-zero exit code</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5373">MAPREDUCE-5373</a>.
Major bug reported by Chuan Liu and fixed by Jonathan Eagles <br>
<b>TestFetchFailure.testFetchFailureMultipleReduces could fail intermittently</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5356">MAPREDUCE-5356</a>.
Major sub-task reported by Ashwin Shankar and fixed by Ashwin Shankar (jobhistoryserver)<br>
<b>Ability to refresh aggregated log retention period and check interval </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5332">MAPREDUCE-5332</a>.
Major new feature reported by Jason Lowe and fixed by Jason Lowe (jobhistoryserver)<br>
<b>Support token-preserving restart of history server</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5329">MAPREDUCE-5329</a>.
Major bug reported by Avner BenHanoch and fixed by Avner BenHanoch (mr-am)<br>
<b>APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5316">MAPREDUCE-5316</a>.
Major bug reported by Ashwin Shankar and fixed by Ashwin Shankar (client)<br>
<b>job -list-attempt-ids command does not handle illegal task-state</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5266">MAPREDUCE-5266</a>.
Major new feature reported by Jason Lowe and fixed by Ashwin Shankar (jobhistoryserver)<br>
<b>Ability to refresh retention settings on history server</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5265">MAPREDUCE-5265</a>.
Major new feature reported by Jason Lowe and fixed by Ashwin Shankar (jobhistoryserver)<br>
<b>History server admin service to refresh user and superuser group mappings</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5186">MAPREDUCE-5186</a>.
Critical bug reported by Sangjin Lee and fixed by Robert Parker (job submission)<br>
<b>mapreduce.job.max.split.locations causes some splits created by CombineFileInputFormat to fail</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5102">MAPREDUCE-5102</a>.
Major test reported by Aleksey Gorshkov and fixed by Andrey Klochkov <br>
<b>fix coverage org.apache.hadoop.mapreduce.lib.db and org.apache.hadoop.mapred.lib.db</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5084">MAPREDUCE-5084</a>.
Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov <br>
<b>fix coverage org.apache.hadoop.mapreduce.v2.app.webapp and org.apache.hadoop.mapreduce.v2.hs.webapp</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5052">MAPREDUCE-5052</a>.
Critical bug reported by Kendall Thrapp and fixed by Chen He (jobhistoryserver , webapps)<br>
<b>Job History UI and web services confusing job start time and job submit time</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5020">MAPREDUCE-5020</a>.
Major bug reported by Trevor Robinson and fixed by Trevor Robinson (client)<br>
<b>Compile failure with JDK8</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4680">MAPREDUCE-4680</a>.
Major bug reported by Sandy Ryza and fixed by Robert Kanter (jobhistoryserver)<br>
<b>Job history cleaner should only check timestamps of files in old enough directories</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4421">MAPREDUCE-4421</a>.
Major improvement reported by Arun C Murthy and fixed by Jason Lowe <br>
<b>Run MapReduce framework via the distributed cache</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3310">MAPREDUCE-3310</a>.
Major improvement reported by Mathias Herberts and fixed by Alejandro Abdelnur (client)<br>
<b>Custom grouping comparator cannot be set for Combiners</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-1176">MAPREDUCE-1176</a>.
Major new feature reported by BitsOfInfo and fixed by Mariappan Asokan <br>
<b>FixedLengthInputFormat and FixedLengthRecordReader</b><br>
<blockquote>Adds FixedLengthInputFormat and FixedLengthRecordReader to the org.apache.hadoop.mapreduce.lib.input package. Use these two classes to read files that contain fixed-length (fixed-width) records. Such files have no CR/LF (or any combination thereof) and no delimiters; every record has the same length, and shorter values are padded with spaces, so the data forms one very long line within the file. A job that uses this input format must set the "mapreduce.input.fixedlengthinputformat.record.length" property, for example: myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", myFixedRecordLength);
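<p>A minimal sketch, assuming the job configuration is supplied as a Hadoop configuration XML resource rather than set in code: Hadoop stores configuration values as strings, so the same property can be written as a property entry. The record length of 80 is a hypothetical value chosen only for illustration.</p>
<pre>
&lt;!-- Job configuration resource (illustrative; 80 is a hypothetical record length) --&gt;
&lt;property&gt;
  &lt;name&gt;mapreduce.input.fixedlengthinputformat.record.length&lt;/name&gt;
  &lt;value&gt;80&lt;/value&gt;
&lt;/property&gt;
</pre>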
Please see javadoc for more details.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-434">MAPREDUCE-434</a>.
Minor improvement reported by Yoram Arnon and fixed by Aaron Kimball <br>
<b>LocalJobRunner limited to single reducer</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5921">HDFS-5921</a>.
Critical bug reported by Aaron T. Myers and fixed by Aaron T. Myers (namenode)<br>
<b>Cannot browse file system via NN web UI if any directory has the sticky bit set</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5876">HDFS-5876</a>.
Major bug reported by Haohui Mai and fixed by Haohui Mai (datanode)<br>
<b>SecureDataNodeStarter does not pick up configuration in hdfs-site.xml</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5873">HDFS-5873</a>.
Major bug reported by Yesha Vora and fixed by Haohui Mai <br>
<b>dfs.http.policy should have higher precedence over dfs.https.enable</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5845">HDFS-5845</a>.
Blocker bug reported by Andrew Wang and fixed by Andrew Wang (namenode)<br>
<b>SecondaryNameNode dies when checkpointing with cache pools</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5844">HDFS-5844</a>.
Minor bug reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)<br>
<b>Fix broken link in WebHDFS.apt.vm</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5842">HDFS-5842</a>.
Major bug reported by Arpit Gupta and fixed by Jing Zhao (security)<br>
<b>Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5841">HDFS-5841</a>.
Major improvement reported by Andrew Wang and fixed by Andrew Wang <br>
<b>Update HDFS caching documentation with new changes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5837">HDFS-5837</a>.
Major bug reported by Bryan Beaudreault and fixed by Tao Luo (namenode)<br>
<b>dfs.namenode.replication.considerLoad does not consider decommissioned nodes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5833">HDFS-5833</a>.
Trivial improvement reported by Bangtao Zhou and fixed by (namenode)<br>
<b>SecondaryNameNode has an incorrect Javadoc</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5830">HDFS-5830</a>.
Blocker bug reported by Yongjun Zhang and fixed by Yongjun Zhang (caching , hdfs-client)<br>
<b>WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster. </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5825">HDFS-5825</a>.
Minor improvement reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Use FileUtils.copyFile() to implement DFSTestUtils.copyFile()</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5806">HDFS-5806</a>.
Major bug reported by Nathan Roberts and fixed by Nathan Roberts (balancer)<br>
<b>balancer should set SoTimeout to avoid indefinite hangs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5800">HDFS-5800</a>.
Trivial bug reported by Kousuke Saruta and fixed by Kousuke Saruta (hdfs-client)<br>
<b>Typo: soft-limit for hard-limit in DFSClient</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5789">HDFS-5789</a>.
Major bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (namenode)<br>
<b>Some of snapshot APIs missing checkOperation double check in fsn</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5788">HDFS-5788</a>.
Major improvement reported by Nathan Roberts and fixed by Nathan Roberts (namenode)<br>
<b>listLocatedStatus response can be very large</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5784">HDFS-5784</a>.
Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (namenode)<br>
<b>reserve space in edit log header and fsimage header for feature flag section</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5777">HDFS-5777</a>.
Major bug reported by Jing Zhao and fixed by Jing Zhao (namenode)<br>
<b>Update LayoutVersion for the new editlog op OP_ADD_BLOCK</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5766">HDFS-5766</a>.
Major bug reported by Liang Xie and fixed by Liang Xie (hdfs-client)<br>
<b>In DFSInputStream, do not add datanode to deadNodes after InvalidEncryptionKeyException in fetchBlockByteRange</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5762">HDFS-5762</a>.
Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
<b>BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5756">HDFS-5756</a>.
Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (libhdfs)<br>
<b>hadoopRzOptionsSetByteBufferPool does not accept NULL argument, contrary to docs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5748">HDFS-5748</a>.
Major improvement reported by Kihwal Lee and fixed by Haohui Mai <br>
<b>Too much information shown in the dfs health page.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5747">HDFS-5747</a>.
Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Arpit Agarwal (namenode)<br>
<b>BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5728">HDFS-5728</a>.
Critical bug reported by Vinayakumar B and fixed by Vinayakumar B (datanode)<br>
<b>[Diskfull] Block recovery will fail if the metafile does not have crc for all chunks of the block</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5721">HDFS-5721</a>.
Minor improvement reported by Ted Yu and fixed by Ted Yu <br>
<b>sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5719">HDFS-5719</a>.
Minor bug reported by Ted Yu and fixed by Ted Yu (namenode)<br>
<b>FSImage#doRollback() should close prevState before return</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5710">HDFS-5710</a>.
Major bug reported by Ted Yu and fixed by Uma Maheswara Rao G <br>
<b>FSDirectory#getFullPathName should check inodes against null</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5704">HDFS-5704</a>.
Major bug reported by Suresh Srinivas and fixed by Jing Zhao (namenode)<br>
<b>Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK</b><br>
<blockquote>Add a new editlog record (OP_ADD_BLOCK) that only records allocation of the new block instead of the entire block list, on every block allocation.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5703">HDFS-5703</a>.
Major new feature reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (webhdfs)<br>
<b>Add support for HTTPS and swebhdfs to HttpFS</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5695">HDFS-5695</a>.
Major improvement reported by Haohui Mai and fixed by Haohui Mai (test)<br>
<b>Clean up TestOfflineEditsViewer and OfflineEditsViewerHelper</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5691">HDFS-5691</a>.
Minor bug reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)<br>
<b>Fix typo in ShortCircuitLocalRead document</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5690">HDFS-5690</a>.
Blocker bug reported by Haohui Mai and fixed by Haohui Mai <br>
<b>DataNode fails to start in secure mode when dfs.http.policy equals HTTP_ONLY</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5681">HDFS-5681</a>.
Major bug reported by Daryn Sharp and fixed by Daryn Sharp (namenode)<br>
<b>renewLease should not hold fsn write lock</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5677">HDFS-5677</a>.
Minor improvement reported by Vincent Sheffer and fixed by Vincent Sheffer (datanode , ha)<br>
<b>Need error checking for HA cluster configuration</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5676">HDFS-5676</a>.
Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)<br>
<b>fix inconsistent synchronization of CachingStrategy</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5675">HDFS-5675</a>.
Minor bug reported by Plamen Jeliazkov and fixed by Plamen Jeliazkov (benchmarks)<br>
<b>Add Mkdirs operation to NNThroughputBenchmark</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5674">HDFS-5674</a>.
Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)<br>
<b>Editlog code cleanup</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5671">HDFS-5671</a>.
Critical bug reported by JamesLi and fixed by JamesLi (hdfs-client)<br>
<b>Fix socket leak in DFSInputStream#getBlockReader</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5667">HDFS-5667</a>.
Major sub-task reported by Eric Sirianni and fixed by Arpit Agarwal (datanode)<br>
<b>Include DatanodeStorage in StorageReport</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5666">HDFS-5666</a>.
Minor bug reported by Colin Patrick McCabe and fixed by Jimmy Xiang (namenode)<br>
<b>Fix inconsistent synchronization in BPOfferService</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5663">HDFS-5663</a>.
Major improvement reported by Liang Xie and fixed by Liang Xie (hdfs-client)<br>
<b>make the retry time and interval value configurable in openInfo()</b><br>
<blockquote>Makes the number of retries, and the interval between retries, configurable when the client fetches the length of the last block of a file. The new configuration properties are:<br>
dfs.client.retry.times.get-last-block-length<br>
dfs.client.retry.interval-ms.get-last-block-length<br>
They default to 3 and 4000 respectively, the values that were previously hardcoded.
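<p>As an illustrative sketch, assuming the client picks up its settings from an hdfs-site.xml on its classpath, the two properties could be set explicitly as follows. The values shown are the defaults of 3 and 4000; the file location and the choice to restate the defaults are assumptions made for the example.</p>
<pre>
&lt;!-- hdfs-site.xml on the client (location is an assumption) --&gt;
&lt;property&gt;
  &lt;name&gt;dfs.client.retry.times.get-last-block-length&lt;/name&gt;
  &lt;value&gt;3&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;dfs.client.retry.interval-ms.get-last-block-length&lt;/name&gt;
  &lt;value&gt;4000&lt;/value&gt;
&lt;/property&gt;
</pre>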
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5662">HDFS-5662</a>.
Major improvement reported by Brandon Li and fixed by Brandon Li (namenode)<br>
<b>Can't decommission a DataNode due to file's replication factor larger than the rest of the cluster size</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5661">HDFS-5661</a>.
Major bug reported by Benoy Antony and fixed by Benoy Antony <br>
<b>Browsing FileSystem via web ui, should use datanode's fqdn instead of ip address</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5657">HDFS-5657</a>.
Major bug reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>race condition causes writeback state error in NFS gateway</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5652">HDFS-5652</a>.
Minor improvement reported by Liang Xie and fixed by Liang Xie (hdfs-client)<br>
<b>Refactor and unify invalid block token exception handling in DFSInputStream</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5649">HDFS-5649</a>.
Major bug reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Unregister NFS and Mount service when NFS gateway is shutting down</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5637">HDFS-5637</a>.
Major improvement reported by Liang Xie and fixed by Liang Xie (hdfs-client , security)<br>
<b>try to refetchToken when InvalidToken occurs during a local read</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5634">HDFS-5634</a>.
Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)<br>
<b>allow BlockReaderLocal to switch between checksumming and not</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5633">HDFS-5633</a>.
Minor improvement reported by Jing Zhao and fixed by Jing Zhao <br>
<b>Improve OfflineImageViewer to use less memory</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5629">HDFS-5629</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Support HTTPS in JournalNode and SecondaryNameNode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5592">HDFS-5592</a>.
Major bug reported by Vinayakumar B and fixed by Vinayakumar B <br>
<b>"DIR* completeFile: /file is closed by DFSClient_" should be logged only for successful closure of the file.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5590">HDFS-5590</a>.
Major bug reported by Jing Zhao and fixed by Jing Zhao <br>
<b>Block ID and generation stamp may be reused when persistBlocks is set to false</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5587">HDFS-5587</a>.
Minor improvement reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>add debug information when NFS fails to start with duplicate user or group names</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5582">HDFS-5582</a>.
Minor bug reported by Henry Hung and fixed by sathish <br>
<b>hdfs getconf -excludeFile or -includeFile always failed</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5581">HDFS-5581</a>.
Major bug reported by Vinayakumar B and fixed by Vinayakumar B (namenode)<br>
<b>NameNodeFsck should use only one instance of BlockPlacementPolicy</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5580">HDFS-5580</a>.
Major bug reported by Binglin Chang and fixed by Binglin Chang <br>
<b>Infinite loop in Balancer.waitForMoveCompletion</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5579">HDFS-5579</a>.
Major bug reported by zhaoyunjiong and fixed by zhaoyunjiong (namenode)<br>
<b>Under-construction files make DataNode decommission take many hours</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5577">HDFS-5577</a>.
Trivial improvement reported by Brandon Li and fixed by Brandon Li (documentation)<br>
<b>NFS user guide update</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5568">HDFS-5568</a>.
Major improvement reported by Vinayakumar B and fixed by Vinayakumar B (snapshots)<br>
<b>Support inclusion of snapshot paths in Namenode fsck</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5563">HDFS-5563</a>.
Major improvement reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>NFS gateway should commit the buffered data when read request comes after write to the same file</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5561">HDFS-5561</a>.
Minor improvement reported by Fengdong Yu and fixed by Haohui Mai (namenode)<br>
<b>FSNameSystem#getNameJournalStatus() in JMX should return plain text instead of HTML</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5560">HDFS-5560</a>.
Major bug reported by Josh Elser and fixed by Josh Elser <br>
<b>Trash configuration log statements print incorrect units</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5558">HDFS-5558</a>.
Major bug reported by Kihwal Lee and fixed by Kihwal Lee <br>
<b>LeaseManager monitor thread can crash if the last block is complete but another block is not.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5557">HDFS-5557</a>.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee <br>
<b>Write pipeline recovery for the last packet in the block may cause rejection of valid replicas</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5552">HDFS-5552</a>.
Major bug reported by Shinichi Yamashita and fixed by Haohui Mai (namenode)<br>
<b>Fix wrong information of "Cluster summary" in dfshealth.html</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5548">HDFS-5548</a>.
Major improvement reported by Haohui Mai and fixed by Haohui Mai (nfs)<br>
<b>Use ConcurrentHashMap in portmap</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5545">HDFS-5545</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Allow specifying endpoints for listeners in HttpServer</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5544">HDFS-5544</a>.
Minor bug reported by sathish and fixed by sathish (hdfs-client)<br>
<b>Adding Test case For Checking dfs.checksum type as NULL value</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5540">HDFS-5540</a>.
Minor bug reported by Binglin Chang and fixed by Binglin Chang <br>
<b>Fix intermittent failure in TestBlocksWithNotEnoughRacks</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5538">HDFS-5538</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>URLConnectionFactory should pick up the SSL related configuration by default</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5536">HDFS-5536</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Implement HTTP policy for Namenode and DataNode</b><br>
<blockquote>Add a new HTTP policy configuration. Users can set "dfs.http.policy" to control the HTTP endpoints for NameNode and DataNode. The following values are supported:
- HTTP_ONLY : Service is provided only on http
- HTTPS_ONLY : Service is provided only on https
- HTTP_AND_HTTPS : Service is provided on both http and https
hadoop.ssl.enabled and dfs.https.enable are deprecated. While these deprecated properties are still configured, the HTTP policy is currently decided by the following rules:
1. If dfs.http.policy is set to HTTPS_ONLY or HTTP_AND_HTTPS, the specified policy is used; otherwise rules 2-4 apply.
2. HTTPS_ONLY is used if hadoop.ssl.enabled is true.
3. HTTP_AND_HTTPS is used if dfs.https.enable is true.
4. HTTP_ONLY is used in all other cases.</blockquote></li>
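<li> <b>Illustrative sketch for HDFS-5536: resolving the effective HTTP policy</b><br>
<blockquote>The rules listed under HDFS-5536 can be sketched as follows. This is a minimal illustration under stated assumptions, not Hadoop's actual implementation: the class HttpPolicySketch, the method resolve, and the use of java.util.Properties as a stand-in for Hadoop's Configuration are hypothetical. Only the property names and policy values are taken from the note above.

```java
import java.util.Properties;

// Hypothetical sketch of the HDFS-5536 resolution rules; not Hadoop source code.
class HttpPolicySketch {

    public enum Policy { HTTP_ONLY, HTTPS_ONLY, HTTP_AND_HTTPS }

    // java.util.Properties stands in for Hadoop's Configuration (assumption).
    public static Policy resolve(Properties conf) {
        String explicit = conf.getProperty("dfs.http.policy");

        // Rule 1: an explicit HTTPS_ONLY or HTTP_AND_HTTPS setting is used as is.
        if ("HTTPS_ONLY".equals(explicit)) {
            return Policy.HTTPS_ONLY;
        }
        if ("HTTP_AND_HTTPS".equals(explicit)) {
            return Policy.HTTP_AND_HTTPS;
        }

        // Rule 2: the deprecated hadoop.ssl.enabled flag selects HTTPS_ONLY.
        // Boolean.parseBoolean returns false for a missing (null) value.
        if (Boolean.parseBoolean(conf.getProperty("hadoop.ssl.enabled"))) {
            return Policy.HTTPS_ONLY;
        }

        // Rule 3: the deprecated dfs.https.enable flag selects HTTP_AND_HTTPS.
        if (Boolean.parseBoolean(conf.getProperty("dfs.https.enable"))) {
            return Policy.HTTP_AND_HTTPS;
        }

        // Rule 4: everything else falls back to HTTP_ONLY.
        return Policy.HTTP_ONLY;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("dfs.https.enable", "true");
        System.out.println(resolve(conf)); // prints HTTP_AND_HTTPS
    }
}
```

Because rule 1 only short-circuits for HTTPS_ONLY and HTTP_AND_HTTPS, an explicit HTTP_ONLY setting still falls through to the deprecated properties in this sketch.</blockquote></li>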
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5533">HDFS-5533</a>.
Minor bug reported by Binglin Chang and fixed by Binglin Chang (snapshots)<br>
<b>Symlink delete/create should be treated as DELETE/CREATE in snapshot diff report</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5532">HDFS-5532</a>.
Major improvement reported by Vinayakumar B and fixed by Vinayakumar B (webhdfs)<br>
<b>Enable the webhdfs by default to support new HDFS web UI</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5526">HDFS-5526</a>.
Blocker bug reported by Tsz Wo (Nicholas), SZE and fixed by Kihwal Lee (datanode)<br>
<b>Datanode cannot roll back to previous layout version</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5525">HDFS-5525</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Inline dust templates</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5519">HDFS-5519</a>.
Minor sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>COMMIT handler should update the commit status after sync</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5514">HDFS-5514</a>.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (namenode)<br>
<b>FSNamesystem's fsLock should allow custom implementation</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5506">HDFS-5506</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Use URLConnectionFactory in DelegationTokenFetcher</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5504">HDFS-5504</a>.
Major bug reported by Vinayakumar B and fixed by Vinayakumar B (snapshots)<br>
<b>In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5502">HDFS-5502</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Fix HTTPS support in HsftpFileSystem</b><br>
<blockquote>Fix the https support in HsftpFileSystem. With the change the client now verifies the server certificate. In particular, client side will verify the Common Name of the certificate using a strategy specified by the configuration property "hadoop.ssl.hostname.verifier".</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5495">HDFS-5495</a>.
Major improvement reported by Andrew Wang and fixed by Jarek Jarcec Cecho <br>
<b>Remove further JUnit3 usages from HDFS</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5489">HDFS-5489</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Use TokenAspect in WebHDFSFileSystem</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5488">HDFS-5488</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Clean up TestHftpURLTimeout</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5487">HDFS-5487</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Introduce unit test for TokenAspect</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5476">HDFS-5476</a>.
Major bug reported by Jing Zhao and fixed by Jing Zhao <br>
<b>Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5474">HDFS-5474</a>.
Major bug reported by Uma Maheswara Rao G and fixed by sathish (snapshots)<br>
<b>Deletesnapshot can make Namenode in safemode on NN restarts.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5469">HDFS-5469</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Add configuration property for the sub-directory export path</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5467">HDFS-5467</a>.
Trivial improvement reported by Andrew Wang and fixed by Shinichi Yamashita <br>
<b>Remove tab characters in hdfs-default.xml</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5458">HDFS-5458</a>.
Major bug reported by Andrew Wang and fixed by Mike Mellenthin (datanode)<br>
<b>Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5456">HDFS-5456</a>.
Critical bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode)<br>
<b>NameNode startup progress creates new steps if caller attempts to create a counter for a step that doesn't already exist.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5454">HDFS-5454</a>.
Minor sub-task reported by Eric Sirianni and fixed by Arpit Agarwal (datanode)<br>
<b>DataNode UUID should be assigned prior to FsDataset initialization</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5449">HDFS-5449</a>.
Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee <br>
<b>WebHdfs compatibility broken between 2.2 and 1.x / 23.x</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5444">HDFS-5444</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Choose default web UI based on browser capabilities</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5443">HDFS-5443</a>.
Major bug reported by Uma Maheswara Rao G and fixed by Jing Zhao (snapshots)<br>
<b>Delete 0-sized block when deleting an under-construction file that is included in snapshot</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5440">HDFS-5440</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Extract the logic of handling delegation tokens in HftpFileSystem to the TokenAspect class</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5438">HDFS-5438</a>.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)<br>
<b>Flaws in block report processing can cause data loss</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5436">HDFS-5436</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Move HsFtpFileSystem and HFtpFileSystem into org.apache.hdfs.web</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5434">HDFS-5434</a>.
Minor bug reported by Buddy and fixed by (namenode)<br>
<b>Write resiliency for replica count 1</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5433">HDFS-5433</a>.
Critical bug reported by Aaron T. Myers and fixed by Aaron T. Myers (snapshots)<br>
<b>When reloading fsimage during checkpointing, we should clear existing snapshottable directories</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5432">HDFS-5432</a>.
Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (datanode , test)<br>
<b>TestDatanodeJsp fails on Windows due to assumption that loopback address resolves to host name localhost.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5428">HDFS-5428</a>.
Major bug reported by Vinayakumar B and fixed by Jing Zhao (snapshots)<br>
<b>under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5427">HDFS-5427</a>.
Major bug reported by Vinayakumar B and fixed by Vinayakumar B (snapshots)<br>
<b>not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5425">HDFS-5425</a>.
Major bug reported by sathish and fixed by Jing Zhao (namenode , snapshots)<br>
<b>Renaming underconstruction file with snapshots can make NN failure on restart</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5413">HDFS-5413</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (scripts)<br>
<b>hdfs.cmd does not support passthrough to any arbitrary class.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5407">HDFS-5407</a>.
Trivial bug reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Fix typos in DFSClientCache</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5406">HDFS-5406</a>.
Major sub-task reported by Arpit Agarwal and fixed by Arpit Agarwal (datanode)<br>
<b>Send incremental block reports for all storages in a single call</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5403">HDFS-5403</a>.
Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (webhdfs)<br>
<b>WebHdfs client cannot communicate with older WebHdfs servers post HDFS-5306</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5400">HDFS-5400</a>.
Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
<b>DFS_CLIENT_MMAP_CACHE_THREAD_RUNS_PER_TIMEOUT constant is set to the wrong value</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5399">HDFS-5399</a>.
Major improvement reported by Jing Zhao and fixed by Jing Zhao <br>
<b>Revisit SafeModeException and corresponding retry policies</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5393">HDFS-5393</a>.
Minor sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Serve bootstrap and jQuery locally</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5382">HDFS-5382</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Implement the UI of browsing filesystems in HTML 5 page</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5379">HDFS-5379</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Update links to datanode information in dfshealth.html</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5375">HDFS-5375</a>.
Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (tools)<br>
<b>hdfs.cmd does not expose several snapshot commands.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5374">HDFS-5374</a>.
Trivial bug reported by Suresh Srinivas and fixed by Suresh Srinivas <br>
<b>Remove deadcode in DFSOutputStream</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5372">HDFS-5372</a>.
Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Vinayakumar B (namenode)<br>
<b>In FSNamesystem, hasReadLock() returns false if the current thread holds the write lock</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5371">HDFS-5371</a>.
Minor improvement reported by Jing Zhao and fixed by Jing Zhao (ha , test)<br>
<b>Let client retry the same NN when "dfs.client.test.drop.namenode.response.number" is enabled</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5370">HDFS-5370</a>.
Trivial bug reported by Kousuke Saruta and fixed by Kousuke Saruta (hdfs-client)<br>
<b>Typo in Error Message: different between range in condition and range in error message</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5365">HDFS-5365</a>.
Major bug reported by Radim Kolar and fixed by Radim Kolar (build , libhdfs)<br>
<b>Fix libhdfs compile error on FreeBSD9</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5364">HDFS-5364</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Add OpenFileCtx cache</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5363">HDFS-5363</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Refactor WebHdfsFileSystem: move SPNEGO-authenticated connection creation to URLConnectionFactory</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5360">HDFS-5360</a>.
Minor improvement reported by Shinichi Yamashita and fixed by Shinichi Yamashita (snapshots)<br>
<b>Improvement of usage message of renameSnapshot and deleteSnapshot</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5353">HDFS-5353</a>.
Blocker bug reported by Haohui Mai and fixed by Colin Patrick McCabe <br>
<b>Short circuit reads fail when dfs.encrypt.data.transfer is enabled</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5352">HDFS-5352</a>.
Minor bug reported by Ted Yu and fixed by Ted Yu <br>
<b>Server#initLog() doesn't close InputStream in httpfs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5350">HDFS-5350</a>.
Minor improvement reported by Rob Weltman and fixed by Jimmy Xiang (namenode)<br>
<b>Name Node should report fsimage transfer time as a metric</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5347">HDFS-5347</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (documentation)<br>
<b>add HDFS NFS user guide</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5346">HDFS-5346</a>.
Major bug reported by Kihwal Lee and fixed by Ravi Prakash (namenode , performance)<br>
<b>Avoid unnecessary call to getNumLiveDataNodes() for each block during IBR processing</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5344">HDFS-5344</a>.
Minor improvement reported by sathish and fixed by sathish (snapshots , tools)<br>
<b>Make LsSnapshottableDir as Tool interface implementation</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5343">HDFS-5343</a>.
Major bug reported by sathish and fixed by sathish (hdfs-client)<br>
<b>When cat command is issued on snapshot files getting unexpected result</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5342">HDFS-5342</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Provide more information in the FSNamesystem JMX interfaces</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5341">HDFS-5341</a>.
Major bug reported by qus-jiawei and fixed by qus-jiawei (datanode)<br>
<b>Reduce fsdataset lock duration during directory scanning.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5338">HDFS-5338</a>.
Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)<br>
<b>Add a conf to disable hostname check in DN registration</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5337">HDFS-5337</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>should do hsync for a commit request even there is no pending writes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5336">HDFS-5336</a>.
Minor bug reported by Akira AJISAKA and fixed by Akira AJISAKA (namenode)<br>
<b>DataNode should not output 'StartupProgress' metrics</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5335">HDFS-5335</a>.
Major bug reported by Arpit Gupta and fixed by Haohui Mai <br>
<b>DFSOutputStream#close() keeps throwing exceptions when it is called multiple times</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5334">HDFS-5334</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Implement dfshealth.jsp in HTML pages</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5331">HDFS-5331</a>.
Major improvement reported by Vinayakumar B and fixed by Vinayakumar B (snapshots)<br>
<b>make SnapshotDiff.java to a o.a.h.util.Tool interface implementation</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5330">HDFS-5330</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>fix readdir and readdirplus for large directories</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5329">HDFS-5329</a>.
Major bug reported by Brandon Li and fixed by Brandon Li (namenode , nfs)<br>
<b>Update FSNamesystem#getListing() to handle inode path in startAfter token</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5325">HDFS-5325</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Remove WebHdfsFileSystem#ConnRunner</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5323">HDFS-5323</a>.
Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (namenode)<br>
<b>Remove some deadcode in BlockManager</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5322">HDFS-5322</a>.
Major bug reported by Arpit Gupta and fixed by Jing Zhao (ha)<br>
<b>HDFS delegation token not found in cache errors seen on secure HA clusters</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5317">HDFS-5317</a>.
Critical sub-task reported by Suresh Srinivas and fixed by Haohui Mai <br>
<b>Go back to DFS Home link does not work on datanode webUI</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5316">HDFS-5316</a>.
Critical sub-task reported by Suresh Srinivas and fixed by Haohui Mai <br>
<b>Namenode ignores the default https port</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5312">HDFS-5312</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Generate HTTP / HTTPS URL in DFSUtil#getInfoServer() based on the configured http policy</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5307">HDFS-5307</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Support both HTTP and HTTPS in jsp pages</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5305">HDFS-5305</a>.
Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas <br>
<b>Add https support in HDFS</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5297">HDFS-5297</a>.
Major bug reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)<br>
<b>Fix dead links in HDFS site documents</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5291">HDFS-5291</a>.
Critical bug reported by Arpit Gupta and fixed by Jing Zhao (ha)<br>
<b>Clients need to retry when Active NN is in SafeMode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5288">HDFS-5288</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai (nfs)<br>
<b>Close idle connections in portmap</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5283">HDFS-5283</a>.
Critical bug reported by Vinayakumar B and fixed by Vinayakumar B (snapshots)<br>
<b>NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshold</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5281">HDFS-5281</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>COMMIT request should not block</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5276">HDFS-5276</a>.
Major bug reported by Chengxiang Li and fixed by Colin Patrick McCabe <br>
<b>FileSystem.Statistics got performance issue on multi-thread read/write.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5267">HDFS-5267</a>.
Minor improvement reported by Junping Du and fixed by Junping Du <br>
<b>Remove volatile from LightWeightHashSet</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5260">HDFS-5260</a>.
Major new feature reported by Chris Nauroth and fixed by Chris Nauroth (hdfs-client , libhdfs)<br>
<b>Merge zero-copy memory-mapped HDFS client reads to trunk and branch-2.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5257">HDFS-5257</a>.
Major bug reported by Vinayakumar B and fixed by Vinayakumar B (hdfs-client , namenode)<br>
<b>addBlock() retry should return LocatedBlock with locations else client will get AIOBE</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5252">HDFS-5252</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Stable write is not handled correctly in someplace</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5240">HDFS-5240</a>.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (namenode)<br>
<b>Separate formatting from logging in the audit logger API</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5239">HDFS-5239</a>.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (namenode)<br>
<b>Allow FSNamesystem lock fairness to be configurable</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5220">HDFS-5220</a>.
Major improvement reported by Rob Weltman and fixed by Jimmy Xiang (namenode)<br>
<b>Expose group resolution time as metric</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5207">HDFS-5207</a>.
Major improvement reported by Junping Du and fixed by Junping Du (namenode)<br>
<b>In BlockPlacementPolicy, update 2 parameters of chooseTarget()</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5188">HDFS-5188</a>.
Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)<br>
<b>Clean up BlockPlacementPolicy and its implementations</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5171">HDFS-5171</a>.
Major sub-task reported by Brandon Li and fixed by Haohui Mai (nfs)<br>
<b>NFS should create input stream for a file and try to share it with multiple read requests</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5170">HDFS-5170</a>.
Trivial bug reported by Andrew Wang and fixed by Andrew Wang <br>
<b>BlockPlacementPolicyDefault uses the wrong classname when alerting to enable debug logging</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5164">HDFS-5164</a>.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (namenode)<br>
<b>deleteSnapshot should check if OperationCategory.WRITE is possible before taking write lock</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5144">HDFS-5144</a>.
Minor improvement reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)<br>
<b>Document time unit to NameNodeMetrics.java</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5136">HDFS-5136</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>MNT EXPORT should give the full group list which can mount the exports</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5130">HDFS-5130</a>.
Minor test reported by Binglin Chang and fixed by Binglin Chang (test)<br>
<b>Add test for snapshot related FsShell and DFSAdmin commands</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5122">HDFS-5122</a>.
Major bug reported by Arpit Gupta and fixed by Haohui Mai (ha , webhdfs)<br>
<b>Support failover and retry in WebHdfsFileSystem for NN HA</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5110">HDFS-5110</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Change FSDataOutputStream to HdfsDataOutputStream for opened streams to fix type cast error</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5107">HDFS-5107</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Fix array copy error in Readdir and Readdirplus responses</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5104">HDFS-5104</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Support dotdot name in NFS LOOKUP operation</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5093">HDFS-5093</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu (test)<br>
<b>TestGlobPaths should re-use the MiniDFSCluster to avoid failure on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5078">HDFS-5078</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Support file append in NFSv3 gateway to enable data streaming to HDFS</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5075">HDFS-5075</a>.
Major bug reported by Timothy St. Clair and fixed by Timothy St. Clair <br>
<b>httpfs-config.sh calls out incorrect env script name</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5074">HDFS-5074</a>.
Major bug reported by Todd Lipcon and fixed by Todd Lipcon (ha , namenode)<br>
<b>Allow starting up from an fsimage checkpoint in the middle of a segment</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5073">HDFS-5073</a>.
Minor bug reported by Kihwal Lee and fixed by Arpit Agarwal (test)<br>
<b>TestListCorruptFileBlocks fails intermittently </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5071">HDFS-5071</a>.
Major sub-task reported by Kihwal Lee and fixed by Brandon Li (nfs)<br>
<b>Change hdfs-nfs parent project to hadoop-project</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5069">HDFS-5069</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Include hadoop-nfs and hadoop-hdfs-nfs into hadoop dist for NFS deployment</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5068">HDFS-5068</a>.
Major improvement reported by Konstantin Shvachko and fixed by Konstantin Shvachko (benchmarks)<br>
<b>Convert NNThroughputBenchmark to a Tool to allow generic options.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5065">HDFS-5065</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic (hdfs-client , test)<br>
<b>TestSymlinkHdfsDisable fails on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5043">HDFS-5043</a>.
Major bug reported by Brandon Li and fixed by Brandon Li <br>
<b>For HdfsFileStatus, set default value of childrenNum to -1 instead of 0 to avoid confusing applications</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5037">HDFS-5037</a>.
Critical improvement reported by Todd Lipcon and fixed by Andrew Wang (ha , namenode)<br>
<b>Active NN should trigger its own edit log rolls</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5035">HDFS-5035</a>.
Major bug reported by Andrew Wang and fixed by Andrew Wang (namenode)<br>
<b>getFileLinkStatus and rename do not correctly check permissions of symlinks</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5034">HDFS-5034</a>.
Trivial improvement reported by Andrew Wang and fixed by Andrew Wang (namenode)<br>
<b>Remove debug prints from getFileLinkInfo</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5023">HDFS-5023</a>.
Major bug reported by Ravi Prakash and fixed by Mit Desai (snapshots , test)<br>
<b>TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5014">HDFS-5014</a>.
Major bug reported by Vinayakumar B and fixed by Vinayakumar B (datanode , ha)<br>
<b>BPOfferService#processCommandFromActor() synchronization on namenode RPC call delays IBR to Active NN, if Standby NN is unstable</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5004">HDFS-5004</a>.
Major improvement reported by Trevor Lorimer and fixed by Trevor Lorimer (namenode)<br>
<b>Add additional JMX bean for NameNode status data</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4997">HDFS-4997</a>.
Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (libhdfs)<br>
<b>libhdfs doesn't return correct error codes in most cases</b><br>
<blockquote>libhdfs now returns correct codes in errno. Previously, due to a bug, many functions set errno to 255 instead of the more specific error code.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4995">HDFS-4995</a>.
Major bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)<br>
<b>Make getContentSummary() less expensive</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4994">HDFS-4994</a>.
Minor bug reported by Kihwal Lee and fixed by Robert Parker (namenode)<br>
<b>Audit log getContentSummary() calls</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4983">HDFS-4983</a>.
Major improvement reported by Harsh J and fixed by Yongjun Zhang (webhdfs)<br>
<b>Numeric usernames do not work with WebHDFS FS</b><br>
<blockquote>Add a new configuration property "dfs.webhdfs.user.provider.user.pattern" for specifying user name filters for WebHDFS.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4962">HDFS-4962</a>.
Minor sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (nfs)<br>
<b>Use enum for nfs constants</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4949">HDFS-4949</a>.
Major new feature reported by Andrew Wang and fixed by Andrew Wang (datanode , namenode)<br>
<b>Centralized cache management in HDFS</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4948">HDFS-4948</a>.
Major bug reported by Robert Joseph Evans and fixed by Brandon Li <br>
<b>mvn site for hadoop-hdfs-nfs fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4947">HDFS-4947</a>.
Major sub-task reported by Brandon Li and fixed by Jing Zhao (nfs)<br>
<b>Add NFS server export table to control export by hostname or IP range</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4885">HDFS-4885</a>.
Major sub-task reported by Junping Du and fixed by Junping Du <br>
<b>Update verifyBlockPlacement() API in BlockPlacementPolicy</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4879">HDFS-4879</a>.
Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (namenode)<br>
<b>Add "blocked ArrayList" collection to avoid CMS full GCs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4860">HDFS-4860</a>.
Major improvement reported by Trevor Lorimer and fixed by Trevor Lorimer (namenode)<br>
<b>Add additional attributes to JMX beans</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4816">HDFS-4816</a>.
Major bug reported by Andrew Wang and fixed by Andrew Wang (namenode)<br>
<b>transitionToActive blocks if the SBN is doing checkpoint image transfer</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4772">HDFS-4772</a>.
Minor improvement reported by Brandon Li and fixed by Brandon Li (namenode)<br>
<b>Add number of children in HdfsFileStatus</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4763">HDFS-4763</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Add script changes/utility for starting NFS gateway</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4762">HDFS-4762</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Provide HDFS based NFSv3 and Mountd implementation</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4657">HDFS-4657</a>.
Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (namenode)<br>
<b>Limit the number of blocks logged by the NN after a block report to a configurable value.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4633">HDFS-4633</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (hdfs-client , test)<br>
<b>TestDFSClientExcludedNodes fails sporadically if excluded nodes cache expires too quickly</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4517">HDFS-4517</a>.
Major test reported by Vadim Bondarev and fixed by Ivan A. Veselovsky <br>
<b>Cover class RemoteBlockReader with unit tests</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4516">HDFS-4516</a>.
Critical bug reported by Uma Maheswara Rao G and fixed by Vinayakumar B (namenode)<br>
<b>Client crash after block allocation and NN switch before lease recovery for the same file can cause readers to fail forever</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4512">HDFS-4512</a>.
Major test reported by Vadim Bondarev and fixed by Vadim Bondarev <br>
<b>Cover package org.apache.hadoop.hdfs.server.common with tests</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4511">HDFS-4511</a>.
Major test reported by Vadim Bondarev and fixed by Andrey Klochkov <br>
<b>Cover package org.apache.hadoop.hdfs.tools with unit test</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4510">HDFS-4510</a>.
Major test reported by Vadim Bondarev and fixed by Andrey Klochkov <br>
<b>Cover classes ClusterJspHelper/NamenodeJspHelper with unit tests</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4491">HDFS-4491</a>.
Major test reported by Tsuyoshi OZAWA and fixed by Andrey Klochkov (test)<br>
<b>Parallel testing HDFS</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4376">HDFS-4376</a>.
Major bug reported by Aaron T. Myers and fixed by Junping Du (balancer)<br>
<b>Fix several race conditions in Balancer and resolve intermittent timeout of TestBalancerWithNodeGroup</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4329">HDFS-4329</a>.
Major bug reported by Andy Isaacson and fixed by Cristina L. Abad (hdfs-client)<br>
<b>DFSShell issues with directories with spaces in name</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4278">HDFS-4278</a>.
Major improvement reported by Harsh J and fixed by Kousuke Saruta (datanode , namenode)<br>
<b>Log an ERROR when DFS_BLOCK_ACCESS_TOKEN_ENABLE config is disabled but security is turned on.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4201">HDFS-4201</a>.
Critical bug reported by Eli Collins and fixed by Jimmy Xiang (namenode)<br>
<b>NPE in BPServiceActor#sendHeartBeat</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4096">HDFS-4096</a>.
Major sub-task reported by Jing Zhao and fixed by Haohui Mai (datanode , namenode)<br>
<b>Add snapshot information to namenode WebUI</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3987">HDFS-3987</a>.
Major sub-task reported by Alejandro Abdelnur and fixed by Haohui Mai <br>
<b>Support webhdfs over HTTPS</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3981">HDFS-3981</a>.
Major bug reported by Xiaobo Peng and fixed by Xiaobo Peng (namenode)<br>
<b>access time is set without holding FSNamesystem write lock</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3934">HDFS-3934</a>.
Minor bug reported by Andy Isaacson and fixed by Colin Patrick McCabe <br>
<b>duplicative dfs_hosts entries handled wrong</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-2933">HDFS-2933</a>.
Major improvement reported by Philip Zeyliger and fixed by Vivek Ganesan (datanode)<br>
<b>Improve DataNode Web UI Index Page</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10317">HADOOP-10317</a>.
Major bug reported by Andrew Wang and fixed by Andrew Wang <br>
<b>Rename branch-2.3 release version from 2.4.0-SNAPSHOT to 2.3.0-SNAPSHOT</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10313">HADOOP-10313</a>.
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)<br>
<b>Script and jenkins job to produce Hadoop release artifacts</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10311">HADOOP-10311</a>.
Blocker bug reported by Suresh Srinivas and fixed by Alejandro Abdelnur <br>
<b>Cleanup vendor names from the code base</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10310">HADOOP-10310</a>.
Blocker bug reported by Aaron T. Myers and fixed by Aaron T. Myers (security)<br>
<b>SaslRpcServer should be initialized even when no secret manager present</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10305">HADOOP-10305</a>.
Major bug reported by Akira AJISAKA and fixed by Akira AJISAKA (metrics)<br>
<b>Add "rpc.metrics.quantile.enable" and "rpc.metrics.percentiles.intervals" to core-default.xml</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10292">HADOOP-10292</a>.
Major bug reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Restore HttpServer from branch-2.2 in branch-2</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10291">HADOOP-10291</a>.
Major bug reported by Mit Desai and fixed by Mit Desai <br>
<b>TestSecurityUtil#testSocketAddrWithIP fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10288">HADOOP-10288</a>.
Major bug reported by Todd Lipcon and fixed by Todd Lipcon (util)<br>
<b>Explicit reference to Log4JLogger breaks non-log4j users</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10274">HADOOP-10274</a>.
Minor improvement reported by takeshi.miao and fixed by takeshi.miao (security)<br>
<b>Lower the logging level from ERROR to WARN for UGI.doAs method</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10273">HADOOP-10273</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (build)<br>
<b>Fix 'mvn site'</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10255">HADOOP-10255</a>.
Blocker bug reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Rename HttpServer to HttpServer2 to retain older HttpServer in branch-2 for compatibility</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10252">HADOOP-10252</a>.
Major bug reported by Jimmy Xiang and fixed by Jimmy Xiang <br>
<b>HttpServer can't start if hostname is not specified</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10250">HADOOP-10250</a>.
Major bug reported by Yongjun Zhang and fixed by Yongjun Zhang <br>
<b>VersionUtil returns wrong value when comparing two versions</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10248">HADOOP-10248</a>.
Major improvement reported by Ted Yu and fixed by Akira AJISAKA <br>
<b>Property name should be included in the exception where property value is null</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10240">HADOOP-10240</a>.
Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (documentation)<br>
<b>Windows build instructions incorrectly state requirement of protoc 2.4.1 instead of 2.5.0</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10236">HADOOP-10236</a>.
Trivial bug reported by Akira AJISAKA and fixed by Akira AJISAKA <br>
<b>Fix typo in o.a.h.ipc.Client#checkResponse</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10235">HADOOP-10235</a>.
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)<br>
<b>Hadoop tarball has 2 versions of stax-api JARs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10234">HADOOP-10234</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (scripts)<br>
<b>"hadoop.cmd jar" does not propagate exit code.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10228">HADOOP-10228</a>.
Minor improvement reported by Haohui Mai and fixed by Haohui Mai (fs)<br>
<b>FsPermission#fromShort() should cache FsAction.values()</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10223">HADOOP-10223</a>.
Minor bug reported by Ted Yu and fixed by Ted Yu <br>
<b>MiniKdc#main() should close the FileReader it creates</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10214">HADOOP-10214</a>.
Major bug reported by Liang Xie and fixed by Liang Xie (ha)<br>
<b>Fix multithreaded correctness warnings in ActiveStandbyElector</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10212">HADOOP-10212</a>.
Major bug reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)<br>
<b>Incorrect compile command in Native Library document</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10208">HADOOP-10208</a>.
Trivial improvement reported by Benoy Antony and fixed by Benoy Antony <br>
<b>Remove duplicate initialization in StringUtils.getStringCollection</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10207">HADOOP-10207</a>.
Minor test reported by Jimmy Xiang and fixed by Jimmy Xiang <br>
<b>TestUserGroupInformation#testLogin is flaky</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10203">HADOOP-10203</a>.
Major bug reported by Andrei Savu and fixed by Andrei Savu (fs/s3)<br>
<b>Connection leak in Jets3tNativeFileSystemStore#retrieveMetadata </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10198">HADOOP-10198</a>.
Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (native)<br>
<b>DomainSocket: add support for socketpair</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10193">HADOOP-10193</a>.
Minor bug reported by Gregory Chanan and fixed by Gregory Chanan (security)<br>
<b>hadoop-auth's PseudoAuthenticationHandler can consume getInputStream</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10178">HADOOP-10178</a>.
Major bug reported by shanyu zhao and fixed by shanyu zhao (conf)<br>
<b>Configuration deprecation always emits "deprecated" warnings when a new key is used</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10175">HADOOP-10175</a>.
Major bug reported by Chuan Liu and fixed by Chuan Liu (fs)<br>
<b>Har files system authority should preserve userinfo</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10173">HADOOP-10173</a>.
Critical improvement reported by Daryn Sharp and fixed by Daryn Sharp (ipc)<br>
<b>Remove UGI from DIGEST-MD5 SASL server creation</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10172">HADOOP-10172</a>.
Critical improvement reported by Daryn Sharp and fixed by Daryn Sharp (ipc)<br>
<b>Cache SASL server factories</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10171">HADOOP-10171</a>.
Major bug reported by Mit Desai and fixed by Mit Desai <br>
<b>TestRPC fails intermittently on jdk7</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10169">HADOOP-10169</a>.
Minor improvement reported by Liang Xie and fixed by Liang Xie (metrics)<br>
<b>remove the unnecessary synchronized in JvmMetrics class</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10168">HADOOP-10168</a>.
Major bug reported by Thejas M Nair and fixed by Thejas M Nair <br>
<b>fix javadoc of ReflectionUtils.copy </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10167">HADOOP-10167</a>.
Major improvement reported by Mikhail Antonov and fixed by (build)<br>
<b>Mark hadoop-common source as UTF-8 in Maven pom files / refactoring</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10164">HADOOP-10164</a>.
Major improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans <br>
<b>Allow UGI to login with a known Subject</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10162">HADOOP-10162</a>.
Major bug reported by Mit Desai and fixed by Mit Desai <br>
<b>Fix symlink-related test failures in TestFileContextResolveAfs and TestStat in branch-2</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10147">HADOOP-10147</a>.
Minor bug reported by Eric Sirianni and fixed by Steve Loughran (build)<br>
<b>Upgrade to commons-logging 1.1.3 to avoid potential deadlock in MiniDFSCluster</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10146">HADOOP-10146</a>.
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (util)<br>
<b>Workaround JDK7 Process fd close bug</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10143">HADOOP-10143</a>.
Major improvement reported by Liang Xie and fixed by Liang Xie (io)<br>
<b>replace WritableFactories's hashmap with ConcurrentHashMap</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10142">HADOOP-10142</a>.
Major bug reported by Vinayakumar B and fixed by Vinayakumar B <br>
<b>Avoid groups lookup for unprivileged users such as "dr.who"</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10135">HADOOP-10135</a>.
Major bug reported by David Dobbins and fixed by David Dobbins (fs)<br>
<b>writes to swift fs over partition size leave temp files and empty output file</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10132">HADOOP-10132</a>.
Minor improvement reported by Ted Yu and fixed by Ted Yu <br>
<b>RPC#stopProxy() should log the class of proxy when IllegalArgumentException is encountered</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10130">HADOOP-10130</a>.
Minor bug reported by Binglin Chang and fixed by Binglin Chang <br>
<b>RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10129">HADOOP-10129</a>.
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (tools/distcp)<br>
<b>Distcp may succeed when it fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10127">HADOOP-10127</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (ipc)<br>
<b>Add ipc.client.connect.retry.interval to control the frequency of connection retries</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10126">HADOOP-10126</a>.
Minor bug reported by Vinayakumar B and fixed by Vinayakumar B (util)<br>
<b>LightWeightGSet log message is confusing : "2.0% max memory = 2.0 GB"</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10125">HADOOP-10125</a>.
Major bug reported by Ming Ma and fixed by Ming Ma (ipc)<br>
<b>no need to process RPC request if the client connection has been dropped</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10112">HADOOP-10112</a>.
Major bug reported by Brandon Li and fixed by Brandon Li (tools)<br>
<b>har file listing doesn't work with wild card</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10111">HADOOP-10111</a>.
Major improvement reported by Kihwal Lee and fixed by Kihwal Lee <br>
<b>Allow DU to be initialized with an initial value</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10110">HADOOP-10110</a>.
Blocker bug reported by Chuan Liu and fixed by Chuan Liu (build)<br>
<b>hadoop-auth has a build break due to missing dependency</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10109">HADOOP-10109</a>.
Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (test)<br>
<b>Fix test failure in TestOfflineEditsViewer introduced by HADOOP-10052</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10107">HADOOP-10107</a>.
Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Kihwal Lee (ipc)<br>
<b>Server.getNumOpenConnections may throw NPE</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10106">HADOOP-10106</a>.
Minor bug reported by Ming Ma and fixed by Ming Ma <br>
<b>Incorrect thread name in RPC log messages</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10103">HADOOP-10103</a>.
Minor sub-task reported by Steve Loughran and fixed by Akira AJISAKA (build)<br>
<b>update commons-lang to 2.6</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10102">HADOOP-10102</a>.
Minor sub-task reported by Steve Loughran and fixed by Akira AJISAKA (build)<br>
<b>update commons IO from 2.1 to 2.4</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10100">HADOOP-10100</a>.
Major bug reported by Robert Kanter and fixed by Robert Kanter <br>
<b>MiniKDC shouldn't use apacheds-all artifact</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10095">HADOOP-10095</a>.
Minor improvement reported by Nicolas Liochon and fixed by Nicolas Liochon (io)<br>
<b>Performance improvement in CodecPool</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10094">HADOOP-10094</a>.
Trivial bug reported by Enis Soztutar and fixed by Enis Soztutar (util)<br>
<b>NPE in GenericOptionsParser#preProcessForWindows()</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10093">HADOOP-10093</a>.
Major bug reported by shanyu zhao and fixed by shanyu zhao (conf)<br>
<b>hadoop-env.cmd sets HADOOP_CLIENT_OPTS with a max heap size that is too small.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10090">HADOOP-10090</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic (metrics)<br>
<b>Jobtracker metrics not updated properly after execution of a mapreduce job</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10088">HADOOP-10088</a>.
Major bug reported by Raja Aluri and fixed by Raja Aluri (build)<br>
<b>copy-nativedistlibs.sh needs to quote snappy lib dir</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10087">HADOOP-10087</a>.
Major bug reported by Yu Gao and fixed by Colin Patrick McCabe (security)<br>
<b>UserGroupInformation.getGroupNames() fails to return primary group first when JniBasedUnixGroupsMappingWithFallback is used</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10086">HADOOP-10086</a>.
Minor improvement reported by Masatake Iwasaki and fixed by Masatake Iwasaki (documentation)<br>
<b>User document for authentication in secure cluster</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10081">HADOOP-10081</a>.
Critical bug reported by Jason Lowe and fixed by Tsuyoshi OZAWA (ipc)<br>
<b>Client.setupIOStreams can leak socket resources on exception or error</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10079">HADOOP-10079</a>.
Major improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
<b>log a warning message if group resolution takes too long.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10078">HADOOP-10078</a>.
Minor bug reported by Robert Kanter and fixed by Robert Kanter (security)<br>
<b>KerberosAuthenticator always does SPNEGO</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10072">HADOOP-10072</a>.
Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (nfs , test)<br>
<b>TestNfsExports#testMultiMatchers fails due to non-deterministic timing around cache expiry check.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10067">HADOOP-10067</a>.
Minor improvement reported by Robert Rati and fixed by Robert Rati <br>
<b>Missing POM dependency on jsr305</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10064">HADOOP-10064</a>.
Major improvement reported by Arpit Agarwal and fixed by Arpit Agarwal (build)<br>
<b>Upgrade to maven antrun plugin version 1.7</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10058">HADOOP-10058</a>.
Minor bug reported by Akira AJISAKA and fixed by Chen He (metrics)<br>
<b>TestMetricsSystemImpl#testInitFirstVerifyStopInvokedImmediately fails on trunk</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10055">HADOOP-10055</a>.
Trivial bug reported by Eli Collins and fixed by Akira AJISAKA (documentation)<br>
<b>FileSystemShell.apt.vm doc has typo "numRepicas" </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10052">HADOOP-10052</a>.
Major sub-task reported by Andrew Wang and fixed by Andrew Wang (fs)<br>
<b>Temporarily disable client-side symlink resolution</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10047">HADOOP-10047</a>.
Major new feature reported by Gopal V and fixed by Gopal V (io)<br>
<b>Add a directbuffer Decompressor API to hadoop</b><br>
<blockquote>Adds direct ByteBuffer decompressors for Zlib (Deflate &amp; Gzip) and Snappy.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10046">HADOOP-10046</a>.
Trivial improvement reported by David S. Wang and fixed by David S. Wang <br>
<b>Print a log message when SSL is enabled</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10040">HADOOP-10040</a>.
Major bug reported by Yingda Chen and fixed by Chris Nauroth <br>
<b>hadoop.cmd is in UNIX format and will not run by default on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10039">HADOOP-10039</a>.
Major bug reported by Suresh Srinivas and fixed by Haohui Mai (security)<br>
<b>Add Hive to the list of projects using AbstractDelegationTokenSecretManager</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10031">HADOOP-10031</a>.
Major bug reported by Chuan Liu and fixed by Chuan Liu (fs)<br>
<b>FsShell -get/copyToLocal/moveFromLocal should support Windows local path</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10030">HADOOP-10030</a>.
Major bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>FsShell -put/copyFromLocal should support Windows local path</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10029">HADOOP-10029</a>.
Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (fs)<br>
<b>Specifying har file to MR job fails in secure cluster</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10028">HADOOP-10028</a>.
Minor bug reported by Jing Zhao and fixed by Haohui Mai <br>
<b>Malformed ssl-server.xml.example </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10006">HADOOP-10006</a>.
Blocker bug reported by Junping Du and fixed by Junping Du (fs , util)<br>
<b>Compilation failure in trunk for o.a.h.fs.swift.util.JSONUtil</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10005">HADOOP-10005</a>.
Trivial improvement reported by Jackie Chang and fixed by Jackie Chang <br>
<b>No need to check INFO severity level is enabled or not</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9998">HADOOP-9998</a>.
Major improvement reported by Junping Du and fixed by Junping Du (net)<br>
<b>Provide methods to clear only part of the DNSToSwitchMapping</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9982">HADOOP-9982</a>.
Major bug reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)<br>
<b>Fix dead links in hadoop site docs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9981">HADOOP-9981</a>.
Critical bug reported by Kihwal Lee and fixed by Colin Patrick McCabe <br>
<b>globStatus should minimize its listStatus and getFileStatus calls</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9964">HADOOP-9964</a>.
Major bug reported by Junping Du and fixed by Junping Du (util)<br>
<b>O.A.H.U.ReflectionUtils.printThreadInfo() is not thread-safe, which causes TestHttpServer to hang for 10 minutes or longer.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9956">HADOOP-9956</a>.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)<br>
<b>RPC listener inefficiently assigns connections to readers</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9955">HADOOP-9955</a>.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)<br>
<b>RPC idle connection closing is extremely inefficient</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9929">HADOOP-9929</a>.
Major bug reported by Jason Lowe and fixed by Colin Patrick McCabe (fs)<br>
<b>Insufficient permissions for a path reported as file not found</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9915">HADOOP-9915</a>.
Trivial improvement reported by Binglin Chang and fixed by Binglin Chang <br>
<b>o.a.h.fs.Stat support on Macosx</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9909">HADOOP-9909</a>.
Major improvement reported by Shinichi Yamashita and fixed by (fs)<br>
<b>org.apache.hadoop.fs.Stat should permit other LANG</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9908">HADOOP-9908</a>.
Major bug reported by Todd Lipcon and fixed by Todd Lipcon (util)<br>
<b>Fix NPE when versioninfo properties file is missing</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9898">HADOOP-9898</a>.
Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (ipc , net)<br>
<b>Set SO_KEEPALIVE on all our sockets</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9897">HADOOP-9897</a>.
Trivial improvement reported by Binglin Chang and fixed by Binglin Chang (fs)<br>
<b>Add method to get path start position without drive specifier in o.a.h.fs.Path </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9889">HADOOP-9889</a>.
Major bug reported by Wei Yan and fixed by Wei Yan <br>
<b>Refresh the Krb5 configuration when creating a new kdc in Hadoop-MiniKDC</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9887">HADOOP-9887</a>.
Major bug reported by Chris Nauroth and fixed by Chuan Liu (fs)<br>
<b>globStatus does not correctly handle paths starting with a drive spec on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9875">HADOOP-9875</a>.
Minor bug reported by Aaron T. Myers and fixed by Aaron T. Myers (test)<br>
<b>TestDoAsEffectiveUser can fail on JDK 7</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9871">HADOOP-9871</a>.
Minor bug reported by Luke Lu and fixed by Junping Du <br>
<b>Fix intermittent findbug warnings in DefaultMetricsSystem</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9866">HADOOP-9866</a>.
Major test reported by Alejandro Abdelnur and fixed by Wei Yan (test)<br>
<b>convert hadoop-auth testcases requiring kerberos to use minikdc</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9865">HADOOP-9865</a>.
Major bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>FileContext.globStatus() has a regression with respect to relative path</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9860">HADOOP-9860</a>.
Major improvement reported by Wei Yan and fixed by Wei Yan <br>
<b>Remove class HackedKeytab and HackedKeytabEncoder from hadoop-minikdc once jira DIRSERVER-1882 solved</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9848">HADOOP-9848</a>.
Major new feature reported by Wei Yan and fixed by Wei Yan (security , test)<br>
<b>Create a MiniKDC for use with security testing</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9847">HADOOP-9847</a>.
Minor bug reported by Andrew Wang and fixed by Colin Patrick McCabe <br>
<b>TestGlobPath symlink tests fail to cleanup properly</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9833">HADOOP-9833</a>.
Minor improvement reported by Steve Loughran and fixed by Kousuke Saruta (build)<br>
<b>move slf4j to version 1.7.5</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9830">HADOOP-9830</a>.
Trivial bug reported by Dmitry Lysnichenko and fixed by Kousuke Saruta (documentation)<br>
<b>Typo at http://hadoop.apache.org/docs/current/</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9820">HADOOP-9820</a>.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (ipc , security)<br>
<b>RPCv9 wire protocol is insufficient to support multiplexing</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9817">HADOOP-9817</a>.
Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
<b>FileSystem#globStatus and FileContext#globStatus need to work with symlinks</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9806">HADOOP-9806</a>.
Major bug reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>PortmapInterface should check if the procedure is out-of-range</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9791">HADOOP-9791</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>Add a test case covering long paths for new FileUtil access check methods</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9787">HADOOP-9787</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (util)<br>
<b>ShutdownHelper util to shutdown threads and threadpools</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9784">HADOOP-9784</a>.
Major improvement reported by Junping Du and fixed by Junping Du <br>
<b>Add a builder for HttpServer</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9748">HADOOP-9748</a>.
Critical sub-task reported by Daryn Sharp and fixed by Daryn Sharp (security)<br>
<b>Reduce blocking on UGI.ensureInitialized</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9703">HADOOP-9703</a>.
Minor bug reported by Mark Miller and fixed by Tsuyoshi OZAWA <br>
<b>org.apache.hadoop.ipc.Client leaks threads on stop.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9698">HADOOP-9698</a>.
Blocker sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)<br>
<b>RPCv9 client must honor server's SASL negotiate response</b><br>
<blockquote>The RPC client now waits for the Server's SASL negotiate response before instantiating its SASL client.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9693">HADOOP-9693</a>.
Trivial improvement reported by Steve Loughran and fixed by <br>
<b>Shell should add a probe for OSX</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9686">HADOOP-9686</a>.
Major improvement reported by Jason Lowe and fixed by Jason Lowe (conf)<br>
<b>Easy access to final parameters in Configuration</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9683">HADOOP-9683</a>.
Blocker sub-task reported by Luke Lu and fixed by Daryn Sharp (ipc)<br>
<b>Wrap IpcConnectionContext in RPC headers</b><br>
<blockquote>The connection context is now sent as a protobuf wrapped in an RPC header.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9660">HADOOP-9660</a>.
Major bug reported by Enis Soztutar and fixed by Enis Soztutar (scripts , util)<br>
<b>[WINDOWS] Powershell / cmd parses -Dkey=value from command line as [-Dkey, value] which breaks GenericsOptionParser</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9652">HADOOP-9652</a>.
Major improvement reported by Colin Patrick McCabe and fixed by Andrew Wang <br>
<b>Allow RawLocalFs#getFileLinkStatus to fill in the link owner and mode if requested</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9635">HADOOP-9635</a>.
Major bug reported by V. Karthik Kumar and fixed by (native)<br>
<b>Fix Potential Stack Overflow in DomainSocket.c</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9623">HADOOP-9623</a>.
Major improvement reported by Timothy St. Clair and fixed by Amandeep Khurana (fs/s3)<br>
<b>Update jets3t dependency to 0.9.0 </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9618">HADOOP-9618</a>.
Major new feature reported by Todd Lipcon and fixed by Todd Lipcon (util)<br>
<b>Add thread which detects JVM pauses</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9611">HADOOP-9611</a>.
Major improvement reported by Timothy St. Clair and fixed by Timothy St. Clair (build)<br>
<b>mvn-rpmbuild against google-guice &gt; 3.0 yields missing cglib dependency</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9598">HADOOP-9598</a>.
Major test reported by Aleksey Gorshkov and fixed by Andrey Klochkov <br>
<b>Improve code coverage of RMAdminCLI</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9594">HADOOP-9594</a>.
Major improvement reported by Timothy St. Clair and fixed by Timothy St. Clair (build)<br>
<b>Update apache commons math dependency</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9582">HADOOP-9582</a>.
Major bug reported by Ashwin Shankar and fixed by Ashwin Shankar (conf)<br>
<b>Non-existent file to "hadoop fs -conf" doesn't throw error</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9527">HADOOP-9527</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (fs , test)<br>
<b>Add symlink support to LocalFileSystem on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9515">HADOOP-9515</a>.
Major new feature reported by Brandon Li and fixed by Brandon Li <br>
<b>Add general interface for NFS and Mount</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9509">HADOOP-9509</a>.
Major new feature reported by Brandon Li and fixed by Brandon Li <br>
<b>Implement ONCRPC and XDR</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9494">HADOOP-9494</a>.
Major improvement reported by Dennis Y and fixed by Andrey Klochkov <br>
<b>Excluded auto-generated and examples code from clover reports</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9487">HADOOP-9487</a>.
Major improvement reported by Steve Loughran and fixed by (conf)<br>
<b>Deprecation warnings in Configuration should go to their own log or otherwise be suppressible</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9470">HADOOP-9470</a>.
Major improvement reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky (test)<br>
<b>eliminate duplicate FQN tests in different Hadoop modules</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9432">HADOOP-9432</a>.
Minor new feature reported by Steve Loughran and fixed by (build , documentation)<br>
<b>Add support for markdown .md files in site documentation</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9421">HADOOP-9421</a>.
Blocker sub-task reported by Sanjay Radia and fixed by Daryn Sharp <br>
<b>Convert SASL to use ProtoBuf and provide negotiation capabilities</b><br>
<blockquote>Raw SASL protocol now uses protobufs wrapped with RPC headers.
The negotiation sequence incorporates the state of the exchange.
The server now has the ability to advertise its supported auth types.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9420">HADOOP-9420</a>.
Major bug reported by Todd Lipcon and fixed by Liang Xie (ipc , metrics)<br>
<b>Add percentile or max metric for rpcQueueTime, processing time</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9417">HADOOP-9417</a>.
Major sub-task reported by Andrew Wang and fixed by Andrew Wang (fs)<br>
<b>Support for symlink resolution in LocalFileSystem / RawLocalFileSystem</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9350">HADOOP-9350</a>.
Minor bug reported by Steve Loughran and fixed by Robert Kanter (build)<br>
<b>Hadoop not building against Java7 on OSX </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9319">HADOOP-9319</a>.
Major improvement reported by Arpit Agarwal and fixed by Binglin Chang <br>
<b>Update bundled lz4 source to latest version</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9291">HADOOP-9291</a>.
Major test reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky <br>
<b>enhance unit-test coverage of package o.a.h.metrics2</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9254">HADOOP-9254</a>.
Major test reported by Vadim Bondarev and fixed by Vadim Bondarev <br>
<b>Cover packages org.apache.hadoop.util.bloom, org.apache.hadoop.util.hash</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9241">HADOOP-9241</a>.
Trivial improvement reported by Harsh J and fixed by Harsh J <br>
<b>DU refresh interval is not configurable</b><br>
<blockquote>The refresh interval of the 'du' (Unix disk usage command) monitor is now configurable in the same way as its 'df' counterpart, via the property 'fs.du.interval', which defaults to 10 minutes (specified in ms).</blockquote></li>
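As a rough illustration of the HADOOP-9241 note, the sketch below reads 'fs.du.interval' with a 10-minute default expressed in milliseconds. It uses java.util.Properties as a stand-in for Hadoop's Configuration, and the class and method names are hypothetical, not Hadoop API.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; java.util.Properties stands in for Hadoop's Configuration.
final class DuIntervalExample {
    // Property name taken from the release note.
    static final String KEY = "fs.du.interval";
    // Default from the release note: 10 minutes, expressed in milliseconds.
    static final long DEFAULT_MS = TimeUnit.MINUTES.toMillis(10);

    // Returns the configured refresh interval in ms, or the default when unset.
    static long refreshIntervalMs(Properties conf) {
        String raw = conf.getProperty(KEY);
        return raw == null ? DEFAULT_MS : Long.parseLong(raw.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println("default ms: " + refreshIntervalMs(conf));
        conf.setProperty(KEY, "60000"); // refresh every minute instead
        System.out.println("override ms: " + refreshIntervalMs(conf));
    }
}
```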
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9225">HADOOP-9225</a>.
Major test reported by Vadim Bondarev and fixed by Andrey Klochkov <br>
<b>Cover package org.apache.hadoop.compress.Snappy</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9199">HADOOP-9199</a>.
Major test reported by Vadim Bondarev and fixed by Andrey Klochkov <br>
<b>Cover package org.apache.hadoop.io with unit tests</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9114">HADOOP-9114</a>.
Minor bug reported by liuyang and fixed by sathish <br>
<b>After defining dfs.checksum.type as NULL, writing a file and calling hflush will throw java.lang.ArrayIndexOutOfBoundsException</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9078">HADOOP-9078</a>.
Major test reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky <br>
<b>enhance unit-test coverage of class org.apache.hadoop.fs.FileContext</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9063">HADOOP-9063</a>.
Minor test reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky <br>
<b>enhance unit-test coverage of class org.apache.hadoop.fs.FileUtil</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9016">HADOOP-9016</a>.
Minor bug reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky <br>
<b>org.apache.hadoop.fs.HarFileSystem.HarFSDataInputStream.HarFsInputStream.skip(long) must never return negative value.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8814">HADOOP-8814</a>.
Minor improvement reported by Brandon Li and fixed by Brandon Li (conf , fs , fs/s3 , ha , io , metrics , performance , record , security , util)<br>
<b>Inefficient comparison with the empty string. Use isEmpty() instead</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8753">HADOOP-8753</a>.
Minor bug reported by Nishan Shetty, Huawei and fixed by Benoy Antony <br>
<b>LocalDirAllocator throws "ArithmeticException: / by zero" when there is no available space on configured local dir</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8704">HADOOP-8704</a>.
Major improvement reported by Thomas Graves and fixed by Jonathan Eagles <br>
<b>add request logging to jetty/httpserver</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8545">HADOOP-8545</a>.
Major new feature reported by Tim Miller and fixed by Dmitry Mezhensky (fs)<br>
<b>Filesystem Implementation for OpenStack Swift</b><br>
<blockquote>Added a file system implementation for OpenStack Swift.
There are two implementations: block and native (similar to the Amazon S3 integration).
The data locality issue is solved by a patch in Swift; the procedure to commit it to OpenStack is in progress.
To use the implementation, add the following to core-site.xml:
...
&lt;property&gt;
&lt;name&gt;fs.swift.impl&lt;/name&gt;
&lt;value&gt;com.mirantis.fs.SwiftFileSystem&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.swift.block.impl&lt;/name&gt;
&lt;value&gt;com.mirantis.fs.block.SwiftBlockFileSystem&lt;/value&gt;
&lt;/property&gt;
...
In a MapReduce job, specify the following configs for OpenStack Keystone authentication:
conf.set("swift.auth.url", "http://172.18.66.117:5000/v2.0/tokens");
conf.set("swift.tenant", "superuser");
conf.set("swift.username", "admin1");
conf.set("swift.password", "password");
conf.setInt("swift.http.port", 8080);
conf.setInt("swift.https.port", 443);
Additional information is available on GitHub: https://github.com/DmitryMezhensky/Hadoop-and-Swift-integration</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-7344">HADOOP-7344</a>.
Major bug reported by Daryn Sharp and fixed by Colin Patrick McCabe (fs)<br>
<b>globStatus doesn't grok groupings with a slash</b><br>
<blockquote></blockquote></li>
</ul>
</body></html>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hadoop 2.2.0 Release Notes</title>
<STYLE type="text/css">
H1 {font-family: sans-serif}
H2 {font-family: sans-serif; margin-left: 7mm}
TABLE {margin-left: 7mm}
</STYLE>
</head>
<body>
<h1>Hadoop 2.2.0 Release Notes</h1>
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
<a name="changes"/>
<h2>Changes since Hadoop 2.1.1-beta</h2>
<ul>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1278">YARN-1278</a>.
Blocker bug reported by Yesha Vora and fixed by Hitesh Shah <br>
<b>New AM does not start after rm restart</b><br>
<blockquote>The new AM fails to start after the RM restarts. The RM fails to start a new ApplicationMaster, and the job fails with the error below.
/usr/bin/mapred job -status job_1380985373054_0001
13/10/05 15:04:04 INFO client.RMProxy: Connecting to ResourceManager at hostname
Job: job_1380985373054_0001
Job File: /user/abc/.staging/job_1380985373054_0001/job.xml
Job Tracking URL : http://hostname:8088/cluster/app/application_1380985373054_0001
Uber job : false
Number of maps: 0
Number of reduces: 0
map() completion: 0.0
reduce() completion: 0.0
Job state: FAILED
retired: false
reason for failure: There are no failed tasks for the job. Job is failed due to some other reason and reason can be found in the logs.
Counters: 0</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1277">YARN-1277</a>.
Major sub-task reported by Suresh Srinivas and fixed by Omkar Vinit Joshi <br>
<b>Add http policy support for YARN daemons</b><br>
<blockquote>This is the YARN part of HADOOP-10022.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1274">YARN-1274</a>.
Blocker bug reported by Alejandro Abdelnur and fixed by Siddharth Seth (nodemanager)<br>
<b>LCE fails to run containers that don't have resources to localize</b><br>
<blockquote>LCE container launch assumes the usercache/USER directory exists and is owned by the user running the container process.
But the directory is created by the LCE localization command only if there are resources to localize. If there are no resources to localize, LCE localization never executes and the launch fails with exit code 255; the NM logs contain something like:
{code}
2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command provided 1
2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is llama
2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create directory llama in /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_000004 - Permission denied
{code}
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1273">YARN-1273</a>.
Major bug reported by Hitesh Shah and fixed by Hitesh Shah <br>
<b>Distributed shell does not account for start container failures reported asynchronously.</b><br>
<blockquote>2013-10-04 22:09:15,234 ERROR [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #1] distributedshell.ApplicationMaster (ApplicationMaster.java:onStartContainerError(719)) - Failed to start Container container_1380920347574_0018_01_000006</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1271">YARN-1271</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)<br>
<b>"Text file busy" errors launching containers again</b><br>
<blockquote>The error is shown below in the comments.
MAPREDUCE-2374 fixed this by removing "-c" when running the container launch script. It looks like the "-c" got brought back during the windows branch merge, so we should remove it again.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1262">YARN-1262</a>.
Major bug reported by Sandy Ryza and fixed by Karthik Kambatla <br>
<b>TestApplicationCleanup relies on all containers assigned in a single heartbeat</b><br>
<blockquote>TestApplicationCleanup submits container requests and waits for allocations to come in. It only sends a single node heartbeat to the node, expecting multiple containers to be assigned on this heartbeat, which not all schedulers do by default.
This is causing the test to fail when run with the Fair Scheduler.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1260">YARN-1260</a>.
Major sub-task reported by Yesha Vora and fixed by Omkar Vinit Joshi <br>
<b>RM_HOME link breaks when webapp.https.address related properties are not specified</b><br>
<blockquote>This issue occurs in a multi-node cluster where the resource manager and node manager run on different machines.
Steps to reproduce:
1) set yarn.resourcemanager.hostname = &lt;resourcemanager host&gt; in yarn-site.xml
2) set hadoop.ssl.enabled = true in core-site.xml
3) Do not specify the following properties in yarn-site.xml:
yarn.nodemanager.webapp.https.address and yarn.resourcemanager.webapp.https.address
The default values of these two properties will then be used.
4) Go to nodemanager web UI "https://&lt;nodemanager host&gt;:8044/node"
5) Click on RM_HOME link
This link redirects to "https://&lt;nodemanager host&gt;:8090/cluster" instead of "https://&lt;resourcemanager host&gt;:8090/cluster"
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1256">YARN-1256</a>.
Critical sub-task reported by Bikas Saha and fixed by Xuan Gong <br>
<b>NM silently ignores non-existent service in StartContainerRequest</b><br>
<blockquote>A container can set token service metadata for a service, say shuffle_service. If that service does not exist, the error is silently ignored. Later, when the next container wants to access data written to shuffle_service by the first task, it fails because the service does not have the token that was supposed to be set by the first task.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1254">YARN-1254</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi <br>
<b>NM is polluting container's credentials</b><br>
<blockquote>Before launching the container, the NM reuses the same credentials object and so pollutes what the container should see. We should fix this.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1251">YARN-1251</a>.
Major bug reported by Junping Du and fixed by Xuan Gong (applications/distributed-shell)<br>
<b>TestDistributedShell#TestDSShell failed with timeout</b><br>
<blockquote>TestDistributedShell#TestDSShell has been failing consistently on trunk Jenkins recently.
The stack trace is:
{code}
java.lang.Exception: test timed out after 90000 milliseconds
at com.google.protobuf.LiteralByteString.&lt;init&gt;(LiteralByteString.java:234)
at com.google.protobuf.ByteString.copyFromUtf8(ByteString.java:255)
at org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getMethodNameBytes(ProtobufRpcEngineProtos.java:286)
at org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getSerializedSize(ProtobufRpcEngineProtos.java:462)
at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:84)
at org.apache.hadoop.ipc.ProtobufRpcEngine$RpcMessageWithHeader.write(ProtobufRpcEngine.java:302)
at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:989)
at org.apache.hadoop.ipc.Client.call(Client.java:1377)
at org.apache.hadoop.ipc.Client.call(Client.java:1357)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at $Proxy70.getApplicationReport(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
at $Proxy71.getApplicationReport(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:195)
at org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:622)
at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:597)
at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:125)
{code}
For details, please refer to:
https://builds.apache.org/job/PreCommit-YARN-Build/2039//testReport/</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1247">YARN-1247</a>.
Major bug reported by Roman Shaposhnik and fixed by Roman Shaposhnik (nodemanager)<br>
<b>test-container-executor has gotten out of sync with the changes to container-executor</b><br>
<blockquote>If run under the super-user account, test-container-executor.c fails in several places. It would be nice to fix it so that we have better testing of LCE functionality.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1246">YARN-1246</a>.
Minor improvement reported by Arpit Gupta and fixed by Arpit Gupta <br>
<b>Log application status in the rm log when app is done running</b><br>
<blockquote>Since there is no YARN history server, it becomes difficult to determine the status of an old application. One has to be familiar with the state transitions in YARN to know what counts as success.
We should add a log at info level that captures what the finalStatus of an app is. This would be helpful while debugging applications if the RM has restarted and we no longer can use the UI.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1236">YARN-1236</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager)<br>
<b>FairScheduler setting queue name in RMApp is not working </b><br>
<blockquote>The fair scheduler sometimes picks a different queue than the one an application was submitted to, such as when user-as-default-queue is turned on. It needs to update the queue name in the RMApp so that this choice will be reflected in the UI.
This isn't working because the scheduler is looking up the RMApp by application attempt id instead of app id and failing to find it.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1229">YARN-1229</a>.
Blocker bug reported by Tassapol Athiapinya and fixed by Xuan Gong (nodemanager)<br>
<b>Define constraints on Auxiliary Service names. Change ShuffleHandler service name from mapreduce.shuffle to mapreduce_shuffle.</b><br>
<blockquote>I ran a sleep job. If the AM fails to start, this exception can occur:
13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_000001 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_000001/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
': not a valid identifier
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
.Failing this attempt.. Failing the application.</blockquote></li>
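The failure in YARN-1229 comes from the service name being embedded in an exported shell variable (NM_AUX_SERVICE_ plus the name), where a dot is not allowed. A minimal sketch of such a name check follows. The class, method names, and exact pattern are assumptions for illustration and may differ from the constraint YARN actually enforces.

```java
import java.util.regex.Pattern;

// Hypothetical validator; not the NodeManager's actual implementation.
final class AuxServiceNameCheck {
    // Assumed rule: a letter or underscore, then letters, digits, or underscores,
    // which keeps the name usable inside a shell variable identifier.
    private static final Pattern SHELL_SAFE = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isShellSafe(String serviceName) {
        return serviceName != null && SHELL_SAFE.matcher(serviceName).matches();
    }

    // Prefix taken from the launch_container.sh error shown in the report.
    static String envVarName(String serviceName) {
        return "NM_AUX_SERVICE_" + serviceName;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"mapreduce.shuffle", "mapreduce_shuffle"}) {
            System.out.println(envVarName(name) + " shellSafe=" + isShellSafe(name));
        }
    }
}
```

Under this assumed rule the old name mapreduce.shuffle is rejected and the renamed mapreduce_shuffle is accepted.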
<li> <a href="https://issues.apache.org/jira/browse/YARN-1228">YARN-1228</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Clean up Fair Scheduler configuration loading</b><br>
<blockquote>Currently the Fair Scheduler is configured in two ways:
* An allocations file that has a different format than the standard Hadoop configuration file, which makes it easier to specify hierarchical objects like queues and their properties.
* With properties like yarn.scheduler.fair.max.assign that are specified in the standard Hadoop configuration format.
The standard and default way of configuring it is to use fair-scheduler.xml as the allocations file and to put the yarn.scheduler properties in yarn-site.xml.
It is also possible to specify a different file as the allocations file, and to place the yarn.scheduler properties in fair-scheduler.xml, which will be interpreted as in the standard Hadoop configuration format. This flexibility is both confusing and unnecessary.
Additionally, the allocation file is loaded as fair-scheduler.xml from the classpath if it is not specified, but is loaded as a File if it is. This causes two problems:
1. We see different behavior when not setting the yarn.scheduler.fair.allocation.file, and setting it to fair-scheduler.xml, which is its default.
2. Classloaders may choose to cache resources, which can break the reload logic when yarn.scheduler.fair.allocation.file is not specified.
We should never allow the yarn.scheduler properties to go into fair-scheduler.xml, and we should always load the allocations file as a file, not as a resource on the classpath. To preserve existing behavior and allow loading files from the classpath, we can look for files on the classpath, but strip off their scheme and interpret them as Files.
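The proposal to look on the classpath but still end up with a File can be sketched as follows. This is a minimal illustration, not the actual Fair Scheduler code: the class AllocationFileResolver and its resolve method are hypothetical names, and the sketch assumes that a classpath hit is only usable when it lives on the local file system (a file: URL).

```java
import java.io.File;
import java.net.URL;

// Hypothetical helper, not part of Hadoop: always hands back a File for the allocations file.
public class AllocationFileResolver {

    // Returns the allocations file as a File, or null if it cannot be located.
    public static File resolve(String name, ClassLoader loader) {
        File direct = new File(name);
        if (direct.isAbsolute()) {
            return direct;                      // absolute paths are used as given
        }
        URL url = loader.getResource(name);     // otherwise look the name up on the classpath
        if (url == null) {
            return null;                        // not found anywhere
        }
        if (!"file".equals(url.getProtocol())) {
            return null;                        // e.g. an entry inside a jar is not a plain File
        }
        return new File(url.getPath());         // strip the scheme, keep only the path
    }
}
```

Because the result is always a File, the same reload logic (for example, checking the last-modified time) would apply whether or not yarn.scheduler.fair.allocation.file is set. Note that URL.getPath() does not decode percent-escapes, so paths containing spaces would need extra handling in a real implementation.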
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1221">YARN-1221</a>.
Major bug reported by Sandy Ryza and fixed by Siqi Li (resourcemanager , scheduler)<br>
<b>With Fair Scheduler, reserved MB reported in RM web UI increases indefinitely</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1219">YARN-1219</a>.
Major bug reported by shanyu zhao and fixed by shanyu zhao (nodemanager)<br>
<b>FSDownload changes file suffix making FileUtil.unTar() throw exception</b><br>
<blockquote>While running a Hive join operation on YARN, I saw the exception described below. It is caused by FSDownload copying the files into a temp file and changing the suffix to ".tmp" before unpacking it. unpack() uses FileUtil.unTar(), which determines whether the file is gzipped by looking at the file suffix:
{code}
boolean gzipped = inFile.toString().endsWith("gz");
{code}
To fix this problem, we can remove the ".tmp" in the temp file name.
Here is the detailed exception:
org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:240)
at org.apache.hadoop.fs.FileUtil.unTarUsingJava(FileUtil.java:676)
at org.apache.hadoop.fs.FileUtil.unTar(FileUtil.java:625)
at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:203)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:287)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1215">YARN-1215</a>.
Major bug reported by Chuan Liu and fixed by Chuan Liu (api)<br>
<b>Yarn URL should include userinfo</b><br>
<blockquote>The {{org.apache.hadoop.yarn.api.records.URL}} class has no userinfo component. When a {{java.net.URI}} object is converted into a YARN URL object in the {{ConverterUtils.getYarnUrlFromURI()}} method, the URI host is set as the URL host, and any userinfo part of the URI is discarded. This loses information when the original URI carries userinfo, e.g. foo://username:password@example.com is converted to foo://example.com and the username/password information is lost during the conversion.
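The information loss described above can be reproduced with java.net.URI alone. The sketch below is illustrative only and does not use the YARN classes: UserInfoDemo and withoutUserInfo are hypothetical names, and the method simply rebuilds a URI from scheme, host, port and path, which is the kind of conversion that drops the userinfo.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, not part of YARN.
public class UserInfoDemo {

    // Rebuilds a URI from scheme, host, port and path only; any userinfo is dropped.
    public static URI withoutUserInfo(URI uri) {
        try {
            return new URI(uri.getScheme(), null, uri.getHost(), uri.getPort(),
                           uri.getPath(), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI original = URI.create("foo://username:password@example.com");
        System.out.println(original.getUserInfo());      // username:password
        System.out.println(withoutUserInfo(original));   // foo://example.com
    }
}
```

Passing uri.getUserInfo() as the second constructor argument instead of null would preserve the credentials in the rebuilt URI.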
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1214">YARN-1214</a>.
Critical sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>Register ClientToken MasterKey in SecretManager after it is saved</b><br>
<blockquote>Currently, the app attempt ClientToken master key is registered before it is saved. This can cause a problem: if the client gets the token and the RM crashes before the master key is saved, the RM cannot reload the master key after it restarts, because the key was never saved. As a result, the client holds an invalid token.
We can register the client token master key after it is saved in the store.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1213">YARN-1213</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Restore config to ban submitting to undeclared pools in the Fair Scheduler</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1204">YARN-1204</a>.
Major sub-task reported by Yesha Vora and fixed by Omkar Vinit Joshi <br>
<b>Need to add https port related property in Yarn</b><br>
<blockquote>There is no YARN property available to configure the https port for the ResourceManager, NodeManager and history server. Currently, YARN services use the port defined for http [defined by 'mapreduce.jobhistory.webapp.address','yarn.nodemanager.webapp.address', 'yarn.resourcemanager.webapp.address'] when running services over the https protocol.
YARN should have a set of properties to assign the https port for the RM, NM and JHS.
For example:
yarn.nodemanager.webapp.https.address
yarn.resourcemanager.webapp.https.address
mapreduce.jobhistory.webapp.https.address </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1203">YARN-1203</a>.
Major sub-task reported by Yesha Vora and fixed by Omkar Vinit Joshi <br>
<b>Application Manager UI does not appear with Https enabled</b><br>
<blockquote>Need to add support to disable 'hadoop.ssl.enabled' for MR jobs.
A job should be able to run on http protocol by setting 'hadoop.ssl.enabled' property at job level.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1167">YARN-1167</a>.
Major bug reported by Tassapol Athiapinya and fixed by Xuan Gong (applications/distributed-shell)<br>
<b>Submitted distributed shell application shows appMasterHost = empty</b><br>
<blockquote>Submit a distributed shell application. Once the application reaches the RUNNING state, the app master host should not be empty. In reality, it is empty.
==console logs==
distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED,
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1157">YARN-1157</a>.
Major bug reported by Tassapol Athiapinya and fixed by Xuan Gong (resourcemanager)<br>
<b>ResourceManager UI has invalid tracking URL link for distributed shell application</b><br>
<blockquote>Submit a YARN distributed shell application. Go to the ResourceManager web UI, where the application appears. In the Tracking UI column, there will be a history link. Click on that link. Instead of showing the application master web UI, HTTP error 500 appears.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1149">YARN-1149</a>.
Major bug reported by Ramya Sunil and fixed by Xuan Gong <br>
<b>NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING</b><br>
<blockquote>When the nodemanager receives a kill signal after an application has finished execution but before log aggregation has kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown:
{noformat}
2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118
2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/&lt;host&gt;_45454.tmp
2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118
2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_000004. Current good log dirs are /tmp/yarn/local
2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118
2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
at java.lang.Thread.run(Thread.java:662)
2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null
2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040
{noformat}
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1141">YARN-1141</a>.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Updating resource requests should be decoupled with updating blacklist</b><br>
<blockquote>Currently, in CapacityScheduler and FifoScheduler, blacklist is updated together with resource requests, only when the incoming resource requests are not empty. Therefore, when the incoming resource requests are empty, the blacklist will not be updated even when blacklist additions and removals are not empty.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1131">YARN-1131</a>.
Minor sub-task reported by Tassapol Athiapinya and fixed by Siddharth Seth (client)<br>
<b>$yarn logs command should return an appropriate error message if YARN application is still running</b><br>
<blockquote>When log aggregation is enabled, if a user submits a MapReduce job and runs $ yarn logs -applicationId &lt;app ID&gt; while the YARN application is running, the command prints no message and returns the user to the shell. It would be nice to tell the user that log aggregation is in progress.
{code}
-bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
-bash-4.1$
{code}
At the same time, if an invalid application ID is given, the YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException.
{code}
$ /usr/bin/yarn logs -applicationId application_00000
Exception in thread "main" java.util.NoSuchElementException
at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
{code}
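One way to report a malformed ID instead of failing with NoSuchElementException is to validate the string before parsing it. The following is only a sketch under stated assumptions: AppIdCheck and isValid are hypothetical names that are not part of the YARN CLI, and the pattern assumes the textual form seen in the examples above (the word application, a cluster timestamp, and a sequence number, separated by underscores).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the YARN CLI.
public class AppIdCheck {

    // Assumed textual form: application_{clusterTimestamp}_{sequenceNumber}
    private static final Pattern APP_ID = Pattern.compile("application_\\d+_\\d+");

    // Returns true only if the whole string matches the assumed form.
    public static boolean isValid(String appId) {
        if (appId == null) {
            return false;
        }
        return APP_ID.matcher(appId).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("application_1377900193583_0002")); // true
        System.out.println(isValid("application_00000"));              // false
    }
}
```

A caller could print a message such as "Invalid ApplicationId specified" when isValid returns false and only then hand the string to the real parser.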
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1128">YARN-1128</a>.
Major bug reported by Sandy Ryza and fixed by Karthik Kambatla (scheduler)<br>
<b>FifoPolicy.computeShares throws NPE on empty list of Schedulables</b><br>
<blockquote>FifoPolicy gives all of a queue's share to the earliest-scheduled application.
{code}
Schedulable earliest = null;
for (Schedulable schedulable : schedulables) {
if (earliest == null ||
schedulable.getStartTime() &lt; earliest.getStartTime()) {
earliest = schedulable;
}
}
earliest.setFairShare(Resources.clone(totalResources));
{code}
If the queue has no schedulables in it, earliest will be left null, leading to an NPE on the last line.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1090">YARN-1090</a>.
Major bug reported by Yesha Vora and fixed by Jian He <br>
<b>Job does not get into Pending State</b><br>
<blockquote>When there are no resources available to run a job, the next job should go into the pending state. The RM UI should show the next job as a pending app, and the counter for pending apps should be incremented.
Currently, however, the next job stays in the ACCEPTED state and no AM is assigned to it, yet the Pending App count is not incremented.
Running 'job status &lt;nextjob&gt;' shows job state=PREP.
$ mapred job -status job_1377122233385_0002
13/08/21 21:59:23 INFO client.RMProxy: Connecting to ResourceManager at host1/ip1
Job: job_1377122233385_0002
Job File: /ABC/.staging/job_1377122233385_0002/job.xml
Job Tracking URL : http://host1:port1/application_1377122233385_0002/
Uber job : false
Number of maps: 0
Number of reduces: 0
map() completion: 0.0
reduce() completion: 0.0
Job state: PREP
retired: false
reason for failure:</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1070">YARN-1070</a>.
Major sub-task reported by Hitesh Shah and fixed by Zhijie Shen (nodemanager)<br>
<b>ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1032">YARN-1032</a>.
Critical bug reported by Lohit Vijayarenu and fixed by Lohit Vijayarenu <br>
<b>NPE in RackResolve</b><br>
<blockquote>We found a case where our rack resolve script was not returning a rack due to a problem resolving the host address. This surfaced as an NPE in RackResolver.java, ultimately caught in RMContainerAllocator.
{noformat}
2013-08-01 07:11:37,708 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM.
java.lang.NullPointerException
at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:99)
at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:92)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignMapsWithLocality(RMContainerAllocator.java:1039)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignContainers(RMContainerAllocator.java:925)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assign(RMContainerAllocator.java:861)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.access$400(RMContainerAllocator.java:681)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:219)
at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:243)
at java.lang.Thread.run(Thread.java:722)
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-899">YARN-899</a>.
Major sub-task reported by Sandy Ryza and fixed by Xuan Gong (scheduler)<br>
<b>Get queue administration ACLs working</b><br>
<blockquote>The Capacity Scheduler documents the yarn.scheduler.capacity.root.&lt;queue-path&gt;.acl_administer_queue config option for controlling who can administer a queue, but it is not hooked up to anything. The Fair Scheduler could make use of a similar option as well. This is a feature-parity regression from MR1.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-890">YARN-890</a>.
Major bug reported by Trupti Dhavle and fixed by Xuan Gong (resourcemanager)<br>
<b>The roundup for memory values on resource manager UI is misleading</b><br>
<blockquote>
In yarn-site.xml, I see the following values:
&lt;property&gt;
&lt;name&gt;yarn.nodemanager.resource.memory-mb&lt;/name&gt;
&lt;value&gt;4192&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.scheduler.maximum-allocation-mb&lt;/name&gt;
&lt;value&gt;4192&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.scheduler.minimum-allocation-mb&lt;/name&gt;
&lt;value&gt;1024&lt;/value&gt;
&lt;/property&gt;
However the resourcemanager UI shows total memory as 5MB
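The arithmetic behind a misleading round-up can be sketched as follows. This is an illustration only: the report does not state how the UI computes the displayed value, so the sketch assumes that the configured megabytes are converted to gigabytes and rounded up to a whole number. MemoryRounding and its methods are hypothetical names, not YARN code.

```java
import java.util.Locale;

// Hypothetical illustration, not taken from the ResourceManager web UI code.
public class MemoryRounding {

    // Rounds a value in MB up to whole GB (integer ceiling division).
    public static long roundUpToGb(long mb) {
        return (mb + 1023) / 1024;
    }

    // Formats a value in MB as GB with two decimal places.
    public static String asDecimalGb(long mb) {
        return String.format(Locale.ROOT, "%.2f GB", mb / 1024.0);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToGb(4192));   // 5
        System.out.println(asDecimalGb(4192));   // 4.09 GB
    }
}
```

Under that assumption, 4192 MB is displayed as 5 even though it is only slightly more than 4 GB, whereas a decimal rendering stays close to the configured value.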
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-876">YARN-876</a>.
Major bug reported by PengZhang and fixed by PengZhang (resourcemanager)<br>
<b>Node resource is added twice when node comes back from unhealthy to healthy</b><br>
<blockquote>When an unhealthy node restarts, its resources may be added twice in the scheduler.
The first time is at the node's reconnection, while the node's final state is still "UNHEALTHY".
The second time is at the node's update, when the node's state changes from "UNHEALTHY" to "HEALTHY".</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-621">YARN-621</a>.
Critical sub-task reported by Allen Wittenauer and fixed by Omkar Vinit Joshi (resourcemanager)<br>
<b>RM triggers web auth failure before first job</b><br>
<blockquote>On a secure YARN setup, before the first job is executed, going to the web interface of the resource manager triggers authentication errors.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-49">YARN-49</a>.
Major sub-task reported by Hitesh Shah and fixed by Vinod Kumar Vavilapalli (applications/distributed-shell)<br>
<b>Improve distributed shell application to work on a secure cluster</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5562">MAPREDUCE-5562</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>MR AM should exit when unregister() throws exception</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5554">MAPREDUCE-5554</a>.
Minor bug reported by Robert Kanter and fixed by Robert Kanter (test)<br>
<b>hdfs-site.xml included in hadoop-mapreduce-client-jobclient tests jar is breaking tests for downstream components</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5551">MAPREDUCE-5551</a>.
Blocker sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Binary Incompatibility of O.A.H.U.mapred.SequenceFileAsBinaryOutputFormat.WritableValueBytes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5545">MAPREDUCE-5545</a>.
Major bug reported by Robert Kanter and fixed by Robert Kanter <br>
<b>org.apache.hadoop.mapred.TestTaskAttemptListenerImpl.testCommitWindow times out</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5544">MAPREDUCE-5544</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>JobClient#getJob loads job conf twice</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5538">MAPREDUCE-5538</a>.
Blocker sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>MRAppMaster#shutDownJob shouldn't send job end notification before checking isLastRetry</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5536">MAPREDUCE-5536</a>.
Blocker bug reported by Yesha Vora and fixed by Omkar Vinit Joshi <br>
<b>mapreduce.jobhistory.webapp.https.address property is not respected</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5533">MAPREDUCE-5533</a>.
Major bug reported by Tassapol Athiapinya and fixed by Xuan Gong (applicationmaster)<br>
<b>Speculative execution does not function for reduce</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5531">MAPREDUCE-5531</a>.
Blocker sub-task reported by Robert Kanter and fixed by Robert Kanter (mrv1 , mrv2)<br>
<b>Binary and source incompatibility in mapreduce.TaskID and mapreduce.TaskAttemptID between branch-1 and branch-2</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5530">MAPREDUCE-5530</a>.
Blocker sub-task reported by Robert Kanter and fixed by Robert Kanter (mrv1 , mrv2)<br>
<b>Binary and source incompatibility in mapred.lib.CombineFileInputFormat between branch-1 and branch-2</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5529">MAPREDUCE-5529</a>.
Blocker sub-task reported by Robert Kanter and fixed by Robert Kanter (mrv1 , mrv2)<br>
<b>Binary incompatibilities in mapred.lib.TotalOrderPartitioner between branch-1 and branch-2</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5525">MAPREDUCE-5525</a>.
Minor test reported by Chuan Liu and fixed by Chuan Liu (mrv2 , test)<br>
<b>Increase timeout of TestDFSIO.testAppend and TestMRJobsWithHistoryService.testJobHistoryData</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5523">MAPREDUCE-5523</a>.
Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Need to add https port related property in Job history server</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5515">MAPREDUCE-5515</a>.
Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Application Manager UI does not appear with Https enabled</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5513">MAPREDUCE-5513</a>.
Major bug reported by Jason Lowe and fixed by Robert Parker <br>
<b>ConcurrentModificationException in JobControl</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5505">MAPREDUCE-5505</a>.
Critical sub-task reported by Jian He and fixed by Zhijie Shen <br>
<b>Clients should be notified job finished only after job successfully unregistered </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5503">MAPREDUCE-5503</a>.
Blocker bug reported by Jason Lowe and fixed by Jian He (mrv2)<br>
<b>TestMRJobClient.testJobClient is failing</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5489">MAPREDUCE-5489</a>.
Critical bug reported by Yesha Vora and fixed by Zhijie Shen <br>
<b>MR jobs hangs as it does not use the node-blacklisting feature in RM requests</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5488">MAPREDUCE-5488</a>.
Major bug reported by Arpit Gupta and fixed by Jian He <br>
<b>Job recovery fails after killing all the running containers for the app</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5459">MAPREDUCE-5459</a>.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Update the doc of running MRv1 examples jar on YARN</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5442">MAPREDUCE-5442</a>.
Major bug reported by Yingda Chen and fixed by Yingda Chen (client)<br>
<b>$HADOOP_MAPRED_HOME/$HADOOP_CONF_DIR setting not working on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5170">MAPREDUCE-5170</a>.
Trivial bug reported by Sangjin Lee and fixed by Sangjin Lee (mrv2)<br>
<b>incorrect exception message if min node size &gt; min rack size</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5308">HDFS-5308</a>.
Major improvement reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Replace HttpConfig#getSchemePrefix with implicit schemes in HDFS JSP </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5306">HDFS-5306</a>.
Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (datanode , namenode)<br>
<b>Datanode https port is not available at the namenode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5300">HDFS-5300</a>.
Major bug reported by Vinay and fixed by Vinay (namenode)<br>
<b>FSNameSystem#deleteSnapshot() should not check owner in case of permissions disabled</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5299">HDFS-5299</a>.
Blocker bug reported by Vinay and fixed by Vinay (namenode)<br>
<b>DFS client hangs in updatePipeline RPC when failover happened</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5289">HDFS-5289</a>.
Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (test)<br>
<b>Race condition in TestRetryCacheWithHA#testCreateSymlink causes spurious test failure</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5279">HDFS-5279</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode)<br>
<b>Guard against NullPointerException in NameNode JSP pages before initialization of FSNamesystem.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5268">HDFS-5268</a>.
Major bug reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>NFS write commit verifier is not set in a few places</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5265">HDFS-5265</a>.
Major bug reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Namenode fails to start when dfs.https.port is unspecified</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5259">HDFS-5259</a>.
Major sub-task reported by Yesha Vora and fixed by Brandon Li (nfs)<br>
<b>Support client which combines appended data with old data before sends it to NFS server</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5258">HDFS-5258</a>.
Minor bug reported by Chris Nauroth and fixed by Chuan Liu (test)<br>
<b>Skip tests in TestHDFSCLI that are not applicable on Windows.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5256">HDFS-5256</a>.
Major improvement reported by Haohui Mai and fixed by Haohui Mai (nfs)<br>
<b>Use guava LoadingCache to implement DFSClientCache</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5255">HDFS-5255</a>.
Major bug reported by Yesha Vora and fixed by Arpit Agarwal <br>
<b>Distcp job fails with hsftp when https is enabled in insecure cluster</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5251">HDFS-5251</a>.
Major bug reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Race between the initialization of NameNode and the http server</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5246">HDFS-5246</a>.
Major sub-task reported by Jinghui Wang and fixed by Jinghui Wang (nfs)<br>
<b>Make Hadoop nfs server port and mount daemon port configurable</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5230">HDFS-5230</a>.
Major sub-task reported by Haohui Mai and fixed by Haohui Mai (nfs)<br>
<b>Introduce RpcInfo to decouple XDR classes from the RPC API</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5228">HDFS-5228</a>.
Blocker bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (hdfs-client)<br>
<b>The RemoteIterator returned by DistributedFileSystem.listFiles(..) may throw NPE</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5186">HDFS-5186</a>.
Minor test reported by Chuan Liu and fixed by Chuan Liu (namenode , test)<br>
<b>TestFileJournalManager fails on Windows due to file handle leaks</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5139">HDFS-5139</a>.
Major improvement reported by Arpit Agarwal and fixed by Arpit Agarwal (tools)<br>
<b>Remove redundant -R option from setrep</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5031">HDFS-5031</a>.
Blocker bug reported by Vinay and fixed by Vinay (datanode)<br>
<b>BlockScanner scans the block multiple times and on restart scans everything</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4817">HDFS-4817</a>.
Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)<br>
<b>make HDFS advisory caching configurable on a per-file basis</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10020">HADOOP-10020</a>.
Blocker sub-task reported by Colin Patrick McCabe and fixed by Sanjay Radia (fs)<br>
<b>disable symlinks temporarily</b><br>
<blockquote>During review of symbolic links, many issues were found related to their impact on the semantics of existing APIs such as FileSystem#listStatus, FileSystem#globStatus etc. There were also many issues brought up about symbolic links and their impact on the security and functionality of HDFS. All these issues will be addressed in the upcoming release 2.3. Until then the feature is temporarily disabled.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10017">HADOOP-10017</a>.
Major sub-task reported by Jing Zhao and fixed by Haohui Mai <br>
<b>Fix NPE in DFSClient#getDelegationToken when doing Distcp from a secured cluster to an insecured cluster</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10012">HADOOP-10012</a>.
Blocker bug reported by Arpit Gupta and fixed by Suresh Srinivas (ha)<br>
<b>Secure Oozie jobs fail with delegation token renewal exception in Namenode HA setup</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-10003">HADOOP-10003</a>.
Major bug reported by Jason Dere and fixed by (fs)<br>
<b>HarFileSystem.listLocatedStatus() fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9976">HADOOP-9976</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>Different versions of avro and avro-maven-plugin</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9948">HADOOP-9948</a>.
Minor test reported by Chuan Liu and fixed by Chuan Liu (test)<br>
<b>Add a config value to CLITestHelper to skip tests on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9776">HADOOP-9776</a>.
Major bug reported by shanyu zhao and fixed by shanyu zhao (fs)<br>
<b>HarFileSystem.listStatus() returns invalid authority if port number is empty</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9761">HADOOP-9761</a>.
Blocker bug reported by Andrew Wang and fixed by Andrew Wang (viewfs)<br>
<b>ViewFileSystem#rename fails when using DistributedFileSystem</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9758">HADOOP-9758</a>.
Major improvement reported by Andrew Wang and fixed by Andrew Wang <br>
<b>Provide configuration option for FileSystem/FileContext symlink resolution</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8315">HADOOP-8315</a>.
Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (auto-failover , ha)<br>
<b>Support SASL-authenticated ZooKeeper in ActiveStandbyElector</b><br>
<blockquote></blockquote></li>
</ul>
</body></html>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hadoop 2.1.1-beta Release Notes</title>
<STYLE type="text/css">
H1 {font-family: sans-serif}
H2 {font-family: sans-serif; margin-left: 7mm}
TABLE {margin-left: 7mm}
</STYLE>
</head>
<body>
<h1>Hadoop 2.1.1-beta Release Notes</h1>
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
<a name="changes"/>
<h2>Changes since Hadoop 2.1.0-beta</h2>
<ul>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1194">YARN-1194</a>.
Minor bug reported by Roman Shaposhnik and fixed by Roman Shaposhnik (nodemanager)<br>
<b>TestContainerLogsPage fails with native builds</b><br>
<blockquote>Running TestContainerLogsPage on trunk while Native IO is enabled makes it fail</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1189">YARN-1189</a>.
Blocker bug reported by Jason Lowe and fixed by Omkar Vinit Joshi <br>
<b>NMTokenSecretManagerInNM is not being told when applications have finished </b><br>
<blockquote>The {{appFinished}} method is not being called when applications have finished. This causes a couple of leaks as {{oldMasterKeys}} and {{appToAppAttemptMap}} are never being pruned.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1184">YARN-1184</a>.
Major bug reported by J.Andreina and fixed by Chris Douglas (capacityscheduler , resourcemanager)<br>
<b>ClassCastException is thrown during preemption When a huge job is submitted to a queue B whose resources is used by a job in queueA</b><br>
<blockquote>Preemption is enabled.
Queue = a,b
a capacity = 30%
b capacity = 70%
Step 1: Assign a big job to queue a (so that job_a will utilize some resources from queue b)
Step 2: Assign a big job to queue b.
The following exception is thrown at the Resource Manager
{noformat}
2013-09-12 10:42:32,535 ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw an Exception.
java.lang.ClassCastException: java.util.Collections$UnmodifiableSet cannot be cast to java.util.NavigableSet
at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getContainersToPreempt(ProportionalCapacityPreemptionPolicy.java:403)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:202)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:173)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82)
at java.lang.Thread.run(Thread.java:662)
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1176">YARN-1176</a>.
Critical bug reported by Thomas Graves and fixed by Jonathan Eagles (resourcemanager)<br>
<b>RM web services ClusterMetricsInfo total nodes doesn't include unhealthy nodes</b><br>
<blockquote>In the web services API for cluster/metrics, the reported totalNodes doesn't include unhealthy nodes.
this.totalNodes = activeNodes + lostNodes + decommissionedNodes
+ rebootedNodes;</blockquote></li>
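For YARN-1176, the corrected total can be sketched as follows. This is a minimal illustration, assuming a hypothetical helper class ClusterMetricsSketch; it is not the actual ClusterMetricsInfo code, and the parameter names simply mirror the fields quoted in the report.

```java
// Hypothetical helper for illustration only; not the real ClusterMetricsInfo class.
final class ClusterMetricsSketch {
    // Sums every node state so that unhealthy nodes are counted in the total.
    static int totalNodes(int activeNodes, int lostNodes, int decommissionedNodes,
                          int rebootedNodes, int unhealthyNodes) {
        return activeNodes + lostNodes + decommissionedNodes
            + rebootedNodes + unhealthyNodes;
    }
}
```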
<li> <a href="https://issues.apache.org/jira/browse/YARN-1170">YARN-1170</a>.
Blocker bug reported by Arun C Murthy and fixed by Binglin Chang <br>
<b>yarn proto definitions should specify package as 'hadoop.yarn'</b><br>
<blockquote>yarn proto definitions should specify package as 'hadoop.yarn' similar to protos with 'hadoop.common' &amp; 'hadoop.hdfs' in Common &amp; HDFS respectively.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1152">YARN-1152</a>.
Blocker bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
<b>Invalid key to HMAC computation error when getting application report for completed app attempt</b><br>
<blockquote>On a secure cluster, an invalid key to HMAC error is thrown when trying to get an application report for an application with an attempt that has unregistered.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1144">YARN-1144</a>.
Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (resourcemanager)<br>
<b>Unmanaged AMs registering a tracking URI should not be proxy-fied</b><br>
<blockquote>Unmanaged AMs do not run in the cluster, so their tracking URL should not be proxy-fied.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1137">YARN-1137</a>.
Major improvement reported by Alejandro Abdelnur and fixed by Roman Shaposhnik (nodemanager)<br>
<b>Add support whitelist for system users to Yarn container-executor.c</b><br>
<blockquote>Currently container-executor.c has a banned set of users (mapred, hdfs &amp; bin) and a configurable min.user.id (defaulting to 1000).
This presents a problem for systems that run as system users (below 1000) if these systems want to start containers.
Systems like Impala fit in this category. A (local) 'impala' system user is created when installing Impala on the nodes.
Note that the same thing happens when installing systems like HDFS, Yarn, Oozie, from packages (Bigtop); local system users are created.
For Impala to be able to run containers in a secure cluster, the 'impala' system user must be whitelisted.
For this, adding an 'allowed.system.users' configuration option in container-executor.cfg, with matching logic in container-executor.c, would allow the usernames in that list.
Because system users are not guaranteed to have the same UID on different machines, the 'allowed.system.users' property should use usernames and not UIDs.
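The check described above can be sketched in Java as follows. This is only an illustration of the intended decision order, assuming a hypothetical class ContainerUserCheckSketch; the real logic lives in container-executor.c and may differ in detail.

```java
// Hypothetical illustration of the user check; the real code is C in container-executor.c.
final class ContainerUserCheckSketch {
    // Banned users named in the description above.
    static final String[] BANNED_USERS = {"mapred", "hdfs", "bin"};

    // Linear search by username; names are compared, never UIDs.
    static boolean contains(String[] names, String user) {
        for (String name : names) {
            if (name.equals(user)) {
                return true;
            }
        }
        return false;
    }

    // Returns true if a container may be launched as the given user.
    static boolean isUserAllowed(String user, int uid, int minUserId,
                                 String[] allowedSystemUsers) {
        if (contains(BANNED_USERS, user)) {
            return false; // banned users are always rejected
        }
        if (uid >= minUserId) {
            return true; // regular user at or above min.user.id
        }
        // System user below min.user.id: allowed only if whitelisted by name.
        return contains(allowedSystemUsers, user);
    }
}
```

With min.user.id at 1000, an 'impala' user with a UID below 1000 passes only when its name appears in the allowed list.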
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1124">YARN-1124</a>.
Blocker bug reported by Omkar Vinit Joshi and fixed by Xuan Gong <br>
<b>By default yarn application -list should display all the applications in a state other than FINISHED / FAILED</b><br>
<blockquote>Today we list only applications in the RUNNING state by default for "yarn application -list". Instead, we should show all applications that are submitted, accepted, or running.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1120">YARN-1120</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>Make ApplicationConstants.Environment.USER definition OS neutral</b><br>
<blockquote>In YARN-557, we added code to give {{ApplicationConstants.Environment.USER}} an OS-specific definition in order to fix the unit test TestUnmanagedAMLauncher. In YARN-571, the relevant test code was corrected. In YARN-602, we explicitly set the environment variables for the child containers. With these changes, I think we can revert the YARN-557 change to make {{ApplicationConstants.Environment.USER}} OS neutral. The main benefit is that we can use the same method over the Enum constants. This should also fix the TestContainerLaunch#testContainerEnvVariables failure on Windows.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1117">YARN-1117</a>.
Major improvement reported by Tassapol Athiapinya and fixed by Xuan Gong (client)<br>
<b>Improve help message for $ yarn applications and $yarn node</b><br>
<blockquote>The help message was standardized in YARN-1080. It would be nice to have similar changes for $ yarn applications and $ yarn node.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1116">YARN-1116</a>.
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>Populate AMRMTokens back to AMRMTokenSecretManager after RM restarts</b><br>
<blockquote>The AMRMTokens are currently only saved in RMStateStore and are not populated back to AMRMTokenSecretManager after the RM restarts. This matters more now that AMRMToken is also used in non-secure environments.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1107">YARN-1107</a>.
Blocker bug reported by Arpit Gupta and fixed by Omkar Vinit Joshi (resourcemanager)<br>
<b>Job submitted with Delegation token in secured environment causes RM to fail during RM restart</b><br>
<blockquote>If a secure RM with recovery enabled is restarted while Oozie jobs are running, the RM fails to come up.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1101">YARN-1101</a>.
Major bug reported by Robert Parker and fixed by Robert Parker (resourcemanager)<br>
<b>Active nodes can be decremented below 0</b><br>
<blockquote>The issue is in RMNodeImpl, where both the RUNNING and UNHEALTHY states transition to a deactivated state (LOST, DECOMMISSIONED, REBOOTED) using the same DeactivateNodeTransition class. The DeactivateNodeTransition class naturally decrements the active node count; however, in cases where the node has already transitioned to UNHEALTHY, the active count has already been decremented.</blockquote></li>
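The double decrement described in YARN-1101 can be avoided by adjusting the count only when a node actually leaves RUNNING. The sketch below assumes a hypothetical class ActiveNodeCounterSketch and is not the RMNodeImpl state machine; it only illustrates the counting rule.

```java
// Hypothetical illustration of the counting rule; not the RMNodeImpl state machine.
final class ActiveNodeCounterSketch {
    enum State { RUNNING, UNHEALTHY, LOST, DECOMMISSIONED, REBOOTED }

    private int activeNodes;

    ActiveNodeCounterSketch(int initialActive) {
        this.activeNodes = initialActive;
    }

    int getActiveNodes() {
        return activeNodes;
    }

    // Applies a transition and returns the new state.
    State transition(State from, State to) {
        if (from == State.RUNNING && to != State.RUNNING) {
            activeNodes--; // the node stops being active exactly once
        } else if (from != State.RUNNING && to == State.RUNNING) {
            activeNodes++; // the node became active again
        }
        // UNHEALTHY to LOST, DECOMMISSIONED or REBOOTED: count already adjusted.
        return to;
    }
}
```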
<li> <a href="https://issues.apache.org/jira/browse/YARN-1094">YARN-1094</a>.
Blocker bug reported by Yesha Vora and fixed by Vinod Kumar Vavilapalli <br>
<b>RM restart throws Null pointer Exception in Secure Env</b><br>
<blockquote>Enable the rmrestart feature and restart the Resource Manager while a job is running.
The Resource Manager fails to start with the error below
2013-08-23 17:57:40,705 INFO resourcemanager.RMAppManager (RMAppManager.java:recover(370)) - Recovering application application_1377280618693_0001
2013-08-23 17:57:40,763 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(617)) - Failed to load/recover state
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.setTimerForTokenRenewal(DelegationTokenRenewer.java:371)
at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addApplication(DelegationTokenRenewer.java:307)
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:291)
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:371)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:819)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:613)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:832)
2013-08-23 17:57:40,766 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1093">YARN-1093</a>.
Major bug reported by Wing Yew Poon and fixed by (documentation)<br>
<b>Corrections to Fair Scheduler documentation</b><br>
<blockquote>The fair scheduler is still evolving, but the current documentation contains some inaccuracies.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1085">YARN-1085</a>.
Blocker task reported by Jaimin D Jetly and fixed by Omkar Vinit Joshi (nodemanager , resourcemanager)<br>
<b>Yarn and MRv2 should do HTTP client authentication in kerberos setup.</b><br>
<blockquote>In a Kerberos setup, an HTTP client is expected to authenticate to Kerberos before the user is allowed to browse any information.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1083">YARN-1083</a>.
Major bug reported by Yesha Vora and fixed by Zhijie Shen (resourcemanager)<br>
<b>ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval</b><br>
<blockquote>If 'yarn.nm.liveness-monitor.expiry-interval-ms' is set to less than the heartbeat interval, all the node managers will be added to 'Lost Nodes'.
Instead, the Resource Manager should validate these properties and fail to start if the combination is invalid.</blockquote></li>
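The startup validation requested in YARN-1083 might look like the sketch below. The class LivenessConfigCheckSketch and its method are hypothetical, and the two intervals are passed in as plain numbers rather than read from a real configuration object.

```java
// Hypothetical startup check; values are passed in rather than read from a Configuration.
final class LivenessConfigCheckSketch {
    // Throws if the expiry interval is shorter than the heartbeat interval.
    static void validate(long expiryIntervalMs, long heartbeatIntervalMs) {
        if (expiryIntervalMs < heartbeatIntervalMs) {
            throw new IllegalArgumentException(
                "yarn.nm.liveness-monitor.expiry-interval-ms (" + expiryIntervalMs
                + ") must not be less than the heartbeat interval ("
                + heartbeatIntervalMs + ")");
        }
    }
}
```

Failing fast at startup surfaces the misconfiguration immediately instead of letting every node manager drift into 'Lost Nodes'.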
<li> <a href="https://issues.apache.org/jira/browse/YARN-1082">YARN-1082</a>.
Blocker bug reported by Arpit Gupta and fixed by Vinod Kumar Vavilapalli (resourcemanager)<br>
<b>Secure RM with recovery enabled and rm state store on hdfs fails with gss exception</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1081">YARN-1081</a>.
Minor improvement reported by Tassapol Athiapinya and fixed by Akira AJISAKA (client)<br>
<b>Minor improvement to output header for $ yarn node -list</b><br>
<blockquote>The output of $ yarn node -list shows the number of running containers at each node. I found a case where a new YARN user thought this was a container ID, used it later in other YARN commands, and got an error because of the misunderstanding.
{code:title=current output}
2013-07-31 04:00:37,814|beaver.machine|INFO|RUNNING: /usr/bin/yarn node -list
2013-07-31 04:00:38,746|beaver.machine|INFO|Total Nodes:1
2013-07-31 04:00:38,747|beaver.machine|INFO|Node-Id Node-State Node-Http-Address Running-Containers
2013-07-31 04:00:38,747|beaver.machine|INFO|myhost:45454 RUNNING myhost:50060 2
{code}
{code:title=proposed output}
2013-07-31 04:00:37,814|beaver.machine|INFO|RUNNING: /usr/bin/yarn node -list
2013-07-31 04:00:38,746|beaver.machine|INFO|Total Nodes:1
2013-07-31 04:00:38,747|beaver.machine|INFO|Node-Id Node-State Node-Http-Address Number-of-Running-Containers
2013-07-31 04:00:38,747|beaver.machine|INFO|myhost:45454 RUNNING myhost:50060 2
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1080">YARN-1080</a>.
Major improvement reported by Tassapol Athiapinya and fixed by Xuan Gong (client)<br>
<b>Improve help message for $ yarn logs</b><br>
<blockquote>There are 2 parts I am proposing in this jira. They can be fixed together in one patch.
1. Standardize help message for required parameter of $ yarn logs
YARN CLI has a command "logs" ($ yarn logs). The command always requires the parameter "-applicationId &lt;arg&gt;". However, the help message of the command does not make this clear. It lists -applicationId as an optional parameter. If I don't set it, YARN CLI complains that it is missing. It is better to use the standard notation for required parameters found in the help messages of other Linux commands. Any user familiar with such commands can then more easily understand that this parameter is needed.
{code:title=current help message}
-bash-4.1$ yarn logs
usage: general options are:
-applicationId &lt;arg&gt; ApplicationId (required)
-appOwner &lt;arg&gt; AppOwner (assumed to be current user if not
specified)
-containerId &lt;arg&gt; ContainerId (must be specified if node address is
specified)
-nodeAddress &lt;arg&gt; NodeAddress in the format nodename:port (must be
specified if container id is specified)
{code}
{code:title=proposed help message}
-bash-4.1$ yarn logs
usage: yarn logs -applicationId &lt;application ID&gt; [OPTIONS]
general options are:
-appOwner &lt;arg&gt; AppOwner (assumed to be current user if not
specified)
-containerId &lt;arg&gt; ContainerId (must be specified if node address is
specified)
-nodeAddress &lt;arg&gt; NodeAddress in the format nodename:port (must be
specified if container id is specified)
{code}
2. Add a description to the help command. As far as I know, a user cannot get logs for a running job. Since I spent some time trying to get logs of running applications, it would be nice to say this in the command description.
{code:title=proposed help}
Retrieve logs for completed/killed YARN application
usage: general options are...
{code}
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1078">YARN-1078</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows</b><br>
<blockquote>The three unit tests fail on Windows due to host name resolution differences on Windows, i.e. 127.0.0.1 does not resolve to host name "localhost".
{noformat}
org.apache.hadoop.security.token.SecretManager$InvalidToken: Given Container container_0_0000_01_000000 identifier is not valid for current Node manager. Expected : 127.0.0.1:12345 Found : localhost:12345
{noformat}
{noformat}
testNMConnectionToRM(org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater) Time elapsed: 8343 sec &lt;&lt;&lt; FAILURE!
org.junit.ComparisonFailure: expected:&lt;[localhost]:12345&gt; but was:&lt;[127.0.0.1]:12345&gt;
at org.junit.Assert.assertEquals(Assert.java:125)
at org.junit.Assert.assertEquals(Assert.java:147)
at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker6.registerNodeManager(TestNodeStatusUpdater.java:712)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
at $Proxy26.registerNodeManager(Unknown Source)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:212)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:149)
at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyNodeStatusUpdater4.serviceStart(TestNodeStatusUpdater.java:369)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:213)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNMConnectionToRM(TestNodeStatusUpdater.java:985)
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1077">YARN-1077</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>TestContainerLaunch fails on Windows</b><br>
<blockquote>Several cases in this unit tests fail on Windows. (Append error log at the end.)
testInvalidEnvSyntaxDiagnostics fails because of the difference between cmd and bash script error handling. If a command fails in a cmd script, cmd continues to execute the rest of the script. Error handling needs to be carried out explicitly in the script file. The error code of the last command is returned as the error code of the whole script. In this test, an error happens in the middle of the cmd script, and the test expects an exception and a non-zero error code. In the cmd script, the intermediate errors are ignored. The last command, "call", succeeds and there is no exception.
testContainerLaunchStdoutAndStderrDiagnostics fails due to wrong cmd commands used by the test.
testContainerEnvVariables and testDelayedKill fail due to a regression from YARN-906.
{noformat}
-------------------------------------------------------------------------------
Test set: org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
-------------------------------------------------------------------------------
Tests run: 7, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 11.526 sec &lt;&lt;&lt; FAILURE!
testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 583 sec &lt;&lt;&lt; FAILURE!
junit.framework.AssertionFailedError: Should catch exception
at junit.framework.Assert.fail(Assert.java:50)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:269)
...
testContainerLaunchStdoutAndStderrDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 561 sec &lt;&lt;&lt; FAILURE!
junit.framework.AssertionFailedError: Should catch exception
at junit.framework.Assert.fail(Assert.java:50)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerLaunchStdoutAndStderrDiagnostics(TestContainerLaunch.java:314)
...
testContainerEnvVariables(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 4136 sec &lt;&lt;&lt; FAILURE!
junit.framework.AssertionFailedError: expected:&lt;137&gt; but was:&lt;143&gt;
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.failNotEquals(Assert.java:287)
at junit.framework.Assert.assertEquals(Assert.java:67)
at junit.framework.Assert.assertEquals(Assert.java:199)
at junit.framework.Assert.assertEquals(Assert.java:205)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:500)
...
testDelayedKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 2744 sec &lt;&lt;&lt; FAILURE!
junit.framework.AssertionFailedError: expected:&lt;137&gt; but was:&lt;143&gt;
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.failNotEquals(Assert.java:287)
at junit.framework.Assert.assertEquals(Assert.java:67)
at junit.framework.Assert.assertEquals(Assert.java:199)
at junit.framework.Assert.assertEquals(Assert.java:205)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testDelayedKill(TestContainerLaunch.java:601)
...
{noformat}
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1074">YARN-1074</a>.
Major improvement reported by Tassapol Athiapinya and fixed by Xuan Gong (client)<br>
<b>Clean up YARN CLI app list to show only running apps.</b><br>
<blockquote>Once a user brings up the YARN daemons and runs jobs, the jobs stay in the output returned by $ yarn application -list even after they have completed. We want the YARN command line to clean up this list. Specifically, we want to remove applications with FINISHED state (not Final-State) or KILLED state from the result.
{code}
[user1@host1 ~]$ yarn application -list
Total Applications:150
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1374638600275_0109 Sleep job MAPREDUCE user1 default KILLED KILLED 100% host1:54059
application_1374638600275_0121 Sleep job MAPREDUCE user1 default FINISHED SUCCEEDED 100% host1:19888/jobhistory/job/job_1374638600275_0121
application_1374638600275_0020 Sleep job MAPREDUCE user1 default FINISHED SUCCEEDED 100% host1:19888/jobhistory/job/job_1374638600275_0020
application_1374638600275_0038 Sleep job MAPREDUCE user1 default
....
{code}
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1049">YARN-1049</a>.
Blocker bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (api)<br>
<b>ContainerExistStatus should define a status for preempted containers</b><br>
<blockquote>With the current behavior it is impossible to determine whether a container has been preempted or lost due to a NM crash.
Adding a PREEMPTED exit status (-102) will help an AM determine that a container has been preempted.
Note the change of scope from the original summary/description. The original scope proposed API/behavior changes. Because we are past 2.1.0-beta I'm reducing the scope of this JIRA.</blockquote></li>
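An AM could use the PREEMPTED status from YARN-1049 roughly as sketched below. The class ExitStatusSketch and its methods are hypothetical; only the value -102 comes from the note above, and a real AM would read the exit status from the container status it receives.

```java
// Hypothetical AM-side helper; only the value -102 is taken from the release note.
final class ExitStatusSketch {
    static final int PREEMPTED = -102;

    // True when the container was preempted rather than lost or failed.
    static boolean wasPreempted(int exitStatus) {
        return exitStatus == PREEMPTED;
    }

    // Builds a short human-readable summary of a completed container.
    static String describe(int exitStatus) {
        if (wasPreempted(exitStatus)) {
            return "preempted";
        }
        return "exited with status " + exitStatus;
    }
}
```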
<li> <a href="https://issues.apache.org/jira/browse/YARN-1034">YARN-1034</a>.
Trivial task reported by Sandy Ryza and fixed by Karthik Kambatla (documentation , scheduler)<br>
<b>Remove "experimental" in the Fair Scheduler documentation</b><br>
<blockquote>The YARN Fair Scheduler is largely stable now, and should no longer be declared experimental.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1025">YARN-1025</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager , resourcemanager)<br>
<b>ResourceManager and NodeManager do not load native libraries on Windows.</b><br>
<blockquote>ResourceManager and NodeManager do not have the correct setting for java.library.path when launched on Windows. This prevents the processes from loading native code from hadoop.dll. The native code is required for correct functioning on Windows (not optional), so this ultimately can cause failures.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1008">YARN-1008</a>.
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (nodemanager)<br>
<b>MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations</b><br>
<blockquote>While the NMs are keyed using the NodeId, the allocation is done based on the hostname.
This makes the different nodes indistinguishable to the scheduler.
There should be an option to enable host:port instead of just host for allocations. The nodes reported to the AM should report the 'key' (host or host:port).
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1006">YARN-1006</a>.
Major bug reported by Jian He and fixed by Xuan Gong <br>
<b>Nodes list web page on the RM web UI is broken</b><br>
<blockquote>The nodes web page, which lists all the connected nodes of the cluster, is broken.
1. The page is not shown in the correct format/style.
2. If we restart the NM, the node list is not refreshed; the newly started NM is just added to the list. The old NM's information still remains.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1001">YARN-1001</a>.
Blocker task reported by Srimanth Gunturi and fixed by Zhijie Shen (api)<br>
<b>YARN should provide per application-type and state statistics</b><br>
<blockquote>In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-994">YARN-994</a>.
Major bug reported by Xuan Gong and fixed by Xuan Gong <br>
<b>HeartBeat thread in AMRMClientAsync does not handle runtime exception correctly</b><br>
<blockquote>YARN-654 performs sanity checks on the parameters of public methods in AMRMClient. These checks may throw runtime exceptions.
Currently, the heartbeat thread in AMRMClientAsync only catches IOException and YarnException, and does not handle runtime exceptions properly.
A possible solution: the heartbeat thread catches Throwable and notifies the callback handler thread via the existing savedException.</blockquote></li>
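The solution proposed for YARN-994 can be sketched as follows. The class HeartbeatSketch is hypothetical and stands in for the heartbeat thread body; the savedException field name follows the description, while everything else is an assumption for illustration.

```java
// Hypothetical stand-in for the heartbeat thread body in AMRMClientAsync.
final class HeartbeatSketch implements Runnable {
    private final Runnable allocateCall;
    // Read later by the callback handler thread, hence volatile.
    private volatile Throwable savedException;

    HeartbeatSketch(Runnable allocateCall) {
        this.allocateCall = allocateCall;
    }

    Throwable getSavedException() {
        return savedException;
    }

    @Override
    public void run() {
        try {
            allocateCall.run();
        } catch (Throwable t) {
            // Catch everything, including runtime exceptions from sanity checks,
            // so the failure is reported instead of silently killing the thread.
            savedException = t;
        }
    }
}
```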
<li> <a href="https://issues.apache.org/jira/browse/YARN-981">YARN-981</a>.
Major bug reported by Xuan Gong and fixed by Jian He <br>
<b>YARN/MR2/Job-history /logs link does not have correct content</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-966">YARN-966</a>.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED</b><br>
<blockquote>In ContainerImpl.getLocalizedResources(), there's:
{code}
assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!!
{code}
ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertionError is thrown and kills the thread without notifying the NM. Therefore, the container cannot receive the further events that ContainerLaunch.call() is supposed to send, and cannot move towards completion.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-957">YARN-957</a>.
Blocker bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Capacity Scheduler tries to reserve the memory more than what node manager reports.</b><br>
<blockquote>I have 2 node managers.
* one with 1024 MB memory (nm1).
* a second with 2048 MB memory (nm2).
I am submitting a simple MapReduce application with one mapper and one reducer, 1024 MB each. The steps to reproduce are:
* Stop nm2 (2048 MB memory). This makes sure that this node's heartbeat doesn't reach the RM first.
* Now submit the application. As soon as the RM receives the first node's (nm1) heartbeat, it tries to reserve memory for the AM container (2048 MB). However, nm1 has only 1024 MB of memory.
* Now start nm2 with 2048 MB memory.
It hangs forever. There are two potential issues here.
* The scheduler should not try to reserve memory on a node manager that is never going to provide the requested memory, i.e. the current max capability of the node manager is 1024 MB but 2048 MB is reserved on it. It still does that.
* Say 2048 MB is reserved on nm1 but nm2 comes back with 2048 MB of available memory. In this case, if the original request was made without any locality, the scheduler should unreserve memory on nm1 and allocate the requested 2048 MB container on nm2.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-948">YARN-948</a>.
Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>RM should validate the release container list before actually releasing them</b><br>
<blockquote>At present we are blindly passing the allocate request, which contains the containers to be released, to the scheduler. This may result in one application releasing another application's container.
{code}
@Override
@Lock(Lock.NoLock.class)
public Allocation allocate(ApplicationAttemptId applicationAttemptId,
List&lt;ResourceRequest&gt; ask, List&lt;ContainerId&gt; release,
List&lt;String&gt; blacklistAdditions, List&lt;String&gt; blacklistRemovals) {
FiCaSchedulerApp application = getApplication(applicationAttemptId);
....
....
// Release containers
for (ContainerId releasedContainerId : release) {
RMContainer rmContainer = getRMContainer(releasedContainerId);
if (rmContainer == null) {
RMAuditLogger.logFailure(application.getUser(),
AuditConstants.RELEASE_CONTAINER,
"Unauthorized access or invalid container", "CapacityScheduler",
"Trying to release container not owned by app or with invalid id",
application.getApplicationId(), releasedContainerId);
}
completedContainer(rmContainer,
SchedulerUtils.createAbnormalContainerStatus(
releasedContainerId,
SchedulerUtils.RELEASED_CONTAINER),
RMContainerEventType.RELEASED);
}
{code}
The current checks are not sufficient, and we should prevent this. Thoughts?</blockquote></li>
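<li> <b>Illustrative sketch for YARN-948</b><br>
<blockquote>A minimal Java sketch of the kind of check the report asks for: reject a release request when any listed container belongs to a different application attempt. It assumes the ID layouts seen in logs elsewhere in these notes (container_TS_APP_ATTEMPT_N and appattempt_TS_APP_ATTEMPT). ReleaseValidator and its methods are hypothetical names, plain strings stand in for the YARN ID types, and the code is not taken from the committed fix.

```java
// Hypothetical stand-in for a release-list check; not the actual RM code.
final class ReleaseValidator {
    // Normalizes "container_TS_APP_ATTEMPT_N" to the key "TS_APP_ATTEMPT".
    static String ownerOfContainer(String containerId) {
        String[] p = containerId.split("_");
        if (p.length != 5 || !p[0].equals("container")) {
            throw new IllegalArgumentException("Malformed container id: " + containerId);
        }
        return p[1] + "_" + p[2] + "_" + Integer.parseInt(p[3]);
    }

    // Normalizes "appattempt_TS_APP_ATTEMPT" to the same key form.
    static String keyOfAttempt(String attemptId) {
        String[] p = attemptId.split("_");
        if (p.length != 4 || !p[0].equals("appattempt")) {
            throw new IllegalArgumentException("Malformed attempt id: " + attemptId);
        }
        return p[1] + "_" + p[2] + "_" + Integer.parseInt(p[3]);
    }

    // Rejects the whole request if any container belongs to another attempt.
    static void validate(String attemptId, String[] release) {
        String key = keyOfAttempt(attemptId);
        for (String containerId : release) {
            if (!key.equals(ownerOfContainer(containerId))) {
                throw new IllegalArgumentException(
                    containerId + " is not owned by " + attemptId);
            }
        }
    }
}
```

Running the check before the release loop means a foreign or malformed container ID stops the request early instead of reaching completedContainer.</blockquote></li>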
<li> <a href="https://issues.apache.org/jira/browse/YARN-942">YARN-942</a>.
Major bug reported by Sandy Ryza and fixed by Akira AJISAKA (scheduler)<br>
<b>In Fair Scheduler documentation, inconsistency on which properties have prefix</b><br>
<blockquote>locality.threshold.node and locality.threshold.rack should have the yarn.scheduler.fair prefix like the items before them
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-910">YARN-910</a>.
Major improvement reported by Sandy Ryza and fixed by Alejandro Abdelnur (nodemanager)<br>
<b>Allow auxiliary services to listen for container starts and completions</b><br>
<blockquote>Making container start and completion events available to auxiliary services would allow them to be resource-aware. The auxiliary service would be able to notify a co-located service that is opportunistically using free capacity of allocation changes.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-906">YARN-906</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Cancelling ContainerLaunch#call at KILLING prevents the container from completing</b><br>
<blockquote>See https://builds.apache.org/job/PreCommit-YARN-Build/1435//testReport/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClientNoCleanupOnStop/</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-903">YARN-903</a>.
Major bug reported by Abhishek Kapoor and fixed by Omkar Vinit Joshi (applications/distributed-shell)<br>
<b>DistributedShell throwing Errors in logs after successful completion</b><br>
<blockquote>I have tried running DistributedShell and also used ApplicationMaster of the same for my test.
The application runs successfully, though it logs some errors that would be useful to fix.
Below are the logs from the NodeManager and the ApplicationMaster.
Log Snippet for NodeManager
=============================
2013-07-07 13:39:18,787 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting to ResourceManager at localhost/127.0.0.1:9990. current no. of attempts is 1
2013-07-07 13:39:19,050 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -325382586
2013-07-07 13:39:19,052 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for nm-tokens, got key with id :1005046570
2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as sunny-Inspiron:9993 with total resource of &lt;memory:10240, vCores:8&gt;
2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests
2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_000001 (auth:SIMPLE)
2013-07-07 13:39:35,492 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1373184544832_0001_01_000001 by user sunny
2013-07-07 13:39:35,507 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1373184544832_0001
2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny IP=127.0.0.1 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_000001
2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from NEW to INITING
2013-07-07 13:39:35,512 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1373184544832_0001_01_000001 to application application_1373184544832_0001
2013-07-07 13:39:35,518 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from INITING to RUNNING
2013-07-07 13:39:35,528 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from NEW to LOCALIZING
2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:9000/application/test.jar transitioned from INIT to DOWNLOADING
2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1373184544832_0001_01_000001
2013-07-07 13:39:35,675 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_000001.tokens. Credentials list:
2013-07-07 13:39:35,694 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user sunny
2013-07-07 13:39:35,803 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_000001.tokens to /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001/container_1373184544832_0001_01_000001.tokens
2013-07-07 13:39:35,803 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD set to /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001 = file:/home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001
2013-07-07 13:39:36,136 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:36,406 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:9000/application/test.jar transitioned from DOWNLOADING to LOCALIZED
2013-07-07 13:39:36,409 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from LOCALIZING to LOCALIZED
2013-07-07 13:39:36,524 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from LOCALIZED to RUNNING
2013-07-07 13:39:36,692 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, -c, /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001/container_1373184544832_0001_01_000001/default_container_executor.sh]
2013-07-07 13:39:37,144 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:38,147 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:39,151 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:39,209 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1373184544832_0001_01_000001
2013-07-07 13:39:39,259 WARN org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Unexpected: procfs stat file is not in the expected format for process with pid 11552
2013-07-07 13:39:39,264 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 29524 for container-id container_1373184544832_0001_01_000001: 79.9 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used
2013-07-07 13:39:39,645 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_000001 (auth:SIMPLE)
2013-07-07 13:39:39,651 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1373184544832_0001_01_000002 by user sunny
2013-07-07 13:39:39,651 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny IP=127.0.0.1 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_000002
2013-07-07 13:39:39,651 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1373184544832_0001_01_000002 to application application_1373184544832_0001
2013-07-07 13:39:39,652 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000002 transitioned from NEW to LOCALIZED
2013-07-07 13:39:39,660 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Getting container-status for container_1373184544832_0001_01_000002
2013-07-07 13:39:39,661 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Returning container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 2, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:39,728 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000002 transitioned from LOCALIZED to RUNNING
2013-07-07 13:39:39,873 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, -c, /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001/container_1373184544832_0001_01_000002/default_container_executor.sh]
2013-07-07 13:39:39,898 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1373184544832_0001_01_000002 succeeded
2013-07-07 13:39:39,899 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000002 transitioned from RUNNING to EXITED_WITH_SUCCESS
2013-07-07 13:39:39,900 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1373184544832_0001_01_000002
2013-07-07 13:39:39,942 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny OPERATION=Container Finished - Succeeded TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_000002
2013-07-07 13:39:39,943 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000002 transitioned from EXITED_WITH_SUCCESS to DONE
2013-07-07 13:39:39,944 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1373184544832_0001_01_000002 from application application_1373184544832_0001
2013-07-07 13:39:40,155 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:40,157 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 2, }, state: C_COMPLETE, diagnostics: "", exit_status: 0,
2013-07-07 13:39:40,158 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed container container_1373184544832_0001_01_000002
2013-07-07 13:39:40,683 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Getting container-status for container_1373184544832_0001_01_000002
2013-07-07 13:39:40,686 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:appattempt_1373184544832_0001_000001 (auth:TOKEN) cause:org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000002 is not handled by this NodeManager
2013-07-07 13:39:40,687 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9993, call org.apache.hadoop.yarn.api.ContainerManagementProtocolPB.stopContainer from 127.0.0.1:51085: error: org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000002 is not handled by this NodeManager
org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000002 is not handled by this NodeManager
at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeGetAndStopContainerRequest(ContainerManagerImpl.java:614)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.stopContainer(ContainerManagerImpl.java:538)
at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.stopContainer(ContainerManagementProtocolPBServiceImpl.java:88)
at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:85)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1033)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1868)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1864)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1862)
2013-07-07 13:39:41,162 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:41,691 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1373184544832_0001_01_000001 succeeded
2013-07-07 13:39:41,692 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from RUNNING to EXITED_WITH_SUCCESS
2013-07-07 13:39:41,692 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1373184544832_0001_01_000001
2013-07-07 13:39:41,714 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny OPERATION=Container Finished - Succeeded TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_000001
2013-07-07 13:39:41,714 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from EXITED_WITH_SUCCESS to DONE
2013-07-07 13:39:41,714 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1373184544832_0001_01_000001 from application application_1373184544832_0001
2013-07-07 13:39:42,166 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_COMPLETE, diagnostics: "", exit_status: 0,
2013-07-07 13:39:42,166 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed container container_1373184544832_0001_01_000001
2013-07-07 13:39:42,191 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_000001 (auth:SIMPLE)
2013-07-07 13:39:42,195 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Getting container-status for container_1373184544832_0001_01_000001
2013-07-07 13:39:42,196 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:appattempt_1373184544832_0001_000001 (auth:TOKEN) cause:org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000001 is not handled by this NodeManager
2013-07-07 13:39:42,196 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9993, call org.apache.hadoop.yarn.api.ContainerManagementProtocolPB.stopContainer from 127.0.0.1:51086: error: org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000001 is not handled by this NodeManager
org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000001 is not handled by this NodeManager
at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeGetAndStopContainerRequest(ContainerManagerImpl.java:614)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.stopContainer(ContainerManagerImpl.java:538)
at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.stopContainer(ContainerManagementProtocolPBServiceImpl.java:88)
at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:85)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1033)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1868)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1864)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1862)
2013-07-07 13:39:42,264 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1373184544832_0001_01_000002
2013-07-07 13:39:42,265 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1373184544832_0001_01_000002
2013-07-07 13:39:42,265 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1373184544832_0001_01_000001
2013-07-07 13:39:43,173 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
2013-07-07 13:39:43,174 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1373184544832_0001
2013-07-07 13:39:43,180 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
2013-07-07 13:39:43,180 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler: Scheduling Log Deletion for application: application_1373184544832_0001, with delay of 10800 seconds
Log Snippet for ApplicationMaster
==================================
13/07/07 13:39:36 INFO client.SimpleApplicationMaster: Initializing ApplicationMaster
13/07/07 13:39:37 INFO client.SimpleApplicationMaster: Application master for app, appId=1, clustertimestamp=1373184544832, attemptId=1
13/07/07 13:39:37 INFO client.SimpleApplicationMaster: Starting ApplicationMaster
13/07/07 13:39:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/07/07 13:39:37 INFO impl.NMClientAsyncImpl: Upper bound of the thread pool size is 500
13/07/07 13:39:37 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500
13/07/07 13:39:37 INFO client.SimpleApplicationMaster: Max mem capabililty of resources in this cluster 8192
13/07/07 13:39:37 INFO client.SimpleApplicationMaster: Requested container ask: Capability[&lt;memory:100, vCores:0&gt;]Priority[0]ContainerCount[1]
13/07/07 13:39:39 INFO client.SimpleApplicationMaster: Got response from RM for container ask, allocatedCnt=1
13/07/07 13:39:39 INFO client.SimpleApplicationMaster: Launching shell command on a new container., containerId=container_1373184544832_0001_01_000002, containerNode=sunny-Inspiron:9993, containerNodeURI=sunny-Inspiron:8042, containerResourceMemory1024
13/07/07 13:39:39 INFO client.SimpleApplicationMaster: Setting up container launch container for containerid=container_1373184544832_0001_01_000002
13/07/07 13:39:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1373184544832_0001_01_000002
13/07/07 13:39:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : sunny-Inspiron:9993
13/07/07 13:39:39 INFO client.SimpleApplicationMaster: Succeeded to start Container container_1373184544832_0001_01_000002
13/07/07 13:39:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1373184544832_0001_01_000002
13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Got response from RM for container ask, completedCnt=1
13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Got container status for containerID=container_1373184544832_0001_01_000002, state=COMPLETE, exitStatus=0, diagnostics=
13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Container completed successfully., containerId=container_1373184544832_0001_01_000002
13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Application completed. Stopping running containers
13/07/07 13:39:40 ERROR impl.NMClientImpl: Failed to stop Container container_1373184544832_0001_01_000002when stopping NMClientImpl
13/07/07 13:39:40 INFO impl.ContainerManagementProtocolProxy: Closing proxy : sunny-Inspiron:9993
13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Application completed. Signalling finish to RM
13/07/07 13:39:41 INFO impl.AMRMClientAsyncImpl: Interrupted while waiting for queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:281)
13/07/07 13:39:41 INFO client.SimpleApplicationMaster: Application Master completed successfully. exiting
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-881">YARN-881</a>.
Major bug reported by Jian He and fixed by Jian He <br>
<b>Priority#compareTo method seems to be wrong.</b><br>
<blockquote>If a lower int value means higher priority, shouldn't we "return other.getPriority() - this.getPriority()"?</blockquote></li>
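<li> <b>Illustrative sketch for YARN-881</b><br>
<blockquote>The ordering proposed in the report can be sketched in Java as follows. SketchPriority is a hypothetical stand-in, not the YARN Priority class. The sketch swaps the subtraction from the report for Integer.compare with the same argument order, which yields the same sign while avoiding integer overflow.

```java
// Hypothetical stand-in for a priority value; not the YARN Priority class.
final class SketchPriority {
    private final int value; // lower value means higher priority

    SketchPriority(int value) {
        this.value = value;
    }

    // Positive when this priority is higher (numerically lower) than other.
    int compareTo(SketchPriority other) {
        return Integer.compare(other.value, this.value);
    }
}
```

Under this ordering, a priority of 0 compares as greater than a priority of 5.</blockquote></li>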
<li> <a href="https://issues.apache.org/jira/browse/YARN-771">YARN-771</a>.
Major sub-task reported by Bikas Saha and fixed by Junping Du <br>
<b>AMRMClient support for resource blacklisting</b><br>
<blockquote>After YARN-750, AMRMClient should support blacklisting via the new YARN APIs.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-758">YARN-758</a>.
Minor improvement reported by Bikas Saha and fixed by Karthik Kambatla <br>
<b>Augment MockNM to use multiple cores</b><br>
<blockquote>YARN-757 got fixed by changing the scheduler from Fair to default (which is capacity).</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-707">YARN-707</a>.
Blocker improvement reported by Bikas Saha and fixed by Jason Lowe <br>
<b>Add user info in the YARN ClientToken</b><br>
<blockquote>If user info is present in the client token then it can be used to do limited authz in the AM.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-696">YARN-696</a>.
Major improvement reported by Trevor Lorimer and fixed by Trevor Lorimer (resourcemanager)<br>
<b>Enable multiple states to be specified in Resource Manager apps REST call</b><br>
<blockquote>Within the YARN Resource Manager REST API the GET call which returns all Applications can be filtered by a single State query parameter (http://&lt;rm http address:port&gt;/ws/v1/cluster/apps).
There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed). If no state parameter is specified, all states are returned; however, if a subset of states is required, multiple REST calls are needed (a maximum of 7).
The proposal is to be able to specify multiple states in a single REST call.</blockquote></li>
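<li> <b>Illustrative sketch for YARN-696</b><br>
<blockquote>One way to picture the proposal is a filter that accepts several states in a single parameter value. The Java sketch below assumes a comma-separated value and case-insensitive matching. Both are assumptions made for illustration, and StateFilter is a hypothetical name rather than part of the REST API.

```java
// Hypothetical filter; the comma-separated format is an assumption.
final class StateFilter {
    // Returns true when appState passes the filter in statesParam.
    static boolean matches(String statesParam, String appState) {
        if (statesParam == null || statesParam.trim().isEmpty()) {
            return true; // no filter given: every state is returned
        }
        for (String wanted : statesParam.split(",")) {
            if (wanted.trim().equalsIgnoreCase(appState)) {
                return true;
            }
        }
        return false;
    }
}
```

A single value such as "RUNNING,FINISHED" then covers a subset that would otherwise need one call per state.</blockquote></li>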
<li> <a href="https://issues.apache.org/jira/browse/YARN-643">YARN-643</a>.
Major bug reported by Jian He and fixed by Xuan Gong <br>
<b>WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition</b><br>
<blockquote>This JIRA tracks why appToken and clientToAMToken are removed separately and why the removals are spread across different transitions. Ideally there would be a common place where both tokens can be removed at the same time.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-602">YARN-602</a>.
Major bug reported by Xuan Gong and fixed by Kenji Kikushima <br>
<b>NodeManager should mandatorily set some Environment variables in every container that it launches</b><br>
<blockquote>NodeManager should mandatorily set some Environment variables, such as Environment.user and Environment.pwd, in every container that it launches. If both the user and the NodeManager set those variables, the value set by the NM should be used.</blockquote></li>
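<li> <b>Illustrative sketch for YARN-602</b><br>
<blockquote>The precedence rule in the report (NodeManager values win over user-supplied ones) can be sketched in Java as a copy followed by an overwrite. LaunchEnv is a hypothetical name, java.util.Properties stands in for the container environment, and the keys "USER" and "PWD" are assumed here for illustration.

```java
import java.util.Properties;

// Hypothetical helper; not the NodeManager launch code.
final class LaunchEnv {
    // Returns a copy of userEnv in which NodeManager-owned variables win.
    static Properties build(Properties userEnv, String nmUser, String nmPwd) {
        Properties env = new Properties();
        env.putAll(userEnv);             // start from what the user asked for
        env.setProperty("USER", nmUser); // NodeManager values overwrite any user value
        env.setProperty("PWD", nmPwd);
        return env;
    }
}
```

Applying the NodeManager values last is what gives them precedence; variables the NodeManager does not own pass through unchanged.</blockquote></li>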
<li> <a href="https://issues.apache.org/jira/browse/YARN-589">YARN-589</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Expose a REST API for monitoring the fair scheduler</b><br>
<blockquote>The fair scheduler should have an HTTP interface that exposes information such as applications per queue, fair shares, demands, current allocations.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-573">YARN-573</a>.
Critical sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Shared data structures in Public Localizer and Private Localizer are not Thread safe.</b><br>
<blockquote>PublicLocalizer
1) pending accessed by addResource (part of event handling) and run method (as a part of PublicLocalizer.run() ).
PrivateLocalizer
1) pending accessed by addResource (part of event handling) and findNextResource (i.remove()). The update method should also be fixed, because it shares the pending list as well.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-540">YARN-540</a>.
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>Race condition causing RM to potentially relaunch already unregistered AMs on RM restart</b><br>
<blockquote>When a job succeeds and successfully calls finishApplicationMaster, and the RM is then shut down and restarted, the dispatcher can be stopped before it processes the REMOVE_APP event. The next time the RM comes back, it reloads the existing state files even though the job has succeeded.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-502">YARN-502</a>.
Major sub-task reported by Lohit Vijayarenu and fixed by Mayank Bansal <br>
<b>RM crash with NPE on NODE_REMOVED event with FairScheduler</b><br>
<blockquote>While running some tests and adding/removing nodes, we saw the RM crash with the exception below. We are testing with the fair scheduler and running hadoop-2.0.3-alpha.
{noformat}
2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node YYYY:55680 as it is now LOST
2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: YYYY:55680 Node Transitioned from UNHEALTHY to LOST
2013-03-22 18:54:27,015 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_REMOVED to the scheduler
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
at java.lang.Thread.run(Thread.java:662)
2013-03-22 18:54:27,016 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped SelectChannelConnector@XXXX:50030
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-337">YARN-337</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
<b>RM handles killed application tracking URL poorly</b><br>
<blockquote>When the ResourceManager kills an application, it leaves the proxy URL redirecting to the original tracking URL for the application even though the ApplicationMaster is no longer there to service it. It should redirect somewhere more useful, like the RM's web page for the application, where the user can see that the application was killed and find links to the AM logs.
In addition, sometimes the AM during teardown from the kill can attempt to unregister and provide an updated tracking URL, but unfortunately the RM has "forgotten" the AM due to the kill and refuses to process the unregistration. Instead it logs:
{noformat}
2013-01-09 17:37:49,671 [IPC Server handler 2 on 8030] ERROR
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AppAttemptId doesnt exist in cache appattempt_1357575694478_28614_000001
{noformat}
It should go ahead and process the unregistration to update the tracking URL since the application offered it.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-292">YARN-292</a>.
Major sub-task reported by Devaraj K and fixed by Zhijie Shen (resourcemanager)<br>
<b>ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt</b><br>
<blockquote>{code:xml}
2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_000001
2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525
java.lang.ArrayIndexOutOfBoundsException: 0
at java.util.Arrays$ArrayList.get(Arrays.java:3381)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-107">YARN-107</a>.
Major bug reported by Devaraj K and fixed by Xuan Gong (resourcemanager)<br>
<b>ClientRMService.forceKillApplication() should handle the non-RUNNING applications properly</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5497">MAPREDUCE-5497</a>.
Major bug reported by Jian He and fixed by Jian He <br>
<b>'5s sleep' in MRAppMaster.shutDownJob is only needed before stopping ClientService</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5493">MAPREDUCE-5493</a>.
Blocker bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)<br>
<b>In-memory map outputs can be leaked after shuffle completes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5483">MAPREDUCE-5483</a>.
Major bug reported by Alejandro Abdelnur and fixed by Robert Kanter (distcp)<br>
<b>revert MAPREDUCE-5357</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5478">MAPREDUCE-5478</a>.
Minor improvement reported by Sandy Ryza and fixed by Sandy Ryza (examples)<br>
<b>TeraInputFormat unnecessarily defines its own FileSplit subclass</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5476">MAPREDUCE-5476</a>.
Blocker bug reported by Jian He and fixed by Jian He <br>
<b>Job can fail when RM restarts after staging dir is cleaned but before MR successfully unregister with RM</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5475">MAPREDUCE-5475</a>.
Blocker bug reported by Jason Lowe and fixed by Jason Lowe (mr-am , mrv2)<br>
<b>MRClientService does not verify ACLs properly</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5470">MAPREDUCE-5470</a>.
Major bug reported by Chris Nauroth and fixed by Sandy Ryza <br>
<b>LocalJobRunner does not work on Windows.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5468">MAPREDUCE-5468</a>.
Blocker bug reported by Yesha Vora and fixed by Vinod Kumar Vavilapalli <br>
<b>AM recovery does not work for map only jobs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5466">MAPREDUCE-5466</a>.
Blocker bug reported by Yesha Vora and fixed by Jian He <br>
<b>Historyserver does not refresh the result of restarted jobs after RM restart</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5462">MAPREDUCE-5462</a>.
Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (performance , task)<br>
<b>In map-side sort, swap entire meta entries instead of indexes for better cache performance </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5454">MAPREDUCE-5454</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (test)<br>
<b>TestDFSIO fails intermittently on JDK7</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5446">MAPREDUCE-5446</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2 , test)<br>
<b>TestJobHistoryEvents and TestJobHistoryParsing have race conditions</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5441">MAPREDUCE-5441</a>.
Major bug reported by Rohith Sharma K S and fixed by Jian He (applicationmaster , client)<br>
<b>JobClient exits whenever RM issues Reboot command to 1st attempt App Master.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5440">MAPREDUCE-5440</a>.
Major bug reported by Robert Parker and fixed by Robert Parker (mrv2)<br>
<b>TestCopyCommitter Fails on JDK7</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5428">MAPREDUCE-5428</a>.
Major bug reported by Jason Lowe and fixed by Karthik Kambatla (jobhistoryserver , mrv2)<br>
<b>HistoryFileManager doesn't stop threads when service is stopped</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5425">MAPREDUCE-5425</a>.
Major bug reported by Ashwin Shankar and fixed by Robert Parker (jobhistoryserver)<br>
<b>Junit in TestJobHistoryServer failing in jdk 7</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5414">MAPREDUCE-5414</a>.
Major bug reported by Nemon Lou and fixed by Nemon Lou (test)<br>
<b>TestTaskAttempt fails jdk7 with NullPointerException</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5385">MAPREDUCE-5385</a>.
Blocker bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>JobContext cache files api are broken</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5379">MAPREDUCE-5379</a>.
Major improvement reported by Sandy Ryza and fixed by Karthik Kambatla (job submission , security)<br>
<b>Include token tracking ids in jobconf</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5367">MAPREDUCE-5367</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Local jobs all use same local working directory</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5358">MAPREDUCE-5358</a>.
Major bug reported by Devaraj K and fixed by Devaraj K (mr-am)<br>
<b>MRAppMaster throws invalid transitions for JobImpl</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5317">MAPREDUCE-5317</a>.
Major bug reported by Ravi Prakash and fixed by Ravi Prakash (mrv2)<br>
<b>Stale files left behind for failed jobs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5251">MAPREDUCE-5251</a>.
Major bug reported by Jason Lowe and fixed by Ashwin Shankar (mrv2)<br>
<b>Reducer should not implicate map attempt if it has insufficient space to fetch map output</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5164">MAPREDUCE-5164</a>.
Major bug reported by Nemon Lou and fixed by Nemon Lou <br>
<b>command "mapred job" and "mapred queue" omit HADOOP_CLIENT_OPTS </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5020">MAPREDUCE-5020</a>.
Major bug reported by Trevor Robinson and fixed by Trevor Robinson (client)<br>
<b>Compile failure with JDK8</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5001">MAPREDUCE-5001</a>.
Major bug reported by Brock Noland and fixed by Sandy Ryza <br>
<b>LocalJobRunner has race condition resulting in job failures </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3193">MAPREDUCE-3193</a>.
Major bug reported by Ramgopal N and fixed by Devaraj K (mrv1 , mrv2)<br>
<b>FileInputFormat doesn't read files recursively in the input path dir</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-1981">MAPREDUCE-1981</a>.
Major improvement reported by Hairong Kuang and fixed by Hairong Kuang (job submission)<br>
<b>Improve getSplits performance by using listLocatedStatus</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5199">HDFS-5199</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Add more debug trace for NFS READ and WRITE</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5192">HDFS-5192</a>.
Minor bug reported by Jing Zhao and fixed by Jing Zhao <br>
<b>NameNode may fail to start when dfs.client.test.drop.namenode.response.number is set</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5159">HDFS-5159</a>.
Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (namenode)<br>
<b>Secondary NameNode fails to checkpoint if error occurs downloading edits on first checkpoint</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5150">HDFS-5150</a>.
Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee <br>
<b>Allow per NN SPN for internal SPNEGO.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5140">HDFS-5140</a>.
Blocker bug reported by Arpit Gupta and fixed by Jing Zhao (ha)<br>
<b>Too many safemode monitor threads being created in the standby namenode causing it to fail with out of memory error</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5136">HDFS-5136</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>MNT EXPORT should give the full group list which can mount the exports</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5132">HDFS-5132</a>.
Blocker bug reported by Arpit Gupta and fixed by Kihwal Lee (namenode)<br>
<b>Deadlock in NameNode between SafeModeMonitor#run and DatanodeManager#handleHeartbeat</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5128">HDFS-5128</a>.
Critical improvement reported by Kihwal Lee and fixed by Kihwal Lee <br>
<b>Allow multiple net interfaces to be used with HA namenode RPC server</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5124">HDFS-5124</a>.
Blocker bug reported by Deepesh Khandelwal and fixed by Daryn Sharp (namenode)<br>
<b>DelegationTokenSecretManager#retrievePassword can cause deadlock in NameNode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5118">HDFS-5118</a>.
Major new feature reported by Jing Zhao and fixed by Jing Zhao <br>
<b>Provide testing support for DFSClient to drop RPC responses</b><br>
<blockquote>Used for testing when NameNode HA is enabled. A new configuration property, "dfs.client.test.drop.namenode.response.number", specifies the number of responses that DFSClient will drop in each RPC call. This feature helps test functionality such as the NameNode retry cache.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5111">HDFS-5111</a>.
Minor bug reported by Jing Zhao and fixed by Jing Zhao (snapshots)<br>
<b>Remove duplicated error message for snapshot commands when processing invalid arguments</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5110">HDFS-5110</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Change FSDataOutputStream to HdfsDataOutputStream for opened streams to fix type cast error</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5107">HDFS-5107</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Fix array copy error in Readdir and Readdirplus responses</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5106">HDFS-5106</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu (test)<br>
<b>TestDatanodeBlockScanner fails on Windows due to incorrect path format</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5105">HDFS-5105</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>TestFsck fails on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5104">HDFS-5104</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Support dotdot name in NFS LOOKUP operation</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5103">HDFS-5103</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu (test)<br>
<b>TestDirectoryScanner fails on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5102">HDFS-5102</a>.
Major bug reported by Aaron T. Myers and fixed by Jing Zhao (snapshots)<br>
<b>Snapshot names should not be allowed to contain slash characters</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5100">HDFS-5100</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu (test)<br>
<b>TestNamenodeRetryCache fails on Windows due to incorrect cleanup</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5099">HDFS-5099</a>.
Major bug reported by Chuan Liu and fixed by Chuan Liu (namenode)<br>
<b>Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5091">HDFS-5091</a>.
Minor bug reported by Jing Zhao and fixed by Jing Zhao <br>
<b>Support for spnego keytab separate from the JournalNode keytab for secure HA</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5085">HDFS-5085</a>.
Major sub-task reported by Brandon Li and fixed by Jing Zhao (nfs)<br>
<b>Refactor o.a.h.nfs to support different types of authentications</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5080">HDFS-5080</a>.
Major bug reported by Jing Zhao and fixed by Jing Zhao (ha , qjm)<br>
<b>BootstrapStandby not working with QJM when the existing NN is active</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5078">HDFS-5078</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Support file append in NFSv3 gateway to enable data streaming to HDFS</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5076">HDFS-5076</a>.
Minor new feature reported by Jing Zhao and fixed by Jing Zhao <br>
<b>Add MXBean methods to query NN's transaction information and JournalNode's journal status</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5071">HDFS-5071</a>.
Major sub-task reported by Kihwal Lee and fixed by Brandon Li (nfs)<br>
<b>Change hdfs-nfs parent project to hadoop-project</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5069">HDFS-5069</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Include hadoop-nfs and hadoop-hdfs-nfs into hadoop dist for NFS deployment</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5067">HDFS-5067</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Support symlink operations</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5061">HDFS-5061</a>.
Major improvement reported by Arpit Agarwal and fixed by Arpit Agarwal (namenode)<br>
<b>Make FSNameSystem#auditLoggers an unmodifiable list</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5055">HDFS-5055</a>.
Blocker bug reported by Allen Wittenauer and fixed by Vinay (namenode)<br>
<b>nn fails to download checkpointed image from snn in some setups</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5047">HDFS-5047</a>.
Major bug reported by Kihwal Lee and fixed by Robert Parker (namenode)<br>
<b>Suppress logging of full stack trace of quota and lease exceptions</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5045">HDFS-5045</a>.
Minor improvement reported by Jing Zhao and fixed by Jing Zhao <br>
<b>Add more unit tests for retry cache to cover all AtMostOnce methods</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5043">HDFS-5043</a>.
Major bug reported by Brandon Li and fixed by Brandon Li <br>
<b>For HdfsFileStatus, set default value of childrenNum to -1 instead of 0 to avoid confusing applications</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5028">HDFS-5028</a>.
Major bug reported by zhaoyunjiong and fixed by zhaoyunjiong <br>
<b>LeaseRenewer throws java.util.ConcurrentModificationException on timeout</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4993">HDFS-4993</a>.
Major bug reported by Kihwal Lee and fixed by Robert Parker <br>
<b>fsck can fail if a file is renamed or deleted</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4962">HDFS-4962</a>.
Minor sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (nfs)<br>
<b>Use enum for nfs constants</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4947">HDFS-4947</a>.
Major sub-task reported by Brandon Li and fixed by Jing Zhao (nfs)<br>
<b>Add NFS server export table to control export by hostname or IP range</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4926">HDFS-4926</a>.
Trivial improvement reported by Joseph Lorenzini and fixed by Vivek Ganesan (namenode)<br>
<b>namenode webserver's page has a tooltip that is inconsistent with the datanode HTML link</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4905">HDFS-4905</a>.
Minor improvement reported by Arpit Agarwal and fixed by Arpit Agarwal (tools)<br>
<b>Add appendToFile command to "hdfs dfs"</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4898">HDFS-4898</a>.
Minor bug reported by Eric Sirianni and fixed by Tsz Wo (Nicholas), SZE (namenode)<br>
<b>BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4763">HDFS-4763</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Add script changes/utility for starting NFS gateway</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4680">HDFS-4680</a>.
Major bug reported by Andrew Wang and fixed by Andrew Wang (namenode , security)<br>
<b>Audit logging of delegation tokens for MR tracing</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4632">HDFS-4632</a>.
Major bug reported by Chris Nauroth and fixed by Chuan Liu (test)<br>
<b>globStatus using backslash for escaping does not work on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4594">HDFS-4594</a>.
Minor bug reported by Arpit Gupta and fixed by Chris Nauroth (webhdfs)<br>
<b>WebHDFS open sets Content-Length header to what is specified by length parameter rather than how much data is actually returned. </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4329">HDFS-4329</a>.
Major bug reported by Andy Isaacson and fixed by Cristina L. Abad (hdfs-client)<br>
<b>DFSShell issues with directories with spaces in name</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3245">HDFS-3245</a>.
Major improvement reported by Todd Lipcon and fixed by Ravi Prakash (namenode)<br>
<b>Add metrics and web UI for cluster version summary</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-2933">HDFS-2933</a>.
Major improvement reported by Philip Zeyliger and fixed by Vivek Ganesan (datanode)<br>
<b>Improve DataNode Web UI Index Page</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9962">HADOOP-9962</a>.
Major improvement reported by Roman Shaposhnik and fixed by Roman Shaposhnik (build)<br>
<b>In order to avoid dependency divergence within Hadoop itself, let's enable DependencyConvergence</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9961">HADOOP-9961</a>.
Minor bug reported by Roman Shaposhnik and fixed by Roman Shaposhnik (build)<br>
<b>versions of a few transitive dependencies diverged between hadoop subprojects</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9960">HADOOP-9960</a>.
Blocker bug reported by Brock Noland and fixed by Karthik Kambatla <br>
<b>Upgrade Jersey version to 1.9</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9958">HADOOP-9958</a>.
Major bug reported by Andrew Wang and fixed by Andrew Wang <br>
<b>Add old constructor back to DelegationTokenInformation to unbreak downstream builds</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9945">HADOOP-9945</a>.
Minor improvement reported by Karthik Kambatla and fixed by Karthik Kambatla (ha)<br>
<b>HAServiceState should have a state for stopped services</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9944">HADOOP-9944</a>.
Blocker bug reported by Arun C Murthy and fixed by Arun C Murthy <br>
<b>RpcRequestHeaderProto defines callId as uint32 while ipc.Client.CONNECTION_CONTEXT_CALL_ID is signed (-3)</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9932">HADOOP-9932</a>.
Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee <br>
<b>Improper synchronization in RetryCache</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9924">HADOOP-9924</a>.
Major bug reported by shanyu zhao and fixed by shanyu zhao (fs)<br>
<b>FileUtil.createJarWithClassPath() does not generate relative classpath correctly</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9918">HADOOP-9918</a>.
Minor improvement reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>Add addIfService() to CompositeService</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9916">HADOOP-9916</a>.
Minor bug reported by Binglin Chang and fixed by Binglin Chang <br>
<b>Race condition in ipc.Client causes TestIPC timeout</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9910">HADOOP-9910</a>.
Minor bug reported by Andr&#233; Kelpe and fixed by <br>
<b>proxy server start and stop documentation wrong</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9906">HADOOP-9906</a>.
Minor bug reported by Karthik Kambatla and fixed by Karthik Kambatla (ha)<br>
<b>Move HAZKUtil to o.a.h.util.ZKUtil and make inner-classes public</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9899">HADOOP-9899</a>.
Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (security)<br>
<b>Remove the debug message added by HADOOP-8855</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9886">HADOOP-9886</a>.
Minor improvement reported by Arpit Gupta and fixed by Arpit Gupta <br>
<b>Turn warning message in RetryInvocationHandler to debug</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9880">HADOOP-9880</a>.
Blocker bug reported by Kihwal Lee and fixed by Daryn Sharp <br>
<b>SASL changes from HADOOP-9421 breaks Secure HA NN </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9879">HADOOP-9879</a>.
Minor improvement reported by Karthik Kambatla and fixed by Karthik Kambatla (build)<br>
<b>Move the version info of zookeeper dependencies to hadoop-project/pom</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9868">HADOOP-9868</a>.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (ipc)<br>
<b>Server must not advertise kerberos realm</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9858">HADOOP-9858</a>.
Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (fs)<br>
<b>Remove unused private RawLocalFileSystem#execCommand method from branch-2.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9857">HADOOP-9857</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (build , test)<br>
<b>Tests block and sometimes timeout on Windows due to invalid entropy source.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9833">HADOOP-9833</a>.
Minor improvement reported by Steve Loughran and fixed by Kousuke Saruta (build)<br>
<b>move slf4j to version 1.7.5</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9831">HADOOP-9831</a>.
Minor improvement reported by Chris Nauroth and fixed by Chris Nauroth (bin)<br>
<b>Make checknative shell command accessible on Windows.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9821">HADOOP-9821</a>.
Minor improvement reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA <br>
<b>ClientId should have getMsb/getLsb methods</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9820">HADOOP-9820</a>.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (ipc , security)<br>
<b>RPCv9 wire protocol is insufficient to support multiplexing</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9806">HADOOP-9806</a>.
Major bug reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>PortmapInterface should check if the procedure is out-of-range</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9803">HADOOP-9803</a>.
Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (ipc)<br>
<b>Add generic type parameter to RetryInvocationHandler</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9802">HADOOP-9802</a>.
Major improvement reported by Chris Nauroth and fixed by Chris Nauroth (io)<br>
<b>Support Snappy codec on Windows.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9801">HADOOP-9801</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (conf)<br>
<b>Configuration#writeXml uses platform default encoding, which may mishandle multi-byte characters.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9789">HADOOP-9789</a>.
Critical new feature reported by Daryn Sharp and fixed by Daryn Sharp (ipc , security)<br>
<b>Support server advertised kerberos principals</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9774">HADOOP-9774</a>.
Major bug reported by shanyu zhao and fixed by shanyu zhao (fs)<br>
<b>RawLocalFileSystem.listStatus() return absolute paths when input path is relative on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9768">HADOOP-9768</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (fs)<br>
<b>chown and chgrp reject users and groups with spaces on platforms where spaces are otherwise acceptable</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9757">HADOOP-9757</a>.
Major bug reported by Jason Lowe and fixed by Cristina L. Abad (fs)<br>
<b>Har metadata cache can grow without limit</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9686">HADOOP-9686</a>.
Major improvement reported by Jason Lowe and fixed by Jason Lowe (conf)<br>
<b>Easy access to final parameters in Configuration</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9672">HADOOP-9672</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Upgrade Avro dependency to 1.7.4</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9557">HADOOP-9557</a>.
Major bug reported by Lohit Vijayarenu and fixed by Lohit Vijayarenu (build)<br>
<b>hadoop-client excludes commons-httpclient</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9446">HADOOP-9446</a>.
Major improvement reported by Yu Gao and fixed by Yu Gao (security)<br>
<b>Support Kerberos HTTP SPNEGO authentication for non-SUN JDK</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9435">HADOOP-9435</a>.
Major bug reported by Tian Hong Wang and fixed by Tian Hong Wang (build)<br>
<b>Support building the JNI code against the IBM JVM</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9381">HADOOP-9381</a>.
Trivial bug reported by Keegan Witt and fixed by Keegan Witt <br>
<b>Document dfs cp -f option</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9315">HADOOP-9315</a>.
Major bug reported by Dennis Y and fixed by Chris Nauroth (build)<br>
<b>Port HADOOP-9249 hadoop-maven-plugins Clover fix to branch-2 to fix build failures</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8814">HADOOP-8814</a>.
Minor improvement reported by Brandon Li and fixed by Brandon Li (conf , fs , fs/s3 , ha , io , metrics , performance , record , security , util)<br>
<b>Inefficient comparison with the empty string. Use isEmpty() instead</b><br>
<blockquote></blockquote></li>
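<li> <blockquote>A minimal sketch of the idiom named in the HADOOP-8814 title above. The class and method names are hypothetical and are not taken from the Hadoop source; only String#equals() and String#isEmpty() are real JDK calls.

```java
class EmptyCheckSketch {
    // Older idiom: compares against the empty string literal.
    static boolean emptyViaEquals(String s) {
        return s != null && s.equals("");
    }

    // Idiom suggested by the issue title: isEmpty() only checks that the length is zero.
    static boolean emptyViaIsEmpty(String s) {
        return s != null && s.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(emptyViaEquals("") + " " + emptyViaIsEmpty(""));
        System.out.println(emptyViaEquals("fs") + " " + emptyViaIsEmpty("fs"));
    }
}
```

Both methods return the same result for every input; the second states the intent more directly.</blockquote></li>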
</ul>
</body></html>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hadoop 2.1.0-beta Release Notes</title>
<STYLE type="text/css">
H1 {font-family: sans-serif}
H2 {font-family: sans-serif; margin-left: 7mm}
TABLE {margin-left: 7mm}
</STYLE>
</head>
<body>
<h1>Hadoop 2.1.0-beta Release Notes</h1>
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
<a name="changes"/>
<h2>Changes since Hadoop 2.0.5-alpha</h2>
<ul>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1056">YARN-1056</a>.
Trivial bug reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>Fix configs yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}</b><br>
<blockquote>Fix configs yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} so that *resourcemanager* appears only once, make them consistent with other such YARN configs, and add entries to yarn-default.xml.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1046">YARN-1046</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>Disable mem monitoring by default in MiniYARNCluster</b><br>
<blockquote>I have been running into this frequently in spite of MAPREDUCE-3709 on CentOS 6 machines. However, when I try to run it independently on the machines, I have not been able to reproduce it.
{noformat}
2013-08-07 19:17:35,048 WARN [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(444)) - Container [pid=16556,containerID=container_1375928243488_0001_01_000001] is running beyond virtual memory limits. Current usage: 132.4 MB of 512 MB physical memory used; 1.2 GB of 1.0 GB virtual memory used. Killing container.
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-1045">YARN-1045</a>.
Major improvement reported by Siddharth Seth and fixed by Jian He <br>
<b>Improve toString implementation for PBImpls</b><br>
<blockquote>The generic toString implementation that is used in most of the PBImpls {code}getProto().toString().replaceAll("\\n", ", ").replaceAll("\\s+", " ");{code} is rather inefficient - replacing "\n" and "\s" to generate a one line string. Instead, we can use {code}TextFormat.shortDebugString(getProto());{code}.
If we can get this into 2.1.0 - great, otherwise the next release.</blockquote></li>
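<li> <blockquote>To make the cost described in YARN-1045 concrete, the sketch below applies the same two replaceAll passes to a plain multi-line string. The sample text is an assumed stand-in for the output of getProto().toString(), and the class and method names are hypothetical. TextFormat.shortDebugString is not called here because it needs the protobuf library on the classpath.

```java
class OneLineToStringSketch {
    // Same two regex passes as the generic PBImpl toString(): the first turns
    // each newline into ", ", the second collapses whitespace runs to one space.
    static String flatten(String multiLine) {
        return multiLine.replaceAll("\\n", ", ").replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a protobuf message's multi-line text form.
        String protoText = "id: 1\nname: \"app\"\n";
        System.out.println(flatten(protoText));
    }
}
```

Each replaceAll call compiles a regular expression and scans the whole string, which is the overhead the issue wants to avoid.</blockquote></li>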
<li> <a href="https://issues.apache.org/jira/browse/YARN-1043">YARN-1043</a>.
Major bug reported by Yusaku Sako and fixed by Jian He <br>
<b>YARN Queue metrics are getting pushed to neither file nor Ganglia</b><br>
<blockquote>YARN Queue metrics are not getting pushed to file or Ganglia via Hadoop Metrics 2.
QueueMetrics are still accessible via JMX and RM REST API (&lt;hostname&gt;:8088/ws/v1/cluster/scheduler).</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-968">YARN-968</a>.
Blocker bug reported by Kihwal Lee and fixed by Vinod Kumar Vavilapalli <br>
<b>RM admin commands don't work</b><br>
<blockquote>If an RM admin command is issued using the CLI, I get something like the following:
13/07/24 17:19:40 INFO client.RMProxy: Connecting to ResourceManager at xxxx.com/1.2.3.4:1234
refreshQueues: Unknown protocol: org.apache.hadoop.yarn.api.ResourceManagerAdministrationProtocolPB
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-961">YARN-961</a>.
Blocker bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>ContainerManagerImpl should enforce token on server. Today it is [TOKEN, SIMPLE]</b><br>
<blockquote>We should only accept SecurityAuthMethod.TOKEN for ContainerManagementProtocol. Today it also accepts SIMPLE in unsecured environments.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-960">YARN-960</a>.
Blocker bug reported by Alejandro Abdelnur and fixed by Daryn Sharp <br>
<b>TestMRCredentials and TestBinaryTokenFile are failing on trunk</b><br>
<blockquote>Not sure, but this may be a fallout from YARN-701 and/or related to YARN-945.
Making it a blocker until full impact of the issue is scoped.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-945">YARN-945</a>.
Blocker bug reported by Bikas Saha and fixed by Vinod Kumar Vavilapalli <br>
<b>AM register failing after AMRMToken</b><br>
<blockquote>2013-07-19 15:53:55,569 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54313: readAndProcess from client 127.0.0.1 threw exception [org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN]]
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN]
at org.apache.hadoop.ipc.Server$Connection.initializeAuthContext(Server.java:1531)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1482)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:788)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:587)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:562)
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-937">YARN-937</a>.
Blocker bug reported by Arun C Murthy and fixed by Alejandro Abdelnur <br>
<b>Fix unmanaged AM in non-secure/secure setup post YARN-701</b><br>
<blockquote>Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens will be used in both scenarios.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-932">YARN-932</a>.
Major bug reported by Sandy Ryza and fixed by Karthik Kambatla <br>
<b>TestResourceLocalizationService.testLocalizationInit can fail on JDK7</b><br>
<blockquote>It looks like this is occurring when testLocalizationInit doesn't run first. Somehow yarn.nodemanager.log-dirs is getting set by one of the other tests (to ${yarn.log.dir}/userlogs), but yarn.log.dir isn't being set.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-927">YARN-927</a>.
Major task reported by Bikas Saha and fixed by Bikas Saha <br>
<b>Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest</b><br>
<blockquote>The downside is having to use more than 1 container request when requesting more than 1 container at * priority. For most other use cases that have specific locations, we need to make multiple container requests anyway. This will also remove unnecessary duplication caused by StoredContainerRequest. It will make getMatchingRequest() always available and removeContainerRequest() easy to use.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-926">YARN-926</a>.
Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Jian He <br>
<b>ContainerManagerProtocol APIs should take in requests for multiple containers</b><br>
<blockquote>AMs typically have to launch multiple containers on a node and the current single container APIs aren't helping. We should have all the APIs take in multiple requests and return multiple responses.
The client libraries could expose both the single and multi-container requests.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-922">YARN-922</a>.
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>Change FileSystemRMStateStore to use directories</b><br>
<blockquote>Store each app and its attempts in the same directory so that removing application state is only one operation</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-919">YARN-919</a>.
Minor bug reported by Mayank Bansal and fixed by Mayank Bansal <br>
<b>Document setting default heap sizes in yarn env</b><br>
<blockquote>Right now there are no defaults in the YARN env scripts for the resource manager and node manager. If a user wants to override the heap sizes, they have to go to the documentation, find the variables, and change the script.
There is no straightforward way to change it in the script. This change just updates the variables with defaults.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-918">YARN-918</a>.
Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701</b><br>
<blockquote>Once we use AMRMToken irrespective of Kerberos after YARN-701, we don't need ApplicationAttemptId in the RPC payload. This is an API change, so doing it as a blocker for 2.1.0-beta.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-912">YARN-912</a>.
Major bug reported by Bikas Saha and fixed by Mayank Bansal <br>
<b>Create exceptions package in common/api for yarn and move client facing exceptions to them</b><br>
<blockquote>Exceptions like InvalidResourceBlacklistRequestException, InvalidResourceRequestException, InvalidApplicationMasterRequestException, etc. are currently inside ResourceManager and not visible to clients.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-909">YARN-909</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu (nodemanager)<br>
<b>Disable TestLinuxContainerExecutorWithMocks on Windows</b><br>
<blockquote>This unit test tests a Linux specific feature. We should skip this unit test on Windows. A similar unit test 'TestLinuxContainerExecutor' was already skipped when running on Windows.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-897">YARN-897</a>.
Blocker bug reported by Djellel Eddine Difallah and fixed by Djellel Eddine Difallah (capacityscheduler)<br>
<b>CapacityScheduler wrongly sorted queues</b><br>
<blockquote>The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources.
</blockquote></li>
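<li> <blockquote>The TreeSet hazard described in YARN-897 can be reproduced outside the scheduler. The sketch below is an illustration under stated assumptions, not the CapacityScheduler code: ChildQueue, the comparator, and the method names are all hypothetical. It shows that changing a sort key while the element is still in the set leaves the element at its old position, and that removing, updating, and re-inserting keeps the order correct.

```java
import java.util.Comparator;
import java.util.TreeSet;

class QueueOrderSketch {
    // Hypothetical stand-in for a child queue; not the CapacityScheduler class.
    static final class ChildQueue {
        final String name;
        double usedCapacity;

        ChildQueue(String name, double usedCapacity) {
            this.name = name;
            this.usedCapacity = usedCapacity;
        }
    }

    // Sort by used capacity, breaking ties by name so distinct queues never compare equal.
    static final Comparator<ChildQueue> BY_USED_CAPACITY = (a, b) -> {
        int byUsage = Double.compare(a.usedCapacity, b.usedCapacity);
        return byUsage != 0 ? byUsage : a.name.compareTo(b.name);
    };

    static TreeSet<ChildQueue> newQueues(ChildQueue a) {
        TreeSet<ChildQueue> queues = new TreeSet<>(BY_USED_CAPACITY);
        queues.add(a);
        queues.add(new ChildQueue("b", 0.5));
        queues.add(new ChildQueue("c", 0.9));
        return queues;
    }

    // Changes the sort key while the element is still in the set, so the
    // tree keeps the element at its old (now wrong) position.
    static String leastUsedAfterInPlaceUpdate() {
        ChildQueue a = new ChildQueue("a", 0.1);
        TreeSet<ChildQueue> queues = newQueues(a);
        a.usedCapacity = 1.0;
        return queues.first().name;
    }

    // Removes the element, changes the key, then re-inserts it.
    static String leastUsedAfterReinsert() {
        ChildQueue a = new ChildQueue("a", 0.1);
        TreeSet<ChildQueue> queues = newQueues(a);
        queues.remove(a);
        a.usedCapacity = 1.0;
        queues.add(a);
        return queues.first().name;
    }

    public static void main(String[] args) {
        System.out.println("in-place update: " + leastUsedAfterInPlaceUpdate());
        System.out.println("remove, update, re-insert: " + leastUsedAfterReinsert());
    }
}
```

After the in-place update the set still reports queue a as least used even though its usage is now the highest, which is the kind of stale ordering that can starve other queues.</blockquote></li>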
<li> <a href="https://issues.apache.org/jira/browse/YARN-894">YARN-894</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu (nodemanager)<br>
<b>NodeHealthScriptRunner timeout checking is inaccurate on Windows</b><br>
<blockquote>In the {{NodeHealthScriptRunner}} method, we set the HealthChecker status based on the Shell execution results. Some statuses are based on the exception thrown during the Shell script execution.
Currently, we catch a non-ExitCodeException from ShellCommandExecutor, and if Shell has the timeout status set at the same time, we also set the HealthChecker status to timeout.
We have the following execution sequence in Shell:
1) In the main thread, schedule a delayed timer task that will kill the original process upon timeout.
2) In the main thread, open a buffered reader and feed in the process's standard input stream.
3) When the timeout happens, the timer task calls {{Process#destroy()}} to kill the main process.
On Linux, when the timeout happens and the process is killed, the buffered reader throws an IOException with the message "Stream closed" in the main thread.
On Windows, we don't get the IOException. The reader only returns "-1", which indicates that the buffer is finished. As a result, the timeout status is not set on Windows, and {{TestNodeHealthService}} fails on Windows because of this.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-883">YARN-883</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Expose Fair Scheduler-specific queue metrics</b><br>
<blockquote>When the Fair Scheduler is enabled, QueueMetrics should include fair share, minimum share, and maximum share.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-877">YARN-877</a>.
Major sub-task reported by Junping Du and fixed by Junping Du (scheduler)<br>
<b>Allow for black-listing resources in FifoScheduler</b><br>
<blockquote>YARN-750 already addressed black-list support in the YARN API and the CapacityScheduler; this JIRA adds the implementation for FifoScheduler.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-875">YARN-875</a>.
Major bug reported by Bikas Saha and fixed by Xuan Gong <br>
<b>Application can hang if AMRMClientAsync callback thread has exception</b><br>
<blockquote>Currently that thread dies and never calls back, so the app can hang. A possible solution is to catch Throwable in the callback and then call client.onError().</blockquote></li>
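<li> <blockquote>The approach proposed in YARN-875 (catch Throwable around the callback and report it through an error hook) can be sketched as follows. This is a minimal illustration, not the AMRMClientAsync implementation: the Handler interface, the dispatch helper, and the event name are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

class CallbackGuardSketch {
    // Hypothetical handler interface; it only mirrors the idea of an onError hook.
    interface Handler {
        void onEvent(String event);
        void onError(Throwable error);
    }

    // Runs one callback and routes anything it throws to onError instead of
    // letting the exception kill the calling thread.
    static void dispatch(Handler handler, String event) {
        try {
            handler.onEvent(event);
        } catch (Throwable t) {
            handler.onError(t);
        }
    }

    // Dispatches one failing callback on a separate thread and returns what onError saw.
    static Throwable errorSeenByHandler() {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Handler failing = new Handler() {
            @Override
            public void onEvent(String event) {
                throw new IllegalStateException("callback failed on " + event);
            }

            @Override
            public void onError(Throwable error) {
                seen.set(error);
            }
        };
        Thread callbackThread = new Thread(() -> dispatch(failing, "containers-allocated"));
        callbackThread.start();
        try {
            callbackThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(errorSeenByHandler());
    }
}
```

With the guard in place the failure reaches the handler, so the application can decide to shut down rather than wait for callbacks that will never arrive.</blockquote></li>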
<li> <a href="https://issues.apache.org/jira/browse/YARN-874">YARN-874</a>.
Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>Tracking YARN/MR test failures after HADOOP-9421 and YARN-827</b><br>
<blockquote>HADOOP-9421 and YARN-827 broke some YARN/MR tests. Tracking those..</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-873">YARN-873</a>.
Major sub-task reported by Bikas Saha and fixed by Xuan Gong <br>
<b>YARNClient.getApplicationReport(unknownAppId) returns a null report</b><br>
<blockquote>How can the client find out that app does not exist?</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-869">YARN-869</a>.
Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>ResourceManagerAdministrationProtocol should neither be public(yet) nor in yarn.api</b><br>
<blockquote>This is an admin-only API, and we don't know yet whether people can or should write new tools against it. I am going to move it to yarn.server.api and make it @Private.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-866">YARN-866</a>.
Major test reported by Wei Yan and fixed by Wei Yan <br>
<b>Add test for class ResourceWeights</b><br>
<blockquote>Add test case for the class ResourceWeights</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-865">YARN-865</a>.
Major improvement reported by Xuan Gong and fixed by Xuan Gong <br>
<b>RM webservices can't query based on application Types</b><br>
<blockquote>The resource manager web service api to get the list of apps doesn't have a query parameter for appTypes.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-861">YARN-861</a>.
Critical bug reported by Devaraj K and fixed by Vinod Kumar Vavilapalli (nodemanager)<br>
<b>TestContainerManager is failing</b><br>
<blockquote>https://builds.apache.org/job/Hadoop-Yarn-trunk/246/
{code:xml}
Running org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager
Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 19.249 sec &lt;&lt;&lt; FAILURE!
testContainerManagerInitialization(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager) Time elapsed: 286 sec &lt;&lt;&lt; FAILURE!
junit.framework.ComparisonFailure: expected:&lt;[asf009.sp2.ygridcore.ne]t&gt; but was:&lt;[localhos]t&gt;
at junit.framework.Assert.assertEquals(Assert.java:85)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-854">YARN-854</a>.
Blocker bug reported by Ramya Sunil and fixed by Omkar Vinit Joshi <br>
<b>App submission fails on secure deploy</b><br>
<blockquote>App submission on secure cluster fails with the following exception:
{noformat}
INFO mapreduce.Job: Job jobID failed with state FAILED due to: Application applicationID failed 2 times due to AM Container for appattemptID exited with exitCode: -1000 due to: App initialization failed (255) with output: main : command provided 0
main : user is qa_user
javax.security.sasl.SaslException: DIGEST-MD5: digest response format violation. Mismatched response. [Caused by org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:65)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:348)
Caused by: org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.
at org.apache.hadoop.ipc.Client.call(Client.java:1298)
at org.apache.hadoop.ipc.Client.call(Client.java:1250)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:204)
at $Proxy7.heartbeat(Unknown Source)
at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
... 3 more
.Failing this attempt.. Failing the application.
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-853">YARN-853</a>.
Major bug reported by Devaraj K and fixed by Devaraj K (capacityscheduler)<br>
<b>maximum-am-resource-percent doesn't work after refreshQueues command</b><br>
<blockquote>If we update the yarn.scheduler.capacity.maximum-am-resource-percent / yarn.scheduler.capacity.&lt;queue-path&gt;.maximum-am-resource-percent configuration and then run refreshQueues, the new config value is used to calculate Max Active Applications and Max Active Application Per User. If we add a new node after issuing the 'rmadmin -refreshQueues' command, the old maximum-am-resource-percent config value is used to calculate Max Active Applications and Max Active Application Per User.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-852">YARN-852</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>TestAggregatedLogFormat.testContainerLogsFileAccess fails on Windows</b><br>
<blockquote>The YARN unit test case fails on Windows when comparing the expected message with the log message in the file. The expected message constructed in the test case has two problems: 1) It uses Path.separator to concatenate the path string. Path.separator is always a forward slash, which does not match the backslash used in the log message. 2) On Windows, the default file owner is the Administrators group if the file is created by an Administrators user. The test expects the owner to be the current user.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-851">YARN-851</a>.
Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Share NMTokens using NMTokenCache (api-based) instead of memory based approach which is used currently.</b><br>
<blockquote>It is a follow up ticket for YARN-694. Changing the way NMTokens are shared.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-850">YARN-850</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Rename getClusterAvailableResources to getAvailableResources in AMRMClients</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-848">YARN-848</a>.
Major bug reported by Hitesh Shah and fixed by Hitesh Shah <br>
<b>Nodemanager does not register with RM using the fully qualified hostname</b><br>
<blockquote>If the hostname is misconfigured to not be fully qualified ( i.e. hostname returns foo and hostname -f returns foo.bar.xyz ), the NM ends up registering with the RM using only "foo". This can create problems if DNS cannot resolve the hostname properly.
Furthermore, HDFS uses fully qualified hostnames which can end up affecting locality matches when allocating containers based on block locations. </blockquote></li>
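<li> <blockquote>For the hostname mismatch described in YARN-848, the JDK offers a best-effort lookup of the fully qualified name. The sketch below shows that call in isolation; it is not necessarily how the NodeManager fix was implemented, and the class name, method name, and "localhost" fallback are assumptions.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class HostnameSketch {
    // Returns the canonical name of the local host (fully qualified when the
    // resolver provides it), or "localhost" if the local host cannot be resolved.
    static String canonicalHostName() {
        try {
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            return "localhost";
        }
    }

    public static void main(String[] args) {
        System.out.println(canonicalHostName());
    }
}
```

getCanonicalHostName() is documented as best effort, so the result still depends on how DNS and the hosts file are configured on the machine.</blockquote></li>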
<li> <a href="https://issues.apache.org/jira/browse/YARN-846">YARN-846</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Move pb Impl from yarn-api to yarn-common</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-845">YARN-845</a>.
Major sub-task reported by Arpit Gupta and fixed by Mayank Bansal (resourcemanager)<br>
<b>RM crash with NPE on NODE_UPDATE</b><br>
<blockquote>The following stack trace is generated in the RM:
{code}
n, service: 68.142.246.147:45454 }, ] resource=&lt;memory:1536, vCores:1&gt; queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=&lt;memory:44544, vCores:29&gt;usedCapacity=0.90625, absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=&lt;memory:44544, vCores:29&gt; cluster=&lt;memory:49152, vCores:48&gt;
2013-06-17 12:43:53,655 INFO capacity.ParentQueue (ParentQueue.java:completedContainer(696)) - completedContainer queue=root usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=&lt;memory:44544, vCores:29&gt; cluster=&lt;memory:49152, vCores:48&gt;
2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(832)) - Application appattempt_1371448527090_0844_000001 released container container_1371448527090_0844_01_000005 on node: host: hostXX:45454 #containers=4 available=2048 used=6144 with event: FINISHED
2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for application application_1371448527090_0844 on node: hostXX:45454
2013-06-17 12:43:53,656 INFO fica.FiCaSchedulerApp (FiCaSchedulerApp.java:unreserve(435)) - Application application_1371448527090_0844 unreserved on node host: hostXX:45454 #containers=4 available=2048 used=6144, currently has 4 at priority 20; currentReservation &lt;memory:6144, vCores:4&gt;
2013-06-17 12:43:53,656 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for deactivate...
2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413)
at java.lang.Thread.run(Thread.java:662)
2013-06-17 12:43:53,659 INFO resourcemanager.ResourceManager (ResourceManager.java:run(426)) - Exiting, bbye..
2013-06-17 12:43:53,665 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@hostXX:8088
2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted
2013-06-17 12:43:53,766 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
2013-06-17 12:43:53,767 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
2013-06-17 12:43:53,767 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
2013-06-17 12:43:53,768 WARN amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
2013-06-17 12:43:53,768 INFO ipc.Server (Server.java:stop(2167)) - Stopping server on 8033
2013-06-17 12:43:53,770 INFO ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8033
2013-06-17 12:43:53,770 INFO ipc.Server (Server.java:stop(2167)) - Stopping server on 8032
2013-06-17 12:43:53,770 INFO ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder
2013-06-17 12:43:53,771 INFO ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8032
2013-06-17 12:43:53,771 INFO ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder
2013-06-17 12:43:53,771 INFO ipc.Server (Server.java:stop(2167)) - Stopping server on 8030
2013-06-17 12:43:53,773 INFO ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8030
2013-06-17 12:43:53,773 INFO ipc.Server (Server.java:stop(2167)) - Stopping server on 8031
2013-06-17 12:43:53,773 INFO ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder
2013-06-17 12:43:53,774 INFO ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8031
2013-06-17 12:43:53,775 INFO ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-841">YARN-841</a>.
Major sub-task reported by Siddharth Seth and fixed by Vinod Kumar Vavilapalli <br>
<b>Annotate and document AuxService APIs</b><br>
<blockquote>For users writing their own AuxServices, these APIs should be annotated and need better documentation. Also, the classes may need to move out of the NodeManager.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-840">YARN-840</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Move ProtoUtils to yarn.api.records.pb.impl</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-839">YARN-839</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>TestContainerLaunch.testContainerEnvVariables fails on Windows</b><br>
<blockquote>The unit test case fails on Windows because the job id or container id was not printed out as part of the container script. Later, the test tries to read the pid from the output of the file, and fails.
Exception in trunk:
{noformat}
Running org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.903 sec &lt;&lt;&lt; FAILURE!
testContainerEnvVariables(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 1307 sec &lt;&lt;&lt; ERROR!
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:278)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-837">YARN-837</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>ClusterInfo.java doesn't seem to belong to org.apache.hadoop.yarn</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-834">YARN-834</a>.
Blocker sub-task reported by Arun C Murthy and fixed by Zhijie Shen <br>
<b>Review/fix annotations for yarn-client module and clearly differentiate *Async apis</b><br>
<blockquote>Review/fix annotations for yarn-client module</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-833">YARN-833</a>.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Move Graph and VisualizeStateMachine into yarn.state package</b><br>
<blockquote>Graph and VisualizeStateMachine are only used by the state machine, so they should belong to the state package.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-831">YARN-831</a>.
Blocker sub-task reported by Jian He and fixed by Jian He <br>
<b>Remove resource min from GetNewApplicationResponse</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-829">YARN-829</a>.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Rename RMTokenSelector to be RMDelegationTokenSelector</b><br>
<blockquote>This makes its name consistent with that of RMDelegationTokenIdentifier.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-828">YARN-828</a>.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Remove YarnVersionAnnotation</b><br>
<blockquote>YarnVersionAnnotation is not used at all, and the version information can be accessed through YarnVersionInfo instead.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-827">YARN-827</a>.
Critical sub-task reported by Bikas Saha and fixed by Jian He <br>
<b>Need to make Resource arithmetic methods accessible</b><br>
<blockquote>org.apache.hadoop.yarn.server.resourcemanager.resource has stuff like Resources and Calculators that help compare/add resources etc. Without these users will be forced to replicate the logic, potentially incorrectly.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-826">YARN-826</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Move Clock/SystemClock to util package</b><br>
<blockquote>Clock/SystemClock should belong to util.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-825">YARN-825</a>.
Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>Fix yarn-common javadoc annotations</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-824">YARN-824</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Add static factory to yarn client lib interface and change it to abstract class</b><br>
<blockquote>Do this for AMRMClient, NMClient, and YarnClient, and annotate their implementations as private.
The purpose is to avoid exposing the implementations.</blockquote></li>
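<li> <blockquote>The pattern requested in YARN-824 (an abstract class with a static factory that hides the implementation) can be sketched as follows. SketchClient, its create() factory, and describe() are hypothetical names chosen for illustration; they are not the signatures of AMRMClient, NMClient, or YarnClient.

```java
abstract class SketchClient {
    // Static factory: callers obtain an instance without naming the implementation class.
    static SketchClient create() {
        return new SketchClientImpl();
    }

    abstract String describe();

    // Implementation kept out of the caller-facing surface by making it a private nested class.
    private static final class SketchClientImpl extends SketchClient {
        @Override
        String describe() {
            return "sketch-client";
        }
    }

    public static void main(String[] args) {
        System.out.println(SketchClient.create().describe());
    }
}
```

Because callers depend only on the abstract type, the implementation can later be renamed or replaced without changing client code.</blockquote></li>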
<li> <a href="https://issues.apache.org/jira/browse/YARN-823">YARN-823</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Move RMAdmin from yarn.client to yarn.client.cli and rename as RMAdminCLI</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-822">YARN-822</a>.
Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Rename ApplicationToken to AMRMToken</b><br>
<blockquote>API change. At present this token is used on the scheduler API, AMRMProtocol. The current name is a little confusing because it suggests the application can use the token to talk to the whole YARN system (RM/NM), which is not the case after YARN-694. The NM will have its own NMToken, so it is better to name this one AMRMToken.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-821">YARN-821</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Rename FinishApplicationMasterRequest.setFinishApplicationStatus to setFinalApplicationStatus to be consistent with getter</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-820">YARN-820</a>.
Major sub-task reported by Bikas Saha and fixed by Mayank Bansal <br>
<b>NodeManager has invalid state transition after error in resource localization</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-814">YARN-814</a>.
Major sub-task reported by Hitesh Shah and fixed by Jian He <br>
<b>Difficult to diagnose a failed container launch when error due to invalid environment variable</b><br>
<blockquote>The container's launch script sets up environment variables, symlinks etc.
If there is any failure when setting up the basic context (before the actual user's process is launched), nothing is captured by the NM. This makes it impossible to diagnose the reason for the failure.
To reproduce, set an env var whose value contains characters that cause syntax errors in bash.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-812">YARN-812</a>.
Major bug reported by Ramya Sunil and fixed by Siddharth Seth <br>
<b>Enabling app summary logs causes 'FileNotFound' errors</b><br>
<blockquote>RM app summary logs have been enabled as per the default config:
{noformat}
#
# Yarn ResourceManager Application Summary Log
#
# Set the ResourceManager summary log filename
yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log
# Set the ResourceManager summary log level and appender
yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY
# Appender for ResourceManager Application Summary Log
# Requires the following properties to be set
# - hadoop.log.dir (Hadoop Log directory)
# - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename)
# - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender)
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false
log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
log4j.appender.RMSUMMARY.MaxFileSize=256MB
log4j.appender.RMSUMMARY.MaxBackupIndex=20
log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
{noformat}
This, however, throws errors while running commands as a non-superuser:
{noformat}
-bash-4.1$ hadoop dfs -ls /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /var/log/hadoop/hadoopqa/rm-appsummary.log (No such file or directory)
at java.io.FileOutputStream.openAppend(Native Method)
at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:192)
at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:116)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
at org.apache.log4j.RollingFileAppender.setFile(RollingFileAppender.java:207)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
at org.apache.log4j.LogManager.&lt;clinit&gt;(LogManager.java:127)
at org.apache.log4j.Logger.getLogger(Logger.java:104)
at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:289)
at org.apache.commons.logging.impl.Log4JLogger.&lt;init&gt;(Log4JLogger.java:109)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1116)
at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:858)
at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)
at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)
at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
at org.apache.hadoop.fs.FsShell.&lt;clinit&gt;(FsShell.java:41)
Found 1 items
drwxr-xr-x - hadoop hadoop 0 2013-06-12 21:28 /user
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-806">YARN-806</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Move ContainerExitStatus from yarn.api to yarn.api.records</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-805">YARN-805</a>.
Blocker sub-task reported by Jian He and fixed by Jian He <br>
<b>Fix yarn-api javadoc annotations</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-803">YARN-803</a>.
Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (resourcemanager , scheduler)<br>
<b>factor out scheduler config validation from the ResourceManager to each scheduler implementation</b><br>
<blockquote>Per discussion in YARN-789 we should factor out from the ResourceManager class the scheduler config validations.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-799">YARN-799</a>.
Major bug reported by Chris Riccomini and fixed by Chris Riccomini (nodemanager)<br>
<b>CgroupsLCEResourcesHandler tries to write to cgroup.procs</b><br>
<blockquote>The implementation of
bq. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
tells the container-executor to write PIDs to cgroup.procs:
{code}
public String getResourcesOption(ContainerId containerId) {
String containerName = containerId.toString();
StringBuilder sb = new StringBuilder("cgroups=");
if (isCpuWeightEnabled()) {
sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
sb.append(",");
}
if (sb.charAt(sb.length() - 1) == ',') {
sb.deleteCharAt(sb.length() - 1);
}
return sb.toString();
}
{code}
Apparently, this file has not always been writeable:
https://patchwork.kernel.org/patch/116146/
http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html
The RHEL version of the Linux kernel that I'm using has a CGroup module that has a non-writeable cgroup.procs file.
{quote}
$ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
{quote}
As a result, when the container-executor tries to run, it fails with this error message:
bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
{quote}
$ pwd
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
{quote}
I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem.
I can think of several potential resolutions to this ticket:
1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone
3. Check permissioning on /cgroup.procs prior to writing to it, and fall back to /tasks.
4. Add a config to yarn-site that lets admins specify which file to write to.
Thoughts?</blockquote></li>
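Resolution 3 from the YARN-799 list above (check the file, then fall back) could look roughly like the following. This is only a sketch of that one option, not necessarily the change that was committed; CgroupPidFileChooser and choosePidFile are hypothetical names.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not part of CgroupsLCEResourcesHandler.
final class CgroupPidFileChooser {
    // Prefer cgroup.procs when it exists and is writable; otherwise use tasks.
    static Path choosePidFile(Path cgroupDir) {
        Path procs = cgroupDir.resolve("cgroup.procs");
        if (Files.isWritable(procs)) {
            return procs;
        }
        return cgroupDir.resolve("tasks");
    }
}
```

Files.isWritable returns false when the file is missing as well as when it is read-only, so both cases fall back to tasks.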
<li> <a href="https://issues.apache.org/jira/browse/YARN-795">YARN-795</a>.
Major bug reported by Wei Yan and fixed by Wei Yan (scheduler)<br>
<b>Fair scheduler queue metrics should subtract allocated vCores from available vCores</b><br>
<blockquote>The fair scheduler's queue metrics don't subtract allocated vCores from available vCores, so the available vCores value returned is incorrect.
This is happening because {code}QueueMetrics.getAllocateResources(){code} doesn't return the allocated vCores.</blockquote></li>
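The intended arithmetic for YARN-795 is a per-dimension subtraction. The sketch below uses a hypothetical QueueResources value class, not the real Resource or QueueMetrics types, to show that the allocated amount must carry vCores as well as memory for the result to be right.

```java
// Hypothetical two-dimensional resource value; not the YARN Resource record.
final class QueueResources {
    final int memoryMb;
    final int vCores;

    QueueResources(int memoryMb, int vCores) {
        this.memoryMb = memoryMb;
        this.vCores = vCores;
    }

    // Subtracts both dimensions. If the allocated value were built with
    // vCores left at 0, available vCores would be overstated.
    QueueResources subtract(QueueResources allocated) {
        return new QueueResources(memoryMb - allocated.memoryMb, vCores - allocated.vCores);
    }
}
```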
<li> <a href="https://issues.apache.org/jira/browse/YARN-792">YARN-792</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Move NodeHealthStatus from yarn.api.record to yarn.server.api.record</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-791">YARN-791</a>.
Blocker sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api , resourcemanager)<br>
<b>Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-789">YARN-789</a>.
Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (scheduler)<br>
<b>Enable zero capabilities resource requests in fair scheduler</b><br>
<blockquote>Per discussion in YARN-689, reposting updated use case:
1. I have a set of services co-existing with a Yarn cluster.
2. These services run out of band from Yarn. They are not started as yarn containers and they don't use Yarn containers for processing.
3. These services use, dynamically, different amounts of CPU and memory based on their load. They manage their CPU and memory requirements independently. In other words, depending on their load, they may require more CPU but not memory or vice-versa.
By using YARN as the RM for these services, I'm able to share and utilize the resources of the cluster appropriately and in a dynamic way. Yarn keeps tabs on all the resources.
These services run an AM that reserves resources on their behalf. When this AM gets the requested resources, the services bump up their CPU/memory utilization out of band from Yarn. If the Yarn allocations are released/preempted, the services back off on their resource utilization. By doing this, Yarn and these services correctly share the cluster resources, with the Yarn RM being the only one that does the overall resource bookkeeping.
So as not to break the lifecycle of containers, the services' AM starts containers in the corresponding NMs. These container processes basically sleep forever (i.e. sleep 10000d). They use almost no CPU or memory (less than 1MB). Thus it is reasonable to assume their required CPU and memory utilization is NIL (more on hard enforcement later). Because of this almost NIL utilization of CPU and memory, it is possible to specify, when doing a request, zero as one of the dimensions (CPU or memory).
The current limitation is that the increment is also the minimum.
If we set the memory increment to 1MB, then when doing a pure CPU request we would have to specify 1MB of memory. That would work. However, it would allow discretionary memory requests without the desired normalization (increments of 256, 512, etc).
If we set the CPU increment to 1CPU, then when doing a pure memory request we would have to specify 1CPU. CPU amounts are much smaller than memory amounts, and because we don't have fractional CPUs, all my pure memory requests would waste 1 CPU, thus reducing the overall utilization of the cluster.
Finally, on hard enforcement.
* For CPU. Hard enforcement can be done via a cgroup cpu controller. Using an absolute minimum of a few CPU shares (e.g. 10) in the LinuxContainerExecutor, we ensure there are enough CPU cycles to run the sleep process. This absolute minimum would only kick in if zero is allowed; otherwise it will never kick in, as the shares for 1 CPU are 1024.
* For Memory. Hard enforcement is currently done by ProcfsBasedProcessTree.java; using an absolute minimum of 1 or 2 MB would take care of zero memory resources. Again, this absolute minimum would only kick in if zero is allowed; otherwise it will never kick in, as the memory increment is several MBs if not 1GB.</blockquote></li>
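The increment-versus-minimum limitation described in YARN-789 can be sketched as follows. RequestNormalizer and normalize are hypothetical names, not the fair scheduler's actual code; the sketch assumes a request is rounded up to a multiple of the increment and then floored at a separately configured minimum, which may be zero.

```java
// Hypothetical helper; assumes increment is positive and requested is non-negative.
final class RequestNormalizer {
    // Rounds the request up to the next multiple of the increment,
    // then applies the minimum as a floor.
    static int normalize(int requested, int minimum, int increment) {
        int roundedUp = ((requested + increment - 1) / increment) * increment;
        return Math.max(roundedUp, minimum);
    }
}
```

With a 256MB increment, normalize(0, 0, 256) stays at 0, while treating the increment as the minimum, normalize(0, 256, 256), forces the request up to 256.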
<li> <a href="https://issues.apache.org/jira/browse/YARN-787">YARN-787</a>.
Blocker sub-task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (api)<br>
<b>Remove resource min from Yarn client API</b><br>
<blockquote>Per discussions in YARN-689 and YARN-769 we should remove minimum from the API as this is a scheduler internal thing.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-782">YARN-782</a>.
Critical improvement reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)<br>
<b>vcores-pcores ratio functions differently from vmem-pmem ratio in misleading way </b><br>
<blockquote>The vcores-pcores ratio functions differently from the vmem-pmem ratio in the sense that the vcores-pcores ratio has an impact on allocations and the vmem-pmem ratio does not.
If I double my vmem-pmem ratio, the only change that occurs is that my containers, after being scheduled, are less likely to be killed for using too much virtual memory. But if I double my vcore-pcore ratio, my nodes will appear to the ResourceManager to contain double the amount of CPU space, which will affect scheduling decisions.
The lack of consistency will exacerbate the already difficult problem of resource configuration.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-781">YARN-781</a>.
Major sub-task reported by Devaraj Das and fixed by Jian He <br>
<b>Expose LOGDIR that containers should use for logging</b><br>
<blockquote>The LOGDIR is known. We should expose this to the container's environment.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-777">YARN-777</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Remove unreferenced objects from proto</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-773">YARN-773</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Move YarnRuntimeException from package api.yarn to api.yarn.exceptions</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-767">YARN-767</a>.
Major bug reported by Jian He and fixed by Jian He <br>
<b>Initialize Application status metrics when QueueMetrics is initialized</b><br>
<blockquote>Applications: ResourceManager.QueueMetrics.AppsSubmitted, ResourceManager.QueueMetrics.AppsRunning, ResourceManager.QueueMetrics.AppsPending, ResourceManager.QueueMetrics.AppsCompleted, ResourceManager.QueueMetrics.AppsKilled, ResourceManager.QueueMetrics.AppsFailed
For now these metrics are created only when they are needed; we want them to be visible as soon as QueueMetrics is initialized.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-764">YARN-764</a>.
Major bug reported by nemon lou and fixed by nemon lou (resourcemanager)<br>
<b>blank Used Resources on Capacity Scheduler page </b><br>
<blockquote>Even when there are jobs running, used resources is empty on the Capacity Scheduler page for a leaf queue. (I use google-chrome on Windows 7.)
After changing Resource.java's toString method by replacing "&lt;&gt;" with "{}", this bug gets fixed.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-763">YARN-763</a>.
Major bug reported by Bikas Saha and fixed by Xuan Gong <br>
<b>AMRMClientAsync should stop heartbeating after receiving shutdown from RM</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-761">YARN-761</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen <br>
<b>TestNMClientAsync fails sometimes</b><br>
<blockquote>See https://builds.apache.org/job/PreCommit-YARN-Build/1101//testReport/.
It passed on my machine though.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-760">YARN-760</a>.
Major bug reported by Sandy Ryza and fixed by Niranjan Singh (nodemanager)<br>
<b>NodeManager throws AvroRuntimeException on failed start</b><br>
<blockquote>NodeManager wraps exceptions that occur in its start method in AvroRuntimeExceptions, even though it doesn't use Avro anywhere else.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-759">YARN-759</a>.
Major sub-task reported by Bikas Saha and fixed by Bikas Saha <br>
<b>Create Command enum in AllocateResponse</b><br>
<blockquote>Use command enums for shutdown/resync instead of booleans.</blockquote></li>
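A rough sketch of the idea in YARN-759, using hypothetical names (ExampleCommand, ExampleCommandHandler) rather than the actual AllocateResponse API:

```java
// Hypothetical names for illustration; not the actual YARN records.
enum ExampleCommand { NONE, RESYNC, SHUTDOWN }

final class ExampleCommandHandler {
    // One enum field replaces separate "resync" and "shutdown" booleans,
    // so a contradictory combination (both true) cannot be expressed.
    static String handle(ExampleCommand command) {
        switch (command) {
            case RESYNC:
                return "resync";
            case SHUTDOWN:
                return "shutdown";
            default:
                return "continue";
        }
    }
}
```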
<li> <a href="https://issues.apache.org/jira/browse/YARN-757">YARN-757</a>.
Blocker bug reported by Bikas Saha and fixed by Bikas Saha <br>
<b>TestRMRestart failing/stuck on trunk</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-756">YARN-756</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Move PreemptionContainer/PreemptionContract/PreemptionMessage/StrictPreemptionContract/PreemptionResourceRequest to api.records</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-755">YARN-755</a>.
Major sub-task reported by Bikas Saha and fixed by Bikas Saha <br>
<b>Rename AllocateResponse.reboot to AllocateResponse.resync</b><br>
<blockquote>For work-preserving RM restart, the AMs will be resyncing instead of rebooting. Rebooting is an action that currently satisfies the resync requirement. Changing the name now so that it continues to make sense in the real resync case.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-753">YARN-753</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Add individual factory method for api protocol records</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-752">YARN-752</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (api , applications)<br>
<b>In AMRMClient, automatically add corresponding rack requests for requested nodes</b><br>
<blockquote>A ContainerRequest that includes node-level requests must also include matching rack-level requests for the racks that those nodes are on. When a node is present without its rack, it makes sense for the client to automatically add the node's rack.</blockquote></li>
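A minimal sketch of the behaviour requested in YARN-752, assuming a caller-supplied node-to-rack lookup. RackResolver, RackRequests and withImpliedRacks are hypothetical names; the real AMRMClient resolves racks through its own mechanism.

```java
import java.util.Arrays;
import java.util.stream.Stream;

// Hypothetical lookup from a node name to the rack it sits on.
interface RackResolver {
    String rackOf(String node);
}

final class RackRequests {
    // Returns the explicitly requested racks plus the rack of every requested
    // node, without duplicates and in first-seen order.
    static String[] withImpliedRacks(String[] nodes, String[] racks, RackResolver resolver) {
        return Stream.concat(Arrays.stream(racks), Arrays.stream(nodes).map(resolver::rackOf))
                .distinct()
                .toArray(String[]::new);
    }
}
```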
<li> <a href="https://issues.apache.org/jira/browse/YARN-750">YARN-750</a>.
Major sub-task reported by Arun C Murthy and fixed by Arun C Murthy <br>
<b>Allow for black-listing resources in YARN API and Impl in CS</b><br>
<blockquote>YARN-392 and YARN-398 enhance scheduler api to allow for white-lists of resources.
This jira is a companion to allow for black-listing (in CS).</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-749">YARN-749</a>.
Major sub-task reported by Arun C Murthy and fixed by Arun C Murthy <br>
<b>Rename ResourceRequest (get,set)HostName to (get,set)ResourceName</b><br>
<blockquote>We should rename ResourceRequest (get,set)HostName to (get,set)ResourceName since the name can be host, rack or *.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-748">YARN-748</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Move BuilderUtils from yarn-common to yarn-server-common</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-746">YARN-746</a>.
Major sub-task reported by Steve Loughran and fixed by Steve Loughran <br>
<b>rename Service.register() and Service.unregister() to registerServiceListener() &amp; unregisterServiceListener() respectively</b><br>
<blockquote>Make it clear what you are registering on a {{Service}} by naming the methods {{registerServiceListener()}} &amp; {{unregisterServiceListener()}} respectively.
This only affects a couple of production classes; {{Service.register()}} is also used in some of the lifecycle tests of YARN-530. There are no tests of {{Service.unregister()}}, which is something that could be corrected.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-742">YARN-742</a>.
Major bug reported by Kihwal Lee and fixed by Jason Lowe (nodemanager)<br>
<b>Log aggregation causes a lot of redundant setPermission calls</b><br>
<blockquote>In one of our clusters, namenode RPC is spending 45% of its time on serving setPermission calls. Further investigation has revealed that most calls are redundantly made on /mapred/logs/&lt;user&gt;/logs. Also mkdirs calls are made before this.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-739">YARN-739</a>.
Major sub-task reported by Siddharth Seth and fixed by Omkar Vinit Joshi <br>
<b>NM startContainer should validate the NodeId</b><br>
<blockquote>The NM validates certain fields from the ContainerToken on a startContainer call. It should also validate the NodeId (which needs to be added to the ContainerToken).</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-737">YARN-737</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Some Exceptions no longer need to be wrapped by YarnException and can be directly thrown out after YARN-142 </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-736">YARN-736</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Add a multi-resource fair sharing metric</b><br>
<blockquote>Currently, at a regular interval, the fair scheduler computes a fair memory share for each queue and application inside it. This fair share is not used for scheduling decisions, but is displayed in the web UI, exposed as a metric, and used for preemption decisions.
With DRF and multi-resource scheduling, assigning a memory share as the fair share metric to every queue no longer makes sense. It's not obvious what the replacement should be, but probably something like fractional fairness within a queue, or distance from an ideal cluster state.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-735">YARN-735</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Make ApplicationAttemptID, ContainerID, NodeID immutable</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-733">YARN-733</a>.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>TestNMClient fails occasionally</b><br>
<blockquote>The problem happens at:
{code}
// getContainerStatus can be called after stopContainer
try {
ContainerStatus status = nmClient.getContainerStatus(
container.getId(), container.getNodeId(),
container.getContainerToken());
assertEquals(container.getId(), status.getContainerId());
assertEquals(ContainerState.RUNNING, status.getState());
assertTrue("" + i, status.getDiagnostics().contains(
"Container killed by the ApplicationMaster."));
assertEquals(-1000, status.getExitStatus());
} catch (YarnRemoteException e) {
fail("Exception is not expected");
}
{code}
NMClientImpl#stopContainer returns, but the container may not have stopped yet, because ContainerManagerImpl implements stopContainer asynchronously. The container's status is therefore in transition, and calling NMClientImpl#getContainerStatus immediately after stopContainer will return either the RUNNING status or the COMPLETE one.
There will be a similar problem with NMClientImpl#startContainer.
</blockquote></li>
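One common way for a test to cope with this kind of asynchronous transition is to poll until the state settles instead of asserting immediately. The sketch below is generic and uses hypothetical names (StateSource, StatePoller); it is not the change made to TestNMClient.

```java
// Hypothetical source of a container state string such as "RUNNING" or "COMPLETE".
interface StateSource {
    String currentState();
}

final class StatePoller {
    // Queries the source up to maxAttempts times, sleeping between queries,
    // and returns the first state that is not "RUNNING" (or the last state seen).
    static String awaitNotRunning(StateSource source, int maxAttempts, long sleepMillis) {
        String state = source.currentState();
        for (int attempt = 1; maxAttempts > attempt; attempt++) {
            if (!"RUNNING".equals(state)) {
                break;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt and give up polling.
                Thread.currentThread().interrupt();
                return state;
            }
            state = source.currentState();
        }
        return state;
    }
}
```

A test would then assert on the returned state rather than on the first status it happens to observe.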
<li> <a href="https://issues.apache.org/jira/browse/YARN-731">YARN-731</a>.
Major sub-task reported by Siddharth Seth and fixed by Zhijie Shen <br>
<b>RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions</b><br>
<blockquote>Will be required for YARN-662. Also, remote NPEs show up incorrectly for some unit tests.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-727">YARN-727</a>.
Blocker sub-task reported by Siddharth Seth and fixed by Xuan Gong <br>
<b>ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter</b><br>
<blockquote>Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-726">YARN-726</a>.
Critical bug reported by Siddharth Seth and fixed by Mayank Bansal <br>
<b>Queue, FinishTime fields broken on RM UI</b><br>
<blockquote>The queue shows up as "Invalid Date"
Finish Time shows up as a Long value.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-724">YARN-724</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Move ProtoBase from api.records to api.records.impl.pb</b><br>
<blockquote>Simply move ProtoBase to records.impl.pb</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-720">YARN-720</a>.
Major sub-task reported by Siddharth Seth and fixed by Zhijie Shen <br>
<b>container-log4j.properties should not refer to mapreduce properties</b><br>
<blockquote>This refers to yarn.app.mapreduce.container.log.dir and yarn.app.mapreduce.container.log.filesize. The file should either be moved into the MR codebase, or the parameters should be renamed.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-719">YARN-719</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>Move RMIdentifier from Container to ContainerTokenIdentifier</b><br>
<blockquote>This needs to be done for YARN-684 to happen.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-717">YARN-717</a>.
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Copy BuilderUtil methods into token-related records</b><br>
<blockquote>This is separated from YARN-711 because, after changing yarn.api.token from an interface to an abstract class, ClientTokenPBImpl (for example) would have to extend two classes, both TokenPBImpl and the ClientToken abstract class, which is not allowed in Java.
We may remove the ClientToken/ContainerToken/DelegationToken interfaces and just use the common Token interface.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-716">YARN-716</a>.
Major task reported by Siddharth Seth and fixed by Siddharth Seth <br>
<b>Make ApplicationID immutable</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-715">YARN-715</a>.
Major bug reported by Siddharth Seth and fixed by Vinod Kumar Vavilapalli <br>
<b>TestDistributedShell and TestUnmanagedAMLauncher are failing</b><br>
<blockquote>Tests are timing out. Looks like this is related to YARN-617.
{code}
2013-05-21 17:40:23,693 ERROR [IPC Server handler 0 on 54024] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:authorizeRequest(412)) - Unauthorized request to start container.
Expected containerId: user Found: container_1369183214008_0001_01_000001
2013-05-21 17:40:23,694 ERROR [IPC Server handler 0 on 54024] security.UserGroupInformation (UserGroupInformation.java:doAs(1492)) - PriviledgedActionException as:user (auth:SIMPLE) cause:org.apache.hado
Expected containerId: user Found: container_1369183214008_0001_01_000001
2013-05-21 17:40:23,695 INFO [IPC Server handler 0 on 54024] ipc.Server (Server.java:run(1864)) - IPC Server handler 0 on 54024, call org.apache.hadoop.yarn.api.ContainerManagerPB.startContainer from 10.
Expected containerId: user Found: container_1369183214008_0001_01_000001
org.apache.hadoop.yarn.exceptions.YarnRemoteException: Unauthorized request to start container.
Expected containerId: user Found: container_1369183214008_0001_01_000001
at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:43)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeRequest(ContainerManagerImpl.java:413)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainer(ContainerManagerImpl.java:440)
at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagerPBServiceImpl.startContainer(ContainerManagerPBServiceImpl.java:72)
at org.apache.hadoop.yarn.proto.ContainerManager$ContainerManagerService$2.callBlockingMethod(ContainerManager.java:83)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-714">YARN-714</a>.
Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>AMRM protocol changes for sending NMToken list</b><br>
<blockquote>NMToken will be sent to AM on allocate call if
1) AM doesn't already have NMToken for the underlying NM
2) Key rolled over on RM and AM gets new container on the same NM.
On allocate call RM will send a consolidated list of all required NMTokens.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-711">YARN-711</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Jian He <br>
<b>Copy BuilderUtil methods into individual records</b><br>
<blockquote>BuilderUtils is one giant utils class which has all the factory methods needed for creating records. It is painful for users to figure out how to create records. We are better off having the factories in each record, that way users can easily create records.
As a first step, we should just copy all the factory methods into individual classes, deprecate BuilderUtils and then slowly move all code off BuilderUtils.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-708">YARN-708</a>.
Major task reported by Siddharth Seth and fixed by Siddharth Seth <br>
<b>Move RecordFactory classes to hadoop-yarn-api, miscellaneous fixes to the interfaces</b><br>
<blockquote>This is required for additional changes in YARN-528.
Some of the interfaces could use some cleanup as well - they shouldn't be declaring YarnException (Runtime) in their signature.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-706">YARN-706</a>.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Race Condition in TestFSDownload</b><br>
<blockquote>See the test failure in YARN-695
https://builds.apache.org/job/PreCommit-YARN-Build/957//testReport/org.apache.hadoop.yarn.util/TestFSDownload/testDownloadPatternJar/</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-701">YARN-701</a>.
Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>ApplicationTokens should be used irrespective of kerberos</b><br>
<blockquote> - Single code path for secure and non-secure cases is useful for testing, coverage.
 - Having this in non-secure mode will help us keep accidental bugs in AMs from DDoS'ing and bringing down the RM.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-700">YARN-700</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>TestInfoBlock fails on Windows because of line ending mismatch</b><br>
<blockquote>Exception:
{noformat}
Running org.apache.hadoop.yarn.webapp.view.TestInfoBlock
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.962 sec &lt;&lt;&lt; FAILURE!
testMultilineInfoBlock(org.apache.hadoop.yarn.webapp.view.TestInfoBlock) Time elapsed: 873 sec &lt;&lt;&lt; FAILURE!
java.lang.AssertionError:
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertTrue(Assert.java:54)
at org.apache.hadoop.yarn.webapp.view.TestInfoBlock.testMultilineInfoBlock(TestInfoBlock.java:79)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-695">YARN-695</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>masterContainer and status are in ApplicationReportProto but not in ApplicationReport</b><br>
<blockquote>If masterContainer and status are no longer part of ApplicationReport, they should be removed from proto as well.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-694">YARN-694</a>.
Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Start using NMTokens to authenticate all communication with NM</b><br>
<blockquote>The AM uses the NMToken to authenticate all AM-NM communication.
The NM validates the NMToken as follows:
* If the NMToken uses the current or previous master key, it is valid. In this case the NM updates its cache with this key for the corresponding appId.
* Otherwise, if the NMToken uses the master key that is present in the NM's cache for the AM's appId, it is validated against that key.
* If the NMToken is invalid, the NM rejects the AM's calls.
Modifications for ContainerToken:
* At present, RPC validates AM-NM communication based on the ContainerToken. It will be replaced with the NMToken. From now on, the AM will use one NMToken per NM (replacing the earlier behavior of one ContainerToken per container per NM).
* In a secure environment, startContainer currently takes the ContainerToken from the UGI (YARN-617); after this change it will take it from the payload (Container).
* The ContainerToken will still exist, but it will only be used to validate the AM's container start request.</blockquote></li>
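<li> <b>Sketch for YARN-694: order of the NMToken validation rules</b><br>
<blockquote>The validation rules described for YARN-694 can be read as a small decision procedure. The following is a minimal sketch under stated assumptions, not the NodeManager's actual code: master keys are reduced to integer key ids, the NM's cache is modelled for a single application, and every name in it (NMTokenCheckSketch, AppEntry, validate) is hypothetical.

```java
final class NMTokenCheckSketch {
    /** Hypothetical per-application cache entry on the NM; null until a valid token is seen. */
    static final class AppEntry {
        Integer cachedKeyId;
    }

    /**
     * Applies the three rules in order. Key ids stand in for master keys;
     * all parameter names are assumptions made for this illustration.
     */
    static boolean validate(int tokenKeyId, int currentKeyId, int previousKeyId, AppEntry app) {
        // Rule 1: a token signed with the current or previous master key is valid,
        // and the key is remembered for this application.
        if (tokenKeyId == currentKeyId || tokenKeyId == previousKeyId) {
            app.cachedKeyId = tokenKeyId;
            return true;
        }
        // Rule 2: otherwise the token is accepted only if it matches the key cached for the app.
        // Rule 3: anything else is rejected (equals() returns false when nothing is cached).
        return Integer.valueOf(tokenKeyId).equals(app.cachedKeyId);
    }

    public static void main(String[] args) {
        AppEntry app = new AppEntry();
        System.out.println("token key 5, NM keys 5/4: " + validate(5, 5, 4, app));
        System.out.println("token key 5, NM keys 7/6: " + validate(5, 7, 6, app));
        System.out.println("token key 3, NM keys 7/6: " + validate(3, 7, 6, app));
    }
}
```

In this sketch a token that was accepted once stays usable after later key rollovers because its key id remains in the per-application entry, which is the behavior the second rule describes.</blockquote></li>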
<li> <a href="https://issues.apache.org/jira/browse/YARN-693">YARN-693</a>.
Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Sending NMToken to AM on allocate call</b><br>
<blockquote>This is part of YARN-613.
As per the updated design, the AM will receive an NMToken per NM in the following scenarios:
* The AM is receiving its first container on the underlying NM.
* The AM is receiving a container on the underlying NM after either the NM or the RM rebooted.
** After an RM reboot, as the RM doesn't remember (persist) the information about keys issued per AM per NM, it will reissue tokens when the AM gets a new container on the underlying NM. However, the NM will retain the older token until it receives a new one, to support long-running jobs (in a work-preserving environment).
** After an NM reboot, the RM will delete the token information corresponding to that NM for all AMs.
* The AM is receiving a container on the underlying NM after the NMToken master key is rolled over on the RM side.
In all these cases, if the AM receives a new NMToken, it is supposed to store it for future NM communication until it receives a new one.
AMRMClient should expose these NMTokens to the client.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-692">YARN-692</a>.
Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Creating NMToken master key on RM and sharing it with NM as a part of RM-NM heartbeat.</b><br>
<blockquote>This is related to YARN-613. Here we will be implementing NMToken generation on the RM side and sharing it with the NM during the RM-NM heartbeat. As a part of this JIRA, the master key will only be made available to the NM; no validation will be done until AM-NM communication is fixed.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-690">YARN-690</a>.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)<br>
<b>RM exits on token cancel/renew problems</b><br>
<blockquote>The DelegationTokenRenewer thread is critical to the RM. When a non-IOException occurs, the thread calls System.exit to prevent the RM from running w/o the thread. It should be exiting only on non-RuntimeExceptions.
The problem is especially bad in 23 because the yarn protobuf layer converts IOExceptions into UndeclaredThrowableExceptions (RuntimeException) which causes the renewer to abort the process. An UnknownHostException takes down the RM...</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-688">YARN-688</a>.
Major bug reported by Jian He and fixed by Jian He <br>
<b>Containers not cleaned up when NM received SHUTDOWN event from NodeStatusUpdater</b><br>
<blockquote>Currently, the SHUTDOWN event from NodeStatusUpdater and the CleanupContainers event are both handled on the same dispatcher thread, so the CleanupContainers event will not be processed until the SHUTDOWN event has been processed. See the similar problem in YARN-495.
On a normal NM shutdown this is not a problem, since a normal stop happens on the shutdownHook thread.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-686">YARN-686</a>.
Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api)<br>
<b>Flatten NodeReport</b><br>
<blockquote>The NodeReport returned by getClusterNodes or given to AMs in heartbeat responses includes both a NodeState (enum) and a NodeHealthStatus (object). As UNHEALTHY is already a NodeState, a separate NodeHealthStatus doesn't seem necessary. I propose eliminating NodeHealthStatus#getIsNodeHealthy and moving its two other methods, getHealthReport and getLastHealthReportTime, into NodeReport.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-684">YARN-684</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>ContainerManager.startContainer needs to only have ContainerTokenIdentifier instead of the whole Container</b><br>
<blockquote>The NM only needs the token, the whole Container is unnecessary.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-663">YARN-663</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Change ResourceTracker API and LocalizationProtocol API to throw YarnRemoteException and IOException</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-661">YARN-661</a>.
Major bug reported by Jason Lowe and fixed by Omkar Vinit Joshi (nodemanager)<br>
<b>NM fails to cleanup local directories for users</b><br>
<blockquote>YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-660">YARN-660</a>.
Major sub-task reported by Bikas Saha and fixed by Bikas Saha <br>
<b>Improve AMRMClient with matching requests</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-656">YARN-656</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>In scheduler UI, including reserved memory in "Memory Total" can make it exceed cluster capacity.</b><br>
<blockquote>"Memory Total" is currently a sum of availableMB, allocatedMB, and reservedMB. Including reservedMB in this sum can make the total exceed the capacity of the cluster. </blockquote></li>
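<li> <b>Sketch for YARN-656: why adding reservedMB overshoots</b><br>
<blockquote>The arithmetic behind YARN-656 can be shown with a few numbers. This is a minimal sketch, assuming that availableMB plus allocatedMB already equals the cluster capacity; the class and method names are hypothetical and the figures are invented for illustration.

```java
final class MemoryTotalSketch {
    /** The sum described in the report: available + allocated + reserved. */
    static long sumIncludingReserved(long availableMB, long allocatedMB, long reservedMB) {
        return availableMB + allocatedMB + reservedMB;
    }

    /** The same total with reservedMB left out. */
    static long sumExcludingReserved(long availableMB, long allocatedMB) {
        return availableMB + allocatedMB;
    }

    public static void main(String[] args) {
        // Invented example: a cluster where 6144 MB is allocated, 2048 MB is free,
        // and 4096 MB is reserved for a container that does not fit yet.
        long availableMB = 2048;
        long allocatedMB = 6144;
        long reservedMB = 4096;
        System.out.println("including reserved: "
                + sumIncludingReserved(availableMB, allocatedMB, reservedMB));
        System.out.println("excluding reserved: "
                + sumExcludingReserved(availableMB, allocatedMB));
    }
}
```

Under that assumption the first figure is larger than the cluster itself, while the second matches its capacity.</blockquote></li>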
<li> <a href="https://issues.apache.org/jira/browse/YARN-655">YARN-655</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Fair scheduler metrics should subtract allocated memory from available memory</b><br>
<blockquote>In the scheduler web UI, cluster metrics reports that the "Memory Total" goes up when an application is allocated resources.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-654">YARN-654</a>.
Major bug reported by Bikas Saha and fixed by Xuan Gong <br>
<b>AMRMClient: Perform sanity checks for parameters of public methods</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-651">YARN-651</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Change ContainerManagerPBClientImpl and RMAdminProtocolPBClientImpl to throw IOException and YarnRemoteException</b><br>
<blockquote>YARN-632 and YARN-633 change the ContainerManager and RMAdmin APIs to throw YarnRemoteException and IOException. ContainerManagerPBClientImpl and RMAdminProtocolPBClientImpl should make the same changes.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-648">YARN-648</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)<br>
<b>FS: Add documentation for pluggable policy</b><br>
<blockquote>YARN-469 and YARN-482 make the scheduling policy in FS pluggable. Need to add documentation on how to use this.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-646">YARN-646</a>.
Major bug reported by Dapeng Sun and fixed by Dapeng Sun (documentation)<br>
<b>Some issues in Fair Scheduler's document</b><br>
<blockquote>Issues were found in the doc page for the Fair Scheduler, http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html:
1. In the section &#8220;Configuration&#8221;, there are two properties named &#8220;yarn.scheduler.fair.minimum-allocation-mb&#8221;; the second one should be &#8220;yarn.scheduler.fair.maximum-allocation-mb&#8221;.
2. In the section &#8220;Allocation file format&#8221;, the document says &#8220;The format contains three types of elements&#8221;, but it then lists four types of elements.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-645">YARN-645</a>.
Major bug reported by Jian He and fixed by Jian He <br>
<b>Move RMDelegationTokenSecretManager from yarn-server-common to yarn-server-resourcemanager</b><br>
<blockquote>RMDelegationTokenSecretManager is specific to resource manager, should not belong to server-common</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-642">YARN-642</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (api , resourcemanager)<br>
<b>Fix up /nodes REST API to have 1 param and be consistent with the Java API</b><br>
<blockquote>The code behind the /nodes RM REST API is unnecessarily muddled, logs the same misspelled INFO message repeatedly, and does not return unhealthy nodes, even when asked.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-639">YARN-639</a>.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen (applications/distributed-shell)<br>
<b>Make AM of Distributed Shell Use NMClient</b><br>
<blockquote>YARN-422 adds NMClient. AM of Distributed Shell should use it instead of using ContainerManager directly.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-638">YARN-638</a>.
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>Restore RMDelegationTokens after RM Restart</b><br>
<blockquote>This was missed in YARN-581. After RM restart, RMDelegationTokens need to be added both to DelegationTokenRenewer (addressed in YARN-581) and to delegationTokenSecretManager.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-637">YARN-637</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)<br>
<b>FS: maxAssign is not honored</b><br>
<blockquote>maxAssign limits the number of containers that can be assigned in a single heartbeat. Currently, FS doesn't keep track of number of assigned containers to check this.</blockquote></li>
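<li> <b>Sketch for YARN-637: counting assignments per heartbeat</b><br>
<blockquote>The missing bookkeeping described for YARN-637 amounts to a counter that is checked on every assignment. The following is a minimal sketch, not the Fair Scheduler's code: the method and parameter names are hypothetical, container fitting is reduced to a fixed number of slots on the node, and treating a non-positive maxAssign as "no limit" is an assumption of this example.

```java
final class MaxAssignSketch {
    /**
     * Returns how many containers would be assigned in one heartbeat.
     * pendingContainers and nodeSlots are simplifications made for this illustration.
     */
    static int assignOnHeartbeat(int pendingContainers, int nodeSlots, int maxAssign) {
        // Assumption: a non-positive maxAssign means the limit is disabled.
        int limit = maxAssign > 0 ? maxAssign : Integer.MAX_VALUE;
        int assigned = 0;
        while (pendingContainers > assigned) {
            // Stop once the per-heartbeat limit or the node's free slots are used up.
            if (assigned == limit || assigned == nodeSlots) {
                break;
            }
            assigned++; // the counter the report says is currently not tracked
        }
        return assigned;
    }

    public static void main(String[] args) {
        System.out.println("limit 3: " + assignOnHeartbeat(10, 8, 3));
        System.out.println("no limit: " + assignOnHeartbeat(10, 8, -1));
    }
}
```

Without the counter the loop would only stop on the second condition, which is the behavior the report describes.</blockquote></li>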
<li> <a href="https://issues.apache.org/jira/browse/YARN-635">YARN-635</a>.
Major sub-task reported by Xuan Gong and fixed by Siddharth Seth <br>
<b>Rename YarnRemoteException to YarnException</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-634">YARN-634</a>.
Major sub-task reported by Siddharth Seth and fixed by Siddharth Seth <br>
<b>Make YarnRemoteException not backed by PB and introduce a SerializedException</b><br>
<blockquote>LocalizationProtocol sends an exception over the wire. This currently uses YarnRemoteException. Post YARN-627, this needs to be changed and a new serialized exception is required.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-633">YARN-633</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Change RMAdminProtocol api to throw IOException and YarnRemoteException</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-632">YARN-632</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Change ContainerManager api to throw IOException and YarnRemoteException</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-631">YARN-631</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Change ClientRMProtocol api to throw IOException and YarnRemoteException</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-630">YARN-630</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Change AMRMProtocol api to throw IOException and YarnRemoteException</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-629">YARN-629</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Make YarnRemoteException not be rooted at IOException</b><br>
<blockquote>After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-628">YARN-628</a>.
Major sub-task reported by Siddharth Seth and fixed by Siddharth Seth <br>
<b>Fix YarnException unwrapping</b><br>
<blockquote>Unwrapping of YarnRemoteExceptions (currently in YarnRemoteExceptionPBImpl, RPCUtil post YARN-625) is broken, and often ends up throwing an UndeclaredThrowableException. This needs to be fixed.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-625">YARN-625</a>.
Major sub-task reported by Siddharth Seth and fixed by Siddharth Seth <br>
<b>Move unwrapAndThrowException from YarnRemoteExceptionPBImpl to RPCUtil</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-618">YARN-618</a>.
Major bug reported by Jian He and fixed by Jian He <br>
<b>Modify RM_INVALID_IDENTIFIER to a -ve number</b><br>
<blockquote>RM_INVALID_IDENTIFIER set to 0 doesn't sound right, as many tests set it to 0. A negative number is probably what we want.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-617">YARN-617</a>.
Minor sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi <br>
<b>In unsecure mode, AM can fake resource requirements </b><br>
<blockquote>Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over the unauthenticated RM-NM channel.
At a minimum, this will avoid accidental bugs in AMs in unsecure mode.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-615">YARN-615</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>ContainerLaunchContext.containerTokens should simply be called tokens</b><br>
<blockquote>ContainerToken is the name of the specific token that AMs use to launch containers on NMs, so we should rename CLC.containerTokens to be simply tokens.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-613">YARN-613</a>.
Major sub-task reported by Bikas Saha and fixed by Omkar Vinit Joshi <br>
<b>Create NM proxy per NM instead of per container</b><br>
<blockquote>Currently a new NM proxy has to be created per container, since secure authentication uses a ContainerToken from the container.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-610">YARN-610</a>.
Blocker sub-task reported by Siddharth Seth and fixed by Omkar Vinit Joshi <br>
<b>ClientToken (ClientToAMToken) should not be set in the environment</b><br>
<blockquote>Similar to YARN-579, this can be set via ContainerTokens</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-605">YARN-605</a>.
Major bug reported by Hitesh Shah and fixed by Hitesh Shah <br>
<b>Failing unit test in TestNMWebServices when using git for source control </b><br>
<blockquote>Failed tests: testNode(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789
testNodeSlash(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789
testNodeDefault(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789
testNodeInfo(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789
testNodeInfoSlash(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789
testNodeInfoDefault(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789
testSingleNodesXML(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-600">YARN-600</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>Hook up cgroups CPU settings to the number of virtual cores allocated</b><br>
<blockquote>YARN-3 introduced CPU isolation and monitoring through cgroups. YARN-2 introduced CPU scheduling in the capacity scheduler, and YARN-326 will introduce it in the fair scheduler. The number of virtual cores allocated to a container should be used to weight the number of cgroups CPU shares given to it.</blockquote></li>
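<li> <b>Sketch for YARN-600: weighting CPU shares by virtual cores</b><br>
<blockquote>The weighting proposed in YARN-600 can be pictured as a simple multiplication. This is a minimal sketch under a stated assumption: each virtual core is given 1024 shares, which is the default cpu.shares value of a cgroups v1 group. The per-core weight, the class name and the method name are choices made for this example and are not taken from the issue.

```java
final class CpuSharesSketch {
    /** Assumed weight per virtual core (the cgroups v1 default cpu.shares value). */
    static final int SHARES_PER_VCORE = 1024;

    /** Computes the cpu.shares value for a container with the given number of virtual cores. */
    static int cpuShares(int virtualCores) {
        if (0 >= virtualCores) {
            throw new IllegalArgumentException("virtualCores must be positive: " + virtualCores);
        }
        // multiplyExact guards against silent overflow for absurdly large inputs.
        return Math.multiplyExact(virtualCores, SHARES_PER_VCORE);
    }

    public static void main(String[] args) {
        System.out.println("1 vcore: " + cpuShares(1));
        System.out.println("4 vcores: " + cpuShares(4));
    }
}
```

Because CPU shares are relative weights, a container with four virtual cores would receive four times the CPU time of a one-core container only when the CPU is contended.</blockquote></li>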
<li> <a href="https://issues.apache.org/jira/browse/YARN-599">YARN-599</a>.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Refactoring submitApplication in ClientRMService and RMAppManager</b><br>
<blockquote>Currently, ClientRMService#submitApplication calls RMAppManager#handle, which in turn calls RMAppManager#submitApplication directly, though the code looks like it is scheduling an APP_SUBMIT event.
In addition, the validation code before creating an RMApp instance is not well organized. Ideally, the dynamic validation, which depends on the RM's configuration, should be put in RMAppManager#submitApplication. RMAppManager#submitApplication is called by ClientRMService#submitApplication and RMAppManager#recover. Since the configuration may be changed after the RM restarts, the validation needs to be done again even in recovery mode. Therefore, resource request validation, which is based on min/max resource limits, should be moved from ClientRMService#submitApplication to RMAppManager#submitApplication. On the other hand, the static validation, which is independent of the RM's configuration, should be put in ClientRMService#submitApplication, because it only needs to be done once, during the first submission.
Furthermore, the try-catch flow in RMAppManager#submitApplication has a flaw: RMAppManager#submitApplication is not synchronized. If two application submissions with the same application ID enter the function, and one progresses to the completion of RMApp instantiation while the other progresses to the completion of putting the RMApp instance into rmContext, the slower submission will cause an exception due to the duplicate application ID. However, with the current code flow, the exception will cause the RMApp instance already in rmContext (belonging to the faster submission) to be rejected.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-598">YARN-598</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>Add virtual cores to queue metrics</b><br>
<blockquote>QueueMetrics includes allocatedMB, availableMB, pendingMB, reservedMB. It should have equivalents for CPU.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-597">YARN-597</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>TestFSDownload fails on Windows because of dependencies on tar/gzip/jar tools</b><br>
<blockquote>{{testDownloadArchive}}, {{testDownloadPatternJar}} and {{testDownloadArchiveZip}} fail with a similar Shell ExitCodeException:
{code}
testDownloadArchiveZip(org.apache.hadoop.yarn.util.TestFSDownload) Time elapsed: 480 sec &lt;&lt;&lt; ERROR!
org.apache.hadoop.util.Shell$ExitCodeException: bash: line 0: cd: /D:/svn/t/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/TestFSDownload: No such file or directory
gzip: 1: No such file or directory
at org.apache.hadoop.util.Shell.runCommand(Shell.java:377)
at org.apache.hadoop.util.Shell.run(Shell.java:292)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:497)
at org.apache.hadoop.yarn.util.TestFSDownload.createZipFile(TestFSDownload.java:225)
at org.apache.hadoop.yarn.util.TestFSDownload.testDownloadArchiveZip(TestFSDownload.java:503)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-595">YARN-595</a>.
Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Refactor fair scheduler to use common Resources</b><br>
<blockquote>resourcemanager.fair and resourcemanager.resources have two copies of basically the same code for operations on Resource objects</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-594">YARN-594</a>.
Major bug reported by Jian He and fixed by Jian He <br>
<b>Update test and add comments in YARN-534</b><br>
<blockquote>This jira is simply to add some comments in the patch YARN-534 and update the test case</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-593">YARN-593</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager)<br>
<b>container launch on Windows does not correctly populate classpath with new process's environment variables and localized resources</b><br>
<blockquote>On Windows, we must bundle the classpath of a launched container in an intermediate jar with a manifest. Currently, this logic incorrectly uses the nodemanager process's environment variables for substitution. Instead, it needs to use the new environment for the launched process. Also, the bundled classpath is missing some localized resources for directories, due to a quirk in the way {{File#toURI}} decides whether or not to append a trailing '/'.</blockquote></li>
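<li> <b>Sketch for YARN-593: the File#toURI trailing-slash behavior</b><br>
<blockquote>The File#toURI quirk mentioned in YARN-593 can be reproduced in isolation: the JDK appends a trailing '/' only when it can determine that the path is a directory, which it cannot do for a path that does not exist yet. The following is a minimal sketch; the class and helper names are hypothetical, and asDirectoryUri is one possible workaround, not necessarily the fix that was committed.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

final class ToUriTrailingSlashSketch {
    /** Creates a real, empty temporary directory for the demonstration. */
    static File newTempDir() {
        try {
            return Files.createTempDirectory("toUriSketch").toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if File#toURI produced a URI string ending in '/'. */
    static boolean uriEndsWithSlash(File f) {
        return f.toURI().toString().endsWith("/");
    }

    /** Hypothetical helper: always yields a directory-style URI string. */
    static String asDirectoryUri(File dir) {
        String uri = dir.toURI().toString();
        return uri.endsWith("/") ? uri : uri + "/";
    }

    public static void main(String[] args) {
        File existing = newTempDir();
        File notYetCreated = new File(existing, "not-created-yet");
        System.out.println("existing directory ends with slash: " + uriEndsWithSlash(existing));
        System.out.println("missing directory ends with slash: " + uriEndsWithSlash(notYetCreated));
        System.out.println("normalized: " + asDirectoryUri(notYetCreated));
        existing.delete();
    }
}
```

A classpath entry for a directory is only treated as a directory by URL-based class loading when it ends in '/', so an entry built from a directory that has not been created yet would be interpreted as a jar file.</blockquote></li>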
<li> <a href="https://issues.apache.org/jira/browse/YARN-591">YARN-591</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>RM recovery related records do not belong to the API</b><br>
<blockquote>We need to move ApplicationStateData and ApplicationAttemptStateData out into the resourcemanager module. They are not part of the public API.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-590">YARN-590</a>.
Major improvement reported by Vinod Kumar Vavilapalli and fixed by Mayank Bansal <br>
<b>Add an optional message to RegisterNodeManagerResponse as to why NM is being asked to resync or shutdown</b><br>
<blockquote>We should log such a message in the NM itself. This helps in debugging issues on the NM directly, instead of distributed debugging between the RM and NM, when such an action is received from the RM.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-586">YARN-586</a>.
Trivial bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Typo in ApplicationSubmissionContext#setApplicationId</b><br>
<blockquote>The parameter should be applicationId instead of appplicationId</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-585">YARN-585</a>.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>TestFairScheduler#testNotAllowSubmitApplication is broken due to YARN-514</b><br>
<blockquote>TestFairScheduler#testNotAllowSubmitApplication is broken due to YARN-514. See the discussions in YARN-514.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-583">YARN-583</a>.
Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Application cache files should be localized under local-dir/usercache/userid/appcache/appid/filecache</b><br>
<blockquote>Currently, application cache files are localized under local-dir/usercache/userid/appcache/appid/. However, they should be localized under the filecache subdirectory.</blockquote></li>
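<li> <b>Sketch for YARN-583: the two directory layouts</b><br>
<blockquote>The directory layout named in YARN-583 can be spelled out as path construction. This is a minimal sketch that only builds strings; the class name, method names and sample values are hypothetical, and '/' is used as the separator purely for readability.

```java
final class AppCachePathSketch {
    /** Layout before the change: files land directly in the application directory. */
    static String applicationDir(String localDir, String userId, String appId) {
        return String.join("/", localDir, "usercache", userId, "appcache", appId);
    }

    /** Layout after the change: files land in the filecache subdirectory. */
    static String applicationFileCacheDir(String localDir, String userId, String appId) {
        return applicationDir(localDir, userId, appId) + "/filecache";
    }

    public static void main(String[] args) {
        // Sample values invented for this illustration.
        System.out.println(applicationDir("/tmp/nm-local", "alice", "application_1"));
        System.out.println(applicationFileCacheDir("/tmp/nm-local", "alice", "application_1"));
    }
}
```
</blockquote></li>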
<li> <a href="https://issues.apache.org/jira/browse/YARN-582">YARN-582</a>.
Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)<br>
<b>Restore appToken and clientToken for app attempt after RM restart</b><br>
<blockquote>These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-581">YARN-581</a>.
Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)<br>
<b>Test and verify that app delegation tokens are added to tokenRenewer after RM restart</b><br>
<blockquote>The code already saves the delegation tokens in AppSubmissionContext. Upon restart the AppSubmissionContext is used to submit the application again and so restores the delegation tokens. This jira tracks testing and verifying this functionality in a secure setup.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-579">YARN-579</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>Make ApplicationToken part of Container's token list to help RM-restart</b><br>
<blockquote>Container is already persisted for helping RM restart. Instead of explicitly setting ApplicationToken in AM's env, if we change it to be in Container, we can avoid env and can also help restart.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-578">YARN-578</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi (nodemanager)<br>
<b>NodeManager should use SecureIOUtils for serving and aggregating logs</b><br>
<blockquote>Log servlets for serving logs and the ShuffleService for serving intermediate outputs both should use SecureIOUtils for avoiding symlink attacks.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-577">YARN-577</a>.
Major sub-task reported by Hitesh Shah and fixed by Hitesh Shah <br>
<b>ApplicationReport does not provide progress value of application</b><br>
<blockquote>An application sends its progress % to the RM via AllocateRequest. This should be able to be retrieved by a client via the ApplicationReport.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-576">YARN-576</a>.
Major bug reported by Hitesh Shah and fixed by Kenji Kikushima <br>
<b>RM should not allow registrations from NMs that do not satisfy minimum scheduler allocations</b><br>
<blockquote>If the minimum resource allocation configured for the RM scheduler is 1 GB, the RM should drop all NMs that register with a total capacity of less than 1 GB. </blockquote></li>
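<li> <b>Sketch for YARN-576: the registration check</b><br>
<blockquote>The rule requested in YARN-576 is a single comparison at registration time. The following is a minimal sketch, not the ResourceManager's code: only memory is considered, and the class, method and parameter names are hypothetical.

```java
final class NodeRegistrationSketch {
    /**
     * Returns true if a NodeManager with the given total memory may register,
     * i.e. it can host a container of at least the scheduler's minimum allocation.
     */
    static boolean acceptRegistration(int nodeTotalMb, int schedulerMinAllocationMb) {
        return nodeTotalMb >= schedulerMinAllocationMb;
    }

    public static void main(String[] args) {
        int minAllocationMb = 1024; // the 1 GB minimum from the description
        System.out.println("512 MB node accepted: " + acceptRegistration(512, minAllocationMb));
        System.out.println("8192 MB node accepted: " + acceptRegistration(8192, minAllocationMb));
    }
}
```

A node smaller than the minimum allocation could never be given a container, so rejecting it at registration keeps its capacity from being counted.</blockquote></li>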
<li> <a href="https://issues.apache.org/jira/browse/YARN-571">YARN-571</a>.
Major sub-task reported by Hitesh Shah and fixed by Omkar Vinit Joshi <br>
<b>User should not be part of ContainerLaunchContext</b><br>
<blockquote>Today, a user is expected to set the user name in the CLC when either submitting an application or launching a container from the AM. This does not make sense, as the user has already been identified by the RM as part of the RPC layer.
The solution would be to move the user information into either the Container object or directly into the ContainerToken, which can then be used by the NM to launch the container. This user information would be set into the container by the RM.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-569">YARN-569</a>.
Major sub-task reported by Carlo Curino and fixed by Carlo Curino (capacityscheduler)<br>
<b>CapacityScheduler: support for preemption (using a capacity monitor)</b><br>
<blockquote>There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to
application resource requests and node updates, and the more introspective, time-based considerations
needed to observe and correct for capacity balance. For this reason, instead of hacking the delicate
mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a "Capacity Monitor",
which can optionally be run as a separate service (much like the NMLivelinessMonitor).
The capacity monitor (similarly to equivalent functionality in the FairScheduler) runs at intervals
(e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler,
performs off-line computation to determine whether preemption is needed and how best to "edit" the current schedule to
improve capacity, and generates events that produce four possible actions:
# Container de-reservations
# Resource-based preemptions
# Container-based preemptions
# Container killing
The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals.
Note that due to the "lag" in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers
from a queue) rather than trying to tightly and consistently micromanage container allocations.
------------- Preemption policy (ProportionalCapacityPreemptionPolicy): -------------
Preemption policies are pluggable by design. In the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows:
# it gathers from the scheduler the state of the queues, in particular, their current capacity, guaranteed capacity and pending requests (*)
# if there are pending requests from queues that are under capacity it computes a new ideal balanced state (**)
# it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and
respecting bounds on the amount of preemption we allow for each round)
# it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order)
# it removes reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exist
# (if not enough) it issues preemptions for containers from the same application (reverse chronological order, last assigned container first), again until enough is reclaimed or until no containers except the AM container are left,
# (if not enough) it moves on to unreserve and preempt from the next application.
# containers that have been asked to preempt are tracked across executions. If a container is among those to be preempted for more than a certain time, the container is moved to the list of containers to be forcibly killed.
Notes:
(*) at the moment, in order to avoid double-counting of the requests, we only look at the "ANY" part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not any.
(**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among the queues that want some as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue and the function runs to a fixed point.
Tunables of the ProportionalCapacityPreemptionPolicy:
# observe-only mode (i.e., log the actions it would take, but behave as read-only)
# how frequently to run the policy
# how long to wait between preemption and kill of a container
# which fraction of the containers I would like to obtain should I preempt (has to do with the natural rate at which containers are returned)
# deadzone size, i.e., what % of over-capacity should I ignore (if we are off perfect balance by some small % we ignore it)
# overall amount of preemption we can afford for each run of the policy (in terms of total cluster capacity)
In our current experiments this set of tunables seems to be a good start for shaping the preemption action properly. More sophisticated preemption policies could take into account the different types of applications running, job priorities, cost of preemption, and the integral of capacity imbalance. This is very much a control-theory kind of problem, and some of the lessons on designing and tuning controllers are likely to apply.
Generality:
The monitor-based scheduler edit and the preemption mechanisms we introduced here are designed to be more general than enforcing capacity/fairness. In fact, we are considering other monitors that leverage the same idea of "schedule edits" to target different global properties (e.g., allocating enough resources to guarantee deadlines for important jobs, data-locality optimizations, IO-balancing among nodes, etc.).
Note that by default the preemption policy we describe is disabled in the patch.
Depends on YARN-45 and YARN-567, is related to YARN-568
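The "ideal balanced state" described in note (**) can be sketched as a small fixed-point computation. This is only an illustration under stated assumptions, not the code of ProportionalCapacityPreemptionPolicy: the class and method names are hypothetical, capacities are plain doubles in one shared unit, a queue's demand is taken to be its current use plus pending requests, and a queue whose demand is below its guarantee is only given its demand.

```java
import java.util.Arrays;

class IdealBalanceSketch {
    private static final double EPS = 1e-9;

    // Hypothetical helper: returns the ideal share per queue.
    // guaranteed[i] = guaranteed capacity, demand[i] = current use + pending.
    static double[] idealShares(double total, double[] guaranteed, double[] demand) {
        int n = guaranteed.length;
        double[] ideal = new double[n];
        double spare = total;
        // Step 1: every queue first gets its guarantee (capped by what it wants).
        for (int i = 0; i < n; i++) {
            ideal[i] = Math.min(guaranteed[i], demand[i]);
            spare -= ideal[i];
        }
        // Step 2: hand out spare capacity, weighted by guarantee, until a fixed point.
        while (spare > EPS) {
            double weight = 0;
            for (int i = 0; i < n; i++) {
                if (demand[i] - ideal[i] > EPS) weight += guaranteed[i];
            }
            if (weight == 0) break; // no queue with a nonzero guarantee wants more
            double handedOut = 0;
            for (int i = 0; i < n; i++) {
                double missing = demand[i] - ideal[i];
                if (missing > EPS) {
                    double offer = spare * guaranteed[i] / weight;
                    double taken = Math.min(offer, missing);
                    ideal[i] += taken;
                    handedOut += taken;
                }
            }
            spare -= handedOut;
            if (handedOut < EPS) break; // nothing changed: fixed point reached
        }
        return ideal;
    }

    public static void main(String[] args) {
        double[] guaranteed = {50, 30, 20};
        double[] demand = {55, 0, 100};
        System.out.println(Arrays.toString(idealShares(100, guaranteed, demand)));
    }
}
```

Under these assumptions, the difference between a queue's current capacity and its ideal share is what the policy would then try to reclaim, subject to the per-round bounds listed among the tunables.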
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-568">YARN-568</a>.
Major improvement reported by Carlo Curino and fixed by Carlo Curino (scheduler)<br>
<b>FairScheduler: support for work-preserving preemption </b><br>
<blockquote>In the attached patch, we modified the FairScheduler to substitute its preemption-by-killing with a work-preserving version of preemption (followed by killing if the AMs do not respond quickly enough). This should allow preemption checks to run more often while killing less often (proper tuning to be investigated). Depends on YARN-567 and YARN-45, is related to YARN-569.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-567">YARN-567</a>.
Major sub-task reported by Carlo Curino and fixed by Carlo Curino (resourcemanager)<br>
<b>RM changes to support preemption for FairScheduler and CapacityScheduler</b><br>
<blockquote>A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties. FairScheduler and CapacityScheduler take opposite stances on how to achieve this.
The FairScheduler leverages task-killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy but wasting useful work. The CapacityScheduler is typically tuned
to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work, but at the risk of keeping the cluster underutilized or having jobs wait to obtain their rightful capacity.
By introducing the notion of work-preserving preemption we can remove this tradeoff. This requires a protocol for preemption (YARN-45), ApplicationMasters that can respond to preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), and a scheduler that can issue preemption requests (discussed in separate JIRAs YARN-568 and YARN-569).
The changes we track with this JIRA are common to FairScheduler and CapacityScheduler, and are mostly propagation of preemption decisions through the ApplicationMasterService.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-563">YARN-563</a>.
Major sub-task reported by Thomas Weise and fixed by Mayank Bansal <br>
<b>Add application type to ApplicationReport </b><br>
<blockquote>This field is needed to distinguish different types of applications (app master implementations). For example, we may run applications of type XYZ in a cluster alongside MR and would like to filter applications by type.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-562">YARN-562</a>.
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>NM should reject containers allocated by previous RM</b><br>
<blockquote>It's possible that after RM shutdown, before the AM goes down, the AM still calls startContainer on an NM with containers allocated by the previous RM. When the RM comes back, the NM doesn't know whether this container launch request comes from the previous RM or the current RM. We should reject containers allocated by the previous RM.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-561">YARN-561</a>.
Major sub-task reported by Hitesh Shah and fixed by Xuan Gong <br>
<b>Nodemanager should set some key information into the environment of every container that it launches.</b><br>
<blockquote>Information such as containerId, nodemanager hostname, nodemanager port is not set in the environment when any container is launched.
For an AM, the RM does all of this for it but for a container launched by an application, all of the above need to be set by the ApplicationMaster.
At the minimum, container id would be a useful piece of information. If the container wishes to talk to its local NM, the nodemanager related information would also come in handy. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-557">YARN-557</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (applications)<br>
<b>TestUnmanagedAMLauncher fails on Windows</b><br>
<blockquote>{{TestUnmanagedAMLauncher}} fails on Windows due to attempting to run a Unix-specific command in distributed shell and use of a Unix-specific environment variable to determine username for the {{ContainerLaunchContext}}.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-553">YARN-553</a>.
Minor sub-task reported by Harsh J and fixed by Karthik Kambatla (client)<br>
<b>Have YarnClient generate a directly usable ApplicationSubmissionContext</b><br>
<blockquote>Right now, we're doing multiple steps to create a relevant ApplicationSubmissionContext for a pre-received GetNewApplicationResponse.
{code}
GetNewApplicationResponse newApp = yarnClient.getNewApplication();
ApplicationId appId = newApp.getApplicationId();
ApplicationSubmissionContext appContext = Records.newRecord(ApplicationSubmissionContext.class);
appContext.setApplicationId(appId);
{code}
A simplified way may be to have the GetNewApplicationResponse itself provide a helper method that builds a usable ApplicationSubmissionContext for us. Something like:
{code}
GetNewApplicationResponse newApp = yarnClient.getNewApplication();
ApplicationSubmissionContext appContext = newApp.generateApplicationSubmissionContext();
{code}
[The above method can also take an arg for the container launch spec, or perhaps pre-load defaults like min-resource, etc. in the returned object, aside from just associating the application ID automatically.]</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-549">YARN-549</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>YarnClient.submitApplication should wait for application to be accepted by the RM</b><br>
<blockquote>Currently, when submitting an application, storeApplication will be called for recovery. However, it is a blocking API, and is likely to block concurrent application submissions. Therefore, it is good to make application submission asynchronous, and postpone storeApplication. YarnClient needs to change to wait for the whole operation to complete so that clients can be notified after the application is really submitted. YarnClient needs to wait for application to reach SUBMITTED state or beyond.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-548">YARN-548</a>.
Major sub-task reported by Vadim Bondarev and fixed by Vadim Bondarev <br>
<b>Add tests for YarnUncaughtExceptionHandler</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-547">YARN-547</a>.
Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Race condition in Public / Private Localizer may result into resource getting downloaded again</b><br>
<blockquote>Public Localizer :
At present, when multiple containers try to request a localized resource:
* If the resource is not present, it is first created and resource localization starts (the LocalizedResource is in DOWNLOADING state).
* If multiple ResourceRequestEvents arrive while it is in this state, ResourceLocalizationEvents are sent for all of them.
Most of the time this does not result in a duplicate resource download, but there is a race condition. Inside ResourceLocalization (for public download) all the requests are added to a local attempts map. When a new request comes in, it is first checked against this map before a new download starts for it. For the current download, the request will be in the map. If a request for the same resource comes in, it will be rejected (i.e., the resource is already being downloaded). However, once the current download completes, the request is removed from this local map. If a LocalizerRequestEvent comes in after this removal, the resource will be downloaded again because it is no longer present in the local map.
PrivateLocalizer :
Here a different but similar race condition is present.
* Inside the findNextResource method call, each LocalizerRunner tries to grab a lock on the LocalizerResource. If the lock is not acquired, it keeps trying until the resource state changes to LOCALIZED. The lock is released by the LocalizerRunner when the download completes.
* If another ContainerLocalizer grabs the lock on a resource before the LocalizedResource state changes to LOCALIZED, the resource will be downloaded again.
In both places the root cause is that all the threads try to acquire the lock on the resource, but the current state of the LocalizedResource is not taken into consideration.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-542">YARN-542</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen <br>
<b>Change the default global AM max-attempts value to be not one</b><br>
<blockquote>Today, the global AM max-attempts is set to 1, which is a bad choice. AM max-attempts accounts for both AM-level failures and container crashes due to localization issues, lost nodes, etc. To account for AM crashes due to problems that are not caused by user code, mainly lost nodes, we want to give AMs some retries.
I propose we change it to at least two. We could change it to 4 to match other retry configs.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-541">YARN-541</a>.
Blocker bug reported by Krishna Kishore Bonagiri and fixed by Bikas Saha (resourcemanager)<br>
<b>getAllocatedContainers() is not returning all the allocated containers</b><br>
<blockquote>I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another.
My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha.
Thanks,
Kishore
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-539">YARN-539</a>.
Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>LocalizedResources are leaked in memory in case resource localization fails</b><br>
<blockquote>If resource localization fails, the resource remains in memory and is either
1) cleaned up the next time cache cleanup runs and there is a space crunch (if sufficient space is available in the cache, it will remain in memory), or
2) reused if a LocalizationRequest comes in again for the same resource.
I think that when resource localization fails, that event should be sent to the LocalResourceTracker, which will then remove the resource from its cache.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-538">YARN-538</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>RM address DNS lookup can cause unnecessary slowness on every JHS page load </b><br>
<blockquote>When I run the job history server locally, every page load takes tens of seconds. I profiled the process and discovered that all the extra time was spent inside YarnConfiguration#getRMWebAppURL, trying to resolve 0.0.0.0 to a hostname. When I changed my yarn.resourcemanager.address to localhost, the page load times decreased drastically.
There's no reason that we need to perform this resolution on every page load.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-536">YARN-536</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object</b><br>
<blockquote>Remove containerstate, containerStatus from container interface. They will not be called by container object</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-534">YARN-534</a>.
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>AM max attempts is not checked when RM restart and try to recover attempts</b><br>
<blockquote>Currently, AM max attempts is only checked when the current attempt fails, to decide whether to create a new attempt. If the RM restarts before the final allowed attempt fails, it does not clean the state store, so when the RM comes back it will retry the attempt again.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-532">YARN-532</a>.
Major bug reported by Siddharth Seth and fixed by Siddharth Seth <br>
<b>RMAdminProtocolPBClientImpl should implement Closeable</b><br>
<blockquote>Required for RPC.stopProxy to work. Already done in most of the other protocols. (MAPREDUCE-5117 addressing the one other protocol missing this)</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-530">YARN-530</a>.
Major sub-task reported by Steve Loughran and fixed by Steve Loughran <br>
<b>Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services</b><br>
<blockquote># Extend the YARN {{Service}} interface as discussed in YARN-117
# Implement the changes in {{AbstractService}} and {{FilterService}}.
# Migrate all services in yarn-common to the more robust service model, test.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-525">YARN-525</a>.
Major improvement reported by Thomas Graves and fixed by Thomas Graves (capacityscheduler)<br>
<b>make CS node-locality-delay refreshable</b><br>
<blockquote>The config yarn.scheduler.capacity.node-locality-delay doesn't change when you change the value in capacity-scheduler.xml and then run yarn rmadmin -refreshQueues.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-523">YARN-523</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Jian He <br>
<b>Container localization failures aren't reported from NM to RM</b><br>
<blockquote>This is mainly a pain on crashing AMs, but once we fix this, containers also can benefit - same fix for both.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-521">YARN-521</a>.
Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api)<br>
<b>Augment AM - RM client module to be able to request containers only at specific locations</b><br>
<blockquote>When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-518">YARN-518</a>.
Major improvement reported by Dapeng Sun and fixed by Sandy Ryza (documentation)<br>
<b>Fair Scheduler's document link could be added to the hadoop 2.x main doc page</b><br>
<blockquote>Currently the doc page for Fair Scheduler looks good and it&#8217;s here, http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html.
It would be better to add the document link to the YARN section in the Hadoop 2.x main doc page, so that users can easily find the doc and experimentally try Fair Scheduler, just as they can with Capacity Scheduler.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-515">YARN-515</a>.
Blocker bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans <br>
<b>Node Manager not getting the master key</b><br>
<blockquote>On branch-2 the latest version I see the following on a secure cluster.
{noformat}
2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Security enabled - updating secret keys now
2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as RM:PORT with total resource of &lt;memory:12288, vCores:16&gt;
2013-03-28 19:21:06,244 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is started.
2013-03-28 19:21:06,245 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started.
2013-03-28 19:21:07,257 [Node Status Updater] ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught exception in status-updater
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.security.BaseContainerTokenSecretManager.getCurrentKey(BaseContainerTokenSecretManager.java:121)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:407)
{noformat}
The Null pointer exception just keeps repeating and all of the nodes end up being lost. It looks like it never gets the secret key when it registers.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-514">YARN-514</a>.
Major sub-task reported by Bikas Saha and fixed by Zhijie Shen (resourcemanager)<br>
<b>Delayed store operations should not result in RM unavailability for app submission</b><br>
<blockquote>Currently, app submission is the only store operation performed synchronously because the app must be stored before the request returns with success. This makes the RM susceptible to blocking all client threads on slow store operations, resulting in RM being perceived as unavailable by clients.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-513">YARN-513</a>.
Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)<br>
<b>Create common proxy client for communicating with RM</b><br>
<blockquote>When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-512">YARN-512</a>.
Minor bug reported by Jason Lowe and fixed by Maysam Yabandeh (nodemanager)<br>
<b>Log aggregation root directory check is more expensive than it needs to be</b><br>
<blockquote>The log aggregation root directory check first does an {{exists}} call followed by a {{getFileStatus}} call. That effectively stats the file twice. It should just use {{getFileStatus}} and catch {{FileNotFoundException}} to handle the non-existent case.
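The single-lookup pattern described above can be sketched as follows. This is only an analogy under stated assumptions: it uses java.nio.file from the JDK as a stand-in so that it runs without Hadoop on the classpath, where NoSuchFileException plays the role that FileNotFoundException plays for getFileStatus in the description. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

class SingleStatCheck {
    // Hypothetical helper: one attribute lookup instead of exists() followed by a stat.
    static boolean isExistingDirectory(Path dir) throws IOException {
        try {
            BasicFileAttributes attrs = Files.readAttributes(dir, BasicFileAttributes.class);
            return attrs.isDirectory();
        } catch (NoSuchFileException e) {
            // The missing path is handled here rather than by a separate exists() call.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println(isExistingDirectory(tmp));
        System.out.println(isExistingDirectory(tmp.resolve("no-such-dir-for-sketch")));
    }
}
```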
In addition we may consider caching the presence of the directory rather than checking it each time a node aggregates logs for an application.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-507">YARN-507</a>.
Minor bug reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)<br>
<b>Add interface visibility and stability annotations to FS interfaces/classes</b><br>
<blockquote>Many FS classes/interfaces are missing annotations on visibility and stability.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-506">YARN-506</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute</b><br>
<blockquote>Move to common utils described in HADOOP-9413 that work well cross-platform.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-500">YARN-500</a>.
Major bug reported by Nishan Shetty and fixed by Kenji Kikushima (resourcemanager)<br>
<b>ResourceManager webapp is using next port if configured port is already in use</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-496">YARN-496</a>.
Minor bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Fair scheduler configs are refreshed inconsistently in reinitialize</b><br>
<blockquote>When FairScheduler#reinitialize is called, some of the scheduler-wide configs are refreshed and others aren't. They should all be refreshed.
Ones that are refreshed: userAsDefaultQueue, nodeLocalityThreshold, rackLocalityThreshold, preemptionEnabled
Ones that aren't: minimumAllocation, maximumAllocation, assignMultiple, maxAssign</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-495">YARN-495</a>.
Major bug reported by Jian He and fixed by Jian He <br>
<b>Change NM behavior of reboot to resync</b><br>
<blockquote>When a reboot command is sent from the RM, the node manager doesn't clean up the containers while it's stopping.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-493">YARN-493</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager)<br>
<b>NodeManager job control logic flaws on Windows</b><br>
<blockquote>Both product and test code contain some platform-specific assumptions, such as availability of bash for executing a command in a container and signals to check existence of a process and terminate it.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-491">YARN-491</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager)<br>
<b>TestContainerLogsPage fails on Windows</b><br>
<blockquote>{{TestContainerLogsPage}} contains some code for initializing a log directory that doesn't work correctly on Windows.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-490">YARN-490</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (applications/distributed-shell)<br>
<b>TestDistributedShell fails on Windows</b><br>
<blockquote>There are a few platform-specific assumptions in distributed shell (both main code and test code) that prevent it from working correctly on Windows.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-488">YARN-488</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager)<br>
<b>TestContainerManagerSecurity fails on Windows</b><br>
<blockquote>These tests are failing to launch containers correctly when running on Windows.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-487">YARN-487</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager)<br>
<b>TestDiskFailures fails on Windows due to path mishandling</b><br>
<blockquote>{{TestDiskFailures#testDirFailuresOnStartup}} fails due to insertion of an extra leading '/' on the path within {{LocalDirsHandlerService}} when running on Windows. The test assertions also fail to account for the fact that {{Path}} normalizes '\' to '/'.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-486">YARN-486</a>.
Major sub-task reported by Bikas Saha and fixed by Xuan Gong <br>
<b>Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land</b><br>
<blockquote>Currently, id, resource request, etc. need to be copied over from Container to ContainerLaunchContext. This can be brittle. It also leads to duplication of information (such as Resource from CLC and Resource from Container and Container.tokens). Sending Container directly to startContainer solves these problems. It also keeps the CLC clean by only having in it what is set by the client/AM.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-485">YARN-485</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>TestProcfsProcessTree#testProcessTree() doesn't wait long enough for the process to die</b><br>
<blockquote>TestProcfsProcessTree#testProcessTree fails occasionally with the following stack trace
{noformat}
Stack Trace:
junit.framework.AssertionFailedError: expected:&lt;false&gt; but was:&lt;true&gt;
&#160; &#160; &#160; &#160; at org.apache.hadoop.util.TestProcfsBasedProcessTree.testProcessTree(TestProcfsBasedProcessTree.java)
{noformat}
kill -9 is executed asynchronously: the signal is delivered when the process comes out of the kernel (sys call). Checking whether the process has died immediately afterwards can fail at times.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-482">YARN-482</a>.
Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)<br>
<b>FS: Extend SchedulingMode to intermediate queues</b><br>
<blockquote>FS allows setting {{SchedulingMode}} for leaf queues. Extending this to non-leaf queues allows using different kinds of fairness: e.g., root can have three child queues - fair-mem, drf-cpu-mem, drf-cpu-disk-mem taking different number of resources into account. In turn, this allows users to decide on the scheduling latency vs sophistication of the scheduling mode.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-481">YARN-481</a>.
Major bug reported by Chris Riccomini and fixed by Chris Riccomini (client)<br>
<b>Add AM Host and RPC Port to ApplicationCLI Status Output</b><br>
<blockquote>Hey Guys,
I noticed that the ApplicationCLI is just randomly not printing some of the values in the ApplicationReport. I've added the getHost and getRpcPort. These are useful for me, since I want to make an RPC call to the AM (not the tracker call).
Thanks!
Chris</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-479">YARN-479</a>.
Major bug reported by Hitesh Shah and fixed by Jian He <br>
<b>NM retry behavior for connection to RM should be similar for lost heartbeats</b><br>
<blockquote>Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-476">YARN-476</a>.
Minor bug reported by Jason Lowe and fixed by Sandy Ryza <br>
<b>ProcfsBasedProcessTree info message confuses users</b><br>
<blockquote>ProcfsBasedProcessTree has a habit of emitting not-so-helpful messages such as the following:
{noformat}
2013-03-13 12:41:51,957 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28747 may have finished in the interim.
2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28978 may have finished in the interim.
2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28979 may have finished in the interim.
{noformat}
As described in MAPREDUCE-4570, this is something that naturally occurs in the process of monitoring processes via procfs. It's uninteresting at best and can confuse users who think it's a reason their job isn't running as expected when it appears in their logs.
We should either make this DEBUG or remove it entirely.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-475">YARN-475</a>.
Major sub-task reported by Hitesh Shah and fixed by Hitesh Shah <br>
<b>Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment</b><br>
<blockquote>AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id. </blockquote></li>
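<li>As a rough illustration of deriving the attempt from the container id, the sketch below splits a container id string into its parts. It assumes the layout seen in the YARN-345 log excerpt in these notes (container_1358385982671_1304_01_000001), read as cluster timestamp, application id, attempt id and container sequence. The class and field names are hypothetical; a real AM would more likely rely on the YARN records API (for example ContainerId#getApplicationAttemptId()) than on string parsing.

```java
// Illustrative sketch only; names are hypothetical and not part of YARN.
final class AttemptIdSketch {

    /** Holder for the pieces of a container id string. */
    static final class Parsed {
        final long clusterTimestamp;
        final int appId;
        final int attemptId;
        final long containerSeq;

        Parsed(long clusterTimestamp, int appId, int attemptId, long containerSeq) {
            this.clusterTimestamp = clusterTimestamp;
            this.appId = appId;
            this.attemptId = attemptId;
            this.containerSeq = containerSeq;
        }
    }

    /** Parses "container_TIMESTAMP_APPID_ATTEMPTID_SEQ" (assumed layout). */
    static Parsed parse(String containerId) {
        String[] parts = containerId.split("_");
        if (parts.length != 5 || !parts[0].equals("container")) {
            throw new IllegalArgumentException("Unexpected container id: " + containerId);
        }
        return new Parsed(
                Long.parseLong(parts[1]),
                Integer.parseInt(parts[2]),
                Integer.parseInt(parts[3]),
                Long.parseLong(parts[4]));
    }

    public static void main(String[] args) {
        Parsed p = parse("container_1358385982671_1304_01_000001");
        System.out.println("application " + p.appId + ", attempt " + p.attemptId);
    }
}
```

The attempt id is simply the fourth underscore-separated field under this assumed layout.</li>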
<li> <a href="https://issues.apache.org/jira/browse/YARN-474">YARN-474</a>.
Major bug reported by Hitesh Shah and fixed by Zhijie Shen (capacityscheduler)<br>
<b>CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed</b><br>
<blockquote>Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues.
The 2 applications not yet in running state do not get launched even though limits are increased.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-469">YARN-469</a>.
Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)<br>
<b>Make scheduling mode in FS pluggable</b><br>
<blockquote>Currently, scheduling mode in FS is limited to Fair and FIFO. The code typically has an if condition at multiple places to determine the correct course of action.
Making the scheduling mode pluggable helps in simplifying this process, particularly as we add new modes (DRF in this case).</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-468">YARN-468</a>.
Major sub-task reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov <br>
<b>coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter </b><br>
<blockquote>coverage fix org.apache.hadoop.yarn.server.webproxy.amfilter
patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-467">YARN-467</a>.
Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi (nodemanager)<br>
<b>Jobs fail during resource localization when public distributed-cache hits unix directory limits</b><br>
<blockquote>If multiple jobs use the distributed cache with small files, the directory limit is reached before the cache size limit, and no further directories can be created in the file cache (PUBLIC). The jobs start failing with the exception below.
java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed
at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
We need a mechanism that creates a directory hierarchy and limits the number of files per directory.</blockquote></li>
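<li>One possible shape for such a hierarchy is sketched below. This is not the NodeManager's actual layout; the "d" directory prefix, the fixed depth and the perDir parameter are all assumptions made for illustration. Each leaf directory receives at most perDir entries, and each intermediate directory at most perDir subdirectories.

```java
// Illustrative sketch only; not the scheme used by the NodeManager.
final class CacheDirSketch {

    /**
     * Maps a numeric resource id to a relative path with a fixed number of
     * directory levels, so that no directory holds more than perDir entries.
     */
    static String pathFor(long id, int perDir, int levels) {
        if (id < 0 || perDir < 2 || levels < 1) {
            throw new IllegalArgumentException("invalid arguments");
        }
        long bucket = id / perDir;            // which leaf directory the id falls into
        String[] dirs = new String[levels];
        for (int i = levels - 1; i >= 0; i--) {
            dirs[i] = "d" + (bucket % perDir); // one base-perDir digit per level
            bucket /= perDir;
        }
        if (bucket != 0) {
            throw new IllegalArgumentException("id exceeds the capacity of the hierarchy");
        }
        return String.join("/", dirs) + "/" + id;
    }

    public static void main(String[] args) {
        System.out.println(pathFor(12345L, 100, 2));
    }
}
```

With perDir = 100 and two levels, the hierarchy holds up to 100 * 100 * 100 entries before the sketch rejects further ids.</li>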
<li> <a href="https://issues.apache.org/jira/browse/YARN-460">YARN-460</a>.
Blocker bug reported by Thomas Graves and fixed by Thomas Graves (capacityscheduler)<br>
<b>CS user left in list of active users for the queue even when application finished</b><br>
<blockquote>We have seen a user get left in the queue's list of active users even though the application was removed. This can cause everyone else in the queue to get fewer resources if using the minimum user limit percent config.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-458">YARN-458</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager , resourcemanager)<br>
<b>YARN daemon addresses must be placed in many different configs</b><br>
<blockquote>The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address
A new user trying to configure a cluster needs to know the names of all these four configs.
The same issue exists for nodemanagers.
It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports for the other ones would kick in.</blockquote></li>
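<li>The idea can be sketched in plain Java as follows. The four property names come from the description above; the default ports (8032, 8030, 8031, 8033) are assumptions based on the usual yarn-default.xml values and should be checked against the release in use. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; not the YarnConfiguration implementation.
final class RmAddressSketch {

    /**
     * Builds the four RM addresses from a single hostname, letting any
     * explicitly configured address take precedence over the derived one.
     */
    static Map<String, String> deriveAddresses(String hostname, Map<String, String> explicit) {
        Map<String, Integer> defaultPorts = new LinkedHashMap<>();
        defaultPorts.put("yarn.resourcemanager.address", 8032);                  // assumed default
        defaultPorts.put("yarn.resourcemanager.scheduler.address", 8030);        // assumed default
        defaultPorts.put("yarn.resourcemanager.resource-tracker.address", 8031); // assumed default
        defaultPorts.put("yarn.resourcemanager.admin.address", 8033);            // assumed default

        Map<String, String> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : defaultPorts.entrySet()) {
            String override = explicit.get(e.getKey());
            resolved.put(e.getKey(), override != null ? override : hostname + ":" + e.getValue());
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> explicit = new LinkedHashMap<>();
        explicit.put("yarn.resourcemanager.admin.address", "admin.example.com:9000");
        deriveAddresses("rm.example.com", explicit)
                .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

A new user would then only need to supply the hostname, overriding individual addresses where a non-default port is required.</li>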
<li> <a href="https://issues.apache.org/jira/browse/YARN-450">YARN-450</a>.
Major sub-task reported by Bikas Saha and fixed by Zhijie Shen <br>
<b>Define value for * in the scheduling protocol</b><br>
<blockquote>The ResourceRequest has a string field to specify node/rack locations. For the cross-rack/cluster-wide location (i.e., when there is no locality constraint) the "*" string is used everywhere. However, it's not defined anywhere, and each piece of code either defines a local constant or uses the string literal. Defining "*" in the protocol and removing other local references from the code base would be good.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-448">YARN-448</a>.
Major bug reported by Kihwal Lee and fixed by Kihwal Lee (nodemanager)<br>
<b>Remove unnecessary hflush from log aggregation</b><br>
<blockquote>AggregatedLogFormat#writeVersion() calls hflush() after writing the version. Calling hflush does not seem to be necessary. It can add a lot of load to hdfs in a big busy cluster.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-447">YARN-447</a>.
Minor improvement reported by nemon lou and fixed by nemon lou (scheduler)<br>
<b>applicationComparator improvement for CS</b><br>
<blockquote>The compare code is currently:
return a1.getApplicationId().getId() - a2.getApplicationId().getId();
It will be replaced with:
return a1.getApplicationId().compareTo(a2.getApplicationId());
This brings some benefits:
1. It leaves the applicationId comparison logic to the ApplicationId class.
2. In a future HA mode, the cluster timestamp may change; the ApplicationId class already takes care of this condition.</blockquote></li>
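<li>The difference can be seen with a stand-in class. AppIdSketch below is hypothetical and only mimics an id made of a cluster timestamp plus a sequence number; the ordering it implements (timestamp first, then id) is an assumption about how a full comparison would behave, not a copy of ApplicationId.

```java
// Illustrative sketch only; AppIdSketch is a stand-in, not YARN's ApplicationId.
final class AppIdSketch implements Comparable<AppIdSketch> {
    final long clusterTimestamp;
    final int id;

    AppIdSketch(long clusterTimestamp, int id) {
        this.clusterTimestamp = clusterTimestamp;
        this.id = id;
    }

    /** Full comparison: cluster timestamp first, then the sequence number. */
    @Override
    public int compareTo(AppIdSketch other) {
        int byTimestamp = Long.compare(clusterTimestamp, other.clusterTimestamp);
        return byTimestamp != 0 ? byTimestamp : Integer.compare(id, other.id);
    }

    /** The older style of comparison, which looks at the sequence number only. */
    static int compareByIdOnly(AppIdSketch a, AppIdSketch b) {
        return a.id - b.id;
    }

    public static void main(String[] args) {
        AppIdSketch beforeRestart = new AppIdSketch(1000L, 7);
        AppIdSketch afterRestart = new AppIdSketch(2000L, 1);
        System.out.println("id only: " + compareByIdOnly(beforeRestart, afterRestart));
        System.out.println("full:    " + beforeRestart.compareTo(afterRestart));
    }
}
```

When the cluster timestamp changes, the id-only comparison orders the later application first, while the full comparison keeps submission order.</li>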
<li> <a href="https://issues.apache.org/jira/browse/YARN-444">YARN-444</a>.
Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api , applications/distributed-shell)<br>
<b>Move special container exit codes from YarnConfiguration to API</b><br>
<blockquote>YarnConfiguration currently contains the special container exit codes INVALID_CONTAINER_EXIT_STATUS = -1000, ABORTED_CONTAINER_EXIT_STATUS = -100, and DISKS_FAILED = -101.
These are not really related to configuration, and YarnConfiguration should not become a place to put miscellaneous constants.
Per discussion on YARN-417, appmaster writers need to be able to provide special handling for them, so it might make sense to move these to their own user-facing class.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-441">YARN-441</a>.
Major sub-task reported by Siddharth Seth and fixed by Xuan Gong <br>
<b>Clean up unused collection methods in various APIs</b><br>
<blockquote>There's a bunch of unused methods like getAskCount() and getAsk(index) in AllocateRequest, and other interfaces. These should be removed.
In YARN, they were found in the following classes. MR will have its own set.
AllocateRequest
StartContainerResponse</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-440">YARN-440</a>.
Major sub-task reported by Siddharth Seth and fixed by Xuan Gong <br>
<b>Flatten RegisterNodeManagerResponse</b><br>
<blockquote>RegisterNodeManagerResponse has another wrapper RegistrationResponse under it, which can be removed.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-439">YARN-439</a>.
Major sub-task reported by Siddharth Seth and fixed by Xuan Gong <br>
<b>Flatten NodeHeartbeatResponse</b><br>
<blockquote>NodeHeartbeatResponse has another wrapper HeartbeatResponse under it, which can be removed.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-426">YARN-426</a>.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)<br>
<b>Failure to download a public resource on a node prevents further downloads of the resource from that node</b><br>
<blockquote>If the NM encounters an error while downloading a public resource, it fails to empty the list of request events corresponding to the resource request in {{attempts}}. If the same public resource is subsequently requested on that node, {{PublicLocalizer.addResource}} will skip the download since it will mistakenly believe a download of that resource is already in progress. At that point any container that requests the public resource will just hang in the {{LOCALIZING}} state.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-422">YARN-422</a>.
Major sub-task reported by Bikas Saha and fixed by Zhijie Shen <br>
<b>Add NM client library</b><br>
<blockquote>Create a simple wrapper over the ContainerManager protocol to hide the details of the protocol implementation.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-417">YARN-417</a>.
Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api , applications)<br>
<b>Create AMRMClient wrapper that provides asynchronous callbacks</b><br>
<blockquote>Writing AMs would be easier for some if they did not have to handle heartbeating to the RM on their own.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-412">YARN-412</a>.
Minor bug reported by Roger Hoover and fixed by Roger Hoover (scheduler)<br>
<b>FifoScheduler incorrectly checking for node locality</b><br>
<blockquote>In the FifoScheduler, the assignNodeLocalContainers method is checking if the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. This seems to be incorrect as it should be checking hostname instead. The offending line of code is 455:
application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234).
In the CapacityScheduler, it's done using hostname. See LeafQueue.assignNodeLocalContainers, line 1129
application.getResourceRequest(priority, node.getHostName());
Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler because even though it incorrectly determines that a request is not local to the node, it will still schedule the request immediately because it's rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node local.</blockquote></li>
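<li>The mismatch can be reproduced with a plain map keyed by hostname. The sketch below is not scheduler code: the map stands in for an application's outstanding requests, and hostOf is a hypothetical helper. The actual fix would use the node's hostname accessor rather than stripping the port from a string.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not FifoScheduler code.
final class LocalityLookupSketch {

    /** Strips a trailing ":port" from a node address, if present. */
    static String hostOf(String nodeAddress) {
        int colon = nodeAddress.lastIndexOf(':');
        return colon < 0 ? nodeAddress : nodeAddress.substring(0, colon);
    }

    /** Returns the number of outstanding requests for the key, or 0 if none match. */
    static int pendingFor(Map<String, Integer> requestsByHost, String key) {
        Integer count = requestsByHost.get(key);
        return count == null ? 0 : count;
    }

    public static void main(String[] args) {
        Map<String, Integer> requestsByHost = new HashMap<>();
        requestsByHost.put("host1.foo.com", 3);

        String nodeAddress = "host1.foo.com:1234";
        System.out.println("by node address: " + pendingFor(requestsByHost, nodeAddress));
        System.out.println("by hostname:     " + pendingFor(requestsByHost, hostOf(nodeAddress)));
    }
}
```

Looking up by node address finds nothing because the port is part of the key, which matches the underreporting of node-local tasks described above.</li>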
<li> <a href="https://issues.apache.org/jira/browse/YARN-410">YARN-410</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi <br>
<b>New lines in diagnostics for a failed app on the per-application page make it hard to read</b><br>
<blockquote>We need to fix the following issues on YARN web-UI:
- Remove the "Note" column from the application list. When a failure happens, this "Note" spoils the table layout.
- When the Application is still not running, the Tracking UI should be titled "UNASSIGNED"; for some reason it is titled "ApplicationMaster" but (correctly) links to "#".
- The per-application page has all the RM related information like version, start-time etc. Must be some accidental change by one of the patches.
- The diagnostics for a failed app on the per-application page don't retain new lines and wrap'em around - looks hard to read.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-406">YARN-406</a>.
Minor improvement reported by Hitesh Shah and fixed by Hitesh Shah <br>
<b>TestRackResolver fails when local network resolves "host1" to a valid host</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-400">YARN-400</a>.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
<b>RM can return null application resource usage report leading to NPE in client</b><br>
<blockquote>RMAppImpl.createAndGetApplicationReport can return a report with a null resource usage report if full access to the app is allowed but the application has no current attempt. This leads to NPEs in client code that assumes an app report will always have at least an empty resource usage report.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-398">YARN-398</a>.
Major sub-task reported by Arun C Murthy and fixed by Arun C Murthy <br>
<b>Enhance CS to allow for white-list of resources</b><br>
<blockquote>Allow white-list and black-list of resources in scheduler api.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-396">YARN-396</a>.
Major sub-task reported by Bikas Saha and fixed by Zhijie Shen <br>
<b>Rationalize AllocateResponse in RM scheduler API</b><br>
<blockquote>AllocateResponse contains an AMResponse and cluster node count. AMResponse has more data. Unless there is a good reason for this object structure, there should be either AMResponse or AllocateResponse.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-392">YARN-392</a>.
Major sub-task reported by Bikas Saha and fixed by Sandy Ryza (resourcemanager)<br>
<b>Make it possible to specify hard locality constraints in resource requests</b><br>
<blockquote>Currently it's not possible to specify scheduling requests for specific nodes and nowhere else. The RM automatically relaxes locality to rack and * and assigns non-specified machines to the app.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-391">YARN-391</a>.
Trivial improvement reported by Steve Loughran and fixed by Steve Loughran (nodemanager)<br>
<b>detabify LCEResourcesHandler classes</b><br>
<blockquote>The LCEResourcesHandler classes from YARN-3 have had some tab chars sneak into the source tree. Fix this before that code starts getting branched off and it's too late.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-390">YARN-390</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (client)<br>
<b>ApplicationCLI and NodeCLI use hard-coded platform-specific line separator, which causes test failures on Windows</b><br>
<blockquote>{{ApplicationCLI}}, {{NodeCLI}}, and the corresponding test {{TestYarnCLI}} all use a hard-coded '\n' as the line separator. This causes test failures on Windows.
</blockquote></li>
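<li>A portable alternative to a hard-coded '\n' is to ask the platform for its separator. The sketch below shows the general technique with System.lineSeparator(); the class and method names are hypothetical, and the actual patch may have taken a different route (for example writing through a PrintWriter).

```java
// Illustrative sketch only; not the ApplicationCLI or NodeCLI code.
final class LineSeparatorSketch {

    /** Joins the lines using the platform line separator after each line. */
    static String render(String... lines) {
        String separator = System.lineSeparator();
        StringBuilder out = new StringBuilder();
        for (String line : lines) {
            out.append(line).append(separator);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render("Node Report :", "Node-State : RUNNING"));
    }
}
```

A test comparing against this output should build its expected string with the same separator rather than a literal '\n'.</li>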
<li> <a href="https://issues.apache.org/jira/browse/YARN-387">YARN-387</a>.
Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>Fix inconsistent protocol naming</b><br>
<blockquote>We now have different and inconsistent naming schemes for various protocols. It was hard to explain to users, mainly in direct interactions at talks/presentations and user group meetings, with such naming.
We should fix these before we go beta. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-385">YARN-385</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (api)<br>
<b>ResourceRequestPBImpl's toString() is missing location and # containers</b><br>
<blockquote>ResourceRequestPBImpl's toString method includes priority and resource capability, but omits location and number of containers.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-383">YARN-383</a>.
Minor bug reported by Hitesh Shah and fixed by Hitesh Shah <br>
<b>AMRMClientImpl should handle null rmClient in stop()</b><br>
<blockquote>2013-02-06 09:31:33,813 INFO [Thread-2] service.CompositeService (CompositeService.java:stop(101)) - Error stopping org.apache.hadoop.yarn.client.AMRMClientImpl
org.apache.hadoop.HadoopIllegalArgumentException: Cannot close proxy since it is null
at org.apache.hadoop.ipc.RPC.stopProxy(RPC.java:605)
at org.apache.hadoop.yarn.client.AMRMClientImpl.stop(AMRMClientImpl.java:150)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-382">YARN-382</a>.
Major improvement reported by Thomas Graves and fixed by Zhijie Shen (scheduler)<br>
<b>SchedulerUtils improve way normalizeRequest sets the resource capabilities</b><br>
<blockquote>In YARN-370, we changed it from setting the capability to directly setting memory and cores:
- ask.setCapability(normalized);
+ ask.getCapability().setMemory(normalized.getMemory());
+ ask.getCapability().setVirtualCores(normalized.getVirtualCores());
We did this because it is directly setting the values in the original resource object passed in when the AM gets allocated and without it the AM doesn't get the resource normalized correctly in the submission context. See YARN-370 for more details.
I think we should find a better way of doing this long term: first, so we don't have to keep adding things there when new resources are added; second, because it's a bit confusing as to what it's doing and prone to someone accidentally breaking it in the future again. Something closer to what Arun suggested in YARN-370 would be better, but we need to make sure all the places work and get some more testing on it before putting it in. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-381">YARN-381</a>.
Minor improvement reported by Eli Collins and fixed by Sandy Ryza (documentation)<br>
<b>Improve FS docs</b><br>
<blockquote>The MR2 FS docs could use some improvements.
Configuration:
- sizebasedweight - what is the "size" here? Total memory usage?
Pool properties:
- minResources - what does min amount of aggregate memory mean given that this is not a reservation?
- maxResources - is this a hard limit?
- weight: How is this ratio configured? Eg base is 1 and all weights are relative to that?
- schedulingMode - what is the default? Is fifo pure fifo, eg waits until all tasks for the job are finished before launching the next job?
There's no mention of ACLs, even though they're supported. See the CS docs for comparison.
Also there are a couple typos worth fixing while we're at it, eg "finish. apps to run"
Worth keeping in mind that some of these will need to be updated to reflect that resource calculators are now pluggable.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-380">YARN-380</a>.
Major bug reported by Thomas Graves and fixed by Omkar Vinit Joshi (client)<br>
<b>yarn node -status prints Last-Last-Health-Update</b><br>
<blockquote>I assume the Last-Last-Health-Update is a typo and it should just be Last-Health-Update.
$ yarn node -status foo.com:8041
Node Report :
Node-Id : foo.com:8041
Rack : /10.10.10.0
Node-State : RUNNING
Node-Http-Address : foo.com:8042
Health-Status(isNodeHealthy) : true
Last-Last-Health-Update : 1360118400219
Health-Report :
Containers : 0
Memory-Used : 0M
Memory-Capacity : 24576</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-378">YARN-378</a>.
Major sub-task reported by xieguiming and fixed by Zhijie Shen (client , resourcemanager)<br>
<b>ApplicationMaster retry times should be set by Client</b><br>
<blockquote>We should support different clients or users having different ApplicationMaster retry times. In other words, "yarn.resourcemanager.am.max-retries" should be settable by the client. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-377">YARN-377</a>.
Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Chris Nauroth <br>
<b>Fix TestContainersMonitor for HADOOP-9252</b><br>
<blockquote>HADOOP-9252 slightly changed the format of some StringUtils outputs. It caused TestContainersMonitor to fail.
Also, some methods were deprecated by HADOOP-9252. The use of them should be replaced with the new methods.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-376">YARN-376</a>.
Blocker bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
<b>Apps that have completed can appear as RUNNING on the NM UI</b><br>
<blockquote>On a busy cluster we've noticed a growing number of applications appear as RUNNING on a nodemanager's web pages even though the applications have long since finished. Looking at the NM logs, it appears the RM never told the nodemanager that the application had finished. This is also reflected in a jstack of the NM process, since many more log aggregation threads are running than one would expect from the number of actively running applications.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-369">YARN-369</a>.
Major sub-task reported by Hitesh Shah and fixed by Mayank Bansal (resourcemanager)<br>
<b>Handle ( or throw a proper error when receiving) status updates from application masters that have not registered</b><br>
<blockquote>Currently, an allocate call from an unregistered application is allowed and the status update for it throws a statemachine error that is silently dropped.
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
at java.lang.Thread.run(Thread.java:680)
ApplicationMasterService should likely throw an appropriate error for applications' requests that should not be handled in such cases.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-368">YARN-368</a>.
Trivial bug reported by Albert Chu and fixed by Albert Chu <br>
<b>Fix typo "defiend" should be "defined" in error output</b><br>
<blockquote>Noticed the following in an error log output while doing some experiments
./1066018/nodes/hyperion987/log/yarn-achu-nodemanager-hyperion987.out:java.lang.RuntimeException: No class defiend for uda.shuffle
"defiend" should be "defined"
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-365">YARN-365</a>.
Major sub-task reported by Siddharth Seth and fixed by Xuan Gong (resourcemanager , scheduler)<br>
<b>Each NM heartbeat should not generate an event for the Scheduler</b><br>
<blockquote>Follow up from YARN-275
https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-363">YARN-363</a>.
Major bug reported by Jason Lowe and fixed by Kenji Kikushima <br>
<b>yarn proxyserver fails to find webapps/proxy directory on startup</b><br>
<blockquote>Starting up the proxy server fails with this error:
{noformat}
2013-01-29 17:37:41,357 FATAL webproxy.WebAppProxy (WebAppProxy.java:start(99)) - Could not start proxy web server
java.io.FileNotFoundException: webapps/proxy not found in CLASSPATH
at org.apache.hadoop.http.HttpServer.getWebAppsPath(HttpServer.java:533)
at org.apache.hadoop.http.HttpServer.&lt;init&gt;(HttpServer.java:225)
at org.apache.hadoop.http.HttpServer.&lt;init&gt;(HttpServer.java:164)
at org.apache.hadoop.yarn.server.webproxy.WebAppProxy.start(WebAppProxy.java:90)
at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer.main(WebAppProxyServer.java:94)
{noformat}
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-362">YARN-362</a>.
Minor bug reported by Jason Lowe and fixed by Ravi Prakash <br>
<b>Unexpected extra results when using webUI table search</b><br>
<blockquote>When using the search box on the web UI to search for a specific task number (e.g.: "0831"), sometimes unexpected extra results are shown. Using the web browser's built-in search-within-page does not show any hits, so these look like completely spurious results.
It looks like the raw timestamp value for time columns, which is not shown in the table, is also being searched with the search box.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-347">YARN-347</a>.
Major improvement reported by Junping Du and fixed by Junping Du (client)<br>
<b>YARN CLI should show CPU info besides memory info in node status</b><br>
<blockquote>With YARN-2 checked in, CPU info is taken into consideration in resource scheduling. yarn node -status &lt;NodeID&gt; should show CPU used and capacity info, as it does for memory.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-345">YARN-345</a>.
Critical bug reported by Devaraj K and fixed by Robert Parker (nodemanager)<br>
<b>Many InvalidStateTransitonException errors for ApplicationImpl in Node Manager</b><br>
<blockquote>{code:xml}
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
{code}
{code:xml}
2013-01-17 04:03:46,726 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at APPLICATION_RESOURCES_CLEANINGUP
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
{code}
{code:xml}
2013-01-17 00:01:11,006 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHING_CONTAINERS_WAIT
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
{code}
{code:xml}
2013-01-17 10:56:36,975 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1358385982671_1304_01_000001 transitioned from NEW to DONE
2013-01-17 10:56:36,975 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_CONTAINER_FINISHED at FINISHED
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
2013-01-17 10:56:36,975 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1358385982671_1304 transitioned from FINISHED to null
{code}
{code:xml}
2013-01-17 10:56:36,026 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: INIT_CONTAINER at FINISHED
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
2013-01-17 10:56:36,026 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1358385982671_1304 transitioned from FINISHED to null
{code}
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-333">YARN-333</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Schedulers cannot control the queue-name of an application</b><br>
<blockquote>Currently, if an app is submitted without a queue, RMAppManager sets the RMApp's queue to "default".
A scheduler may wish to make its own decision on which queue to place an app in if none is specified. For example, when the fair scheduler user-as-default-queue config option is set to true, and an app is submitted with no queue specified, the fair scheduler should assign the app to a queue with the user's name.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-326">YARN-326</a>.
Major new feature reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Add multi-resource scheduling to the fair scheduler</b><br>
<blockquote>With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness.
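As a rough illustration of dominant resource fairness: an application's dominant share is the largest fraction it holds of any single cluster resource, and the scheduler favours the application with the smallest dominant share. The sketch below assumes two resources (memory and cores); the class and method names are hypothetical and are not taken from either scheduler.

```java
public class DominantShareSketch {
    // Dominant share: the largest fraction of any one cluster resource
    // that the application currently holds.
    static double dominantShare(long memUsedMb, long memTotalMb,
                                int coresUsed, int coresTotal) {
        double memShare = (double) memUsedMb / memTotalMb;
        double cpuShare = (double) coresUsed / coresTotal;
        return Math.max(memShare, cpuShare);
    }

    public static void main(String[] args) {
        // Assumed cluster: 16384 MB of memory and 8 cores.
        double a = dominantShare(4096, 16384, 1, 8); // memory-dominant app
        double b = dominantShare(2048, 16384, 4, 8); // CPU-dominant app
        // The application with the smaller dominant share is served next.
        String next = (b > a) ? "A" : "B";
        System.out.println("A=" + a + " B=" + b + " next=" + next);
    }
}
```

With these assumed numbers, application A is dominated by memory and application B by CPU, so A, holding the smaller dominant share, would be offered resources first.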
More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-319">YARN-319</a>.
Major bug reported by shenhong and fixed by shenhong (resourcemanager , scheduler)<br>
<b>Submit a job to a queue that not allowed in fairScheduler, client will hold forever.</b><br>
<blockquote>When the RM uses the fair scheduler and a client submits a job to a queue that does not allow that user to submit jobs, the client hangs forever.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-309">YARN-309</a>.
Major sub-task reported by Xuan Gong and fixed by Xuan Gong (resourcemanager)<br>
<b>Make RM provide heartbeat interval to NM</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-297">YARN-297</a>.
Major improvement reported by Arun C Murthy and fixed by Xuan Gong <br>
<b>Improve hashCode implementations for PB records</b><br>
<blockquote>As [~hsn] pointed out in YARN-2, we use very small primes in all our hashCode implementations.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-295">YARN-295</a>.
Major sub-task reported by Devaraj K and fixed by Mayank Bansal (resourcemanager)<br>
<b>Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl</b><br>
<blockquote>{code:xml}
2012-12-28 14:03:56,956 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-289">YARN-289</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Fair scheduler allows reservations that won't fit on node</b><br>
<blockquote>An application requests a container with 1024 MB. It then requests a container with 2048 MB. A node shows up with 1024 MB available. Even if the application is the only one running, neither request will be scheduled on it.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-269">YARN-269</a>.
Major bug reported by Thomas Graves and fixed by Jason Lowe (resourcemanager)<br>
<b>Resource Manager not logging the health_check_script result when taking it out</b><br>
<blockquote>The Resource Manager does not log the health_check_script result when it takes a node out of service. This was added to the JobTracker in 1.x with MAPREDUCE-2451; we should do the same for the RM.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-249">YARN-249</a>.
Major improvement reported by Ravi Prakash and fixed by Ravi Prakash (capacityscheduler)<br>
<b>Capacity Scheduler web page should show list of active users per queue like it used to (in 1.x)</b><br>
<blockquote>On the JobTracker, the web UI showed the active users for each queue and how many resources each of those users was using. That currently isn't displayed on the RM capacity scheduler web UI.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-237">YARN-237</a>.
Major improvement reported by Ravi Prakash and fixed by Jian He (resourcemanager)<br>
<b>Refreshing the RM page forgets how many rows I had in my Datatables</b><br>
<blockquote>If I choose 100 rows and then refresh the page, DataTables goes back to showing me 20 rows.
This user preference should be stored in a cookie.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-236">YARN-236</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
<b>RM should point tracking URL to RM web page when app fails to start</b><br>
<blockquote>Similar to YARN-165, the RM should redirect the tracking URL to the specific app page on the RM web UI when the application fails to start. For example, if the AM completely fails to start due to bad AM config or bad job config like invalid queuename, then the user gets the unhelpful "The requested application exited before setting a tracking URL".
Usually the diagnostic string on the RM app page has something useful, so we might as well point there.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-227">YARN-227</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
<b>Application expiration difficult to debug for end-users</b><br>
<blockquote>When an AM attempt expires the AMLivelinessMonitor in the RM will kill the job and mark it as failed. However there are no diagnostic messages set for the application indicating that the application failed because of expiration. Even if the AM logs are examined, it's often not obvious that the application was externally killed. The only evidence of what happened to the application is currently in the RM logs, and those are often not accessible by users.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-209">YARN-209</a>.
Major bug reported by Bikas Saha and fixed by Zhijie Shen (capacityscheduler)<br>
<b>Capacity scheduler doesn't trigger app-activation after adding nodes</b><br>
<blockquote>Say application A is submitted but at that time it does not meet the bar for activation because of resource limit settings for applications. After that if more hardware is added to the system and the application becomes valid it still remains in pending state, likely forever.
This might be rare to hit in real life because enough NMs heartbeat to the RM before applications can get submitted. But a change in settings or heartbeat interval might make it easier to reproduce. In RM restart scenarios, this is likely to be hit more often if restart is implemented by replaying events and resubmitting applications to the scheduler before the RPC to the NMs is activated.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-200">YARN-200</a>.
Major sub-task reported by Robert Joseph Evans and fixed by Ravi Prakash <br>
<b>yarn log does not output all needed information, and is in a binary format</b><br>
<blockquote>yarn logs does not output attemptid, nodename, or container-id. Missing these makes it very difficult to look through the logs for failed containers and tie them back to actual tasks and task attempts.
Also, the output currently includes several binary characters. This is fine for machine consumption, but it makes the output hard for humans to read, or even to search with standard tools like grep.
The help message could also be more useful to users.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-198">YARN-198</a>.
Minor improvement reported by Ramgopal N and fixed by Jian He (nodemanager)<br>
<b>If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager</b><br>
<blockquote>If we navigate to the NodeManager by clicking on the node link in the RM, there is no link on the NM page to navigate back to the RM.
It would be good to have a link back to the RM.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-196">YARN-196</a>.
Major bug reported by Ramgopal N and fixed by Xuan Gong (nodemanager)<br>
<b>Nodemanager should be more robust in handling connection failure to ResourceManager when a cluster is started</b><br>
<blockquote>If the NM is started before the RM, the NM shuts down with the following error:
{code}
ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager
org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)
at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)
Caused by: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
... 3 more
Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
at $Proxy23.registerNodeManager(Unknown Source)
at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
... 5 more
Caused by: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
at org.apache.hadoop.ipc.Client.call(Client.java:1141)
at org.apache.hadoop.ipc.Client.call(Client.java:1100)
at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)
... 7 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1247)
at org.apache.hadoop.ipc.Client.call(Client.java:1117)
... 9 more
2012-01-16 15:04:13,336 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
at java.lang.Thread.run(Thread.java:619)
2012-01-16 15:04:13,337 INFO org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped.
2012-01-16 15:04:13,392 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:9999
2012-01-16 15:04:13,493 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.
2012-01-16 15:04:13,493 INFO org.apache.hadoop.ipc.Server: Stopping server on 24290
2012-01-16 15:04:13,494 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 24290
2012-01-16 15:04:13,495 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2012-01-16 15:04:13,496 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler is stopped.
2012-01-16 15:04:13,496 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
at java.lang.Thread.run(Thread.java:619)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-193">YARN-193</a>.
Major bug reported by Hitesh Shah and fixed by Zhijie Shen (resourcemanager)<br>
<b>Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-142">YARN-142</a>.
Blocker task reported by Siddharth Seth and fixed by <br>
<b>[Umbrella] Cleanup YARN APIs w.r.t exceptions</b><br>
<blockquote>Ref: MAPREDUCE-4067
All YARN APIs currently throw YarnRemoteException.
1) This cannot be extended in its current form.
2) The RPC layer can throw IOExceptions. These end up showing up as UndeclaredThrowableExceptions.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-125">YARN-125</a>.
Minor sub-task reported by Steve Loughran and fixed by Steve Loughran <br>
<b>Make Yarn Client service shutdown operations robust</b><br>
<blockquote>Make the YARN client services more robust against being shut down while not started, or shut down more than once, by null-checking fields before closing them and setting them to null afterwards to prevent double invocation. This is a subset of MAPREDUCE-3502.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-124">YARN-124</a>.
Minor sub-task reported by Steve Loughran and fixed by Steve Loughran <br>
<b>Make Yarn Node Manager services robust against shutdown</b><br>
<blockquote>Add the nodemanager bits of MAPREDUCE-3502 to shut down the Nodemanager services. This is done by checking for fields being non-null before shutting down/closing etc, and setting the fields to null afterwards -to be resilient against re-entrancy.
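The null-check-then-clear idiom described above can be sketched roughly as follows. This is an illustration under assumed names (SafeStopSketch and its fields are hypothetical), not the code from the MAPREDUCE-3502 patch.

```java
import java.io.Closeable;
import java.io.IOException;

public class SafeStopSketch {
    private Closeable client;   // stays null until start() has run
    int closeCalls = 0;         // counts how often the resource was closed

    synchronized void start() {
        client = new Closeable() {
            @Override
            public void close() {
                closeCalls++;
            }
        };
    }

    // Safe to call before start() and safe to call more than once.
    synchronized void stop() throws IOException {
        Closeable c = client;
        client = null;          // clear first so a re-entrant call sees null
        if (c != null) {
            c.close();
        }
    }

    public static void main(String[] args) throws IOException {
        SafeStopSketch neverStarted = new SafeStopSketch();
        neverStarted.stop();    // no NullPointerException

        SafeStopSketch started = new SafeStopSketch();
        started.start();
        started.stop();
        started.stop();         // second call is a no-op
        System.out.println("close calls: " + started.closeCalls);
    }
}
```

Clearing the field before closing means that a second stop(), or one issued while the first is still unwinding, finds nothing left to close.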
No tests other than manual review.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-123">YARN-123</a>.
Minor sub-task reported by Steve Loughran and fixed by Steve Loughran <br>
<b>Make yarn Resource Manager services robust against shutdown</b><br>
<blockquote>Split MAPREDUCE-3502 patches to make the RM code more resilient to being stopped more than once, or before started.
This depends on MAPREDUCE-4014.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-117">YARN-117</a>.
Major improvement reported by Steve Loughran and fixed by Steve Loughran <br>
<b>Enhance YARN service model</b><br>
<blockquote>Having played with the YARN service model, there are some issues that I've identified based on past work and initial use.
This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs.
h2. state model prevents stopped state being entered if you could not successfully start the service.
In the current lifecycle you cannot stop a service unless it was successfully started, but
* {{init()}} may acquire resources that need to be explicitly released
* if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources.
*Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non null.
Before anyone points out that the {{stop()}} operations assume that all fields are valid; and if called before a {{start()}} they will NPE; MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix for this. It is independent of the rest of the issues in this doc but it will aid making {{stop()}} execute from all states other than "stopped".
MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take up; this can be done with issues linked to this one.
h2. AbstractService doesn't prevent duplicate state change requests.
The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}} &amp; {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this.
This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} &amp; {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}} -something that must be done before the class is considered stable (because once the lifecycle methods are declared final, all subclasses that are out of the source tree will need fixing by the respective developers).
h2. AbstractService state change doesn't defend against race conditions.
There are no concurrency locks on the state transitions. Whatever fix for wrong state calls is added should correct this to prevent re-entrancy, such as {{stop()}} being called from two threads.
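A minimal sketch of how the three points above could fit together: final lifecycle methods that check state before delegating to protected inner methods, a stop() that is valid from every state, and synchronized transitions. The BaseService class here is hypothetical and is not the real AbstractService; only the innerStart()/innerStop() naming follows the text.

```java
public class LifecycleSketch {
    enum State { NOTINITED, INITED, STARTED, STOPPED }

    // Hypothetical base class, written only to illustrate the proposal.
    static abstract class BaseService {
        private State state = State.NOTINITED;

        // The check runs before any subclass code, so out-of-order calls
        // fail without side effects.
        public final synchronized void init() {
            if (state != State.NOTINITED) {
                throw new IllegalStateException("cannot init from " + state);
            }
            innerInit();
            state = State.INITED;
        }

        public final synchronized void start() {
            if (state != State.INITED) {
                throw new IllegalStateException("cannot start from " + state);
            }
            innerStart();
            state = State.STARTED;
        }

        // Valid from every state; repeated or concurrent calls run
        // innerStop() only once.
        public final synchronized void stop() {
            if (state == State.STOPPED) {
                return;
            }
            state = State.STOPPED;
            innerStop();
        }

        public final synchronized State getState() {
            return state;
        }

        protected void innerInit() { }
        protected void innerStart() { }
        protected void innerStop() { }
    }

    // Example subclass that counts how often innerStop() runs.
    static class CountingService extends BaseService {
        int stops = 0;

        @Override
        protected void innerStop() {
            stops++;
        }
    }

    public static void main(String[] args) {
        CountingService s = new CountingService();
        s.stop();   // never initialised or started: still allowed
        s.stop();   // duplicate request is ignored
        System.out.println(s.getState() + " stops=" + s.stops);
    }
}
```

Because start() leaves the state unchanged when innerStart() throws, a later stop() can still release whatever init() acquired.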
h2. Static methods to choreograph lifecycle operations
Helper methods to move things through lifecycles. init-&gt;start is common, stop-if-service!=null another. Some static methods can execute these, and even call {{stop()}} if {{init()}} raises an exception. These could go into a class {{ServiceOps}} in the same package. These can be used by those services that wrap other services, and help manage more robust shutdowns.
h2. state transition failures are something that registered service listeners may wish to be informed of.
When a state transition fails a {{RuntimeException}} can be thrown -and the service listeners are not informed as the notification point isn't reached. They may wish to know this, especially for management and diagnostics.
*Fix:* extend {{ServiceStateChangeListener}} with a callback such as {{stateChangeFailed(Service service,Service.State targeted-state, RuntimeException e)}} that is invoked from the (final) state change methods in the {{AbstractService}} class (once they delegate to their inner {{innerStart()}}, {{innerStop()}} methods); make it a no-op in the existing implementations of the interface.
h2. Service listener failures not handled
Is this an error or not? Log and ignore may not be what is desired.
*Proposed:* during {{stop()}} any exception by a listener is caught and discarded, to increase the likelihood of a better shutdown, but do not add try-catch clauses to the other state changes.
h2. Support static listeners for all AbstractServices
Add support to {{AbstractService}} that allows callers to register listeners for all instances. The existing listener interface could be used. This allows management tools to hook into the events.
The static listeners would be invoked for all state changes except creation (base class shouldn't be handing out references to itself at this point).
These static events could all be async, pushed through a shared {{ConcurrentLinkedQueue}}; failures logged at warn and the rest of the listeners invoked.
h2. Add some example listeners for management/diagnostics
* event to commons log for humans.
* events for machines hooked up to the JSON logger.
* for testing: something that can be told to fail.
h2. Services should support signal interruptibility
The services would benefit from a way of shutting them down on a kill signal; this can be done via a runtime hook. It should not be automatic though, as composite services will get into a very complex state during shutdown. Better to provide a hook that lets you register/unregister services to terminate, and have the relevant {{main()}} entry points tell their root services to register themselves.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-112">YARN-112</a>.
Major sub-task reported by Jason Lowe and fixed by Omkar Vinit Joshi (nodemanager)<br>
<b>Race in localization can cause containers to fail</b><br>
<blockquote>On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-109">YARN-109</a>.
Major bug reported by Jason Lowe and fixed by Mayank Bansal (nodemanager)<br>
<b>.tmp file is not deleted for localized archives</b><br>
<blockquote>When archives are localized they are initially created as a .tmp file and unpacked from that file. However the .tmp file is not deleted afterwards.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-101">YARN-101</a>.
Minor bug reported by xieguiming and fixed by Xuan Gong (nodemanager)<br>
<b>If the heartbeat message loss, the nodestatus info of complete container will loss too.</b><br>
<blockquote>See the lines marked in red:
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
protected void startStatusUpdater() {
new Thread("Node Status Updater") {
@Override
@SuppressWarnings("unchecked")
public void run() {
int lastHeartBeatID = 0;
while (!isStopped) {
// Send heartbeat
try {
synchronized (heartbeatMonitor) {
heartbeatMonitor.wait(heartBeatInterval);
}
{color:red}
// Before we send the heartbeat, we get the NodeStatus,
// whose method removes completed containers.
NodeStatus nodeStatus = getNodeStatus();
{color}
nodeStatus.setResponseId(lastHeartBeatID);
NodeHeartbeatRequest request = recordFactory
.newRecordInstance(NodeHeartbeatRequest.class);
request.setNodeStatus(nodeStatus);
{color:red}
// But if the nodeHeartbeat fails, we have already removed the completed containers, so the RM never learns about them. We aren't handling a nodeHeartbeat failure case here.
HeartbeatResponse response =
resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();
{color}
if (response.getNodeAction() == NodeAction.SHUTDOWN) {
LOG
.info("Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat," +
" hence shutting down.");
NodeStatusUpdaterImpl.this.stop();
break;
}
if (response.getNodeAction() == NodeAction.REBOOT) {
LOG.info("Node is out of sync with ResourceManager,"
+ " hence rebooting.");
NodeStatusUpdaterImpl.this.reboot();
break;
}
lastHeartBeatID = response.getResponseId();
List&lt;ContainerId&gt; containersToCleanup = response
.getContainersToCleanupList();
if (containersToCleanup.size() != 0) {
dispatcher.getEventHandler().handle(
new CMgrCompletedContainersEvent(containersToCleanup));
}
List&lt;ApplicationId&gt; appsToCleanup =
response.getApplicationsToCleanupList();
//Only start tracking for keepAlive on FINISH_APP
trackAppsForKeepAlive(appsToCleanup);
if (appsToCleanup.size() != 0) {
dispatcher.getEventHandler().handle(
new CMgrCompletedAppsEvent(appsToCleanup));
}
} catch (Throwable e) {
// TODO Better error handling. Thread can die with the rest of the
// NM still running.
LOG.error("Caught exception in status-updater", e);
}
}
}
}.start();
}
private NodeStatus getNodeStatus() {
NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class);
nodeStatus.setNodeId(this.nodeId);
int numActiveContainers = 0;
List&lt;ContainerStatus&gt; containersStatuses = new ArrayList&lt;ContainerStatus&gt;();
for (Iterator&lt;Entry&lt;ContainerId, Container&gt;&gt; i =
this.context.getContainers().entrySet().iterator(); i.hasNext();) {
Entry&lt;ContainerId, Container&gt; e = i.next();
ContainerId containerId = e.getKey();
Container container = e.getValue();
// Clone the container to send it to the RM
org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus =
container.cloneAndGetContainerStatus();
containersStatuses.add(containerStatus);
++numActiveContainers;
LOG.info("Sending out status for container: " + containerStatus);
{color:red}
// Here is the part that removes the completed containers.
if (containerStatus.getState() == ContainerState.COMPLETE) {
// Remove
i.remove();
{color}
LOG.info("Removed completed container " + containerId);
}
}
nodeStatus.setContainersStatuses(containersStatuses);
LOG.debug(this.nodeId + " sending out status for "
+ numActiveContainers + " containers");
NodeHealthStatus nodeHealthStatus = this.context.getNodeHealthStatus();
nodeHealthStatus.setHealthReport(healthChecker.getHealthReport());
nodeHealthStatus.setIsNodeHealthy(healthChecker.isHealthy());
nodeHealthStatus.setLastHealthReportTime(
healthChecker.getLastHealthReportTime());
if (LOG.isDebugEnabled()) {
LOG.debug("Node's health-status : " + nodeHealthStatus.getIsNodeHealthy()
+ ", " + nodeHealthStatus.getHealthReport());
}
nodeStatus.setNodeHealthStatus(nodeHealthStatus);
List&lt;ApplicationId&gt; keepAliveAppIds = createKeepAliveApplicationList();
nodeStatus.setKeepAliveApplications(keepAliveAppIds);
return nodeStatus;
}
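One possible way to avoid the loss, sketched under assumptions and not necessarily the fix that was committed: keep completed containers in a pending set and forget them only after a heartbeat carrying them has succeeded. All names below (HeartbeatSketch, ResourceTracker, FlakyTracker) are hypothetical stand-ins, and the sketch is single-threaded for brevity.

```java
import java.util.Arrays;

public class HeartbeatSketch {
    // Hypothetical stand-in for the RM side of the heartbeat RPC.
    interface ResourceTracker {
        void nodeHeartbeat(String[] completedContainers) throws Exception;
    }

    // Test double: the first heartbeat is lost, later ones succeed.
    static class FlakyTracker implements ResourceTracker {
        private int calls = 0;

        @Override
        public void nodeHeartbeat(String[] completedContainers) throws Exception {
            calls++;
            if (calls == 1) {
                throw new java.io.IOException("simulated heartbeat loss");
            }
        }
    }

    // Completed containers that the RM has not yet acknowledged.
    private String[] pendingCompleted = new String[0];

    void containerCompleted(String containerId) {
        pendingCompleted = Arrays.copyOf(pendingCompleted, pendingCompleted.length + 1);
        pendingCompleted[pendingCompleted.length - 1] = containerId;
    }

    int pendingCount() {
        return pendingCompleted.length;
    }

    // Returns true if the heartbeat succeeded. Completed containers are
    // dropped only after success, so a lost heartbeat resends them.
    boolean heartbeat(ResourceTracker rm) {
        String[] snapshot = pendingCompleted.clone();
        try {
            rm.nodeHeartbeat(snapshot);
        } catch (Exception e) {
            return false;   // keep everything for the next attempt
        }
        pendingCompleted = Arrays.copyOfRange(
            pendingCompleted, snapshot.length, pendingCompleted.length);
        return true;
    }

    public static void main(String[] args) {
        HeartbeatSketch nm = new HeartbeatSketch();
        ResourceTracker rm = new FlakyTracker();
        nm.containerCompleted("container_01");
        nm.heartbeat(rm);
        System.out.println("after failed heartbeat: " + nm.pendingCount());
        nm.heartbeat(rm);
        System.out.println("after successful heartbeat: " + nm.pendingCount());
    }
}
```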
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-99">YARN-99</a>.
Major sub-task reported by Devaraj K and fixed by Omkar Vinit Joshi (nodemanager)<br>
<b>Jobs fail during resource localization when private distributed-cache hits unix directory limits</b><br>
<blockquote>If multiple jobs use the distributed cache with many small files, the directory limit is reached before the cache size limit, and no further directories can be created in the file cache. The jobs then start failing with the exception below.
{code:xml}
java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{code}
We should have a mechanism to clean up cache files when the number of directories crosses a specified limit, as is already done for the cache size.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-84">YARN-84</a>.
Minor improvement reported by Brandon Li and fixed by Brandon Li <br>
<b>Use Builder to get RPC server in YARN</b><br>
<blockquote>In HADOOP-8736, a Builder is introduced to replace all the getServer() variants. This JIRA is the change in YARN.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-71">YARN-71</a>.
Critical bug reported by Vinod Kumar Vavilapalli and fixed by Xuan Gong (nodemanager)<br>
<b>Ensure/confirm that the NodeManager cleans up local-dirs on restart</b><br>
<blockquote>We have to make sure that NodeManagers clean up their local files on restart.
It may already work that way, in which case we should have tests validating this.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-62">YARN-62</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi <br>
<b>AM should not be able to abuse container tokens for repetitive container launches</b><br>
<blockquote>Clone of YARN-51.
ApplicationMaster should not be able to store container tokens and use the same set of tokens for repetitive container launches. The current code allows such abuse for a duration of 1d+10mins; we need to fix this.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-45">YARN-45</a>.
Major sub-task reported by Chris Douglas and fixed by Carlo Curino (resourcemanager)<br>
<b>Scheduler feedback to AM to release containers</b><br>
<blockquote>The ResourceManager strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. Individual allocations of containers must be reclaimed, or reserved, to restore the global invariants when cluster load shifts. In some cases, the ApplicationMaster can respond to fluctuations in resource availability without losing the work already completed by that task (MAPREDUCE-4584). Supplying it with this information would be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers.
[1] http://research.yahoo.com/files/yl-2012-003.pdf</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-24">YARN-24</a>.
Major bug reported by Jason Lowe and fixed by Sandy Ryza (nodemanager)<br>
<b>Nodemanager fails to start if log aggregation enabled and namenode unavailable</b><br>
<blockquote>If log aggregation is enabled and the namenode is currently unavailable, the nodemanager fails to start up.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5421">MAPREDUCE-5421</a>.
Blocker bug reported by Junping Du and fixed by Junping Du (test)<br>
<b>TestNonExistentJob fails due to recent changes in YARN</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5419">MAPREDUCE-5419</a>.
Major bug reported by Robert Parker and fixed by Robert Parker (mrv2)<br>
<b>TestSlive is getting FileNotFound Exception</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5412">MAPREDUCE-5412</a>.
Major bug reported by Jian He and fixed by Jian He <br>
<b>Change MR to use multiple containers API of ContainerManager after YARN-926</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5399">MAPREDUCE-5399</a>.
Blocker bug reported by Stanislav Barton and fixed by Stanislav Barton (mrv1 , mrv2)<br>
<b>Unnecessary Configuration instantiation in IFileInputStream slows down merge</b><br>
<blockquote>Fixes blank Configuration object creation overhead by reusing the Job configuration in InMemoryReader.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5398">MAPREDUCE-5398</a>.
Major improvement reported by Bikas Saha and fixed by Jian He <br>
<b>MR changes for YARN-513</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5366">MAPREDUCE-5366</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu (test)<br>
<b>TestMRAsyncDiskService fails on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5360">MAPREDUCE-5360</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu (test)<br>
<b>TestMRJobClient fails on Windows due to path format</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5359">MAPREDUCE-5359</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>JobHistory should not use File.separator to match timestamp in path</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5357">MAPREDUCE-5357</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>Job staging directory owner checking could fail on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5355">MAPREDUCE-5355</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>MiniMRYarnCluster with localFs does not work on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5352">MAPREDUCE-5352</a>.
Major improvement reported by Siddharth Seth and fixed by Siddharth Seth <br>
<b>Optimize node local splits generated by CombineFileInputFormat </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5349">MAPREDUCE-5349</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>TestClusterMapReduceTestCase and TestJobName fail on Windows in branch-2</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5334">MAPREDUCE-5334</a>.
Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>TestContainerLauncherImpl is failing</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5333">MAPREDUCE-5333</a>.
Major test reported by Alejandro Abdelnur and fixed by Wei Yan (mr-am)<br>
<b>Add test that verifies MRAM works correctly when sending requests with non-normalized capabilities</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5328">MAPREDUCE-5328</a>.
Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>ClientToken should not be set in the environment</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5326">MAPREDUCE-5326</a>.
Blocker bug reported by Arun C Murthy and fixed by Zhijie Shen <br>
<b>Add version to shuffle header</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5325">MAPREDUCE-5325</a>.
Major bug reported by Xuan Gong and fixed by Xuan Gong <br>
<b>ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter---MR changes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5319">MAPREDUCE-5319</a>.
Major bug reported by yeshavora and fixed by Xuan Gong <br>
<b>Job.xml file does not have 'user.name' property for Hadoop2</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5315">MAPREDUCE-5315</a>.
Critical bug reported by Mithun Radhakrishnan and fixed by Mithun Radhakrishnan (distcp)<br>
<b>DistCp reports success even on failure.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5312">MAPREDUCE-5312</a>.
Major bug reported by Alejandro Abdelnur and fixed by Sandy Ryza <br>
<b>TestRMNMInfo is failing</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5310">MAPREDUCE-5310</a>.
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (applicationmaster)<br>
<b>MRAM should not normalize allocation request capabilities</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5308">MAPREDUCE-5308</a>.
Major bug reported by Nathan Roberts and fixed by Nathan Roberts <br>
<b>Shuffling to memory can get out-of-sync when fetching multiple compressed map outputs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5304">MAPREDUCE-5304</a>.
Blocker sub-task reported by Alejandro Abdelnur and fixed by Karthik Kambatla <br>
<b>mapreduce.Job killTask/failTask/getTaskCompletionEvents methods have incompatible signature changes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5303">MAPREDUCE-5303</a>.
Major bug reported by Jian He and fixed by Jian He <br>
<b>Changes on MR after moving ProtoBase to package impl.pb on YARN-724</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5301">MAPREDUCE-5301</a>.
Major bug reported by Siddharth Seth and fixed by Siddharth Seth <br>
<b>Update MR code to work with YARN-635 changes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5300">MAPREDUCE-5300</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Two function signature changes in filecache.DistributedCache</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5299">MAPREDUCE-5299</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Mapred API: void setTaskID(TaskAttemptID) is missing in TaskCompletionEvent </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5298">MAPREDUCE-5298</a>.
Major new feature reported by Steve Loughran and fixed by Steve Loughran (applicationmaster)<br>
<b>Move MapReduce services to YARN-117 stricter lifecycle</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5297">MAPREDUCE-5297</a>.
Major bug reported by Jian He and fixed by Jian He <br>
<b>Update MR App since BuilderUtils is moved to yarn-server-common after YARN-748</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5296">MAPREDUCE-5296</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Mapred API: Function signature change in JobControl</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5291">MAPREDUCE-5291</a>.
Major bug reported by Siddharth Seth and fixed by Zhijie Shen <br>
<b>Change MR App to use updated property names in container-log4j.properties</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5289">MAPREDUCE-5289</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Jian He <br>
<b>Update MR App to use Token directly after YARN-717</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5286">MAPREDUCE-5286</a>.
Major task reported by Siddharth Seth and fixed by Vinod Kumar Vavilapalli <br>
<b>startContainer call should use the ContainerToken instead of Container [YARN-684]</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5285">MAPREDUCE-5285</a>.
Major bug reported by Jian He and fixed by <br>
<b>Update MR App to use immutable ApplicationAttemptID, ContainerID, NodeID after YARN-735</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5283">MAPREDUCE-5283</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (applicationmaster , test)<br>
<b>Over 10 different tests have near identical implementations of AppContext</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5282">MAPREDUCE-5282</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Siddharth Seth <br>
<b>Update MR App to use immutable ApplicationID after YARN-716</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5280">MAPREDUCE-5280</a>.
Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal <br>
<b>Mapreduce API: ClusterMetrics incompatibility issues with MR1</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5275">MAPREDUCE-5275</a>.
Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal <br>
<b>Mapreduce API: TokenCache incompatibility issues with MR1</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5274">MAPREDUCE-5274</a>.
Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal <br>
<b>Mapreduce API: String toHex(byte[]) is removed from SecureShuffleUtils</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5273">MAPREDUCE-5273</a>.
Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal <br>
<b>Protected variables are removed from CombineFileRecordReader in both mapred and mapreduce</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5270">MAPREDUCE-5270</a>.
Major bug reported by Jian He and fixed by Jian He <br>
<b>Migrate from using BuilderUtil factory methods to individual record factory method on MapReduce side</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5268">MAPREDUCE-5268</a>.
Major improvement reported by Jason Lowe and fixed by Karthik Kambatla (jobhistoryserver)<br>
<b>Improve history server startup performance</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5263">MAPREDUCE-5263</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>filecache.DistributedCache incompatibility issues with MR1</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5259">MAPREDUCE-5259</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic (test)<br>
<b>TestTaskLog fails on Windows because of path separators mismatch</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5257">MAPREDUCE-5257</a>.
Major bug reported by Jason Lowe and fixed by Omkar Vinit Joshi (mr-am , mrv2)<br>
<b>TestContainerLauncherImpl fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5246">MAPREDUCE-5246</a>.
Major improvement reported by Mayank Bansal and fixed by Mayank Bansal <br>
<b>Adding application type to submission context</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5245">MAPREDUCE-5245</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>A number of public static variables are removed from JobConf</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5244">MAPREDUCE-5244</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Two functions changed their visibility in JobStatus</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5240">MAPREDUCE-5240</a>.
Blocker bug reported by Roman Shaposhnik and fixed by Vinod Kumar Vavilapalli (mrv2)<br>
<b>inside of FileOutputCommitter the initialized Credentials cache appears to be empty</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5239">MAPREDUCE-5239</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Siddharth Seth <br>
<b>Update MR App to reflect YarnRemoteException changes after YARN-634</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5237">MAPREDUCE-5237</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>ClusterStatus incompatibility issues with MR1</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5235">MAPREDUCE-5235</a>.
Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal <br>
<b>mapred.Counters incompatibility issues with MR1</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5234">MAPREDUCE-5234</a>.
Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal <br>
<b>Signature changes for getTaskId of TaskReport in mapred</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5233">MAPREDUCE-5233</a>.
Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal <br>
<b>Functions are changed or removed from Job in jobcontrol</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5231">MAPREDUCE-5231</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Constructor of DBInputFormat.DBRecordReader in mapred is changed</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5230">MAPREDUCE-5230</a>.
Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal <br>
<b>createFileSplit is removed from NLineInputFormat of mapred</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5229">MAPREDUCE-5229</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>TEMP_DIR_NAME is removed from FileOutputCommitter of mapreduce</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5228">MAPREDUCE-5228</a>.
Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal <br>
<b>Enum Counter is removed from FileInputFormat and FileOutputFormat of both mapred and mapreduce</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5226">MAPREDUCE-5226</a>.
Major bug reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Handle exception related changes in YARN's AMRMProtocol api after YARN-630</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5222">MAPREDUCE-5222</a>.
Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>Fix JobClient incompatibilities with MR1</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5220">MAPREDUCE-5220</a>.
Major sub-task reported by Sandy Ryza and fixed by Zhijie Shen (client)<br>
<b>Mapred API: TaskCompletionEvent incompatibility issues with MR1</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5213">MAPREDUCE-5213</a>.
Minor bug reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>Re-assess TokenCache methods marked @Private</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5212">MAPREDUCE-5212</a>.
Major bug reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Handle exception related changes in YARN's ClientRMProtocol api after YARN-631</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5209">MAPREDUCE-5209</a>.
Minor bug reported by Radim Kolar and fixed by Tsuyoshi OZAWA (mrv2)<br>
<b>ShuffleScheduler log message incorrect</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5208">MAPREDUCE-5208</a>.
Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>SpillRecord and ShuffleHandler should use SecureIOUtils for reading index file and map output</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5205">MAPREDUCE-5205</a>.
Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>Apps fail in secure cluster setup</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5204">MAPREDUCE-5204</a>.
Major bug reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Handle YarnRemoteException separately from IOException in MR api </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5199">MAPREDUCE-5199</a>.
Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Daryn Sharp (security)<br>
<b>AppTokens file can/should be removed</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5194">MAPREDUCE-5194</a>.
Minor task reported by Chris Douglas and fixed by Chris Douglas (task)<br>
<b>Heed interrupts during Fetcher shutdown</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5193">MAPREDUCE-5193</a>.
Major bug reported by Aaron T. Myers and fixed by Andrew Wang (test)<br>
<b>A few MR tests use block sizes which are smaller than the default minimum block size</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5192">MAPREDUCE-5192</a>.
Minor task reported by Chris Douglas and fixed by Chris Douglas (task)<br>
<b>Separate TCE resolution from fetch</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5191">MAPREDUCE-5191</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>TestQueue#testQueue fails with timeout on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5187">MAPREDUCE-5187</a>.
Major bug reported by Chuan Liu and fixed by Chuan Liu (mrv2)<br>
<b>Create mapreduce command scripts on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5184">MAPREDUCE-5184</a>.
Major sub-task reported by Arun C Murthy and fixed by Zhijie Shen (documentation)<br>
<b>Document MR Binary Compatibility vis-a-vis hadoop-1 and hadoop-2</b><br>
<blockquote>Document MR Binary Compatibility vis-a-vis hadoop-1 and hadoop-2 for end-users.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5181">MAPREDUCE-5181</a>.
Major bug reported by Siddharth Seth and fixed by Vinod Kumar Vavilapalli (applicationmaster)<br>
<b>RMCommunicator should not use AMToken from the env</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5179">MAPREDUCE-5179</a>.
Major bug reported by Hitesh Shah and fixed by Hitesh Shah <br>
<b>Change TestHSWebServices to do string equal check on hadoop build version similar to YARN-605</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5178">MAPREDUCE-5178</a>.
Major bug reported by Hitesh Shah and fixed by Hitesh Shah <br>
<b>Fix use of BuilderUtils#newApplicationReport as a result of YARN-577.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5177">MAPREDUCE-5177</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5176">MAPREDUCE-5176</a>.
Major improvement reported by Carlo Curino and fixed by Carlo Curino (mrv2)<br>
<b>Preemptable annotations (to support preemption in MR)</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5175">MAPREDUCE-5175</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Xuan Gong <br>
<b>Update MR App to not set envs that will be set by NMs anyways after YARN-561</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5171">MAPREDUCE-5171</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (applicationmaster)<br>
<b>Expose blacklisted nodes from the MR AM REST API </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5167">MAPREDUCE-5167</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Jian He <br>
<b>Update MR App after YARN-562</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5166">MAPREDUCE-5166</a>.
Blocker bug reported by Gunther Hagleitner and fixed by Sandy Ryza <br>
<b>ConcurrentModificationException in LocalJobRunner</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5163">MAPREDUCE-5163</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Xuan Gong <br>
<b>Update MR App after YARN-441</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5159">MAPREDUCE-5159</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Aggregatewordcount and aggregatewordhist in hadoop-1 examples are not binary compatible with hadoop-2 mapred.lib.aggregate</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5157">MAPREDUCE-5157</a>.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Sort in hadoop-1 examples is not binary compatible with hadoop-2 mapred.lib</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5156">MAPREDUCE-5156</a>.
Blocker sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Hadoop-examples-1.x.x.jar cannot run on Yarn</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5152">MAPREDUCE-5152</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>MR App is not using Container from RM</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5151">MAPREDUCE-5151</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Sandy Ryza <br>
<b>Update MR App after YARN-444</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5147">MAPREDUCE-5147</a>.
Major bug reported by Robert Parker and fixed by Robert Parker (mrv2)<br>
<b>Maven build should create hadoop-mapreduce-client-app-VERSION.jar directly</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5146">MAPREDUCE-5146</a>.
Minor bug reported by Sangjin Lee and fixed by Sangjin Lee (task)<br>
<b>application classloader may be used too early to load classes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5145">MAPREDUCE-5145</a>.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Change default max-attempts to be more than one for MR jobs as well</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5140">MAPREDUCE-5140</a>.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>MR part of YARN-514</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5139">MAPREDUCE-5139</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Xuan Gong <br>
<b>Update MR App after YARN-486</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5138">MAPREDUCE-5138</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi <br>
<b>Fix LocalDistributedCacheManager after YARN-112</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5137">MAPREDUCE-5137</a>.
Major bug reported by Thomas Graves and fixed by Thomas Graves (applicationmaster)<br>
<b>AM web UI: clicking on Map Task results in 500 error</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5136">MAPREDUCE-5136</a>.
Major bug reported by Amir Sanjar and fixed by Amir Sanjar <br>
<b>TestJobImpl-&gt;testJobNoTasks fails with IBM JAVA</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5129">MAPREDUCE-5129</a>.
Minor new feature reported by Billie Rinaldi and fixed by Billie Rinaldi <br>
<b>Add tag info to JH files</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5128">MAPREDUCE-5128</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (documentation , jobhistoryserver)<br>
<b>mapred-default.xml is missing a bunch of history server configs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5113">MAPREDUCE-5113</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Streaming input/output types are ignored with java mapper/reducer</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5098">MAPREDUCE-5098</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (contrib/gridmix)<br>
<b>Fix findbugs warnings in gridmix</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5086">MAPREDUCE-5086</a>.
Major bug reported by Jian He and fixed by Jian He <br>
<b>MR app master deletes staging dir when sent a reboot command from the RM</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5079">MAPREDUCE-5079</a>.
Critical improvement reported by Jason Lowe and fixed by Jason Lowe (mr-am)<br>
<b>Recovery should restore task state from job history info directly</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5078">MAPREDUCE-5078</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (client)<br>
<b>TestMRAppMaster fails on Windows due to mismatched path separators</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5077">MAPREDUCE-5077</a>.
Minor bug reported by Karthik Kambatla and fixed by Karthik Kambatla (mrv2)<br>
<b>Cleanup: mapreduce.util.ResourceCalculatorPlugin and related code should be removed</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5075">MAPREDUCE-5075</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (distcp)<br>
<b>DistCp leaks input file handles</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5069">MAPREDUCE-5069</a>.
Minor improvement reported by Sangjin Lee and fixed by (mrv1 , mrv2)<br>
<b>add concrete common implementations of CombineFileInputFormat</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5066">MAPREDUCE-5066</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>JobTracker should set a timeout when calling into job.end.notification.url</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5065">MAPREDUCE-5065</a>.
Major bug reported by Mithun Radhakrishnan and fixed by Mithun Radhakrishnan (distcp)<br>
<b>DistCp should skip checksum comparisons if block-sizes are different on source/target.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5062">MAPREDUCE-5062</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen <br>
<b>MR AM should read max-retries information from the RM</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5060">MAPREDUCE-5060</a>.
Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans <br>
<b>Fetch failures that time out only count against the first map task</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5059">MAPREDUCE-5059</a>.
Major bug reported by Jason Lowe and fixed by Omkar Vinit Joshi (jobhistoryserver , webapps)<br>
<b>Job overview shows average merge time larger than for any reduce attempt</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5043">MAPREDUCE-5043</a>.
Blocker bug reported by Jason Lowe and fixed by Jason Lowe (mr-am)<br>
<b>Fetch failure processing can cause AM event queue to back up and eventually OOM</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5042">MAPREDUCE-5042</a>.
Blocker bug reported by Jason Lowe and fixed by Jason Lowe (mr-am , security)<br>
<b>Reducer unable to fetch for a map task that was recovered</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5033">MAPREDUCE-5033</a>.
Minor improvement reported by Andrew Wang and fixed by Andrew Wang <br>
<b>mapred shell script should respect usage flags (--help -help -h)</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5027">MAPREDUCE-5027</a>.
Major bug reported by Jason Lowe and fixed by Robert Parker <br>
<b>Shuffle does not limit number of outstanding connections</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5015">MAPREDUCE-5015</a>.
Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov <br>
<b>Coverage fix for org.apache.hadoop.mapreduce.tools.CLI</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5013">MAPREDUCE-5013</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (client)<br>
<b>mapred.JobStatus compatibility: MR2 missing constructors from MR1</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5009">MAPREDUCE-5009</a>.
Critical bug reported by Robert Parker and fixed by Robert Parker (mrv1)<br>
<b>Killing the Task Attempt slated for commit does not clear the value from the Task commitAttempt member</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5008">MAPREDUCE-5008</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Merger progress miscounts with respect to EOF_MARKER</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5007">MAPREDUCE-5007</a>.
Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov <br>
<b>fix coverage org.apache.hadoop.mapreduce.v2.hs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5000">MAPREDUCE-5000</a>.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (mr-am)<br>
<b>TaskImpl.getCounters() can return the counters for the wrong task attempt when task is speculating</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4994">MAPREDUCE-4994</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (client)<br>
<b>-jt generic command line option does not work</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4992">MAPREDUCE-4992</a>.
Critical bug reported by Robert Parker and fixed by Robert Parker (mr-am)<br>
<b>AM hangs in RecoveryService when recovering tasks with speculative attempts</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4991">MAPREDUCE-4991</a>.
Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov <br>
<b>coverage for gridmix</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4990">MAPREDUCE-4990</a>.
Trivial improvement reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>Construct debug strings conditionally in ShuffleHandler.Shuffle#sendMapOutput()</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4989">MAPREDUCE-4989</a>.
Major improvement reported by Ravi Prakash and fixed by Ravi Prakash (jobhistoryserver , mr-am)<br>
<b>JSONify DataTables input data for Attempts page</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4987">MAPREDUCE-4987</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (distributed-cache , nodemanager)<br>
<b>TestMRJobs#testDistributedCache fails on Windows due to classpath problems and unexpected behavior of symlinks</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4985">MAPREDUCE-4985</a>.
Trivial bug reported by Plamen Jeliazkov and fixed by Plamen Jeliazkov <br>
<b>TestDFSIO supports compression but usage doesn't reflect it</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4981">MAPREDUCE-4981</a>.
Minor bug reported by Plamen Jeliazkov and fixed by Plamen Jeliazkov <br>
<b>WordMean, WordMedian, WordStandardDeviation missing from ExamplesDriver</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4974">MAPREDUCE-4974</a>.
Major improvement reported by Arun A K and fixed by Gelesh (mrv1 , mrv2 , performance)<br>
<b>Optimising the LineRecordReader initialize() method</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4972">MAPREDUCE-4972</a>.
Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov <br>
<b>Coverage fixing for org.apache.hadoop.mapreduce.jobhistory </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4951">MAPREDUCE-4951</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (applicationmaster , mr-am , mrv2)<br>
<b>Container preemption interpreted as task failure</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4942">MAPREDUCE-4942</a>.
Major sub-task reported by Robert Kanter and fixed by Robert Kanter (mrv2)<br>
<b>mapreduce.Job has a bunch of methods that throw InterruptedException so it's incompatible with MR1</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4932">MAPREDUCE-4932</a>.
Major bug reported by Robert Kanter and fixed by Robert Kanter (mrv2)<br>
<b>mapreduce.job#getTaskCompletionEvents incompatible with Hadoop 1</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4927">MAPREDUCE-4927</a>.
Major bug reported by Jason Lowe and fixed by Ashwin Shankar (jobhistoryserver)<br>
<b>Historyserver 500 error due to NPE when accessing specific counters page for failed job</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4898">MAPREDUCE-4898</a>.
Major bug reported by Robert Kanter and fixed by Robert Kanter (mrv2)<br>
<b>FileOutputFormat.checkOutputSpecs and FileOutputFormat.setOutputPath incompatible with MR1</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4896">MAPREDUCE-4896</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (client , scheduler)<br>
<b>"mapred queue -info" spits out ugly exception when queue does not exist</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4892">MAPREDUCE-4892</a>.
Major bug reported by Bikas Saha and fixed by Bikas Saha <br>
<b>CombineFileInputFormat node input split can be skewed on small clusters</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4885">MAPREDUCE-4885</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (contrib/streaming , test)<br>
<b>Streaming tests have multiple failures on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4875">MAPREDUCE-4875</a>.
Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov (test)<br>
<b>coverage fixing for org.apache.hadoop.mapred</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4871">MAPREDUCE-4871</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)<br>
<b>AM uses mapreduce.jobtracker.split.metainfo.maxsize but mapred-default has mapreduce.job.split.metainfo.maxsize</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4846">MAPREDUCE-4846</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (client)<br>
<b>Some JobQueueInfo methods are public in MR1 but protected in MR2</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4794">MAPREDUCE-4794</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster)<br>
<b>DefaultSpeculator generates error messages on normal shutdown</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4737">MAPREDUCE-4737</a>.
Major bug reported by Daniel Dai and fixed by Arun C Murthy <br>
<b> Hadoop does not close output file / does not call Mapper.cleanup if exception in map</b><br>
<blockquote>Ensures that the mapreduce APIs are semantically consistent with the mapred API with respect to Mapper.cleanup and Reducer.cleanup: cleanup is now called even if there is an error. The old mapred API already ensures that Mapper.close and Reducer.close are invoked during error handling. Note that this is an incompatible change; however, end-users can override Mapper.run and Reducer.run to get the old (inconsistent) behaviour.</blockquote></li>
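<li> <b>Sketch: cleanup semantics described in MAPREDUCE-4737</b><br>
<blockquote>The behaviour change above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the Hadoop source: SketchMapper, LegacyMapper, and CleanupDemo are hypothetical stand-ins that only mimic the setup/map/cleanup lifecycle, and the real Mapper.run and Reducer.run take a Context argument that is omitted here.
<pre>
```java
// Simplified stand-in for a mapper lifecycle; these are NOT the real Hadoop classes.
class SketchMapper {
    final StringBuilder log = new StringBuilder();

    void setup() { log.append("setup;"); }

    // Hypothetical failure condition: an empty record triggers an error.
    void map(String record) {
        if (record.isEmpty()) {
            throw new IllegalStateException("bad record");
        }
        log.append("map;");
    }

    void cleanup() { log.append("cleanup;"); }

    // New semantics: cleanup runs even when map() throws.
    void run(String[] records) {
        setup();
        try {
            for (String record : records) {
                map(record);
            }
        } finally {
            cleanup();
        }
    }
}

// Overriding run() restores the old behaviour: cleanup is skipped on error.
class LegacyMapper extends SketchMapper {
    @Override
    void run(String[] records) {
        setup();
        for (String record : records) {
            map(record);
        }
        cleanup();
    }
}

public class CleanupDemo {
    // Runs the mapper, records the failure, and returns the lifecycle log.
    static String lifecycle(SketchMapper mapper, String[] records) {
        try {
            mapper.run(records);
        } catch (IllegalStateException e) {
            mapper.log.append("failed;");
        }
        return mapper.log.toString();
    }

    public static void main(String[] args) {
        String[] records = {"a", ""};
        System.out.println(lifecycle(new SketchMapper(), records));
        System.out.println(lifecycle(new LegacyMapper(), records));
    }
}
```
</pre>
In this model the default run() logs a cleanup step before the failure is reported, while the overriding subclass does not. A job that depends on the old behaviour would presumably override run() in its own Mapper or Reducer subclass in a similar way.</blockquote></li>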
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4716">MAPREDUCE-4716</a>.
Major bug reported by Thomas Graves and fixed by Thomas Graves (jobhistoryserver)<br>
<b>TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4693">MAPREDUCE-4693</a>.
Major bug reported by Jason Lowe and fixed by Xuan Gong (jobhistoryserver , mrv2)<br>
<b>Historyserver should provide counters for failed tasks</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4671">MAPREDUCE-4671</a>.
Major bug reported by Bikas Saha and fixed by Bikas Saha <br>
<b>AM does not tell the RM about container requests that are no longer needed</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4571">MAPREDUCE-4571</a>.
Major bug reported by Thomas Graves and fixed by Thomas Graves (webapps)<br>
<b>TestHsWebServicesJobs fails on jdk7</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4374">MAPREDUCE-4374</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu (mrv2)<br>
<b>Fix child task environment variable config and add support for Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4356">MAPREDUCE-4356</a>.
Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (tools/rumen)<br>
<b>Provide access to ParsedTask.obtainTaskAttempts()</b><br>
<blockquote>Made the method ParsedTask.obtainTaskAttempts() public.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4149">MAPREDUCE-4149</a>.
Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (tools/rumen)<br>
<b>Rumen fails to parse certain counter strings</b><br>
<blockquote>Fixes Rumen to parse counter strings containing the special characters "{" and "}".</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4100">MAPREDUCE-4100</a>.
Minor bug reported by Karam Singh and fixed by Amar Kamat (contrib/gridmix)<br>
<b>Sometimes gridmix emulates data much larger than actual counter for map only jobs</b><br>
<blockquote>Fixes a bug in the compression emulation feature for map-only jobs.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4087">MAPREDUCE-4087</a>.
Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi <br>
<b>[Gridmix] GenerateDistCacheData job of Gridmix can become slow in some cases</b><br>
<blockquote>Fixes the issue of GenerateDistCacheData job slowness.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4083">MAPREDUCE-4083</a>.
Major bug reported by Karam Singh and fixed by Amar Kamat (contrib/gridmix)<br>
<b>GridMix emulated job tasks.resource-usage emulator for CPU usage throws NPE when Trace contains cumulativeCpuUsage value of 0 at attempt level</b><br>
<blockquote>Fixes NPE in CPU emulation in Gridmix.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4067">MAPREDUCE-4067</a>.
Critical bug reported by Jitendra Nath Pandey and fixed by Xuan Gong <br>
<b>Replace YarnRemoteException with IOException in MRv2 APIs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4019">MAPREDUCE-4019</a>.
Minor bug reported by B Anil Kumar and fixed by Ashwin Shankar (client)<br>
<b>-list-attempt-ids is not working</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3953">MAPREDUCE-3953</a>.
Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi <br>
<b>Gridmix throws NPE and does not simulate a job if the trace contains null taskStatus for a task</b><br>
<blockquote>Fixes NPE and makes Gridmix simulate succeeded-jobs-with-failed-tasks. All tasks of such simulated jobs (including the failed ones of the original job) will succeed.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3872">MAPREDUCE-3872</a>.
Major bug reported by Patrick Hunt and fixed by Robert Kanter (client , mrv2)<br>
<b>event handling races in ContainerLauncherImpl and TestContainerLauncher</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3829">MAPREDUCE-3829</a>.
Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (contrib/gridmix)<br>
<b>[Gridmix] Gridmix should give better error message when input-data directory already exists and -generate option is given</b><br>
<blockquote>Makes Gridmix emit the correct error message when the input data directory already exists and the -generate option is used. Makes Gridmix exit with proper exit codes when it fails in argument processing or startup/setup.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3787">MAPREDUCE-3787</a>.
Major improvement reported by Amar Kamat and fixed by Amar Kamat (contrib/gridmix)<br>
<b>[Gridmix] Improve STRESS mode</b><br>
<blockquote>JobMonitor can now deploy multiple threads for faster job-status polling. Use 'gridmix.job-monitor.thread-count' to set the number of threads. Stress mode now relies on the updates from the job monitor instead of polling for job status. Failures in job submission now get reported to the statistics module and ultimately reported to the user via summary.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3757">MAPREDUCE-3757</a>.
Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (tools/rumen)<br>
<b>Rumen Folder is not adjusting the shuffleFinished and sortFinished times of reduce task attempts</b><br>
<blockquote>Fixed the sortFinishTime and shuffleFinishTime adjustments in Rumen Folder.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3685">MAPREDUCE-3685</a>.
Critical bug reported by anty.rao and fixed by anty (mrv2)<br>
<b>There are some bugs in implementation of MergeManager</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3533">MAPREDUCE-3533</a>.
Minor improvement reported by Steve Loughran and fixed by (mrv2)<br>
<b>have the service interface extend Closeable and use close() as its shutdown operation</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3502">MAPREDUCE-3502</a>.
Major task reported by Steve Loughran and fixed by Steve Loughran (mrv2)<br>
<b>Review all Service.stop() operations and make sure that they work before a service is started</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3008">MAPREDUCE-3008</a>.
Major sub-task reported by Amar Kamat and fixed by Amar Kamat (contrib/gridmix)<br>
<b>[Gridmix] Improve cumulative CPU usage emulation for short running tasks</b><br>
<blockquote>Improves cumulative CPU emulation for short running tasks.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-2722">MAPREDUCE-2722</a>.
Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (contrib/gridmix)<br>
<b>Gridmix simulated job's map's hdfsBytesRead counter is wrong when compressed input is used</b><br>
<blockquote>Makes Gridmix use the uncompressed input data size while simulating map tasks in the case where compressed input data was used in the original job.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5083">HDFS-5083</a>.
Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee <br>
<b>Update the HDFS compatibility version range</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5027">HDFS-5027</a>.
Major improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (datanode)<br>
<b>On startup, DN should scan volumes in parallel</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5025">HDFS-5025</a>.
Major sub-task reported by Jing Zhao and fixed by Jing Zhao (ha , namenode)<br>
<b>Record ClientId and CallId in EditLog to enable rebuilding retry cache in case of HA failover</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5024">HDFS-5024</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (namenode)<br>
<b>Make DatanodeProtocol#commitBlockSynchronization idempotent</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5020">HDFS-5020</a>.
Major improvement reported by Jing Zhao and fixed by Jing Zhao (namenode)<br>
<b>Make DatanodeProtocol#blockReceivedAndDeleted idempotent</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5018">HDFS-5018</a>.
Minor bug reported by Ted Yu and fixed by Ted Yu <br>
<b>Misspelled DFSConfigKeys#DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT in javadoc of DatanodeInfo#isStale()</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5016">HDFS-5016</a>.
Blocker bug reported by Devaraj Das and fixed by Suresh Srinivas <br>
<b>Deadlock in pipeline recovery causes Datanode to be marked dead</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5010">HDFS-5010</a>.
Major improvement reported by Kihwal Lee and fixed by Kihwal Lee (namenode , performance)<br>
<b>Reduce the frequency of getCurrentUser() calls from namenode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5008">HDFS-5008</a>.
Major improvement reported by Suresh Srinivas and fixed by Jing Zhao (namenode)<br>
<b>Make ClientProtocol#abandonBlock() idempotent</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5007">HDFS-5007</a>.
Minor improvement reported by Kousuke Saruta and fixed by Kousuke Saruta <br>
<b>Replace hard-coded property keys with DFSConfigKeys fields</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5005">HDFS-5005</a>.
Major bug reported by Jing Zhao and fixed by Jing Zhao <br>
<b>Move SnapshotException and SnapshotAccessControlException to o.a.h.hdfs.protocol</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-5003">HDFS-5003</a>.
Minor bug reported by Xi Fang and fixed by Xi Fang (test)<br>
<b>TestNNThroughputBenchmark fails because of existing directories</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4999">HDFS-4999</a>.
Major bug reported by Kihwal Lee and fixed by Colin Patrick McCabe <br>
<b>fix TestShortCircuitLocalRead on branch-2</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4998">HDFS-4998</a>.
Major bug reported by Kihwal Lee and fixed by Kihwal Lee (test)<br>
<b>TestUnderReplicatedBlocks fails intermittently</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4996">HDFS-4996</a>.
Minor improvement reported by Chris Nauroth and fixed by Chris Nauroth (namenode)<br>
<b>ClientProtocol#metaSave can be made idempotent by overwriting the output file instead of appending to it</b><br>
<blockquote>The dfsadmin -metasave command has been changed to overwrite the output file. Previously, this command would append to the output file if it already existed.</blockquote></li>
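<li> <b>Sketch: why overwriting makes a repeated save idempotent (HDFS-4996)</b><br>
<blockquote>The difference between appending and overwriting can be illustrated with plain local files. This is only a sketch, assuming nothing about how the namenode actually writes its output: OverwriteDemo, saveAppending, and saveOverwriting are hypothetical names, and the code uses java.nio.file rather than any HDFS API.
<pre>
```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OverwriteDemo {
    // Append mode (illustrative): a retried call duplicates the report.
    static void saveAppending(Path out, String report) throws IOException {
        Files.write(out, report.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Overwrite mode (illustrative): a retried call leaves the same content.
    static void saveOverwriting(Path out, String report) throws IOException {
        Files.write(out, report.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE);
    }

    // Reads a whole file back as a UTF-8 string.
    static String read(Path p) throws IOException {
        return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("metasave-sketch");
        Path appended = dir.resolve("appended.txt");
        Path overwritten = dir.resolve("overwritten.txt");
        // Simulate the same request being issued twice, as with a client retry.
        for (int attempt = 0; attempt != 2; attempt++) {
            saveAppending(appended, "report\n");
            saveOverwriting(overwritten, "report\n");
        }
        System.out.println(read(appended).length());
        System.out.println(read(overwritten).length());
    }
}
```
</pre>
After two identical calls the appended file holds the report twice, while the overwritten file holds it once, which is the property that lets a retried request be treated as safe.</blockquote></li>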
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4992">HDFS-4992</a>.
Major improvement reported by Max Lapan and fixed by Max Lapan (balancer)<br>
<b>Make balancer's thread count configurable</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4982">HDFS-4982</a>.
Major bug reported by Todd Lipcon and fixed by Todd Lipcon (journal-node , security)<br>
<b>JournalNode should relogin from keytab before fetching logs from other JNs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4980">HDFS-4980</a>.
Major bug reported by Mark Grover and fixed by Mark Grover (build)<br>
<b>Incorrect logging.properties file for hadoop-httpfs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4979">HDFS-4979</a>.
Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)<br>
<b>Implement retry cache on the namenode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4978">HDFS-4978</a>.
Major improvement reported by Jing Zhao and fixed by Jing Zhao <br>
<b>Make disallowSnapshot idempotent</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4974">HDFS-4974</a>.
Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (ha , namenode)<br>
<b>Analyze and add annotations to Namenode protocol methods and enable retry</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4969">HDFS-4969</a>.
Blocker bug reported by Robert Kanter and fixed by Robert Kanter (test , webhdfs)<br>
<b>WebhdfsFileSystem expects non-standard WEBHDFS Json element</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4954">HDFS-4954</a>.
Major bug reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>compile failure in branch-2: getFlushedOffset should catch or rethrow IOException</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4951">HDFS-4951</a>.
Major bug reported by Robert Kanter and fixed by Robert Kanter (security)<br>
<b>FsShell commands using secure httpfs throw exceptions due to missing TokenRenewer</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4948">HDFS-4948</a>.
Major bug reported by Robert Joseph Evans and fixed by Brandon Li <br>
<b>mvn site for hadoop-hdfs-nfs fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4944">HDFS-4944</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (webhdfs)<br>
<b>WebHDFS cannot create a file path containing characters that must be URI-encoded, such as space.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4943">HDFS-4943</a>.
Minor bug reported by Jerry He and fixed by Jerry He (webhdfs)<br>
<b>WebHdfsFileSystem does not work when original file path has encoded chars </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4932">HDFS-4932</a>.
Minor improvement reported by Fengdong Yu and fixed by Fengdong Yu (ha , namenode)<br>
<b>Avoid a wide line on the name node webUI if we have more Journal nodes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4927">HDFS-4927</a>.
Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (test)<br>
<b>CreateEditsLog creates inodes with an invalid inode ID, which then cannot be loaded by a namenode.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4917">HDFS-4917</a>.
Major bug reported by Fengdong Yu and fixed by Fengdong Yu (datanode , namenode)<br>
<b>Start-dfs.sh cannot pass the parameters correctly</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4914">HDFS-4914</a>.
Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (hdfs-client)<br>
<b>When possible, Use DFSClient.Conf instead of Configuration </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4912">HDFS-4912</a>.
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)<br>
<b>Cleanup FSNamesystem#startFileInternal</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4910">HDFS-4910</a>.
Major bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>TestPermission failed in branch-2</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4908">HDFS-4908</a>.
Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode , snapshots)<br>
<b>Reduce snapshot inode memory usage</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4906">HDFS-4906</a>.
Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (hdfs-client)<br>
<b>HDFS Output streams should not accept writes after being closed</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4903">HDFS-4903</a>.
Minor improvement reported by Suresh Srinivas and fixed by Arpit Agarwal (namenode)<br>
<b>Print trash configuration and trash emptier state in namenode log</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4902">HDFS-4902</a>.
Major bug reported by Binglin Chang and fixed by Binglin Chang (snapshots)<br>
<b>DFSClient.getSnapshotDiffReport should use string path rather than o.a.h.fs.Path</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4888">HDFS-4888</a>.
Major bug reported by Ravi Prakash and fixed by Ravi Prakash <br>
<b>Refactor and fix FSNamesystem.getTurnOffTip to sanity</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4887">HDFS-4887</a>.
Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee (benchmarks , test)<br>
<b>TestNNThroughputBenchmark exits abruptly</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4883">HDFS-4883</a>.
Major bug reported by Konstantin Shvachko and fixed by Tao Luo (namenode)<br>
<b>complete() should verify fileId</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4880">HDFS-4880</a>.
Major bug reported by Arpit Agarwal and fixed by Suresh Srinivas (namenode)<br>
<b>Diagnostic logging while loading name/edits files</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4878">HDFS-4878</a>.
Major bug reported by Tao Luo and fixed by Tao Luo (namenode)<br>
<b>On Remove Block, Block is not Removed from neededReplications queue</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4877">HDFS-4877</a>.
Blocker bug reported by Jing Zhao and fixed by Jing Zhao (snapshots)<br>
<b>Snapshot: fix the scenario where a directory is renamed under its prior descendant</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4876">HDFS-4876</a>.
Minor sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (snapshots)<br>
<b>The javadoc of FileWithSnapshot is incorrect</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4875">HDFS-4875</a>.
Minor sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Arpit Agarwal (snapshots , test)<br>
<b>Add a test for testing snapshot file length</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4873">HDFS-4873</a>.
Major bug reported by Hari Mankude and fixed by Jing Zhao (snapshots)<br>
<b>callGetBlockLocations returns incorrect number of blocks for snapshotted files</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4867">HDFS-4867</a>.
Major bug reported by Kihwal Lee and fixed by Plamen Jeliazkov (namenode)<br>
<b>metaSave NPEs when there are invalid blocks in repl queue.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4866">HDFS-4866</a>.
Blocker bug reported by Ralph Castain and fixed by Arpit Agarwal (namenode)<br>
<b>Protocol buffer support cannot compile under C</b><br>
<blockquote>The Protocol Buffers definition of the inter-namenode protocol required a change for compatibility with compiled C clients. This is a backwards-incompatible change. A namenode prior to this change will not be able to communicate with a namenode after this change.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4865">HDFS-4865</a>.
Major bug reported by Wei Yan and fixed by Wei Yan <br>
<b>Remove sub resource warning from httpfs log at startup time</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4863">HDFS-4863</a>.
Major bug reported by Jing Zhao and fixed by Jing Zhao (snapshots)<br>
<b>The root directory should be added to the snapshottable directory list while loading fsimage </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4862">HDFS-4862</a>.
Major bug reported by Ravi Prakash and fixed by Ravi Prakash <br>
<b>SafeModeInfo.isManual() returns true when resources are low even if it wasn't entered into manually</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4857">HDFS-4857</a>.
Major bug reported by Jing Zhao and fixed by Jing Zhao (snapshots)<br>
<b>Snapshot.Root and AbstractINodeDiff#snapshotINode should not be put into INodeMap when loading FSImage</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4850">HDFS-4850</a>.
Major bug reported by Stephen Chu and fixed by Jing Zhao (tools)<br>
<b>fix OfflineImageViewer to work on fsimages with empty files or snapshots</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4848">HDFS-4848</a>.
Minor improvement reported by Stephen Chu and fixed by Jing Zhao (snapshots)<br>
<b>copyFromLocal and renaming a file to ".snapshot" should output that ".snapshot" is a reserved name</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4846">HDFS-4846</a>.
Minor bug reported by Stephen Chu and fixed by Jing Zhao (snapshots)<br>
<b>Clean up snapshot CLI commands output stacktrace for invalid arguments</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4845">HDFS-4845</a>.
Critical bug reported by Kihwal Lee and fixed by Arpit Agarwal (namenode)<br>
<b>FSEditLogLoader gets NPE while accessing INodeMap in TestEditLogRace</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4842">HDFS-4842</a>.
Major sub-task reported by Jing Zhao and fixed by Jing Zhao (snapshots)<br>
<b>Snapshot: identify the correct prior snapshot when deleting a snapshot under a renamed subtree</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4841">HDFS-4841</a>.
Major bug reported by Stephen Chu and fixed by Robert Kanter (security , webhdfs)<br>
<b>FsShell commands using secure webhdfs fail ClientFinalizer shutdown hook</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4840">HDFS-4840</a>.
Major bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)<br>
<b>ReplicationMonitor gets NPE during shutdown</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4832">HDFS-4832</a>.
Critical bug reported by Ravi Prakash and fixed by Ravi Prakash <br>
<b>Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave</b><br>
<blockquote>This change makes the namenode keep its internal replication queues and datanode state updated while in manual safe mode. This allows metrics and the UI to present up-to-date information while in safe mode. The behavior during start-up safe mode is unchanged.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4830">HDFS-4830</a>.
Minor bug reported by Aaron T. Myers and fixed by Aaron T. Myers <br>
<b>Typo in config settings for AvailableSpaceVolumeChoosingPolicy in hdfs-default.xml</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4827">HDFS-4827</a>.
Major bug reported by Devaraj Das and fixed by Devaraj Das <br>
<b>Slight update to the implementation of API for handling favored nodes in DFSClient</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4826">HDFS-4826</a>.
Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (test)<br>
<b>TestNestedSnapshots times out due to repeated slow edit log flushes when running on virtualized disk</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4825">HDFS-4825</a>.
Major bug reported by Andrew Wang and fixed by Andrew Wang (webhdfs)<br>
<b>webhdfs / httpfs tests broken because of min block size change</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4824">HDFS-4824</a>.
Major bug reported by Henry Robinson and fixed by Colin Patrick McCabe (hdfs-client)<br>
<b>FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4819">HDFS-4819</a>.
Minor sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (documentation)<br>
<b>Update Snapshot doc for HDFS-4758</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4818">HDFS-4818</a>.
Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode , test)<br>
<b>several HDFS tests that attempt to make directories unusable do not work correctly on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4815">HDFS-4815</a>.
Major bug reported by Tian Hong Wang and fixed by Tian Hong Wang (datanode , test)<br>
<b>TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN: Double call countReplicas() to fetch corruptReplicas and liveReplicas is not needed</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4813">HDFS-4813</a>.
Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Jing Zhao (namenode)<br>
<b>BlocksMap may throw NullPointerException during shutdown</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4810">HDFS-4810</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)<br>
<b>several HDFS HA tests have timeouts that are too short</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4807">HDFS-4807</a>.
Major bug reported by Kihwal Lee and fixed by Cristina L. Abad <br>
<b>DFSOutputStream.createSocketForPipeline() should not include timeout extension on connect</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4805">HDFS-4805</a>.
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (webhdfs)<br>
<b>Webhdfs client is fragile to token renewal errors</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4804">HDFS-4804</a>.
Minor improvement reported by Stephen Chu and fixed by Stephen Chu <br>
<b>WARN when users set the block balanced preference percent below 0.5 or above 1.0</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4799">HDFS-4799</a>.
Blocker bug reported by Todd Lipcon and fixed by Todd Lipcon (namenode)<br>
<b>Corrupt replica can be prematurely removed from corruptReplicas map</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4797">HDFS-4797</a>.
Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (datanode)<br>
<b>BlockScanInfo does not override equals(..) and hashCode() consistently</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4787">HDFS-4787</a>.
Major improvement reported by Tian Hong Wang and fixed by Tian Hong Wang <br>
<b>Create a new HdfsConfiguration before each TestDFSClientRetries test case</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4785">HDFS-4785</a>.
Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)<br>
<b>Concat operation does not remove concatenated files from InodeMap</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4784">HDFS-4784</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (namenode)<br>
<b>NPE in FSDirectory.resolvePath()</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4783">HDFS-4783</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)<br>
<b>TestDelegationTokensWithHA#testHAUtilClonesDelegationTokens fails on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4780">HDFS-4780</a>.
Minor bug reported by Kihwal Lee and fixed by Robert Parker (namenode)<br>
<b>Use the correct relogin method for services</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4778">HDFS-4778</a>.
Major bug reported by Devaraj Das and fixed by Devaraj Das (namenode)<br>
<b>Invoke getPipeline in the chooseTarget implementation that has favoredNodes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4772">HDFS-4772</a>.
Minor improvement reported by Brandon Li and fixed by Brandon Li (namenode)<br>
<b>Add number of children in HdfsFileStatus</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4768">HDFS-4768</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (datanode)<br>
<b>File handle leak in datanode when a block pool is removed</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4765">HDFS-4765</a>.
Major bug reported by Andrew Wang and fixed by Andrew Wang (namenode)<br>
<b>Permission check of symlink deletion incorrectly throws UnresolvedLinkException</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4763">HDFS-4763</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Add script changes/utility for starting NFS gateway</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4762">HDFS-4762</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
<b>Provide HDFS based NFSv3 and Mountd implementation</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4751">HDFS-4751</a>.
Minor bug reported by Andrew Wang and fixed by Andrew Wang (test)<br>
<b>TestLeaseRenewer#testThreadName flakes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4748">HDFS-4748</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (qjm , test)<br>
<b>MiniJournalCluster#restartJournalNode leaks resources, which causes sporadic test failures</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4745">HDFS-4745</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)<br>
<b>TestDataTransferKeepalive#testSlowReader has race condition that causes sporadic failure</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4743">HDFS-4743</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)<br>
<b>TestNNStorageRetentionManager fails on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4741">HDFS-4741</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)<br>
<b>TestStorageRestore#testStorageRestoreFailure fails on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4740">HDFS-4740</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)<br>
<b>Fixes for a few test failures on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4739">HDFS-4739</a>.
Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (namenode)<br>
<b>NN can miscalculate the number of extra edit log segments to retain</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4737">HDFS-4737</a>.
Major bug reported by Sean Mackrory and fixed by Sean Mackrory <br>
<b>JVM path embedded in fuse binaries</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4734">HDFS-4734</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal <br>
<b>HDFS Tests that use ShellCommandFencer are broken on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4733">HDFS-4733</a>.
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur <br>
<b>Make HttpFS username pattern configurable</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4732">HDFS-4732</a>.
Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (test)<br>
<b>TestDFSUpgradeFromImage fails on Windows due to failure to unpack old image tarball that contains hard links</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4725">HDFS-4725</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode , test , tools)<br>
<b>fix HDFS file handle leaks</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4722">HDFS-4722</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic (test)<br>
<b>TestGetConf#testFederation times out on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4721">HDFS-4721</a>.
Major improvement reported by Varun Sharma and fixed by Varun Sharma (namenode)<br>
<b>Speed up lease/block recovery when DN fails and a block goes into recovery</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4714">HDFS-4714</a>.
Major bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)<br>
<b>Log short messages in Namenode RPC server for exceptions meant for clients</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4705">HDFS-4705</a>.
Minor bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>Address HDFS test failures on Windows because of invalid dfs.namenode.name.dir</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4699">HDFS-4699</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)<br>
<b>TestPipelinesFailover#testPipelineRecoveryStress fails sporadically</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4698">HDFS-4698</a>.
Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)<br>
<b>provide client-side metrics for remote reads, local reads, and short-circuit reads</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4695">HDFS-4695</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic (test)<br>
<b>TestEditLog leaks open file handles between tests</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4693">HDFS-4693</a>.
Minor bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)<br>
<b>Some test cases in TestCheckpoint do not clean up after themselves</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4687">HDFS-4687</a>.
Minor bug reported by Andrew Wang and fixed by Andrew Wang (test)<br>
<b>TestDelegationTokenForProxyUser#testWebHdfsDoAs is flaky with JDK7</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4679">HDFS-4679</a>.
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)<br>
<b>Namenode operation checks should be done in a consistent manner</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4677">HDFS-4677</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>Editlog should support synchronous writes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4676">HDFS-4676</a>.
Minor bug reported by Suresh Srinivas and fixed by Suresh Srinivas (test)<br>
<b>TestHDFSFileSystemContract should set MiniDFSCluster variable to null to free up memory</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4674">HDFS-4674</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)<br>
<b>TestBPOfferService fails on Windows due to failure parsing datanode data directory as URI</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4669">HDFS-4669</a>.
Major bug reported by Tian Hong Wang and fixed by Tian Hong Wang (test)<br>
<b>TestBlockPoolManager fails using IBM java</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4659">HDFS-4659</a>.
Major bug reported by Brandon Li and fixed by Brandon Li (namenode)<br>
<b>Support setting execution bit for regular files</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4658">HDFS-4658</a>.
Trivial bug reported by Aaron T. Myers and fixed by Aaron T. Myers (ha , namenode)<br>
<b>Standby NN will log that it has received a block report "after becoming active"</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4655">HDFS-4655</a>.
Minor bug reported by Aaron T. Myers and fixed by Aaron T. Myers (datanode)<br>
<b>DNA_FINALIZE is logged as being an unknown command by the DN when received from the standby NN</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4646">HDFS-4646</a>.
Minor bug reported by Jagane Sundar and fixed by (namenode)<br>
<b>createNNProxyWithClientProtocol ignores configured timeout value</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4645">HDFS-4645</a>.
Major improvement reported by Suresh Srinivas and fixed by Arpit Agarwal (namenode)<br>
<b>Move from randomly generated block ID to sequentially generated block ID</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4643">HDFS-4643</a>.
Trivial bug reported by Todd Lipcon and fixed by Todd Lipcon (qjm , test)<br>
<b>Fix flakiness in TestQuorumJournalManager</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4639">HDFS-4639</a>.
Major bug reported by Konstantin Shvachko and fixed by Plamen Jeliazkov (namenode)<br>
<b>startFileInternal() should not increment generation stamp</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4635">HDFS-4635</a>.
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)<br>
<b>Move BlockManager#computeCapacity to LightWeightGSet</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4625">HDFS-4625</a>.
Minor bug reported by Arpit Agarwal and fixed by Ivan Mitic (test)<br>
<b>Make TestNNWithQJM#testNewNamenodeTakesOverWriter work on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4621">HDFS-4621</a>.
Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (ha , qjm)<br>
<b>additional logging to help diagnose slow QJM logSync</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4620">HDFS-4620</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (documentation)<br>
<b>Documentation for dfs.namenode.rpc-address specifies wrong format</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4618">HDFS-4618</a>.
Major bug reported by Todd Lipcon and fixed by Todd Lipcon (namenode)<br>
<b>default for checkpoint txn interval is too low</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4615">HDFS-4615</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)<br>
<b>Fix TestDFSShell failures on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4614">HDFS-4614</a>.
Trivial bug reported by Aaron T. Myers and fixed by Aaron T. Myers (namenode)<br>
<b>FSNamesystem#getContentSummary should use getPermissionChecker helper method</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4610">HDFS-4610</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>Move to using common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4609">HDFS-4609</a>.
Minor bug reported by Ivan Mitic and fixed by Ivan Mitic (test)<br>
<b>TestAuditLogs should release log handles between tests</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4607">HDFS-4607</a>.
Minor bug reported by Ivan Mitic and fixed by Ivan Mitic (test)<br>
<b>TestGetConf#testGetSpecificKey fails on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4604">HDFS-4604</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>TestJournalNode fails on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4603">HDFS-4603</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>TestMiniDFSCluster fails on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4602">HDFS-4602</a>.
Major sub-task reported by Suresh Srinivas and fixed by Uma Maheswara Rao G <br>
<b>TestBookKeeperHACheckpoints fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4598">HDFS-4598</a>.
Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (webhdfs)<br>
<b>WebHDFS concat: the default value of sources in the code does not match the doc</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4596">HDFS-4596</a>.
Major bug reported by Andrew Wang and fixed by Andrew Wang (namenode)<br>
<b>Shutting down namenode during checkpointing can lead to md5sum error</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4595">HDFS-4595</a>.
Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (hdfs-client)<br>
<b>When short circuit read fails, DFSClient does not fall back to regular reads</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4593">HDFS-4593</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal <br>
<b>TestSaveNamespace fails on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4592">HDFS-4592</a>.
Minor bug reported by Aaron T. Myers and fixed by Aaron T. Myers (namenode)<br>
<b>Default values for access time precision are out of sync between hdfs-default.xml and the code</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4591">HDFS-4591</a>.
Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (ha , namenode)<br>
<b>HA clients can fail to fail over while Standby NN is performing long checkpoint</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4586">HDFS-4586</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>TestDataDirs.testGetDataDirsFromURIs fails with all directories in dfs.datanode.data.dir are invalid</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4583">HDFS-4583</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>TestNodeCount fails </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4582">HDFS-4582</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>TestHostsFiles fails on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4573">HDFS-4573</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)<br>
<b>Fix TestINodeFile on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4572">HDFS-4572</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (namenode , test)<br>
<b>Fix TestJournal failures on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4569">HDFS-4569</a>.
Trivial improvement reported by Andrew Wang and fixed by Andrew Wang <br>
<b>Small image transfer related cleanups.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4565">HDFS-4565</a>.
Minor improvement reported by Arpit Gupta and fixed by Arpit Gupta (security)<br>
<b>use DFSUtil.getSpnegoKeytabKey() to get the spnego keytab key in secondary namenode and namenode http server</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4544">HDFS-4544</a>.
Major bug reported by Amareshwari Sriramadasu and fixed by Arpit Agarwal <br>
<b>Error in deleting blocks should not do check disk, for all types of errors</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4542">HDFS-4542</a>.
Blocker sub-task reported by Daryn Sharp and fixed by Daryn Sharp (webhdfs)<br>
<b>Webhdfs doesn't support secure proxy users</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4541">HDFS-4541</a>.
Major bug reported by Arpit Gupta and fixed by Arpit Gupta (datanode , security)<br>
<b>set hadoop.log.dir and hadoop.id.str when starting secure datanode so it writes the logs to the correct dir by default</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4540">HDFS-4540</a>.
Major bug reported by Arpit Gupta and fixed by Arpit Gupta (security)<br>
<b>namenode http server should use the web authentication keytab for spnego principal</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4533">HDFS-4533</a>.
Major bug reported by Fengdong Yu and fixed by Fengdong Yu (datanode , namenode)<br>
<b>start-dfs.sh ignored additional parameters besides -upgrade</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4532">HDFS-4532</a>.
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (namenode)<br>
<b>RPC call queue may fill due to current user lookup</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4525">HDFS-4525</a>.
Major sub-task reported by Uma Maheswara Rao G and fixed by SreeHari (namenode)<br>
<b>Provide an API for knowing whether a file is closed or not.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4522">HDFS-4522</a>.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
<b>LightWeightGSet expects incrementing a volatile to be atomic</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4521">HDFS-4521</a>.
Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
<b>invalid network topologies should not be cached</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4519">HDFS-4519</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (datanode , scripts)<br>
<b>Support override of jsvc binary and log file locations when launching secure datanode.</b><br>
<blockquote>With this improvement, the following options are available in release 1.2.0 and later on the 1.x release stream:
1. The jsvc location can be overridden by setting the environment variable JSVC_HOME. Defaults to the jsvc binary packaged within the Hadoop distro.
2. jsvc log output is directed to the file defined by JSVC_OUTFILE. Defaults to $HADOOP_LOG_DIR/jsvc.out.
3. jsvc error output is directed to the file defined by JSVC_ERRFILE. Defaults to $HADOOP_LOG_DIR/jsvc.err.
With this improvement, the following options are available in release 2.0.4 and later on the 2.x release stream:
1. jsvc log output is directed to the file defined by JSVC_OUTFILE. Defaults to $HADOOP_LOG_DIR/jsvc.out.
2. jsvc error output is directed to the file defined by JSVC_ERRFILE. Defaults to $HADOOP_LOG_DIR/jsvc.err.
For overriding the jsvc location on 2.x releases, here is the release note from HDFS-2303:
To run secure Datanodes, users must install jsvc for their platform and set JSVC_HOME to point to the location of jsvc in their environment.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4518">HDFS-4518</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal <br>
<b>Finer grained metrics for HDFS capacity</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4502">HDFS-4502</a>.
Blocker sub-task reported by Alejandro Abdelnur and fixed by Brandon Li (webhdfs)<br>
<b>WebHdfsFileSystem handling of fileId breaks compatibility</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4495">HDFS-4495</a>.
Major bug reported by Kihwal Lee and fixed by Kihwal Lee (hdfs-client)<br>
<b>Allow client-side lease renewal to be retried beyond soft-limit</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4484">HDFS-4484</a>.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
<b>libwebhdfs compilation broken with gcc 4.6.2</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4477">HDFS-4477</a>.
Critical bug reported by Kihwal Lee and fixed by Daryn Sharp (security)<br>
<b>Secondary namenode may retain old tokens</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4471">HDFS-4471</a>.
Major bug reported by Andrew Wang and fixed by Andrew Wang (namenode)<br>
<b>Namenode WebUI file browsing does not work with wildcard addresses configured</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4470">HDFS-4470</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth <br>
<b>several HDFS tests attempt file operations on invalid HDFS paths when running on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4465">HDFS-4465</a>.
Major improvement reported by Suresh Srinivas and fixed by Aaron T. Myers (datanode)<br>
<b>Optimize datanode ReplicasMap and ReplicaInfo</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4461">HDFS-4461</a>.
Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
<b>DirectoryScanner: volume path prefix takes up memory for every block that is scanned </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4434">HDFS-4434</a>.
Major sub-task reported by Brandon Li and fixed by Suresh Srinivas (namenode)<br>
<b>Provide a mapping from INodeId to INode</b><br>
<blockquote>This change adds support for referencing files and directories by fileID/inodeID using a path of the form /.reserved/.inodes/&lt;inodeid&gt;. With this change, creating a file or directory named /.reserved is no longer allowed. Before upgrading to a release with this change, any existing file or directory named /.reserved needs to be renamed.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4382">HDFS-4382</a>.
Major bug reported by Ted Yu and fixed by Ted Yu <br>
<b>Fix typo MAX_NOT_CHANGED_INTERATIONS</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4374">HDFS-4374</a>.
Major sub-task reported by Chris Nauroth and fixed by Chris Nauroth (namenode)<br>
<b>Display NameNode startup progress in UI</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4373">HDFS-4373</a>.
Major sub-task reported by Chris Nauroth and fixed by Chris Nauroth (namenode)<br>
<b>Add HTTP API for querying NameNode startup progress</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4372">HDFS-4372</a>.
Major sub-task reported by Chris Nauroth and fixed by Chris Nauroth (namenode)<br>
<b>Track NameNode startup progress</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4346">HDFS-4346</a>.
Minor sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)<br>
<b>Refactor INodeId and GenerationStamp</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4342">HDFS-4342</a>.
Major bug reported by Mark Yang and fixed by Arpit Agarwal (namenode)<br>
<b>Edits dir in dfs.namenode.edits.dir.required will be silently ignored if it is not in dfs.namenode.edits.dir</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4340">HDFS-4340</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (hdfs-client , namenode)<br>
<b>Update addBlock() to include inode id as additional argument</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4339">HDFS-4339</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (namenode)<br>
<b>Persist inode id in fsimage and editlog</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4334">HDFS-4334</a>.
Major sub-task reported by Brandon Li and fixed by Brandon Li (namenode)<br>
<b>Add a unique id to each INode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4305">HDFS-4305</a>.
Minor bug reported by Todd Lipcon and fixed by Andrew Wang (namenode)<br>
<b>Add a configurable limit on number of blocks per file, and min block size</b><br>
<blockquote>This change introduces a maximum number of blocks per file, by default one million, and a minimum block size, by default 1MB. These can optionally be changed via the configuration settings "dfs.namenode.fs-limits.max-blocks-per-file" and "dfs.namenode.fs-limits.min-block-size", respectively.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4304">HDFS-4304</a>.
Major improvement reported by Todd Lipcon and fixed by Colin Patrick McCabe (namenode)<br>
<b>Make FSEditLogOp.MAX_OP_SIZE configurable</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4300">HDFS-4300</a>.
Critical bug reported by Todd Lipcon and fixed by Andrew Wang <br>
<b>TransferFsImage.downloadEditsToStorage should use a tmp file for destination</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4298">HDFS-4298</a>.
Major bug reported by Todd Lipcon and fixed by Aaron T. Myers (namenode)<br>
<b>StorageRetentionManager spews warnings when used with QJM</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4296">HDFS-4296</a>.
Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)<br>
<b>Add layout version for HDFS-4256 for release 1.2.0</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4287">HDFS-4287</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (webhdfs)<br>
<b>HTTPFS tests fail on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4261">HDFS-4261</a>.
Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Junping Du (balancer)<br>
<b>TestBalancerWithNodeGroup times out</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4249">HDFS-4249</a>.
Major new feature reported by Suresh Srinivas and fixed by Chris Nauroth (namenode)<br>
<b>Add status NameNode startup to webUI </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4246">HDFS-4246</a>.
Minor improvement reported by Harsh J and fixed by Harsh J (hdfs-client)<br>
<b>The exclude node list should be more forgiving, for each output stream</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4240">HDFS-4240</a>.
Major bug reported by Junping Du and fixed by Junping Du (namenode)<br>
<b>In nodegroup-aware case, make sure nodes are avoided to place replica if some replica are already under the same nodegroup</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4235">HDFS-4235</a>.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
<b>when outputting XML, OfflineEditsViewer can't handle some edits containing non-ASCII strings</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4234">HDFS-4234</a>.
Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (balancer)<br>
<b>Use the generic code for choosing datanode in Balancer</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4222">HDFS-4222</a>.
Minor bug reported by Xiaobo Peng and fixed by Xiaobo Peng (namenode)<br>
<b>NN is unresponsive and loses heartbeats of DNs when Hadoop is configured to use LDAP and LDAP has issues</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4215">HDFS-4215</a>.
Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)<br>
<b>Improvements on INode and image loading</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4209">HDFS-4209</a>.
Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)<br>
<b>Clean up the addNode/addChild/addChildNoQuotaCheck methods in FSDirectory</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4206">HDFS-4206</a>.
Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)<br>
<b>Change the fields in INode and its subclasses to private</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4205">HDFS-4205</a>.
Major bug reported by Andy Isaacson and fixed by Jason Lowe (hdfs-client)<br>
<b>fsck fails with symlinks</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4152">HDFS-4152</a>.
Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Jing Zhao (namenode)<br>
<b>Add a new class for the parameter in INode.collectSubtreeBlocksAndClear(..)</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4151">HDFS-4151</a>.
Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)<br>
<b>Passing INodesInPath instead of INode[] in FSDirectory</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4129">HDFS-4129</a>.
Minor test reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)<br>
<b>Add utility methods to dump NameNode in memory tree for testing</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4128">HDFS-4128</a>.
Major bug reported by Todd Lipcon and fixed by Kihwal Lee (namenode)<br>
<b>2NN gets stuck in inconsistent state if edit log replay fails in the middle</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4124">HDFS-4124</a>.
Minor new feature reported by Jing Zhao and fixed by Jing Zhao <br>
<b>Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4053">HDFS-4053</a>.
Major improvement reported by Eli Collins and fixed by Eli Collins <br>
<b>Increase the default block size</b><br>
<blockquote>The default block size prior to this change was 64MB. This change raises the default block size to 128MB. To restore the previous behavior, set the configuration parameter "dfs.blocksize" to 67108864 in hdfs-site.xml.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4013">HDFS-4013</a>.
Trivial bug reported by Chao Shi and fixed by Chao Shi (hdfs-client)<br>
<b>TestHftpURLTimeouts throws NPE</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3940">HDFS-3940</a>.
Minor improvement reported by Eli Collins and fixed by Suresh Srinivas <br>
<b>Add Gset#clear method and clear the block map when namenode is shutdown</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3934">HDFS-3934</a>.
Minor bug reported by Andy Isaacson and fixed by Colin Patrick McCabe <br>
<b>duplicative dfs_hosts entries handled wrong</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3880">HDFS-3880</a>.
Minor improvement reported by Brandon Li and fixed by Brandon Li (datanode , ha , namenode , security)<br>
<b>Use Builder to get RPC server in HDFS</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3875">HDFS-3875</a>.
Critical bug reported by Todd Lipcon and fixed by Kihwal Lee (datanode , hdfs-client)<br>
<b>Issue handling checksum errors in write pipeline</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3817">HDFS-3817</a>.
Major improvement reported by Brandon Li and fixed by Brandon Li (namenode)<br>
<b>avoid printing stack information for SafeModeException</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3792">HDFS-3792</a>.
Trivial bug reported by Todd Lipcon and fixed by Todd Lipcon (build , namenode)<br>
<b>Fix two findbugs introduced by HDFS-3695</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3769">HDFS-3769</a>.
Critical sub-task reported by liaowenrui and fixed by (ha)<br>
<b>standby namenode become active fails because starting log segment fail on shared storage</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3601">HDFS-3601</a>.
Major new feature reported by Junping Du and fixed by Junping Du (namenode)<br>
<b>Implementation of ReplicaPlacementPolicyNodeGroup to support 4-layer network topology</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3499">HDFS-3499</a>.
Major bug reported by Junping Du and fixed by Junping Du (datanode)<br>
<b>Make NetworkTopology support user specified topology class</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3498">HDFS-3498</a>.
Major improvement reported by Junping Du and fixed by Junping Du (namenode)<br>
<b>Make Replica Removal Policy pluggable and ReplicaPlacementPolicyDefault extensible for reusing code in subclass</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3495">HDFS-3495</a>.
Major new feature reported by Junping Du and fixed by Junping Du (balancer)<br>
<b>Update Balancer to support new NetworkTopology with NodeGroup</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3277">HDFS-3277</a>.
Major bug reported by Colin Patrick McCabe and fixed by Andrew Wang <br>
<b>fail over to loading a different FSImage if the first one we try to load is corrupt</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3180">HDFS-3180</a>.
Major bug reported by Daryn Sharp and fixed by Chris Nauroth (webhdfs)<br>
<b>Add socket timeouts to webhdfs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3163">HDFS-3163</a>.
Trivial improvement reported by Brandon Li and fixed by Brandon Li (test)<br>
<b>TestHDFSCLI.testAll fails if the user name is not all lowercase</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3009">HDFS-3009</a>.
Trivial bug reported by Hari Mankude and fixed by Hari Mankude (hdfs-client)<br>
<b>DFSClient islocaladdress() can use similar routine in netutils</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-2857">HDFS-2857</a>.
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)<br>
<b>Cleanup BlockInfo class</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-2576">HDFS-2576</a>.
Major new feature reported by Pritam Damania and fixed by Devaraj Das (hdfs-client , namenode)<br>
<b>Namenode should have a favored nodes hint to enable clients to have control over block placement.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-2572">HDFS-2572</a>.
Trivial improvement reported by Harsh J and fixed by Harsh J (datanode)<br>
<b>Unnecessary double-check in DN#getHostName</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-2042">HDFS-2042</a>.
Minor improvement reported by Eli Collins and fixed by (libhdfs)<br>
<b>Require c99 when building libhdfs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-1804">HDFS-1804</a>.
Minor new feature reported by Harsh J and fixed by Aaron T. Myers (datanode)<br>
<b>Add a new block-volume device choosing policy that looks at free space</b><br>
<blockquote>There is now a new option to have the DN take into account available disk space on each volume when choosing where to place a replica when performing an HDFS write. This can be enabled by setting the config "dfs.datanode.fsdataset.volume.choosing.policy" to the value "org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy".</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-347">HDFS-347</a>.
Major improvement reported by George Porter and fixed by Colin Patrick McCabe (datanode , hdfs-client , performance)<br>
<b>DFS read performance suboptimal when client co-located on nodes with data</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9872">HADOOP-9872</a>.
Blocker bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)<br>
<b>Improve protoc version handling and detection</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9850">HADOOP-9850</a>.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (ipc)<br>
<b>RPC kerberos errors don't trigger relogin</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9845">HADOOP-9845</a>.
Blocker improvement reported by stack and fixed by Alejandro Abdelnur (performance)<br>
<b>Update protobuf to 2.5 from 2.4.x</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9832">HADOOP-9832</a>.
Blocker sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)<br>
<b>Add RPC header to client ping</b><br>
<blockquote>Client ping will be sent as an RPC header with a reserved callId instead of as a sentinel RPC packet length.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9820">HADOOP-9820</a>.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (ipc , security)<br>
<b>RPCv9 wire protocol is insufficient to support multiplexing</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9816">HADOOP-9816</a>.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (ipc , security)<br>
<b>RPC Sasl QOP is broken</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9792">HADOOP-9792</a>.
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (ipc)<br>
<b>Retry the methods that are tagged @AtMostOnce along with @Idempotent</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9787">HADOOP-9787</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (util)<br>
<b>ShutdownHelper util to shutdown threads and threadpools</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9786">HADOOP-9786</a>.
Major bug reported by Jing Zhao and fixed by Jing Zhao <br>
<b>RetryInvocationHandler#isRpcInvocation should support ProtocolTranslator </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9773">HADOOP-9773</a>.
Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (test)<br>
<b>TestLightWeightCache fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9770">HADOOP-9770</a>.
Minor improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (util)<br>
<b>Make RetryCache#state non volatile</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9763">HADOOP-9763</a>.
Major new feature reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (util)<br>
<b>Extends LightWeightGSet to support eviction of expired elements</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9762">HADOOP-9762</a>.
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (util)<br>
<b>RetryCache utility for implementing RPC retries</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9760">HADOOP-9760</a>.
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (util)<br>
<b>Move GSet and LightWeightGSet to hadoop-common</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9759">HADOOP-9759</a>.
Critical bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>Add support for NativeCodeLoader#getLibraryName on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9756">HADOOP-9756</a>.
Minor improvement reported by Junping Du and fixed by Junping Du (ipc)<br>
<b>Additional cleanup RPC code</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9754">HADOOP-9754</a>.
Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (ipc)<br>
<b>Clean up RPC code</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9751">HADOOP-9751</a>.
Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (ipc)<br>
<b>Add clientId and retryCount to RpcResponseHeaderProto</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9738">HADOOP-9738</a>.
Major bug reported by Kihwal Lee and fixed by Jing Zhao (tools)<br>
<b>TestDistCh fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9734">HADOOP-9734</a>.
Minor improvement reported by Jason Lowe and fixed by Jason Lowe (ipc)<br>
<b>Common protobuf definitions for GetUserMappingsProtocol, RefreshAuthorizationPolicyProtocol and RefreshUserMappingsProtocol</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9720">HADOOP-9720</a>.
Major sub-task reported by Suresh Srinivas and fixed by Arpit Agarwal <br>
<b>Rename Client#uuid to Client#clientId</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9717">HADOOP-9717</a>.
Major improvement reported by Suresh Srinivas and fixed by Jing Zhao (ipc)<br>
<b>Add retry attempt count to the RPC requests</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9716">HADOOP-9716</a>.
Major improvement reported by Suresh Srinivas and fixed by Tsz Wo (Nicholas), SZE (ipc)<br>
<b>Move the Rpc request call ID generation to client side InvocationHandler</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9707">HADOOP-9707</a>.
Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (util)<br>
<b>Fix register lists for crc32c inline assembly</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9701">HADOOP-9701</a>.
Minor bug reported by Steve Loughran and fixed by Karthik Kambatla (documentation)<br>
<b>mvn site ambiguous links in hadoop-common</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9698">HADOOP-9698</a>.
Blocker sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)<br>
<b>RPCv9 client must honor server's SASL negotiate response</b><br>
<blockquote>The RPC client now waits for the server's SASL negotiate response before instantiating its SASL client.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9691">HADOOP-9691</a>.
Minor improvement reported by Chris Nauroth and fixed by Chris Nauroth (ipc)<br>
<b>RPC clients can generate call ID using AtomicInteger instead of synchronizing on the Client instance.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9688">HADOOP-9688</a>.
Blocker improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (ipc)<br>
<b>Add globally unique Client ID to RPC requests</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9683">HADOOP-9683</a>.
Blocker sub-task reported by Luke Lu and fixed by Daryn Sharp (ipc)<br>
<b>Wrap IpcConnectionContext in RPC headers</b><br>
<blockquote>Connection context is now sent as a protobuf wrapped in an RPC header.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9681">HADOOP-9681</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>FileUtil.unTarUsingJava() should close the InputStream upon finishing</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9678">HADOOP-9678</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>TestRPC#testStopsAllThreads intermittently fails on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9676">HADOOP-9676</a>.
Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
<b>make maximum RPC buffer size configurable</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9673">HADOOP-9673</a>.
Trivial improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (net)<br>
<b>NetworkTopology: when a node can't be added, print out its location for diagnostic purposes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9665">HADOOP-9665</a>.
Critical bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>BlockDecompressorStream#decompress will throw EOFException instead of return -1 when EOF</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9661">HADOOP-9661</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (metrics)<br>
<b>Allow metrics sources to be extended</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9656">HADOOP-9656</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu (test , tools)<br>
<b>Gridmix unit tests fail on Windows and Linux</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9649">HADOOP-9649</a>.
Blocker improvement reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Promote YARN service life-cycle libraries into Hadoop Common</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9643">HADOOP-9643</a>.
Minor bug reported by Mark Miller and fixed by Mark Miller (security)<br>
<b>org.apache.hadoop.security.SecurityUtil calls toUpperCase(Locale.getDefault()) as well as toLowerCase(Locale.getDefault()) on hadoop.security.authentication value.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9638">HADOOP-9638</a>.
Major bug reported by Chris Nauroth and fixed by Andrey Klochkov (test)<br>
<b>parallel test changes caused invalid test path for several HDFS tests on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9637">HADOOP-9637</a>.
Major bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>Adding Native Fstat for Windows as needed by YARN</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9632">HADOOP-9632</a>.
Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>TestShellCommandFencer will fail if there is a 'host' machine in the network</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9630">HADOOP-9630</a>.
Major sub-task reported by Luke Lu and fixed by Junping Du (ipc)<br>
<b>Remove IpcSerializationType</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9625">HADOOP-9625</a>.
Minor improvement reported by Paul Han and fixed by (bin , conf)<br>
<b>HADOOP_OPTS not picked up by hadoop command</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9624">HADOOP-9624</a>.
Minor test reported by Xi Fang and fixed by Xi Fang (test)<br>
<b>TestFSMainOperationsLocalFileSystem failed when the Hadoop test root path has "X" in its name</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9619">HADOOP-9619</a>.
Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (documentation)<br>
<b>Mark stability of .proto files</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9607">HADOOP-9607</a>.
Minor bug reported by Timothy St. Clair and fixed by (documentation)<br>
<b>Fixes in Javadoc build</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9605">HADOOP-9605</a>.
Major improvement reported by Timothy St. Clair and fixed by (build)<br>
<b>Update junit dependency</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9604">HADOOP-9604</a>.
Minor improvement reported by Jingguo Yao and fixed by Jingguo Yao (fs)<br>
<b>Wrong Javadoc of FSDataOutputStream</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9599">HADOOP-9599</a>.
Major bug reported by Mostafa Elhemali and fixed by Mostafa Elhemali <br>
<b>hadoop-config.cmd doesn't set JAVA_LIBRARY_PATH correctly</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9593">HADOOP-9593</a>.
Major bug reported by Steve Loughran and fixed by Steve Loughran (util)<br>
<b>stack trace printed at ERROR for all yarn clients without hadoop.home set</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9581">HADOOP-9581</a>.
Major bug reported by Ashwin Shankar and fixed by Ashwin Shankar (scripts)<br>
<b>hadoop --config non-existent directory should result in error </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9574">HADOOP-9574</a>.
Major bug reported by Jian He and fixed by Jian He <br>
<b>Add new methods in AbstractDelegationTokenSecretManager for restoring RMDelegationTokens on RMRestart</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9566">HADOOP-9566</a>.
Major bug reported by Lenni Kuff and fixed by Colin Patrick McCabe (native)<br>
<b>Performing direct read using libhdfs sometimes raises SIGPIPE (which in turn throws SIGABRT) causing client crashes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9563">HADOOP-9563</a>.
Major bug reported by Kihwal Lee and fixed by Tian Hong Wang (util)<br>
<b>Fix incompatibility introduced by HADOOP-9523</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9560">HADOOP-9560</a>.
Minor improvement reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA (metrics)<br>
<b>metrics2#JvmMetrics should have max memory size of JVM</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9556">HADOOP-9556</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (ha , test)<br>
<b>disable HA tests on Windows that fail due to ZooKeeper client connection management bug</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9553">HADOOP-9553</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)<br>
<b>TestAuthenticationToken fails on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9550">HADOOP-9550</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>Remove aspectj dependency</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9549">HADOOP-9549</a>.
Blocker bug reported by Kihwal Lee and fixed by Daryn Sharp (security)<br>
<b>WebHdfsFileSystem hangs on close()</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9532">HADOOP-9532</a>.
Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (bin)<br>
<b>HADOOP_CLIENT_OPTS is appended twice by Windows cmd scripts</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9527">HADOOP-9527</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (fs , test)<br>
<b>Add symlink support to LocalFileSystem on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9526">HADOOP-9526</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)<br>
<b>TestShellCommandFencer and TestShell fail on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9524">HADOOP-9524</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (ha)<br>
<b>Fix ShellCommandFencer to work on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9523">HADOOP-9523</a>.
Major improvement reported by Tian Hong Wang and fixed by Tian Hong Wang <br>
<b>Provide a generic IBM java vendor flag in PlatformName.java to support non-Sun JREs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9517">HADOOP-9517</a>.
Blocker bug reported by Arun C Murthy and fixed by Karthik Kambatla (documentation)<br>
<b>Document Hadoop Compatibility</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9515">HADOOP-9515</a>.
Major new feature reported by Brandon Li and fixed by Brandon Li <br>
<b>Add general interface for NFS and Mount</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9511">HADOOP-9511</a>.
Major improvement reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Adding support for additional input streams (FSDataInputStream and RandomAccessFile) in SecureIOUtils.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9509">HADOOP-9509</a>.
Major new feature reported by Brandon Li and fixed by Brandon Li <br>
<b>Implement ONCRPC and XDR</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9507">HADOOP-9507</a>.
Minor bug reported by Mostafa Elhemali and fixed by Chris Nauroth (fs)<br>
<b>LocalFileSystem rename() is broken in some cases when destination exists</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9504">HADOOP-9504</a>.
Critical bug reported by Liang Xie and fixed by Liang Xie (metrics)<br>
<b>MetricsDynamicMBeanBase has concurrency issues in createMBeanInfo</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9503">HADOOP-9503</a>.
Minor improvement reported by Varun Sharma and fixed by Varun Sharma (ipc)<br>
<b>Remove sleep between IPC client connect timeouts</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9500">HADOOP-9500</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)<br>
<b>TestUserGroupInformation#testGetServerSideGroups fails on Windows due to failure to find winutils.exe</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9496">HADOOP-9496</a>.
Critical bug reported by Gopal V and fixed by Harsh J (bin)<br>
<b>Bad merge of HADOOP-9450 on branch-2 breaks all bin/hadoop calls that need HADOOP_CLASSPATH </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9490">HADOOP-9490</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic (fs)<br>
<b>LocalFileSystem#reportChecksumFailure not closing the checksum file handle before rename</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9488">HADOOP-9488</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (fs)<br>
<b>FileUtil#createJarWithClassPath only substitutes environment variables from the current process environment and does not support overriding them when launching a new process</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9486">HADOOP-9486</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Chris Nauroth <br>
<b>Promote Windows and Shell related utils from YARN to Hadoop Common</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9485">HADOOP-9485</a>.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (net)<br>
<b>No default value in the code for hadoop.rpc.socket.factory.class.default</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9483">HADOOP-9483</a>.
Major improvement reported by Chris Nauroth and fixed by Arpit Agarwal (util)<br>
<b>winutils support for readlink command</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9481">HADOOP-9481</a>.
Minor bug reported by Vadim Bondarev and fixed by Vadim Bondarev <br>
<b>Broken conditional logic with HADOOP_SNAPPY_LIBRARY</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9473">HADOOP-9473</a>.
Trivial bug reported by Glen Mazza and fixed by (fs)<br>
<b>typo in FileUtil copy() method</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9469">HADOOP-9469</a>.
Major bug reported by Thomas Graves and fixed by Robert Parker <br>
<b>mapreduce/yarn source jars not included in dist tarball</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9459">HADOOP-9459</a>.
Critical bug reported by Vinay and fixed by Vinay (ha)<br>
<b>ActiveStandbyElector can join the election before the service is HEALTHY, resulting in null data at ActiveBreadCrumb</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9455">HADOOP-9455</a>.
Minor bug reported by Sangjin Lee and fixed by Chris Nauroth (bin)<br>
<b>HADOOP_CLIENT_OPTS appended twice causes JVM failures</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9451">HADOOP-9451</a>.
Major bug reported by Junping Du and fixed by Junping Du (net)<br>
<b>Node with one topology layer should be handled as fault topology when NodeGroup layer is enabled</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9450">HADOOP-9450</a>.
Major improvement reported by Mitch Wyle and fixed by Harsh J (scripts)<br>
<b>HADOOP_USER_CLASSPATH_FIRST is not honored; CLASSPATH is PREpended instead of APpended</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9443">HADOOP-9443</a>.
Major bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>Port winutils static code analysis change to trunk</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9439">HADOOP-9439</a>.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (native)<br>
<b>JniBasedUnixGroupsMapping: fix some crash bugs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9437">HADOOP-9437</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)<br>
<b>TestNativeIO#testRenameTo fails on Windows due to assumption that POSIX errno is embedded in NativeIOException</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9430">HADOOP-9430</a>.
Major bug reported by Amir Sanjar and fixed by (security)<br>
<b>TestSSLFactory fails on IBM JVM</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9429">HADOOP-9429</a>.
Major bug reported by Amir Sanjar and fixed by (test)<br>
<b>TestConfiguration fails with IBM JAVA </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9425">HADOOP-9425</a>.
Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (ipc)<br>
<b>Add error codes to rpc-response</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9421">HADOOP-9421</a>.
Blocker sub-task reported by Sanjay Radia and fixed by Daryn Sharp <br>
<b>Convert SASL to use ProtoBuf and provide negotiation capabilities</b><br>
<blockquote>Raw SASL protocol now uses protobufs wrapped with RPC headers.
The negotiation sequence incorporates the state of the exchange.
The server now has the ability to advertise its supported auth types.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9418">HADOOP-9418</a>.
Major sub-task reported by Andrew Wang and fixed by Andrew Wang (fs)<br>
<b>Add symlink resolution support to DistributedFileSystem</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9416">HADOOP-9416</a>.
Major sub-task reported by Andrew Wang and fixed by Andrew Wang (fs)<br>
<b>Add new symlink resolution methods in FileSystem and FileSystemLinkResolver</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9414">HADOOP-9414</a>.
Major sub-task reported by Andrew Wang and fixed by Andrew Wang (fs)<br>
<b>Refactor out FSLinkResolver and relevant helper methods</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9413">HADOOP-9413</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>Introduce common utils for File#setReadable/Writable/Executable and File#canRead/Write/Execute that work cross-platform</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9408">HADOOP-9408</a>.
Minor bug reported by rajeshbabu and fixed by rajeshbabu (conf)<br>
<b>misleading description for net.topology.table.file.name property in core-default.xml</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9407">HADOOP-9407</a>.
Major bug reported by Sangjin Lee and fixed by Sangjin Lee (build)<br>
<b>commons-daemon 1.0.3 dependency has bad group id causing build issues</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9405">HADOOP-9405</a>.
Minor bug reported by Andrew Wang and fixed by Andrew Wang (test , tools)<br>
<b>TestGridmixSummary#testExecutionSummarizer is broken</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9401">HADOOP-9401</a>.
Major improvement reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>CodecPool: Add counters for number of (de)compressors leased out</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9399">HADOOP-9399</a>.
Minor bug reported by Todd Lipcon and fixed by Konstantin Boudnik (build)<br>
<b>protoc maven plugin doesn't work on mvn 3.0.2</b><br>
<blockquote>Committed to 2.0.4-alpha branch</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9397">HADOOP-9397</a>.
Major bug reported by Jason Lowe and fixed by Chris Nauroth (build)<br>
<b>Incremental dist tar build fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9388">HADOOP-9388</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>TestFsShellCopy fails on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9380">HADOOP-9380</a>.
Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (ipc)<br>
<b>Add totalLength to rpc response</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9379">HADOOP-9379</a>.
Trivial improvement reported by Arpit Gupta and fixed by Arpit Gupta <br>
<b>capture the ulimit info after printing the log to the console</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9376">HADOOP-9376</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>TestProxyUserFromEnv fails on a Windows domain joined machine</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9373">HADOOP-9373</a>.
Minor bug reported by Suresh Srinivas and fixed by Suresh Srinivas <br>
<b>Merge CHANGES.branch-trunk-win.txt to CHANGES.txt</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9369">HADOOP-9369</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (net)<br>
<b>DNS#reverseDns() can return hostname with . appended at the end</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9365">HADOOP-9365</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>TestHAZKUtil fails on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9364">HADOOP-9364</a>.
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>PathData#expandAsGlob does not return correct results for absolute paths on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9358">HADOOP-9358</a>.
Major bug reported by Todd Lipcon and fixed by Todd Lipcon (ipc , security)<br>
<b>"Auth failed" log should include exception string</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9355">HADOOP-9355</a>.
Major sub-task reported by Andrew Wang and fixed by Andrew Wang (fs)<br>
<b>Abstract symlink tests to use either FileContext or FileSystem</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9353">HADOOP-9353</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (build)<br>
<b>Activate native-win profile by default on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9352">HADOOP-9352</a>.
Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (security)<br>
<b>Expose UGI.setLoginUser for tests</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9349">HADOOP-9349</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (tools)<br>
<b>Confusing output when running hadoop version from one hadoop installation when HADOOP_HOME points to another</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9343">HADOOP-9343</a>.
Major improvement reported by Siddharth Seth and fixed by Siddharth Seth <br>
<b>Allow additional exceptions through the RPC layer</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9342">HADOOP-9342</a>.
Major bug reported by Thomas Weise and fixed by Thomas Weise (build)<br>
<b>Remove jline from distribution</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9339">HADOOP-9339</a>.
Major bug reported by Daryn Sharp and fixed by Daryn Sharp (ipc)<br>
<b>IPC.Server incorrectly sets UGI auth type</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9338">HADOOP-9338</a>.
Major new feature reported by Nick White and fixed by Nick White (fs)<br>
<b>FsShell Copy Commands Should Optionally Preserve File Attributes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9337">HADOOP-9337</a>.
Major bug reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky <br>
<b>org.apache.hadoop.fs.DF.getMount() does not work on Mac OS</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9336">HADOOP-9336</a>.
Critical improvement reported by Daryn Sharp and fixed by Daryn Sharp (ipc)<br>
<b>Allow UGI of current connection to be queried</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9334">HADOOP-9334</a>.
Minor improvement reported by Nicolas Liochon and fixed by Nicolas Liochon (build)<br>
<b>Update netty version</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9323">HADOOP-9323</a>.
Minor bug reported by Hao Zhong and fixed by Suresh Srinivas (documentation , fs , io , record)<br>
<b>Typos in API documentation</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9322">HADOOP-9322</a>.
Minor improvement reported by Harsh J and fixed by Harsh J (security)<br>
<b>LdapGroupsMapping doesn't seem to set a timeout for its directory search</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9318">HADOOP-9318</a>.
Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
<b>when exiting on a signal, print the signal name first</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9307">HADOOP-9307</a>.
Major bug reported by Todd Lipcon and fixed by Todd Lipcon (fs)<br>
<b>BufferedFSInputStream.read returns wrong results after certain seeks</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9305">HADOOP-9305</a>.
Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (security)<br>
<b>Add support for running the Hadoop client on 64-bit AIX</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9304">HADOOP-9304</a>.
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)<br>
<b>remove addition of avro generated-sources dirs to build</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9303">HADOOP-9303</a>.
Major bug reported by Thomas Graves and fixed by Andy Isaacson <br>
<b>command manual dfsadmin missing entry for restoreFailedStorage option</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9302">HADOOP-9302</a>.
Major bug reported by Thomas Graves and fixed by Andy Isaacson (documentation)<br>
<b>HDFS docs not linked from top level</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9299">HADOOP-9299</a>.
Blocker bug reported by Roman Shaposhnik and fixed by Daryn Sharp (security)<br>
<b>kerberos name resolution is kicking in even when kerberos is not configured</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9297">HADOOP-9297</a>.
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur <br>
<b>remove old record IO generation and tests</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9294">HADOOP-9294</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)<br>
<b>GetGroupsTestBase fails on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9290">HADOOP-9290</a>.
Major bug reported by Arpit Agarwal and fixed by Chris Nauroth (build , native)<br>
<b>Some tests cannot load native library</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9287">HADOOP-9287</a>.
Major test reported by Tsuyoshi OZAWA and fixed by Andrey Klochkov (test)<br>
<b>Parallel testing hadoop-common</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9283">HADOOP-9283</a>.
Major new feature reported by Aaron T. Myers and fixed by Aaron T. Myers (security)<br>
<b>Add support for running the Hadoop client on AIX</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9279">HADOOP-9279</a>.
Major improvement reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA (build , documentation)<br>
<b>Document the need to build hadoop-maven-plugins for eclipse and separate project builds</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9267">HADOOP-9267</a>.
Minor bug reported by Andrew Wang and fixed by Andrew Wang <br>
<b>hadoop -help, -h, --help should show usage instructions</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9264">HADOOP-9264</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (fs)<br>
<b>port change to use Java untar API on Windows from branch-1-win to trunk</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9253">HADOOP-9253</a>.
Major improvement reported by Arpit Gupta and fixed by Arpit Gupta <br>
<b>Capture ulimit info in the logs at service start time</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9246">HADOOP-9246</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (build)<br>
<b>Execution phase for hadoop-maven-plugin should be process-resources</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9245">HADOOP-9245</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (build)<br>
<b>mvn clean without running mvn install before fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9233">HADOOP-9233</a>.
Major test reported by Vadim Bondarev and fixed by Vadim Bondarev <br>
<b>Cover package org.apache.hadoop.io.compress.zlib with unit tests</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9230">HADOOP-9230</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (test)<br>
<b>TestUniformSizeInputFormat fails intermittently</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9222">HADOOP-9222</a>.
Major test reported by Vadim Bondarev and fixed by Vadim Bondarev <br>
<b>Cover package org.apache.hadoop.io.lz4 with unit tests</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9220">HADOOP-9220</a>.
Critical bug reported by Tom White and fixed by Tom White (ha)<br>
<b>Unnecessary transition to standby in ActiveStandbyElector</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9218">HADOOP-9218</a>.
Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (ipc)<br>
<b>Document the Rpc-wrappers used internally</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9211">HADOOP-9211</a>.
Major bug reported by Sarah Weissman and fixed by Plamen Jeliazkov (conf)<br>
<b>HADOOP_CLIENT_OPTS default setting fixes max heap size at 128m, disregards HADOOP_HEAPSIZE</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9209">HADOOP-9209</a>.
Major new feature reported by Todd Lipcon and fixed by Todd Lipcon (fs , tools)<br>
<b>Add shell command to dump file checksums</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9164">HADOOP-9164</a>.
Minor improvement reported by Binglin Chang and fixed by Binglin Chang (native)<br>
<b>Print paths of loaded native libraries in NativeLibraryChecker</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9163">HADOOP-9163</a>.
Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (ipc)<br>
<b>The rpc msg in ProtobufRpcEngine.proto should be moved out to avoid an extra copy</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9154">HADOOP-9154</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (io)<br>
<b>SortedMapWritable#putAll() doesn't add key/value classes to the map</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9151">HADOOP-9151</a>.
Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (ipc)<br>
<b>Include RPC error info in RpcResponseHeader instead of sending it separately</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9150">HADOOP-9150</a>.
Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (fs/s3 , ha , performance , viewfs)<br>
<b>Unnecessary DNS resolution attempts for logical URIs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9140">HADOOP-9140</a>.
Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (ipc)<br>
<b>Cleanup rpc PB protos</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9131">HADOOP-9131</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)<br>
<b>TestLocalFileSystem#testListStatusWithColons cannot run on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9125">HADOOP-9125</a>.
Major bug reported by Kai Zheng and fixed by Kai Zheng (security)<br>
<b>LdapGroupsMapping threw CommunicationException after some idle time</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9117">HADOOP-9117</a>.
Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)<br>
<b>replace protoc ant plugin exec with a maven plugin</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9043">HADOOP-9043</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (util)<br>
<b>Disallow creating symlinks with forward slashes in winutils</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8982">HADOOP-8982</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (net)<br>
<b>TestSocketIOWithTimeout fails on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8973">HADOOP-8973</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (util)<br>
<b>DiskChecker cannot reliably detect an inaccessible disk on Windows with NTFS ACLs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8958">HADOOP-8958</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (viewfs)<br>
<b>ViewFs: Non-absolute mount name failures when running multiple tests on Windows</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8957">HADOOP-8957</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (fs)<br>
<b>AbstractFileSystem#IsValidName should be overridden for embedded file systems like ViewFs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8924">HADOOP-8924</a>.
Major improvement reported by Chris Nauroth and fixed by Chris Nauroth (build)<br>
<b>Add maven plugin alternative to shell script to save package-info.java</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8917">HADOOP-8917</a>.
Major bug reported by Arpit Gupta and fixed by Arpit Gupta <br>
<b>add Locale.US to toLowerCase in SecurityUtil.replacePattern</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8886">HADOOP-8886</a>.
Major improvement reported by Eli Collins and fixed by Eli Collins (fs)<br>
<b>Remove KFS support</b><br>
<blockquote>Kosmos FS (KFS) is no longer maintained and Hadoop support has been removed. KFS has been replaced by QFS (HADOOP-8885).</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8711">HADOOP-8711</a>.
Major improvement reported by Brandon Li and fixed by Brandon Li (ipc)<br>
<b>provide an option for IPC server users to avoid printing stack information for certain exceptions</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8569">HADOOP-8569</a>.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
<b>CMakeLists.txt: define _GNU_SOURCE and _LARGEFILE_SOURCE</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8562">HADOOP-8562</a>.
Major new feature reported by Bikas Saha and fixed by Bikas Saha <br>
<b>Enhancements to support Hadoop on Windows Server and Windows Azure environments</b><br>
<blockquote>This umbrella jira makes enhancements to support Hadoop natively on Windows Server and Windows Azure environments.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8470">HADOOP-8470</a>.
Major sub-task reported by Junping Du and fixed by Junping Du <br>
<b>Implementation of 4-layer subclass of NetworkTopology (NetworkTopologyWithNodeGroup)</b><br>
<blockquote>This patch should be checked in together with (or after) HADOOP-8469: https://issues.apache.org/jira/browse/HADOOP-8469</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8469">HADOOP-8469</a>.
Major sub-task reported by Junping Du and fixed by Junping Du <br>
<b>Make NetworkTopology class pluggable</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8462">HADOOP-8462</a>.
Major improvement reported by Govind Kamat and fixed by Govind Kamat (io)<br>
<b>Native-code implementation of bzip2 codec</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8440">HADOOP-8440</a>.
Minor bug reported by Ivan Mitic and fixed by Ivan Mitic (fs)<br>
<b>HarFileSystem.decodeHarURI fails for URIs whose host contains numbers</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8415">HADOOP-8415</a>.
Minor improvement reported by Jan van der Lugt and fixed by Jan van der Lugt (conf)<br>
<b>getDouble() and setDouble() in org.apache.hadoop.conf.Configuration</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-7487">HADOOP-7487</a>.
Major bug reported by Todd Lipcon and fixed by Andrew Wang (fs)<br>
<b>DF should throw a more reasonable exception when mount cannot be determined</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-7391">HADOOP-7391</a>.
Major bug reported by Sanjay Radia and fixed by Sanjay Radia <br>
<b>Document Interface Classification from HADOOP-5073</b><br>
<blockquote></blockquote></li>
</ul>
</body></html>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hadoop 2.0.5-alpha Release Notes</title>
<STYLE type="text/css">
H1 {font-family: sans-serif}
H2 {font-family: sans-serif; margin-left: 7mm}
TABLE {margin-left: 7mm}
</STYLE>
</head>
<body>
<h1>Hadoop 2.0.5-alpha Release Notes</h1>
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
<a name="changes"/>
<h2>Changes since Hadoop 2.0.4-alpha</h2>
<ul>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5240">MAPREDUCE-5240</a>.
Blocker bug reported by Roman Shaposhnik and fixed by Vinod Kumar Vavilapalli (mrv2)<br>
<b>inside of FileOutputCommitter the initialized Credentials cache appears to be empty</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4482">HDFS-4482</a>.
Blocker bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (namenode)<br>
<b>ReplicationMonitor thread can exit with NPE due to the race between delete and replication of same file.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9407">HADOOP-9407</a>.
Major bug reported by Sangjin Lee and fixed by Sangjin Lee (build)<br>
<b>commons-daemon 1.0.3 dependency has bad group id causing build issues</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8419">HADOOP-8419</a>.
Major bug reported by Luke Lu and fixed by Yu Li (io)<br>
<b>GzipCodec NPE upon reset with IBM JDK</b><br>
<blockquote></blockquote></li>
</ul>
</body></html>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hadoop 2.0.4-alpha Release Notes</title>
<STYLE type="text/css">
H1 {font-family: sans-serif}
H2 {font-family: sans-serif; margin-left: 7mm}
TABLE {margin-left: 7mm}
</STYLE>
</head>
<body>
<h1>Hadoop 2.0.4-alpha Release Notes</h1>
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
<a name="changes"/>
<h2>Changes since Hadoop 2.0.3-alpha</h2>
<ul>
<li> <a href="https://issues.apache.org/jira/browse/YARN-470">YARN-470</a>.
Major bug reported by Hitesh Shah and fixed by Siddharth Seth (nodemanager)<br>
<b>Support a way to disable resource monitoring on the NodeManager</b><br>
<blockquote>Currently, the memory management monitor's check is disabled when maxMem is set to -1. However, maxMem is also sent to the RM when the NM registers with it (to define the maximum limit of allocatable resources).
We need an explicit flag to disable monitoring to avoid the problems caused by the overloading of the max memory value.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-449">YARN-449</a>.
Blocker bug reported by Siddharth Seth and fixed by <br>
<b>HBase test failures when running against Hadoop 2</b><br>
<blockquote>Post YARN-429, unit tests for HBase continue to fail since the classpath for the MRAppMaster is not being set correctly.
Reverting YARN-129 may fix this, but I'm not sure that's the correct solution. My guess is, as Alejandro pointed out in YARN-129, maven classloader magic is messing up java.class.path.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-443">YARN-443</a>.
Major improvement reported by Thomas Graves and fixed by Thomas Graves (nodemanager)<br>
<b>allow OS scheduling priority of NM to be different than the containers it launches</b><br>
<blockquote>It would be nice if we could have the nodemanager run at a different OS scheduling priority than the containers, so that you can still communicate with the nodemanager if the containers are out of control.
On Linux we could launch the nodemanager at a higher priority, but then all the containers it launches would also be at that higher priority, so we need a way for the container executor to launch them at a lower priority.
I'm not sure how this applies to Windows, if at all.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-429">YARN-429</a>.
Blocker bug reported by Siddharth Seth and fixed by Siddharth Seth (resourcemanager)<br>
<b>capacity-scheduler config missing from yarn-test artifact</b><br>
<blockquote>MiniYARNCluster and MiniMRCluster are unusable by downstream projects with the 2.0.3-alpha release, since the capacity-scheduler configuration is missing from the test artifact.
hadoop-yarn-server-tests-3.0.0-SNAPSHOT-tests.jar should include the default capacity-scheduler configuration. Also, this doesn't need to be part of the default classpath - and should be moved out of the top level directory in the dist package.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5117">MAPREDUCE-5117</a>.
Blocker bug reported by Roman Shaposhnik and fixed by Siddharth Seth (security)<br>
<b>With security enabled HS delegation token renewer fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5094">MAPREDUCE-5094</a>.
Major bug reported by Siddharth Seth and fixed by Siddharth Seth <br>
<b>Disable mem monitoring by default in MiniMRYarnCluster</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5088">MAPREDUCE-5088</a>.
Blocker bug reported by Roman Shaposhnik and fixed by Daryn Sharp <br>
<b>MR Client gets a renewer token exception while Oozie is submitting a job</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5083">MAPREDUCE-5083</a>.
Major bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)<br>
<b>MiniMRCluster should use a random component when creating an actual cluster</b><br>
<blockquote>Committed to branch-2.0.4. Modified changes.txt in trunk, branch-2 and branch-2.0.4 accordingly.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5053">MAPREDUCE-5053</a>.
Major bug reported by Robert Parker and fixed by Robert Parker <br>
<b>java.lang.InternalError from decompression codec causes reducer to fail</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5023">MAPREDUCE-5023</a>.
Critical bug reported by Kendall Thrapp and fixed by Ravi Prakash (jobhistoryserver , webapps)<br>
<b>History Server Web Services missing Job Counters</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5006">MAPREDUCE-5006</a>.
Major bug reported by Alejandro Abdelnur and fixed by Sandy Ryza (contrib/streaming)<br>
<b>streaming tests failing</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4549">MAPREDUCE-4549</a>.
Blocker bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)<br>
<b>Distributed cache conflicts break backwards compatibility</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3685">MAPREDUCE-3685</a>.
Critical bug reported by anty.rao and fixed by anty (mrv2)<br>
<b>There are some bugs in implementation of MergeManager</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4649">HDFS-4649</a>.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (namenode , security , webhdfs)<br>
<b>Webhdfs cannot list large directories</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4646">HDFS-4646</a>.
Minor bug reported by Jagane Sundar and fixed by (namenode)<br>
<b>createNNProxyWithClientProtocol ignores configured timeout value</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4581">HDFS-4581</a>.
Major bug reported by Rohit Kochar and fixed by Rohit Kochar (datanode)<br>
<b>DataNode#checkDiskError should not be called on network errors</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4577">HDFS-4577</a>.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (webhdfs)<br>
<b>Webhdfs operations should declare if authentication is required</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4571">HDFS-4571</a>.
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (webhdfs)<br>
<b>WebHDFS should not set the service hostname on the server side</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4567">HDFS-4567</a>.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (webhdfs)<br>
<b>Webhdfs does not need a token for token operations</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4566">HDFS-4566</a>.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (webhdfs)<br>
<b>Webhdfs token cancellation should use authentication</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4560">HDFS-4560</a>.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (webhdfs)<br>
<b>Webhdfs cannot use tokens obtained by another user</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4548">HDFS-4548</a>.
Blocker sub-task reported by Daryn Sharp and fixed by Daryn Sharp <br>
<b>Webhdfs doesn't renegotiate SPNEGO token</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3344">HDFS-3344</a>.
Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Kihwal Lee (namenode)<br>
<b>Unreliable corrupt blocks counting in TestProcessCorruptBlocks</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9471">HADOOP-9471</a>.
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)<br>
<b>hadoop-client wrongfully excludes jetty-util JAR, breaking webhdfs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9467">HADOOP-9467</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (metrics)<br>
<b>Metrics2 record filtering (.record.filter.include/exclude) does not filter by name</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9444">HADOOP-9444</a>.
Blocker bug reported by Konstantin Boudnik and fixed by Roman Shaposhnik (conf)<br>
<b>$var shell substitution in properties is not expanded in hadoop-policy.xml</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9406">HADOOP-9406</a>.
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)<br>
<b>hadoop-client leaks dependency on JDK tools jar</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9405">HADOOP-9405</a>.
Minor bug reported by Andrew Wang and fixed by Andrew Wang (test , tools)<br>
<b>TestGridmixSummary#testExecutionSummarizer is broken</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9399">HADOOP-9399</a>.
Minor bug reported by Todd Lipcon and fixed by Konstantin Boudnik (build)<br>
<b>protoc maven plugin doesn't work on mvn 3.0.2</b><br>
<blockquote>Committed to 2.0.4-alpha branch</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9379">HADOOP-9379</a>.
Trivial improvement reported by Arpit Gupta and fixed by Arpit Gupta <br>
<b>capture the ulimit info after printing the log to the console</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9374">HADOOP-9374</a>.
Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (security)<br>
<b>Add tokens from -tokenCacheFile into UGI</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9301">HADOOP-9301</a>.
Blocker bug reported by Roman Shaposhnik and fixed by Alejandro Abdelnur (build)<br>
<b>hadoop client servlet/jsp/jetty/tomcat JARs creating conflicts in Oozie &amp; HttpFS</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9299">HADOOP-9299</a>.
Blocker bug reported by Roman Shaposhnik and fixed by Daryn Sharp (security)<br>
<b>kerberos name resolution is kicking in even when kerberos is not configured</b><br>
<blockquote></blockquote></li>
</ul>
</body></html>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hadoop 2.0.3-alpha Release Notes</title>
<STYLE type="text/css">
H1 {font-family: sans-serif}
H2 {font-family: sans-serif; margin-left: 7mm}
TABLE {margin-left: 7mm}
</STYLE>
</head>
<body>
<h1>Hadoop 2.0.3-alpha Release Notes</h1>
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
<a name="changes"/>
<h2>Changes since Hadoop 2.0.2</h2>
<ul>
<li> <a href="https://issues.apache.org/jira/browse/YARN-372">YARN-372</a>.
Minor task reported by Siddharth Seth and fixed by Siddharth Seth <br>
<b>Move InlineDispatcher from hadoop-yarn-server-resourcemanager to hadoop-yarn-common</b><br>
<blockquote>InlineDispatcher is a utility used in unit tests. It belongs in yarn-common instead of yarn-server-resource-manager.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-364">YARN-364</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe <br>
<b>AggregatedLogDeletionService can take too long to delete logs</b><br>
<blockquote>AggregatedLogDeletionService uses the yarn.log-aggregation.retain-seconds property to determine which logs should be deleted, but it uses the same value to determine how often to check for old logs. This means logs could actually linger up to twice as long as configured.</blockquote></li>
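<li> <b>Sketch for YARN-364: worst-case log lifetime</b><br>
<blockquote>The arithmetic behind the "twice as long" observation can be sketched as follows. This is a minimal illustration and not the AggregatedLogDeletionService code: the class and method names are hypothetical, and it assumes the worst case in which a log reaches its retention age just after a check has run.

```java
// Hypothetical helper for illustration only; not part of Hadoop.
class LogRetentionSketch {
    // Worst case: the log passes its retention age just after a check ran,
    // so it survives until the next check, one full interval later.
    static long worstCaseLifetimeSeconds(long retainSeconds, long checkIntervalSeconds) {
        return retainSeconds + checkIntervalSeconds;
    }

    public static void main(String[] args) {
        long retain = 86400; // one day, an arbitrary example value
        // Check interval equal to the retention period: up to twice the retention.
        System.out.println(worstCaseLifetimeSeconds(retain, retain));
        // A shorter, separately chosen check interval shrinks the overshoot.
        System.out.println(worstCaseLifetimeSeconds(retain, retain / 10));
    }
}
```

Under these assumptions, checking more often than the retention period keeps the worst case close to the configured value.</blockquote></li>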
<li> <a href="https://issues.apache.org/jira/browse/YARN-360">YARN-360</a>.
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp <br>
<b>Allow apps to concurrently register tokens for renewal</b><br>
<blockquote>{{DelegationTokenRenewer#addApplication}} has an unnecessary {{synchronized}} keyword. This serializes job submissions and can add unnecessary latency and/or hang all submissions if there are problems renewing the token.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-357">YARN-357</a>.
Major bug reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)<br>
<b>App submission should not be synchronized</b><br>
<blockquote>MAPREDUCE-2953 fixed a race condition with querying of app status by making {{RMClientService#submitApplication}} synchronously invoke {{RMAppManager#submitApplication}}. However, the {{synchronized}} keyword was also added to {{RMAppManager#submitApplication}} with the comment:
bq. I made the submitApplication synchronized to keep it consistent with the other routines in RMAppManager although I do not believe it needs it since the rmapp datastructure is already a concurrentMap and I don't see anything else that would be an issue.
It's been observed that app submission latency is being unnecessarily impacted.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-355">YARN-355</a>.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)<br>
<b>RM app submission jams under load</b><br>
<blockquote>The RM performs a loopback connection to itself to renew its own tokens. If app submissions consume all RPC handlers for {{ClientRMProtocol}}, then app submissions block because the RM cannot loop back to itself to do the renewal.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-354">YARN-354</a>.
Blocker bug reported by Liang Xie and fixed by Liang Xie <br>
<b>WebAppProxyServer exits immediately after startup</b><br>
<blockquote>Please see HDFS-4426 for details. I found that the YARN WebAppProxyServer is broken by HADOOP-9181 as well. Here is the hot fix, which I verified manually in our test cluster.
I sincerely apologize for causing this trouble.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-343">YARN-343</a>.
Major bug reported by Thomas Graves and fixed by Xuan Gong (capacityscheduler)<br>
<b>Capacity Scheduler maximum-capacity value -1 is invalid</b><br>
<blockquote>I tried to start the resource manager using the capacity scheduler with a particular queue's maximum-capacity set to -1, which is supposed to disable it according to the docs, but I got the following exception:
java.lang.IllegalArgumentException: Illegal value of maximumCapacity -0.01 used in call to setMaxCapacity for queue foo
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.checkMaxCapacity(CSQueueUtils.java:31)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:220)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.&lt;init&gt;(LeafQueue.java:191)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:310)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:325)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:232)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:202)
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-336">YARN-336</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Fair scheduler FIFO scheduling within a queue only allows 1 app at a time </b><br>
<blockquote>The fair scheduler allows apps to be scheduled in FIFO fashion within a queue. Currently, when this setting is turned on, the scheduler only allows one app to run at a time. While apps submitted earlier should get first priority for allocations, when there is space remaining, other apps should have a chance to use it.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-334">YARN-334</a>.
Critical bug reported by Thomas Graves and fixed by Thomas Graves <br>
<b>Maven RAT plugin is not checking all source files</b><br>
<blockquote>YARN side of HADOOP-9097.
Running 'mvn apache-rat:check' passes, but running RAT by hand (by downloading the JAR) produces some warnings for Java files, amongst others.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-331">YARN-331</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Fill in missing fair scheduler documentation</b><br>
<blockquote>In the fair scheduler documentation, a few config options are missing:
locality.threshold.node
locality.threshold.rack
max.assign
aclSubmitApps
minSharePreemptionTimeout
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-330">YARN-330</a>.
Major bug reported by Hitesh Shah and fixed by Sandy Ryza (nodemanager)<br>
<b>Flakey test: TestNodeManagerShutdown#testKillContainersOnShutdown</b><br>
<blockquote>Seems to be timing related, as the container status RUNNING returned by the ContainerManager does not really indicate that the container task has been launched. A sleep of 5 seconds is not reliable.
Running org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.353 sec &lt;&lt;&lt; FAILURE!
testKillContainersOnShutdown(org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown) Time elapsed: 9283 sec &lt;&lt;&lt; FAILURE!
junit.framework.AssertionFailedError: Did not find sigterm message
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.assertTrue(Assert.java:20)
at org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.testKillContainersOnShutdown(TestNodeManagerShutdown.java:162)
Logs:
2013-01-09 14:13:08,401 INFO [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(835)) - Container container_0_0000_01_000000 transitioned from NEW to LOCALIZING
2013-01-09 14:13:08,412 INFO [AsyncDispatcher event handler] localizer.LocalizedResource (LocalizedResource.java:handle(194)) - Resource file:hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown/tmpDir/scriptFile.sh transitioned from INIT to DOWNLOADING
2013-01-09 14:13:08,412 INFO [AsyncDispatcher event handler] localizer.ResourceLocalizationService (ResourceLocalizationService.java:handle(521)) - Created localizer for container_0_0000_01_000000
2013-01-09 14:13:08,589 INFO [LocalizerRunner for container_0_0000_01_000000] localizer.ResourceLocalizationService (ResourceLocalizationService.java:writeCredentials(895)) - Writing credentials to the nmPrivate file hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown/nm0/nmPrivate/container_0_0000_01_000000.tokens. Credentials list:
2013-01-09 14:13:08,628 INFO [LocalizerRunner for container_0_0000_01_000000] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:createUserCacheDirs(373)) - Initializing user nobody
2013-01-09 14:13:08,709 INFO [main] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:getContainerStatus(538)) - Returning container_id {, app_attempt_id {, application_id {, id: 0, cluster_timestamp: 0, }, attemptId: 1, }, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-01-09 14:13:08,781 INFO [LocalizerRunner for container_0_0000_01_000000] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:startLocalizer(99)) - Copying from hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown/nm0/nmPrivate/container_0_0000_01_000000.tokens to hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown/nm0/usercache/nobody/appcache/application_0_0000/container_0_0000_01_000000.tokens
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-328">YARN-328</a>.
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (resourcemanager)<br>
<b>Use token request messages defined in hadoop common </b><br>
<blockquote>YARN changes related to HADOOP-9192 to reuse the protobuf messages defined in common.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-325">YARN-325</a>.
Blocker bug reported by Jason Lowe and fixed by Arun C Murthy (capacityscheduler)<br>
<b>RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing</b><br>
<blockquote>If a client calls getQueueInfo on a parent queue (e.g.: the root queue) and containers are completing then the RM can deadlock. getQueueInfo() locks the ParentQueue and then calls the child queues' getQueueInfo() methods in turn. However when a container completes, it locks the LeafQueue then calls back into the ParentQueue. When the two mix, it's a recipe for deadlock.
Stacktrace to follow.</blockquote></li>
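<li> <b>Sketch for YARN-325: consistent lock ordering</b><br>
<blockquote>The lock-ordering problem described above can be illustrated with a minimal sketch. It is not the CapacityScheduler code and does not claim to be the fix that was committed: the class, fields, and methods are hypothetical stand-ins, and it simply assumes that both paths can be made to take the parent lock before the leaf lock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a ParentQueue/LeafQueue pair; not Hadoop code.
class QueueLockOrderSketch {
    private final ReentrantLock parentLock = new ReentrantLock();
    private final ReentrantLock leafLock = new ReentrantLock();
    private int completedContainers = 0;

    // Mirrors the getQueueInfo() path in the report: parent first, then child.
    int getQueueInfo() {
        parentLock.lock();
        try {
            leafLock.lock();
            try {
                return completedContainers;
            } finally {
                leafLock.unlock();
            }
        } finally {
            parentLock.unlock();
        }
    }

    // Completion path taking the locks in the same parent-then-leaf order,
    // rather than leaf-then-parent as in the reported deadlock.
    void completeContainer() {
        parentLock.lock();
        try {
            leafLock.lock();
            try {
                completedContainers++;
            } finally {
                leafLock.unlock();
            }
        } finally {
            parentLock.unlock();
        }
    }

    // Runs both paths concurrently; returns true if both threads finish in time.
    boolean runBothPaths(final int iterations) {
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) getQueueInfo();
        });
        Thread completer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) completeContainer();
        });
        reader.start();
        completer.start();
        try {
            reader.join(10000);
            completer.join(10000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !reader.isAlive() && !completer.isAlive();
    }

    // Convenience wrapper: completed count after a concurrent run, or -1 on a hang.
    static int completedAfterRun(int iterations) {
        QueueLockOrderSketch q = new QueueLockOrderSketch();
        return q.runBothPaths(iterations) ? q.getQueueInfo() : -1;
    }
}
```

Because every thread acquires the two locks in the same order, no thread can hold the leaf lock while waiting for the parent lock, so the cycle described in the report cannot form in this sketch.</blockquote></li>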
<li> <a href="https://issues.apache.org/jira/browse/YARN-320">YARN-320</a>.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)<br>
<b>RM should always be able to renew its own tokens</b><br>
<blockquote>YARN-280 introduced fast-fail for job submissions with bad tokens. Unfortunately, other stack components such as Oozie, as well as customers, are acquiring RM tokens with a hardcoded dummy renewer value. These jobs would fail after 24 hours because the RM token couldn't be renewed, but fast-fail is failing them immediately. The RM should always be able to renew its own tokens submitted with a job. The renewer field may continue to specify an external user who can renew.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-319">YARN-319</a>.
Major bug reported by shenhong and fixed by shenhong (resourcemanager , scheduler)<br>
<b>Submitting a job to a queue that does not allow it in fairScheduler makes the client hang forever</b><br>
<blockquote>When the RM uses the fair scheduler and a client submits a job to a queue that does not allow that user to submit jobs, the client hangs forever.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-315">YARN-315</a>.
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas <br>
<b>Use security token protobuf definition from hadoop common</b><br>
<blockquote>YARN part of HADOOP-9173.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-302">YARN-302</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>Fair scheduler assignmultiple should default to false</b><br>
<blockquote>The MR1 default was false. When true, it results in overloading some machines with many tasks and underutilizing others.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-301">YARN-301</a>.
Major bug reported by shenhong and fixed by shenhong (resourcemanager , scheduler)<br>
<b>Fair scheduler throws ConcurrentModificationException when iterating over app's priorities</b><br>
<blockquote>In my test cluster, the fair scheduler threw a ConcurrentModificationException and the RM crashed. Here is the message:
2012-12-30 17:14:17,171 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.util.ConcurrentModificationException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:297)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:181)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:780)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:842)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:340)
at java.lang.Thread.run(Thread.java:662)
</blockquote></li>
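<li> <b>Sketch for YARN-301: iterating over a snapshot of the priorities</b><br>
<blockquote>The stack trace above comes from TreeMap's fail-fast key iterator. The sketch below reproduces that behavior in a single thread and shows one common way to avoid it, iterating over a copy of the keys. It is an illustration only: the class and method names are hypothetical, the real failure involved concurrent access inside the fair scheduler, and the committed fix may use a different approach (such as locking).

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration; not the AppSchedulable code.
class PriorityIterationSketch {
    private static TreeMap<Integer, String> samplePriorities() {
        TreeMap<Integer, String> priorities = new TreeMap<>();
        priorities.put(1, "high");
        priorities.put(2, "normal");
        priorities.put(3, "low");
        return priorities;
    }

    // Structurally modifying the map while iterating its key set makes the
    // iterator throw on its next step.
    static boolean failsWhenMutatedDuringIteration() {
        TreeMap<Integer, String> priorities = samplePriorities();
        try {
            for (Integer p : priorities.keySet()) {
                if (p == 1) {
                    priorities.remove(p);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterating over a copy of the keys is unaffected by later changes to the map.
    static int visitedWithSnapshot() {
        TreeMap<Integer, String> priorities = samplePriorities();
        List<Integer> snapshot = new ArrayList<>(priorities.keySet());
        int visited = 0;
        for (Integer p : snapshot) {
            if (p == 1) {
                priorities.remove(p);
            }
            visited++;
        }
        return visited;
    }
}
```

With multiple threads, the snapshot itself would still have to be taken under the same lock that guards updates to the map; the copy only protects the iteration that follows.</blockquote></li>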
<li> <a href="https://issues.apache.org/jira/browse/YARN-300">YARN-300</a>.
Major bug reported by shenhong and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>After YARN-271, fair scheduler can loop infinitely and not schedule any application.</b><br>
<blockquote>After YARN-271, when yarn.scheduler.fair.max.assign&lt;=0 and a node has been reserved, the fair scheduler loops infinitely and does not schedule any application.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-293">YARN-293</a>.
Critical bug reported by Devaraj K and fixed by Robert Joseph Evans (nodemanager)<br>
<b>Node Manager leaks LocalizerRunner object for every Container </b><br>
<blockquote>Node Manager creates a new LocalizerRunner object for every container and puts it in the ResourceLocalizationService.LocalizerTracker.privLocalizers map, but never removes it from the map.</blockquote></li>
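<li> <b>Sketch for YARN-293: removing a tracked entry when its work ends</b><br>
<blockquote>The leak pattern described above, and one way to avoid it, can be sketched as follows. This is not the ResourceLocalizationService code: the class and method names are hypothetical, only the privLocalizers field name is borrowed from the report, and it assumes the entry can be dropped as soon as the localizer finishes, which may not match where the real cleanup happens.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker for illustration; not Hadoop code.
class LocalizerTrackerSketch {
    private final Map<String, Runnable> privLocalizers = new ConcurrentHashMap<>();

    // Registers the localizer, runs it, and always removes the entry afterwards,
    // even if the localizer throws.
    void runLocalizer(String containerId, Runnable localizer) {
        privLocalizers.put(containerId, localizer);
        try {
            localizer.run();
        } finally {
            privLocalizers.remove(containerId);
        }
    }

    int trackedLocalizers() {
        return privLocalizers.size();
    }

    // Runs the given number of no-op localizers and reports how many entries remain.
    static int trackedAfter(int containers) {
        LocalizerTrackerSketch tracker = new LocalizerTrackerSketch();
        for (int i = 0; i < containers; i++) {
            tracker.runLocalizer("container_" + i, () -> { });
        }
        return tracker.trackedLocalizers();
    }
}
```

Pairing every put with a remove in a finally block keeps the map size bounded by the number of localizers that are actually running.</blockquote></li>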
<li> <a href="https://issues.apache.org/jira/browse/YARN-288">YARN-288</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>Fair scheduler queue doesn't accept any jobs when ACLs are configured.</b><br>
<blockquote>If a queue is configured with an ACL for who can submit jobs, no jobs are allowed, even if a user on the list tries.
This is caused by the scheduler thinking the user is "yarn", because it calls UserGroupInformation.getCurrentUser() instead of UserGroupInformation.createRemoteUser() with the given user name.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-286">YARN-286</a>.
Major new feature reported by Tom White and fixed by Tom White (applications)<br>
<b>Add a YARN ApplicationClassLoader</b><br>
<blockquote>Add a classloader that provides webapp-style class isolation for use by applications. This is the YARN part of MAPREDUCE-1700 (which was already developed in that JIRA).</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-285">YARN-285</a>.
Major improvement reported by Derek Dagit and fixed by Derek Dagit <br>
<b>RM should be able to provide a tracking link for apps that have already been purged</b><br>
<blockquote>As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of applications remembered by the RM.
When a user clicks the History link for a job, the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed.
In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs.
We would like the RM to keep providing valid tracking links so that users are not frustrated by broken links.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-283">YARN-283</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Fair scheduler fails to get queue info without root prefix</b><br>
<blockquote>If queue1 exists, and a client calls "mapred queue -info queue1", an exception is thrown. If they use root.queue1, it works correctly.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-282">YARN-282</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Fair scheduler web UI double counts Apps Submitted</b><br>
<blockquote>Each app submitted is reported twice under "Apps Submitted"</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-280">YARN-280</a>.
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)<br>
<b>RM does not reject app submission with invalid tokens</b><br>
<blockquote>The RM will launch an app with invalid tokens. The tasks will languish with failed connection retries, followed by task reattempts, followed by app reattempts.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-278">YARN-278</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>Fair scheduler maxRunningApps config causes no apps to make progress</b><br>
<blockquote>This occurs because the scheduler erroneously chooses apps to offer resources to that are not runnable, then later decides they are not runnable, and doesn't try to give the resources to anyone else.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-277">YARN-277</a>.
Major improvement reported by Bikas Saha and fixed by Bikas Saha <br>
<b>Use AMRMClient in DistributedShell to exemplify the approach</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-272">YARN-272</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Fair scheduler log messages try to print objects without overridden toString methods</b><br>
<blockquote>A lot of junk gets printed out like this:
2012-12-11 17:31:52,998 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp: Application application_1355270529654_0003 reserved container org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl@324f0f97 on node host: c1416.hal.cloudera.com:46356 #containers=7 available=0 used=8192, currently has 4 at priority org.apache.hadoop.yarn.api.records.impl.pb.PriorityPBImpl@33; currentReservation 4096</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-271">YARN-271</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>Fair scheduler hits IllegalStateException trying to reserve different apps on same node</b><br>
<blockquote>After the fair scheduler reserves a container on a node, it doesn't check for reservations it just made when trying to make more reservations during the same heartbeat.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-267">YARN-267</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>Fix fair scheduler web UI</b><br>
<blockquote>The fair scheduler web UI was broken by MAPREDUCE-4720. The queues area is not shown, and changes are required to still show the fair share inside the applications table.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-266">YARN-266</a>.
Critical bug reported by Ravi Prakash and fixed by Ravi Prakash (resourcemanager)<br>
<b>RM and JHS Web UIs are blank because AppsBlock is not escaping string properly</b><br>
<blockquote>For example, job names with a line feed "\n" cause a line feed in the JSON array being written out (since we are only using StringEscapeUtils.escapeHtml()), and the JavaScript parser complains that string quotes are unclosed. This </blockquote></li>
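<li> <b>Sketch for YARN-266: escaping a value for a JavaScript string literal</b><br>
<blockquote>The failure mode above comes from HTML escaping leaving control characters such as a line feed untouched. The sketch below shows the kind of escaping a value needs before it is written inside a quoted JavaScript or JSON string. It is a hand-written, hypothetical helper for illustration, not the AppsBlock code and not necessarily the escaping routine the fix uses; it also does not perform HTML escaping, which would still be needed separately.

```java
// Hypothetical helper for illustration only; not part of Hadoop.
class JsStringEscapeSketch {
    // Escapes backslashes, double quotes, and control characters so the result
    // can be placed between double quotes in a JavaScript or JSON string.
    static String escapeForJsString(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '"':  out.append("\\\""); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become four-digit hex escapes.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A job name containing a raw line feed, as in the report.
        System.out.println("\"" + escapeForJsString("word\ncount") + "\"");
    }
}
```

After escaping, the job name occupies a single line in the generated array, so the string quotes stay balanced for the JavaScript parser.</blockquote></li>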
<li> <a href="https://issues.apache.org/jira/browse/YARN-264">YARN-264</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>y.s.rm.DelegationTokenRenewer attempts to renew token even after removing an app</b><br>
<blockquote>yarn.s.rm.security.DelegationTokenRenewer uses TimerTask/Timer. When such a timer task is canceled, already scheduled tasks run to completion. The task should check for such cancellation before running. Also, delegationTokens needs to be synchronized on all accesses.
</blockquote></li>
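<li> <b>Sketch for YARN-264: checking for cancellation inside a TimerTask</b><br>
<blockquote>The cancellation check suggested above can be sketched with java.util.TimerTask. This is a minimal illustration and not the DelegationTokenRenewer code: the class name, the counter, and the helper method are hypothetical, and the counter increment stands in for the actual token renewal.

```java
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical renewal task for illustration; not Hadoop code.
class CancellableRenewalTask extends TimerTask {
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    private final AtomicInteger renewals;

    CancellableRenewalTask(AtomicInteger renewals) {
        this.renewals = renewals;
    }

    // Records the cancellation before delegating to TimerTask.cancel().
    @Override
    public boolean cancel() {
        cancelled.set(true);
        return super.cancel();
    }

    // Does nothing if the task was cancelled before it got to run.
    @Override
    public void run() {
        if (cancelled.get()) {
            return;
        }
        renewals.incrementAndGet(); // stand-in for renewing the token
    }

    // Runs the task once, cancels it, runs it again, and reports the renewal count.
    static int renewalsAfterCancel() {
        AtomicInteger counter = new AtomicInteger();
        CancellableRenewalTask task = new CancellableRenewalTask(counter);
        task.run();
        task.cancel();
        task.run();
        return counter.get();
    }
}
```

The flag is an AtomicBoolean so that a cancellation made on one thread is visible to the timer thread that later invokes run().</blockquote></li>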
<li> <a href="https://issues.apache.org/jira/browse/YARN-258">YARN-258</a>.
Major bug reported by Ravi Prakash and fixed by Ravi Prakash (resourcemanager)<br>
<b>RM web page UI shows Invalid Date for start and finish times</b><br>
<blockquote>Whenever the number of jobs was greater than 100, two JavaScript arrays were being populated: appsData and appsTableData. appsData was winning out (because it was emitted later), and so renderHadoopDate was trying to render a &lt;br title=""...&gt; string.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-254">YARN-254</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>Update fair scheduler web UI for hierarchical queues</b><br>
<blockquote>The fair scheduler should have a web UI similar to the capacity scheduler that shows nested queues.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-253">YARN-253</a>.
Critical bug reported by Tom White and fixed by Tom White (nodemanager)<br>
<b>Container launch may fail if no files were localized</b><br>
<blockquote>This can be demonstrated with DistributedShell. The containers running the shell do not have any files to localize (if there is no shell script to copy) so if they run on a different NM to the AM (which does localize files), then they will fail since the appcache directory does not exist.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-251">YARN-251</a>.
Major bug reported by Tom White and fixed by Tom White (resourcemanager)<br>
<b>Proxy URI generation fails for blank tracking URIs</b><br>
<blockquote>If the URI is an empty string (the default if not set), then a warning is displayed. A null URI displays no such warning. These two cases should be handled in the same way.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-230">YARN-230</a>.
Major sub-task reported by Bikas Saha and fixed by Bikas Saha (resourcemanager)<br>
<b>Make changes for RM restart phase 1</b><br>
<blockquote>As described in YARN-128, phase 1 of RM restart puts in place mechanisms to save application state and read it back after restart. Upon restart, the NMs are asked to reboot and the previously running AMs are restarted.
After this is done, RM HA and work-preserving restart can continue in parallel. For more details, please refer to the design document in YARN-128.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-229">YARN-229</a>.
Major sub-task reported by Bikas Saha and fixed by Bikas Saha (resourcemanager)<br>
<b>Remove old code for restart</b><br>
<blockquote>Much of the code is dead/commented out and is not executed. Removing it will help with making and understanding new changes.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-225">YARN-225</a>.
Critical bug reported by Devaraj K and fixed by Devaraj K (resourcemanager)<br>
<b>Proxy Link in RM UI throws NPE in Secure mode</b><br>
<blockquote>{code:xml}
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:241)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:975)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-224">YARN-224</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Fair scheduler logs too many nodeUpdate INFO messages</b><br>
<blockquote>The RM logs are filled with an INFO message the fair scheduler logs every time it receives a nodeUpdate. It should be taken out or demoted to debug.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-223">YARN-223</a>.
Critical bug reported by Radim Kolar and fixed by Radim Kolar <br>
<b>Change processTree interface to work better with native code</b><br>
<blockquote>The problem is that every update of the process tree requires a new object. This is undesirable when working with a processTree implementation in native code.
Replace ProcessTree.getProcessTree() with updateProcessTree(). No new object allocation is needed, and it simplifies application code a bit.</blockquote></li>
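The interface change described above can be sketched roughly as follows. Only the method name updateProcessTree() comes from the entry; the class names, the getCumulativeRssmem() accessor, and the fake memory figures are assumptions made for illustration.

```java
/** Hypothetical stand-in for a process-tree abstraction; names are illustrative. */
abstract class ProcessTreeSketch {
    /** Refreshes this object's view of the process tree in place. */
    abstract void updateProcessTree();

    /** Returns the memory figure recorded by the latest update, in bytes. */
    abstract long getCumulativeRssmem();
}

/** Fake implementation that stands in for a native-backed tree. */
final class FakeNativeProcessTree extends ProcessTreeSketch {
    private long rssBytes;
    private int updates;

    @Override
    void updateProcessTree() {
        // A real implementation would call into native code here and reuse
        // its existing handle instead of constructing a new wrapper object.
        updates++;
        rssBytes = 1024L * updates; // made-up value for the sketch
    }

    @Override
    long getCumulativeRssmem() {
        return rssBytes;
    }
}
```

The caller keeps one object for the lifetime of the monitored process and calls updateProcessTree() on each polling cycle.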
<li> <a href="https://issues.apache.org/jira/browse/YARN-222">YARN-222</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>Fair scheduler should create queue for each user by default</b><br>
<blockquote>In MR1 the fair scheduler's default behavior was to create a pool for each user. The YARN fair scheduler has this capability, but it should be turned on by default, for consistency.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-219">YARN-219</a>.
Critical sub-task reported by Robert Joseph Evans and fixed by Robert Joseph Evans (nodemanager)<br>
<b>NM should aggregate logs when application finishes.</b><br>
<blockquote>The NM should only aggregate logs when the application finishes. This will reduce the load on the NN, especially with respect to lease renewal.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-217">YARN-217</a>.
Blocker bug reported by Devaraj K and fixed by Devaraj K (resourcemanager)<br>
<b>yarn rmadmin commands fail in secure cluster</b><br>
<blockquote>All the rmadmin commands fail in secure mode with the "protocol org.apache.hadoop.yarn.server.nodemanager.api.RMAdminProtocolPB is unauthorized" message in RM logs.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-216">YARN-216</a>.
Major improvement reported by Todd Lipcon and fixed by Robert Joseph Evans <br>
<b>Remove jquery theming support</b><br>
<blockquote>As of today we have 9.4MB of JQuery themes in our code tree. In addition to being a waste of space, it's a highly questionable feature. I've never heard anyone complain that the Hadoop interface isn't themeable enough, and there's far more value in consistency across installations than there is in themeability. Let's rip it out.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-214">YARN-214</a>.
Major bug reported by Jason Lowe and fixed by Jonathan Eagles (resourcemanager)<br>
<b>RMContainerImpl does not handle event EXPIRE at state RUNNING</b><br>
<blockquote>RMContainerImpl has a race condition where a container can enter the RUNNING state just as the container expires. This results in an invalid event transition error:
{noformat}
2012-11-11 05:31:38,954 [ResourceManager Event Processor] ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: EXPIRE at RUNNING
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:205)
at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:44)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApp.containerCompleted(SchedulerApp.java:203)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1337)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:739)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:659)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:80)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:340)
at java.lang.Thread.run(Thread.java:619)
{noformat}
EXPIRE needs to be handled (well at least ignored) in the RUNNING state to account for this race condition.</blockquote></li>
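A toy state machine showing what ignoring EXPIRE in the RUNNING state could look like. The state and event names follow the log above; the class, the ACQUIRED and COMPLETED states, and the table-driven API are invented for this sketch and do not reproduce StateMachineFactory.

```java
import java.util.EnumMap;
import java.util.Map;

/** Toy state machine; not the real RMContainerImpl. */
final class ContainerStateSketch {
    enum State { ACQUIRED, RUNNING, EXPIRED, COMPLETED }
    enum Event { LAUNCHED, EXPIRE, FINISHED }

    private static final Map<State, Map<Event, State>> TRANSITIONS =
            new EnumMap<>(State.class);

    static {
        addTransition(State.ACQUIRED, Event.LAUNCHED, State.RUNNING);
        addTransition(State.ACQUIRED, Event.EXPIRE, State.EXPIRED);
        addTransition(State.RUNNING, Event.FINISHED, State.COMPLETED);
        // The container started just as it expired: stay in RUNNING and ignore the event.
        addTransition(State.RUNNING, Event.EXPIRE, State.RUNNING);
    }

    private State current = State.ACQUIRED;

    private static void addTransition(State from, Event on, State to) {
        TRANSITIONS.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
    }

    /** Applies the event, or throws if no transition is registered for it. */
    State handle(Event event) {
        Map<Event, State> arcs = TRANSITIONS.get(current);
        State next = (arcs == null) ? null : arcs.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }
}
```

Without the RUNNING/EXPIRE entry, the same event sequence would end in the exception branch, which is the failure mode the entry reports.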
<li> <a href="https://issues.apache.org/jira/browse/YARN-212">YARN-212</a>.
Blocker bug reported by Nathan Roberts and fixed by Nathan Roberts (nodemanager)<br>
<b>NM state machine ignores an APPLICATION_CONTAINER_FINISHED event when it shouldn't</b><br>
<blockquote>The NM state machines can make the following two invalid state transitions when a speculative attempt is killed shortly after it gets started. When this happens the NM keeps the log aggregation context open for this application and therefore chews up FDs and leases on the NN, eventually running the NN out of FDs and bringing down the entire cluster.
2012-11-07 05:36:33,774 [AsyncDispatcher event handler] WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_CONTAINER_FINISHED at INITING
2012-11-07 05:36:33,775 [AsyncDispatcher event handler] WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [INIT_CONTAINER]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: INIT_CONTAINER at DONE
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-206">YARN-206</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
<b>TestApplicationCleanup.testContainerCleanup occasionally fails</b><br>
<blockquote>testContainerCleanup is occasionally failing with the error:
testContainerCleanup(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup): expected:&lt;2&gt; but was:&lt;1&gt;
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-204">YARN-204</a>.
Major bug reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov (applications)<br>
<b>test coverage for org.apache.hadoop.tools</b><br>
<blockquote>Added some tests for org.apache.hadoop.tools</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-202">YARN-202</a>.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee <br>
<b>Log Aggregation generates a storm of fsync() for namenode</b><br>
<blockquote>When log aggregation is on, each write to an aggregated container log causes hflush() to be called. For large clusters, this can create a lot of fsync() calls for the namenode.
We have seen a 6-7x increase in the average number of fsync operations compared to 1.0.x on a large busy cluster. Over 99% of fsync ops were for log aggregation writing to tmp files.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-201">YARN-201</a>.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (capacityscheduler)<br>
<b>CapacityScheduler can take a very long time to schedule containers if requests are off cluster</b><br>
<blockquote>When a user runs a job where one of the input files is a large file on another cluster, the job can create many splits on nodes which are unreachable for computation from the current cluster. The off-switch delay logic in LeafQueue can cause the ResourceManager to allocate containers for the job very slowly. In one case the job was only getting one container every 23 seconds, and the queue had plenty of spare capacity.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-189">YARN-189</a>.
Blocker bug reported by Thomas Graves and fixed by Thomas Graves (resourcemanager)<br>
<b>deadlock in RM - AMResponse object</b><br>
<blockquote>We ran into a deadlock in the RM.
=============================
"1128743461@qtp-1252749669-5201":
waiting for ownable synchronizer 0x00002aabbc87b960, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
which is held by "AsyncDispatcher event handler"
"AsyncDispatcher event handler":
waiting to lock monitor 0x00002ab0bba3a370 (object 0x00002aab3d4cd698, a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl),
which is held by "IPC Server handler 36 on 8030"
"IPC Server handler 36 on 8030":
waiting for ownable synchronizer 0x00002aabbc87b960, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
which is held by "AsyncDispatcher event handler"
Java stack information for the threads listed above:
===================================================
"1128743461@qtp-1252749669-5201":
at sun.misc.Unsafe.park(Native Method)
- parking to wait for &lt;0x00002aabbc87b960&gt; (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getFinalApplicationStatus(RMAppAttemptImpl.java:295)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getFinalApplicationStatus(RMAppImpl.java:222)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:328)
at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaM
...
...
..
"AsyncDispatcher event handler":
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.unregisterAttempt(ApplicationMasterService.java:307)
- waiting to lock &lt;0x00002aab3d4cd698&gt; (a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$BaseFinalTransition.transition(RMAppAttemptImpl.java:647)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:809)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:796)
at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
- locked &lt;0x00002aabbb673090&gt; (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:478)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:81)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:436)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:417)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:619)
"IPC Server handler 36 on 8030":
at sun.misc.Unsafe.park(Native Method)
- parking to wait for &lt;0x00002aabbc87b960&gt; (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.pullJustFinishedContainers(RMAppAttemptImpl.java:437)
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:285)
- locked &lt;0x00002aab3d4cd698&gt; (a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
at org.apache.hadoop.yarn.api.impl.pb.service.AMRMProtocolPBServiceImpl.allocate(AMRMProtocolPBServiceImpl.java:56)
at org.apache.hadoop.yarn.proto.AMRMProtocol$AMRMProtocolService$2.callBlockingMethod(AMRMProtocol.java:87)
at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:353)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1528)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1524)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1522)
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-188">YARN-188</a>.
Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov (capacityscheduler)<br>
<b>Coverage fixing for CapacityScheduler</b><br>
<blockquote>some tests for CapacityScheduler
YARN-188-branch-0.23.patch patch for branch 0.23
YARN-188-branch-2.patch patch for branch 2
YARN-188-trunk.patch patch for trunk
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-187">YARN-187</a>.
Major new feature reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Add hierarchical queues to the fair scheduler</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-186">YARN-186</a>.
Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov (resourcemanager , scheduler)<br>
<b>Coverage fixing LinuxContainerExecutor</b><br>
<blockquote>Added some tests for LinuxContainerExecutor
YARN-186-branch-0.23.patch patch for branch-0.23
YARN-186-branch-2.patch patch for branch-2
YARN-186-trunk.patch patch for trunk
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-184">YARN-184</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Remove unnecessary locking in fair scheduler, and address findbugs excludes.</b><br>
<blockquote>In YARN-12, locks were added to all fields of QueueManager to address findbugs. In addition, findbugs exclusions were added in response to MAPREDUCE-4439, without a deep look at the code.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-183">YARN-183</a>.
Minor improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Clean up fair scheduler code</b><br>
<blockquote>The fair scheduler code has a bunch of minor stylistic issues.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-181">YARN-181</a>.
Critical bug reported by Siddharth Seth and fixed by Siddharth Seth (resourcemanager)<br>
<b>capacity-scheduler.xml move breaks Eclipse import</b><br>
<blockquote>Eclipse doesn't seem to handle "testResources" which resolve to an absolute path. YARN-140 moved capacity-scheduler.cfg a couple of levels up to the hadoop-yarn project.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-180">YARN-180</a>.
Critical bug reported by Thomas Graves and fixed by Arun C Murthy (capacityscheduler)<br>
<b>Capacity scheduler - containers that get reserved create container token too early</b><br>
<blockquote>The capacity scheduler has the ability to 'reserve' containers. Unfortunately, before it decides whether a container is reserved rather than assigned, the Container object is created, which creates a container token that expires in roughly 10 minutes by default.
This means that by the time the NM frees up enough space on that node for the container to move to assigned, the container token may have expired.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-179">YARN-179</a>.
Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (capacityscheduler)<br>
<b>Bunch of test failures on trunk</b><br>
<blockquote>{{CapacityScheduler.setConf()}} mandates a YarnConfiguration. It doesn't need to: throughout YARN, components depend only on Configuration and rely on the callers to provide the correct configuration.
This is causing multiple tests to fail.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-178">YARN-178</a>.
Critical bug reported by Radim Kolar and fixed by Radim Kolar <br>
<b>Fix custom ProcessTree instance creation</b><br>
<blockquote>1. The current pluggable ResourceCalculatorProcessTree does not pass the root process id to a custom implementation, making it unusable.
2. pstree does not extend Configured as it should.
Added a constructor with a pid argument, along with a test suite. Also added a test that pstree is correctly configured.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-177">YARN-177</a>.
Critical bug reported by Thomas Graves and fixed by Arun C Murthy (capacityscheduler)<br>
<b>CapacityScheduler - adding a queue while the RM is running has wacky results</b><br>
<blockquote>Adding a queue to the capacity scheduler while the RM is running and then running a job in the queue added results in very strange behavior. The cluster Total Memory can either decrease or increase. We had a cluster where total memory decreased to almost 1/6th the capacity. Running on a small test cluster resulted in the capacity going up by simply adding a queue and running wordcount.
Looking at the RM logs, used memory can go negative but other logs show the number positive:
2012-10-21 22:56:44,796 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.0375 absoluteUsedCapacity=0.0375 used=memory: 7680 cluster=memory: 204800
2012-10-21 22:56:45,831 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=-0.0225 absoluteUsedCapacity=-0.0225 used=memory: -4608 cluster=memory: 204800
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-170">YARN-170</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)<br>
<b>NodeManager stop() gets called twice on shutdown</b><br>
<blockquote>The stop method in the NodeManager gets called twice when the NodeManager is shut down via the shutdown hook.
The first is the stop that gets called directly by the shutdown hook. The second occurs when the NodeStatusUpdaterImpl is stopped. The NodeManager responds to the NodeStatusUpdaterImpl stop stateChanged event by stopping itself. This is so that NodeStatusUpdaterImpl can notify the NodeManager to stop, by stopping itself in response to a request from the ResourceManager.
This could be avoided if the NodeStatusUpdaterImpl were to stop the NodeManager by calling its stop method directly.</blockquote></li>
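The entry proposes having NodeStatusUpdaterImpl call the NodeManager's stop method directly. A different and complementary safeguard, sketched here, is to make stop() idempotent so a second call is harmless. The class below is hypothetical and is not the approach the entry describes.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical service whose stop() is safe to call more than once. */
final class StoppableServiceSketch {
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private int cleanupRuns;

    /** Runs cleanup on the first call only; later calls return immediately. */
    void stop() {
        if (!stopped.compareAndSet(false, true)) {
            return; // already stopped, e.g. by the shutdown hook
        }
        cleanupRuns++; // stands in for the real shutdown work
    }

    /** Number of times the cleanup work actually ran. */
    int cleanupRuns() {
        return cleanupRuns;
    }
}
```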
<li> <a href="https://issues.apache.org/jira/browse/YARN-169">YARN-169</a>.
Minor improvement reported by Anthony Rojas and fixed by Anthony Rojas (nodemanager)<br>
<b>Update log4j.appender.EventCounter to use org.apache.hadoop.log.metrics.EventCounter</b><br>
<blockquote>We should update the log4j.appender.EventCounter in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/resources/container-log4j.properties to use *org.apache.hadoop.log.metrics.EventCounter* rather than *org.apache.hadoop.metrics.jvm.EventCounter* to avoid triggering the following warning:
{code}WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-166">YARN-166</a>.
Major bug reported by Thomas Graves and fixed by Thomas Graves (capacityscheduler)<br>
<b>capacity scheduler doesn't allow capacity &lt; 1.0</b><br>
<blockquote>1.x supports queue capacity &lt; 1, but in 0.23 the capacity scheduler doesn't. This is an issue for us since we have a large cluster running 1.x that currently has a queue with capacity 0.5%.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-165">YARN-165</a>.
Blocker improvement reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
<b>RM should point tracking URL to RM web page for app when AM fails</b><br>
<blockquote>Currently when an ApplicationMaster fails the ResourceManager is updating the tracking URL to an empty string, see RMAppAttemptImpl.ContainerFinishedTransition. Unfortunately when the client attempts to follow the proxy URL it results in a web page showing an HTTP 500 error and an ugly backtrace because "http://" isn't a very helpful tracking URL.
It would be much more helpful if the proxy URL redirected to the RM webapp page for the specific application. That page shows the various AM attempts and pointers to their logs which will be useful for debugging the problems that caused the AM attempts to fail.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-163">YARN-163</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)<br>
<b>Retrieving container log via NM webapp can hang with multibyte characters in log</b><br>
<blockquote>ContainerLogsBlock.printLogs currently assumes that skipping N bytes in the log file is the same as skipping N characters, but that is not true when the log contains multibyte characters. This can cause the loop that skips a portion of the log to try to skip past the end of the file and loop forever (or until Jetty kills the worker thread).</blockquote></li>
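A sketch of one way to avoid the byte/character mismatch: skip by bytes on the raw stream before decoding, and stop as soon as skip() makes no progress. The class and method names are hypothetical, the log is assumed to be UTF-8, and this is not the actual ContainerLogsBlock fix. Note that a byte offset can still land in the middle of a multibyte character, in which case the decoder emits a replacement character for the partial sequence.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; reads a log starting at a byte offset. */
final class LogSkipSketch {
    private LogSkipSketch() {}

    static String readFromByteOffset(byte[] log, long byteOffset) {
        try (InputStream in = new ByteArrayInputStream(log)) {
            long remaining = byteOffset;
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) {
                    break; // end of data: stop instead of looping forever
                }
                remaining -= skipped;
            }
            // Decode only after the byte-level skip is done.
            Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
            StringBuilder out = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```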
<li> <a href="https://issues.apache.org/jira/browse/YARN-161">YARN-161</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (api)<br>
<b>Yarn Common has multiple compiler warnings for unchecked operations</b><br>
<blockquote>The warnings are in classes StateMachineFactory, RecordFactoryProvider, RpcFactoryProvider, and YarnRemoteExceptionFactoryProvider. OpenJDK 1.6.0_24 actually treats these as compilation errors, causing the build to fail.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-159">YARN-159</a>.
Major bug reported by Thomas Graves and fixed by Thomas Graves (resourcemanager)<br>
<b>RM web ui applications page should be sorted to display last app first </b><br>
<blockquote>RM web ui applications page should be sorted to display last app first.
It currently sorts with the smallest application id first, which puts the earliest submitted apps at the top. Once you have more than one page of apps, it's much more useful to sort so that the biggest app id (the last submitted app) shows up first.</blockquote></li>
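The desired ordering can be sketched as a descending sort on the numeric suffix of the application id. This only illustrates the ordering and is not the web UI change itself; the class name is hypothetical, and the sketch assumes ids of the form application_&lt;clusterTimestamp&gt;_&lt;sequence&gt; that all share one cluster timestamp.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper for ordering application ids newest first. */
final class AppIdSortSketch {
    private AppIdSortSketch() {}

    /** Parses the trailing sequence number, e.g. 12 for application_1350000000000_0012. */
    static int sequenceOf(String appId) {
        return Integer.parseInt(appId.substring(appId.lastIndexOf('_') + 1));
    }

    /** Returns a copy sorted so the most recently submitted application comes first. */
    static List<String> newestFirst(List<String> appIds) {
        List<String> sorted = new ArrayList<>(appIds);
        sorted.sort(Comparator.<String>comparingInt(AppIdSortSketch::sequenceOf).reversed());
        return sorted;
    }
}
```

Comparing the parsed number rather than the raw string keeps the order correct even if the sequence part outgrows its zero padding.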
<li> <a href="https://issues.apache.org/jira/browse/YARN-151">YARN-151</a>.
Major bug reported by Robert Joseph Evans and fixed by Ravi Prakash <br>
<b>Browser thinks RM main page JS is taking too long</b><br>
<blockquote>The main RM page with the default settings of 10,000 applications can cause browsers to think that the JS on the page is stuck and ask you if you want to kill it. This is a big usability problem.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-150">YARN-150</a>.
Major bug reported by Bikas Saha and fixed by Bikas Saha <br>
<b>AppRejectedTransition does not unregister app from master service and scheduler</b><br>
<blockquote>AttemptStartedTransition() adds the app to the ApplicationMasterService and scheduler. When the scheduler rejects the app, AppRejectedTransition() forgets to unregister it from the ApplicationMasterService.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-146">YARN-146</a>.
Major new feature reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager)<br>
<b>Add unit tests for computing fair share in the fair scheduler</b><br>
<blockquote>MR1 had TestComputeFairShares. This should go into the YARN fair scheduler.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-145">YARN-145</a>.
Major new feature reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager)<br>
<b>Add a Web UI to the fair share scheduler</b><br>
<blockquote>The fair scheduler had a UI in MR1. Port the capacity scheduler web UI and modify appropriately for the fair share scheduler.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-140">YARN-140</a>.
Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (capacityscheduler)<br>
<b>Add capacity-scheduler-default.xml to provide a default set of configurations for the capacity scheduler.</b><br>
<blockquote>When setting up the capacity scheduler users are faced with problems like:
{code}
FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
java.lang.IllegalArgumentException: Illegal capacity of -1 for queue root
{code}
This basically arises from missing basic configurations that, in many cases, there is no need to provide explicitly because a default configuration is sufficient. For example, to address the error above, the user needs to add a capacity of 100 to the root queue.
So we need to add a capacity-scheduler-default.xml, which will provide the basic set of default configurations required to run the capacity scheduler. The user can still override existing configurations or provide new ones in capacity-scheduler.xml. This is similar to *-default.xml vs *-site.xml for yarn, core, mapred, hdfs, etc.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-139">YARN-139</a>.
Major bug reported by Nathan Roberts and fixed by Vinod Kumar Vavilapalli (api)<br>
<b>Interrupted Exception within AsyncDispatcher leads to user confusion</b><br>
<blockquote>Successful applications tend to get InterruptedExceptions during shutdown. The exception is harmless but it leads to lots of user confusion and therefore could be cleaned up.
2012-09-28 14:50:12,477 WARN [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Interrupted Exception while stopping
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1143)
at java.lang.Thread.join(Thread.java:1196)
at org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:105)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:437)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:402)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:619)
2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped.
2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.mapreduce.v2.app.MRAppMaster is stopped.
2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Exiting MR AppMaster..GoodBye</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-136">YARN-136</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (resourcemanager)<br>
<b>Make ClientTokenSecretManager part of RMContext</b><br>
<blockquote>Helps to add it to the context instead of passing it all around as an extra parameter.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-135">YARN-135</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (resourcemanager)<br>
<b>ClientTokens should be per app-attempt and be unregistered on App-finish.</b><br>
<blockquote>Two issues:
- ClientTokens are per app-attempt but are created per app.
- Apps don't get unregistered from RMClientTokenSecretManager.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-134">YARN-134</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>ClientToAMSecretManager creates keys without checking for validity of the appID</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-133">YARN-133</a>.
Major bug reported by Thomas Graves and fixed by Ravi Prakash (resourcemanager)<br>
<b>update web services docs for RM clusterMetrics</b><br>
<blockquote>Looks like jira https://issues.apache.org/jira/browse/MAPREDUCE-3747 added in more RM cluster metrics but the docs didn't get updated: http://hadoop.apache.org/docs/r0.23.3/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Metrics_API
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-131">YARN-131</a>.
Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (capacityscheduler)<br>
<b>Incorrect ACL properties in capacity scheduler documentation</b><br>
<blockquote>The CapacityScheduler apt file incorrectly specifies the property names controlling acls for application submission and queue administration.
{{yarn.scheduler.capacity.root.&lt;queue-path&gt;.acl_submit_jobs}}
should be
{{yarn.scheduler.capacity.root.&lt;queue-path&gt;.acl_submit_applications}}
{{yarn.scheduler.capacity.root.&lt;queue-path&gt;.acl_administer_jobs}}
should be
{{yarn.scheduler.capacity.root.&lt;queue-path&gt;.acl_administer_queue}}
Uploading a patch momentarily.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-129">YARN-129</a>.
Major improvement reported by Tom White and fixed by Tom White (client)<br>
<b>Simplify classpath construction for mini YARN tests</b><br>
<blockquote>The test classpath includes a special file called 'mrapp-generated-classpath' (or similar in distributed shell) that is constructed at build time, and whose contents are a classpath with all the dependencies needed to run the tests. When the classpath for a container (e.g. the AM) is constructed, the contents of mrapp-generated-classpath are read and added to the classpath, and the file itself is then added to the classpath so that when the AM later constructs a classpath for a task container it can propagate the test classpath correctly.
This mechanism can be drastically simplified by propagating the system classpath of the current JVM (read from the java.class.path property) to a launched JVM, but only if running in the context of the mini YARN cluster. Any tests that use the mini YARN cluster will automatically work with this change, although any that explicitly deal with mrapp-generated-classpath can be simplified.
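The propagation idea can be sketched as follows. This is only an illustration, not the code from the patch: the class MiniClusterClasspath, the method childClasspath and the inMiniCluster flag are hypothetical names, and the only real API used is the standard java.class.path system property.

```java
import java.io.File;

// Hypothetical helper, not part of Hadoop: decides which classpath a
// launched container JVM should receive.
class MiniClusterClasspath {

    // baseClasspath: the classpath the container would get anyway.
    // inMiniCluster: assumed flag saying we run inside a mini YARN cluster.
    static String childClasspath(String baseClasspath, boolean inMiniCluster) {
        if (!inMiniCluster) {
            // Outside the mini cluster the classpath is left untouched.
            return baseClasspath;
        }
        // Inside the mini cluster, append the classpath of the current JVM
        // so the launched JVM sees the same test dependencies.
        String parent = System.getProperty("java.class.path");
        return baseClasspath + File.pathSeparator + parent;
    }
}
```

Calling MiniClusterClasspath.childClasspath("app.jar", true) returns "app.jar" followed by the path separator and the current JVM's classpath; with false it returns "app.jar" unchanged.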
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-127">YARN-127</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>Move RMAdmin tool to the client package</b><br>
<blockquote>It clearly belongs to the client package, not the RM.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-116">YARN-116</a>.
Major bug reported by xieguiming and fixed by xieguiming (resourcemanager)<br>
<b>RM is missing ability to add include/exclude files without a restart</b><br>
<blockquote>The default value of "yarn.resourcemanager.nodes.include-path" is "". If we need to add an include file, we must currently restart the RM.
I suggest that adding an include or exclude file should not require restarting the RM; running the refresh command should be enough. The HDFS NameNode already has this ability.
The fix is to modify the HostsFileReader constructor:
From:
{code}
public HostsFileReader(String inFile,
String exFile)
{code}
To:
{code}
public HostsFileReader(Configuration conf,
String NODES_INCLUDE_FILE_PATH,String DEFAULT_NODES_INCLUDE_FILE_PATH,
String NODES_EXCLUDE_FILE_PATH,String DEFAULT_NODES_EXCLUDE_FILE_PATH)
{code}
And thus, we can read the config file dynamically when a {{refreshNodes}} is invoked and therefore have no need to restart the ResourceManager.
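A minimal sketch of the idea, reading the configured paths at refresh time instead of fixing them at construction. It is not the HostsFileReader from the patch: RefreshableHostsConfig is a hypothetical class, java.util.Properties stands in for the Hadoop Configuration object, and the path used in the usage note is made up.

```java
import java.util.Properties;

// Hypothetical stand-in for the reworked reader: it keeps the configuration
// and the property names, and looks the file paths up again on every refresh.
class RefreshableHostsConfig {
    private final Properties conf;
    private final String includeKey;
    private final String defaultInclude;
    private final String excludeKey;
    private final String defaultExclude;
    private String includeFile;
    private String excludeFile;

    RefreshableHostsConfig(Properties conf,
                           String includeKey, String defaultInclude,
                           String excludeKey, String defaultExclude) {
        this.conf = conf;
        this.includeKey = includeKey;
        this.defaultInclude = defaultInclude;
        this.excludeKey = excludeKey;
        this.defaultExclude = defaultExclude;
        refresh();
    }

    // Re-reads both paths from the configuration. Values set after
    // construction are picked up here, so no restart is needed.
    void refresh() {
        includeFile = conf.getProperty(includeKey, defaultInclude);
        excludeFile = conf.getProperty(excludeKey, defaultExclude);
    }

    String getIncludeFile() {
        return includeFile;
    }

    String getExcludeFile() {
        return excludeFile;
    }
}
```

For example, if "yarn.resourcemanager.nodes.include-path" is set to /etc/hadoop/include (a made-up path) after the object was built, getIncludeFile() still returns "" until refresh() is called, and returns the new path afterwards.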
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-103">YARN-103</a>.
Major improvement reported by Bikas Saha and fixed by Bikas Saha <br>
<b>Add a yarn AM - RM client module</b><br>
<blockquote>Add a basic client wrapper library for the AM-RM protocol to prevent the same code from being duplicated everywhere. Provide helper functions that reverse-map container requests to the RM allocation resource request table format.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-102">YARN-102</a>.
Trivial bug reported by Devaraj K and fixed by Devaraj K (resourcemanager)<br>
<b>Move the apache licence header to the top of the file in MemStore.java</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-94">YARN-94</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Hitesh Shah (applications/distributed-shell)<br>
<b>DistributedShell jar should point to Client as the main class by default</b><br>
<blockquote>Today, running the jar without a main class prints only the RunJar usage:
{code}
$ $YARN_HOME/bin/yarn jar $YARN_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-$VERSION.jar
RunJar jarFile [mainClass] args...
{code}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-93">YARN-93</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
<b>Diagnostics missing from applications that have finished but failed</b><br>
<blockquote>If an application finishes in the YARN sense but fails in the app framework sense (e.g.: a failed MapReduce job) then diagnostics are missing from the RM web page for the application. The RM should be reporting diagnostic messages even for successful YARN applications.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-82">YARN-82</a>.
Minor bug reported by Andy Isaacson and fixed by Hemanth Yamijala (nodemanager)<br>
<b>YARN local-dirs defaults to /tmp/nm-local-dir</b><br>
<blockquote>{{yarn.nodemanager.local-dirs}} defaults to {{/tmp/nm-local-dir}}. It should be {{${hadoop.tmp.dir}/nm-local-dir}} or similar. Among other problems, this can prevent multiple test clusters from starting on the same machine.
Thanks to Hemanth Yamijala for reporting this issue.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-78">YARN-78</a>.
Major bug reported by Bikas Saha and fixed by Bikas Saha (applications)<br>
<b>Change UnmanagedAMLauncher to use YarnClientImpl</b><br>
<blockquote>YARN-29 added a common client impl to talk to the RM. Use that in the UnmanagedAMLauncher.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-72">YARN-72</a>.
Major bug reported by Hitesh Shah and fixed by Sandy Ryza (nodemanager)<br>
<b>NM should handle cleaning up containers when it shuts down</b><br>
<blockquote>Ideally, when the NM gets a shutdown signal it should wait a limited amount of time for existing containers to complete, and then (if we pick an aggressive approach) kill the containers once that interval has elapsed.
NMs that come up after an unclean shutdown should look through their directories for existing container.pids and try to kill any existing containers matching the pids found.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-57">YARN-57</a>.
Major improvement reported by Radim Kolar and fixed by Radim Kolar (nodemanager)<br>
<b>Plugable process tree</b><br>
<blockquote>Trunk version of Pluggable process tree. Work based on MAPREDUCE-4204</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-50">YARN-50</a>.
Blocker sub-task reported by Siddharth Seth and fixed by Siddharth Seth <br>
<b>Implement renewal / cancellation of Delegation Tokens</b><br>
<blockquote>Currently, delegation tokens issued by the RM and History server cannot be renewed or cancelled. This needs to be implemented.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-43">YARN-43</a>.
Major bug reported by Thomas Graves and fixed by Thomas Graves <br>
<b>TestResourceTrackerService fail intermittently on jdk7</b><br>
<blockquote>Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.73 sec &lt;&lt;&lt; FAILURE!
testDecommissionWithIncludeHosts(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService) Time elapsed: 0.086 sec &lt;&lt;&lt; FAILURE!
junit.framework.AssertionFailedError: expected:&lt;0&gt; but was:&lt;1&gt; at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.failNotEquals(Assert.java:283)
at junit.framework.Assert.assertEquals(Assert.java:64)
at junit.framework.Assert.assertEquals(Assert.java:195)
at junit.framework.Assert.assertEquals(Assert.java:201)
at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testDecommissionWithIncludeHosts(TestResourceTrackerService.java:90)</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-40">YARN-40</a>.
Major bug reported by Devaraj K and fixed by Devaraj K (client)<br>
<b>Provide support for missing yarn commands</b><br>
<blockquote>1. status &lt;app-id&gt;
2. kill &lt;app-id&gt; (already tracked in MAPREDUCE-3793)
3. list-apps [all]
4. nodes-report</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-33">YARN-33</a>.
Major bug reported by Mayank Bansal and fixed by Mayank Bansal (nodemanager)<br>
<b>LocalDirsHandler should validate the configured local and log dirs</b><br>
<blockquote>When yarn.nodemanager.log-dirs is set to a file:// URI, NodeManager startup creates a directory such as file:// under the current working directory, which should not happen.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-32">YARN-32</a>.
Major bug reported by Thomas Graves and fixed by Vinod Kumar Vavilapalli <br>
<b>TestApplicationTokens fails intermintently on jdk7</b><br>
<blockquote>TestApplicationTokens fails intermittently on jdk7.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-30">YARN-30</a>.
Major bug reported by Thomas Graves and fixed by Thomas Graves <br>
<b>TestNMWebServicesApps, TestRMWebServicesApps and TestRMWebServicesNodes fail on jdk7</b><br>
<blockquote>It looks like the string changed from "const class" to "constant".
Tests run: 19, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 6.786 sec &lt;&lt;&lt; FAILURE!
testNodeAppsStateInvalid(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps) Time elapsed: 0.248 sec &lt;&lt;&lt; FAILURE!
java.lang.AssertionError: exception message doesn't match, got: No enum constant org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationState.FOO_STATE expected: No enum const class org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationState.FOO_STATE</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-28">YARN-28</a>.
Major bug reported by Thomas Graves and fixed by Thomas Graves <br>
<b>TestCompositeService fails on jdk7</b><br>
<blockquote>TestCompositeService fails when run with jdk7.
It appears to expect testCallSequence to be called first and the sequence numbers to start at 0. On jdk7 it is not called first, so the sequence number has already been incremented.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-23">YARN-23</a>.
Major improvement reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)<br>
<b>FairScheduler: FSQueueSchedulable#updateDemand() - potential redundant aggregation</b><br>
<blockquote>In FS, FSQueueSchedulable#updateDemand() limits the demand to maxTasks only after iterating through all the pools and computing the final demand.
By checking if the demand has reached maxTasks in every iteration, we can avoid redundant work, at the expense of one condition check every iteration.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-3">YARN-3</a>.
Major sub-task reported by Arun C Murthy and fixed by Andrew Ferguson <br>
<b>Add support for CPU isolation/monitoring of containers</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-2">YARN-2</a>.
Major new feature reported by Arun C Murthy and fixed by Arun C Murthy (capacityscheduler , scheduler)<br>
<b>Enhance CS to schedule accounting for both memory and cpu cores</b><br>
<blockquote>With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful for the CapacityScheduler to account for both.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4977">MAPREDUCE-4977</a>.
Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (documentation)<br>
<b>Documentation for pluggable shuffle and pluggable sort</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4971">MAPREDUCE-4971</a>.
Minor improvement reported by Arun C Murthy and fixed by Arun C Murthy <br>
<b>Minor extensibility enhancements </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4969">MAPREDUCE-4969</a>.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)<br>
<b>TestKeyValueTextInputFormat test fails with Open JDK 7</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4953">MAPREDUCE-4953</a>.
Major bug reported by Andy Isaacson and fixed by Andy Isaacson (pipes)<br>
<b>HadoopPipes misuses fprintf</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4949">MAPREDUCE-4949</a>.
Minor improvement reported by Sandy Ryza and fixed by Sandy Ryza (examples)<br>
<b>Enable multiple pi jobs to run in parallel</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4948">MAPREDUCE-4948</a>.
Critical bug reported by Junping Du and fixed by Junping Du (client)<br>
<b>TestYARNRunner.testHistoryServerToken failed on trunk</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4946">MAPREDUCE-4946</a>.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (mr-am)<br>
<b>Type conversion of map completion events leads to performance problems with large jobs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4936">MAPREDUCE-4936</a>.
Critical bug reported by Daryn Sharp and fixed by Arun C Murthy (mrv2)<br>
<b>JobImpl uber checks for cpu are wrong</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4934">MAPREDUCE-4934</a>.
Critical bug reported by Thomas Graves and fixed by Thomas Graves (build)<br>
<b>Maven RAT plugin is not checking all source files</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4928">MAPREDUCE-4928</a>.
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (applicationmaster , security)<br>
<b>Use token request messages defined in hadoop common </b><br>
<blockquote>The renewer field of the Protobuf message GetDelegationTokenRequestProto is changed from optional to required. This change is not wire compatible with older releases.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4925">MAPREDUCE-4925</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (examples)<br>
<b>The pentomino option parser may be buggy</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4924">MAPREDUCE-4924</a>.
Trivial bug reported by Robert Kanter and fixed by Robert Kanter (mrv1)<br>
<b>flakey test: org.apache.hadoop.mapred.TestClusterMRNotification.testMR</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4923">MAPREDUCE-4923</a>.
Minor bug reported by Sandy Ryza and fixed by Sandy Ryza (mrv1 , mrv2 , task)<br>
<b>Add toString method to TaggedInputSplit</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4921">MAPREDUCE-4921</a>.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (client)<br>
<b>JobClient should acquire HS token with RM principal</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4920">MAPREDUCE-4920</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Suresh Srinivas <br>
<b>Use security token protobuf definition from hadoop common</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4913">MAPREDUCE-4913</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe (mr-am)<br>
<b>TestMRAppMaster#testMRAppMasterMissingStaging occasionally exits</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4907">MAPREDUCE-4907</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (mrv1 , tasktracker)<br>
<b>TrackerDistributedCacheManager issues too many getFileStatus calls</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4905">MAPREDUCE-4905</a>.
Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov <br>
<b>test org.apache.hadoop.mapred.pipes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4902">MAPREDUCE-4902</a>.
Trivial bug reported by Albert Chu and fixed by Albert Chu <br>
<b>Fix typo "receievd" should be "received" in log output</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4899">MAPREDUCE-4899</a>.
Major improvement reported by Derek Dagit and fixed by Derek Dagit <br>
<b>Provide a plugin to the Yarn Web App Proxy to generate tracking links for M/R appllications given the ID</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4895">MAPREDUCE-4895</a>.
Major bug reported by Dennis Y and fixed by Dennis Y <br>
<b>Fix compilation failure of org.apache.hadoop.mapred.gridmix.TestResourceUsageEmulators</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4894">MAPREDUCE-4894</a>.
Blocker bug reported by Siddharth Seth and fixed by Siddharth Seth (jobhistoryserver , mrv2)<br>
<b>Renewal / cancellation of JobHistory tokens</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4893">MAPREDUCE-4893</a>.
Major bug reported by Bikas Saha and fixed by Bikas Saha (applicationmaster)<br>
<b>MR AppMaster can do sub-optimal assignment of containers to map tasks leading to poor node locality</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4890">MAPREDUCE-4890</a>.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (mr-am)<br>
<b>Invalid TaskImpl state transitions when task fails while speculating</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4884">MAPREDUCE-4884</a>.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (contrib/streaming , test)<br>
<b>streaming tests fail to start MiniMRCluster due to "Queue configuration missing child queue names for root"</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4861">MAPREDUCE-4861</a>.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>Cleanup: Remove unused mapreduce.security.token.DelegationTokenRenewal</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4856">MAPREDUCE-4856</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (test)<br>
<b>TestJobOutputCommitter uses same directory as TestJobCleanup</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4848">MAPREDUCE-4848</a>.
Major bug reported by Jason Lowe and fixed by Jerry Chen (mr-am)<br>
<b>TaskAttemptContext cast error during AM recovery</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4845">MAPREDUCE-4845</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (client)<br>
<b>ClusterStatus.getMaxMemory() and getUsedMemory() exist in MR1 but not MR2 </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4842">MAPREDUCE-4842</a>.
Blocker bug reported by Jason Lowe and fixed by Mariappan Asokan (mrv2)<br>
<b>Shuffle race can hang reducer</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4838">MAPREDUCE-4838</a>.
Major improvement reported by Arun C Murthy and fixed by Zhijie Shen <br>
<b>Add extra info to JH files</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4836">MAPREDUCE-4836</a>.
Major bug reported by Ravi Prakash and fixed by Ravi Prakash <br>
<b>Elapsed time for running tasks on AM web UI tasks page is 0</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4833">MAPREDUCE-4833</a>.
Critical bug reported by Robert Joseph Evans and fixed by Robert Parker (applicationmaster , mrv2)<br>
<b>Task can get stuck in FAIL_CONTAINER_CLEANUP</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4832">MAPREDUCE-4832</a>.
Critical bug reported by Robert Joseph Evans and fixed by Jason Lowe (applicationmaster)<br>
<b>MR AM can get in a split brain situation</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4825">MAPREDUCE-4825</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe (mr-am)<br>
<b>JobImpl.finished doesn't expect ERROR as a final job state</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4822">MAPREDUCE-4822</a>.
Trivial improvement reported by Robert Joseph Evans and fixed by Chu Tong (jobhistoryserver)<br>
<b>Unnecessary conversions in History Events</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4819">MAPREDUCE-4819</a>.
Blocker bug reported by Jason Lowe and fixed by Bikas Saha (mr-am)<br>
<b>AM can rerun job after reporting final job status to the client</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4817">MAPREDUCE-4817</a>.
Critical bug reported by Jason Lowe and fixed by Thomas Graves (applicationmaster , mr-am)<br>
<b>Hardcoded task ping timeout kills tasks localizing large amounts of data</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4813">MAPREDUCE-4813</a>.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster)<br>
<b>AM timing out during job commit</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4811">MAPREDUCE-4811</a>.
Minor improvement reported by Ravi Prakash and fixed by Ravi Prakash (jobhistoryserver , mrv2)<br>
<b>JobHistoryServer should show when it was started in WebUI About page</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4810">MAPREDUCE-4810</a>.
Minor improvement reported by Jason Lowe and fixed by Jerry Chen (applicationmaster)<br>
<b>Add admin command options for ApplicationMaster</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4809">MAPREDUCE-4809</a>.
Major sub-task reported by Arun C Murthy and fixed by Mariappan Asokan <br>
<b>Change visibility of classes for pluggable sort changes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4808">MAPREDUCE-4808</a>.
Major new feature reported by Arun C Murthy and fixed by Mariappan Asokan <br>
<b>Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4807">MAPREDUCE-4807</a>.
Major sub-task reported by Arun C Murthy and fixed by Mariappan Asokan <br>
<b>Allow MapOutputBuffer to be pluggable</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4803">MAPREDUCE-4803</a>.
Minor test reported by Mariappan Asokan and fixed by Mariappan Asokan (test)<br>
<b>Duplicate copies of TestIndexCache.java</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4802">MAPREDUCE-4802</a>.
Major improvement reported by Ravi Prakash and fixed by Ravi Prakash (mr-am , mrv2 , webapps)<br>
<b>Takes a long time to load the task list on the AM for large jobs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4801">MAPREDUCE-4801</a>.
Critical bug reported by Jason Lowe and fixed by Jason Lowe <br>
<b>ShuffleHandler can generate large logs due to prematurely closed channels</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4797">MAPREDUCE-4797</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster)<br>
<b>LocalContainerAllocator can loop forever trying to contact the RM</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4787">MAPREDUCE-4787</a>.
Major bug reported by Ravi Prakash and fixed by Robert Parker (test)<br>
<b>TestJobMonitorAndPrint is broken</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4786">MAPREDUCE-4786</a>.
Major bug reported by Ravi Prakash and fixed by Ravi Prakash (mrv2)<br>
<b>Job End Notification retry interval is 5 milliseconds by default</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4782">MAPREDUCE-4782</a>.
Blocker bug reported by Mark Fuhs and fixed by Mark Fuhs (client)<br>
<b>NLineInputFormat skips first line of last InputSplit</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4778">MAPREDUCE-4778</a>.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (jobtracker , scheduler)<br>
<b>Fair scheduler event log is only written if directory exists on HDFS</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4777">MAPREDUCE-4777</a>.
Minor improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>In TestIFile, testIFileReaderWithCodec relies on testIFileWriterWithCodec</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4774">MAPREDUCE-4774</a>.
Major bug reported by Ivan A. Veselovsky and fixed by Jason Lowe (applicationmaster , mrv2)<br>
<b>JobImpl does not handle asynchronous task events in FAILED state</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4772">MAPREDUCE-4772</a>.
Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)<br>
<b>Fetch failures can take way too long for a map to be restarted</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4771">MAPREDUCE-4771</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)<br>
<b>KeyFieldBasedPartitioner not partitioning properly when configured</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4764">MAPREDUCE-4764</a>.
Major improvement reported by Ivan A. Veselovsky and fixed by <br>
<b>repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4763">MAPREDUCE-4763</a>.
Minor improvement reported by Ivan A. Veselovsky and fixed by <br>
<b>repair test org.apache.hadoop.mapreduce.security.TestUmbilicalProtocolWithJobToken</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4752">MAPREDUCE-4752</a>.
Major improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)<br>
<b>Reduce MR AM memory usage through String Interning</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4751">MAPREDUCE-4751</a>.
Major bug reported by Ravi Prakash and fixed by Vinod Kumar Vavilapalli <br>
<b>AM stuck in KILL_WAIT for days</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4748">MAPREDUCE-4748</a>.
Blocker bug reported by Robert Joseph Evans and fixed by Jason Lowe (mrv2)<br>
<b>Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4746">MAPREDUCE-4746</a>.
Major bug reported by Robert Parker and fixed by Robert Parker (applicationmaster)<br>
<b>The MR Application Master does not have a config to set environment variables</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4741">MAPREDUCE-4741</a>.
Minor bug reported by Jason Lowe and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2)<br>
<b>WARN and ERROR messages logged during normal AM shutdown</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4740">MAPREDUCE-4740</a>.
Blocker bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)<br>
<b>only .jars can be added to the Distributed Cache classpath</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4736">MAPREDUCE-4736</a>.
Trivial improvement reported by Brandon Li and fixed by Brandon Li (test)<br>
<b>Remove obsolete option [-rootDir] from TestDFSIO</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4733">MAPREDUCE-4733</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster , mrv2)<br>
<b>Reducer can fail to make progress during shuffle if too many reducers complete consecutively</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4730">MAPREDUCE-4730</a>.
Blocker bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster , mrv2)<br>
<b>AM crashes due to OOM while serving up map task completion events</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4729">MAPREDUCE-4729</a>.
Major bug reported by Thomas Graves and fixed by Vinod Kumar Vavilapalli (jobhistoryserver)<br>
<b>job history UI not showing all job attempts</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4724">MAPREDUCE-4724</a>.
Major bug reported by Thomas Graves and fixed by Thomas Graves (jobhistoryserver)<br>
<b>job history web ui applications page should be sorted to display last app first</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4723">MAPREDUCE-4723</a>.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Fix warnings found by findbugs 2</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4721">MAPREDUCE-4721</a>.
Major bug reported by Ravi Prakash and fixed by Ravi Prakash (jobhistoryserver)<br>
<b>Task startup time in JHS is same as job startup time.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4720">MAPREDUCE-4720</a>.
Major bug reported by Robert Joseph Evans and fixed by Ravi Prakash <br>
<b>Browser thinks History Server main page JS is taking too long</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4712">MAPREDUCE-4712</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (jobhistoryserver)<br>
<b>mr-jobhistory-daemon.sh doesn't accept --config</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4705">MAPREDUCE-4705</a>.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (jobhistoryserver , mrv2)<br>
<b>Historyserver links expire before the history data does</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4703">MAPREDUCE-4703</a>.
Major improvement reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv1 , mrv2 , test)<br>
<b>Add the ability to start the MiniMRClientCluster using the configurations used before it is being stopped.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4681">MAPREDUCE-4681</a>.
Major bug reported by Arun C Murthy and fixed by Arun C Murthy <br>
<b>HDFS-3910 broke MR tests</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4678">MAPREDUCE-4678</a>.
Minor bug reported by Chris McConnell and fixed by Chris McConnell (examples)<br>
<b>Running the Pentomino example with defaults throws java.lang.NegativeArraySizeException</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4674">MAPREDUCE-4674</a>.
Minor bug reported by Robert Justice and fixed by Robert Justice <br>
<b>Hadoop examples secondarysort has a typo "secondarysrot" in the usage</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4666">MAPREDUCE-4666</a>.
Minor improvement reported by Jason Lowe and fixed by Jason Lowe (jobhistoryserver)<br>
<b>JVM metrics for history server</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4654">MAPREDUCE-4654</a>.
Critical bug reported by Colin Patrick McCabe and fixed by Sandy Ryza (test)<br>
<b>TestDistCp is @ignored</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4637">MAPREDUCE-4637</a>.
Major bug reported by Tom White and fixed by Mayank Bansal (mrv2)<br>
<b>Killing an unassigned task attempt causes the job to fail</b><br>
<blockquote>Handle TaskAttempt diagnostic updates while in the NEW and UNASSIGNED states.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4616">MAPREDUCE-4616</a>.
Minor improvement reported by Tony Burton and fixed by Tony Burton (documentation)<br>
<b>Improvement to MultipleOutputs javadocs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4607">MAPREDUCE-4607</a>.
Major bug reported by Bikas Saha and fixed by Bikas Saha <br>
<b>Race condition in ReduceTask completion can result in Task being incorrectly failed</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4596">MAPREDUCE-4596</a>.
Major task reported by Siddharth Seth and fixed by Siddharth Seth (applicationmaster , mrv2)<br>
<b>Split StateMachine state from states seen by MRClientProtocol (for Job, Task, TaskAttempt)</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4554">MAPREDUCE-4554</a>.
Major bug reported by Benoy Antony and fixed by Benoy Antony (job submission , security)<br>
<b>Job Credentials are not transmitted if security is turned off</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4521">MAPREDUCE-4521</a>.
Major bug reported by Jason Lowe and fixed by Ravi Prakash (mrv2)<br>
<b>mapreduce.user.classpath.first incompatibility with 0.20/1.x</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4520">MAPREDUCE-4520</a>.
Major new feature reported by Arun C Murthy and fixed by Arun C Murthy <br>
<b>Add experimental support for MR AM to schedule CPUs along-with memory</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4517">MAPREDUCE-4517</a>.
Minor improvement reported by James Kinley and fixed by Jason Lowe (applicationmaster)<br>
<b>Too many INFO messages written out during AM to RM heartbeat</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4479">MAPREDUCE-4479</a>.
Major bug reported by Mariappan Asokan and fixed by Mariappan Asokan (test)<br>
<b>Fix parameter order in assertEquals() in TestCombineInputFileFormat.java</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4458">MAPREDUCE-4458</a>.
Major improvement reported by Robert Joseph Evans and fixed by Robert Parker (mrv2)<br>
<b>Warn if java.library.path is used for AM or Task</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4425">MAPREDUCE-4425</a>.
Critical bug reported by Siddharth Seth and fixed by Jason Lowe (mrv2)<br>
<b>Speculation + Fetch failures can lead to a hung job</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4279">MAPREDUCE-4279</a>.
Major bug reported by Rahul Jain and fixed by Devaraj K (jobtracker)<br>
<b>getClusterStatus() fails with null pointer exception when running jobs in local mode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4278">MAPREDUCE-4278</a>.
Major bug reported by Araceli Henley and fixed by Sandy Ryza <br>
<b>cannot run two local jobs in parallel from the same gateway.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4272">MAPREDUCE-4272</a>.
Major bug reported by Luke Lu and fixed by Yu Gao (task)<br>
<b>SortedRanges.Range#compareTo is not spec compliant</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4266">MAPREDUCE-4266</a>.
Major task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)<br>
<b>remove Ant remnants from MR</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4229">MAPREDUCE-4229</a>.
Major improvement reported by Todd Lipcon and fixed by Miomir Boljanovic (jobtracker)<br>
<b>Counter names' memory usage can be decreased by interning</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4123">MAPREDUCE-4123</a>.
Critical bug reported by Nishan Shetty and fixed by Devaraj K (mrv2)<br>
<b>./mapred groups gives NoClassDefFoundError</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4049">MAPREDUCE-4049</a>.
Major sub-task reported by Avner BenHanoch and fixed by Avner BenHanoch (performance , task , tasktracker)<br>
<b>plugin for generic shuffle service</b><br>
<blockquote>Allows ReduceTask to load a third-party plugin for shuffle (and merge) instead of the default shuffle.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3678">MAPREDUCE-3678</a>.
Major new feature reported by Bejoy KS and fixed by Harsh J (mrv1 , mrv2)<br>
<b>The Map tasks logs should have the value of input split it processed</b><br>
<blockquote>A map task's syslog now carries basic information about the InputSplit it processed.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-2454">MAPREDUCE-2454</a>.
Minor new feature reported by Mariappan Asokan and fixed by Mariappan Asokan <br>
<b>Allow external sorter plugin for MR</b><br>
<blockquote>MAPREDUCE-4807 Allow external implementations of the sort phase in a Map task</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-2264">MAPREDUCE-2264</a>.
Major bug reported by Adam Kramer and fixed by Devaraj K (jobtracker)<br>
<b>Job status exceeds 100% in some cases </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-1806">MAPREDUCE-1806</a>.
Major bug reported by Paul Yang and fixed by Gera Shegalov (harchive)<br>
<b>CombineFileInputFormat does not work with paths not on default FS</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-1700">MAPREDUCE-1700</a>.
Major bug reported by Tom White and fixed by Tom White (task)<br>
<b>User supplied dependencies may conflict with MapReduce system JARs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4468">HDFS-4468</a>.
Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE <br>
<b>Fix TestHDFSCLI and TestQuota for HADOOP-9252</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4462">HDFS-4462</a>.
Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (namenode)<br>
<b>2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4458">HDFS-4458</a>.
Major bug reported by wenwupeng and fixed by Binglin Chang (balancer)<br>
<b>start balancer failed with "Failed to create file [/system/balancer.id]" if configure IP on fs.defaultFS</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4456">HDFS-4456</a>.
Major new feature reported by Tsz Wo (Nicholas), SZE and fixed by Plamen Jeliazkov (webhdfs)<br>
<b>Add concat to HttpFS and WebHDFS REST API docs</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4452">HDFS-4452</a>.
Critical bug reported by Konstantin Shvachko and fixed by Konstantin Shvachko (namenode)<br>
<b>getAdditionalBlock() can create multiple blocks if the client times out and retries.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4451">HDFS-4451</a>.
Major bug reported by Joshua Blatt and fixed by (balancer)<br>
<b>hdfs balancer command returns exit code 1 on success instead of 0</b><br>
<blockquote>This is an incompatible change from release 2.0.2-alpha and prior releases, in which the Balancer tool exited with exit code 1 on success. It now exits with exit code 0 on success; a non-zero exit code indicates failure.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4445">HDFS-4445</a>.
Blocker sub-task reported by Vinay and fixed by Vinay <br>
<b>All BKJM ledgers are not checked while tailing, So failover will fail.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4444">HDFS-4444</a>.
Trivial bug reported by Stephen Chu and fixed by Stephen Chu <br>
<b>Add space between total transaction time and number of transactions in FSEditLog#printStatistics</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4443">HDFS-4443</a>.
Trivial bug reported by Christian Rohling and fixed by Christian Rohling (namenode)<br>
<b>Remove trailing '`' character from HDFS nodelist jsp</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4428">HDFS-4428</a>.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
<b>FsDatasetImpl should disclose what the error is when a rename fails</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4426">HDFS-4426</a>.
Blocker bug reported by Jason Lowe and fixed by Arpit Agarwal (namenode)<br>
<b>Secondary namenode shuts down immediately after startup</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4415">HDFS-4415</a>.
Major bug reported by Robert Kanter and fixed by Robert Kanter <br>
<b>HostnameFilter should handle hostname resolution failures and continue processing</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4404">HDFS-4404</a>.
Critical bug reported by liaowenrui and fixed by Todd Lipcon (ha , hdfs-client)<br>
<b>Create file failure when the machine of first attempted NameNode is down</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4403">HDFS-4403</a>.
Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (hdfs-client)<br>
<b>DFSClient can infer checksum type when not provided by reading first byte</b><br>
<blockquote>The HDFS implementation of getFileChecksum() can now operate correctly against earlier-version datanodes which do not include the checksum type information in their checksum response. The checksum type is automatically inferred by issuing a read of the first byte of each block.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4393">HDFS-4393</a>.
Minor improvement reported by Brandon Li and fixed by Brandon Li <br>
<b>Empty request and responses in protocol translators can be static final members</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4392">HDFS-4392</a>.
Trivial improvement reported by Andrew Purtell and fixed by Andrew Purtell (test)<br>
<b>Use NetUtils#getFreeSocketPort in MiniDFSCluster</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4385">HDFS-4385</a>.
Critical bug reported by Thomas Graves and fixed by Thomas Graves (build)<br>
<b>Maven RAT plugin is not checking all source files</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4384">HDFS-4384</a>.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (libhdfs)<br>
<b>test_libhdfs_threaded gets SEGV if JNIEnv cannot be initialized</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4381">HDFS-4381</a>.
Major improvement reported by Jing Zhao and fixed by Jing Zhao (namenode)<br>
<b>Document fsimage format details in FSImageFormat class javadoc</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4377">HDFS-4377</a>.
Trivial bug reported by Eli Collins and fixed by Eli Collins <br>
<b>Some trivial DN comment cleanup</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4375">HDFS-4375</a>.
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode , security)<br>
<b>Use token request messages defined in hadoop common</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4369">HDFS-4369</a>.
Blocker bug reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)<br>
<b>GetBlockKeysResponseProto does not handle null response</b><br>
<blockquote>The keys member of the protobuf message GetBlockKeysResponseProto is changed from required to optional so that null values can be passed over the wire. This is an incompatible wire protocol change, but it does not affect API backward compatibility.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4367">HDFS-4367</a>.
Blocker bug reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)<br>
<b>GetDataEncryptionKeyResponseProto does not handle null response</b><br>
<blockquote>The dataEncryptionKey member of the protobuf message GetDataEncryptionKeyResponseProto is made optional instead of required. This incompatible change is not likely to affect existing users (those using the HDFS FileSystem and other public APIs).</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4364">HDFS-4364</a>.
Blocker bug reported by Suresh Srinivas and fixed by Suresh Srinivas <br>
<b>GetLinkTargetResponseProto does not handle null path</b><br>
<blockquote>The targetPath member of the protobuf message GetLinkTargetResponseProto is changed from required to optional so that null values can be passed over the wire. This is an incompatible wire protocol change, but it does not affect API backward compatibility.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4363">HDFS-4363</a>.
Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas <br>
<b>Combine PBHelper and HdfsProtoUtil and remove redundant methods</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4362">HDFS-4362</a>.
Critical bug reported by Suresh Srinivas and fixed by Suresh Srinivas <br>
<b>GetDelegationTokenResponseProto does not handle null token</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4359">HDFS-4359</a>.
Major bug reported by Liang Xie and fixed by Liang Xie (datanode)<br>
<b>remove an unnecessary synchronized keyword in BPOfferService.java</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4351">HDFS-4351</a>.
Major bug reported by Andrew Wang and fixed by Andrew Wang (namenode)<br>
<b>Fix BlockPlacementPolicyDefault#chooseTarget when avoiding stale nodes</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4350">HDFS-4350</a>.
Major bug reported by Andrew Wang and fixed by Andrew Wang <br>
<b>Make enabling of stale marking on read and write paths independent</b><br>
<blockquote>This patch makes an incompatible configuration change, as described below:
In release 1.1.0 and the other 1.1.x point releases, the configuration parameter "dfs.namenode.check.stale.datanode" could be used to turn on checking for stale nodes. This parameter is no longer supported from release 1.2.0 onwards and has been renamed to "dfs.namenode.avoid.read.stale.datanode".
How the feature works and how to configure it:
As described in the HDFS-3703 release notes, the datanode stale period can be configured using the parameter "dfs.namenode.stale.datanode.interval" in seconds (default value is 30 seconds). The NameNode can be configured to use this staleness information for reads using the parameter "dfs.namenode.avoid.read.stale.datanode". When this parameter is set to true, the NameNode picks a stale datanode as the last target to read from when returning block locations for reads. Using staleness information for writes is described in the release notes of HDFS-3912.
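<br>A minimal sketch of the read-side setting named in this note, assuming it is placed in the NameNode's hdfs-site.xml. The file placement and the value shown are illustrative assumptions; only the property name comes from this note.
<pre>
&lt;!-- hdfs-site.xml fragment (illustrative, not taken from the patch) --&gt;
&lt;property&gt;
  &lt;!-- Replaces the 1.1.x name dfs.namenode.check.stale.datanode --&gt;
  &lt;name&gt;dfs.namenode.avoid.read.stale.datanode&lt;/name&gt;
  &lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
</pre>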
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4349">HDFS-4349</a>.
Major test reported by Konstantin Shvachko and fixed by Konstantin Shvachko (namenode , test)<br>
<b>Test reading files from BackupNode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4347">HDFS-4347</a>.
Major bug reported by Konstantin Shvachko and fixed by Plamen Jeliazkov (namenode , test)<br>
<b>TestBackupNode can go into infinite loop "Waiting checkpoint to complete."</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4344">HDFS-4344</a>.
Major bug reported by tamtam180 and fixed by Andy Isaacson (namenode)<br>
<b>dfshealth.jsp throws NumberFormatException when dfs.hosts/dfs.hosts.exclude includes port number</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4326">HDFS-4326</a>.
Major task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur <br>
<b>bump up Tomcat version for HttpFS to 6.0.36</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4315">HDFS-4315</a>.
Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (datanode)<br>
<b>DNs with multiple BPs can have BPOfferServices fail to start due to unsynchronized map access</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4308">HDFS-4308</a>.
Major bug reported by Konstantin Shvachko and fixed by Plamen Jeliazkov (namenode)<br>
<b>addBlock() should persist file blocks once</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4307">HDFS-4307</a>.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
<b>SocketCache should use monotonic time</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4306">HDFS-4306</a>.
Major bug reported by Binglin Chang and fixed by Binglin Chang <br>
<b>PBHelper.convertLocatedBlock miss convert BlockToken</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4302">HDFS-4302</a>.
Major bug reported by Eugene Koontz and fixed by Eugene Koontz (ha , namenode)<br>
<b>Precondition in EditLogFileInputStream's length() method is checked too early in NameNode startup, causing fatal exception</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4295">HDFS-4295</a>.
Major bug reported by Stephen Chu and fixed by Stephen Chu (security)<br>
<b>Using port 1023 should be valid when starting Secure DataNode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4294">HDFS-4294</a>.
Major bug reported by Robert Parker and fixed by Robert Parker <br>
<b>Backwards compatibility is not maintained for TestVolumeId</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4292">HDFS-4292</a>.
Minor bug reported by Binglin Chang and fixed by Binglin Chang <br>
<b>Sanity check not correct in RemoteBlockReader2.newBlockReader</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4291">HDFS-4291</a>.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
<b>edit log unit tests leave stray test_edit_log_file around</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4288">HDFS-4288</a>.
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (namenode)<br>
<b>NN accepts incremental BR as IBR in safemode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4282">HDFS-4282</a>.
Major bug reported by Junping Du and fixed by Todd Lipcon (namenode , test)<br>
<b>TestEditLog.testFuzzSequences FAILED in all pre-commit test</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4279">HDFS-4279</a>.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (namenode)<br>
<b>NameNode does not initialize generic conf keys when started with -recover</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4274">HDFS-4274</a>.
Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (datanode)<br>
<b>BlockPoolSliceScanner does not close verification log during shutdown</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4270">HDFS-4270</a>.
Minor bug reported by Derek Dagit and fixed by Derek Dagit (namenode)<br>
<b>Replications of the highest priority should be allowed to choose a source datanode that has reached its max replication limit</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4268">HDFS-4268</a>.
Major bug reported by Konstantin Shvachko and fixed by Konstantin Shvachko (namenode)<br>
<b>Remove redundant enum NNHAStatusHeartbeat.State</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4259">HDFS-4259</a>.
Minor improvement reported by Harsh J and fixed by Harsh J (hdfs-client)<br>
<b>Improve pipeline DN replacement failure message</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4247">HDFS-4247</a>.
Blocker sub-task reported by Daryn Sharp and fixed by Daryn Sharp (namenode)<br>
<b>saveNamespace should be tolerant of dangling lease</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4242">HDFS-4242</a>.
Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)<br>
<b>Map.Entry is incorrectly used in LeaseManager</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4238">HDFS-4238</a>.
Major bug reported by Vinay and fixed by Todd Lipcon (ha)<br>
<b>[HA] Standby namenode should not do purging of shared storage edits.</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4236">HDFS-4236</a>.
Blocker bug reported by Allen Wittenauer and fixed by Alejandro Abdelnur <br>
<b>Regression: HDFS-4171 puts artificial limit on username length</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4232">HDFS-4232</a>.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (namenode)<br>
<b>NN fails to write a fsimage with stale leases</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4231">HDFS-4231</a>.
Major improvement reported by Konstantin Shvachko and fixed by Konstantin Shvachko (ha , namenode)<br>
<b>Introduce HAState for BackupNode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4216">HDFS-4216</a>.
Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)<br>
<b>Adding symlink should not ignore QuotaExceededException</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4214">HDFS-4214</a>.
Trivial improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (tools)<br>
<b>OfflineEditsViewer should print out the offset at which it encountered an error</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4213">HDFS-4213</a>.
Major new feature reported by Jing Zhao and fixed by Jing Zhao (hdfs-client , namenode)<br>
<b>When the client calls hsync, allows the client to update the file length in the NameNode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4199">HDFS-4199</a>.
Minor test reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky <br>
<b>Provide test for HdfsVolumeId</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4186">HDFS-4186</a>.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)<br>
<b>logSync() is called with the write lock held while releasing lease</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4182">HDFS-4182</a>.
Critical bug reported by Todd Lipcon and fixed by Robert Joseph Evans (namenode)<br>
<b>SecondaryNameNode leaks NameCache entries</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4181">HDFS-4181</a>.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)<br>
<b>LeaseManager tries to double remove and prints extra messages</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4179">HDFS-4179</a>.
Major bug reported by Konstantin Shvachko and fixed by Konstantin Shvachko (namenode)<br>
<b>BackupNode: allow reads, fix checkpointing, safeMode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4178">HDFS-4178</a>.
Major bug reported by Andy Isaacson and fixed by Andy Isaacson (scripts)<br>
<b>shell scripts should not close stderr</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4172">HDFS-4172</a>.
Minor bug reported by Derek Dagit and fixed by Derek Dagit (namenode)<br>
<b>namenode does not URI-encode parameters when building URI for datanode request</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4171">HDFS-4171</a>.
Major bug reported by Harsh J and fixed by Alejandro Abdelnur <br>
<b>WebHDFS and HttpFs should accept only valid Unix user names</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4164">HDFS-4164</a>.
Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (fuse-dfs)<br>
<b>fuse_dfs: add -lrt to the compiler command line on Linux</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4162">HDFS-4162</a>.
Minor bug reported by Derek Dagit and fixed by Derek Dagit (datanode)<br>
<b>Some malformed and unquoted HTML strings are returned from datanode web ui</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4156">HDFS-4156</a>.
Major bug reported by Eli Collins and fixed by Eli Reisman <br>
<b>Seeking to a negative position should throw an IOE</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4155">HDFS-4155</a>.
Major improvement reported by Liang Xie and fixed by Liang Xie (libhdfs)<br>
<b>libhdfs implementation of hsync API</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4153">HDFS-4153</a>.
Major improvement reported by Liang Xie and fixed by Liang Xie (journal-node)<br>
<b>Add START_MSG/SHUTDOWN_MSG for JournalNode</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4143">HDFS-4143</a>.
Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)<br>
<b>Change INodeFile.blocks to private</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4140">HDFS-4140</a>.
Major bug reported by Andy Isaacson and fixed by Colin Patrick McCabe (fuse-dfs)<br>
<b>fuse-dfs handles open(O_TRUNC) poorly</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4139">HDFS-4139</a>.
Major bug reported by Andy Isaacson and fixed by Colin Patrick McCabe (fuse-dfs)<br>
<b>fuse-dfs RO mode still allows file truncation</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4132">HDFS-4132</a>.
Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (libhdfs)<br>
<b>when libwebhdfs is not enabled, nativeMiniDfsClient frees uninitialized memory </b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4130">HDFS-4130</a>.
Major sub-task reported by Han Xiao and fixed by Han Xiao (ha , performance)<br>
<b>BKJM: The reading for editlog at NN starting using bkjm is not efficient</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4127">HDFS-4127</a>.
Minor bug reported by Junping Du and fixed by Junping Du (namenode)<br>
<b>Log message is not correct in case of short of replica</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4122">HDFS-4122</a>.
Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (datanode , hdfs-client , namenode)<br>
<b>Cleanup HDFS logs and reduce the size of logged messages</b><br>
<blockquote>The change from this JIRA alters the content of some log messages. No log messages are removed; only their content is changed to reduce their size. If you have a tool that depends on the exact content of the log, please review the patch and update the tool accordingly.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4121">HDFS-4121</a>.
Minor improvement reported by Binglin Chang and fixed by Binglin Chang <br>
<b>Add namespace declarations in hdfs .proto files for languages other than java</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4112">HDFS-4112</a>.
Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)<br>
<b>A few improvements on INodeDirectory</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4110">HDFS-4110</a>.
Trivial improvement reported by Liang Xie and fixed by Liang Xie (journal-node)<br>
<b>Refine JNStorage log</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4107">HDFS-4107</a>.
Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)<br>
<b>Add utility methods to cast INode to INodeFile and INodeFileUnderConstruction</b><br>
<blockquote></blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-4106">HDFS-4106</a>.
Minor bug reported by Jing Zhao and fixed by Jing Zhao (namenode , test)<br>