Updating releasenotes for hadoop-2.1.0-beta.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.1.0-beta@1508428 13f79535-47bb-0310-9956-ffa450edef68
diff --git a/hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html b/hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html
index f765e6d..6773103 100644
--- a/hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html
+++ b/hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html
@@ -12,57 +12,179 @@
 <a name="changes"/>
 <h2>Changes since Hadoop 2.0.5-alpha</h2>
 <ul>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-968">YARN-968</a>.
+     Blocker bug reported by Kihwal Lee and fixed by Vinod Kumar Vavilapalli <br>
+     <b>RM admin commands don't work</b><br>
+     <blockquote>If an RM admin command is issued using the CLI, I get something like the following:

+

+13/07/24 17:19:40 INFO client.RMProxy: Connecting to ResourceManager at xxxx.com/1.2.3.4:1234

+refreshQueues: Unknown protocol: org.apache.hadoop.yarn.api.ResourceManagerAdministrationProtocolPB

+

+</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-961">YARN-961</a>.
+     Blocker bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
+     <b>ContainerManagerImpl should enforce token on server. Today it is [TOKEN, SIMPLE]</b><br>
+     <blockquote>We should only accept SecurityAuthMethod.TOKEN for ContainerManagementProtocol. Today it also accepts SIMPLE for unsecured environments.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-960">YARN-960</a>.
+     Blocker bug reported by Alejandro Abdelnur and fixed by Daryn Sharp <br>
+     <b>TestMRCredentials and  TestBinaryTokenFile are failing on trunk</b><br>
+     <blockquote>Not sure, but this may be a fallout from YARN-701 and/or related to YARN-945.

+

+Making it a blocker until full impact of the issue is scoped.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-945">YARN-945</a>.
+     Blocker bug reported by Bikas Saha and fixed by Vinod Kumar Vavilapalli <br>
+     <b>AM register failing after AMRMToken</b><br>
+     <blockquote>509 2013-07-19 15:53:55,569 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54313: readAndProcess from client 127.0.0.1       threw exception [org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN]]

+510 org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN]

+511   at org.apache.hadoop.ipc.Server$Connection.initializeAuthContext(Server.java:1531)

+512   at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1482)

+513   at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:788)

+514   at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:587)

+515   at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:562)

+</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-937">YARN-937</a>.
+     Blocker bug reported by Arun C Murthy and fixed by Alejandro Abdelnur <br>
+     <b>Fix unmanaged AM in non-secure/secure setup post YARN-701</b><br>
+     <blockquote>Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens will be used in both scenarios.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-932">YARN-932</a>.
+     Major bug reported by Sandy Ryza and fixed by Karthik Kambatla <br>
+     <b>TestResourceLocalizationService.testLocalizationInit can fail on JDK7</b><br>
+     <blockquote>It looks like this is occurring when testLocalizationInit doesn't run first.  Somehow yarn.nodemanager.log-dirs is getting set by one of the other tests (to ${yarn.log.dir}/userlogs), but yarn.log.dir isn't being set.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-927">YARN-927</a>.
+     Major task reported by Bikas Saha and fixed by Bikas Saha <br>
+     <b>Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest</b><br>
+     <blockquote>The downside is having to use more than 1 container request when requesting more than 1 container at * priority. For most other use cases that have specific locations we anyway need to make multiple container requests. This will also remove unnecessary duplication caused by StoredContainerRequest. It will make getMatchingRequest() always available and make removeContainerRequest() easy to use.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-926">YARN-926</a>.
+     Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Jian He <br>
+     <b>ContainerManagerProtcol APIs should take in requests for multiple containers</b><br>
+     <blockquote>AMs typically have to launch multiple containers on a node and the current single container APIs aren't helping. We should have all the APIs take in multiple requests and return multiple responses.

+

+The client libraries could expose both the single and multi-container requests.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-922">YARN-922</a>.
+     Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
+     <b>Change FileSystemRMStateStore to use directories</b><br>
+     <blockquote>Store each app and its attempts in the same directory so that removing application state is only one operation</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-919">YARN-919</a>.
+     Minor bug reported by Mayank Bansal and fixed by Mayank Bansal <br>
+     <b>Document setting default heap sizes in yarn env</b><br>
+     <blockquote>Right now there are no defaults in the yarn env scripts for the resource manager and node manager, and if a user wants to override them, the user has to go to the documentation, find the variables, and change the script.

+

+There is no straightforward way to change it in the script. Just updating the variables with defaults.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-918">YARN-918</a>.
+     Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
+     <b>ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701</b><br>
+     <blockquote>Once we use AMRMToken irrespective of kerberos after YARN-701, we don't need ApplicationAttemptId in the RPC pay load. This is an API change, so doing it as a blocker for 2.1.0-beta.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-912">YARN-912</a>.
+     Major bug reported by Bikas Saha and fixed by Mayank Bansal <br>
+     <b>Create exceptions package in common/api for yarn and move client facing exceptions to them</b><br>
+     <blockquote>Exceptions like InvalidResourceBlacklistRequestException, InvalidResourceRequestException, InvalidApplicationMasterRequestException etc are currently inside ResourceManager and not visible to clients.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-909">YARN-909</a>.
+     Minor bug reported by Chuan Liu and fixed by Chuan Liu (nodemanager)<br>
+     <b>Disable TestLinuxContainerExecutorWithMocks on Windows</b><br>
+     <blockquote>This unit test tests a Linux specific feature. We should skip this unit test on Windows. A similar unit test 'TestLinuxContainerExecutor' was already skipped when running on Windows.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-897">YARN-897</a>.
+     Blocker bug reported by Djellel Eddine Difallah and fixed by Djellel Eddine Difallah (capacityscheduler)<br>
+     <b>CapacityScheduler wrongly sorted queues</b><br>
+     <blockquote>The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources.
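+A generic sketch of the underlying TreeSet behaviour (illustrative class and names only, not the CapacityScheduler code): when an element's sort key changes, it has to be removed and re-added for the set to re-sort.
+{code}
+import java.util.Comparator;
+import java.util.TreeSet;
+
+public class ResortOnUpdate {
+  static class Queue {
+    final String name;
+    double usedCapacity;
+    Queue(String name, double used) { this.name = name; this.usedCapacity = used; }
+  }
+
+  public static void main(String[] args) {
+    TreeSet&lt;Queue&gt; children = new TreeSet&lt;Queue&gt;(new Comparator&lt;Queue&gt;() {
+      public int compare(Queue x, Queue y) {
+        int c = Double.compare(x.usedCapacity, y.usedCapacity);
+        return c != 0 ? c : x.name.compareTo(y.name);
+      }
+    });
+    Queue a = new Queue("a", 0.2);
+    Queue b = new Queue("b", 0.9);
+    children.add(a);
+    children.add(b);
+
+    // On container completion b's used capacity drops; mutating it in place
+    // leaves the TreeSet ordering stale. Remove, update, re-add instead.
+    children.remove(b);
+    b.usedCapacity = 0.1;
+    children.add(b);
+
+    System.out.println(children.first().name);  // prints "b"
+  }
+}
+{code}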

+</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-894">YARN-894</a>.
+     Minor bug reported by Chuan Liu and fixed by Chuan Liu (nodemanager)<br>
+     <b>NodeHealthScriptRunner timeout checking is inaccurate on Windows</b><br>
+     <blockquote>In the {{NodeHealthScriptRunner}} method, we set the HealthChecker status based on the Shell execution results. Some statuses are based on the exception thrown during the Shell script execution.

+

+Currently, we will catch a non-ExitCodeException from ShellCommandExecutor, and if Shell has the timeout status set at the same time, we will also set HealthChecker status to timeout.

+

+We have following execution sequence in Shell:

+1) In main thread, schedule a delayed timer task that will kill the original process upon timeout.

+2) In main thread, open a buffered reader and feed in the process's standard input stream.

+3) When timeout happens, the timer task will call {{Process#destroy()}}

+ to kill the main process.

+

+On Linux, when the timeout happens and the process is killed, the buffered reader throws an IOException with the message "Stream closed" in the main thread.

+

+On Windows, we don't get the IOException. Only "-1" is returned from the reader, which indicates the stream is finished. As a result, the timeout status is not set on Windows, and {{TestNodeHealthService}} fails on Windows because of this.
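+A simplified, self-contained sketch of the sequence above (plain JDK code assuming a POSIX sh, not the Hadoop Shell class): the flag set by the timer task, rather than the IOException, is the platform-independent signal for the timeout, since the read may end either with an exception or with a plain -1/null.
+{code}
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.Timer;
+import java.util.TimerTask;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+public class TimedScript {
+  public static void main(String[] args) throws Exception {
+    final Process p = new ProcessBuilder("sh", "-c", "sleep 30").start();
+    final AtomicBoolean timedOut = new AtomicBoolean(false);
+    Timer timer = new Timer(true);
+    timer.schedule(new TimerTask() {                          // step 1: delayed timer task
+      public void run() { timedOut.set(true); p.destroy(); }  // step 3: kill on timeout
+    }, 2000);
+
+    BufferedReader out =
+        new BufferedReader(new InputStreamReader(p.getInputStream()));
+    try {
+      while (out.readLine() != null) { }                      // step 2: drain the output
+    } catch (IOException e) {
+      // Linux-style termination of the read after destroy(); Windows just hits EOF.
+    }
+    System.out.println(timedOut.get() ? "timed out" : "completed");
+  }
+}
+{code}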

+

+

+ </blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-883">YARN-883</a>.
+     Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
+     <b>Expose Fair Scheduler-specific queue metrics</b><br>
+     <blockquote>When the Fair Scheduler is enabled, QueueMetrics should include fair share, minimum share, and maximum share.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-877">YARN-877</a>.
+     Major sub-task reported by Junping Du and fixed by Junping Du (scheduler)<br>
+     <b>Allow for black-listing resources in FifoScheduler</b><br>
+     <blockquote>YARN-750 already addressed black-listing in the YARN API and CS scheduler; this jira adds the implementation for FifoScheduler.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-875">YARN-875</a>.
+     Major bug reported by Bikas Saha and fixed by Xuan Gong <br>
+     <b>Application can hang if AMRMClientAsync callback thread has exception</b><br>
+     <blockquote>Currently that thread will die and then never call back, so the app can hang. A possible solution could be to catch Throwable in the callback and then call client.onError().</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-874">YARN-874</a>.
      Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
      <b>Tracking YARN/MR test failures after HADOOP-9421 and YARN-827</b><br>
      <blockquote>HADOOP-9421 and YARN-827 broke some YARN/MR tests. Tracking those..</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-873">YARN-873</a>.
+     Major sub-task reported by Bikas Saha and fixed by Xuan Gong <br>
+     <b>YARNClient.getApplicationReport(unknownAppId) returns a null report</b><br>
+     <blockquote>How can the client find out that app does not exist?</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-869">YARN-869</a>.
      Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
      <b>ResourceManagerAdministrationProtocol should neither be public(yet) nor in yarn.api</b><br>
      <blockquote>This is a admin only api that we don't know yet if people can or should write new tools against. I am going to move it to yarn.server.api and make it @Private..</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-866">YARN-866</a>.
+     Major test reported by Wei Yan and fixed by Wei Yan <br>
+     <b>Add test for class ResourceWeights</b><br>
+     <blockquote>Add test case for the class ResourceWeights</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-865">YARN-865</a>.
+     Major improvement reported by Xuan Gong and fixed by Xuan Gong <br>
+     <b>RM webservices can't query based on application Types</b><br>
+     <blockquote>The resource manager web service api to get the list of apps doesn't have a query parameter for appTypes.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-861">YARN-861</a>.
      Critical bug reported by Devaraj K and fixed by Vinod Kumar Vavilapalli (nodemanager)<br>
      <b>TestContainerManager is failing</b><br>
-     <blockquote>https://builds.apache.org/job/Hadoop-Yarn-trunk/246/
-
-{code:xml}
-Running org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager
-Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 19.249 sec &lt;&lt;&lt; FAILURE!
-testContainerManagerInitialization(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager)  Time elapsed: 286 sec  &lt;&lt;&lt; FAILURE!
-junit.framework.ComparisonFailure: expected:&lt;[asf009.sp2.ygridcore.ne]t&gt; but was:&lt;[localhos]t&gt;
-	at junit.framework.Assert.assertEquals(Assert.java:85)
-
+     <blockquote>https://builds.apache.org/job/Hadoop-Yarn-trunk/246/

+

+{code:xml}

+Running org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager

+Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 19.249 sec &lt;&lt;&lt; FAILURE!

+testContainerManagerInitialization(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager)  Time elapsed: 286 sec  &lt;&lt;&lt; FAILURE!

+junit.framework.ComparisonFailure: expected:&lt;[asf009.sp2.ygridcore.ne]t&gt; but was:&lt;[localhos]t&gt;

+	at junit.framework.Assert.assertEquals(Assert.java:85)

+

 {code}</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-854">YARN-854</a>.
      Blocker bug reported by Ramya Sunil and fixed by Omkar Vinit Joshi <br>
      <b>App submission fails on secure deploy</b><br>
-     <blockquote>App submission on secure cluster fails with the following exception:
-
-{noformat}
-INFO mapreduce.Job: Job jobID failed with state FAILED due to: Application applicationID failed 2 times due to AM Container for appattemptID exited with  exitCode: -1000 due to: App initialization failed (255) with output: main : command provided 0
-main : user is qa_user
-javax.security.sasl.SaslException: DIGEST-MD5: digest response format violation. Mismatched response. [Caused by org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.]
-	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
-	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
-	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
-	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
-	at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
-	at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
-	at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:65)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:348)
-Caused by: org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.
-	at org.apache.hadoop.ipc.Client.call(Client.java:1298)
-	at org.apache.hadoop.ipc.Client.call(Client.java:1250)
-	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:204)
-	at $Proxy7.heartbeat(Unknown Source)
-	at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
-	... 3 more
-
-.Failing this attempt.. Failing the application.
-
+     <blockquote>App submission on secure cluster fails with the following exception:

+

+{noformat}

+INFO mapreduce.Job: Job jobID failed with state FAILED due to: Application applicationID failed 2 times due to AM Container for appattemptID exited with  exitCode: -1000 due to: App initialization failed (255) with output: main : command provided 0

+main : user is qa_user

+javax.security.sasl.SaslException: DIGEST-MD5: digest response format violation. Mismatched response. [Caused by org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.]

+	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

+	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)

+	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

+	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)

+	at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)

+	at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)

+	at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:65)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:348)

+Caused by: org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.

+	at org.apache.hadoop.ipc.Client.call(Client.java:1298)

+	at org.apache.hadoop.ipc.Client.call(Client.java:1250)

+	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:204)

+	at $Proxy7.heartbeat(Unknown Source)

+	at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)

+	... 3 more

+

+.Failing this attempt.. Failing the application.

+

 {noformat}</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-853">YARN-853</a>.
+     Major bug reported by Devaraj K and fixed by Devaraj K (capacityscheduler)<br>
+     <b>maximum-am-resource-percent doesn't work after refreshQueues command</b><br>
+     <blockquote>If we update the yarn.scheduler.capacity.maximum-am-resource-percent / yarn.scheduler.capacity.&lt;queue-path&gt;.maximum-am-resource-percent configuration and then do the refreshNodes, it uses the new config value to calculate Max Active Applications and Max Active Application Per User. If we add a new node after issuing the 'rmadmin -refreshQueues' command, it uses the old maximum-am-resource-percent config value to calculate Max Active Applications and Max Active Application Per User. </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-852">YARN-852</a>.
      Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
      <b>TestAggregatedLogFormat.testContainerLogsFileAccess fails on Windows</b><br>
@@ -78,13 +200,59 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-848">YARN-848</a>.
      Major bug reported by Hitesh Shah and fixed by Hitesh Shah <br>
      <b>Nodemanager does not register with RM using the fully qualified hostname</b><br>
-     <blockquote>If the hostname is misconfigured to not be fully qualified ( i.e. hostname returns foo and hostname -f returns foo.bar.xyz ), the NM ends up registering with the RM using only "foo". This can create problems if DNS cannot resolve the hostname properly. 
-
+     <blockquote>If the hostname is misconfigured to not be fully qualified ( i.e. hostname returns foo and hostname -f returns foo.bar.xyz ), the NM ends up registering with the RM using only "foo". This can create problems if DNS cannot resolve the hostname properly. 
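+For illustration, the two forms can be inspected with plain JDK calls (this is only to clarify the short vs. fully qualified distinction, not the NM registration code):
+{code}
+import java.net.InetAddress;
+
+public class HostNames {
+  public static void main(String[] args) throws Exception {
+    InetAddress addr = InetAddress.getLocalHost();
+    // Typically the short name, akin to running "hostname" (e.g. foo)
+    System.out.println(addr.getHostName());
+    // The name resolved via DNS, akin to "hostname -f" (e.g. foo.bar.xyz)
+    System.out.println(addr.getCanonicalHostName());
+  }
+}
+{code}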

+

 Furthermore, HDFS uses fully qualified hostnames which can end up affecting locality matches when allocating containers based on block locations. </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-846">YARN-846</a>.
      Major sub-task reported by Jian He and fixed by Jian He <br>
      <b>Move pb Impl from yarn-api to yarn-common</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-845">YARN-845</a>.
+     Major sub-task reported by Arpit Gupta and fixed by Mayank Bansal (resourcemanager)<br>
+     <b>RM crash with NPE on NODE_UPDATE</b><br>
+     <blockquote>The following stack trace is generated in the RM:

+

+{code}

+n, service: 68.142.246.147:45454 }, ] resource=&lt;memory:1536, vCores:1&gt; queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=&lt;memory:44544, vCores:29&gt;usedCapacity=0.90625, absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=&lt;memory:44544, vCores:29&gt; cluster=&lt;memory:49152, vCores:48&gt;

+2013-06-17 12:43:53,655 INFO  capacity.ParentQueue (ParentQueue.java:completedContainer(696)) - completedContainer queue=root usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=&lt;memory:44544, vCores:29&gt; cluster=&lt;memory:49152, vCores:48&gt;

+2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(832)) - Application appattempt_1371448527090_0844_000001 released container container_1371448527090_0844_01_000005 on node: host: hostXX:45454 #containers=4 available=2048 used=6144 with event: FINISHED

+2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for application application_1371448527090_0844 on node: hostXX:45454

+2013-06-17 12:43:53,656 INFO  fica.FiCaSchedulerApp (FiCaSchedulerApp.java:unreserve(435)) - Application application_1371448527090_0844 unreserved  on node host: hostXX:45454 #containers=4 available=2048 used=6144, currently has 4 at priority 20; currentReservation &lt;memory:6144, vCores:4&gt;

+2013-06-17 12:43:53,656 INFO  scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for deactivate...

+2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to the scheduler

+java.lang.NullPointerException

+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432)

+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416)

+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346)

+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221)

+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180)

+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939)

+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803)

+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665)

+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727)

+        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83)

+        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413)

+        at java.lang.Thread.run(Thread.java:662)

+2013-06-17 12:43:53,659 INFO  resourcemanager.ResourceManager (ResourceManager.java:run(426)) - Exiting, bbye..

+2013-06-17 12:43:53,665 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@hostXX:8088

+2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted

+2013-06-17 12:43:53,766 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...

+2013-06-17 12:43:53,767 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.

+2013-06-17 12:43:53,767 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.

+2013-06-17 12:43:53,768 WARN  amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.

+2013-06-17 12:43:53,768 INFO  ipc.Server (Server.java:stop(2167)) - Stopping server on 8033

+2013-06-17 12:43:53,770 INFO  ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8033

+2013-06-17 12:43:53,770 INFO  ipc.Server (Server.java:stop(2167)) - Stopping server on 8032

+2013-06-17 12:43:53,770 INFO  ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder

+2013-06-17 12:43:53,771 INFO  ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8032

+2013-06-17 12:43:53,771 INFO  ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder

+2013-06-17 12:43:53,771 INFO  ipc.Server (Server.java:stop(2167)) - Stopping server on 8030

+2013-06-17 12:43:53,773 INFO  ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8030

+2013-06-17 12:43:53,773 INFO  ipc.Server (Server.java:stop(2167)) - Stopping server on 8031

+2013-06-17 12:43:53,773 INFO  ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder

+2013-06-17 12:43:53,774 INFO  ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8031

+2013-06-17 12:43:53,775 INFO  ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder

+{code}</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-841">YARN-841</a>.
      Major sub-task reported by Siddharth Seth and fixed by Vinod Kumar Vavilapalli <br>
      <b>Annotate and document AuxService APIs</b><br>
@@ -96,24 +264,24 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-839">YARN-839</a>.
      Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
      <b>TestContainerLaunch.testContainerEnvVariables fails on Windows</b><br>
-     <blockquote>The unit test case fails on Windows due to job id or container id was not printed out as part of the container script. Later, the test tries to read the pid from output of the file, and fails.
-
-Exception in trunk:
-{noformat}
-Running org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
-Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.903 sec &lt;&lt;&lt; FAILURE!
-testContainerEnvVariables(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)  Time elapsed: 1307 sec  &lt;&lt;&lt; ERROR!
-java.lang.NullPointerException
-        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:278)
-        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
-        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
-        at java.lang.reflect.Method.invoke(Method.java:597)
-        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
-        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
-        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
-        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
-        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)
+     <blockquote>The unit test case fails on Windows because the job id or container id was not printed out as part of the container script. Later, the test tries to read the pid from the output file, and fails.

+

+Exception in trunk:

+{noformat}

+Running org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch

+Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.903 sec &lt;&lt;&lt; FAILURE!

+testContainerEnvVariables(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)  Time elapsed: 1307 sec  &lt;&lt;&lt; ERROR!

+java.lang.NullPointerException

+        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:278)

+        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

+        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

+        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

+        at java.lang.reflect.Method.invoke(Method.java:597)

+        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)

+        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)

+        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)

+        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)

+        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)

 {noformat}</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-837">YARN-837</a>.
      Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
@@ -154,7 +322,7 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-824">YARN-824</a>.
      Major sub-task reported by Jian He and fixed by Jian He <br>
      <b>Add  static factory to yarn client lib interface and change it to abstract class</b><br>
-     <blockquote>Do this for AMRMClient, NMClient, YarnClient. and annotate its impl as private.
+     <blockquote>Do this for AMRMClient, NMClient, and YarnClient, and annotate their impls as private.

 The purpose is not to expose impl</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-823">YARN-823</a>.
      Major sub-task reported by Jian He and fixed by Jian He <br>
@@ -168,76 +336,88 @@
      Major sub-task reported by Jian He and fixed by Jian He <br>
      <b>Rename FinishApplicationMasterRequest.setFinishApplicationStatus to setFinalApplicationStatus to be consistent with getter</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-820">YARN-820</a>.
+     Major sub-task reported by Bikas Saha and fixed by Mayank Bansal <br>
+     <b>NodeManager has invalid state transition after error in resource localization</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-814">YARN-814</a>.
+     Major sub-task reported by Hitesh Shah and fixed by Jian He <br>
+     <b>Difficult to diagnose a failed container launch when error due to invalid environment variable</b><br>
+     <blockquote>The container's launch script sets up environment variables, symlinks etc. 

+

+If there is any failure when setting up the basic context ( before the actual user's process is launched ), nothing is captured by the NM. This makes it impossible to diagnose the reason for the failure. 

+

+To reproduce, set an env var where the value contains characters that throw syntax errors in bash. </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-812">YARN-812</a>.
      Major bug reported by Ramya Sunil and fixed by Siddharth Seth <br>
      <b>Enabling app summary logs causes 'FileNotFound' errors</b><br>
-     <blockquote>RM app summary logs have been enabled as per the default config:
-
-{noformat}
-#
-# Yarn ResourceManager Application Summary Log 
-#
-# Set the ResourceManager summary log filename
-yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log
-# Set the ResourceManager summary log level and appender
-yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY
-
-# Appender for ResourceManager Application Summary Log
-# Requires the following properties to be set
-#    - hadoop.log.dir (Hadoop Log directory)
-#    - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename)
-#    - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender)
-
-log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
-log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false
-log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
-log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
-log4j.appender.RMSUMMARY.MaxFileSize=256MB
-log4j.appender.RMSUMMARY.MaxBackupIndex=20
-log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
-log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
-{noformat}
-
-This however, throws errors while running commands as non-superuser:
-{noformat}
--bash-4.1$ hadoop dfs -ls /
-DEPRECATED: Use of this script to execute hdfs command is deprecated.
-Instead use the hdfs command for it.
-
-log4j:ERROR setFile(null,true) call failed.
-java.io.FileNotFoundException: /var/log/hadoop/hadoopqa/rm-appsummary.log (No such file or directory)
-        at java.io.FileOutputStream.openAppend(Native Method)
-        at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:192)
-        at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:116)
-        at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
-        at org.apache.log4j.RollingFileAppender.setFile(RollingFileAppender.java:207)
-        at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
-        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
-        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
-        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
-        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
-        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
-        at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672)
-        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516)
-        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
-        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
-        at org.apache.log4j.LogManager.&lt;clinit&gt;(LogManager.java:127)
-        at org.apache.log4j.Logger.getLogger(Logger.java:104)
-        at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:289)
-        at org.apache.commons.logging.impl.Log4JLogger.&lt;init&gt;(Log4JLogger.java:109)
-        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
-        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
-        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
-        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
-        at org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1116)
-        at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:858)
-        at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)
-        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)
-        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310)
-        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
-        at org.apache.hadoop.fs.FsShell.&lt;clinit&gt;(FsShell.java:41)
-Found 1 items
-drwxr-xr-x   - hadoop   hadoop            0 2013-06-12 21:28 /user
+     <blockquote>RM app summary logs have been enabled as per the default config:

+

+{noformat}

+#

+# Yarn ResourceManager Application Summary Log 

+#

+# Set the ResourceManager summary log filename

+yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log

+# Set the ResourceManager summary log level and appender

+yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY

+

+# Appender for ResourceManager Application Summary Log

+# Requires the following properties to be set

+#    - hadoop.log.dir (Hadoop Log directory)

+#    - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename)

+#    - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender)

+

+log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}

+log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false

+log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender

+log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}

+log4j.appender.RMSUMMARY.MaxFileSize=256MB

+log4j.appender.RMSUMMARY.MaxBackupIndex=20

+log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout

+log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n

+{noformat}

+

+This however, throws errors while running commands as non-superuser:

+{noformat}

+-bash-4.1$ hadoop dfs -ls /

+DEPRECATED: Use of this script to execute hdfs command is deprecated.

+Instead use the hdfs command for it.

+

+log4j:ERROR setFile(null,true) call failed.

+java.io.FileNotFoundException: /var/log/hadoop/hadoopqa/rm-appsummary.log (No such file or directory)

+        at java.io.FileOutputStream.openAppend(Native Method)

+        at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:192)

+        at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:116)

+        at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)

+        at org.apache.log4j.RollingFileAppender.setFile(RollingFileAppender.java:207)

+        at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)

+        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)

+        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)

+        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)

+        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)

+        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)

+        at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672)

+        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516)

+        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)

+        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)

+        at org.apache.log4j.LogManager.&lt;clinit&gt;(LogManager.java:127)

+        at org.apache.log4j.Logger.getLogger(Logger.java:104)

+        at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:289)

+        at org.apache.commons.logging.impl.Log4JLogger.&lt;init&gt;(Log4JLogger.java:109)

+        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

+        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)

+        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

+        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)

+        at org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1116)

+        at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:858)

+        at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)

+        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)

+        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310)

+        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)

+        at org.apache.hadoop.fs.FsShell.&lt;clinit&gt;(FsShell.java:41)

+Found 1 items

+drwxr-xr-x   - hadoop   hadoop            0 2013-06-12 21:28 /user

 {noformat}</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-806">YARN-806</a>.
      Major sub-task reported by Jian He and fixed by Jian He <br>
@@ -254,120 +434,124 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-799">YARN-799</a>.
      Major bug reported by Chris Riccomini and fixed by Chris Riccomini (nodemanager)<br>
      <b>CgroupsLCEResourcesHandler tries to write to cgroup.procs</b><br>
-     <blockquote>The implementation of
-
-bq. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
-
-Tells the container-executor to write PIDs to cgroup.procs:
-
-{code}
-  public String getResourcesOption(ContainerId containerId) {
-    String containerName = containerId.toString();
-    StringBuilder sb = new StringBuilder("cgroups=");
-
-    if (isCpuWeightEnabled()) {
-      sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
-      sb.append(",");
-    }
-
-    if (sb.charAt(sb.length() - 1) == ',') {
-      sb.deleteCharAt(sb.length() - 1);
-    } 
-    return sb.toString();
-  }
-{code}
-
-Apparently, this file has not always been writeable:
-
-https://patchwork.kernel.org/patch/116146/
-http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
-https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html
-
-The RHEL version of the Linux kernel that I'm using has a CGroup module that has a non-writeable cgroup.procs file.
-
-{quote}
-$ uname -a
-Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
-{quote}
-
-As a result, when the container-executor tries to run, it fails with this error message:
-
-bq.    fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
-
-This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
-
-{quote}
-$ pwd 
-/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001
-$ ls -l
-total 0
--r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
--rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
--rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
--rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
--rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
--rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
-{quote}
-
-I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem.
-
-I can think of several potential resolutions to this ticket:
-
-1. Ignore the problem, and make people patch YARN when they hit this issue.
-2. Write to /tasks instead of /cgroup.procs for everyone
-3. Check permissioning on /cgroup.procs prior to writing to it, and fall back to /tasks.
-4. Add a config to yarn-site that lets admins specify which file to write to.
-
+     <blockquote>The implementation of

+

+bq. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java

+

+Tells the container-executor to write PIDs to cgroup.procs:

+

+{code}

+  public String getResourcesOption(ContainerId containerId) {

+    String containerName = containerId.toString();

+    StringBuilder sb = new StringBuilder("cgroups=");

+

+    if (isCpuWeightEnabled()) {

+      sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");

+      sb.append(",");

+    }

+

+    if (sb.charAt(sb.length() - 1) == ',') {

+      sb.deleteCharAt(sb.length() - 1);

+    } 

+    return sb.toString();

+  }

+{code}

+

+Apparently, this file has not always been writeable:

+

+https://patchwork.kernel.org/patch/116146/

+http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html

+https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html

+

+The RHEL version of the Linux kernel that I'm using has a CGroup module that has a non-writeable cgroup.procs file.

+

+{quote}

+$ uname -a

+Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

+{quote}

+

+As a result, when the container-executor tries to run, it fails with this error message:

+

+bq.    fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",

+

+This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:

+

+{quote}

+$ pwd 

+/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001

+$ ls -l

+total 0

+-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs

+-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us

+-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us

+-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares

+-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release

+-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks

+{quote}

+

+I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem.

+

+I can think of several potential resolutions to this ticket:

+

+1. Ignore the problem, and make people patch YARN when they hit this issue.

+2. Write to /tasks instead of /cgroup.procs for everyone

+3. Check permissioning on /cgroup.procs prior to writing to it, and fall back to /tasks.

+4. Add a config to yarn-site that lets admins specify which file to write to.
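+A minimal sketch of option 3, assuming a plain writability check (the helper below is illustrative, not the actual CgroupsLCEResourcesHandler API):
+{code}
+import java.io.File;
+
+public class CgroupTaskFile {
+  // Prefer cgroup.procs where the kernel exposes it as writable; fall back to
+  // the older tasks file otherwise (e.g. on the RHEL kernel quoted above).
+  static String taskFileFor(String cgroupPath) {
+    File procs = new File(cgroupPath, "cgroup.procs");
+    if (procs.canWrite()) {
+      return procs.getAbsolutePath();
+    }
+    return new File(cgroupPath, "tasks").getAbsolutePath();
+  }
+
+  public static void main(String[] args) {
+    System.out.println(taskFileFor("/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001"));
+  }
+}
+{code}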

+

 Thoughts?</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-795">YARN-795</a>.
      Major bug reported by Wei Yan and fixed by Wei Yan (scheduler)<br>
      <b>Fair scheduler queue metrics should subtract allocated vCores from available vCores</b><br>
-     <blockquote>The queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned is incorrect.
+     <blockquote>The queue metrics of the fair scheduler don't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect.

 This is happening because {code}QueueMetrics.getAllocateResources(){code} doesn't return the allocated vCores.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-792">YARN-792</a>.
      Major sub-task reported by Jian He and fixed by Jian He <br>
      <b>Move NodeHealthStatus from yarn.api.record to yarn.server.api.record</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-791">YARN-791</a>.
+     Blocker sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api , resourcemanager)<br>
+     <b>Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-789">YARN-789</a>.
      Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (scheduler)<br>
      <b>Enable zero capabilities resource requests in fair scheduler</b><br>
-     <blockquote>Per discussion in YARN-689, reposting updated use case:
-
-1. I have a set of services co-existing with a Yarn cluster.
-
-2. These services run out of band from Yarn. They are not started as yarn containers and they don't use Yarn containers for processing.
-
-3. These services use, dynamically, different amounts of CPU and memory based on their load. They manage their CPU and memory requirements independently. In other words, depending on their load, they may require more CPU but not memory or vice-versa.
-By using YARN as RM for these services I'm able share and utilize the resources of the cluster appropriately and in a dynamic way. Yarn keeps tab of all the resources.
-
-These services run an AM that reserves resources on their behalf. When this AM gets the requested resources, the services bump up their CPU/memory utilization out of band from Yarn. If the Yarn allocations are released/preempted, the services back off on their resources utilization. By doing this, Yarn and these service correctly share the cluster resources, being Yarn RM the only one that does the overall resource bookkeeping.
-
-The services AM, not to break the lifecycle of containers, start containers in the corresponding NMs. These container processes do basically a sleep forever (i.e. sleep 10000d). They are almost not using any CPU nor memory (less than 1MB). Thus it is reasonable to assume their required CPU and memory utilization is NIL (more on hard enforcement later). Because of this almost NIL utilization of CPU and memory, it is possible to specify, when doing a request, zero as one of the dimensions (CPU or memory).
-
-The current limitation is that the increment is also the minimum. 
-
-If we set the memory increment to 1MB. When doing a pure CPU request, we would have to specify 1MB of memory. That would work. However it would allow discretionary memory requests without a desired normalization (increments of 256, 512, etc).
-
-If we set the CPU increment to 1CPU. When doing a pure memory request, we would have to specify 1CPU. CPU amounts a much smaller than memory amounts, and because we don't have fractional CPUs, it would mean that all my pure memory requests will be wasting 1 CPU thus reducing the overall utilization of the cluster.
-
-Finally, on hard enforcement. 
-
-* For CPU. Hard enforcement can be done via a cgroup cpu controller. Using an absolute minimum of a few CPU shares (ie 10) in the LinuxContainerExecutor we ensure there is enough CPU cycles to run the sleep process. This absolute minimum would only kick-in if zero is allowed, otherwise will never kick in as the shares for 1 CPU are 1024.
-
+     <blockquote>Per discussion in YARN-689, reposting updated use case:

+

+1. I have a set of services co-existing with a Yarn cluster.

+

+2. These services run out of band from Yarn. They are not started as yarn containers and they don't use Yarn containers for processing.

+

+3. These services use, dynamically, different amounts of CPU and memory based on their load. They manage their CPU and memory requirements independently. In other words, depending on their load, they may require more CPU but not memory or vice-versa.

+By using YARN as the RM for these services I'm able to share and utilize the resources of the cluster appropriately and in a dynamic way. Yarn keeps tabs on all the resources.

+

+These services run an AM that reserves resources on their behalf. When this AM gets the requested resources, the services bump up their CPU/memory utilization out of band from Yarn. If the Yarn allocations are released/preempted, the services back off on their resource utilization. By doing this, Yarn and these services correctly share the cluster resources, with the Yarn RM being the only one that does the overall resource bookkeeping.

+

+The services' AM, in order not to break the lifecycle of containers, starts containers in the corresponding NMs. These container processes basically do a sleep forever (i.e. sleep 10000d). They use almost no CPU or memory (less than 1MB). Thus it is reasonable to assume their required CPU and memory utilization is NIL (more on hard enforcement later). Because of this almost NIL utilization of CPU and memory, it is possible to specify, when doing a request, zero as one of the dimensions (CPU or memory).

+

+The current limitation is that the increment is also the minimum. 

+

+If we set the memory increment to 1MB, then when doing a pure CPU request we would have to specify 1MB of memory. That would work. However, it would allow discretionary memory requests without the desired normalization (increments of 256, 512, etc).

+

+If we set the CPU increment to 1 CPU, then when doing a pure memory request we would have to specify 1 CPU. CPU amounts are much smaller than memory amounts, and because we don't have fractional CPUs, it would mean that all my pure memory requests would be wasting 1 CPU, thus reducing the overall utilization of the cluster.

+

+Finally, on hard enforcement. 

+

+* For CPU. Hard enforcement can be done via a cgroup cpu controller. Using an absolute minimum of a few CPU shares (i.e. 10) in the LinuxContainerExecutor, we ensure there are enough CPU cycles to run the sleep process. This absolute minimum would only kick in if zero is allowed; otherwise it will never kick in, as the shares for 1 CPU are 1024.

+

 * For Memory. Hard enforcement is currently done by the ProcfsBasedProcessTree.java, using a minimum absolute of 1 or 2 MBs would take care of zero memory resources. And again,  this absolute minimum would only kick-in if zero is allowed, otherwise will never kick in as the increment memory is in several MBs if not 1GB.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-787">YARN-787</a>.
      Blocker sub-task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (api)<br>
      <b>Remove resource min from Yarn client API</b><br>
-     <blockquote>Per discussions in YARN-689 and YARN-769 we should remove minimum from the API as this is a scheduler internal thing.
+     <blockquote>Per discussions in YARN-689 and YARN-769 we should remove minimum from the API as this is a scheduler internal thing.

 </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-782">YARN-782</a>.
      Critical improvement reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)<br>
      <b>vcores-pcores ratio functions differently from vmem-pmem ratio in misleading way </b><br>
-     <blockquote>The vcores-pcores ratio functions differently from the vmem-pmem ratio in the sense that the vcores-pcores ratio has an impact on allocations and the vmem-pmem ratio does not.
-
-If I double my vmem-pmem ratio, the only change that occurs is that my containers, after being scheduled, are less likely to be killed for using too much virtual memory.  But if I double my vcore-pcore ratio, my nodes will appear to the ResourceManager to contain double the amount of CPU space, which will affect scheduling decisions.
-
-The lack of consistency will exacerbate the already difficult problem of resource configuration.
+     <blockquote>The vcores-pcores ratio functions differently from the vmem-pmem ratio in the sense that the vcores-pcores ratio has an impact on allocations and the vmem-pmem ratio does not.

+

+If I double my vmem-pmem ratio, the only change that occurs is that my containers, after being scheduled, are less likely to be killed for using too much virtual memory.  But if I double my vcore-pcore ratio, my nodes will appear to the ResourceManager to contain double the amount of CPU space, which will affect scheduling decisions.

+

+The lack of consistency will exacerbate the already difficult problem of resource configuration.

 </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-781">YARN-781</a>.
      Major sub-task reported by Devaraj Das and fixed by Jian He <br>
@@ -384,18 +568,22 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-767">YARN-767</a>.
      Major bug reported by Jian He and fixed by Jian He <br>
      <b>Initialize Application status metrics  when QueueMetrics is initialized</b><br>
-     <blockquote>Applications: ResourceManager.QueueMetrics.AppsSubmitted, ResourceManager.QueueMetrics.AppsRunning, ResourceManager.QueueMetrics.AppsPending, ResourceManager.QueueMetrics.AppsCompleted, ResourceManager.QueueMetrics.AppsKilled, ResourceManager.QueueMetrics.AppsFailed
+     <blockquote>Applications: ResourceManager.QueueMetrics.AppsSubmitted, ResourceManager.QueueMetrics.AppsRunning, ResourceManager.QueueMetrics.AppsPending, ResourceManager.QueueMetrics.AppsCompleted, ResourceManager.QueueMetrics.AppsKilled, ResourceManager.QueueMetrics.AppsFailed

 For now these metrics are created only when they are needed, we want to make them be seen when QueueMetrics is initialized</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-764">YARN-764</a>.
      Major bug reported by nemon lou and fixed by nemon lou (resourcemanager)<br>
      <b>blank Used Resources on Capacity Scheduler page </b><br>
-     <blockquote>Even when there are jobs running,used resources is empty on Capacity Scheduler page for leaf queue.(I use google-chrome on windows 7.)
+     <blockquote>Even when there are jobs running, used resources is empty on the Capacity Scheduler page for leaf queues. (I use google-chrome on Windows 7.)

 After changing Resource.java's toString method by replacing "&lt;&gt;" with "{}", this bug gets fixed.</blockquote></li>
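+A sketch of the kind of toString change the reporter describes, with braces instead of angle brackets so the web page does not treat the value as an HTML tag (accessor names follow the Resource record; the method is shown in isolation and is not the committed patch):
+{code}
+// Illustrative: "{memory:1024, vCores:1}" renders as text, whereas
+// "&lt;memory:1024, vCores:1&gt;" is swallowed by the browser as a tag.
+@Override
+public String toString() {
+  return "{memory:" + getMemory() + ", vCores:" + getVirtualCores() + "}";
+}
+{code}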
+<li> <a href="https://issues.apache.org/jira/browse/YARN-763">YARN-763</a>.
+     Major bug reported by Bikas Saha and fixed by Xuan Gong <br>
+     <b>AMRMClientAsync should stop heartbeating after receiving shutdown from RM</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-761">YARN-761</a>.
      Major bug reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen <br>
      <b>TestNMClientAsync fails sometimes</b><br>
-     <blockquote>See https://builds.apache.org/job/PreCommit-YARN-Build/1101//testReport/.
-
+     <blockquote>See https://builds.apache.org/job/PreCommit-YARN-Build/1101//testReport/.

+

 It passed on my machine though.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-760">YARN-760</a>.
      Major bug reported by Sandy Ryza and fixed by Niranjan Singh (nodemanager)<br>
@@ -428,8 +616,8 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-750">YARN-750</a>.
      Major sub-task reported by Arun C Murthy and fixed by Arun C Murthy <br>
      <b>Allow for black-listing resources in YARN API and Impl in CS</b><br>
-     <blockquote>YARN-392 and YARN-398 enhance scheduler api to allow for white-lists of resources.
-
+     <blockquote>YARN-392 and YARN-398 enhance scheduler api to allow for white-lists of resources.

+

 This jira is a companion to allow for black-listing (in CS).</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-749">YARN-749</a>.
      Major sub-task reported by Arun C Murthy and fixed by Arun C Murthy <br>
@@ -442,13 +630,13 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-746">YARN-746</a>.
      Major sub-task reported by Steve Loughran and fixed by Steve Loughran <br>
      <b>rename Service.register() and Service.unregister() to registerServiceListener() &amp; unregisterServiceListener() respectively</b><br>
-     <blockquote>make it clear what you are registering on a {{Service}} by naming the methods {{registerServiceListener()}} &amp; {{unregisterServiceListener()}} respectively.
-
+     <blockquote>make it clear what you are registering on a {{Service}} by naming the methods {{registerServiceListener()}} &amp; {{unregisterServiceListener()}} respectively.

+

 This only affects a couple of production classes; {{Service.register()}} is used in some of the lifecycle tests of YARN-530. There are no tests of {{Service.unregister()}}, which is something that could be corrected.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-742">YARN-742</a>.
      Major bug reported by Kihwal Lee and fixed by Jason Lowe (nodemanager)<br>
      <b>Log aggregation causes a lot of redundant setPermission calls</b><br>
-     <blockquote>In one of our clusters, namenode RPC is spending 45% of its time on serving setPermission calls. Further investigation has revealed that most calls are redundantly made on /mapred/logs/&lt;user&gt;/logs. Also mkdirs calls are made before this.
+     <blockquote>In one of our clusters, namenode RPC is spending 45% of its time on serving setPermission calls. Further investigation has revealed that most calls are redundantly made on /mapred/logs/&lt;user&gt;/logs. Also mkdirs calls are made before this.

 </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-739">YARN-739</a>.
      Major sub-task reported by Siddharth Seth and fixed by Omkar Vinit Joshi <br>
@@ -458,6 +646,12 @@
      Major sub-task reported by Jian He and fixed by Jian He <br>
      <b>Some Exceptions no longer need to be wrapped by YarnException and can be directly thrown out after YARN-142 </b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-736">YARN-736</a>.
+     Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
+     <b>Add a multi-resource fair sharing metric</b><br>
+     <blockquote>Currently, at a regular interval, the fair scheduler computes a fair memory share for each queue and application inside it.  This fair share is not used for scheduling decisions, but is displayed in the web UI, exposed as a metric, and used for preemption decisions.

+

+With DRF and multi-resource scheduling, assigning a memory share as the fair share metric to every queue no longer makes sense.  It's not obvious what the replacement should be, but probably something like fractional fairness within a queue, or distance from an ideal cluster state.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-735">YARN-735</a>.
      Major sub-task reported by Jian He and fixed by Jian He <br>
      <b>Make ApplicationAttemptID, ContainerID, NodeID immutable</b><br>
@@ -465,35 +659,39 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-733">YARN-733</a>.
      Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
      <b>TestNMClient fails occasionally</b><br>
-     <blockquote>The problem happens at:
-{code}
-        // getContainerStatus can be called after stopContainer
-        try {
-          ContainerStatus status = nmClient.getContainerStatus(
-              container.getId(), container.getNodeId(),
-              container.getContainerToken());
-          assertEquals(container.getId(), status.getContainerId());
-          assertEquals(ContainerState.RUNNING, status.getState());
-          assertTrue("" + i, status.getDiagnostics().contains(
-              "Container killed by the ApplicationMaster."));
-          assertEquals(-1000, status.getExitStatus());
-        } catch (YarnRemoteException e) {
-          fail("Exception is not expected");
-        }
-{code}
-
-NMClientImpl#stopContainer returns, but container hasn't been stopped immediately. ContainerManangerImpl implements stopContainer in async style. Therefore, the container's status is in transition. NMClientImpl#getContainerStatus immediately after stopContainer will get either the RUNNING status or the COMPLETE one.
-
-There will be the similar problem wrt NMClientImpl#startContainer.
+     <blockquote>The problem happens at:

+{code}

+        // getContainerStatus can be called after stopContainer

+        try {

+          ContainerStatus status = nmClient.getContainerStatus(

+              container.getId(), container.getNodeId(),

+              container.getContainerToken());

+          assertEquals(container.getId(), status.getContainerId());

+          assertEquals(ContainerState.RUNNING, status.getState());

+          assertTrue("" + i, status.getDiagnostics().contains(

+              "Container killed by the ApplicationMaster."));

+          assertEquals(-1000, status.getExitStatus());

+        } catch (YarnRemoteException e) {

+          fail("Exception is not expected");

+        }

+{code}

+

+NMClientImpl#stopContainer returns, but the container hasn't been stopped immediately. ContainerManagerImpl implements stopContainer in an async style. Therefore, the container's status is in transition. NMClientImpl#getContainerStatus immediately after stopContainer will get either the RUNNING status or the COMPLETE one.

+

+There will be a similar problem with regard to NMClientImpl#startContainer.

 </blockquote></li>
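+Until the stop path is made synchronous, a test could tolerate the transition by polling, sketched here against the same calls as the snippet above (the retry loop, the roughly 10-second budget, and the assumption that the test method declares the checked exceptions are illustrative, not the committed fix):
+{code}
+// Illustrative: poll until the container leaves RUNNING instead of asserting
+// immediately after stopContainer(), since the NM stops containers asynchronously.
+ContainerStatus status = nmClient.getContainerStatus(
+    container.getId(), container.getNodeId(), container.getContainerToken());
+int retries = 100;                          // about 10 seconds at 100ms per poll
+while (status.getState() == ContainerState.RUNNING) {
+  if (retries == 0) {
+    fail("container did not leave RUNNING in time");
+  }
+  retries = retries - 1;
+  Thread.sleep(100);
+  status = nmClient.getContainerStatus(
+      container.getId(), container.getNodeId(), container.getContainerToken());
+}
+assertEquals(ContainerState.COMPLETE, status.getState());
+{code}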
 <li> <a href="https://issues.apache.org/jira/browse/YARN-731">YARN-731</a>.
      Major sub-task reported by Siddharth Seth and fixed by Zhijie Shen <br>
      <b>RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions</b><br>
      <blockquote>Will be required for YARN-662. Also, remote NPEs show up incorrectly for some unit tests.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-727">YARN-727</a>.
+     Blocker sub-task reported by Siddharth Seth and fixed by Xuan Gong <br>
+     <b>ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter</b><br>
+     <blockquote>Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-726">YARN-726</a>.
      Critical bug reported by Siddharth Seth and fixed by Mayank Bansal <br>
      <b>Queue, FinishTime fields broken on RM UI</b><br>
-     <blockquote>The queue shows up as "Invalid Date"
+     <blockquote>The queue shows up as "Invalid Date"

 Finish Time shows up as a Long value.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-724">YARN-724</a>.
      Major sub-task reported by Jian He and fixed by Jian He <br>
@@ -510,8 +708,8 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-717">YARN-717</a>.
      Major sub-task reported by Jian He and fixed by Jian He <br>
      <b>Copy BuilderUtil methods into token-related records</b><br>
-     <blockquote>This is separated from YARN-711,as after changing yarn.api.token from interface to abstract class, eg: ClientTokenPBImpl has to extend two classes: both TokenPBImpl and ClientToken abstract class, which is not allowed in JAVA.
-
+     <blockquote>This is separated from YARN-711, as after changing yarn.api.token from an interface to an abstract class, e.g. ClientTokenPBImpl would have to extend two classes, both TokenPBImpl and the ClientToken abstract class, which is not allowed in Java.

+

 We may remove the ClientToken/ContainerToken/DelegationToken interface and just use the common Token interface </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-716">YARN-716</a>.
      Major task reported by Siddharth Seth and fixed by Siddharth Seth <br>
@@ -520,69 +718,74 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-715">YARN-715</a>.
      Major bug reported by Siddharth Seth and fixed by Vinod Kumar Vavilapalli <br>
      <b>TestDistributedShell and TestUnmanagedAMLauncher are failing</b><br>
-     <blockquote>Tests are timing out. Looks like this is related to YARN-617.
-{code}
-2013-05-21 17:40:23,693 ERROR [IPC Server handler 0 on 54024] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:authorizeRequest(412)) - Unauthorized request to start container.
-Expected containerId: user Found: container_1369183214008_0001_01_000001
-2013-05-21 17:40:23,694 ERROR [IPC Server handler 0 on 54024] security.UserGroupInformation (UserGroupInformation.java:doAs(1492)) - PriviledgedActionException as:user (auth:SIMPLE) cause:org.apache.hado
-Expected containerId: user Found: container_1369183214008_0001_01_000001
-2013-05-21 17:40:23,695 INFO  [IPC Server handler 0 on 54024] ipc.Server (Server.java:run(1864)) - IPC Server handler 0 on 54024, call org.apache.hadoop.yarn.api.ContainerManagerPB.startContainer from 10.
-Expected containerId: user Found: container_1369183214008_0001_01_000001
-org.apache.hadoop.yarn.exceptions.YarnRemoteException: Unauthorized request to start container.
-Expected containerId: user Found: container_1369183214008_0001_01_000001
-  at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:43)
-  at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeRequest(ContainerManagerImpl.java:413)
-  at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainer(ContainerManagerImpl.java:440)
-  at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagerPBServiceImpl.startContainer(ContainerManagerPBServiceImpl.java:72)
-  at org.apache.hadoop.yarn.proto.ContainerManager$ContainerManagerService$2.callBlockingMethod(ContainerManager.java:83)
-  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
+     <blockquote>Tests are timing out. Looks like this is related to YARN-617.

+{code}

+2013-05-21 17:40:23,693 ERROR [IPC Server handler 0 on 54024] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:authorizeRequest(412)) - Unauthorized request to start container.

+Expected containerId: user Found: container_1369183214008_0001_01_000001

+2013-05-21 17:40:23,694 ERROR [IPC Server handler 0 on 54024] security.UserGroupInformation (UserGroupInformation.java:doAs(1492)) - PriviledgedActionException as:user (auth:SIMPLE) cause:org.apache.hado

+Expected containerId: user Found: container_1369183214008_0001_01_000001

+2013-05-21 17:40:23,695 INFO  [IPC Server handler 0 on 54024] ipc.Server (Server.java:run(1864)) - IPC Server handler 0 on 54024, call org.apache.hadoop.yarn.api.ContainerManagerPB.startContainer from 10.

+Expected containerId: user Found: container_1369183214008_0001_01_000001

+org.apache.hadoop.yarn.exceptions.YarnRemoteException: Unauthorized request to start container.

+Expected containerId: user Found: container_1369183214008_0001_01_000001

+  at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:43)

+  at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeRequest(ContainerManagerImpl.java:413)

+  at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainer(ContainerManagerImpl.java:440)

+  at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagerPBServiceImpl.startContainer(ContainerManagerPBServiceImpl.java:72)

+  at org.apache.hadoop.yarn.proto.ContainerManager$ContainerManagerService$2.callBlockingMethod(ContainerManager.java:83)

+  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)

 {code}</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-714">YARN-714</a>.
      Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
      <b>AMRM protocol changes for sending NMToken list</b><br>
-     <blockquote>NMToken will be sent to AM on allocate call if
-1) AM doesn't already have NMToken for the underlying NM
-2) Key rolled over on RM and AM gets new container on the same NM.
+     <blockquote>NMToken will be sent to AM on allocate call if

+1) AM doesn't already have NMToken for the underlying NM

+2) Key rolled over on RM and AM gets new container on the same NM.

 On allocate call RM will send a consolidated list of all required NMTokens.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-711">YARN-711</a>.
      Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Jian He <br>
      <b>Copy BuilderUtil methods into individual records</b><br>
-     <blockquote>BuilderUtils is one giant utils class which has all the factory methods needed for creating records. It is painful for users to figure out how to create records. We are better off having the factories in each record, that way users can easily create records.
-
+     <blockquote>BuilderUtils is one giant utils class which has all the factory methods needed for creating records. It is painful for users to figure out how to create records. We are better off having the factories in each record; that way users can easily create records.

+

 As a first step, we should just copy all the factory methods into individual classes, deprecate BuilderUtils and then slowly move all code off BuilderUtils.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-708">YARN-708</a>.
      Major task reported by Siddharth Seth and fixed by Siddharth Seth <br>
      <b>Move RecordFactory classes to hadoop-yarn-api, miscellaneous fixes to the interfaces</b><br>
-     <blockquote>This is required for additional changes in YARN-528. 
+     <blockquote>This is required for additional changes in YARN-528. 

 Some of the interfaces could use some cleanup as well - they shouldn't be declaring YarnException (Runtime) in their signature.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-706">YARN-706</a>.
      Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
      <b>Race Condition in TestFSDownload</b><br>
-     <blockquote>See the test failure in YARN-695
-
+     <blockquote>See the test failure in YARN-695

+

 https://builds.apache.org/job/PreCommit-YARN-Build/957//testReport/org.apache.hadoop.yarn.util/TestFSDownload/testDownloadPatternJar/</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-701">YARN-701</a>.
+     Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
+     <b>ApplicationTokens should be used irrespective of kerberos</b><br>
+     <blockquote> - Single code path for secure and non-secure cases is useful for testing, coverage.

+ - Having this in non-secure mode will help us avoid accidental bugs in AMs DDoS'ing and bringing down the RM.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-700">YARN-700</a>.
      Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
      <b>TestInfoBlock fails on Windows because of line ending missmatch</b><br>
-     <blockquote>Exception:
-{noformat}
-Running org.apache.hadoop.yarn.webapp.view.TestInfoBlock
-Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.962 sec &lt;&lt;&lt; FAILURE!
-testMultilineInfoBlock(org.apache.hadoop.yarn.webapp.view.TestInfoBlock)  Time elapsed: 873 sec  &lt;&lt;&lt; FAILURE!
-java.lang.AssertionError: 
-	at org.junit.Assert.fail(Assert.java:91)
-	at org.junit.Assert.assertTrue(Assert.java:43)
-	at org.junit.Assert.assertTrue(Assert.java:54)
-	at org.apache.hadoop.yarn.webapp.view.TestInfoBlock.testMultilineInfoBlock(TestInfoBlock.java:79)
-	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
-	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
-	at java.lang.reflect.Method.invoke(Method.java:597)
-	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
-	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
-	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
-	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
-	at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
+     <blockquote>Exception:

+{noformat}

+Running org.apache.hadoop.yarn.webapp.view.TestInfoBlock

+Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.962 sec &lt;&lt;&lt; FAILURE!

+testMultilineInfoBlock(org.apache.hadoop.yarn.webapp.view.TestInfoBlock)  Time elapsed: 873 sec  &lt;&lt;&lt; FAILURE!

+java.lang.AssertionError: 

+	at org.junit.Assert.fail(Assert.java:91)

+	at org.junit.Assert.assertTrue(Assert.java:43)

+	at org.junit.Assert.assertTrue(Assert.java:54)

+	at org.apache.hadoop.yarn.webapp.view.TestInfoBlock.testMultilineInfoBlock(TestInfoBlock.java:79)

+	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

+	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

+	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

+	at java.lang.reflect.Method.invoke(Method.java:597)

+	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)

+	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)

+	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)

+	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)

+	at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)

 {noformat}</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-695">YARN-695</a>.
      Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
@@ -591,28 +794,28 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-694">YARN-694</a>.
      Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
      <b>Start using NMTokens to authenticate all communication with NM</b><br>
-     <blockquote>AM uses the NMToken to authenticate all the AM-NM communication.
-NM will validate NMToken in below manner
-* If NMToken is using current or previous master key then the NMToken is valid. In this case it will update its cache with this key corresponding to appId.
-* If NMToken is using the master key which is present in NM's cache corresponding to AM's appId then it will be validated based on this.
-* If NMToken is invalid then NM will reject AM calls.
-
-Modification for ContainerToken
-* At present RPC validates AM-NM communication based on ContainerToken. It will be replaced with NMToken. Also now onwards AM will use NMToken per NM (replacing earlier behavior of ContainerToken per container per NM).
-* startContainer in case of Secured environment is using ContainerToken from UGI YARN-617; however after this it will use it from the payload (Container).
+     <blockquote>AM uses the NMToken to authenticate all the AM-NM communication.

+The NM will validate the NMToken in the following manner:

+* If NMToken is using current or previous master key then the NMToken is valid. In this case it will update its cache with this key corresponding to appId.

+* If NMToken is using the master key which is present in NM's cache corresponding to AM's appId then it will be validated based on this.

+* If NMToken is invalid then NM will reject AM calls.

+

+Modification for ContainerToken

+* At present, RPC validates AM-NM communication based on the ContainerToken. It will be replaced with the NMToken. Also, from now onwards the AM will use one NMToken per NM (replacing the earlier behavior of one ContainerToken per container per NM).

+* startContainer in case of Secured environment is using ContainerToken from UGI YARN-617; however after this it will use it from the payload (Container).

 * ContainerToken will exist and it will only be used to validate the AM's container start request.</blockquote></li>
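+A hypothetical sketch of the three validation rules above, with key handling reduced to integer key ids (this is not the actual NM token secret manager):
+{code}
+// Hypothetical: accept tokens signed with the current or previous master key,
+// remember the key per application, and otherwise fall back to the cached key.
+public class NMTokenCheck {
+  int currentKeyId;
+  int previousKeyId;
+  java.util.Map&lt;String, Integer&gt; keyPerApp = new java.util.HashMap&lt;String, Integer&gt;();
+
+  boolean isValid(String appId, int tokenKeyId) {
+    if (tokenKeyId == currentKeyId || tokenKeyId == previousKeyId) {
+      keyPerApp.put(appId, tokenKeyId);            // rule 1: accept and cache per app
+      return true;
+    }
+    Integer cached = keyPerApp.get(appId);
+    if (cached != null) {
+      return cached.intValue() == tokenKeyId;      // rule 2: match the key cached for this app
+    }
+    return false;                                  // rule 3: reject the call
+  }
+}
+{code}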
 <li> <a href="https://issues.apache.org/jira/browse/YARN-693">YARN-693</a>.
      Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
      <b>Sending NMToken to AM on allocate call</b><br>
-     <blockquote>This is part of YARN-613.
-As per the updated design, AM will receive per NM, NMToken in following scenarios
-* AM is receiving first container on underlying NM.
-* AM is receiving container on underlying NM after either NM or RM rebooted.
-** After RM reboot, as RM doesn't remember (persist) the information about keys issued per AM per NM, it will reissue tokens in case AM gets new container on underlying NM. However on NM side NM will still retain older token until it receives new token to support long running jobs (in work preserving environment).
-** After NM reboot, RM will delete the token information corresponding to that AM for all AMs.
-* AM is receiving container on underlying NM after NMToken master key is rolled over on RM side.
-In all the cases if AM receives new NMToken then it is suppose to store it for future NM communication until it receives a new one.
-
+     <blockquote>This is part of YARN-613.

+As per the updated design, the AM will receive a per-NM NMToken in the following scenarios:

+* AM is receiving first container on underlying NM.

+* AM is receiving container on underlying NM after either NM or RM rebooted.

+** After RM reboot, as RM doesn't remember (persist) the information about keys issued per AM per NM, it will reissue tokens in case AM gets new container on underlying NM. However on NM side NM will still retain older token until it receives new token to support long running jobs (in work preserving environment).

+** After NM reboot, RM will delete the token information corresponding to that AM for all AMs.

+* AM is receiving container on underlying NM after NMToken master key is rolled over on RM side.

+In all these cases, if the AM receives a new NMToken then it is supposed to store it for future NM communication until it receives a new one.

+

 AMRMClient should expose these NMToken to client. </blockquote></li>
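+The AM-side bookkeeping this implies can be sketched as a small per-NM cache (the getNMTokens() accessor and the plain map are assumptions for illustration; a client library would wrap this):
+{code}
+// Illustrative: keep at most one NMToken per NM and let a freshly received
+// token replace the older one, as described above.
+java.util.Map&lt;NodeId, Token&gt; nmTokens = new java.util.HashMap&lt;NodeId, Token&gt;();
+for (NMToken nmToken : allocateResponse.getNMTokens()) {
+  nmTokens.put(nmToken.getNodeId(), nmToken.getToken());  // newer token wins
+}
+{code}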
 <li> <a href="https://issues.apache.org/jira/browse/YARN-692">YARN-692</a>.
      Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
@@ -621,9 +824,14 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-690">YARN-690</a>.
      Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)<br>
      <b>RM exits on token cancel/renew problems</b><br>
-     <blockquote>The DelegationTokenRenewer thread is critical to the RM.  When a non-IOException occurs, the thread calls System.exit to prevent the RM from running w/o the thread.  It should be exiting only on non-RuntimeExceptions.
-
+     <blockquote>The DelegationTokenRenewer thread is critical to the RM.  When a non-IOException occurs, the thread calls System.exit to prevent the RM from running w/o the thread.  It should be exiting only on non-RuntimeExceptions.

+

 The problem is especially bad in 23 because the yarn protobuf layer converts IOExceptions into UndeclaredThrowableExceptions (RuntimeException) which causes the renewer to abort the process.  An UnknownHostException takes down the RM...</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-688">YARN-688</a>.
+     Major bug reported by Jian He and fixed by Jian He <br>
+     <b>Containers not cleaned up when NM received SHUTDOWN event from NodeStatusUpdater</b><br>
+     <blockquote>Currently, both the SHUTDOWN event from nodeStatusUpdater and the CleanupContainers event happen to be on the same dispatcher thread, so the CleanupContainers event will not be processed until the SHUTDOWN event is processed. See the similar problem in YARN-495.

+On normal NM shutdown, this is not a problem since normal stop happens on shutdownHook thread.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-686">YARN-686</a>.
      Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api)<br>
      <b>Flatten NodeReport</b><br>
@@ -636,6 +844,11 @@
      Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
      <b>Change ResourceTracker API and LocalizationProtocol API to throw YarnRemoteException and IOException</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-661">YARN-661</a>.
+     Major bug reported by Jason Lowe and fixed by Omkar Vinit Joshi (nodemanager)<br>
+     <b>NM fails to cleanup local directories for users</b><br>
+     <blockquote>YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems.  The top-level usercache directory is owned by the user but is in a directory that is not writable by the user.  Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions.

+</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-660">YARN-660</a>.
      Major sub-task reported by Bikas Saha and fixed by Bikas Saha <br>
      <b>Improve AMRMClient with matching requests</b><br>
@@ -644,6 +857,10 @@
      Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
      <b>Fair scheduler metrics should subtract allocated memory from available memory</b><br>
      <blockquote>In the scheduler web UI, cluster metrics reports that the "Memory Total" goes up when an application is allocated resources.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-654">YARN-654</a>.
+     Major bug reported by Bikas Saha and fixed by Xuan Gong <br>
+     <b>AMRMClient: Perform sanity checks for parameters of public methods</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-651">YARN-651</a>.
      Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
      <b>Change ContainerManagerPBClientImpl and RMAdminProtocolPBClientImpl to throw IOException and YarnRemoteException</b><br>
@@ -655,9 +872,9 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-646">YARN-646</a>.
      Major bug reported by Dapeng Sun and fixed by Dapeng Sun (documentation)<br>
      <b>Some issues in Fair Scheduler's document</b><br>
-     <blockquote>Issues are found in the doc page for Fair Scheduler http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html:
-1.In the section &#8220;Configuration&#8221;, It contains two properties named &#8220;yarn.scheduler.fair.minimum-allocation-mb&#8221;, the second one should be &#8220;yarn.scheduler.fair.maximum-allocation-mb&#8221;
-2.In the section &#8220;Allocation file format&#8221;, the document tells &#8220; The format contains three types of elements&#8221;, but it lists four types of elements following that.
+     <blockquote>Issues are found in the doc page for Fair Scheduler http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html:

+1. In the section &#8220;Configuration&#8221;, it contains two properties named &#8220;yarn.scheduler.fair.minimum-allocation-mb&#8221;; the second one should be &#8220;yarn.scheduler.fair.maximum-allocation-mb&#8221;.

+2. In the section &#8220;Allocation file format&#8221;, the document says &#8220;The format contains three types of elements&#8221;, but it lists four types of elements following that.

 </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-645">YARN-645</a>.
      Major bug reported by Jian He and fixed by Jian He <br>
@@ -722,8 +939,8 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-617">YARN-617</a>.
      Minor sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi <br>
      <b>In unsercure mode, AM can fake resource requirements </b><br>
-     <blockquote>Without security, it is impossible to completely avoid AMs faking resources. We can at the least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over unauthenticated RM-NM channel.
-
+     <blockquote>Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over the unauthenticated RM-NM channel.

+

 In the minimum, this will avoid accidental bugs in AMs in unsecure mode.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-615">YARN-615</a>.
      Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
@@ -740,12 +957,12 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-605">YARN-605</a>.
      Major bug reported by Hitesh Shah and fixed by Hitesh Shah <br>
      <b>Failing unit test in TestNMWebServices when using git for source control </b><br>
-     <blockquote>Failed tests:   testNode(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789
-  testNodeSlash(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789
-  testNodeDefault(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789
-  testNodeInfo(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789
-  testNodeInfoSlash(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789
-  testNodeInfoDefault(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789
+     <blockquote>Failed tests:   testNode(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789

+  testNodeSlash(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789

+  testNodeDefault(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789

+  testNodeInfo(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789

+  testNodeInfoSlash(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789

+  testNodeInfoDefault(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789

   testSingleNodesXML(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-600">YARN-600</a>.
      Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
@@ -754,10 +971,10 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-599">YARN-599</a>.
      Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
      <b>Refactoring submitApplication in ClientRMService and RMAppManager</b><br>
-     <blockquote>Currently, ClientRMService#submitApplication call RMAppManager#handle, and consequently call RMAppMangager#submitApplication directly, though the code looks like scheduling an APP_SUBMIT event.
-
-In addition, the validation code before creating an RMApp instance is not well organized. Ideally, the dynamic validation, which depends on the RM's configuration, should be put in RMAppMangager#submitApplication. RMAppMangager#submitApplication is called by ClientRMService#submitApplication and RMAppMangager#recover. Since the configuration may be changed after RM restarts, the validation needs to be done again even in recovery mode. Therefore, resource request validation, which based on min/max resource limits, should be moved from ClientRMService#submitApplication to RMAppMangager#submitApplication. On the other hand, the static validation, which is independent of the RM's configuration should be put in ClientRMService#submitApplication, because it is only need to be done once during the first submission.
-
+     <blockquote>Currently, ClientRMService#submitApplication calls RMAppManager#handle, and consequently calls RMAppManager#submitApplication directly, though the code looks like it is scheduling an APP_SUBMIT event.

+

+In addition, the validation code before creating an RMApp instance is not well organized. Ideally, the dynamic validation, which depends on the RM's configuration, should be put in RMAppManager#submitApplication. RMAppManager#submitApplication is called by ClientRMService#submitApplication and RMAppManager#recover. Since the configuration may be changed after the RM restarts, the validation needs to be done again even in recovery mode. Therefore, resource request validation, which is based on min/max resource limits, should be moved from ClientRMService#submitApplication to RMAppManager#submitApplication. On the other hand, the static validation, which is independent of the RM's configuration, should be put in ClientRMService#submitApplication, because it only needs to be done once, during the first submission.

+

 Furthermore, the try-catch flow in RMAppManager#submitApplication has a flaw: it is not synchronized. If two application submissions with the same application ID enter the function, and one progresses to the completion of RMApp instantiation, and the other progresses to the completion of putting the RMApp instance into rmContext, the slower submission will cause an exception due to the duplicate application ID. However, the exception will cause the RMApp instance already in rmContext (belonging to the faster submission) to be rejected with the current code flow.</blockquote></li>
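+A sketch of the kind of guard that closes the duplicate-ID window described above, assuming rmContext.getRMApps() is a ConcurrentMap as in the RM; the exception type and wording are illustrative, not the committed change:
+{code}
+// Illustrative: register the RMApp atomically so the slower of two submissions
+// with the same id fails fast without evicting the one already registered.
+RMApp previous = rmContext.getRMApps().putIfAbsent(applicationId, application);
+if (previous != null) {
+  throw new YarnException("Application with id " + applicationId
+      + " is already present, cannot submit a duplicate");
+}
+{code}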
 <li> <a href="https://issues.apache.org/jira/browse/YARN-598">YARN-598</a>.
      Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
@@ -766,18 +983,18 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-597">YARN-597</a>.
      Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
      <b>TestFSDownload fails on Windows because of dependencies on tar/gzip/jar tools</b><br>
-     <blockquote>{{testDownloadArchive}}, {{testDownloadPatternJar}} and {{testDownloadArchiveZip}} fail with the similar Shell ExitCodeException:
-
-{code}
-testDownloadArchiveZip(org.apache.hadoop.yarn.util.TestFSDownload)  Time elapsed: 480 sec  &lt;&lt;&lt; ERROR!
-org.apache.hadoop.util.Shell$ExitCodeException: bash: line 0: cd: /D:/svn/t/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/TestFSDownload: No such file or directory
-gzip: 1: No such file or directory
-
-	at org.apache.hadoop.util.Shell.runCommand(Shell.java:377)
-	at org.apache.hadoop.util.Shell.run(Shell.java:292)
-	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:497)
-	at org.apache.hadoop.yarn.util.TestFSDownload.createZipFile(TestFSDownload.java:225)
-	at org.apache.hadoop.yarn.util.TestFSDownload.testDownloadArchiveZip(TestFSDownload.java:503)
+     <blockquote>{{testDownloadArchive}}, {{testDownloadPatternJar}} and {{testDownloadArchiveZip}} fail with the similar Shell ExitCodeException:

+

+{code}

+testDownloadArchiveZip(org.apache.hadoop.yarn.util.TestFSDownload)  Time elapsed: 480 sec  &lt;&lt;&lt; ERROR!

+org.apache.hadoop.util.Shell$ExitCodeException: bash: line 0: cd: /D:/svn/t/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/TestFSDownload: No such file or directory

+gzip: 1: No such file or directory

+

+	at org.apache.hadoop.util.Shell.runCommand(Shell.java:377)

+	at org.apache.hadoop.util.Shell.run(Shell.java:292)

+	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:497)

+	at org.apache.hadoop.yarn.util.TestFSDownload.createZipFile(TestFSDownload.java:225)

+	at org.apache.hadoop.yarn.util.TestFSDownload.testDownloadArchiveZip(TestFSDownload.java:503)

 {code}</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-595">YARN-595</a>.
      Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
@@ -838,32 +1055,89 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-571">YARN-571</a>.
      Major sub-task reported by Hitesh Shah and fixed by Omkar Vinit Joshi <br>
      <b>User should not be part of ContainerLaunchContext</b><br>
-     <blockquote>Today, a user is expected to set the user name in the CLC when either submitting an application or launching a container from the AM. This does not make sense as the user can/has been identified by the RM as part of the RPC layer.
-
-Solution would be to move the user information into either the Container object or directly into the ContainerToken which can then be used by the NM to launch the container. This user information would set into the container by the RM.
-
+     <blockquote>Today, a user is expected to set the user name in the CLC when either submitting an application or launching a container from the AM. This does not make sense as the user can/has been identified by the RM as part of the RPC layer.

+

+The solution would be to move the user information into either the Container object or directly into the ContainerToken, which can then be used by the NM to launch the container. This user information would be set into the container by the RM.

+

+</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-569">YARN-569</a>.
+     Major sub-task reported by Carlo Curino and fixed by Carlo Curino (capacityscheduler)<br>
+     <b>CapacityScheduler: support for preemption (using a capacity monitor)</b><br>
+     <blockquote>There is a tension between the fast-paced reactive role of the CapacityScheduler, which needs to respond quickly to 

+application resource requests and node updates, and the more introspective, time-based considerations 

+needed to observe and correct for capacity balance. For this purpose, instead of hacking the delicate

+mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a "Capacity Monitor",

+which can be run optionally as a separate service (much like the NMLivelinessMonitor).

+

+The capacity monitor (similar to equivalent functionality in the fair scheduler) runs at intervals 

+(e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, 

+performs off-line computation to determine if preemption is needed and how best to "edit" the current schedule to 

+improve capacity, and generates events that produce four possible actions:

+# Container de-reservations

+# Resource-based preemptions

+# Container-based preemptions

+# Container killing

+

+The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. 

+Note that due to the "lag" in the effect of these actions the policy should operate at the macroscopic level (e.g., preempt tens of containers

+from a queue) and not try to tightly and consistently micromanage container allocations. 

+

+

+------------- Preemption policy  (ProportionalCapacityPreemptionPolicy): ------------- 

+

+Preemption policies are by design pluggable, in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with.  The ProportionalCapacityPreemptionPolicy behaves as follows:

+# it gathers from the scheduler the state of the queues, in particular, their current capacity, guaranteed capacity and pending requests (*)

+# if there are pending requests from queues that are under capacity it computes a new ideal balanced state (**)

+# it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and 

+respecting bounds on the amount of preemption we allow for each round)

+# it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order)

+# it removes reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exist

+# (if not enough) it issues preemptions for containers from the same application (reverse chronological order, last assigned container first), again until enough is reclaimed or until no containers except the AM container are left,

+# (if not enough) it moves on to unreserve and preempt from the next application. 

+# containers that have been asked to preempt are tracked across executions. If a container has been among the ones to be preempted for more than a certain time, it is moved into the list of containers to be forcibly killed. 

+

+Notes:

+(*) at the moment, in order to avoid double-counting of the requests, we only look at the "ANY" part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not any. 

+(**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among the queues (that want some) as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue and the function runs to a fixed point.  

+

+Tunables of the ProportionalCapacityPreemptionPolicy:

+# 	observe-only mode (i.e., log the actions it would take, but behave as read-only)

+# how frequently to run the policy

+# how long to wait between preemption and kill of a container

+# which fraction of the containers I would like to obtain should I preempt (has to do with the natural rate at which containers are returned)

+# deadzone size, i.e., what % of over-capacity should I ignore (if we are off perfect balance by some small % we ignore it)

+# overall amount of preemption we can afford for each run of the policy (in terms of total cluster capacity)

+

+In our current experiments this set of tunables seems to be a good start to shape the preemption action properly. More sophisticated preemption policies could take into account different types of applications running, job priorities, cost of preemption, and the integral of capacity imbalance. This is very much a control-theory kind of problem, and some of the lessons on designing and tuning controllers are likely to apply.

+

+Generality:

+The monitor-based scheduler edit and the preemption mechanisms we introduced here are designed to be more general than enforcing capacity/fairness; in fact, we are considering other monitors that leverage the same idea of "schedule edits" to target different global properties (e.g., allocate enough resources to guarantee deadlines for important jobs, or data-locality optimizations, IO-balancing among nodes, etc...).

+

+Note that by default the preemption policy we describe is disabled in the patch.

+

+Depends on YARN-45 and YARN-567, is related to YARN-568

 </blockquote></li>
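+A compact sketch of the shape of one policy round described above, using the dead-zone and per-round-bound tunables (constants and structure are illustrative, not the actual ProportionalCapacityPreemptionPolicy):
+{code}
+// Illustrative: decide how much to reclaim from one over-capacity queue in a
+// single run of the policy, honoring the dead zone and the per-round bound.
+public class PreemptionRoundSketch {
+  static final double MAX_PREEMPT_PER_ROUND = 0.1; // assumed: at most 10% of the cluster per round
+  static final double DEAD_ZONE = 0.05;            // assumed: ignore overages below 5%
+
+  static double toPreempt(double used, double idealAssigned, double clusterCapacity) {
+    double over = used - idealAssigned;
+    if (over &lt;= DEAD_ZONE * clusterCapacity) {
+      return 0;                                    // within the dead zone: leave the queue alone
+    }
+    return Math.min(over, MAX_PREEMPT_PER_ROUND * clusterCapacity);
+  }
+
+  public static void main(String[] args) {
+    // a queue using 60 against an ideal assignment of 40 on a 100-unit cluster
+    System.out.println(toPreempt(60, 40, 100));    // prints 10.0: bounded by the per-round limit
+  }
+}
+{code}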
 <li> <a href="https://issues.apache.org/jira/browse/YARN-568">YARN-568</a>.
      Major improvement reported by Carlo Curino and fixed by Carlo Curino (scheduler)<br>
      <b>FairScheduler: support for work-preserving preemption </b><br>
-     <blockquote>In the attached patch, we modified  the FairScheduler to substitute its preemption-by-killling with a work-preserving version of preemption (followed by killing if the AMs do not respond quickly enough). This should allows to run preemption checking more often, but kill less often (proper tuning to be investigated).  Depends on YARN-567 and YARN-45, is related to YARN-569.
+     <blockquote>In the attached patch, we modified the FairScheduler to substitute its preemption-by-killing with a work-preserving version of preemption (followed by killing if the AMs do not respond quickly enough). This should allow running preemption checking more often but killing less often (proper tuning to be investigated). Depends on YARN-567 and YARN-45, is related to YARN-569.

 </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-567">YARN-567</a>.
      Major sub-task reported by Carlo Curino and fixed by Carlo Curino (resourcemanager)<br>
      <b>RM changes to support preemption for FairScheduler and CapacityScheduler</b><br>
-     <blockquote>A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties. FairScheduler and CapacityScheduler takes opposite stance on how to achieve this. 
-
-The FairScheduler, leverages task-killing to quickly reclaim resources from currently running jobs and redistributing them among new jobs, thus keeping the cluster busy but waste useful work. The CapacityScheduler is typically tuned
-to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work, but risking to keep the cluster underutilized or have jobs waiting to obtain their rightful capacity. 
-
-By introducing the notion of a work-preserving preemption we can remove this tradeoff.  This requires a protocol for preemption (YARN-45), and ApplicationMasters that can answer to preemption  efficiently (e.g., by saving their intermediate state, this will be posted for MapReduce in a separate JIRA soon), together with a scheduler that can issues preemption requests (discussed in separate JIRAs YARN-568 and YARN-569).
-
-The changes we track with this JIRA are common to FairScheduler and CapacityScheduler, and are mostly propagation of preemption decisions through the ApplicationMastersService.
+     <blockquote>A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties. FairScheduler and CapacityScheduler take opposite stances on how to achieve this. 

+

+The FairScheduler leverages task-killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy but wasting useful work. The CapacityScheduler is typically tuned

+to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work, but risking keeping the cluster underutilized or having jobs wait to obtain their rightful capacity. 

+

+By introducing the notion of a work-preserving preemption we can remove this tradeoff.  This requires a protocol for preemption (YARN-45), and ApplicationMasters that can respond to preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), together with a scheduler that can issue preemption requests (discussed in the separate JIRAs YARN-568 and YARN-569).

+

+The changes we track with this JIRA are common to FairScheduler and CapacityScheduler, and are mostly propagation of preemption decisions through the ApplicationMastersService.

 </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-563">YARN-563</a>.
      Major sub-task reported by Thomas Weise and fixed by Mayank Bansal <br>
      <b>Add application type to ApplicationReport </b><br>
-     <blockquote>This field is needed to distinguish different types of applications (app master implementations). For example, we may run applications of type XYZ in a cluster alongside MR and would like to filter applications by type.
+     <blockquote>This field is needed to distinguish different types of applications (app master implementations). For example, we may run applications of type XYZ in a cluster alongside MR and would like to filter applications by type.

 </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-562">YARN-562</a>.
      Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
@@ -872,10 +1146,10 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-561">YARN-561</a>.
      Major sub-task reported by Hitesh Shah and fixed by Xuan Gong <br>
      <b>Nodemanager should set some key information into the environment of every container that it launches.</b><br>
-     <blockquote>Information such as containerId, nodemanager hostname, nodemanager port is not set in the environment when any container is launched. 
-
-For an AM, the RM does all of this for it but for a container launched by an application, all of the above need to be set by the ApplicationMaster. 
-
+     <blockquote>Information such as containerId, nodemanager hostname, nodemanager port is not set in the environment when any container is launched. 

+

+For an AM, the RM does all of this for it but for a container launched by an application, all of the above need to be set by the ApplicationMaster. 

+

 At the minimum, container id would be a useful piece of information. If the container wishes to talk to its local NM, the nodemanager related information would also come in handy. </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-557">YARN-557</a>.
      Major bug reported by Chris Nauroth and fixed by Chris Nauroth (applications)<br>
@@ -884,24 +1158,24 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-553">YARN-553</a>.
      Minor sub-task reported by Harsh J and fixed by Karthik Kambatla (client)<br>
      <b>Have YarnClient generate a directly usable ApplicationSubmissionContext</b><br>
-     <blockquote>Right now, we're doing multiple steps to create a relevant ApplicationSubmissionContext for a pre-received GetNewApplicationResponse.
-
-{code}
-    GetNewApplicationResponse newApp = yarnClient.getNewApplication();
-    ApplicationId appId = newApp.getApplicationId();
-
-    ApplicationSubmissionContext appContext = Records.newRecord(ApplicationSubmissionContext.class);
-
-    appContext.setApplicationId(appId);
-{code}
-
-A simplified way may be to have the GetNewApplicationResponse itself provide a helper method that builds a usable ApplicationSubmissionContext for us. Something like:
-
-{code}
-GetNewApplicationResponse newApp = yarnClient.getNewApplication();
-ApplicationSubmissionContext appContext = newApp.generateApplicationSubmissionContext();
-{code}
-
+     <blockquote>Right now, we're doing multiple steps to create a relevant ApplicationSubmissionContext for a pre-received GetNewApplicationResponse.

+

+{code}

+    GetNewApplicationResponse newApp = yarnClient.getNewApplication();

+    ApplicationId appId = newApp.getApplicationId();

+

+    ApplicationSubmissionContext appContext = Records.newRecord(ApplicationSubmissionContext.class);

+

+    appContext.setApplicationId(appId);

+{code}

+

+A simplified way may be to have the GetNewApplicationResponse itself provide a helper method that builds a usable ApplicationSubmissionContext for us. Something like:

+

+{code}

+GetNewApplicationResponse newApp = yarnClient.getNewApplication();

+ApplicationSubmissionContext appContext = newApp.generateApplicationSubmissionContext();

+{code}

+

 [The above method can also take an arg for the container launch spec, or perhaps pre-load defaults like min-resource, etc. in the returned object, aside of just associating the application ID automatically.]</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-549">YARN-549</a>.
      Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
@@ -914,39 +1188,50 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-547">YARN-547</a>.
      Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
      <b>Race condition in Public / Private Localizer may result into resource getting downloaded again</b><br>
-     <blockquote>Public Localizer :
-At present when multiple containers try to request a localized resource 
-* If the resource is not present then first it is created and Resource Localization starts ( LocalizedResource is in DOWNLOADING state)
-* Now if in this state multiple ResourceRequestEvents arrive then ResourceLocalizationEvents are sent for all of them.
-
-Most of the times it is not resulting into a duplicate resource download but there is a race condition present there. Inside ResourceLocalization (for public download) all the requests are added to local attempts map. If a new request comes in then first it is checked in this map before a new download starts for the same. For the current download the request will be there in the map. Now if a same resource request comes in then it will rejected (i.e. resource is getting downloaded already). However if the current download completes then the request will be removed from this local map. Now after this removal if the LocalizerRequestEvent comes in then as it is not present in local map the resource will be downloaded again.
-
-PrivateLocalizer :
-Here a different but similar race condition is present.
-* Here inside findNextResource method call; each LocalizerRunner tries to grab a lock on LocalizerResource. If the lock is not acquired then it will keep trying until the resource state changes to LOCALIZED. This lock will be released by the LocalizerRunner when download completes.
-* Now if another ContainerLocalizer tries to grab the lock on a resource before LocalizedResource state changes to LOCALIZED then resource will be downloaded again.
-
+     <blockquote>Public Localizer:

+At present, when multiple containers try to request a localized resource:

+* If the resource is not present, then it is first created and resource localization starts (the LocalizedResource is in the DOWNLOADING state).

+* If multiple ResourceRequestEvents arrive while the resource is in this state, then ResourceLocalizationEvents are sent for all of them.

+

+Most of the time this does not result in a duplicate resource download, but a race condition is present. Inside ResourceLocalization (for public downloads) all requests are added to a local attempts map. When a new request comes in, this map is checked first before a new download starts for the same resource. While a download is in progress its request is present in the map, so a duplicate request for the same resource is rejected (i.e. the resource is already being downloaded). However, once the download completes the request is removed from this local map. If a LocalizerRequestEvent arrives after this removal, the resource is no longer in the local map and will be downloaded again.

+

+Private Localizer:

+Here a different but similar race condition is present.

+* Inside the findNextResource method call, each LocalizerRunner tries to grab a lock on the LocalizedResource. If the lock is not acquired, it keeps trying until the resource state changes to LOCALIZED. The lock is released by the LocalizerRunner when the download completes.

+* If another ContainerLocalizer grabs the lock on a resource before the LocalizedResource state changes to LOCALIZED, then the resource will be downloaded again.

+

 At both places the root cause is that all the threads try to acquire the lock on the resource, while the current state of the LocalizedResource is not taken into consideration.</blockquote></li>
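To make that root cause concrete, here is a minimal, generic sketch of state-aware locking; the class, enum, and method names are illustrative only and are not the actual NodeManager localization code:
{code}
// Illustrative sketch only (not the real YARN classes): the thread that wins
// the lock also checks the resource state, so a resource that is already
// DOWNLOADING or LOCALIZED is never downloaded a second time.
enum ResourceState { INIT, DOWNLOADING, LOCALIZED }

class TrackedResource {
  private ResourceState state = ResourceState.INIT;

  // Only the single caller that flips INIT -> DOWNLOADING should download.
  synchronized boolean tryStartDownload() {
    if (state != ResourceState.INIT) {
      return false; // already in progress or done; do not download again
    }
    state = ResourceState.DOWNLOADING;
    return true;
  }

  synchronized void markLocalized() {
    state = ResourceState.LOCALIZED;
  }
}
{code}
Checking the state under the same lock that protects it is what removes the window in which a second request can slip in between the check and the download.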
 <li> <a href="https://issues.apache.org/jira/browse/YARN-542">YARN-542</a>.
      Major bug reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen <br>
      <b>Change the default global AM max-attempts value to be not one</b><br>
-     <blockquote>Today, the global AM max-attempts is set to 1 which is a bad choice. AM max-attempts accounts for both AM level failures as well as container crashes due to localization issue, lost nodes etc. To account for AM crashes due to problems that are not caused by user code, mainly lost nodes, we want to give AMs some retires.
-
+     <blockquote>Today, the global AM max-attempts is set to 1, which is a bad choice. AM max-attempts accounts for both AM-level failures and container crashes due to localization issues, lost nodes, etc. To account for AM crashes caused by problems not in user code, mainly lost nodes, we want to give AMs some retries.

+

 I propose we change it to at least two. We can change it to 4 to match other retry-configs.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-541">YARN-541</a>.
+     Blocker bug reported by Krishna Kishore Bonagiri and fixed by Bikas Saha (resourcemanager)<br>
+     <b>getAllocatedContainers() is not returning all the allocated containers</b><br>
+     <blockquote>I am running an application that was written and working well with hadoop-2.0.0-alpha, but when I run the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse sometimes does not return all the allocated containers. For example, I request 10 containers and this method sometimes gives me only 9, yet when I looked at the log of the Resource Manager, the 10th container was also allocated. It happens only sometimes, randomly, and works fine all other times. If I send one more request to the RM for the remaining container after it failed to give them the first time (and before releasing already acquired ones), it can allocate that container. I am running only one application at a time, but 1000s of them one after another.

+

+My main worry is that even though the RM's log says all 10 requested containers are allocated, the getAllocatedContainers() method is not returning all of them; it surprisingly returned only 9. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha.

+

+Thanks,

+Kishore

+

+ </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-539">YARN-539</a>.
      Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
      <b>LocalizedResources are leaked in memory in case resource localization fails</b><br>
-     <blockquote>If resource localization fails then resource remains in memory and is
-1) Either cleaned up when next time cache cleanup runs and there is space crunch. (If sufficient space in cache is available then it will remain in memory).
-2) reused if LocalizationRequest comes again for the same resource.
-
+     <blockquote>If resource localization fails, the resource remains in memory and is

+1) either cleaned up the next time cache cleanup runs and there is a space crunch (if sufficient space is available in the cache, it remains in memory), or

+2) reused if a LocalizationRequest comes again for the same resource.

+

 I think when resource localization fails, that event should be sent to the LocalResourceTracker, which will then remove it from its cache.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-538">YARN-538</a>.
      Major improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
      <b>RM address DNS lookup can cause unnecessary slowness on every JHS page load </b><br>
-     <blockquote>When I run the job history server locally, every page load takes in the 10s of seconds.  I profiled the process and discovered that all the extra time was spent inside YarnConfiguration#getRMWebAppURL, trying to resolve 0.0.0.0 to a hostname.  When I changed my yarn.resourcemanager.address to localhost, the page load times decreased drastically.
-
-There's no that we need to perform this resolution on every page load.
+     <blockquote>When I run the job history server locally, every page load takes tens of seconds.  I profiled the process and discovered that all the extra time was spent inside YarnConfiguration#getRMWebAppURL, trying to resolve 0.0.0.0 to a hostname.  When I changed my yarn.resourcemanager.address to localhost, the page load times decreased drastically.

+

+There's no reason we need to perform this resolution on every page load.

 </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-536">YARN-536</a>.
      Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
@@ -963,48 +1248,60 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-530">YARN-530</a>.
      Major sub-task reported by Steve Loughran and fixed by Steve Loughran <br>
      <b>Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services</b><br>
-     <blockquote># Extend the YARN {{Service}} interface as discussed in YARN-117
-# Implement the changes in {{AbstractService}} and {{FilterService}}.
-# Migrate all services in yarn-common to the more robust service model, test.
-
+     <blockquote># Extend the YARN {{Service}} interface as discussed in YARN-117

+# Implement the changes in {{AbstractService}} and {{FilterService}}.

+# Migrate all services in yarn-common to the more robust service model, test.

+

 </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-525">YARN-525</a>.
      Major improvement reported by Thomas Graves and fixed by Thomas Graves (capacityscheduler)<br>
      <b>make CS node-locality-delay refreshable</b><br>
      <blockquote>the config yarn.scheduler.capacity.node-locality-delay doesn't change when you change the value in capacity_scheduler.xml and then run yarn rmadmin -refreshQueues.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-523">YARN-523</a>.
+     Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Jian He <br>
+     <b>Container localization failures aren't reported from NM to RM</b><br>
+     <blockquote>This is mainly a pain on crashing AMs, but once we fix this, containers also can benefit - same fix for both.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-521">YARN-521</a>.
+     Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api)<br>
+     <b>Augment AM - RM client module to be able to request containers only at specific locations</b><br>
+     <blockquote>When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-518">YARN-518</a>.
      Major improvement reported by Dapeng Sun and fixed by Sandy Ryza (documentation)<br>
      <b>Fair Scheduler's document link could be added to the hadoop 2.x main doc page</b><br>
-     <blockquote>Currently the doc page for Fair Scheduler looks good and it&#8217;s here, http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html.
-It would be better to add the document link to the YARN section in the Hadoop 2.x main doc page, so that users can easily find the doc to experimentally try Fair Scheduler as Capacity Scheduler. 
+     <blockquote>Currently the doc page for Fair Scheduler looks good and it&#8217;s here, http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html.

+It would be better to add the document link to the YARN section of the Hadoop 2.x main doc page, so that users can find the doc and try the Fair Scheduler as easily as the Capacity Scheduler. 

 </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-515">YARN-515</a>.
      Blocker bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans <br>
      <b>Node Manager not getting the master key</b><br>
-     <blockquote>On branch-2 the latest version I see the following on a secure cluster.
-
-{noformat}
-2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Security enabled - updating secret keys now
-2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as RM:PORT with total resource of &lt;me
-mory:12288, vCores:16&gt;
-2013-03-28 19:21:06,244 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is started.
-2013-03-28 19:21:06,245 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started.
-2013-03-28 19:21:07,257 [Node Status Updater] ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught exception in status-updater
-java.lang.NullPointerException
-        at org.apache.hadoop.yarn.server.security.BaseContainerTokenSecretManager.getCurrentKey(BaseContainerTokenSecretManager.java:121)
-        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:407)
-{noformat}
-
+     <blockquote>On branch-2 the latest version I see the following on a secure cluster.

+

+{noformat}

+2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Security enabled - updating secret keys now

+2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as RM:PORT with total resource of &lt;me

+mory:12288, vCores:16&gt;

+2013-03-28 19:21:06,244 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is started.

+2013-03-28 19:21:06,245 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started.

+2013-03-28 19:21:07,257 [Node Status Updater] ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught exception in status-updater

+java.lang.NullPointerException

+        at org.apache.hadoop.yarn.server.security.BaseContainerTokenSecretManager.getCurrentKey(BaseContainerTokenSecretManager.java:121)

+        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:407)

+{noformat}

+

 The Null pointer exception just keeps repeating and all of the nodes end up being lost.  It looks like it never gets the secret key when it registers.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-514">YARN-514</a>.
      Major sub-task reported by Bikas Saha and fixed by Zhijie Shen (resourcemanager)<br>
      <b>Delayed store operations should not result in RM unavailability for app submission</b><br>
      <blockquote>Currently, app submission is the only store operation performed synchronously because the app must be stored before the request returns with success. This makes the RM susceptible to blocking all client threads on slow store operations, resulting in RM being perceived as unavailable by clients.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-513">YARN-513</a>.
+     Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)<br>
+     <b>Create common proxy client for communicating with RM</b><br>
+     <blockquote>When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-512">YARN-512</a>.
      Minor bug reported by Jason Lowe and fixed by Maysam Yabandeh (nodemanager)<br>
      <b>Log aggregation root directory check is more expensive than it needs to be</b><br>
-     <blockquote>The log aggregation root directory check first does an {{exists}} call followed by a {{getFileStatus}} call.  That effectively stats the file twice.  It should just use {{getFileStatus}} and catch {{FileNotFoundException}} to handle the non-existent case.
-
+     <blockquote>The log aggregation root directory check first does an {{exists}} call followed by a {{getFileStatus}} call.  That effectively stats the file twice.  It should just use {{getFileStatus}} and catch {{FileNotFoundException}} to handle the non-existent case.

+

 In addition we may consider caching the presence of the directory rather than checking it each time a node aggregates logs for an application.</blockquote></li>
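As a minimal sketch of the single-stat pattern suggested above (assuming the standard Hadoop FileSystem API; this is not the actual patch):
{code}
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class RemoteRootDirCheck {
  // One getFileStatus() call replaces the exists() + getFileStatus() pair,
  // so the remote directory is stat'ed only once per check.
  static FileStatus statRootDir(FileSystem fs, Path rootDir) throws IOException {
    try {
      return fs.getFileStatus(rootDir);
    } catch (FileNotFoundException e) {
      return null; // non-existent case handled without a second call
    }
  }
}
{code}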
 <li> <a href="https://issues.apache.org/jira/browse/YARN-507">YARN-507</a>.
      Minor bug reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)<br>
@@ -1021,10 +1318,10 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-496">YARN-496</a>.
      Minor bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
      <b>Fair scheduler configs are refreshed inconsistently in reinitialize</b><br>
-     <blockquote>When FairScheduler#reinitialize is called, some of the scheduler-wide configs are refreshed and others aren't.  They should all be refreshed.
-
-Ones that are refreshed: userAsDefaultQueue, nodeLocalityThreshold, rackLocalityThreshold, preemptionEnabled
-
+     <blockquote>When FairScheduler#reinitialize is called, some of the scheduler-wide configs are refreshed and others aren't.  They should all be refreshed.

+

+Ones that are refreshed: userAsDefaultQueue, nodeLocalityThreshold, rackLocalityThreshold, preemptionEnabled

+

 Ones that aren't: minimumAllocation, maximumAllocation, assignMultiple, maxAssign</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-495">YARN-495</a>.
      Major bug reported by Jian He and fixed by Jian He <br>
@@ -1057,14 +1354,14 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-485">YARN-485</a>.
      Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
      <b>TestProcfsProcessTree#testProcessTree() doesn't wait long enough for the process to die</b><br>
-     <blockquote>TestProcfsProcessTree#testProcessTree fails occasionally with the following stack trace
-
-{noformat}
-Stack Trace:
-junit.framework.AssertionFailedError: expected:&lt;false&gt; but was:&lt;true&gt;
-&#160; &#160; &#160; &#160; at org.apache.hadoop.util.TestProcfsBasedProcessTree.testProcessTree(TestProcfsBasedProcessTree.java)
-{noformat}
-
+     <blockquote>TestProcfsProcessTree#testProcessTree fails occasionally with the following stack trace

+

+{noformat}

+Stack Trace:

+junit.framework.AssertionFailedError: expected:&lt;false&gt; but was:&lt;true&gt;

+&#160; &#160; &#160; &#160; at org.apache.hadoop.util.TestProcfsBasedProcessTree.testProcessTree(TestProcfsBasedProcessTree.java)

+{noformat}

+

 kill -9 is executed asynchronously; the signal is delivered when the process comes out of the kernel (sys call). Checking whether the process has died immediately afterwards can therefore fail at times.</blockquote></li>
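A hedged illustration of the kind of wait the description implies; the helper below is hypothetical and not the actual test change:
{code}
// Poll until the process disappears from /proc or a timeout elapses, instead
// of asserting immediately after sending kill -9 (signal delivery is async).
static boolean waitForProcessExit(String pid, long timeoutMs) throws InterruptedException {
  long deadline = System.currentTimeMillis() + timeoutMs;
  java.io.File procDir = new java.io.File("/proc", pid);
  while (System.currentTimeMillis() < deadline) {
    if (!procDir.exists()) {
      return true;  // the process has actually exited
    }
    Thread.sleep(100);
  }
  return false;
}
{code}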
 <li> <a href="https://issues.apache.org/jira/browse/YARN-482">YARN-482</a>.
      Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)<br>
@@ -1073,11 +1370,11 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-481">YARN-481</a>.
      Major bug reported by Chris Riccomini and fixed by Chris Riccomini (client)<br>
      <b>Add AM Host and RPC Port to ApplicationCLI Status Output</b><br>
-     <blockquote>Hey Guys,
-
-I noticed that the ApplicationCLI is just randomly not printing some of the values in the ApplicationReport. I've added the getHost and getRpcPort. These are useful for me, since I want to make an RPC call to the AM (not the tracker call).
-
-Thanks!
+     <blockquote>Hey Guys,

+

+I noticed that the ApplicationCLI is just randomly not printing some of the values in the ApplicationReport. I've added the getHost and getRpcPort. These are useful for me, since I want to make an RPC call to the AM (not the tracker call).

+

+Thanks!

 Chris</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-479">YARN-479</a>.
      Major bug reported by Hitesh Shah and fixed by Jian He <br>
@@ -1086,16 +1383,16 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-476">YARN-476</a>.
      Minor bug reported by Jason Lowe and fixed by Sandy Ryza <br>
      <b>ProcfsBasedProcessTree info message confuses users</b><br>
-     <blockquote>ProcfsBasedProcessTree has a habit of emitting not-so-helpful messages such as the following:
-
-{noformat}
-2013-03-13 12:41:51,957 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28747 may have finished in the interim.
-2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28978 may have finished in the interim.
-2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28979 may have finished in the interim.
-{noformat}
-
-As described in MAPREDUCE-4570, this is something that naturally occurs in the process of monitoring processes via procfs.  It's uninteresting at best and can confuse users who think it's a reason their job isn't running as expected when it appears in their logs.
-
+     <blockquote>ProcfsBasedProcessTree has a habit of emitting not-so-helpful messages such as the following:

+

+{noformat}

+2013-03-13 12:41:51,957 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28747 may have finished in the interim.

+2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28978 may have finished in the interim.

+2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28979 may have finished in the interim.

+{noformat}

+

+As described in MAPREDUCE-4570, this is something that naturally occurs in the process of monitoring processes via procfs.  It's uninteresting at best and can confuse users who think it's a reason their job isn't running as expected when it appears in their logs.

+

 We should either make this DEBUG or remove it entirely.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-475">YARN-475</a>.
      Major sub-task reported by Hitesh Shah and fixed by Hitesh Shah <br>
@@ -1104,61 +1401,61 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-474">YARN-474</a>.
      Major bug reported by Hitesh Shah and fixed by Zhijie Shen (capacityscheduler)<br>
      <b>CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed</b><br>
-     <blockquote>Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. 
-
+     <blockquote>Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify the capacity scheduler config to increase the value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. 

+

 The 2 applications not yet in running state do not get launched even though limits are increased.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-469">YARN-469</a>.
      Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)<br>
      <b>Make scheduling mode in FS pluggable</b><br>
-     <blockquote>Currently, scheduling mode in FS is limited to Fair and FIFO. The code typically has an if condition at multiple places to determine the correct course of action.
-
+     <blockquote>Currently, scheduling mode in FS is limited to Fair and FIFO. The code typically has an if condition at multiple places to determine the correct course of action.

+

 Making the scheduling mode pluggable helps in simplifying this process, particularly as we add new modes (DRF in this case).</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-468">YARN-468</a>.
      Major sub-task reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov <br>
      <b>coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter </b><br>
-     <blockquote>coverage fix org.apache.hadoop.yarn.server.webproxy.amfilter
-
+     <blockquote>coverage fix org.apache.hadoop.yarn.server.webproxy.amfilter

+

 patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-467">YARN-467</a>.
      Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi (nodemanager)<br>
      <b>Jobs fail during resource localization when public distributed-cache hits unix directory limits</b><br>
-     <blockquote>If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception.
-
-java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed
-	at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
-	at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
-	at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
-	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
-	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
-	at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
-	at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
-	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
-	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
-	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
-	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
-	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
-	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
-	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
-	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
-	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
-	at java.lang.Thread.run(Thread.java:662)
-
+     <blockquote>If we have multiple jobs which use the distributed cache with small files, the directory limit is reached before the cache size limit, and no more directories can be created in the file cache (PUBLIC). The jobs start failing with the exception below.

+

+java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed

+	at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)

+	at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)

+	at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)

+	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)

+	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)

+	at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)

+	at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)

+	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)

+	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)

+	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

+	at java.util.concurrent.FutureTask.run(FutureTask.java:138)

+	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)

+	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

+	at java.util.concurrent.FutureTask.run(FutureTask.java:138)

+	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

+	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

+	at java.lang.Thread.run(Thread.java:662)

+

 We need a mechanism wherein we can create a directory hierarchy and limit the number of files per directory.</blockquote></li>
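As a rough, hypothetical sketch of such a mechanism (not the actual fix), a monotonically increasing cache id can be mapped to a nested sub-directory so that no single directory holds more than a fixed number of entries:
{code}
// Hypothetical sketch: spread localized resources across a directory
// hierarchy so that no single directory exceeds FILES_PER_DIR entries.
class CacheDirAllocator {
  private static final int FILES_PER_DIR = 36; // assumed per-directory cap

  // e.g. id 5 -> "5"; id 40 -> "1/4"; larger ids nest one level deeper
  static String relativePathForId(long id) {
    StringBuilder path = new StringBuilder(Long.toString(id % FILES_PER_DIR));
    id /= FILES_PER_DIR;
    while (id > 0) {
      path.insert(0, (id % FILES_PER_DIR) + "/");
      id /= FILES_PER_DIR;
    }
    return path.toString();
  }
}
{code}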
 <li> <a href="https://issues.apache.org/jira/browse/YARN-460">YARN-460</a>.
      Blocker bug reported by Thomas Graves and fixed by Thomas Graves (capacityscheduler)<br>
      <b>CS user left in list of active users for the queue even when application finished</b><br>
-     <blockquote>We have seen a user get left in the queues list of active users even though the application was removed. This can cause everyone else in the queue to get less resources if using the minimum user limit percent config.
-
+     <blockquote>We have seen a user get left in the queue's list of active users even though the application was removed. This can cause everyone else in the queue to get fewer resources if the minimum user limit percent config is used.

+

 </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-458">YARN-458</a>.
      Major bug reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager , resourcemanager)<br>
      <b>YARN daemon addresses must be placed in many different configs</b><br>
-     <blockquote>The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address
-
-A new user trying to configure a cluster needs to know the names of all these four configs.
-
-The same issue exists for nodemanagers.
-
+     <blockquote>The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address

+

+A new user trying to configure a cluster needs to know the names of all four of these configs.

+

+The same issue exists for nodemanagers.

+

 It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports for the other ones would kick in.</blockquote></li>
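To illustrate the before/after from a user's point of view, here is a hedged sketch; the host name is a placeholder, the port numbers are the commonly used 2.x defaults, and the single-hostname fallback is what this JIRA proposes rather than verified behavior:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

class RmAddressConfigs {
  public static void main(String[] args) {
    Configuration today = new YarnConfiguration();
    // Today: the RM host must be repeated across four separate keys.
    today.set("yarn.resourcemanager.address", "rm.example.com:8032");
    today.set("yarn.resourcemanager.scheduler.address", "rm.example.com:8030");
    today.set("yarn.resourcemanager.resource-tracker.address", "rm.example.com:8031");
    today.set("yarn.resourcemanager.admin.address", "rm.example.com:8033");

    Configuration proposed = new YarnConfiguration();
    // Proposed: a single hostname key, with each per-service address
    // falling back to <hostname>:<its default port>.
    proposed.set("yarn.resourcemanager.hostname", "rm.example.com");
  }
}
{code}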
 <li> <a href="https://issues.apache.org/jira/browse/YARN-450">YARN-450</a>.
      Major sub-task reported by Bikas Saha and fixed by Zhijie Shen <br>
@@ -1171,30 +1468,30 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-447">YARN-447</a>.
      Minor improvement reported by nemon lou and fixed by nemon lou (scheduler)<br>
      <b>applicationComparator improvement for CS</b><br>
-     <blockquote>Now the compare code is :
-return a1.getApplicationId().getId() - a2.getApplicationId().getId();
-
-Will be replaced with :
-return a1.getApplicationId().compareTo(a2.getApplicationId());
-
-This will bring some benefits:
-1,leave applicationId compare logic to ApplicationId class;
+     <blockquote>The current compare code is:

+return a1.getApplicationId().getId() - a2.getApplicationId().getId();

+

+Will be replaced with :

+return a1.getApplicationId().compareTo(a2.getApplicationId());

+

+This will bring some benefits:

+1. it leaves the applicationId compare logic to the ApplicationId class;

 2. in a future HA mode the cluster timestamp may change; the ApplicationId class already takes care of this condition.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-444">YARN-444</a>.
      Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api , applications/distributed-shell)<br>
      <b>Move special container exit codes from YarnConfiguration to API</b><br>
-     <blockquote>YarnConfiguration currently contains the special container exit codes INVALID_CONTAINER_EXIT_STATUS = -1000, ABORTED_CONTAINER_EXIT_STATUS = -100, and DISKS_FAILED = -101.
-
-These are not really not really related to configuration, and YarnConfiguration should not become a place to put miscellaneous constants.
-
+     <blockquote>YarnConfiguration currently contains the special container exit codes INVALID_CONTAINER_EXIT_STATUS = -1000, ABORTED_CONTAINER_EXIT_STATUS = -100, and DISKS_FAILED = -101.

+

+These are not really related to configuration, and YarnConfiguration should not become a place to put miscellaneous constants.

+

 Per discussion on YARN-417, appmaster writers need to be able to provide special handling for them, so it might make sense to move these to their own user-facing class.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-441">YARN-441</a>.
      Major sub-task reported by Siddharth Seth and fixed by Xuan Gong <br>
      <b>Clean up unused collection methods in various APIs</b><br>
-     <blockquote>There's a bunch of unused methods like getAskCount() and getAsk(index) in AllocateRequest, and other interfaces. These should be removed.
-
-In YARN, found them in. MR will have it's own set.
-AllocateRequest
+     <blockquote>There are a bunch of unused methods, like getAskCount() and getAsk(index), in AllocateRequest and other interfaces. These should be removed.

+

+In YARN, they were found in the following (MR will have its own set):

+AllocateRequest

 StartContainerResponse</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-440">YARN-440</a>.
      Major sub-task reported by Siddharth Seth and fixed by Xuan Gong <br>
@@ -1219,24 +1516,24 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-412">YARN-412</a>.
      Minor bug reported by Roger Hoover and fixed by Roger Hoover (scheduler)<br>
      <b>FifoScheduler incorrectly checking for node locality</b><br>
-     <blockquote>In the FifoScheduler, the assignNodeLocalContainers method is checking if the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app.  This seems to be incorrect as it should be checking hostname instead.  The offending line of code is 455:
-
-application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
-
-Requests are formated by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234)
-
-In the CapacityScheduler, it's done using hostname.  See LeafQueue.assignNodeLocalContainers, line 1129
-
-application.getResourceRequest(priority, node.getHostName());
-
+     <blockquote>In the FifoScheduler, the assignNodeLocalContainers method is checking if the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app.  This seems to be incorrect as it should be checking hostname instead.  The offending line of code is 455:

+

+application.getResourceRequest(priority, node.getRMNode().getNodeAddress());

+

+Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234)

+

+In the CapacityScheduler, it's done using hostname.  See LeafQueue.assignNodeLocalContainers, line 1129

+

+application.getResourceRequest(priority, node.getHostName());

+

 Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler because even though it incorrectly determines that a request is not local to the node, it will still schedule the request immediately because it's rack-local.  However, this bug may adversely affect the reporting of job status by underreporting the number of tasks that were node-local.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-410">YARN-410</a>.
      Major bug reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi <br>
      <b>New lines in diagnostics for a failed app on the per-application page make it hard to read</b><br>
-     <blockquote>We need to fix the following issues on YARN web-UI:
- - Remove the "Note" column from the application list. When a failure happens, this "Note" spoils the table layout.
- - When the Application is still not running, the Tracking UI should be title "UNASSIGNED", for some reason it is titled "ApplicationMaster" but (correctly) links to "#".
- - The per-application page has all the RM related information like version, start-time etc. Must be some accidental change by one of the patches.
+     <blockquote>We need to fix the following issues on YARN web-UI:

+ - Remove the "Note" column from the application list. When a failure happens, this "Note" spoils the table layout.

+ - When the Application is still not running, the Tracking UI should be titled "UNASSIGNED"; for some reason it is titled "ApplicationMaster" but (correctly) links to "#".

+ - The per-application page has all the RM-related information like version, start-time, etc. This must be an accidental change from one of the patches.

  - The diagnostics for a failed app on the per-application page don't retain new lines and wrap them around, which makes them hard to read.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-406">YARN-406</a>.
      Minor improvement reported by Hitesh Shah and fixed by Hitesh Shah <br>
@@ -1265,13 +1562,13 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-390">YARN-390</a>.
      Major bug reported by Chris Nauroth and fixed by Chris Nauroth (client)<br>
      <b>ApplicationCLI and NodeCLI use hard-coded platform-specific line separator, which causes test failures on Windows</b><br>
-     <blockquote>{{ApplicationCLI}}, {{NodeCLI}}, and the corresponding test {{TestYarnCLI}} all use a hard-coded '\n' as the line separator.  This causes test failures on Windows.
+     <blockquote>{{ApplicationCLI}}, {{NodeCLI}}, and the corresponding test {{TestYarnCLI}} all use a hard-coded '\n' as the line separator.  This causes test failures on Windows.

 </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-387">YARN-387</a>.
      Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
      <b>Fix inconsistent protocol naming</b><br>
-     <blockquote>We now have different and inconsistent naming schemes for various protocols. It was hard to explain to users, mainly in direct interactions at talks/presentations and user group meetings, with such naming.
-
+     <blockquote>We now have different and inconsistent naming schemes for various protocols. It was hard to explain to users, mainly in direct interactions at talks/presentations and user group meetings, with such naming.

+

 We should fix these before we go beta. </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-385">YARN-385</a>.
      Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (api)<br>
@@ -1280,61 +1577,61 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-383">YARN-383</a>.
      Minor bug reported by Hitesh Shah and fixed by Hitesh Shah <br>
      <b>AMRMClientImpl should handle null rmClient in stop()</b><br>
-     <blockquote>2013-02-06 09:31:33,813 INFO  [Thread-2] service.CompositeService (CompositeService.java:stop(101)) - Error stopping org.apache.hadoop.yarn.client.AMRMClientImpl
-org.apache.hadoop.HadoopIllegalArgumentException: Cannot close proxy since it is null
-        at org.apache.hadoop.ipc.RPC.stopProxy(RPC.java:605)
-        at org.apache.hadoop.yarn.client.AMRMClientImpl.stop(AMRMClientImpl.java:150)
-        at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
-        at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
+     <blockquote>2013-02-06 09:31:33,813 INFO  [Thread-2] service.CompositeService (CompositeService.java:stop(101)) - Error stopping org.apache.hadoop.yarn.client.AMRMClientImpl

+org.apache.hadoop.HadoopIllegalArgumentException: Cannot close proxy since it is null

+        at org.apache.hadoop.ipc.RPC.stopProxy(RPC.java:605)

+        at org.apache.hadoop.yarn.client.AMRMClientImpl.stop(AMRMClientImpl.java:150)

+        at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)

+        at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)

 </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-382">YARN-382</a>.
      Major improvement reported by Thomas Graves and fixed by Zhijie Shen (scheduler)<br>
      <b>SchedulerUtils improve way normalizeRequest sets the resource capabilities</b><br>
-     <blockquote>In YARN-370, we changed it from setting the capability to directly setting memory and cores:
-
--    ask.setCapability(normalized);
-+    ask.getCapability().setMemory(normalized.getMemory());
-+    ask.getCapability().setVirtualCores(normalized.getVirtualCores());
-
-We did this because it is directly setting the values in the original resource object passed in when the AM gets allocated and without it the AM doesn't get the resource normalized correctly in the submission context. See YARN-370 for more details.
-
+     <blockquote>In YARN-370, we changed it from setting the capability to directly setting memory and cores:

+

+-    ask.setCapability(normalized);

++    ask.getCapability().setMemory(normalized.getMemory());

++    ask.getCapability().setVirtualCores(normalized.getVirtualCores());

+

+We did this because it directly sets the values in the original resource object passed in when the AM gets allocated, and without it the AM doesn't get the resource normalized correctly in the submission context. See YARN-370 for more details.

+

 I think we should find a better way of doing this long term: one, so we don't have to keep adding things there when new resources are added; two, because it's a bit confusing as to what it's doing and prone to someone accidentally breaking it in the future again.  Something closer to what Arun suggested in YARN-370 would be better, but we need to make sure all the places work and get some more testing on it before putting it in. </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-381">YARN-381</a>.
      Minor improvement reported by Eli Collins and fixed by Sandy Ryza (documentation)<br>
      <b>Improve FS docs</b><br>
-     <blockquote>The MR2 FS docs could use some improvements.
-
-Configuration:
-- sizebasedweight - what is the "size" here? Total memory usage?
-
-Pool properties:
-- minResources - what does min amount of aggregate memory mean given that this is not a reservation?
-- maxResources - is this a hard limit?
-- weight: How is this  ratio configured?  Eg base is 1 and all weights are relative to that?
-- schedulingMode - what is the default? Is fifo pure fifo, eg waits until all tasks for the job are finished before launching the next job?
-
-There's no mention of ACLs, even though they're supported. See the CS docs for comparison.
-
-Also there are a couple typos worth fixing while we're at it, eg "finish. apps to run"
-
+     <blockquote>The MR2 FS docs could use some improvements.

+

+Configuration:

+- sizebasedweight - what is the "size" here? Total memory usage?

+

+Pool properties:

+- minResources - what does min amount of aggregate memory mean given that this is not a reservation?

+- maxResources - is this a hard limit?

+- weight: How is this  ratio configured?  Eg base is 1 and all weights are relative to that?

+- schedulingMode - what is the default? Is fifo pure fifo, eg waits until all tasks for the job are finished before launching the next job?

+

+There's no mention of ACLs, even though they're supported. See the CS docs for comparison.

+

+Also there are a couple typos worth fixing while we're at it, eg "finish. apps to run"

+

 Worth keeping in mind that some of these will need to be updated to reflect that resource calculators are now pluggable.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-380">YARN-380</a>.
      Major bug reported by Thomas Graves and fixed by Omkar Vinit Joshi (client)<br>
      <b>yarn node -status prints Last-Last-Health-Update</b><br>
-     <blockquote>I assume the Last-Last-Health-Update is a typo and it should just be Last-Health-Update.
-
-
-$ yarn node -status foo.com:8041
-Node Report : 
-        Node-Id : foo.com:8041
-        Rack : /10.10.10.0
-        Node-State : RUNNING
-        Node-Http-Address : foo.com:8042
-        Health-Status(isNodeHealthy) : true
-        Last-Last-Health-Update : 1360118400219
-        Health-Report : 
-        Containers : 0
-        Memory-Used : 0M
+     <blockquote>I assume the Last-Last-Health-Update is a typo and it should just be Last-Health-Update.

+

+

+$ yarn node -status foo.com:8041

+Node Report : 

+        Node-Id : foo.com:8041

+        Rack : /10.10.10.0

+        Node-State : RUNNING

+        Node-Http-Address : foo.com:8042

+        Health-Status(isNodeHealthy) : true

+        Last-Last-Health-Update : 1360118400219

+        Health-Report : 

+        Containers : 0

+        Memory-Used : 0M

         Memory-Capacity : 24576</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-378">YARN-378</a>.
      Major sub-task reported by xieguiming and fixed by Zhijie Shen (client , resourcemanager)<br>
@@ -1343,128 +1640,166 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-377">YARN-377</a>.
      Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Chris Nauroth <br>
      <b>Fix TestContainersMonitor for HADOOP-9252</b><br>
-     <blockquote>HADOOP-9252 slightly changed the format of some StringUtils outputs.  It caused TestContainersMonitor to fail.
-
+     <blockquote>HADOOP-9252 slightly changed the format of some StringUtils outputs.  It caused TestContainersMonitor to fail.

+

 Also, some methods were deprecated by HADOOP-9252.  The use of them should be replaced with the new methods.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-376">YARN-376</a>.
      Blocker bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
      <b>Apps that have completed can appear as RUNNING on the NM UI</b><br>
      <blockquote>On a busy cluster we've noticed a growing number of applications appear as RUNNING on a nodemanager web pages but the applications have long since finished.  Looking at the NM logs, it appears the RM never told the nodemanager that the application had finished.  This is also reflected in a jstack of the NM process, since many more log aggregation threads are running then one would expect from the number of actively running applications.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-369">YARN-369</a>.
+     Major sub-task reported by Hitesh Shah and fixed by Mayank Bansal (resourcemanager)<br>
+     <b>Handle ( or throw a proper error when receiving) status updates from application masters that have not registered</b><br>
+     <blockquote>Currently, an allocate call from an unregistered application is allowed and the status update for it throws a statemachine error that is silently dropped.

+

+org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED

+       at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)

+       at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)

+       at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)

+       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588)

+       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)

+       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471)

+       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452)

+       at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)

+       at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)

+       at java.lang.Thread.run(Thread.java:680)

+

+

+ApplicationMasterService should likely throw an appropriate error for applications' requests that should not be handled in such cases.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-368">YARN-368</a>.
+     Trivial bug reported by Albert Chu and fixed by Albert Chu <br>
+     <b>Fix typo "defiend" should be "defined" in error output</b><br>
+     <blockquote>Noticed the following in an error log output while doing some experiments

+

+./1066018/nodes/hyperion987/log/yarn-achu-nodemanager-hyperion987.out:java.lang.RuntimeException: No class defiend for uda.shuffle

+

+"defiend" should be "defined"

+</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-365">YARN-365</a>.
      Major sub-task reported by Siddharth Seth and fixed by Xuan Gong (resourcemanager , scheduler)<br>
      <b>Each NM heartbeat should not generate an event for the Scheduler</b><br>
-     <blockquote>Follow up from YARN-275
+     <blockquote>Follow up from YARN-275

 https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-363">YARN-363</a>.
      Major bug reported by Jason Lowe and fixed by Kenji Kikushima <br>
      <b>yarn proxyserver fails to find webapps/proxy directory on startup</b><br>
-     <blockquote>Starting up the proxy server fails with this error:
-
-{noformat}
-2013-01-29 17:37:41,357 FATAL webproxy.WebAppProxy (WebAppProxy.java:start(99)) - Could not start proxy web server
-java.io.FileNotFoundException: webapps/proxy not found in CLASSPATH
-	at org.apache.hadoop.http.HttpServer.getWebAppsPath(HttpServer.java:533)
-	at org.apache.hadoop.http.HttpServer.&lt;init&gt;(HttpServer.java:225)
-	at org.apache.hadoop.http.HttpServer.&lt;init&gt;(HttpServer.java:164)
-	at org.apache.hadoop.yarn.server.webproxy.WebAppProxy.start(WebAppProxy.java:90)
-	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
-	at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer.main(WebAppProxyServer.java:94)
-{noformat}
+     <blockquote>Starting up the proxy server fails with this error:

+

+{noformat}

+2013-01-29 17:37:41,357 FATAL webproxy.WebAppProxy (WebAppProxy.java:start(99)) - Could not start proxy web server

+java.io.FileNotFoundException: webapps/proxy not found in CLASSPATH

+	at org.apache.hadoop.http.HttpServer.getWebAppsPath(HttpServer.java:533)

+	at org.apache.hadoop.http.HttpServer.&lt;init&gt;(HttpServer.java:225)

+	at org.apache.hadoop.http.HttpServer.&lt;init&gt;(HttpServer.java:164)

+	at org.apache.hadoop.yarn.server.webproxy.WebAppProxy.start(WebAppProxy.java:90)

+	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)

+	at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer.main(WebAppProxyServer.java:94)

+{noformat}

 </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-362">YARN-362</a>.
      Minor bug reported by Jason Lowe and fixed by Ravi Prakash <br>
      <b>Unexpected extra results when using webUI table search</b><br>
-     <blockquote>When using the search box on the web UI to search for a specific task number (e.g.: "0831"), sometimes unexpected extra results are shown.  Using the web browser's built-in search-within-page does not show any hits, so these look like completely spurious results.
-
+     <blockquote>When using the search box on the web UI to search for a specific task number (e.g.: "0831"), sometimes unexpected extra results are shown.  Using the web browser's built-in search-within-page does not show any hits, so these look like completely spurious results.

+

 It looks like the raw timestamp value for time columns, which is not shown in the table, is also being searched with the search box.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-347">YARN-347</a>.
+     Major improvement reported by Junping Du and fixed by Junping Du (client)<br>
+     <b>YARN CLI should show CPU info besides memory info in node status</b><br>
+     <blockquote>With YARN-2 checked in, CPU info is taken into consideration in resource scheduling. yarn node -status &lt;NodeID&gt; should show CPU used and capacity info, as it does memory info.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-345">YARN-345</a>.
      Critical bug reported by Devaraj K and fixed by Robert Parker (nodemanager)<br>
      <b>Many InvalidStateTransitonException errors for ApplicationImpl in Node Manager</b><br>
-     <blockquote>{code:xml}
-org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED
-	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
-	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
-	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
-	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
-	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
-	at java.lang.Thread.run(Thread.java:662)
-{code}
-{code:xml}
-2013-01-17 04:03:46,726 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
-org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at APPLICATION_RESOURCES_CLEANINGUP
-	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
-	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
-	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
-	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
-	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
-	at java.lang.Thread.run(Thread.java:662)
-{code}
-{code:xml}
-2013-01-17 00:01:11,006 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
-org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHING_CONTAINERS_WAIT
-	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
-	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
-	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
-	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
-	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
-	at java.lang.Thread.run(Thread.java:662)
-{code}
-{code:xml}
-
-2013-01-17 10:56:36,975 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1358385982671_1304_01_000001 transitioned from NEW to DONE
-2013-01-17 10:56:36,975 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
-org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_CONTAINER_FINISHED at FINISHED
-	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
-	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
-	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
-	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
-	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
-	at java.lang.Thread.run(Thread.java:662)
-2013-01-17 10:56:36,975 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1358385982671_1304 transitioned from FINISHED to null
-{code}
-{code:xml}
-
-2013-01-17 10:56:36,026 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
-org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: INIT_CONTAINER at FINISHED
-	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
-	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
-	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
-	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
-	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
-	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
-	at java.lang.Thread.run(Thread.java:662)
-2013-01-17 10:56:36,026 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1358385982671_1304 transitioned from FINISHED to null
-{code}
+     <blockquote>{code:xml}

+org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED

+	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)

+	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)

+	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)

+	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)

+	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)

+	at java.lang.Thread.run(Thread.java:662)

+{code}

+{code:xml}

+2013-01-17 04:03:46,726 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state

+org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at APPLICATION_RESOURCES_CLEANINGUP

+	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)

+	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)

+	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)

+	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)

+	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)

+	at java.lang.Thread.run(Thread.java:662)

+{code}

+{code:xml}

+2013-01-17 00:01:11,006 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state

+org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHING_CONTAINERS_WAIT

+	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)

+	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)

+	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)

+	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)

+	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)

+	at java.lang.Thread.run(Thread.java:662)

+{code}

+{code:xml}

+

+2013-01-17 10:56:36,975 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1358385982671_1304_01_000001 transitioned from NEW to DONE

+2013-01-17 10:56:36,975 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state

+org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_CONTAINER_FINISHED at FINISHED

+	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)

+	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)

+	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)

+	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)

+	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)

+	at java.lang.Thread.run(Thread.java:662)

+2013-01-17 10:56:36,975 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1358385982671_1304 transitioned from FINISHED to null

+{code}

+{code:xml}

+

+2013-01-17 10:56:36,026 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state

+org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: INIT_CONTAINER at FINISHED

+	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)

+	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)

+	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)

+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)

+	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)

+	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)

+	at java.lang.Thread.run(Thread.java:662)

+2013-01-17 10:56:36,026 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1358385982671_1304 transitioned from FINISHED to null

+{code}

 </blockquote></li>
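+The InvalidStateTransitonException traces in the YARN-345 entry above (and the similar YARN-295 entry below) all boil down to the same thing: an event arrives in a state for which the ApplicationImpl state machine has no registered transition. The toy sketch below reproduces that failure shape in plain Java; it deliberately does not use YARN's real StateMachineFactory, and the class, states and transition table are made up for illustration only.
+{code}
+// Toy illustration: an event arriving in a state with no registered transition
+// produces exactly the "Invalid event: X at Y" failure seen in the traces above.
+public class ToyAppStateMachine {
+  enum State { RUNNING, FINISHED }
+  enum Event { FINISH_APPLICATION, APPLICATION_CONTAINER_FINISHED }
+
+  private State current = State.RUNNING;
+
+  void handle(Event event) {
+    switch (current) {
+      case RUNNING:
+        if (event == Event.FINISH_APPLICATION) {
+          current = State.FINISHED;
+          return;
+        }
+        break;
+      case FINISHED:
+        break;  // no transitions registered for FINISHED in this toy table
+    }
+    // The real dispatcher logs "Can't handle this event at current state" and throws.
+    throw new IllegalStateException("Invalid event: " + event + " at " + current);
+  }
+
+  public static void main(String[] args) {
+    ToyAppStateMachine sm = new ToyAppStateMachine();
+    sm.handle(Event.FINISH_APPLICATION);  // RUNNING -&gt; FINISHED, fine
+    sm.handle(Event.FINISH_APPLICATION);  // throws: nothing registered at FINISHED
+  }
+}
+{code}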
+<li> <a href="https://issues.apache.org/jira/browse/YARN-333">YARN-333</a>.
+     Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
+     <b>Schedulers cannot control the queue-name of an application</b><br>
+     <blockquote>Currently, if an app is submitted without a queue, RMAppManager sets the RMApp's queue to "default".

+

+A scheduler may wish to make its own decision on which queue to place an app in if none is specified. For example, when the fair scheduler user-as-default-queue config option is set to true, and an app is submitted with no queue specified, the fair scheduler should assign the app to a queue with the user's name.</blockquote></li>
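+As a small illustration of the behaviour YARN-333 describes, the snippet below toggles the fair scheduler's user-as-default-queue option through a plain Hadoop Configuration object. The key yarn.scheduler.fair.user-as-default-queue is the standard property name, but treat the snippet as a sketch of where the setting lives rather than a recommended way to configure a cluster (that is normally done in yarn-site.xml).
+{code}
+import org.apache.hadoop.conf.Configuration;
+
+// Illustrative only: with user-as-default-queue enabled, a job submitted without
+// an explicit queue lands in a queue named after the submitting user rather than
+// in the literal "default" queue.
+public class FairSchedulerQueueDefaults {
+  public static void main(String[] args) {
+    Configuration conf = new Configuration();
+    conf.setBoolean("yarn.scheduler.fair.user-as-default-queue", true);
+    System.out.println("user-as-default-queue = "
+        + conf.getBoolean("yarn.scheduler.fair.user-as-default-queue", false));
+  }
+}
+{code}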
 <li> <a href="https://issues.apache.org/jira/browse/YARN-326">YARN-326</a>.
      Major new feature reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
      <b>Add multi-resource scheduling to the fair scheduler</b><br>
-     <blockquote>With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness.  The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness.
-
+     <blockquote>With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness.  The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness.

+

 More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled.</blockquote></li>
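+For readers unfamiliar with dominant resource fairness, the sketch below shows the textbook definition that both schedulers mentioned in YARN-326 build on: an application's dominant share is its largest usage-to-capacity ratio across resource types, and the application with the smallest dominant share is served next. All names and numbers are illustrative; this is not the fair scheduler's actual code.
+{code}
+// Dominant resource fairness in one method: an application's dominant share is
+// the largest ratio of its usage to cluster capacity across all resource types,
+// and a DRF scheduler serves the application with the smallest dominant share next.
+public class DominantShareSketch {
+
+  static double dominantShare(long usedMemMb, long usedVcores,
+                              long clusterMemMb, long clusterVcores) {
+    double memShare = (double) usedMemMb / clusterMemMb;
+    double cpuShare = (double) usedVcores / clusterVcores;
+    return Math.max(memShare, cpuShare);
+  }
+
+  public static void main(String[] args) {
+    long clusterMemMb = 100 * 1024;
+    long clusterVcores = 64;
+    // App A is memory-heavy, app B is CPU-heavy.
+    double a = dominantShare(40 * 1024, 4, clusterMemMb, clusterVcores);
+    double b = dominantShare(8 * 1024, 32, clusterMemMb, clusterVcores);
+    System.out.printf("A=%.2f B=%.2f; the smaller dominant share (%.2f) is served next%n",
+        a, b, Math.min(a, b));
+  }
+}
+{code}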
 <li> <a href="https://issues.apache.org/jira/browse/YARN-319">YARN-319</a>.
      Major bug reported by shenhong and fixed by shenhong (resourcemanager , scheduler)<br>
      <b>Submitting a job to a queue that is not allowed in fairScheduler causes the client to hang forever.</b><br>
-     <blockquote>RM use fairScheduler, when client submit a job to a queue, but the queue do not allow the user to submit job it, in this case, client  will hold forever.
+     <blockquote>When the RM uses the fairScheduler and a client submits a job to a queue that does not allow that user to submit jobs, the client will hang forever.

 </blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-309">YARN-309</a>.
      Major sub-task reported by Xuan Gong and fixed by Xuan Gong (resourcemanager)<br>
@@ -1474,6 +1809,23 @@
      Major improvement reported by Arun C Murthy and fixed by Xuan Gong <br>
      <b>Improve hashCode implementations for PB records</b><br>
      <blockquote>As [~hsn] pointed out in YARN-2, we use very small primes in all our hashCode implementations.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-295">YARN-295</a>.
+     Major sub-task reported by Devaraj K and fixed by Mayank Bansal (resourcemanager)<br>
+     <b>Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl</b><br>
+     <blockquote>{code:xml}

+2012-12-28 14:03:56,956 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state

+org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED

+	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)

+	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)

+	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)

+	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)

+	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)

+	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)

+	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)

+	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)

+	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)

+	at java.lang.Thread.run(Thread.java:662)

+{code}</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-289">YARN-289</a>.
      Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
      <b>Fair scheduler allows reservations that won't fit on node</b><br>
@@ -1489,13 +1841,13 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-237">YARN-237</a>.
      Major improvement reported by Ravi Prakash and fixed by Jian He (resourcemanager)<br>
      <b>Refreshing the RM page forgets how many rows I had in my Datatables</b><br>
-     <blockquote>If I choose a 100 rows, and then refresh the page, DataTables goes back to showing me 20 rows.
+     <blockquote>If I choose 100 rows and then refresh the page, DataTables goes back to showing me 20 rows.

 This user preference should be stored in a cookie.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-236">YARN-236</a>.
      Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
      <b>RM should point tracking URL to RM web page when app fails to start</b><br>
-     <blockquote>Similar to YARN-165, the RM should redirect the tracking URL to the specific app page on the RM web UI when the application fails to start.  For example, if the AM completely fails to start due to bad AM config or bad job config like invalid queuename, then the user gets the unhelpful "The requested application exited before setting a tracking URL".
-
+     <blockquote>Similar to YARN-165, the RM should redirect the tracking URL to the specific app page on the RM web UI when the application fails to start.  For example, if the AM completely fails to start due to a bad AM config or a bad job config such as an invalid queue name, then the user gets the unhelpful "The requested application exited before setting a tracking URL".

+

 Usually the diagnostic string on the RM app page has something useful, so we might as well point there.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-227">YARN-227</a>.
      Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
@@ -1504,80 +1856,80 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-209">YARN-209</a>.
      Major bug reported by Bikas Saha and fixed by Zhijie Shen (capacityscheduler)<br>
      <b>Capacity scheduler doesn't trigger app-activation after adding nodes</b><br>
-     <blockquote>Say application A is submitted but at that time it does not meet the bar for activation because of resource limit settings for applications. After that if more hardware is added to the system and the application becomes valid it still remains in pending state, likely forever.
+     <blockquote>Say application A is submitted but at that time it does not meet the bar for activation because of resource limit settings for applications. After that, if more hardware is added to the system and the application becomes valid, it still remains in the pending state, likely forever.

 This might be rare to hit in real life because enough NMs heartbeat to the RM before applications can get submitted. But a change in settings or heartbeat interval might make it easier to reproduce. In RM restart scenarios, this will likely hit more often if it's implemented by re-playing events and re-submitting applications to the scheduler before the RPC to NMs is activated.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-200">YARN-200</a>.
      Major sub-task reported by Robert Joseph Evans and fixed by Ravi Prakash <br>
      <b>yarn log does not output all needed information, and is in a binary format</b><br>
-     <blockquote>yarn logs does not output attemptid, nodename, or container-id.  Missing these makes it very difficult to look through the logs for failed containers and tie them back to actual tasks and task attempts.
-
-Also the output currently includes several binary characters.  This is OK for being machine readable, but difficult for being human readable, or even for using standard tool like grep.
-
+     <blockquote>yarn logs does not output attemptid, nodename, or container-id.  Missing these makes it very difficult to look through the logs for failed containers and tie them back to actual tasks and task attempts.

+

+Also the output currently includes several binary characters.  This is OK for machine readability, but difficult for human readability, or even for using standard tools like grep.

+

 The help message could also be made more useful to users.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-198">YARN-198</a>.
      Minor improvement reported by Ramgopal N and fixed by Jian He (nodemanager)<br>
      <b>If we are navigating to the Nodemanager UI from the Resourcemanager, there is no link to navigate back to the Resourcemanager</b><br>
-     <blockquote>If we are navigating to Nodemanager by clicking on the node link in RM,there is no link provided on the NM to navigate back to RM.
+     <blockquote>If we navigate to the NodeManager by clicking on the node link in the RM, there is no link provided on the NM to navigate back to the RM.

  It would be good to have a link to navigate back to the RM.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-196">YARN-196</a>.
      Major bug reported by Ramgopal N and fixed by Xuan Gong (nodemanager)<br>
      <b>Nodemanager should be more robust in handling connection failure  to ResourceManager when a cluster is started</b><br>
-     <blockquote>If NM is started before starting the RM ,NM is shutting down with the following error
-{code}
-ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager
-org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException
-	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)
-	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
-	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)
-	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)
-Caused by: java.lang.reflect.UndeclaredThrowableException
-	at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
-	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
-	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
-	... 3 more
-Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
-	at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
-	at $Proxy23.registerNodeManager(Unknown Source)
-	at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
-	... 5 more
-Caused by: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
-	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
-	at org.apache.hadoop.ipc.Client.call(Client.java:1141)
-	at org.apache.hadoop.ipc.Client.call(Client.java:1100)
-	at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)
-	... 7 more
-Caused by: java.net.ConnectException: Connection refused
-	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
-	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
-	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
-	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659)
-	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469)
-	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563)
-	at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211)
-	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1247)
-	at org.apache.hadoop.ipc.Client.call(Client.java:1117)
-	... 9 more
-2012-01-16 15:04:13,336 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted
-java.lang.InterruptedException
-	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
-	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
-	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
-	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
-	at java.lang.Thread.run(Thread.java:619)
-2012-01-16 15:04:13,337 INFO org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped.
-2012-01-16 15:04:13,392 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:9999
-2012-01-16 15:04:13,493 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.
-2012-01-16 15:04:13,493 INFO org.apache.hadoop.ipc.Server: Stopping server on 24290
-2012-01-16 15:04:13,494 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 24290
-2012-01-16 15:04:13,495 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
-2012-01-16 15:04:13,496 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler is stopped.
-2012-01-16 15:04:13,496 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted
-java.lang.InterruptedException
-	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
-	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
-	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
-	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
-	at java.lang.Thread.run(Thread.java:619)
+     <blockquote>If the NM is started before the RM, the NM shuts down with the following error:

+{code}

+ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager

+org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException

+	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)

+	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)

+	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)

+	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)

+Caused by: java.lang.reflect.UndeclaredThrowableException

+	at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)

+	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)

+	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)

+	... 3 more

+Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

+	at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)

+	at $Proxy23.registerNodeManager(Unknown Source)

+	at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)

+	... 5 more

+Caused by: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

+	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)

+	at org.apache.hadoop.ipc.Client.call(Client.java:1141)

+	at org.apache.hadoop.ipc.Client.call(Client.java:1100)

+	at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)

+	... 7 more

+Caused by: java.net.ConnectException: Connection refused

+	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

+	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)

+	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)

+	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659)

+	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469)

+	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563)

+	at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211)

+	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1247)

+	at org.apache.hadoop.ipc.Client.call(Client.java:1117)

+	... 9 more

+2012-01-16 15:04:13,336 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted

+java.lang.InterruptedException

+	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)

+	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)

+	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)

+	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)

+	at java.lang.Thread.run(Thread.java:619)

+2012-01-16 15:04:13,337 INFO org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped.

+2012-01-16 15:04:13,392 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:9999

+2012-01-16 15:04:13,493 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.

+2012-01-16 15:04:13,493 INFO org.apache.hadoop.ipc.Server: Stopping server on 24290

+2012-01-16 15:04:13,494 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 24290

+2012-01-16 15:04:13,495 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder

+2012-01-16 15:04:13,496 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler is stopped.

+2012-01-16 15:04:13,496 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted

+java.lang.InterruptedException

+	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)

+	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)

+	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)

+	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)

+	at java.lang.Thread.run(Thread.java:619)

 {code}</blockquote></li>
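+The thrust of the YARN-196 fix is that the NodeManager should keep retrying registration instead of dying on the first ConnectException. The sketch below shows that retry-with-backoff pattern in isolation; Registrar, registerWithRM() and the numbers are stand-ins, not the actual NodeStatusUpdaterImpl code.
+{code}
+import java.io.IOException;
+import java.net.ConnectException;
+
+// Sketch of retry-with-backoff for NM registration: instead of shutting down on
+// the first refused connection, sleep and try again, giving up only after a
+// configured number of attempts. Registrar is a stand-in for the real updater.
+public class RegistrationRetrySketch {
+
+  interface Registrar {
+    void registerWithRM() throws IOException;
+  }
+
+  static void registerWithRetries(Registrar registrar, int maxAttempts)
+      throws IOException, InterruptedException {
+    long backoffMs = 1000;
+    for (int attempt = 1; ; attempt++) {
+      try {
+        registrar.registerWithRM();
+        return;                                   // registered successfully
+      } catch (ConnectException e) {
+        if (attempt == maxAttempts) {
+          throw e;                                // give up only on the last attempt
+        }
+        Thread.sleep(backoffMs);
+        backoffMs = Math.min(backoffMs * 2, 30000L);  // capped exponential backoff
+      }
+    }
+  }
+}
+{code}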
 <li> <a href="https://issues.apache.org/jira/browse/YARN-193">YARN-193</a>.
      Major bug reported by Hitesh Shah and fixed by Zhijie Shen (resourcemanager)<br>
@@ -1586,10 +1938,10 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-142">YARN-142</a>.
      Blocker task reported by Siddharth Seth and fixed by  <br>
      <b>[Umbrella] Cleanup YARN APIs w.r.t exceptions</b><br>
-     <blockquote>Ref: MAPREDUCE-4067
-
-All YARN APIs currently throw YarnRemoteException.
-1) This cannot be extended in it's current form.
+     <blockquote>Ref: MAPREDUCE-4067

+

+All YARN APIs currently throw YarnRemoteException.

+1) This cannot be extended in its current form.

 2) The RPC layer can throw IOExceptions. These end up showing up as UndeclaredThrowableExceptions.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-125">YARN-125</a>.
      Minor sub-task reported by Steve Loughran and fixed by Steve Loughran <br>
@@ -1598,75 +1950,75 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-124">YARN-124</a>.
      Minor sub-task reported by Steve Loughran and fixed by Steve Loughran <br>
      <b>Make Yarn Node Manager services robust against shutdown</b><br>
-     <blockquote>Add the nodemanager bits of MAPREDUCE-3502 to shut down the Nodemanager services. This is done by checking for fields being non-null before shutting down/closing etc, and setting the fields to null afterwards -to be resilient against re-entrancy.
-
+     <blockquote>Add the nodemanager bits of MAPREDUCE-3502 to shut down the Nodemanager services. This is done by checking that fields are non-null before shutting down/closing etc., and setting the fields to null afterwards, to be resilient against re-entrancy.

+

 No tests other than manual review.</blockquote></li>
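+A minimal sketch of the "null-check before closing, then null the field" pattern that the YARN-124 description refers to; the class and field below are invented for illustration and are not the actual NodeManager services.
+{code}
+import java.io.Closeable;
+import java.io.IOException;
+
+// Re-entrancy-safe shutdown: stop() may run before start() or more than once,
+// so each field is null-checked before use and set to null afterwards.
+public class RobustService {
+  private Closeable resource;              // may still be null if start() never ran
+
+  public synchronized void start() throws IOException {
+    // ... acquire and assign resource here ...
+  }
+
+  public synchronized void stop() {
+    if (resource != null) {
+      try {
+        resource.close();
+      } catch (IOException ignored) {
+        // best-effort shutdown; real code would log this
+      }
+      resource = null;                     // a second stop() is now a no-op
+    }
+  }
+}
+{code}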
 <li> <a href="https://issues.apache.org/jira/browse/YARN-123">YARN-123</a>.
      Minor sub-task reported by Steve Loughran and fixed by Steve Loughran <br>
      <b>Make yarn Resource Manager services robust against shutdown</b><br>
-     <blockquote>Split MAPREDUCE-3502 patches to make the RM code more resilient to being stopped more than once, or before started.
-
+     <blockquote>Split MAPREDUCE-3502 patches to make the RM code more resilient to being stopped more than once, or before started.

+

 This depends on MAPREDUCE-4014.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-117">YARN-117</a>.
      Major improvement reported by Steve Loughran and fixed by Steve Loughran <br>
      <b>Enhance YARN service model</b><br>
-     <blockquote>Having played the YARN service model, there are some issues
-that I've identified based on past work and initial use.
-
-This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs.
-
-h2. state model prevents stopped state being entered if you could not successfully start the service.
-
-In the current lifecycle you cannot stop a service unless it was successfully started, but
-* {{init()}} may acquire resources that need to be explicitly released
-* if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources.
-
-*Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non null.
-
-Before anyone points out that the {{stop()}} operations assume that all fields are valid; and if called before a {{start()}} they will NPE; MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix for this. It is independent of the rest of the issues in this doc but it will aid making {{stop()}} execute from all states other than "stopped".
-
-MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take up; this can be done with issues linked to this one.
-h2. AbstractService doesn't prevent duplicate state change requests.
-
-The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}} &amp; {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this.
-
-This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} &amp; {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}} -something that must be done before the class is considered stable (because once the lifecycle methods are declared final, all subclasses that are out of the source tree will need fixing by the respective developers.
-
-h2. AbstractService state change doesn't defend against race conditions.
-
-There's no concurrency locks on the state transitions. Whatever fix for wrong state calls is added should correct this to prevent re-entrancy, such as {{stop()}} being called from two threads.
-
-h2.  Static methods to choreograph of lifecycle operations
-
-Helper methods to move things through lifecycles. init-&gt;start is common, stop-if-service!=null another. Some static methods can execute these, and even call {{stop()}} if {{init()}} raises an exception. These could go into a class {{ServiceOps}} in the same package. These can be used by those services that wrap other services, and help manage more robust shutdowns.
-
-h2. state transition failures are something that registered service listeners may wish to be informed of.
-
-When a state transition fails a {{RuntimeException}} can be thrown -and the service listeners are not informed as the notification point isn't reached. They may wish to know this, especially for management and diagnostics.
-
-*Fix:* extend {{ServiceStateChangeListener}} with a callback such as {{stateChangeFailed(Service service,Service.State targeted-state, RuntimeException e)}} that is invoked from the (final) state change methods in the {{AbstractService}} class (once they delegate to their inner {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing implementations of the interface.
-
-h2. Service listener failures not handled
-
-Is this an error an error or not? Log and ignore may not be what is desired.
-
-*Proposed:* during {{stop()}} any exception by a listener is caught and discarded, to increase the likelihood of a better shutdown, but do not add try-catch clauses to the other state changes.
-
-h2. Support static listeners for all AbstractServices
-
-Add support to {{AbstractService}} that allow callers to register listeners for all instances. The existing listener interface could be used. This allows management tools to hook into the events.
-
-The static listeners would be invoked for all state changes except creation (base class shouldn't be handing out references to itself at this point).
-
-These static events could all be async, pushed through a shared {{ConcurrentLinkedQueue}}; failures logged at warn and the rest of the listeners invoked.
-
-h2. Add some example listeners for management/diagnostics
-* event to commons log for humans.
-* events for machines hooked up to the JSON logger.
-* for testing: something that be told to fail.
-
-h2.  Services should support signal interruptibility
-
+     <blockquote>Having played with the YARN service model, there are some issues

+that I've identified based on past work and initial use.

+

+This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs.

+

+h2. state model prevents stopped state being entered if you could not successfully start the service.

+

+In the current lifecycle you cannot stop a service unless it was successfully started, but

+* {{init()}} may acquire resources that need to be explicitly released

+* if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources.

+

+*Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non null.

+

+Before anyone points out that the {{stop()}} operations assume that all fields are valid and will NPE if called before {{start()}}: MAPREDUCE-3431 shows that this problem arises today, and MAPREDUCE-3502 is a fix for it. It is independent of the rest of the issues in this doc but it will aid making {{stop()}} execute from all states other than "stopped".

+

+MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take up; this can be done with issues linked to this one.

+h2. AbstractService doesn't prevent duplicate state change requests.

+

+The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}} &amp; {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this.

+

+This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} &amp; {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}} -something that must be done before the class is considered stable (because once the lifecycle methods are declared final, all subclasses that are out of the source tree will need fixing by the respective developers).

+

+h2. AbstractService state change doesn't defend against race conditions.

+

+There are no concurrency locks on the state transitions. Whatever fix for wrong state calls is added should correct this to prevent re-entrancy, such as {{stop()}} being called from two threads.

+

+h2.  Static methods to choreograph lifecycle operations

+

+Helper methods to move things through lifecycles. init-&gt;start is common, stop-if-service!=null is another. Some static methods can execute these, and even call {{stop()}} if {{init()}} raises an exception. These could go into a class {{ServiceOps}} in the same package. They can be used by services that wrap other services, and help manage more robust shutdowns.

+

+h2. state transition failures are something that registered service listeners may wish to be informed of.

+

+When a state transition fails a {{RuntimeException}} can be thrown -and the service listeners are not informed as the notification point isn't reached. They may wish to know this, especially for management and diagnostics.

+

+*Fix:* extend {{ServiceStateChangeListener}} with a callback such as {{stateChangeFailed(Service service,Service.State targeted-state, RuntimeException e)}} that is invoked from the (final) state change methods in the {{AbstractService}} class (once they delegate to their inner {{innerStart()}}, {{innerStop()}} methods); make it a no-op on the existing implementations of the interface.

+

+h2. Service listener failures not handled

+

+Is this an error or not? Log-and-ignore may not be what is desired.

+

+*Proposed:* during {{stop()}} any exception by a listener is caught and discarded, to increase the likelihood of a better shutdown, but do not add try-catch clauses to the other state changes.

+

+h2. Support static listeners for all AbstractServices

+

+Add support to {{AbstractService}} that allows callers to register listeners for all instances. The existing listener interface could be used. This allows management tools to hook into the events.

+

+The static listeners would be invoked for all state changes except creation (base class shouldn't be handing out references to itself at this point).

+

+These static events could all be async, pushed through a shared {{ConcurrentLinkedQueue}}; failures logged at warn and the rest of the listeners invoked.

+

+h2. Add some example listeners for management/diagnostics

+* event to commons log for humans.

+* events for machines hooked up to the JSON logger.

+* for testing: something that can be told to fail.

+

+h2.  Services should support signal interruptibility

+

 The services would benefit from a way of shutting them down on a kill signal; this can be done via a runtime hook. It should not be automatic though, as composite services will get into a very complex state during shutdown. Better to provide a hook that lets you register/unregister services to terminate, and have the relevant {{main()}} entry points tell their root services to register themselves.</blockquote></li>
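+A rough sketch of the listener extension proposed above: the stateChangeFailed callback shape is taken directly from the YARN-117 description, while the simplified Service and State types and the adapter class are assumptions made only to keep the example self-contained.
+{code}
+// Sketch of the proposed failure notification. The Service and State types are
+// simplified stand-ins, not Hadoop's real classes.
+public class StateChangeListenerSketch {
+
+  enum State { NOTINITED, INITED, STARTED, STOPPED }
+
+  interface Service {
+    String getName();
+    State getServiceState();
+  }
+
+  interface ServiceStateChangeListener {
+    void stateChanged(Service service);
+    // Proposed addition: listeners also hear about transitions that failed.
+    void stateChangeFailed(Service service, State targetedState, RuntimeException cause);
+  }
+
+  // One way to realise "make it a no-op on the existing implementations": an
+  // adapter whose empty stateChangeFailed keeps old listeners source-compatible.
+  static abstract class ListenerAdapter implements ServiceStateChangeListener {
+    public void stateChangeFailed(Service service, State targetedState,
+        RuntimeException cause) {
+      // deliberate no-op
+    }
+  }
+}
+{code}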
 <li> <a href="https://issues.apache.org/jira/browse/YARN-112">YARN-112</a>.
      Major sub-task reported by Jason Lowe and fixed by Omkar Vinit Joshi (nodemanager)<br>
@@ -1679,185 +2031,247 @@
 <li> <a href="https://issues.apache.org/jira/browse/YARN-101">YARN-101</a>.
      Minor bug reported by xieguiming and fixed by Xuan Gong (nodemanager)<br>
      <b>If the heartbeat message is lost, the nodestatus info of completed containers will be lost too.</b><br>
-     <blockquote>see the red color:
-
-org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
-
- protected void startStatusUpdater() {
-
-    new Thread("Node Status Updater") {
-      @Override
-      @SuppressWarnings("unchecked")
-      public void run() {
-        int lastHeartBeatID = 0;
-        while (!isStopped) {
-          // Send heartbeat
-          try {
-            synchronized (heartbeatMonitor) {
-              heartbeatMonitor.wait(heartBeatInterval);
-            }
-        {color:red} 
-            // Before we send the heartbeat, we get the NodeStatus,
-            // whose method removes completed containers.
-            NodeStatus nodeStatus = getNodeStatus();
-         {color}
-            nodeStatus.setResponseId(lastHeartBeatID);
-            
-            NodeHeartbeatRequest request = recordFactory
-                .newRecordInstance(NodeHeartbeatRequest.class);
-            request.setNodeStatus(nodeStatus);   
-            {color:red} 
-
-           // But if the nodeHeartbeat fails, we've already removed the containers away to know about it. We aren't handling a nodeHeartbeat failure case here.
-            HeartbeatResponse response =
-              resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();
-           {color} 
-
-            if (response.getNodeAction() == NodeAction.SHUTDOWN) {
-              LOG
-                  .info("Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat," +
-                  		" hence shutting down.");
-              NodeStatusUpdaterImpl.this.stop();
-              break;
-            }
-            if (response.getNodeAction() == NodeAction.REBOOT) {
-              LOG.info("Node is out of sync with ResourceManager,"
-                  + " hence rebooting.");
-              NodeStatusUpdaterImpl.this.reboot();
-              break;
-            }
-
-            lastHeartBeatID = response.getResponseId();
-            List&lt;ContainerId&gt; containersToCleanup = response
-                .getContainersToCleanupList();
-            if (containersToCleanup.size() != 0) {
-              dispatcher.getEventHandler().handle(
-                  new CMgrCompletedContainersEvent(containersToCleanup));
-            }
-            List&lt;ApplicationId&gt; appsToCleanup =
-                response.getApplicationsToCleanupList();
-            //Only start tracking for keepAlive on FINISH_APP
-            trackAppsForKeepAlive(appsToCleanup);
-            if (appsToCleanup.size() != 0) {
-              dispatcher.getEventHandler().handle(
-                  new CMgrCompletedAppsEvent(appsToCleanup));
-            }
-          } catch (Throwable e) {
-            // TODO Better error handling. Thread can die with the rest of the
-            // NM still running.
-            LOG.error("Caught exception in status-updater", e);
-          }
-        }
-      }
-    }.start();
-  }
-
-
-
-  private NodeStatus getNodeStatus() {
-
-    NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class);
-    nodeStatus.setNodeId(this.nodeId);
-
-    int numActiveContainers = 0;
-    List&lt;ContainerStatus&gt; containersStatuses = new ArrayList&lt;ContainerStatus&gt;();
-    for (Iterator&lt;Entry&lt;ContainerId, Container&gt;&gt; i =
-        this.context.getContainers().entrySet().iterator(); i.hasNext();) {
-      Entry&lt;ContainerId, Container&gt; e = i.next();
-      ContainerId containerId = e.getKey();
-      Container container = e.getValue();
-
-      // Clone the container to send it to the RM
-      org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = 
-          container.cloneAndGetContainerStatus();
-      containersStatuses.add(containerStatus);
-      ++numActiveContainers;
-      LOG.info("Sending out status for container: " + containerStatus);
-      {color:red} 
-
-      // Here is the part that removes the completed containers.
-      if (containerStatus.getState() == ContainerState.COMPLETE) {
-        // Remove
-        i.remove();
-      {color} 
-
-        LOG.info("Removed completed container " + containerId);
-      }
-    }
-    nodeStatus.setContainersStatuses(containersStatuses);
-
-    LOG.debug(this.nodeId + " sending out status for "
-        + numActiveContainers + " containers");
-
-    NodeHealthStatus nodeHealthStatus = this.context.getNodeHealthStatus();
-    nodeHealthStatus.setHealthReport(healthChecker.getHealthReport());
-    nodeHealthStatus.setIsNodeHealthy(healthChecker.isHealthy());
-    nodeHealthStatus.setLastHealthReportTime(
-        healthChecker.getLastHealthReportTime());
-    if (LOG.isDebugEnabled()) {
-      LOG.debug("Node's health-status : " + nodeHealthStatus.getIsNodeHealthy()
-                + ", " + nodeHealthStatus.getHealthReport());
-    }
-    nodeStatus.setNodeHealthStatus(nodeHealthStatus);
-
-    List&lt;ApplicationId&gt; keepAliveAppIds = createKeepAliveApplicationList();
-    nodeStatus.setKeepAliveApplications(keepAliveAppIds);
-    
-    return nodeStatus;
-  }
+     <blockquote>See the parts marked with {color:red} below:

+

+org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java

+

+ protected void startStatusUpdater() {

+

+    new Thread("Node Status Updater") {

+      @Override

+      @SuppressWarnings("unchecked")

+      public void run() {

+        int lastHeartBeatID = 0;

+        while (!isStopped) {

+          // Send heartbeat

+          try {

+            synchronized (heartbeatMonitor) {

+              heartbeatMonitor.wait(heartBeatInterval);

+            }

+        {color:red} 

+            // Before we send the heartbeat, we get the NodeStatus,

+            // whose method removes completed containers.

+            NodeStatus nodeStatus = getNodeStatus();

+         {color}

+            nodeStatus.setResponseId(lastHeartBeatID);

+            

+            NodeHeartbeatRequest request = recordFactory

+                .newRecordInstance(NodeHeartbeatRequest.class);

+            request.setNodeStatus(nodeStatus);   

+            {color:red} 

+

+           // But if the nodeHeartbeat fails, we've already removed the completed containers, so the RM never finds out about them. We aren't handling the nodeHeartbeat failure case here.

+            HeartbeatResponse response =

+              resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();

+           {color} 

+

+            if (response.getNodeAction() == NodeAction.SHUTDOWN) {

+              LOG

+                  .info("Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat," +

+                  		" hence shutting down.");

+              NodeStatusUpdaterImpl.this.stop();

+              break;

+            }

+            if (response.getNodeAction() == NodeAction.REBOOT) {

+              LOG.info("Node is out of sync with ResourceManager,"

+                  + " hence rebooting.");

+              NodeStatusUpdaterImpl.this.reboot();

+              break;

+            }

+

+            lastHeartBeatID = response.getResponseId();

+            List&lt;ContainerId&gt; containersToCleanup = response

+                .getContainersToCleanupList();

+            if (containersToCleanup.size() != 0) {

+              dispatcher.getEventHandler().handle(

+                  new CMgrCompletedContainersEvent(containersToCleanup));

+            }

+            List&lt;ApplicationId&gt; appsToCleanup =

+                response.getApplicationsToCleanupList();

+            //Only start tracking for keepAlive on FINISH_APP

+            trackAppsForKeepAlive(appsToCleanup);

+            if (appsToCleanup.size() != 0) {

+              dispatcher.getEventHandler().handle(

+                  new CMgrCompletedAppsEvent(appsToCleanup));

+            }

+          } catch (Throwable e) {

+            // TODO Better error handling. Thread can die with the rest of the

+            // NM still running.

+            LOG.error("Caught exception in status-updater", e);

+          }

+        }

+      }

+    }.start();

+  }

+

+

+

+  private NodeStatus getNodeStatus() {

+

+    NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class);

+    nodeStatus.setNodeId(this.nodeId);

+

+    int numActiveContainers = 0;

+    List&lt;ContainerStatus&gt; containersStatuses = new ArrayList&lt;ContainerStatus&gt;();

+    for (Iterator&lt;Entry&lt;ContainerId, Container&gt;&gt; i =

+        this.context.getContainers().entrySet().iterator(); i.hasNext();) {

+      Entry&lt;ContainerId, Container&gt; e = i.next();

+      ContainerId containerId = e.getKey();

+      Container container = e.getValue();

+

+      // Clone the container to send it to the RM

+      org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = 

+          container.cloneAndGetContainerStatus();

+      containersStatuses.add(containerStatus);

+      ++numActiveContainers;

+      LOG.info("Sending out status for container: " + containerStatus);

+      {color:red} 

+

+      // Here is the part that removes the completed containers.

+      if (containerStatus.getState() == ContainerState.COMPLETE) {

+        // Remove

+        i.remove();

+      {color} 

+

+        LOG.info("Removed completed container " + containerId);

+      }

+    }

+    nodeStatus.setContainersStatuses(containersStatuses);

+

+    LOG.debug(this.nodeId + " sending out status for "

+        + numActiveContainers + " containers");

+

+    NodeHealthStatus nodeHealthStatus = this.context.getNodeHealthStatus();

+    nodeHealthStatus.setHealthReport(healthChecker.getHealthReport());

+    nodeHealthStatus.setIsNodeHealthy(healthChecker.isHealthy());

+    nodeHealthStatus.setLastHealthReportTime(

+        healthChecker.getLastHealthReportTime());

+    if (LOG.isDebugEnabled()) {

+      LOG.debug("Node's health-status : " + nodeHealthStatus.getIsNodeHealthy()

+                + ", " + nodeHealthStatus.getHealthReport());

+    }

+    nodeStatus.setNodeHealthStatus(nodeHealthStatus);

+

+    List&lt;ApplicationId&gt; keepAliveAppIds = createKeepAliveApplicationList();

+    nodeStatus.setKeepAliveApplications(keepAliveAppIds);

+    

+    return nodeStatus;

+  }

 </blockquote></li>
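+The red-highlighted comments above point at the core problem: completed containers are removed from the NM context before the heartbeat that reports them is known to have succeeded. One hedged way to picture a fix, not the actual NodeStatusUpdaterImpl change, is to keep completed statuses in a pending buffer and drop them only after the RM has acknowledged a heartbeat that carried them:
+{code}
+import java.util.ArrayList;
+import java.util.List;
+
+// Sketch: completed-container ids are copied into each heartbeat but kept in a
+// pending list until the RM response arrives; if the heartbeat call throws, the
+// statuses are still here and are simply resent on the next attempt.
+public class CompletedContainerBuffer {
+  private final List&lt;String&gt; pendingCompleted = new ArrayList&lt;String&gt;();
+
+  synchronized void containerCompleted(String containerId) {
+    pendingCompleted.add(containerId);
+  }
+
+  // Called while building a heartbeat: report everything, remove nothing yet.
+  synchronized List&lt;String&gt; snapshotForHeartbeat() {
+    return new ArrayList&lt;String&gt;(pendingCompleted);
+  }
+
+  // Called only after the RM acknowledged the heartbeat that carried 'sent'.
+  synchronized void heartbeatSucceeded(List&lt;String&gt; sent) {
+    pendingCompleted.removeAll(sent);
+  }
+}
+{code}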
 <li> <a href="https://issues.apache.org/jira/browse/YARN-99">YARN-99</a>.
      Major sub-task reported by Devaraj K and fixed by Omkar Vinit Joshi (nodemanager)<br>
      <b>Jobs fail during resource localization when private distributed-cache hits unix directory limits</b><br>
-     <blockquote>If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache. The jobs start failing with the below exception.
-
-
-{code:xml}
-java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
-	at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
-	at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
-	at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
-	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
-	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
-	at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
-	at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
-	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
-	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
-	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
-	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
-	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
-	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
-	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
-	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
-	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
-	at java.lang.Thread.run(Thread.java:662)
-{code}
-
+     <blockquote>If we have multiple jobs which use the distributed cache with many small files, the per-directory limit is reached before the cache-size limit, and no further directories can be created in the file cache. The jobs start failing with the exception below (a sketch of one possible mitigation follows this entry).

+

+

+{code:xml}

+java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed

+	at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)

+	at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)

+	at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)

+	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)

+	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)

+	at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)

+	at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)

+	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)

+	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)

+	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

+	at java.util.concurrent.FutureTask.run(FutureTask.java:138)

+	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)

+	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

+	at java.util.concurrent.FutureTask.run(FutureTask.java:138)

+	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

+	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

+	at java.lang.Thread.run(Thread.java:662)

+{code}

+

 We should have a mechanism to clean the cache files if it crosses specified number of directories like cache size.</blockquote></li>
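+<p>The entry above asks for a way to keep the localized file cache from exhausting a
+single directory. The following is a minimal, hypothetical Java sketch of one common
+mitigation, spreading cache entries across hash-derived subdirectories so that no single
+directory accumulates too many children; the class name and fan-out are illustrative
+assumptions, not the actual YARN fix:</p>
+<pre>
+// Sketch only: derive a two-level relative subdirectory (for example "3/7") from a
+// cache key, so entries are spread over at most fanout * fanout leaf directories.
+public final class CacheSubdirs {
+  private CacheSubdirs() {}
+
+  public static String relativeDirFor(String cacheKey, int fanout) {
+    int h = cacheKey.hashCode();
+    int first = Math.abs(h % fanout);             // first-level bucket, 0..fanout-1
+    int second = Math.abs((h / fanout) % fanout); // second-level bucket, 0..fanout-1
+    return first + "/" + second;
+  }
+}
+</pre>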
+<li> <a href="https://issues.apache.org/jira/browse/YARN-84">YARN-84</a>.
+     Minor improvement reported by Brandon Li and fixed by Brandon Li <br>
+     <b>Use Builder to get RPC server in YARN</b><br>
+     <blockquote>In HADOOP-8736, a Builder was introduced to replace all the getServer() variants. This JIRA makes the corresponding change in YARN.</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-71">YARN-71</a>.
      Critical bug reported by Vinod Kumar Vavilapalli and fixed by Xuan Gong (nodemanager)<br>
      <b>Ensure/confirm that the NodeManager cleans up local-dirs on restart</b><br>
-     <blockquote>We have to make sure that NodeManagers cleanup their local files on restart.
-
+     <blockquote>We have to make sure that NodeManagers clean up their local files on restart.

+

 It may already be working like that in which case we should have tests validating this.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-62">YARN-62</a>.
+     Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi <br>
+     <b>AM should not be able to abuse container tokens for repetitive container launches</b><br>
+     <blockquote>Clone of YARN-51.

+

+ApplicationMaster should not be able to store container tokens and use the same set of tokens for repetitive container launches. The possibility of such abuse exists in the current code for a duration of 1d+10mins; we need to fix this. A sketch of such a guard follows this entry.</blockquote></li>
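+<p>A minimal, hypothetical sketch of the kind of guard implied above: the NodeManager
+remembers which container IDs have already been used for a launch until their tokens
+would have expired, and rejects a repeat launch. Names and structure are illustrative
+assumptions, not the actual YARN implementation:</p>
+<pre>
+// Sketch only: remember launched container IDs until token expiry to block token reuse.
+import java.util.Iterator;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+
+public class ContainerLaunchReplayGuard {
+  // Maps a container ID to the expiry time of its token, in milliseconds since the epoch.
+  private final ConcurrentHashMap&lt;String, Long&gt; launched =
+      new ConcurrentHashMap&lt;String, Long&gt;();
+
+  /** Returns true only for the first launch of a container ID before its token expires. */
+  public boolean tryRegisterLaunch(String containerId, long tokenExpiryMillis) {
+    purgeExpired();
+    return launched.putIfAbsent(containerId, tokenExpiryMillis) == null;
+  }
+
+  private void purgeExpired() {
+    long now = System.currentTimeMillis();
+    for (Iterator&lt;Map.Entry&lt;String, Long&gt;&gt; it = launched.entrySet().iterator();
+        it.hasNext();) {
+      if (it.next().getValue() &lt;= now) {
+        it.remove();
+      }
+    }
+  }
+}
+</pre>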
 <li> <a href="https://issues.apache.org/jira/browse/YARN-45">YARN-45</a>.
      Major sub-task reported by Chris Douglas and fixed by Carlo Curino (resourcemanager)<br>
      <b>Scheduler feedback to AM to release containers</b><br>
-     <blockquote>The ResourceManager strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. Individual allocations of containers must be reclaimed- or reserved- to restore the global invariants when cluster load shifts. In some cases, the ApplicationMaster can respond to fluctuations in resource availability without losing the work already completed by that task (MAPREDUCE-4584). Supplying it with this information would be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers.
-
+     <blockquote>The ResourceManager strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. Individual allocations of containers must be reclaimed- or reserved- to restore the global invariants when cluster load shifts. In some cases, the ApplicationMaster can respond to fluctuations in resource availability without losing the work already completed by that task (MAPREDUCE-4584). Supplying it with this information would be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers.

+

 [1] http://research.yahoo.com/files/yl-2012-003.pdf</blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/YARN-24">YARN-24</a>.
      Major bug reported by Jason Lowe and fixed by Sandy Ryza (nodemanager)<br>
      <b>Nodemanager fails to start if log aggregation enabled and namenode unavailable</b><br>
      <blockquote>If log aggregation is enabled and the namenode is currently unavailable, the nodemanager fails to startup.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5421">MAPREDUCE-5421</a>.
+     Blocker bug reported by Junping Du and fixed by Junping Du (test)<br>
+     <b>TestNonExistentJob is failed due to recent changes in YARN</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5419">MAPREDUCE-5419</a>.
+     Major bug reported by Robert Parker and fixed by Robert Parker (mrv2)<br>
+     <b>TestSlive is getting FileNotFound Exception</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5412">MAPREDUCE-5412</a>.
+     Major bug reported by Jian He and fixed by Jian He <br>
+     <b>Change MR to use multiple containers API of ContainerManager after YARN-926</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5398">MAPREDUCE-5398</a>.
+     Major improvement reported by Bikas Saha and fixed by Jian He <br>
+     <b>MR changes for YARN-513</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5366">MAPREDUCE-5366</a>.
+     Minor bug reported by Chuan Liu and fixed by Chuan Liu (test)<br>
+     <b>TestMRAsyncDiskService fails on Windows</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5360">MAPREDUCE-5360</a>.
+     Minor bug reported by Chuan Liu and fixed by Chuan Liu (test)<br>
+     <b>TestMRJobClient fails on Windows due to path format</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5359">MAPREDUCE-5359</a>.
+     Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
+     <b>JobHistory should not use File.separator to match timestamp in path</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5357">MAPREDUCE-5357</a>.
+     Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
+     <b>Job staging directory owner checking could fail on Windows</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5355">MAPREDUCE-5355</a>.
+     Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
+     <b>MiniMRYarnCluster with localFs does not work on Windows</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5349">MAPREDUCE-5349</a>.
+     Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
+     <b>TestClusterMapReduceTestCase and TestJobName fail on Windows in branch-2</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5334">MAPREDUCE-5334</a>.
      Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
      <b>TestContainerLauncherImpl is failing</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5333">MAPREDUCE-5333</a>.
+     Major test reported by Alejandro Abdelnur and fixed by Wei Yan (mr-am)<br>
+     <b>Add test that verifies MRAM works correctly when sending requests with non-normalized capabilities</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5328">MAPREDUCE-5328</a>.
+     Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
+     <b>ClientToken should not be set in the environment</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5326">MAPREDUCE-5326</a>.
      Blocker bug reported by Arun C Murthy and fixed by Zhijie Shen <br>
      <b>Add version to shuffle header</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5325">MAPREDUCE-5325</a>.
+     Major bug reported by Xuan Gong and fixed by Xuan Gong <br>
+     <b>ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter---MR changes</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5319">MAPREDUCE-5319</a>.
      Major bug reported by yeshavora and fixed by Xuan Gong <br>
      <b>Job.xml file does not has 'user.name' property for Hadoop2</b><br>
@@ -1887,7 +2301,7 @@
      <b>Changes on MR after moving ProtoBase to package impl.pb on YARN-724</b><br>
      <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5301">MAPREDUCE-5301</a>.
-     Major bug reported by Siddharth Seth and fixed by  <br>
+     Major bug reported by Siddharth Seth and fixed by Siddharth Seth <br>
      <b>Update MR code to work with YARN-635 changes</b><br>
      <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5300">MAPREDUCE-5300</a>.
@@ -2034,6 +2448,10 @@
      Major sub-task reported by Sandy Ryza and fixed by Zhijie Shen (client)<br>
      <b>Mapred API: TaskCompletionEvent incompatibility issues with MR1</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5213">MAPREDUCE-5213</a>.
+     Minor bug reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
+     <b>Re-assess TokenCache methods marked @Private</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5212">MAPREDUCE-5212</a>.
      Major bug reported by Xuan Gong and fixed by Xuan Gong <br>
      <b>Handle exception related changes in YARN's ClientRMProtocol api after YARN-631</b><br>
@@ -2074,6 +2492,10 @@
      Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
      <b>TestQueue#testQueue fails with timeout on Windows</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5187">MAPREDUCE-5187</a>.
+     Major bug reported by Chuan Liu and fixed by Chuan Liu (mrv2)<br>
+     <b>Create mapreduce command scripts on Windows</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-5184">MAPREDUCE-5184</a>.
      Major sub-task reported by Arun C Murthy and fixed by Zhijie Shen (documentation)<br>
      <b>Document MR Binary Compatibility vis-a-vis hadoop-1 and hadoop-2</b><br>
@@ -2378,6 +2800,10 @@
      Major bug reported by Thomas Graves and fixed by Thomas Graves (webapps)<br>
      <b>TestHsWebServicesJobs fails on jdk7</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4374">MAPREDUCE-4374</a>.
+     Minor bug reported by Chuan Liu and fixed by Chuan Liu (mrv2)<br>
+     <b>Fix child task environment variable config and add support for Windows</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4356">MAPREDUCE-4356</a>.
      Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (tools/rumen)<br>
      <b>Provide access to ParsedTask.obtainTaskAttempts()</b><br>
@@ -2446,6 +2872,118 @@
      Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (contrib/gridmix)<br>
      <b>Gridmix simulated job's map's hdfsBytesRead counter is wrong when compressed input is used</b><br>
      <blockquote>Makes Gridmix use the uncompressed input data size while simulating map tasks in the case where compressed input data was used in original job.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-5027">HDFS-5027</a>.
+     Major improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (datanode)<br>
+     <b>On startup, DN should scan volumes in parallel</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-5025">HDFS-5025</a>.
+     Major sub-task reported by Jing Zhao and fixed by Jing Zhao (ha , namenode)<br>
+     <b>Record ClientId and CallId in EditLog to enable rebuilding retry cache in case of HA failover</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-5024">HDFS-5024</a>.
+     Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (namenode)<br>
+     <b>Make DatanodeProtocol#commitBlockSynchronization idempotent</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-5020">HDFS-5020</a>.
+     Major improvement reported by Jing Zhao and fixed by Jing Zhao (namenode)<br>
+     <b>Make DatanodeProtocol#blockReceivedAndDeleted idempotent</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-5018">HDFS-5018</a>.
+     Minor bug reported by Ted Yu and fixed by Ted Yu <br>
+     <b>Misspelled DFSConfigKeys#DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT in javadoc of DatanodeInfo#isStale()</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-5016">HDFS-5016</a>.
+     Blocker bug reported by Devaraj Das and fixed by Suresh Srinivas <br>
+     <b>Deadlock in pipeline recovery causes Datanode to be marked dead</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-5010">HDFS-5010</a>.
+     Major improvement reported by Kihwal Lee and fixed by Kihwal Lee (namenode , performance)<br>
+     <b>Reduce the frequency of getCurrentUser() calls from namenode</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-5008">HDFS-5008</a>.
+     Major improvement reported by Suresh Srinivas and fixed by Jing Zhao (namenode)<br>
+     <b>Make ClientProtocol#abandonBlock() idempotent</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-5007">HDFS-5007</a>.
+     Minor improvement reported by Kousuke Saruta and fixed by Kousuke Saruta <br>
+     <b>Replace hard-coded property keys with DFSConfigKeys fields</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-5005">HDFS-5005</a>.
+     Major bug reported by Jing Zhao and fixed by Jing Zhao <br>
+     <b>Move SnapshotException and SnapshotAccessControlException to o.a.h.hdfs.protocol</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-5003">HDFS-5003</a>.
+     Minor bug reported by Xi Fang and fixed by Xi Fang (test)<br>
+     <b>TestNNThroughputBenchmark failed caused by existing directories</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4999">HDFS-4999</a>.
+     Major bug reported by Kihwal Lee and fixed by Colin Patrick McCabe <br>
+     <b>fix TestShortCircuitLocalRead on branch-2</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4998">HDFS-4998</a>.
+     Major bug reported by Kihwal Lee and fixed by Kihwal Lee (test)<br>
+     <b>TestUnderReplicatedBlocks fails intermittently</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4996">HDFS-4996</a>.
+     Minor improvement reported by Chris Nauroth and fixed by Chris Nauroth (namenode)<br>
+     <b>ClientProtocol#metaSave can be made idempotent by overwriting the output file instead of appending to it</b><br>
+     <blockquote>The dfsadmin -metasave command has been changed to overwrite the output file.  Previously, this command would append to the output file if it already existed.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4992">HDFS-4992</a>.
+     Major improvement reported by Max Lapan and fixed by Max Lapan (balancer)<br>
+     <b>Make balancer's thread count configurable</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4982">HDFS-4982</a>.
+     Major bug reported by Todd Lipcon and fixed by Todd Lipcon (journal-node , security)<br>
+     <b>JournalNode should relogin from keytab before fetching logs from other JNs</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4980">HDFS-4980</a>.
+     Major bug reported by Mark Grover and fixed by Mark Grover (build)<br>
+     <b>Incorrect logging.properties file for hadoop-httpfs</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4979">HDFS-4979</a>.
+     Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)<br>
+     <b>Implement retry cache on the namenode</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4978">HDFS-4978</a>.
+     Major improvement reported by Jing Zhao and fixed by Jing Zhao <br>
+     <b>Make disallowSnapshot idempotent</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4974">HDFS-4974</a>.
+     Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (ha , namenode)<br>
+     <b>Analyze and add annotations to Namenode protocol methods and enable retry</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4969">HDFS-4969</a>.
+     Blocker bug reported by Robert Kanter and fixed by Robert Kanter (test , webhdfs)<br>
+     <b>WebhdfsFileSystem expects non-standard WEBHDFS Json element</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4954">HDFS-4954</a>.
+     Major bug reported by Brandon Li and fixed by Brandon Li (nfs)<br>
+     <b>compile failure in branch-2: getFlushedOffset should catch or rethrow IOException</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4951">HDFS-4951</a>.
+     Major bug reported by Robert Kanter and fixed by Robert Kanter (security)<br>
+     <b>FsShell commands using secure httpfs throw exceptions due to missing TokenRenewer</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4948">HDFS-4948</a>.
+     Major bug reported by Robert Joseph Evans and fixed by Brandon Li <br>
+     <b>mvn site for hadoop-hdfs-nfs fails</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4944">HDFS-4944</a>.
+     Major bug reported by Chris Nauroth and fixed by Chris Nauroth (webhdfs)<br>
+     <b>WebHDFS cannot create a file path containing characters that must be URI-encoded, such as space.</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4943">HDFS-4943</a>.
+     Minor bug reported by Jerry He and fixed by Jerry He (webhdfs)<br>
+     <b>WebHdfsFileSystem does not work when original file path has encoded chars </b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4932">HDFS-4932</a>.
+     Minor improvement reported by Fengdong Yu and fixed by Fengdong Yu (ha , namenode)<br>
+     <b>Avoid a wide line on the name node webUI if we have more Journal nodes</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4927">HDFS-4927</a>.
+     Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (test)<br>
+     <b>CreateEditsLog creates inodes with an invalid inode ID, which then cannot be loaded by a namenode.</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4917">HDFS-4917</a>.
      Major bug reported by Fengdong Yu and fixed by Fengdong Yu (datanode , namenode)<br>
      <b>Start-dfs.sh cannot pass the parameters correctly</b><br>
@@ -2454,18 +2992,38 @@
      Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (hdfs-client)<br>
      <b>When possible, Use DFSClient.Conf instead of Configuration </b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4912">HDFS-4912</a>.
+     Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)<br>
+     <b>Cleanup FSNamesystem#startFileInternal</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4910">HDFS-4910</a>.
      Major bug reported by Chuan Liu and fixed by Chuan Liu <br>
      <b>TestPermission failed in branch-2</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4908">HDFS-4908</a>.
+     Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode , snapshots)<br>
+     <b>Reduce snapshot inode memory usage</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4906">HDFS-4906</a>.
      Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (hdfs-client)<br>
      <b>HDFS Output streams should not accept writes after being closed</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4903">HDFS-4903</a>.
+     Minor improvement reported by Suresh Srinivas and fixed by Arpit Agarwal (namenode)<br>
+     <b>Print trash configuration and trash emptier state in namenode log</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4902">HDFS-4902</a>.
      Major bug reported by Binglin Chang and fixed by Binglin Chang (snapshots)<br>
      <b>DFSClient.getSnapshotDiffReport should use string path rather than o.a.h.fs.Path</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4888">HDFS-4888</a>.
+     Major bug reported by Ravi Prakash and fixed by Ravi Prakash <br>
+     <b>Refactor and fix FSNamesystem.getTurnOffTip to sanity</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4887">HDFS-4887</a>.
+     Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee (benchmarks , test)<br>
+     <b>TestNNThroughputBenchmark exits abruptly</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4883">HDFS-4883</a>.
      Major bug reported by Konstantin Shvachko and fixed by Tao Luo (namenode)<br>
      <b>complete() should verify fileId</b><br>
@@ -2538,6 +3096,10 @@
      Major sub-task reported by Jing Zhao and fixed by Jing Zhao (snapshots)<br>
      <b>Snapshot: identify the correct prior snapshot when deleting a snapshot under a renamed subtree</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4841">HDFS-4841</a>.
+     Major bug reported by Stephen Chu and fixed by Robert Kanter (security , webhdfs)<br>
+     <b>FsShell commands using secure webhdfs fail ClientFinalizer shutdown hook</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4840">HDFS-4840</a>.
      Major bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)<br>
      <b>ReplicationMonitor gets NPE during shutdown</b><br>
@@ -2602,6 +3164,10 @@
      Blocker bug reported by Todd Lipcon and fixed by Todd Lipcon (namenode)<br>
      <b>Corrupt replica can be prematurely removed from corruptReplicas map</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4797">HDFS-4797</a>.
+     Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (datanode)<br>
+     <b>BlockScanInfo does not override equals(..) and hashCode() consistently</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4787">HDFS-4787</a>.
      Major improvement reported by Tian Hong Wang and fixed by Tian Hong Wang <br>
      <b>Create a new HdfsConfiguration before each TestDFSClientRetries testcases</b><br>
@@ -2638,6 +3204,10 @@
      Major bug reported by Andrew Wang and fixed by Andrew Wang (namenode)<br>
      <b>Permission check of symlink deletion incorrectly throws UnresolvedLinkException</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4762">HDFS-4762</a>.
+     Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)<br>
+     <b>Provide HDFS based NFSv3 and Mountd implementation</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4751">HDFS-4751</a>.
      Minor bug reported by Andrew Wang and fixed by Andrew Wang (test)<br>
      <b>TestLeaseRenewer#testThreadName flakes</b><br>
@@ -2695,7 +3265,7 @@
      <b>Speed up lease/block recovery when DN fails and a block goes into recovery</b><br>
      <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4714">HDFS-4714</a>.
-     Major bug reported by Kihwal Lee and fixed by  (namenode)<br>
+     Major bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)<br>
      <b>Log short messages in Namenode RPC server for exceptions meant for clients</b><br>
      <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4705">HDFS-4705</a>.
@@ -2718,6 +3288,10 @@
      Minor bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)<br>
      <b>Some test cases in TestCheckpoint do not clean up after themselves</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4687">HDFS-4687</a>.
+     Minor bug reported by Andrew Wang and fixed by Andrew Wang (test)<br>
+     <b>TestDelegationTokenForProxyUser#testWebHdfsDoAs is flaky with JDK7</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4679">HDFS-4679</a>.
      Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)<br>
      <b>Namenode operation checks should be done in a consistent manner</b><br>
@@ -2754,6 +3328,10 @@
      Minor bug reported by Jagane Sundar and fixed by  (namenode)<br>
      <b>createNNProxyWithClientProtocol ignores configured timeout value</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4645">HDFS-4645</a>.
+     Major improvement reported by Suresh Srinivas and fixed by Arpit Agarwal (namenode)<br>
+     <b>Move from randomly generated block ID to sequentially generated block ID</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4643">HDFS-4643</a>.
      Trivial bug reported by Todd Lipcon and fixed by Todd Lipcon (qjm , test)<br>
      <b>Fix flakiness in TestQuorumJournalManager</b><br>
@@ -2810,6 +3388,10 @@
      Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
      <b>TestMiniDFSCluster fails on Windows</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4602">HDFS-4602</a>.
+     Major sub-task reported by Suresh Srinivas and fixed by Uma Maheswara Rao G <br>
+     <b>TestBookKeeperHACheckpoints fails</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4598">HDFS-4598</a>.
      Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (webhdfs)<br>
      <b>WebHDFS concat: the default value of sources in the code does not match the doc</b><br>
@@ -2894,20 +3476,24 @@
      Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
      <b>LightWeightGSet expects incrementing a volatile to be atomic</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4521">HDFS-4521</a>.
+     Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
+     <b>invalid network topologies should not be cached</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4519">HDFS-4519</a>.
      Major bug reported by Chris Nauroth and fixed by Chris Nauroth (datanode , scripts)<br>
      <b>Support override of jsvc binary and log file locations when launching secure datanode.</b><br>
-     <blockquote>With this improvement the following options are available in release 1.2.0 and later on 1.x release stream:
-1. jsvc location can be overridden by setting environment variable JSVC_HOME. Defaults to jsvc binary packaged within the Hadoop distro.
-2. jsvc log output is directed to the file defined by JSVC_OUTFILE. Defaults to $HADOOP_LOG_DIR/jsvc.out.
-3. jsvc error output is directed to the file defined by JSVC_ERRFILE file.  Defaults to $HADOOP_LOG_DIR/jsvc.err.
-
-With this improvement the following options are available in release 2.0.4 and later on 2.x release stream:
-1. jsvc log output is directed to the file defined by JSVC_OUTFILE. Defaults to $HADOOP_LOG_DIR/jsvc.out.
-2. jsvc error output is directed to the file defined by JSVC_ERRFILE file.  Defaults to $HADOOP_LOG_DIR/jsvc.err.
-
-For overriding jsvc location on 2.x releases, here is the release notes from HDFS-2303:
-To run secure Datanodes users must install jsvc for their platform and set JSVC_HOME to point to the location of jsvc in their environment.
+     <blockquote>With this improvement the following options are available in release 1.2.0 and later on 1.x release stream:

+1. jsvc location can be overridden by setting environment variable JSVC_HOME. Defaults to jsvc binary packaged within the Hadoop distro.

+2. jsvc log output is directed to the file defined by JSVC_OUTFILE. Defaults to $HADOOP_LOG_DIR/jsvc.out.

+3. jsvc error output is directed to the file defined by JSVC_ERRFILE. Defaults to $HADOOP_LOG_DIR/jsvc.err.

+

+With this improvement the following options are available in release 2.0.4 and later on 2.x release stream:

+1. jsvc log output is directed to the file defined by JSVC_OUTFILE. Defaults to $HADOOP_LOG_DIR/jsvc.out.

+2. jsvc error output is directed to the file defined by JSVC_ERRFILE. Defaults to $HADOOP_LOG_DIR/jsvc.err.

+

+For overriding the jsvc location on 2.x releases, here are the release notes from HDFS-2303 (a brief launch sketch also follows this entry):

+To run secure Datanodes users must install jsvc for their platform and set JSVC_HOME to point to the location of jsvc in their environment.

 </blockquote></li>
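+<p>A small, hypothetical Java sketch of launching the secure-datanode start script with
+these variables set; the script path and file locations are illustrative assumptions
+(in practice the variables would normally just be exported in the shell before running
+the script):</p>
+<pre>
+import java.io.IOException;
+
+public class StartSecureDatanode {
+  public static void main(String[] args) throws IOException, InterruptedException {
+    // Illustrative invocation; adjust the script path for your installation.
+    ProcessBuilder pb = new ProcessBuilder("sbin/hadoop-daemon.sh", "start", "datanode");
+    pb.environment().put("JSVC_HOME", "/usr/local/jsvc");             // where the jsvc binary lives
+    pb.environment().put("JSVC_OUTFILE", "/var/log/hadoop/jsvc.out"); // jsvc stdout
+    pb.environment().put("JSVC_ERRFILE", "/var/log/hadoop/jsvc.err"); // jsvc stderr
+    pb.inheritIO();
+    System.exit(pb.start().waitFor());
+  }
+}
+</pre>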
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4518">HDFS-4518</a>.
      Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal <br>
@@ -2937,6 +3523,10 @@
      Major bug reported by Chris Nauroth and fixed by Chris Nauroth <br>
      <b>several HDFS tests attempt file operations on invalid HDFS paths when running on Windows</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4465">HDFS-4465</a>.
+     Major improvement reported by Suresh Srinivas and fixed by Aaron T. Myers (datanode)<br>
+     <b>Optimize datanode ReplicasMap and ReplicaInfo</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4461">HDFS-4461</a>.
      Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
      <b>DirectoryScanner: volume path prefix takes up memory for every block that is scanned </b><br>
@@ -2949,6 +3539,18 @@
      Major bug reported by Ted Yu and fixed by Ted Yu <br>
      <b>Fix typo MAX_NOT_CHANGED_INTERATIONS</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4374">HDFS-4374</a>.
+     Major sub-task reported by Chris Nauroth and fixed by Chris Nauroth (namenode)<br>
+     <b>Display NameNode startup progress in UI</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4373">HDFS-4373</a>.
+     Major sub-task reported by Chris Nauroth and fixed by Chris Nauroth (namenode)<br>
+     <b>Add HTTP API for querying NameNode startup progress</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4372">HDFS-4372</a>.
+     Major sub-task reported by Chris Nauroth and fixed by Chris Nauroth (namenode)<br>
+     <b>Track NameNode startup progress</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4346">HDFS-4346</a>.
      Minor sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)<br>
      <b>Refactor INodeId and GenerationStamp</b><br>
@@ -2961,6 +3563,10 @@
      Major sub-task reported by Brandon Li and fixed by Brandon Li (hdfs-client , namenode)<br>
      <b>Update addBlock() to inculde inode id as additional argument</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4339">HDFS-4339</a>.
+     Major sub-task reported by Brandon Li and fixed by Brandon Li (namenode)<br>
+     <b>Persist inode id in fsimage and editlog</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4334">HDFS-4334</a>.
      Major sub-task reported by Brandon Li and fixed by Brandon Li (namenode)<br>
      <b>Add a unique id to each INode</b><br>
@@ -2993,6 +3599,10 @@
      Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Junping Du (balancer)<br>
      <b>TestBalancerWithNodeGroup times out</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-4249">HDFS-4249</a>.
+     Major new feature reported by Suresh Srinivas and fixed by Chris Nauroth (namenode)<br>
+     <b>Add status NameNode startup to webUI </b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-4246">HDFS-4246</a>.
      Minor improvement reported by Harsh J and fixed by Harsh J (hdfs-client)<br>
      <b>The exclude node list should be more forgiving, for each output stream</b><br>
@@ -3065,6 +3675,10 @@
      Minor bug reported by Andy Isaacson and fixed by Colin Patrick McCabe <br>
      <b>duplicative dfs_hosts entries handled wrong</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-3880">HDFS-3880</a>.
+     Minor improvement reported by Brandon Li and fixed by Brandon Li (datanode , ha , namenode , security)<br>
+     <b>Use Builder to get RPC server in HDFS</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-3875">HDFS-3875</a>.
      Critical bug reported by Todd Lipcon and fixed by Kihwal Lee (datanode , hdfs-client)<br>
      <b>Issue handling checksum errors in write pipeline</b><br>
@@ -3085,6 +3699,10 @@
      Major new feature reported by Junping Du and fixed by Junping Du (namenode)<br>
      <b>Implementation of ReplicaPlacementPolicyNodeGroup to support 4-layer network topology</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-3499">HDFS-3499</a>.
+     Major bug reported by Junping Du and fixed by Junping Du (datanode)<br>
+     <b>Make NetworkTopology support user specified topology class</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-3498">HDFS-3498</a>.
      Major improvement reported by Junping Du and fixed by Junping Du (namenode)<br>
      <b>Make Replica Removal Policy pluggable and ReplicaPlacementPolicyDefault extensible for reusing code in subclass</b><br>
@@ -3121,6 +3739,10 @@
      Trivial improvement reported by Harsh J and fixed by Harsh J (datanode)<br>
      <b>Unnecessary double-check in DN#getHostName</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HDFS-2042">HDFS-2042</a>.
+     Minor improvement reported by Eli Collins and fixed by  (libhdfs)<br>
+     <b>Require c99 when building libhdfs</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HDFS-1804">HDFS-1804</a>.
      Minor new feature reported by Harsh J and fixed by Aaron T. Myers (datanode)<br>
      <b>Add a new block-volume device choosing policy that looks at free space</b><br>
@@ -3129,6 +3751,118 @@
      Major improvement reported by George Porter and fixed by Colin Patrick McCabe (datanode , hdfs-client , performance)<br>
      <b>DFS read performance suboptimal when client co-located on nodes with data</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9792">HADOOP-9792</a>.
+     Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (ipc)<br>
+     <b>Retry the methods that are tagged @AtMostOnce along with @Idempotent</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9786">HADOOP-9786</a>.
+     Major bug reported by Jing Zhao and fixed by Jing Zhao <br>
+     <b>RetryInvocationHandler#isRpcInvocation should support ProtocolTranslator </b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9773">HADOOP-9773</a>.
+     Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (test)<br>
+     <b>TestLightWeightCache fails</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9770">HADOOP-9770</a>.
+     Minor improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (util)<br>
+     <b>Make RetryCache#state non volatile</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9763">HADOOP-9763</a>.
+     Major new feature reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (util)<br>
+     <b>Extends LightWeightGSet to support eviction of expired elements</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9762">HADOOP-9762</a>.
+     Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (util)<br>
+     <b>RetryCache utility for implementing RPC retries</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9760">HADOOP-9760</a>.
+     Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (util)<br>
+     <b>Move GSet and LightWeightGSet to hadoop-common</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9759">HADOOP-9759</a>.
+     Critical bug reported by Chuan Liu and fixed by Chuan Liu <br>
+     <b>Add support for NativeCodeLoader#getLibraryName on Windows</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9756">HADOOP-9756</a>.
+     Minor improvement reported by Junping Du and fixed by Junping Du (ipc)<br>
+     <b>Additional cleanup RPC code</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9754">HADOOP-9754</a>.
+     Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (ipc)<br>
+     <b>Clean up RPC code</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9751">HADOOP-9751</a>.
+     Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (ipc)<br>
+     <b>Add clientId and retryCount to RpcResponseHeaderProto</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9738">HADOOP-9738</a>.
+     Major bug reported by Kihwal Lee and fixed by Jing Zhao (tools)<br>
+     <b>TestDistCh fails</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9734">HADOOP-9734</a>.
+     Minor improvement reported by Jason Lowe and fixed by Jason Lowe (ipc)<br>
+     <b>Common protobuf definitions for GetUserMappingsProtocol, RefreshAuthorizationPolicyProtocol and RefreshUserMappingsProtocol</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9720">HADOOP-9720</a>.
+     Major sub-task reported by Suresh Srinivas and fixed by Arpit Agarwal <br>
+     <b>Rename Client#uuid to Client#clientId</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9717">HADOOP-9717</a>.
+     Major improvement reported by Suresh Srinivas and fixed by Jing Zhao (ipc)<br>
+     <b>Add retry attempt count to the RPC requests</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9716">HADOOP-9716</a>.
+     Major improvement reported by Suresh Srinivas and fixed by Tsz Wo (Nicholas), SZE (ipc)<br>
+     <b>Move the Rpc request call ID generation to client side InvocationHandler</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9707">HADOOP-9707</a>.
+     Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (util)<br>
+     <b>Fix register lists for crc32c inline assembly</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9701">HADOOP-9701</a>.
+     Minor bug reported by Steve Loughran and fixed by Karthik Kambatla (documentation)<br>
+     <b>mvn site ambiguous links in hadoop-common</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9698">HADOOP-9698</a>.
+     Blocker sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)<br>
+     <b>RPCv9 client must honor server's SASL negotiate response</b><br>
+     <blockquote>The RPC client now waits for the Server's SASL negotiate response before instantiating its SASL client.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9691">HADOOP-9691</a>.
+     Minor improvement reported by Chris Nauroth and fixed by Chris Nauroth (ipc)<br>
+     <b>RPC clients can generate call ID using AtomicInteger instead of synchronizing on the Client instance.</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9688">HADOOP-9688</a>.
+     Blocker improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (ipc)<br>
+     <b>Add globally unique Client ID to RPC requests</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9683">HADOOP-9683</a>.
+     Blocker sub-task reported by Luke Lu and fixed by Daryn Sharp (ipc)<br>
+     <b>Wrap IpcConnectionContext in RPC headers</b><br>
+     <blockquote>Connection context is now sent as an RPC header-wrapped protobuf.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9681">HADOOP-9681</a>.
+     Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
+     <b>FileUtil.unTarUsingJava() should close the InputStream upon finishing</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9678">HADOOP-9678</a>.
+     Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
+     <b>TestRPC#testStopsAllThreads intermittently fails on Windows</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9676">HADOOP-9676</a>.
+     Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
+     <b>make maximum RPC buffer size configurable</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9673">HADOOP-9673</a>.
+     Trivial improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (net)<br>
+     <b>NetworkTopology: when a node can't be added, print out its location for diagnostic purposes</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9665">HADOOP-9665</a>.
+     Critical bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
+     <b>BlockDecompressorStream#decompress will throw EOFException instead of return -1 when EOF</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9661">HADOOP-9661</a>.
+     Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (metrics)<br>
+     <b>Allow metrics sources to be extended</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HADOOP-9656">HADOOP-9656</a>.
      Minor bug reported by Chuan Liu and fixed by Chuan Liu (test , tools)<br>
      <b>Gridmix unit tests fail on Windows and Linux</b><br>
@@ -3137,6 +3871,10 @@
      Blocker improvement reported by Zhijie Shen and fixed by Zhijie Shen <br>
      <b>Promote YARN service life-cycle libraries into Hadoop Common</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9643">HADOOP-9643</a>.
+     Minor bug reported by Mark Miller and fixed by Mark Miller (security)<br>
+     <b>org.apache.hadoop.security.SecurityUtil calls toUpperCase(Locale.getDefault()) as well as toLowerCase(Locale.getDefault()) on hadoop.security.authentication value.</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HADOOP-9638">HADOOP-9638</a>.
      Major bug reported by Chris Nauroth and fixed by Andrey Klochkov (test)<br>
      <b>parallel test changes caused invalid test path for several HDFS tests on Windows</b><br>
@@ -3241,10 +3979,26 @@
      Blocker bug reported by Arun C Murthy and fixed by Karthik Kambatla (documentation)<br>
      <b>Document Hadoop Compatibility</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9515">HADOOP-9515</a>.
+     Major new feature reported by Brandon Li and fixed by Brandon Li <br>
+     <b>Add general interface for NFS and Mount</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HADOOP-9511">HADOOP-9511</a>.
      Major improvement reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
      <b>Adding support for additional input streams (FSDataInputStream and RandomAccessFile) in SecureIOUtils.</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9509">HADOOP-9509</a>.
+     Major new feature reported by Brandon Li and fixed by Brandon Li <br>
+     <b>Implement ONCRPC and XDR</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9507">HADOOP-9507</a>.
+     Minor bug reported by Mostafa Elhemali and fixed by Chris Nauroth (fs)<br>
+     <b>LocalFileSystem rename() is broken in some cases when destination exists</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9504">HADOOP-9504</a>.
+     Critical bug reported by Liang Xie and fixed by Liang Xie (metrics)<br>
+     <b>MetricsDynamicMBeanBase has concurrency issues in createMBeanInfo</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HADOOP-9503">HADOOP-9503</a>.
      Minor improvement reported by Varun Sharma and fixed by Varun Sharma (ipc)<br>
      <b>Remove sleep between IPC client connect timeouts</b><br>
@@ -3309,6 +4063,10 @@
      Major bug reported by Chuan Liu and fixed by Chuan Liu <br>
      <b>Port winutils static code analysis change to trunk</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9439">HADOOP-9439</a>.
+     Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (native)<br>
+     <b>JniBasedUnixGroupsMapping: fix some crash bugs</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HADOOP-9437">HADOOP-9437</a>.
      Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)<br>
      <b>TestNativeIO#testRenameTo fails on Windows due to assumption that POSIX errno is embedded in NativeIOException</b><br>
@@ -3328,9 +4086,21 @@
 <li> <a href="https://issues.apache.org/jira/browse/HADOOP-9421">HADOOP-9421</a>.
      Blocker sub-task reported by Sanjay Radia and fixed by Daryn Sharp <br>
      <b>Convert SASL to use ProtoBuf and provide negotiation capabilities</b><br>
-     <blockquote>Raw SASL protocol now uses protobufs wrapped with RPC headers.
-The negotiation sequence incorporates the state of the exchange.
+     <blockquote>Raw SASL protocol now uses protobufs wrapped with RPC headers.

+The negotiation sequence incorporates the state of the exchange.

 The server now has the ability to advertise its supported auth types.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9418">HADOOP-9418</a>.
+     Major sub-task reported by Andrew Wang and fixed by Andrew Wang (fs)<br>
+     <b>Add symlink resolution support to DistributedFileSystem</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9416">HADOOP-9416</a>.
+     Major sub-task reported by Andrew Wang and fixed by Andrew Wang (fs)<br>
+     <b>Add new symlink resolution methods in FileSystem and FileSystemLinkResolver</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9414">HADOOP-9414</a>.
+     Major sub-task reported by Andrew Wang and fixed by Andrew Wang (fs)<br>
+     <b>Refactor out FSLinkResolver and relevant helper methods</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HADOOP-9413">HADOOP-9413</a>.
      Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
      <b>Introduce common utils for File#setReadable/Writable/Executable and File#canRead/Write/Execute that work cross-platform</b><br>
@@ -3395,6 +4165,10 @@
      Major bug reported by Todd Lipcon and fixed by Todd Lipcon (ipc , security)<br>
      <b>"Auth failed" log should include exception string</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9355">HADOOP-9355</a>.
+     Major sub-task reported by Andrew Wang and fixed by Andrew Wang (fs)<br>
+     <b>Abstract symlink tests to use either FileContext or FileSystem</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HADOOP-9353">HADOOP-9353</a>.
      Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (build)<br>
      <b>Activate native-win profile by default on Windows</b><br>
@@ -3447,6 +4221,10 @@
      Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe <br>
      <b>when exiting on a signal, print the signal name first</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9307">HADOOP-9307</a>.
+     Major bug reported by Todd Lipcon and fixed by Todd Lipcon (fs)<br>
+     <b>BufferedFSInputStream.read returns wrong results after certain seeks</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HADOOP-9305">HADOOP-9305</a>.
      Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (security)<br>
      <b>Add support for running the Hadoop client on 64-bit AIX</b><br>
@@ -3539,6 +4317,10 @@
      Major new feature reported by Todd Lipcon and fixed by Todd Lipcon (fs , tools)<br>
      <b>Add shell command to dump file checksums</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-9164">HADOOP-9164</a>.
+     Minor improvement reported by Binglin Chang and fixed by Binglin Chang (native)<br>
+     <b>Print paths of loaded native libraries in NativeLibraryChecker</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HADOOP-9163">HADOOP-9163</a>.
      Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (ipc)<br>
      <b>The rpc msg in  ProtobufRpcEngine.proto should be moved out to avoid an extra copy</b><br>
@@ -3627,6 +4409,10 @@
      Major improvement reported by Govind Kamat and fixed by Govind Kamat (io)<br>
      <b>Native-code implementation of bzip2 codec</b><br>
      <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/HADOOP-8440">HADOOP-8440</a>.
+     Minor bug reported by Ivan Mitic and fixed by Ivan Mitic (fs)<br>
+     <b>HarFileSystem.decodeHarURI fails for URIs whose host contains numbers</b><br>
+     <blockquote></blockquote></li>
 <li> <a href="https://issues.apache.org/jira/browse/HADOOP-8415">HADOOP-8415</a>.
      Minor improvement reported by Jan van der Lugt and fixed by Jan van der Lugt (conf)<br>
      <b>getDouble() and setDouble() in org.apache.hadoop.conf.Configuration</b><br>