| ~~ Licensed under the Apache License, Version 2.0 (the "License"); |
| ~~ you may not use this file except in compliance with the License. |
| ~~ You may obtain a copy of the License at |
| ~~ |
| ~~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~~ |
| ~~ Unless required by applicable law or agreed to in writing, software |
| ~~ distributed under the License is distributed on an "AS IS" BASIS, |
| ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| ~~ See the License for the specific language governing permissions and |
| ~~ limitations under the License. See accompanying LICENSE file. |
| |
| --- |
| YARN |
| --- |
| --- |
| ${maven.build.timestamp} |
| |
| Apache Hadoop NextGen MapReduce (YARN) |
| |
| MapReduce has undergone a complete overhaul in hadoop-0.23, and the result |
| is what we call MapReduce 2.0 (MRv2) or YARN. |
| |
| The fundamental idea of MRv2 is to split up the two major functionalities of |
| the JobTracker, resource management and job scheduling/monitoring, into |
| separate daemons. The idea is to have a global ResourceManager (<RM>) and |
| per-application ApplicationMaster (<AM>). An application is either a single |
| job in the classical sense of Map-Reduce jobs or a DAG of jobs. |
| |
| The ResourceManager and the per-node slave, the NodeManager (<NM>), form the |
| data-computation framework. The ResourceManager is the ultimate authority that |
| arbitrates resources among all the applications in the system. |
| |
| The per-application ApplicationMaster is, in effect, a framework-specific |
| library and is tasked with negotiating resources from the ResourceManager and |
| working with the NodeManager(s) to execute and monitor the tasks. |
| |
| [./yarn_architecture.gif] MapReduce NextGen Architecture |
| |
| The ResourceManager has two main components: Scheduler and |
| ApplicationsManager. |
| |
| The Scheduler is responsible for allocating resources to the various running |
| applications, subject to the familiar constraints of capacities, queues and |
| so on. The Scheduler is a pure scheduler in the sense that it performs no |
| monitoring or tracking of status for the application. It also offers no |
| guarantees about restarting tasks that fail due to either application or |
| hardware failures. The Scheduler performs its scheduling function based on |
| the resource requirements of the applications; it does so using the abstract |
| notion of a resource <Container>, which incorporates elements such as memory, |
| cpu, disk and network. In the first version, only <<<memory>>> is supported. |
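| |
| The memory-only Container model described above can be illustrated with a |
| minimal sketch. It is not the Hadoop implementation: the class names, the |
| node names and the first-fit placement rule are all assumptions made for |
| illustration only. |
| |

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Container:
    """A resource grant; only memory is modeled, as in the first version."""
    node: str
    memory_mb: int


@dataclass
class ToyScheduler:
    """Hypothetical memory-only scheduler; not the Hadoop implementation."""
    free_mb: Dict[str, int]  # free memory per node, in MB
    granted: List[Container] = field(default_factory=list)

    def allocate(self, memory_mb: int) -> Optional[Container]:
        """Grant a container on the first node with enough free memory.

        First-fit is an assumed placement rule chosen for brevity.
        """
        for node, free in self.free_mb.items():
            if free >= memory_mb:
                self.free_mb[node] = free - memory_mb
                container = Container(node, memory_mb)
                self.granted.append(container)
                return container
        # No node can satisfy the request. A pure scheduler neither retries
        # nor tracks the application; that is left to the ApplicationMaster.
        return None


scheduler = ToyScheduler({"node1": 2048, "node2": 4096})
first = scheduler.allocate(1536)   # fits on node1
second = scheduler.allocate(1024)  # node1 has 512 MB left, so node2 is used
third = scheduler.allocate(8192)   # larger than any node, so nothing is granted
```

| |
| The sketch only hands out grants and keeps no record of what runs inside |
| them, which mirrors the "pure scheduler" role described above. |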
| |
| The Scheduler has a pluggable policy, which is responsible for partitioning |
| the cluster resources among the various queues, applications and so on. |
| The current Map-Reduce schedulers, such as the CapacityScheduler and the |
| FairScheduler, are examples of the plug-in. |
| |
| The CapacityScheduler supports <<<hierarchical queues>>> to allow for more |
| predictable sharing of cluster resources. |
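| |
| As a rough illustration of hierarchical queues, the sketch below assumes |
| that each queue is configured with a percentage of its parent's capacity, so |
| that a queue's share of the whole cluster is the product of the percentages |
| along its path to the root. The queue names, percentages and cluster size |
| are hypothetical. |
| |

```python
from typing import Dict, Optional, Tuple

# Hypothetical queue tree: name -> (parent name, share of parent in percent).
QUEUES: Dict[str, Tuple[Optional[str], float]] = {
    "root": (None, 100.0),
    "prod": ("root", 70.0),
    "dev": ("root", 30.0),
    "dev.adhoc": ("dev", 50.0),
}


def absolute_share(queue: str) -> float:
    """Return the fraction of the whole cluster assigned to a queue."""
    share = 1.0
    current: Optional[str] = queue
    # Walk up to the root, multiplying in each level's share of its parent.
    while current is not None:
        parent, percent = QUEUES[current]
        share *= percent / 100.0
        current = parent
    return share


cluster_memory_mb = 100_000  # assumed cluster size
adhoc_mb = absolute_share("dev.adhoc") * cluster_memory_mb
```

| |
| Here <<<dev.adhoc>>> receives half of the 30 percent given to <<<dev>>>, so |
| resizing <<<dev>>> rescales its children without touching <<<prod>>>. |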
| |
| The ApplicationsManager is responsible for accepting job submissions, |
| negotiating the first container for executing the application-specific |
| ApplicationMaster, and providing the service for restarting the |
| ApplicationMaster container on failure. |
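| |
| The restart-on-failure service can be sketched as a simple relaunch loop. |
| The function name, the <<<launch_am>>> callback and the <<<max_attempts>>> |
| limit are assumptions made for illustration; the text above does not say |
| how many restarts are attempted. |
| |

```python
from typing import Callable, List


def run_with_restarts(launch_am: Callable[[int], bool],
                      max_attempts: int) -> List[int]:
    """Launch the ApplicationMaster, relaunching it after each failure.

    launch_am(attempt) is a hypothetical callback that returns True when the
    ApplicationMaster container finishes successfully. max_attempts is an
    assumed limit. The list of attempt numbers that were tried is returned.
    """
    attempts: List[int] = []
    for attempt in range(1, max_attempts + 1):
        attempts.append(attempt)
        if launch_am(attempt):
            break  # success, so no further restart is needed
    return attempts


# Hypothetical ApplicationMaster that fails twice before succeeding.
history = run_with_restarts(lambda attempt: attempt >= 3, max_attempts=5)
```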
| |
| The NodeManager is the per-machine framework agent that is responsible for |
| containers, monitoring their resource usage (cpu, memory, disk, network) and |
| reporting the same to the ResourceManager/Scheduler. |
| |
| The per-application ApplicationMaster has the responsibility of negotiating |
| appropriate resource containers from the Scheduler, tracking their status and |
| monitoring for progress. |
| |
| MRv2 maintains <<API compatibility>> with the previous stable release |
| (hadoop-0.20.205). This means that all Map-Reduce jobs should still run |
| unchanged on top of MRv2 with just a recompile. |
| |