| <!--- |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. See accompanying LICENSE file. |
| --> |
| |
| Apache Hadoop ${project.version} |
| ================================ |
| |
| Apache Hadoop ${project.version} consists of significant |
| improvements over the previous stable release (hadoop-1.x). |
| |
| Here is a short overview of the improvments to both HDFS and MapReduce. |
| |
| * HDFS Federation |
| |
| In order to scale the name service horizontally, federation uses |
| multiple independent Namenodes/Namespaces. The Namenodes are |
| federated, that is, the Namenodes are independent and don't require |
| coordination with each other. The datanodes are used as common storage |
| for blocks by all the Namenodes. Each datanode registers with all the |
| Namenodes in the cluster. Datanodes send periodic heartbeats and block |
| reports and handles commands from the Namenodes. |
| |
| More details are available in the |
| [HDFS Federation](./hadoop-project-dist/hadoop-hdfs/Federation.html) |
| document. |
| |
| * MapReduce NextGen aka YARN aka MRv2 |
| |
| The new architecture introduced in hadoop-0.23, divides the two major |
| functions of the JobTracker: resource management and job life-cycle |
| management into separate components. |
| |
| The new ResourceManager manages the global assignment of compute |
| resources to applications and the per-application |
| ApplicationMaster manages the application‚ scheduling and |
| coordination. |
| |
| An application is either a single job in the sense of classic |
| MapReduce jobs or a DAG of such jobs. |
| |
| The ResourceManager and per-machine NodeManager daemon, which |
| manages the user processes on that machine, form the computation |
| fabric. |
| |
| The per-application ApplicationMaster is, in effect, a framework |
| specific library and is tasked with negotiating resources from the |
| ResourceManager and working with the NodeManager(s) to execute and |
| monitor the tasks. |
| |
| More details are available in the |
| [YARN](./hadoop-yarn/hadoop-yarn-site/YARN.html) document. |
| |
| Getting Started |
| =============== |
| |
| The Hadoop documentation includes the information you need to get started using |
| Hadoop. Begin with the |
| [Single Node Setup](./hadoop-project-dist/hadoop-common/SingleCluster.html) |
| which shows you how to set up a single-node Hadoop installation. |
| Then move on to the |
| [Cluster Setup](./hadoop-project-dist/hadoop-common/ClusterSetup.html) |
| to learn how to set up a multi-node Hadoop installation. |