| ~~ Licensed under the Apache License, Version 2.0 (the "License"); |
| ~~ you may not use this file except in compliance with the License. |
| ~~ You may obtain a copy of the License at |
| ~~ |
| ~~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~~ |
| ~~ Unless required by applicable law or agreed to in writing, software |
| ~~ distributed under the License is distributed on an "AS IS" BASIS, |
| ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| ~~ See the License for the specific language governing permissions and |
| ~~ limitations under the License. See accompanying LICENSE file. |
| |
| --- |
| Hadoop MapReduce Next Generation ${project.version} - Setting up a Single Node Cluster. |
| --- |
| --- |
| ${maven.build.timestamp} |
| |
| Hadoop MapReduce Next Generation - Setting up a Single Node Cluster. |
| |
| \[ {{{./index.html}Go Back}} \] |
| |
| %{toc|section=1|fromDepth=0} |
| |
| * MapReduce Tarball
| |
| You should be able to obtain the MapReduce tarball from the release. |
| If not, you should be able to create a tarball from the source. |
| |
| +---+ |
| $ mvn clean install -DskipTests |
| $ cd hadoop-mapreduce-project |
| $ mvn clean install assembly:assembly -Pnative |
| +---+ |
| <<NOTE:>> You will need protoc 2.4.1 or greater installed.
| |
| To skip the native build in MapReduce you can omit the <<<-Pnative>>> argument
| for Maven. The tarball should be available in the <<<target/>>> directory.
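|
| To check which version of protoc is installed, you can run, for example:
|
| +---+
| $ protoc --version
| +---+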
| |
| |
| * Setting up the environment. |
| |
| Assuming you have installed hadoop-common/hadoop-hdfs and exported
| <<$HADOOP_COMMON_HOME>>/<<$HADOOP_HDFS_HOME>>, untar the Hadoop MapReduce
| tarball and set the environment variable <<$HADOOP_MAPRED_HOME>> to the
| untarred directory. Set <<$YARN_HOME>> to the same value as <<$HADOOP_MAPRED_HOME>>.
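|
| For example, assuming the tarball was untarred into a hypothetical
| <<</opt/hadoop-mapreduce>>> directory, the exports might look like:
|
| +---+
| $ export HADOOP_MAPRED_HOME=/opt/hadoop-mapreduce
| $ export YARN_HOME=$HADOOP_MAPRED_HOME
| +---+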
| |
| <<NOTE:>> The following instructions assume you have HDFS running.
| |
| * Setting up Configuration. |
| |
| To start the ResourceManager and NodeManager, you will have to update the configs.
| Assuming <<$HADOOP_CONF_DIR>> is the configuration directory and already contains
| the installed configs for HDFS and <<<core-site.xml>>>, there are two config files
| you will have to set up: <<<mapred-site.xml>>> and <<<yarn-site.xml>>>.
| |
| ** Setting up <<<mapred-site.xml>>> |
| |
| Add the following configs to your <<<mapred-site.xml>>>. |
| |
| +---+ |
| <property> |
| <name>mapreduce.cluster.temp.dir</name> |
| <value></value> |
| <description>No description</description> |
| <final>true</final> |
| </property> |
| |
| <property> |
| <name>mapreduce.cluster.local.dir</name> |
| <value></value> |
| <description>No description</description> |
| <final>true</final> |
| </property> |
| +---+ |
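|
| The values are left empty above so that you can choose your own locations.
| As an illustration only (the path below is hypothetical), a value could point
| at a local directory, for example:
|
| +---+
| <property>
|   <name>mapreduce.cluster.local.dir</name>
|   <value>/tmp/mapreduce/local</value>
|   <final>true</final>
| </property>
| +---+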
| |
| ** Setting up <<<yarn-site.xml>>> |
| |
| Add the following configs to your <<<yarn-site.xml>>>.
| |
| +---+ |
| <property> |
| <name>yarn.resourcemanager.resource-tracker.address</name> |
| <value>host:port</value> |
| <description>host is the hostname of the resource manager and |
| port is the port on which the NodeManagers contact the Resource Manager. |
| </description> |
| </property> |
| |
| <property> |
| <name>yarn.resourcemanager.scheduler.address</name> |
| <value>host:port</value> |
| <description>host is the hostname of the resourcemanager and port is the port |
| on which the Applications in the cluster talk to the Resource Manager. |
| </description> |
| </property> |
| |
| <property> |
| <name>yarn.resourcemanager.scheduler.class</name> |
| <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value> |
| <description>In case you do not want to use the default scheduler</description> |
| </property> |
| |
| <property> |
| <name>yarn.resourcemanager.address</name> |
| <value>host:port</value> |
| <description>the host is the hostname of the ResourceManager and the port is the port on |
| which the clients can talk to the Resource Manager. </description> |
| </property> |
| |
| <property> |
| <name>yarn.nodemanager.local-dirs</name> |
| <value></value> |
| <description>the local directories used by the nodemanager</description> |
| </property> |
| |
| <property> |
| <name>yarn.nodemanager.address</name> |
| <value>0.0.0.0:port</value> |
| <description>the nodemanagers bind to this port</description> |
| </property> |
| |
| <property> |
| <name>yarn.nodemanager.resource.memory-mb</name> |
| <value>10240</value> |
| <description>the amount of memory available on the NodeManager, in MB</description>
| </property> |
| |
| <property> |
| <name>yarn.nodemanager.remote-app-log-dir</name> |
| <value>/app-logs</value> |
| <description>directory on HDFS where the application logs are moved to</description>
| </property> |
| |
| <property> |
| <name>yarn.nodemanager.log-dirs</name> |
| <value></value> |
| <description>the directories used by NodeManagers as log directories</description>
| </property> |
| |
| <property> |
| <name>yarn.nodemanager.aux-services</name> |
| <value>mapreduce.shuffle</value> |
| <description>the shuffle service that needs to be set for MapReduce applications to run</description>
| </property> |
| +---+ |
| |
| * Setting up <<<capacity-scheduler.xml>>> |
| |
| Make sure you populate the root queues in <<<capacity-scheduler.xml>>>. |
| |
| +---+ |
| <property> |
| <name>yarn.scheduler.capacity.root.queues</name> |
| <value>unfunded,default</value> |
| </property> |
| |
| <property> |
| <name>yarn.scheduler.capacity.root.capacity</name> |
| <value>100</value> |
| </property> |
| |
| <property> |
| <name>yarn.scheduler.capacity.root.unfunded.capacity</name> |
| <value>50</value> |
| </property> |
| |
| <property> |
| <name>yarn.scheduler.capacity.root.default.capacity</name> |
| <value>50</value> |
| </property> |
| +---+ |
| |
| * Running daemons. |
| |
| Assuming that the environment variables <<$HADOOP_COMMON_HOME>>, <<$HADOOP_HDFS_HOME>>, <<$HADOOP_MAPRED_HOME>>,
| <<$YARN_HOME>>, <<$JAVA_HOME>> and <<$HADOOP_CONF_DIR>> have been set appropriately,
| set <<$YARN_CONF_DIR>> to the same value as <<$HADOOP_CONF_DIR>>.
| |
| Run ResourceManager and NodeManager as: |
| |
| +---+ |
| $ cd $HADOOP_MAPRED_HOME |
| $ sbin/yarn-daemon.sh start resourcemanager |
| $ sbin/yarn-daemon.sh start nodemanager |
| +---+ |
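|
| To verify that both daemons came up, you can, for example, list the running
| Java processes with the JDK's <<<jps>>> tool and look for ResourceManager and
| NodeManager:
|
| +---+
| $ jps
| +---+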
| |
| You should now be up and running. You can run the randomwriter example as:
| |
| +---+ |
| $ $HADOOP_COMMON_HOME/bin/hadoop jar hadoop-examples.jar randomwriter out |
| +---+ |
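|
| Once the job completes, you can, for example, list the output directory it
| wrote on HDFS:
|
| +---+
| $ $HADOOP_COMMON_HOME/bin/hadoop fs -ls out
| +---+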
| |
| Good luck. |