~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.
---
Hadoop MapReduce Next Generation ${project.version} - Setting up a Single Node Cluster.
---
---
${maven.build.timestamp}
Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
\[ {{{./index.html}Go Back}} \]
%{toc|section=1|fromDepth=0}
* MapReduce Tarball
You should be able to obtain the MapReduce tarball from the release.
If not, you can build a tarball from the source:
+---+
$ mvn clean install -DskipTests
$ cd hadoop-mapreduce-project
$ mvn clean install assembly:assembly -Pnative
+---+
<<NOTE:>> You will need protoc version 2.4.1 or greater installed.
To skip the native builds in mapreduce you can omit the <<<-Pnative>>> argument
for maven. The tarball should be available in the <<<target/>>> directory.
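You can check which protoc version is on your path as follows (the output shown
assumes protoc 2.4.1 is installed):
+---+
$ protoc --version
libprotoc 2.4.1
+---+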
* Setting up the environment.
Assuming you have installed hadoop-common/hadoop-hdfs and exported
<<$HADOOP_COMMON_HOME>>/<<$HADOOP_HDFS_HOME>>, untar the hadoop mapreduce
tarball and set the environment variable <<$HADOOP_MAPRED_HOME>> to the
untarred directory. Set <<$YARN_HOME>> the same as <<$HADOOP_MAPRED_HOME>>.
<<NOTE:>> The following instructions assume you have HDFS running.
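For example, assuming the tarball was untarred to <<</opt/hadoop-mapreduce>>>
(the path is illustrative):
+---+
$ export HADOOP_MAPRED_HOME=/opt/hadoop-mapreduce
$ export YARN_HOME=$HADOOP_MAPRED_HOME
+---+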
* Setting up Configuration.
To start the ResourceManager and NodeManager, you will have to update the configs.
Assume that <<$HADOOP_CONF_DIR>> is the configuration directory and contains the
installed configs for HDFS and <<<core-site.xml>>>. There are two config files you
will have to set up: <<<mapred-site.xml>>> and <<<yarn-site.xml>>>.
** Setting up <<<mapred-site.xml>>>
Add the following configs to your <<<mapred-site.xml>>>.
+---+
  <property>
    <name>mapreduce.cluster.temp.dir</name>
    <value></value>
    <description>No description</description>
    <final>true</final>
  </property>

  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value></value>
    <description>No description</description>
    <final>true</final>
  </property>
+---+
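For a single-node setup, any writable local paths will do. For example (the
directories below are illustrative choices, not defaults):
+---+
  <property>
    <name>mapreduce.cluster.temp.dir</name>
    <value>/tmp/mapred/temp</value>
    <final>true</final>
  </property>

  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/tmp/mapred/local</value>
    <final>true</final>
  </property>
+---+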
** Setting up <<<yarn-site.xml>>>
Add the following configs to your <<<yarn-site.xml>>>.
+---+
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>host:port</value>
    <description>host is the hostname of the ResourceManager and
    port is the port on which the NodeManagers contact the ResourceManager.
    </description>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>host:port</value>
    <description>host is the hostname of the ResourceManager and port is the port
    on which the Applications in the cluster talk to the ResourceManager.
    </description>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    <description>In case you do not want to use the default scheduler</description>
  </property>

  <property>
    <name>yarn.resourcemanager.address</name>
    <value>host:port</value>
    <description>host is the hostname of the ResourceManager and port is the port on
    which the clients can talk to the ResourceManager.</description>
  </property>

  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value></value>
    <description>the local directories used by the NodeManager</description>
  </property>

  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:port</value>
    <description>the NodeManagers bind to this port</description>
  </property>

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>10240</value>
    <description>the amount of memory on the NodeManager in MB</description>
  </property>

  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/app-logs</value>
    <description>directory on HDFS where the application logs are moved to</description>
  </property>

  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value></value>
    <description>the directories used by NodeManagers as log directories</description>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
    <description>shuffle service that needs to be set for MapReduce to run</description>
  </property>
+---+
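As a concrete single-node example, the three ResourceManager addresses can all
point at localhost. The port numbers below are illustrative choices, not
mandated values:
+---+
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:8031</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:8030</value>
  </property>

  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>
+---+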
* Setting up <<<capacity-scheduler.xml>>>
Make sure you populate the root queues in <<<capacity-scheduler.xml>>>.
+---+
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>unfunded,default</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.capacity</name>
    <value>100</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.unfunded.capacity</name>
    <value>50</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>50</value>
  </property>
+---+
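Note that the capacities of the queues directly under <<<root>>> must add up to
100: in the example above, <<<unfunded>>> and <<<default>>> are given 50 each.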
* Running daemons.
Assuming that the environment variables <<$HADOOP_COMMON_HOME>>, <<$HADOOP_HDFS_HOME>>, <<$HADOOP_MAPRED_HOME>>,
<<$YARN_HOME>>, <<$JAVA_HOME>> and <<$HADOOP_CONF_DIR>> have been set appropriately,
set <<$YARN_CONF_DIR>> the same as <<$HADOOP_CONF_DIR>>.
Run ResourceManager and NodeManager as:
+---+
$ cd $HADOOP_MAPRED_HOME
$ sbin/yarn-daemon.sh start resourcemanager
$ sbin/yarn-daemon.sh start nodemanager
+---+
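To verify that both daemons came up, <<<jps>>> (shipped with the JDK) should
list a ResourceManager and a NodeManager process (the process ids below are
illustrative):
+---+
$ jps
29842 ResourceManager
30125 NodeManager
+---+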
You should be up and running. You can run randomwriter as:
+---+
$ $HADOOP_COMMON_HOME/bin/hadoop jar hadoop-examples.jar randomwriter out
+---+
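Once the job completes, you can list the generated output on HDFS:
+---+
$ $HADOOP_COMMON_HOME/bin/hadoop fs -ls out
+---+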
Good luck.