hadoop-mapreduce-project/INSTALL - hadoop - Git at Google

 To compile  Hadoop Mapreduce next following, do the following:

 Step 1) Install dependencies for yarn

 See http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-porject/hadoop-yarn/README
 Make sure protbuf library is in your library path or set: export LD_LIBRARY_PATH=/usr/local/lib

 Step 2) Checkout

 svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk

 Step 3) Build

 Go to common directory - choose your regular common build command. For example:

 export MAVEN_OPTS=-Xmx512m
 mvn clean package -Pdist -Dtar -DskipTests -Pnative

 You can omit -Pnative it you don't want to build native packages.

 Step 4) Untar the tarball from hadoop-dist/target/ into a clean and different
 directory, say YARN_HOME.

 Step 5)
 Start hdfs

 To run Hadoop Mapreduce next applications:

 Step 6) export the following variables to where you have things installed:
 You probably want to export these in hadoop-env.sh and yarn-env.sh also.

 export HADOOP_MAPRED_HOME=<mapred loc>
 export HADOOP_COMMON_HOME=<common loc>
 export HADOOP_HDFS_HOME=<hdfs loc>
 export YARN_HOME=directory where you untarred yarn
 export HADOOP_CONF_DIR=<conf loc>
 export YARN_CONF_DIR=$HADOOP_CONF_DIR

 Step 7) Setup config: for running mapreduce applications, which now are in user land, you need to setup nodemanager with the following configuration in your yarn-site.xml before you start the nodemanager.
     <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce.shuffle</value>
     </property>

     <property>
       <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
     </property>

 Step 8) Modify mapred-site.xml to use yarn framework
     <property>
       <name> mapreduce.framework.name</name>
       <value>yarn</value>
     </property>

 Step 9) cd $YARN_HOME

 Step 10) sbin/yarn-daemon.sh start resourcemanager

 Step 11) sbin/yarn-daemon.sh start nodemanager

 Step 12) sbin/mr-jobhistory-daemon.sh start historyserver

 Step 13) You are all set, an example on how to run a mapreduce job is:
 cd $HADOOP_MAPRED_HOME
 ant examples -Dresolvers=internal
 $HADOOP_COMMON_HOME/bin/hadoop jar $HADOOP_MAPRED_HOME/build/hadoop-mapreduce-examples-*.jar randomwriter -Dmapreduce.job.user.name=$USER -Dmapreduce.clientfactory.class.name=org.apache.hadoop.mapred.YarnClientFactory -Dmapreduce.randomwriter.bytespermap=10000 -Ddfs.blocksize=536870912 -Ddfs.block.size=536870912 -libjars $YARN_HOME/modules/hadoop-mapreduce-client-jobclient-*.jar output

 The output on the command line should be almost similar to what you see in the JT/TT setup (Hadoop 0.20/0.21)
	To compile Hadoop Mapreduce next following, do the following:

	Step 1) Install dependencies for yarn

	See http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-porject/hadoop-yarn/README
	Make sure protbuf library is in your library path or set: export LD_LIBRARY_PATH=/usr/local/lib

	Step 2) Checkout

	svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk

	Step 3) Build

	Go to common directory - choose your regular common build command. For example:

	export MAVEN_OPTS=-Xmx512m
	mvn clean package -Pdist -Dtar -DskipTests -Pnative

	You can omit -Pnative it you don't want to build native packages.

	Step 4) Untar the tarball from hadoop-dist/target/ into a clean and different
	directory, say YARN_HOME.

	Step 5)
	Start hdfs

	To run Hadoop Mapreduce next applications:

	Step 6) export the following variables to where you have things installed:
	You probably want to export these in hadoop-env.sh and yarn-env.sh also.

	export HADOOP_MAPRED_HOME=<mapred loc>
	export HADOOP_COMMON_HOME=<common loc>
	export HADOOP_HDFS_HOME=<hdfs loc>
	export YARN_HOME=directory where you untarred yarn
	export HADOOP_CONF_DIR=<conf loc>
	export YARN_CONF_DIR=$HADOOP_CONF_DIR

	Step 7) Setup config: for running mapreduce applications, which now are in user land, you need to setup nodemanager with the following configuration in your yarn-site.xml before you start the nodemanager.
	<property>
	<name>yarn.nodemanager.aux-services</name>
	<value>mapreduce.shuffle</value>
	</property>

	<property>
	<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
	<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>

	Step 8) Modify mapred-site.xml to use yarn framework
	<property>
	<name> mapreduce.framework.name</name>
	<value>yarn</value>
	</property>

	Step 9) cd $YARN_HOME

	Step 10) sbin/yarn-daemon.sh start resourcemanager

	Step 11) sbin/yarn-daemon.sh start nodemanager

	Step 12) sbin/mr-jobhistory-daemon.sh start historyserver

	Step 13) You are all set, an example on how to run a mapreduce job is:
	cd $HADOOP_MAPRED_HOME
	ant examples -Dresolvers=internal
	$HADOOP_COMMON_HOME/bin/hadoop jar $HADOOP_MAPRED_HOME/build/hadoop-mapreduce-examples-.jar randomwriter -Dmapreduce.job.user.name=$USER -Dmapreduce.clientfactory.class.name=org.apache.hadoop.mapred.YarnClientFactory -Dmapreduce.randomwriter.bytespermap=10000 -Ddfs.blocksize=536870912 -Ddfs.block.size=536870912 -libjars $YARN_HOME/modules/hadoop-mapreduce-client-jobclient-.jar output

	The output on the command line should be almost similar to what you see in the JT/TT setup (Hadoop 0.20/0.21)