h1. Running MapReduce Jobs
After you launch a cluster, a {{hadoop-site.xml}} file is created in the
directory {{~/.hadoop-cloud/<cluster-name>}}. You can use this file to connect
to the cluster by setting the {{HADOOP\_CONF\_DIR}} environment variable. (It
is also possible to specify the configuration file by passing it to Hadoop
tools with the {{-conf}} option.) For example:
{code}
% export HADOOP_CONF_DIR=~/.hadoop-cloud/my-hadoop-cluster
{code}
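If you prefer not to set the environment variable, the same configuration file
can be passed directly with the {{-conf}} generic option. A minimal sketch,
assuming the cluster is named {{my-hadoop-cluster}}:
{code}
% hadoop fs -conf ~/.hadoop-cloud/my-hadoop-cluster/hadoop-site.xml -ls /   # list HDFS without setting HADOOP_CONF_DIR
{code}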
*To browse HDFS:*
{code}
% hadoop fs -ls /
{code}
Note that the version of Hadoop installed locally should match the version
installed on the cluster.
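You can check the locally installed version with the {{hadoop version}}
command, for example:
{code}
% hadoop version
{code}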
\\
\\
*To run a job locally:*
{code}
% hadoop fs -mkdir input # create an input directory
% hadoop fs -put $HADOOP_HOME/LICENSE.txt input # copy a file there
% hadoop jar $HADOOP_HOME/hadoop-*examples*.jar wordcount input output
% hadoop fs -cat output/part-* | head
{code}
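Note that the job will fail if the {{output}} directory already exists, so if
you want to rerun the example, remove the old output first (a small sketch;
{{-rmr}} is the recursive delete in the Hadoop 0.20-era shell):
{code}
% hadoop fs -rmr output   # remove old job output before rerunning
{code}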
The preceding examples assume that you have Hadoop installed on your local
machine. However, you can also run jobs from within the cluster.
\\
\\
*To run jobs within the cluster:*
1. Log into the Namenode:
{code}
% hadoop-ec2 login my-hadoop-cluster
{code}
2. Run the job:
{code}
# hadoop fs -mkdir input
# hadoop fs -put /etc/hadoop/conf/*.xml input
# hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
# hadoop fs -cat output/part-* | head
{code}
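While a job is running you can check on it from the same shell; for example,
{{hadoop job -list}} shows the jobs currently known to the JobTracker:
{code}
# hadoop job -list   # list jobs currently running on the cluster
{code}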