h1. Running MapReduce Jobs
After you launch a cluster, a {{hadoop-site.xml}} file is created in the
directory {{~/.hadoop-cloud/<cluster-name>}}. You can use this file to connect
to the cluster by setting the {{HADOOP\_CONF\_DIR}} environment variable. (It
is also possible to specify the configuration file by passing it to Hadoop
tools with the {{-conf}} option.) For example:
{code}
% export HADOOP_CONF_DIR=~/.hadoop-cloud/my-hadoop-cluster
{code}
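If you prefer not to set the environment variable, the same configuration file
can be passed directly with the {{-conf}} generic option. A minimal sketch,
assuming the cluster is named {{my-hadoop-cluster}}:
{code}
% hadoop fs -conf ~/.hadoop-cloud/my-hadoop-cluster/hadoop-site.xml -ls /   # list HDFS without setting HADOOP_CONF_DIR
{code}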
*To browse HDFS:*
{code}
% hadoop fs -ls /
{code}
Note that the version of Hadoop installed locally should match the version
installed on the cluster.
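You can check the locally installed version with the {{hadoop version}}
command, for example:
{code}
% hadoop version
{code}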
\\
\\
*To run a job locally:*
{code}
% hadoop fs -mkdir input # create an input directory
% hadoop fs -put $HADOOP_HOME/LICENSE.txt input # copy a file there
% hadoop jar $HADOOP_HOME/hadoop-*examples*.jar wordcount input output
% hadoop fs -cat output/part-* | head
{code}
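Note that the job will fail if the {{output}} directory already exists, so if
you want to rerun the example, remove the old output first (a small sketch;
{{-rmr}} is the recursive delete in the Hadoop 0.20-era shell):
{code}
% hadoop fs -rmr output   # remove old job output before rerunning
{code}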
The preceding examples assume that you have Hadoop installed on your local
machine. However, you can also run jobs from within the cluster.
\\
\\
*To run jobs within the cluster:*
1. Log into the Namenode:
{code}
% hadoop-ec2 login my-hadoop-cluster
{code}
2. Run the job:
{code}
# hadoop fs -mkdir input
# hadoop fs -put /etc/hadoop/conf/*.xml input
# hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
# hadoop fs -cat output/part-* | head
{code}
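While a job is running you can check on it from the same shell; for example,
{{hadoop job -list}} shows the jobs currently known to the JobTracker:
{code}
# hadoop job -list   # list jobs currently running on the cluster
{code}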