| --- |
| title: MapReduce Example |
| --- |
| |
| This example uses MapReduce and Accumulo to compute word counts for a set of |
| documents. It does so with a map-only MapReduce job and an Accumulo table |
| with aggregators: the table sums the counts, so no reduce phase is needed. |
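
To make the division of labor concrete, here is a minimal Python sketch of the idea. It is not the example's actual Java code. It assumes the mapper splits each line on whitespace and writes the value "1" for every word; `map_line`, `SummingTable`, and `run_map_only_job` are hypothetical names, and the table is simulated with a dictionary.

```python
from collections import defaultdict


def map_line(line):
    """Map step: emit one (word, "1") pair per whitespace-separated token."""
    return [(word, "1") for word in line.split()]


class SummingTable:
    """Stand-in for a table whose count column family has a summing aggregator."""

    def __init__(self):
        self._cells = defaultdict(int)

    def write(self, row, value):
        # Simulation only: add the incoming value to the row's running total.
        self._cells[row] += int(value)

    def scan(self):
        # Rows come back in sorted order, values as decimal strings.
        return [(row, str(total)) for row, total in sorted(self._cells.items())]


def run_map_only_job(lines, table):
    # No reduce phase: every mapper output goes straight to the table.
    for line in lines:
        for row, value in map_line(line):
            table.write(row, value)


table = SummingTable()
run_map_only_job(["the quick fox", "the lazy dog and the fox"], table)
print(table.scan())
```

The sketch sums eagerly on every write. A real table would typically combine values later, when data is scanned or compacted, but the result a reader sees is the same.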
| |
| To run this example you will need a directory in HDFS containing text files. |
| The commands below use the Accumulo README as the input. |
| |
| $ hadoop fs -copyFromLocal $ACCUMULO_HOME/README /user/username/wc/Accumulo.README |
| $ hadoop fs -ls /user/username/wc |
| Found 1 items |
| -rw-r--r-- 2 username supergroup 9359 2009-07-15 17:54 /user/username/wc/Accumulo.README |
| |
| The first step is to create a table with an aggregator configured for the |
| count column family. |
| |
| $ ./bin/accumulo shell -u username -p password |
| Shell - Apache Accumulo Interactive Shell |
| - version: 1.3.x-incubating |
| - instance name: instance |
| - instance id: 00000000-0000-0000-0000-000000000000 |
| - |
| - type 'help' for a list of available commands |
| - |
| username@instance> createtable wordCount -a count=org.apache.accumulo.core.iterators.aggregation.StringSummation |
| username@instance wordCount> quit |
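
The aggregator named in the createtable command decides how multiple values written to the same cell are combined. As a rough illustration, the Python sketch below mimics a summing aggregator that treats each value as the text form of an integer. The class name and the `reset`/`collect`/`aggregate` method names are assumptions chosen for this sketch, not a transcription of the Accumulo class.

```python
class StringSummationSketch:
    """Hypothetical stand-in for a summing aggregator over string-encoded integers."""

    def __init__(self):
        self.total = 0

    def reset(self):
        # Start combining the values of a new cell.
        self.total = 0

    def collect(self, value: bytes):
        # Each stored value is the text form of an integer, e.g. b"1".
        self.total += int(value.decode("utf-8"))

    def aggregate(self) -> bytes:
        # The combined value is encoded as text again.
        return str(self.total).encode("utf-8")


def combine(values):
    """Fold a list of encoded values into one encoded sum."""
    agg = StringSummationSketch()
    agg.reset()
    for v in values:
        agg.collect(v)
    return agg.aggregate()


print(combine([b"1", b"1", b"1"]))  # b'3'
print(combine([b"40", b"35"]))      # b'75'
```

Because the sum is stored as text, the mapper can write plain "1" values and the shell can display the totals without any decoding.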
| |
| After creating the table, run the word count MapReduce job. |
| |
| [user1@instance accumulo]$ bin/tool.sh lib/accumulo-examples-*[^c].jar org.apache.accumulo.examples.mapreduce.WordCount instance zookeepers /user/username/wc wordCount -u username -p password |
| |
| 11/02/07 18:20:11 INFO input.FileInputFormat: Total input paths to process : 1 |
| 11/02/07 18:20:12 INFO mapred.JobClient: Running job: job_201102071740_0003 |
| 11/02/07 18:20:13 INFO mapred.JobClient: map 0% reduce 0% |
| 11/02/07 18:20:20 INFO mapred.JobClient: map 100% reduce 0% |
| 11/02/07 18:20:22 INFO mapred.JobClient: Job complete: job_201102071740_0003 |
| 11/02/07 18:20:22 INFO mapred.JobClient: Counters: 6 |
| 11/02/07 18:20:22 INFO mapred.JobClient: Job Counters |
| 11/02/07 18:20:22 INFO mapred.JobClient: Launched map tasks=1 |
| 11/02/07 18:20:22 INFO mapred.JobClient: Data-local map tasks=1 |
| 11/02/07 18:20:22 INFO mapred.JobClient: FileSystemCounters |
| 11/02/07 18:20:22 INFO mapred.JobClient: HDFS_BYTES_READ=10487 |
| 11/02/07 18:20:22 INFO mapred.JobClient: Map-Reduce Framework |
| 11/02/07 18:20:22 INFO mapred.JobClient: Map input records=255 |
| 11/02/07 18:20:22 INFO mapred.JobClient: Spilled Records=0 |
| 11/02/07 18:20:22 INFO mapred.JobClient: Map output records=1452 |
| |
| After the MapReduce job completes, query the Accumulo table to see the word |
| counts. |
| |
| $ ./bin/accumulo shell -u username -p password |
| username@instance> table wordCount |
| username@instance wordCount> scan -b the |
| the count:20080906 [] 75 |
| their count:20080906 [] 2 |
| them count:20080906 [] 1 |
| then count:20080906 [] 1 |
| there count:20080906 [] 1 |
| these count:20080906 [] 3 |
| this count:20080906 [] 6 |
| through count:20080906 [] 1 |
| time count:20080906 [] 3 |
| time. count:20080906 [] 1 |
| to count:20080906 [] 27 |
| total count:20080906 [] 1 |
| tserver, count:20080906 [] 1 |
| tserver.compaction.major.concurrent.max count:20080906 [] 1 |
| ... |
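
The scan -b the command starts the scan at row "the" and returns rows in sorted order, which is why "their", "them", and "then" follow it. Here is a small Python sketch of that behavior, using a sorted list and `bisect` in place of a real table. The `scan_from` helper is hypothetical, and Python string ordering is assumed to match the table's row ordering for these ASCII rows.

```python
from bisect import bisect_left


def scan_from(rows, begin_row):
    """Return (row, value) pairs with row >= begin_row, in sorted row order."""
    ordered = sorted(rows.items())
    keys = [row for row, _ in ordered]
    start = bisect_left(keys, begin_row)
    return ordered[start:]


# Counts for "the", "their", "them", and "to" are copied from the scan output
# above. "a" and "table" are made-up rows, added only to show what is skipped.
rows = {"the": "75", "their": "2", "them": "1", "to": "27", "table": "4", "a": "9"}

for row, value in scan_from(rows, "the"):
    print(row, "count:20080906 []", value)
```

The rows "a" and "table" sort before "the", so they are left out of the result.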