This example uses MapReduce to extract specified columns from an existing Accumulo table.

To run this example you will need some data in a table. The following shell session puts a trivial amount of data into Accumulo:
```
$ accumulo shell
root@instance> createnamespace examples
root@instance> createtable examples.input
root@instance examples.input> insert dog cf cq dogvalue
root@instance examples.input> insert cat cf cq catvalue
root@instance examples.input> insert junk family qualifier junkvalue
root@instance examples.input> quit
```
The TableToFile class configures a map-only job to read the specified columns and write the key/value pairs to a file in HDFS.
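TableToFile itself is a Java MapReduce job, but the per-entry work its mapper does is simple: keep only the entries whose column matches the requested `--columns` list, and render each surviving key/value pair as a line of text. As a language-agnostic illustration (not the actual implementation; the function and entry tuples here are hypothetical stand-ins), the filtering can be sketched like this:

```python
def table_to_file(entries, columns):
    """Model of TableToFile's map step: keep entries whose
    family:qualifier is in `columns`, rendered in Accumulo's
    "row cf:cq [visibility]<TAB>value" text form."""
    lines = []
    # sorted() models Accumulo returning entries in key order.
    for row, cf, cq, value in sorted(entries):
        if f"{cf}:{cq}" in columns:
            # Empty brackets mean the entry has no visibility label.
            lines.append(f"{row} {cf}:{cq} []\t{value}")
    return lines

# The three entries inserted in the shell session above:
entries = [
    ("dog", "cf", "cq", "dogvalue"),
    ("cat", "cf", "cq", "catvalue"),
    ("junk", "family", "qualifier", "junkvalue"),
]

for line in table_to_file(entries, {"cf:cq"}):
    print(line)
```

Only the `cf:cq` entries survive; the `junk` row is filtered out, which is exactly what the job output below shows.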
The following will extract the rows containing the column `cf:cq`:
```
$ ./bin/runmr mapreduce.TableToFile -t examples.input --columns cf:cq --output /tmp/output
$ hadoop fs -ls /tmp/output
Found 2 items
-rw-r--r--   3 root supergroup          0 2021-05-04 10:32 /tmp/output/_SUCCESS
-rw-r--r--   3 root supergroup         44 2021-05-04 10:32 /tmp/output/part-m-00000
```
We can see the output of our little map-reduce job:
```
$ hadoop fs -text /tmp/output/part-m-00000
cat cf:cq []	catvalue
dog cf:cq []	dogvalue
```