This example archives file data into an Accumulo table. Files with duplicate data are stored only once. The steps below use two of the example's classes, FileDataIngest and CharacterHistogram.
This example is coupled with the dirlist example. See README.dirlist for instructions.
If you haven't already run the README.dirlist example, ingest a file with FileDataIngest.
$ ./bin/accumulo org.apache.accumulo.examples.simple.filedata.FileDataIngest instance zookeepers username password dataTable exampleVis 1000 $ACCUMULO_HOME/README
Open the Accumulo shell and look at the data. The row is the MD5 hash of the file, which you can verify by running a command such as ‘md5sum’ on the file and comparing its output with the row.
> scan -t dataTable
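If md5sum is not available, the same check can be sketched in Java. This is a minimal illustration, not code from the example itself: the class name Md5RowCheck and the method md5Hex are hypothetical, and it assumes the row is the lowercase hexadecimal form of the digest, as md5sum prints it. It reads the whole file into memory, which is fine for a small file such as README.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the Accumulo examples.
public class Md5RowCheck {

    // Returns the MD5 digest of the given bytes as lowercase hex.
    // Assumption: the row in dataTable uses this same hex form.
    static String md5Hex(byte[] data) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(data)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Usage: java Md5RowCheck <path-to-file>
    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(md5Hex(data));
    }
}
```

Compare the printed value with the row shown by the scan.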
Run the CharacterHistogram MapReduce job to add some information about the file.
$ bin/tool.sh lib/examples-simple*[^cs].jar org.apache.accumulo.examples.simple.filedata.CharacterHistogram instance zookeepers username password dataTable exampleVis exampleVis
Scan again to see the histogram stored in the ‘info’ column family.
> scan -t dataTable
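To get a feel for what a character histogram contains, the following standalone sketch counts how often each byte value occurs in a local file. It is only an illustration under stated assumptions: the class name ByteHistogramSketch and the method histogram are hypothetical, it does not use MapReduce or Accumulo, and the exact format in which CharacterHistogram stores its result in the ‘info’ column family is not shown here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical illustration, not the CharacterHistogram class itself.
public class ByteHistogramSketch {

    // Counts occurrences of each of the 256 possible byte values.
    static long[] histogram(byte[] data) {
        long[] counts = new long[256];
        for (byte b : data) {
            counts[b & 0xFF]++; // mask to treat the byte as unsigned
        }
        return counts;
    }

    // Usage: java ByteHistogramSketch <path-to-file>
    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        long[] counts = histogram(data);
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                System.out.println(i + "\t" + counts[i]);
            }
        }
    }
}
```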