Apache Accumulo Spark Example

Requirements

Accumulo 2.0+
Hadoop YARN installed & HADOOP_CONF_DIR set in environment
Spark installed & SPARK_HOME set in environment

Spark example

The CopyPlus5K example creates an Accumulo table called spark_example_input and writes 100 key/value entries to it, with the values 0..99. It then launches a Spark application that does the following:

Read data from the spark_example_input table using AccumuloInputFormat
Add 5000 to each value
Write the data to a new Accumulo table (called spark_example_output) using one of two methods:
1. Bulk import - Writes the data to an RFile in HDFS using AccumuloFileOutputFormat, then bulk imports the RFile into the Accumulo table.
2. BatchWriter - Creates a BatchWriter in the Spark code and writes to the table.
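
The data flow described above can be sketched in plain Java without a cluster. This is an in-memory stand-in, not the code of the example itself: it makes no Accumulo or Spark calls. The class name CopyPlus5KSketch, the row_NNN key format, and the choice to store values as decimal strings are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for the CopyPlus5K data flow; no Accumulo or Spark calls.
class CopyPlus5KSketch {

    // Hypothetical row-key format; the real example may format keys differently.
    static String rowKey(int i) {
        return String.format("row_%03d", i);
    }

    // Stand-in for spark_example_input: 100 entries with the values 0..99.
    // Values are kept as decimal strings here, which is an assumption.
    static Map<String, String> createInput() {
        Map<String, String> input = new TreeMap<>();
        for (int i = 0; i < 100; i++) {
            input.put(rowKey(i), Integer.toString(i));
        }
        return input;
    }

    // Stand-in for the Spark job: read each entry and add 5000 to its value.
    // The returned map plays the role of spark_example_output.
    static Map<String, String> addFiveThousand(Map<String, String> input) {
        Map<String, String> output = new TreeMap<>();
        for (Map.Entry<String, String> e : input.entrySet()) {
            int value = Integer.parseInt(e.getValue());
            output.put(e.getKey(), Integer.toString(value + 5000));
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, String> output = addFiveThousand(createInput());
        System.out.println(output.size() + " entries");
        System.out.println("first: " + output.get(rowKey(0)));
        System.out.println("last:  " + output.get(rowKey(99)));
    }
}
```

The sketch uses a TreeMap so that entries stay in key order, which loosely mirrors how Accumulo keeps keys sorted. In the real application the two write methods differ only in how the transformed entries reach the output table, not in the transformation.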

This application can be run using the command:

./run.sh batch /path/to/accumulo-client.properties

Change batch to bulk to use the bulk import method instead.