
Apache Accumulo Spark Example

Requirements

  • Accumulo 2.0+
  • Hadoop YARN installed & HADOOP_CONF_DIR set in environment
  • Spark installed & SPARK_HOME set in environment

Spark example

The CopyPlus5K example creates an Accumulo table called spark_example_input and writes 100 key/value entries into Accumulo with the values 0..99. It then launches a Spark application that does the following:

  • Reads data from the spark_example_input table using AccumuloInputFormat
  • Adds 5000 to each value
  • Writes the data to a new Accumulo table (called spark_example_output) using one of two methods:
    1. Bulk import - writes the data to an RFile in HDFS using AccumuloFileOutputFormat, then bulk imports it into the Accumulo table
    2. Batch writer - creates a BatchWriter in the Spark code to write directly to the table
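The batch writer path above can be sketched roughly as follows. This is a minimal, illustrative sketch rather than the actual CopyPlus5K source: it assumes Accumulo 2.x, the Spark Java API, that the accumulo-client.properties path is the first program argument, and that the spark_example_output table already exists. The class name CopyPlus5KSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.hadoop.mapreduce.AccumuloInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class CopyPlus5KSketch {
  public static void main(String[] args) throws Exception {
    // Load client connection info from the properties file passed on the command line.
    Properties props = Accumulo.newClientProperties().from(args[0]).build();

    // Point AccumuloInputFormat at the input table.
    Job job = Job.getInstance();
    AccumuloInputFormat.configure().clientProperties(props)
        .table("spark_example_input").store(job);

    JavaSparkContext sc =
        new JavaSparkContext(new SparkConf().setAppName("CopyPlus5K-sketch"));

    // Read the table as an RDD of Accumulo Key/Value pairs.
    JavaPairRDD<Key,Value> data = sc.newAPIHadoopRDD(job.getConfiguration(),
        AccumuloInputFormat.class, Key.class, Value.class);

    // Batch writer method: each partition opens its own client and BatchWriter,
    // adds 5000 to every value, and writes the result to the output table.
    data.foreachPartition(iter -> {
      try (AccumuloClient client = Accumulo.newClient().from(props).build();
          BatchWriter bw = client.createBatchWriter("spark_example_output")) {
        while (iter.hasNext()) {
          Tuple2<Key,Value> kv = iter.next();
          long newVal = Long.parseLong(kv._2.toString()) + 5000;
          Mutation m = new Mutation(kv._1.getRow());
          m.put(kv._1.getColumnFamily(), kv._1.getColumnQualifier(),
              new Value(Long.toString(newVal).getBytes(StandardCharsets.UTF_8)));
          bw.addMutation(m);
        }
      }
    });
    sc.stop();
  }
}
```

The bulk import method differs only in the write step: instead of a BatchWriter, the RDD is sorted and written to RFiles in HDFS via AccumuloFileOutputFormat, then imported with the table operations API.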

This application can be run using the command:

./run.sh batch /path/to/accumulo-client.properties

Change batch to bulk to use the bulk import method.