
Apache Accumulo Spark Example

Requirements

  • Accumulo 2.0+
  • Hadoop YARN installed & HADOOP_CONF_DIR set in environment
  • Spark installed & SPARK_HOME set in environment
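Before running the example, it can help to confirm the required environment variables are actually set. The helper below is a convenience sketch, not part of the repository's run.sh:

```shell
# Sanity-check the environment required by this example
# (a convenience sketch; not part of the repository's run.sh).
check_spark_env() {
  missing=0
  for v in HADOOP_CONF_DIR SPARK_HOME; do
    if [ -z "$(eval "echo \$$v")" ]; then
      echo "not set: $v"
      missing=1
    fi
  done
  return "$missing"
}

check_spark_env || echo "Set the variables above before running run.sh"
```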

Spark example

The CopyPlus5K example creates an Accumulo table called spark_example_input and writes 100 key/value entries into it, with values 0 through 99. It then launches a Spark application that does the following:

  • Reads data from the spark_example_input table using AccumuloInputFormat
  • Adds 5000 to each value
  • Writes the data to a new Accumulo table (called spark_example_output) using one of two methods:
    1. Bulk import - writes the data to an RFile in HDFS using AccumuloFileOutputFormat, then bulk imports it into the Accumulo table
    2. BatchWriter - creates a BatchWriter in the Spark code to write directly to the table
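The "plus 5000" step itself is simple; the sketch below models it in plain Java with the tables as lists, so the logic runs without a Spark or Accumulo cluster. Class and method names here are illustrative, not the example's actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the CopyPlus5K transformation: the example
// writes values 0..99 into the input table, and the Spark job adds
// 5000 to each value before writing to the output table. The tables
// are modeled as plain lists here; names are hypothetical.
public class CopyPlus5KSketch {

    // Mimics the map step the Spark job applies to each entry's value.
    static List<Integer> addFiveThousand(List<Integer> inputValues) {
        List<Integer> out = new ArrayList<>();
        for (int v : inputValues) {
            out.add(v + 5000);
        }
        return out;
    }

    public static void main(String[] args) {
        // Model of spark_example_input: 100 entries with values 0..99
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            input.add(i);
        }
        List<Integer> output = addFiveThousand(input);
        // Values in the output table are 5000..5099
        System.out.println("first=" + output.get(0) + " last=" + output.get(99));
    }
}
```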

This application can be run using the command:

./run.sh batch /path/to/accumulo-client.properties

Change batch to bulk to use the bulk import method.