Follow the steps below to run the Accumulo examples:
Clone this repository:

    git clone https://github.com/apache/accumulo-examples.git

Follow Accumulo's quickstart to install and run an Accumulo instance. Accumulo has an accumulo-client.properties file in conf/
that must be configured, because the examples use this file to connect to your instance.
Review env.sh.example to see if you need to customize it. If ACCUMULO_HOME and HADOOP_HOME
are set in your shell, you may be able to skip this step. Make sure ACCUMULO_CLIENT_PROPS
is set to the location of your accumulo-client.properties.

    cp conf/env.sh.example conf/env.sh
    vim conf/env.sh

Build the examples repo and copy the examples jar to Accumulo's lib/ext directory:

    ./bin/build
    cp target/accumulo-examples.jar /path/to/accumulo/lib/ext/

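
The settings described in the steps above can be sanity-checked before running anything. The following is a minimal sketch and not part of this repository: the function name check_examples_env is hypothetical, and it assumes that ACCUMULO_HOME, HADOOP_HOME, and ACCUMULO_CLIENT_PROPS are exported in the current shell rather than set only inside conf/env.sh.

```shell
# Hypothetical helper (not part of this repo): report which of the settings
# mentioned in the setup steps are missing from the current shell.
check_examples_env() {
  cee_status=0
  for cee_var in ACCUMULO_HOME HADOOP_HOME ACCUMULO_CLIENT_PROPS; do
    # Look up the value of the variable whose name is stored in cee_var.
    eval "cee_value=\${$cee_var:-}"
    if [ -z "$cee_value" ]; then
      echo "missing: $cee_var is not set"
      cee_status=1
    fi
  done
  # If the client properties variable is set, it must point at an existing file.
  if [ -n "${ACCUMULO_CLIENT_PROPS:-}" ] && [ ! -f "$ACCUMULO_CLIENT_PROPS" ]; then
    echo "missing: ACCUMULO_CLIENT_PROPS points to a file that does not exist"
    cee_status=1
  fi
  return "$cee_status"
}
```

Calling check_examples_env after defining it prints one line per problem and returns a non-zero status if anything is missing, so it can guard a script that goes on to run the examples.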
Each Accumulo example has its own documentation and instructions for running it, which are linked below.
When running the examples, remember the tips below:
* Examples are run with the runex or runmr commands, which are located in the bin/ directory of this repo. The runex command is a simple script that uses the examples shaded jar to run a class. The runmr command starts a MapReduce job in YARN.
* Some examples use the accumulo and accumulo-util commands, which are expected to be on your PATH. These commands are found in the bin/ directory of your Accumulo installation.

Each example below highlights a feature of Apache Accumulo.

Example | Description |
---|---|
batch | Using the batch writer and batch scanner |
bloom | Creating a bloom filter enabled table to increase query performance |
bulkIngest | Ingesting bulk data using map/reduce jobs on Hadoop |
classpath | Using per-table classpaths |
client | Using table operations, reading and writing data in Java. |
combiner | Using example StatsCombiner to find min, max, sum, and count. |
compactionStrategy | Configuring a compaction strategy |
constraints | Using constraints with tables. Limit the mutation size to avoid running out of memory |
deleteKeyValuePair | Deleting a key/value pair and verifying the deletion in RFile. |
dirlist | Storing filesystem information. |
export | Exporting and importing tables. |
filedata | Storing file data. |
filter | Using the AgeOffFilter to remove records more than 30 seconds old. |
helloworld | Inserting records both inside and outside map/reduce jobs, and reading records between two rows. |
isolation | Using the isolated scanner to ensure partial changes are not seen. |
regex | Using MapReduce and Accumulo to find data using regular expressions. |
reservations | Using conditional mutations to implement a simple reservation system. |
rgbalancer | Using a balancer to spread groups of tablets within a table evenly |
rowhash | Using MapReduce to read a table and write to a new column in the same table. |
sample | Building and using sample data in Accumulo. |
shard | Using the intersecting iterator with a term index partitioned by document. |
spark | Using Accumulo as input and output for Apache Spark jobs |
tabletofile | Using MapReduce to read a table and write one of its columns to a file in HDFS. |
terasort | Generating random data and sorting it using Accumulo. |
uniquecols | Using MapReduce to count unique columns in Accumulo |
visibility | Using visibilities (or combinations of authorizations). Also shows user permissions. |
wordcount | Using MapReduce and Accumulo to do a word count on text files |
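
Before running any of the examples in the table above, it can help to confirm that the accumulo and accumulo-util commands are actually resolvable on your PATH. The following is a minimal sketch and not part of this repository; the function name missing_commands is hypothetical, and it only checks that the commands can be found, not that they work.

```shell
# Hypothetical helper (not part of this repo): print each named command
# that cannot be found on PATH, one per line.
missing_commands() {
  for mc_cmd in "$@"; do
    # command -v is the POSIX way to test whether a command can be resolved.
    if ! command -v "$mc_cmd" >/dev/null 2>&1; then
      echo "$mc_cmd"
    fi
  done
}
```

Running missing_commands accumulo accumulo-util prints nothing when both commands are found. Any name it prints usually means the bin/ directory of your Accumulo installation still needs to be added to PATH.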
This repository can be used to test Accumulo release candidates. See docs/release-testing.md.