|author||Mark Owens <email@example.com>||Fri Feb 12 14:59:26 2021 -0500|
|committer||GitHub <firstname.lastname@example.org>||Fri Feb 12 14:59:26 2021 -0500|
Merge pull request #63 from jmark99/bingest

Update the bulkIngest example to work correctly with Accumulo 2.0.x.

Minor updates were made to the documentation. The classes were refactored so the example works correctly with the 2.0 changes. Generation of test data was moved from SetupTable to BulkIngestExample (as in the 1.10.x version). Before this change, the necessary data was not written to HDFS correctly, which prevented the BulkIngestExample class from finding the information it needed to generate the data. Closes #63.
Follow the steps below to run the Accumulo examples:
Clone this repository
git clone https://github.com/apache/accumulo-examples.git
Follow Accumulo's quickstart to install and run an Accumulo instance. Accumulo has an accumulo-client.properties file in
conf/ that must be configured, because the examples use this file to connect to your instance.
Review conf/env.sh.example to see if you need to customize it. If
ACCUMULO_HOME and HADOOP_HOME are set in your shell, you may be able to skip this step. Make sure
ACCUMULO_CLIENT_PROPS is set to the location of your accumulo-client.properties.
cp conf/env.sh.example conf/env.sh
vim conf/env.sh
Build the examples repo and copy the examples jar to Accumulo's lib/ext directory:
./bin/build
cp target/accumulo-examples.jar /path/to/accumulo/lib/ext/
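Before building, it can help to confirm that ACCUMULO_CLIENT_PROPS really points at a file. The sketch below is one way to do that; `check_file_var` is a hypothetical helper written for illustration and is not part of this repository. It only inspects the current shell environment, so it assumes the variable has already been exported.

```shell
# Hypothetical helper (not part of this repo): report whether a variable's
# value names an existing regular file.
#   $1 = variable name, used only in messages
#   $2 = the variable's current value (may be empty)
check_file_var() {
  if [ -z "$2" ]; then
    echo "$1 is not set"
    return 1
  fi
  if [ ! -f "$2" ]; then
    echo "$1 points to a missing file: $2"
    return 1
  fi
  echo "$1 OK: $2"
}

# Check the variable the setup steps above ask you to set.
check_file_var ACCUMULO_CLIENT_PROPS "${ACCUMULO_CLIENT_PROPS:-}" \
  || echo "Review conf/env.sh before running ./bin/build"
```

The helper returns a non-zero status on failure, so it can also guard a build step, for example `check_file_var ... && ./bin/build`.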
Each Accumulo example has its own documentation and run instructions, which are linked below.
When running the examples, remember the tips below:
Examples are run using the runex or runmr commands, which are located in the
bin/ directory of this repo. The
runex command is a simple script that uses the examples shaded jar to run a class. The
runmr command starts a MapReduce job in YARN.
Several examples use the accumulo and accumulo-util commands, which are expected to be on your
PATH. These commands are found in the
bin/ directory of your Accumulo installation.
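A quick way to confirm the PATH requirement above is to ask the shell whether it can resolve the commands. This is a minimal sketch: `missing_commands` is a hypothetical helper, and checking for a command named `accumulo` alongside `accumulo-util` is an assumption about what lives in your installation's bin/ directory.

```shell
# Hypothetical helper (not part of this repo): print each given command
# name that cannot be resolved on PATH, one per line.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# 'accumulo' is assumed here; 'accumulo-util' is named in the tips above.
missing=$(missing_commands accumulo accumulo-util)
if [ -n "$missing" ]; then
  echo "Not found on PATH:"
  echo "$missing"
  echo "Add the bin/ directory of your Accumulo installation to PATH"
else
  echo "All required commands were found on PATH"
fi
```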
Each example below highlights a feature of Apache Accumulo.
|batch||Using the batch writer and batch scanner|
|bloom||Creating a bloom filter enabled table to increase query performance|
|bulkIngest||Ingesting bulk data using MapReduce jobs on Hadoop|
|classpath||Using per-table classpaths|
|client||Using table operations, reading and writing data in Java.|
|combiner||Using example StatsCombiner to find min, max, sum, and count.|
|compactionStrategy||Configuring a compaction strategy|
|constraints||Using constraints with tables. Limit the mutation size to avoid running out of memory|
|deleteKeyValuePair||Deleting a key/value pair and verifying the deletion in RFile.|
|dirlist||Storing filesystem information.|
|export||Exporting and importing tables.|
|filedata||Storing file data.|
|filter||Using the AgeOffFilter to remove records more than 30 seconds old.|
|helloworld||Inserting records both inside and outside MapReduce jobs, and reading records between two rows.|
|isolation||Using the isolated scanner to ensure partial changes are not seen.|
|regex||Using MapReduce and Accumulo to find data using regular expressions.|
|reservations||Using conditional mutations to implement a simple reservation system.|
|rgbalancer||Using a balancer to spread groups of tablets within a table evenly|
|rowhash||Using MapReduce to read a table and write to a new column in the same table.|
|sample||Building and using sample data in Accumulo.|
|shard||Using the intersecting iterator with a term index partitioned by document.|
|spark||Using Accumulo as input and output for Apache Spark jobs|
|tabletofile||Using MapReduce to read a table and write one of its columns to a file in HDFS.|
|terasort||Generating random data and sorting it using Accumulo.|
|uniquecols||Using MapReduce to count unique columns in Accumulo|
|visibility||Using visibilities (or combinations of authorizations). Also shows user permissions.|
|wordcount||Using MapReduce and Accumulo to do a word count on text files|
This repository can be used to test Accumulo release candidates. See docs/release-testing.md.