commit ef2e04e78cc08ce1d1a4b3f67947e5fe1f874c89
author: Mark Owens <jmark99@apache.org>  Fri May 07 08:16:44 2021 -0400
committer: GitHub <noreply@github.com>  Fri May 07 08:16:44 2021 -0400
tree 91bf6fb821f788a680852c2501991bc444afae11
parent aed79ea9edb739c90805202314fadb3455a644e2
Avoid checking Accumulo table exists before creation (#74)

The initial focus of this ticket was to remove the check for table existence and instead create the table, catching and ignoring TableExistsException. After the initial work (see previous comments), it was decided to update the code to behave more realistically: do not ignore the exceptions, at least alert the user that the table already existed, and let the user decide whether to remove the table and re-run the example. Another concern was the fear of interfering with an existing table on a user's system. The primary changes then became updating table creation and providing feedback via logs in situations where prior tables already existed. To prevent table name collisions, the examples were modified to make use of Accumulo namespaces. The classes and documentation were updated to create an 'examples' namespace in which all the tables are created.

Along the way several other smaller tweaks were also made. Some of these are listed below:

* Table creation was refactored into a Commons class. All examples now use a method in that class to create both the namespace and the table name. A couple of constants used throughout the example classes are defined there as well.
* The bloom classes now have a couple of methods that helped remove some redundant code.
* An unneeded import was removed from the CharacterHistogram.java class.
* Most uses of System.out.println were replaced with logging.
* SequentialBatchWriter was updated to exit if the table required for scanning does not exist.
* A majority of the documentation was updated to include the creation of the necessary 'examples' namespace.
* The config command was updated to use table.class.loader.context rather than the deprecated table.classpath.context.
* The constraints example was updated to work with the new location of the constraints classes in Accumulo.
* The filedata documentation was updated to note that the ChunkCombiner class must be available in the Accumulo lib directory or on the classpath somewhere in order to scan the created examples.dataTable.

Closes #13
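The namespace scheme this commit describes can be sketched as follows. This is a standalone illustration, not the repo's actual Commons class: the class and helper names here are hypothetical, and the real code additionally creates the namespace and table through the Accumulo client API, catching NamespaceExistsException and TableExistsException and logging rather than failing.

```java
// Standalone sketch of the 'examples' namespace convention described above.
// The real Commons class in this repo also creates the namespace and table
// via the Accumulo client API and logs when they already exist; this sketch
// shows only the name-qualification step, so it runs without a cluster.
public class NamespacedTableName {

    // A shared namespace keeps example tables from colliding with a user's own tables.
    static final String NAMESPACE = "examples";

    // Hypothetical helper: qualify a bare table name with the namespace.
    static String fullTableName(String table) {
        return NAMESPACE + "." + table;
    }

    public static void main(String[] args) {
        // e.g. the batch example's table becomes examples.batch
        System.out.println(fullTableName("batch"));
    }
}
```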
Follow the steps below to run the Accumulo examples:
Clone this repository
git clone https://github.com/apache/accumulo-examples.git
Follow Accumulo's quickstart to install and run an Accumulo instance. Accumulo has an accumulo-client.properties in conf/
that must be configured as the examples will use this file to connect to your instance.
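For reference, a minimal accumulo-client.properties might look like the following; the instance name, ZooKeeper hosts, and credentials shown here are placeholders for your own setup:

```properties
# Placeholder values - replace with your instance's settings.
instance.name=myinstance
instance.zookeepers=localhost:2181
auth.type=password
auth.principal=root
auth.token=secret
```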
Review conf/env.sh.example to see if you need to customize it. If ACCUMULO_HOME
and HADOOP_HOME are set in your shell, you may be able to skip this step. Make sure
ACCUMULO_CLIENT_PROPS is set to the location of your accumulo-client.properties.
cp conf/env.sh.example conf/env.sh
vim conf/env.sh
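Assuming a local installation, the relevant settings in conf/env.sh might look like this (the paths are illustrative, not defaults):

```shell
# Illustrative paths - point these at your own installations.
export ACCUMULO_HOME=/path/to/accumulo
export HADOOP_HOME=/path/to/hadoop
export ACCUMULO_CLIENT_PROPS=$ACCUMULO_HOME/conf/accumulo-client.properties
```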
Build the examples repo and copy the examples jar to Accumulo's lib/ext
directory:
./bin/build
cp target/accumulo-examples.jar /path/to/accumulo/lib/ext/
Each Accumulo example has its own documentation and instructions for running the example which are linked to below.
When running the examples, remember the tips below:

* Examples are run using the runex or runmr commands, which are located in the bin/ directory of this repo. The runex command is a simple script that uses the examples shaded jar to run a class. The runmr command starts a MapReduce job in YARN.
* Some examples use the accumulo and accumulo-util commands, which are expected to be on your PATH. These commands are found in the bin/ directory of your Accumulo installation.

Each example below highlights a feature of Apache Accumulo.
Example | Description |
---|---|
batch | Using the batch writer and batch scanner |
bloom | Creating a bloom filter enabled table to increase query performance |
bulkIngest | Ingesting bulk data using map/reduce jobs on Hadoop |
classpath | Using per-table classpaths |
client | Using table operations, reading and writing data in Java |
combiner | Using the example StatsCombiner to find min, max, sum, and count |
compactionStrategy | Configuring a compaction strategy |
constraints | Using constraints with tables. Limit the mutation size to avoid running out of memory |
deleteKeyValuePair | Deleting a key/value pair and verifying the deletion in RFile |
dirlist | Storing filesystem information |
export | Exporting and importing tables |
filedata | Storing file data |
filter | Using the AgeOffFilter to remove records more than 30 seconds old |
helloworld | Inserting records both inside and outside map/reduce jobs, and reading records between two rows |
isolation | Using the isolated scanner to ensure partial changes are not seen |
regex | Using MapReduce and Accumulo to find data using regular expressions |
reservations | Using conditional mutations to implement a simple reservation system |
rgbalancer | Using a balancer to spread groups of tablets within a table evenly |
rowhash | Using MapReduce to read a table and write to a new column in the same table |
sample | Building and using sample data in Accumulo |
shard | Using the intersecting iterator with a term index partitioned by document |
spark | Using Accumulo as input and output for Apache Spark jobs |
tabletofile | Using MapReduce to read a table and write one of its columns to a file in HDFS |
terasort | Generating random data and sorting it using Accumulo |
uniquecols | Using MapReduce to count unique columns in Accumulo |
visibility | Using visibilities (or combinations of authorizations). Also shows user permissions |
wordcount | Using MapReduce and Accumulo to do a word count on text files |
This repository can be used to test Accumulo release candidates. See docs/release-testing.md.