blob: 7637ce772cafee8f5d853a060e719c5da264a2b6 [file] [log] [blame] [view]
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Apache Accumulo Examples
[![Build Status][ti]][tl]
## Introduction
The Accumulo-Examples repository contains a collection of examples for Accumulo versions 2.0 and
greater. Examples within the `main` branch are designed to work with the version currently
under development. Additional branches exist for previous releases of the Accumulo 2.x line. For
example, the `2.0` branch contains examples specifically intended to work with that release version.
A collection of examples for Accumulo 1.10 can be found [here].
## Setup instructions
Follow the steps below to run the Accumulo examples:
1. Clone this repository
git clone https://github.com/apache/accumulo-examples.git
2. Follow [Accumulo's quickstart][quickstart] to install and run an Accumulo instance.
Accumulo has an [accumulo-client.properties] in `conf/` that must be configured as
the examples will use this file to connect to your instance.
3. Review [env.sh.example] in to see if you need to customize it. If `ACCUMULO_HOME` & `HADOOP_HOME`
are set in your shell, you may be able skip this step. Make sure `ACCUMULO_CLIENT_PROPS` is
set to the location of your [accumulo-client.properties].
cp conf/env.sh.example conf/env.sh
vim conf/env.sh
3. Build the examples repo and copy the examples jar to Accumulo's `lib/` directory to get on its
class path:
./bin/build
cp target/accumulo-examples.jar /path/to/accumulo/lib/
4. Each Accumulo example has its own documentation and instructions for running the example which
are linked to below.
When running the examples, remember the tips below:
* Examples are run using the `runex` or `runmr` commands which are located in the `bin/` directory
of this repo. The `runex` command is a simple script that use the examples shaded jar to run a
a class. The `runmr` starts a MapReduce job in YARN.
* Commands intended to be run in bash are prefixed by '$' and should be run from the root of this
repository.
* Several examples use the `accumulo` and `accumulo-util` commands which are expected to be on your
`PATH`. These commands are found in the `bin/` directory of your Accumulo installation.
* Commands intended to be run in the Accumulo shell are prefixed by '>'.
## Available Examples
Each example below highlights a feature of Apache Accumulo.
| Example | Description |
|---------|-------------|
| [batch] | Using the batch writer and batch scanner |
| [bloom] | Creating a bloom filter enabled table to increase query performance |
| [bulkIngest] | Ingesting bulk data using map/reduce jobs on Hadoop |
| [classpath] | Using per-table classpaths |
| [client] | Using table operations, reading and writing data in Java. |
| [combiner] | Using example StatsCombiner to find min, max, sum, and count. |
| [compactionStrategy] | Configuring a compaction strategy |
| [constraints] | Using constraints with tables. Limit the mutation size to avoid running out of memory |
| [deleteKeyValuePair] | Deleting a key/value pair and verifying the deletion in RFile. |
| [dirlist] | Storing filesystem information. |
| [export] | Exporting and importing tables. |
| [filedata] | Storing file data. |
| [filter] | Using the AgeOffFilter to remove records more than 30 seconds old. |
| [helloworld] | Inserting records both inside map/reduce jobs and outside. And reading records between two rows. |
| [isolation] | Using the isolated scanner to ensure partial changes are not seen. |
| [regex] | Using MapReduce and Accumulo to find data using regular expressions. |
| [reservations] | Using conditional mutations to implement simple reservation system. |
| [rgbalancer] | Using a balancer to spread groups of tablets within a table evenly |
| [rowhash] | Using MapReduce to read a table and write to a new column in the same table. |
| [sample] | Building and using sample data in Accumulo. |
| [shard] | Using the intersecting iterator with a term index partitioned by document. |
| [spark] | Using Accumulo as input and output for Apache Spark jobs |
| [tabletofile] | Using MapReduce to read a table and write one of its columns to a file in HDFS. |
| [terasort] | Generating random data and sorting it using Accumulo. |
| [tracing] | Generating trace data in a client application and Accumulo. |
| [uniquecols] | Use MapReduce to count unique columns in Accumulo |
| [visibility] | Using visibilities (or combinations of authorizations). Also shows user permissions. |
| [wordcount] | Use MapReduce and Accumulo to do a word count on text files |
## Release Testing
This repository can be used to test Accumulo release candidates. See
[docs/release-testing.md](docs/release-testing.md).
[quickstart]: https://accumulo.apache.org/docs/2.x/getting-started/quickstart
[accumulo-client.properties]: https://accumulo.apache.org/docs/2.x/configuration/files#accumulo-clientproperties
[env.sh.example]: conf/env.sh.example
[manual]: https://accumulo.apache.org/latest/accumulo_user_manual/
[batch]: docs/batch.md
[bloom]: docs/bloom.md
[bulkIngest]: docs/bulkIngest.md
[classpath]: docs/classpath.md
[client]: docs/client.md
[combiner]: docs/combiner.md
[compactionStrategy]: docs/compactionStrategy.md
[constraints]: docs/constraints.md
[deleteKeyValuePair]: docs/deleteKeyValuePair.md
[dirlist]: docs/dirlist.md
[export]: docs/export.md
[filedata]: docs/filedata.md
[filter]: docs/filter.md
[helloworld]: docs/helloworld.md
[isolation]: docs/isolation.md
[maxmutation]: docs/maxmutation.md
[regex]: docs/regex.md
[reservations]: docs/reservations.md
[rgbalancer]: docs/rgbalancer.md
[rowhash]: docs/rowhash.md
[sample]: docs/sample.md
[shard]: docs/shard.md
[spark]: spark/README.md
[tabletofile]: docs/tabletofile.md
[terasort]: docs/terasort.md
[tracing]: docs/tracing.md
[uniquecols]: docs/uniquecols.md
[visibility]: docs/visibility.md
[wordcount]: docs/wordcount.md
[ti]: https://github.com/apache/accumulo-examples/workflows/QA/badge.svg
[tl]: https://github.com/apache/accumulo-examples/actions
[here]: https://accumulo.apache.org/1.10/examples