Title: Apache Accumulo Examples
Notice: Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
.
http://www.apache.org/licenses/LICENSE-2.0
.
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
Before running any of the examples, the following steps must be performed.
1. Install and run Accumulo via the instructions found in $ACCUMULO_HOME/README.
Remember the instance name. It will be referred to as "instance" throughout
to as "zookeepers". A comma-separated list of ZooKeeper servers will be
referred to as "zookeepers".
2. Create an Accumulo user (see the [user manual][1]), or use the root user.
The user name "username" with password "password" is used throughout
the examples. This user needs permission to create tables.
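For instance, a user with table-creation rights can be set up from the
Accumulo shell while logged in as the root user. This is only a sketch; the
user name is a placeholder, and createuser prompts for the password:

    > createuser username
    > grant System.CREATE_TABLE -s -u username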
In all commands, you will need to replace "instance", "zookeepers",
"username", and "password" with the values you set for your Accumulo instance.
Commands intended to be run in bash are prefixed by '$'. These are always
assumed to be run from the $ACCUMULO_HOME directory.
Commands intended to be run in the Accumulo shell are prefixed by '>'.
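For instance, the following illustrates both conventions by starting the
shell from $ACCUMULO_HOME and then creating a table; the table name "test"
is arbitrary and only shown for illustration:

    $ ./bin/accumulo shell -u username -p password
    > createtable test
    > tables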
Each README in the examples directory highlights the use of particular
features of Apache Accumulo.
README.batch: Using the batch writer and batch scanner.
README.bloom: Creating a bloom filter enabled table to increase query
performance.
README.bulkIngest: Ingesting bulk data using map/reduce jobs on Hadoop.
README.classpath: Using per-table classpaths.
README.client: Using table operations, reading and writing data in Java.
README.combiner: Using example StatsCombiner to find min, max, sum, and
count.
README.constraints: Using constraints with tables.
README.dirlist: Storing filesystem information.
README.export: Exporting and importing tables.
README.filedata: Storing file data.
README.filter: Using the AgeOffFilter to remove records more than 30
seconds old.
README.helloworld: Inserting records both inside and outside map/reduce
jobs, and reading records between two rows.
README.isolation: Using the isolated scanner to ensure partial changes
are not seen.
README.mapred: Using MapReduce to read from and write to Accumulo
tables.
README.maxmutation: Limiting mutation size to avoid running out of memory.
README.regex: Using MapReduce and Accumulo to find data using regular
expressions.
README.rowhash: Using MapReduce to read a table and write to a new
column in the same table.
README.sample: Building and using sample data in Accumulo.
README.shard: Using the intersecting iterator with a term index
partitioned by document.
README.tabletofile: Using MapReduce to read a table and write one of its
columns to a file in HDFS.
README.terasort: Generating random data and sorting it using Accumulo.
README.visibility: Using visibilities (or combinations of authorizations).
Also shows user permissions.
[1]: https://accumulo.apache.org/1.5/accumulo_user_manual#_user_administration