blob: b9def1038f35ee2c5f51a08196269f6c25520603 [file] [log] [blame]
// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
== Iterator Design
Accumulo SortedKeyValueIterators, commonly referred to as Iterators for short, are server-side programming constructs
that allow users to implement custom retrieval or computational purpose within Accumulo TabletServers. The name rightly
brings forward similarities to the Java Iterator interface; however, Accumulo Iterators are more complex than Java
Iterators. Notably, in addition to the expected methods to retrieve the current element and advance to the next element
in the iteration, Accumulo Iterators must also support the ability to "move" (`seek`) to an specified point in the
iteration (the Accumulo table). Accumulo Iterators are designed to be concatenated together, similar to applying a
series of transformations to a list of elements. Accumulo Iterators can duplicate their underlying source to create
multiple "pointers" over the same underlying data (which is extremely powerful since each stream is sorted) or they can
merge multiple Iterators into a single view. In this sense, a collection of Iterators operating in tandem is close to
a tree-structure than a list, but there is always a sense of a flow of Key-Value pairs through some Iterators. Iterators
are not designed to act as triggers nor are they designed to operate outside of the purview of a single table.
Understanding how TabletServers invoke the methods on a SortedKeyValueIterator can be obtuse as the actual code is
buried within the implementation of the TabletServer; however, it is generally unnecessary to have a strong
understanding of this as the interface provides clear definitions about what each action each method should take. This
chapter aims to provide a more detailed description of how Iterators are invoked, some best practices and some common
pitfalls.
=== Instantiation
To invoke an Accumulo Iterator inside of the TabletServer, the Iterator class must be on the classpath of every
TabletServer. For production environments, it is common to place a JAR file which contains the Iterator in
`$ACCUMULO_HOME/lib`. In development environments, it is convenient to instead place the JAR file in
`$ACCUMULO_HOME/lib/ext` as JAR files in this directory are dynamically reloaded by the TabletServers alleviating the
need to restart Accumulo while testing an Iterator. Advanced classloader features which enable other types of
filesystems and per-table classpath configurations (as opposed to process-wide classpaths). These features
are not covered here, but elsewhere in the user manual.
Accumulo references the Iterator class by name and uses Java reflection to instantiate the Iterator. This means that
Iterators must have a public no-args constructor.
=== Interface
A normal implementation of the SortedKeyValueIterator defines functionality for the following methods:
[source,java]
----
void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options, IteratorEnvironment env) throws IOException;
boolean hasTop();
void next() throws IOException;
void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive) throws IOException;
Key getTopKey();
Value getTopValue();
SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env);
----
==== `init`
The `init` method is called by the TabletServer after it constructs an instance of the Iterator. This method should
clear/reset any internal state in the Iterator and prepare it to process data. The first argument, the `source`, is the
Iterator "below" this Iterator (where the client is at "top" and the Iterator for files in HDFS are at the "bottom").
The "source" Iterator provides the Key-Value pairs which this Iterator will operate upon.
The second argument, a Map of options, is made up of options provided by the user, options set in the table's
configuration, and/or options set in the containing namespace's configuration.
These options allow for Iterators to dynamically configure themselves on the fly. If no options are used in the current context
(a Scan or Compaction), the Map will be empty. An example of a configuration item for an Iterator could be a pattern used to filter
Key-Value pairs in a regular expression Iterator.
The third argument, the `IteratorEnvironment`, is a special object which provides information to this Iterator about the
context in which it was invoked. Commonly, this information is not necessary to inspect. For example, if an Iterator
knows that it is running in the context of a full-major compaction (reading all of the data) as opposed to a user scan
(which may strongly limit the number of columns), the Iterator might make different algorithmic decisions in an attempt to
optimize itself.
==== `seek`
The `seek` method is likely the most confusing method on the Iterator interface. The purpose of this method is to
advance the stream of Key-Value pairs to a certain point in the iteration (the Accumulo table). It is common that before
the implementation of this method returns some additional processing is performed which may further advance the current
position past the `startKey` of the `Range`. This, however, is dependent on the functionality the iterator provides. For
example, a filtering iterator would consume a number Key-Value pairs which do not meets its criteria before `seek`
returns. The important condition for `seek` to meet is that this Iterator should be ready to return the first Key-Value
pair, or none if no such pair is available, when the method returns. The Key-Value pair would be returned by `getTopKey`
and `getTopValue`, respectively, and `hasTop` should return a boolean denoting whether or not there is
a Key-Value pair to return.
The arguments passed to seek are as follows:
The TabletServer first provides a `Range`, an object which defines some collection of Accumulo `Key`s, which defines the
Key-Value pairs that this Iterator should return. Each `Range` has a `startKey` and `endKey` with an inclusive flag for
both. While this Range is often similar to the Range(s) set by the client on a Scanner or BatchScanner, it is not
guaranteed to be a Range that the client set. Accumulo will split up larger ranges and group them together based on
Tablet boundaries per TabletServer. Iterators should not attempt to implement any custom logic based on the Range(s)
provided to `seek` and Iterators should not return any Keys that fall outside of the provided Range.
The second argument, a `Collection<ByteSequence>`, is the set of column families which should be retained or
excluded by this Iterator. The third argument, a boolean, defines whether the collection of column families
should be treated as an inclusion collection (true) or an exclusion collection (false).
It is likely that all implementations of `seek` will first make a call to the `seek` method on the
"source" Iterator that was provided in the `init` method. The collection of column families and
the boolean `include` argument should be passed down as well as the `Range`. Somewhat commonly, the Iterator will
also implement some sort of additional logic to find or compute the first Key-Value pair in the provided
Range. For example, a regular expression Iterator would consume all records which do not match the given
pattern before returning from `seek`.
It is important to retain the original Range passed to this method to know when this Iterator should stop
reading more Key-Value pairs. Ignoring this typically does not affect scans from a Scanner, but it
will result in duplicate keys emitting from a BatchScan if the scanned table has more than one tablet.
Best practice is to never emit entries outside the seek range.
==== `next`
The `next` method is analogous to the `next` method on a Java Iterator: this method should advance
the Iterator to the next Key-Value pair. For implementations that perform some filtering or complex
logic, this may result in more than one Key-Value pair being inspected. This method alters
some internal state that is exposed via the `hasTop`, `getTopKey`, and `getTopValue` methods.
The result of this method is commonly caching a Key-Value pair which `getTopKey` and `getTopValue`
can later return. While there is another Key-Value pair to return, `hasTop` should return true.
If there are no more Key-Value pairs to return from this Iterator since the last call to
`seek`, `hasTop` should return false.
==== `hasTop`
The `hasTop` method is similar to the `hasNext` method on a Java Iterator in that it informs
the caller if there is a Key-Value pair to be returned. If there is no pair to return, this method
should return false. Like a Java Iterator, multiple calls to `hasTop` (without calling `next`) should not
alter the internal state of the Iterator.
==== `getTopKey` and `getTopValue`
These methods simply return the current Key-Value pair for this iterator. If `hasTop` returns true,
both of these methods should return non-null objects. If `hasTop` returns false, it is undefined
what these methods should return. Like `hasTop`, multiple calls to these methods should not alter
the state of the Iterator.
Users should take caution when either
1. caching the Key/Value from `getTopKey`/`getTopValue`, for use after calling `next` on the source iterator.
In this case, the cached Key/Value object is aliased to the reference returned by the source iterator.
Iterators may reuse the same Key/Value object in a `next` call for performance reasons, changing the data
that the cached Key/Value object references and resulting in a logic bug.
2. modifying the Key/Value from `getTopKey`/`getTopValue`. If the source iterator reuses data stored in the Key/Value,
then the source iterator may use the modified data that the Key/Value references. This may/may not result in a logic bug.
In both cases, copying the Key/Value's data into a new object ensures iterator correctness. If neither case applies,
it is safe to not copy the Key/Value. The general guideline is to be aware of who else may use Key/Value objects
returned from `getTopKey`/`getTopValue`.
==== `deepCopy`
The `deepCopy` method is similar to the `clone` method from the Java `Cloneable` interface.
Implementations of this method should return a new object of the same type as the Accumulo Iterator
instance it was called on. Any internal state from the instance `deepCopy` was called
on should be carried over to the returned copy. The returned copy should be ready to have
`seek` called on it. The SortedKeyValueIterator interface guarantees that `init` will be called on
an iterator before `deepCopy` and that `init` will not be called on the iterator returned by
`deepCopy`.
Typically, implementations of `deepCopy` call a copy-constructor which will initialize
internal data structures. As with `seek`, it is common for the `IteratorEnvironment`
argument to be ignored as most Iterator implementations can be written without the explicit
information the environment provides.
In the analogy of a series of Iterators representing a tree, `deepCopy` can be thought of as
early programming assignments which implement their own tree data structures. `deepCopy` calls
copy on its sources (the children), copies itself, attaches the copies of the children, and
then returns itself.
=== TabletServer invocation of Iterators
The following code is a general outline for how TabletServers invoke Iterators.
[source,java]
----
List<KeyValue> batch;
Range range = getRangeFromClient();
while(!overSizeLimit(batch)){
SortedKeyValueIterator source = getSystemIterator();
for(String clzName : getUserIterators()){
Class<?> clz = Class.forName(clzName);
SortedKeyValueIterator iter = (SortedKeyValueIterator) clz.newInstance();
iter.init(source, opts, env);
source = iter;
}
// read a batch of data to return to client
// the last iterator, the "top"
SortedKeyValueIterator topIter = source;
topIter.seek(getRangeFromUser(), ...)
while(topIter.hasTop() && !overSizeLimit(batch)){
key = topIter.getTopKey()
val = topIter.getTopValue()
batch.add(new KeyValue(key, val)
if(systemDataSourcesChanged()){
// code does not show isolation case, which will
// keep using same data sources until a row boundry is hit
range = new Range(key, false, range.endKey(), range.endKeyInclusive());
break;
}
}
}
//return batch of key values to client
----
Additionally, the obtuse "re-seek" case can be outlined as the following:
[source,java]
----
// Given the above
List<KeyValue> batch = getNextBatch();
// Store off lastKeyReturned for this client
lastKeyReturned = batch.get(batch.size() - 1).getKey();
// thread goes away (client stops asking for the next batch).
// Eventually client comes back
// Setup as before...
Range userRange = getRangeFromUser();
Range actualRange = new Range(lastKeyReturned, false
userRange.getEndKey(), userRange.isEndKeyInclusive());
// Use the actualRange, not the user provided one
topIter.seek(actualRange);
----
=== Isolation
Accumulo provides a feature which clients can enable to prevent the viewing of partially
applied mutations within the context of rows. If a client is submitting multiple column
updates to rows at a time, isolation would ensure that a client would either see all of
updates made to that row or none of the updates (until they are all applied).
When using Isolation, there are additional concerns in iterator design. A scan time iterator in accumulo
reads from a set of data sources. While an iterator is reading data it has an isolated view. However, after it returns a
key/value it is possible that accumulo may switch data sources and re-seek the iterator. This is done so that resources
may be reclaimed. When the user does not request isolation this can occur after any key is returned. When a user enables
Isolation, this will only occur after a new row is returned, in which case it will re-seek to the very beginning of the
next possible row.
=== Abstract Iterators
A number of Abstract implementations of Iterators are provided to allow for faster creation
of common patterns. The most commonly used abstract implementations are the `Filter` and
`Combiner` classes. When possible these classes should be used instead as they have been
thoroughly tested inside Accumulo itself.
==== Filter
The `Filter` abstract Iterator provides a very simple implementation which allows implementations
to define whether or not a Key-Value pair should be returned via an `accept(Key, Value)` method.
Filters are extremely simple to implement; however, when the implementation is filtering a
large percentage of Key-Value pairs with respect to the total number of pairs examined,
it can be very inefficient. For example, if a Filter implementation can determine after examining
part of the row that no other pairs in this row will be accepted, there is no mechanism to
efficiently skip the remaining Key-Value pairs. Concretely, take a row which is comprised of
1000 Key-Value pairs. After examining the first 10 Key-Value pairs, it is determined
that no other Key-Value pairs in this row will be accepted. The Filter must still examine each
remaining 990 Key-Value pairs in this row. Another way to express this deficiency is that
Filters have no means to leverage the `seek` method to efficiently skip large portions
of Key-Value pairs.
As such, the `Filter` class functions well for filtering small amounts of data, but is
inefficient for filtering large amounts of data. The decision to use a `Filter` strongly
depends on the use case and distribution of data being filtered.
==== Combiner
The `Combiner` class is another common abstract Iterator. Similar to the `Combiner` interface
define in Hadoop's MapReduce framework, implementations of this abstract class reduce
multiple Values for different versions of a Key (Keys which only differ by timestamps) into one Key-Value pair.
Combiners provide a simple way to implement common operations like summation and
aggregation without the need to implement the entire Accumulo Iterator interface.
One important consideration when choosing to design a Combiner is that the "reduction" operation
is often best represented when it is associative and commutative. Operations which do not meet
these criteria can be implemented; however, the implementation can be difficult.
A second consideration is that a Combiner is not guaranteed to see every Key-Value pair
which differ only by timestamp every time it is invoked. For example, if there are 5 Key-Value
pairs in a table which only differ by the timestamps 1, 2, 3, 4, and 5, it is not guaranteed that
every invocation of the Combiner will see 5 timestamps. One invocation might see the Values for
Keys with timestamp 1 and 4, while another invocation might see the Values for Keys with the
timestamps 1, 2, 4 and 5.
Finally, when configuring an Accumulo table to use a Combiner, be sure to disable the Versioning Iterator or set the
Combiner at a priority less than the Combiner (the Versioning Iterator is added at a priority of 20 by default). The
Versioning Iterator will filter out multiple Key-Value pairs that differ only by timestamp and return only the Key-Value
pair that has the largest timestamp.
=== Best practices
Because of the flexibility that the `SortedKeyValueInterface` provides, it doesn't directly disallow
many implementations which are poor design decisions. The following are some common recommendations to
follow and pitfalls to avoid in Iterator implementations.
==== Avoid special logic encoded in Ranges
Commonly, granular Ranges that a client passes to an Iterator from a `Scanner` or `BatchScanner` are unmodified.
If a `Range` falls within the boundaries of a Tablet, an Iterator will often see that same Range in the
`seek` method. However, there is no guarantee that the `Range` will remain unaltered from client to server. As such, Iterators
should *never* make assumptions about the current state/context based on the `Range`.
The common failure condition is referred to as a "re-seek". In the context of a Scan, TabletServers construct the
"stack" of Iterators and batch up Key-Value pairs to send back to the client. When a sufficient number of Key-Value
pairs are collected, it is common for the Iterators to be "torn down" until the client asks for the next batch of
Key-Value pairs. This is done by the TabletServer to add fairness in ensuring one Scan does not monopolize the available
resources. When the client asks for the next batch, the implementation modifies the original Range so that servers know
the point to resume the iteration (to avoid returning duplicate Key-Value pairs). Specifically, the new Range is created
from the original but is shortened by setting the startKey of the original Range to the Key last returned by the Scan,
non-inclusive.
==== `seek`'ing backwards
The ability for an Iterator to "skip over" large blocks of Key-Value pairs is a major tenet behind Iterators.
By `seek`'ing when it is known that there is a collection of Key-Value pairs which can be ignored can
greatly increase the speed of a scan as many Key-Value pairs do not have to be deserialized and processed.
While the `seek` method provides the `Range` that should be used to `seek` the underlying source Iterator,
there is no guarantee that the implementing Iterator uses that `Range` to perform the `seek` on its
"source" Iterator. As such, it is possible to seek to any `Range` and the interface has no assertions
to prevent this from happening.
Since Iterators are allowed to `seek` to arbitrary Keys, it also allows Iterators to create infinite loops
inside Scans that will repeatedly read the same data without end. If an arbitrary Range is constructed, it should
construct a completely new Range as it allows for bugs to be introduced which will break Accumulo.
Thus, `seek`'s should always be thought of as making "forward progress" in the view of the total iteration. The
`startKey` of a `Range` should always be greater than the current Key seen by the Iterator while the `endKey` of the
`Range` should always retain the original `endKey` (and `endKey` inclusivity) of the last `Range` seen by your
Iterator's implementation of seek.
==== Take caution in constructing new data in an Iterator
Implementations of Iterator might be tempted to open BatchWriters inside of an Iterator as a means
to implement triggers for writing additional data outside of their client application. The lifecycle of an Iterator
is *not* managed in such a way that guarantees that this is safe nor efficient. Specifically, there
is no way to guarantee that the internal ThreadPool inside of the BatchWriter is closed (and the thread(s)
are reaped) without calling the close() method. `close`'ing and recreating a `BatchWriter` after every
Key-Value pair is also prohibitively performance limiting to be considered an option.
The only safe way to generate additional data in an Iterator is to alter the current Key-Value pair.
For example, the `WholeRowIterator` serializes the all of the Key-Values pairs that fall within each
row. A safe way to generate more data in an Iterator would be to construct an Iterator that is
"higher" (at a larger priority) than the `WholeRowIterator`, that is, the Iterator receives the Key-Value pairs which are
a serialization of many Key-Value pairs. The custom Iterator could deserialize the pairs, compute
some function, and add a new Key-Value pair to the original collection, re-serializing the collection
of Key-Value pairs back into a single Key-Value pair.
Any other situation is likely not guaranteed to ensure that the caller (a Scan or a Compaction) will
always see all intended data that is generated.
=== Final things to remember
Some simple recommendations/points to keep in mind:
==== Method call order
On an instance of an Iterator: `init` is always called before `seek`, `seek` is always called before `hasTop`,
`getTopKey` and `getTopValue` will not be called if `hasTop` returns false.
==== Teardown
As mentioned, instance of Iterators may be torn down inside of the server transparently. When a complex
collection of iterators is performing some advanced functionality, they will not be torn down until a Key-Value
pair is returned out of the "stack" of Iterators (and added into the batch of Key-Values to be returned
to the caller). Being torn-down is equivalent to a new instance of the Iterator being creating and `deepCopy`
being called on the new instance with the old instance provided as the argument to `deepCopy`. References
to the old instance are removed and the object is lazily garbage collected by the JVM.
=== Compaction-time Iterators
When Iterators are configured to run during compactions, at the `minc` or `majc` scope, these Iterators sometimes need
to make different assertions than those who only operate at scan time. Iterators won't see the delete entries; however,
Iterators will not necessarily see all of the Key-Value pairs in ever invocation. Because compactions often do not rewrite
all files (only a subset of them), it is possible that the logic take this into consideration.
For example, a Combiner that runs over data at during compactions, might not see all of the values for a given Key. The
Combiner must recognize this and not perform any function that would be incorrect due
to the missing values.