blob: e00ba4a26fb45e4db0caed69c140c364ce8f8e54 [file] [log] [blame]
Title: Apache Accumulo Filter Example
Notice: Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
.
http://www.apache.org/licenses/LICENSE-2.0
.
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
This is a simple filter example. It uses the AgeOffFilter that is provided as
part of the core package org.apache.accumulo.core.iterators.user. Filters are
iterators that select desired key/value pairs (or weed out undesired ones).
Filters extend the org.apache.accumulo.core.iterators.Filter class
and must implement a method accept(Key k, Value v). This method returns true
if the key/value pair are to be delivered and false if they are to be ignored.
Filter takes a "negate" parameter which defaults to false. If set to true, the
return value of the accept method is negated, so that key/value pairs accepted
by the method are omitted by the Filter.
username@instance> createtable filtertest
username@instance filtertest> setiter -t filtertest -scan -p 10 -n myfilter -ageoff
AgeOffFilter removes entries with timestamps more than <ttl> milliseconds old
----------> set AgeOffFilter parameter negate, default false keeps k/v that pass accept method, true rejects k/v that pass accept method:
----------> set AgeOffFilter parameter ttl, time to live (milliseconds): 30000
----------> set AgeOffFilter parameter currentTime, if set, use the given value as the absolute time in milliseconds as the current time of day:
username@instance filtertest> scan
username@instance filtertest> insert foo a b c
username@instance filtertest> scan
foo a:b [] c
username@instance filtertest>
... wait 30 seconds ...
username@instance filtertest> scan
username@instance filtertest>
Note the absence of the entry inserted more than 30 seconds ago. Since the
scope was set to "scan", this means the entry is still in Accumulo, but is
being filtered out at query time. To delete entries from Accumulo based on
the ages of their timestamps, AgeOffFilters should be set up for the "minc"
and "majc" scopes, as well.
To force an ageoff of the persisted data, after setting up the ageoff iterator
on the "minc" and "majc" scopes you can flush and compact your table. This will
happen automatically as a background operation on any table that is being
actively written to, but can also be requested in the shell.
The first setiter command used the special -ageoff flag to specify the
AgeOffFilter, but any Filter can be configured by using the -class flag. The
following commands show how to enable the AgeOffFilter for the minc and majc
scopes using the -class flag, then flush and compact the table.
username@instance filtertest> setiter -t filtertest -minc -majc -p 10 -n myfilter -class org.apache.accumulo.core.iterators.user.AgeOffFilter
AgeOffFilter removes entries with timestamps more than <ttl> milliseconds old
----------> set AgeOffFilter parameter negate, default false keeps k/v that pass accept method, true rejects k/v that pass accept method:
----------> set AgeOffFilter parameter ttl, time to live (milliseconds): 30000
----------> set AgeOffFilter parameter currentTime, if set, use the given value as the absolute time in milliseconds as the current time of day:
username@instance filtertest> flush
06 10:42:24,806 [shell.Shell] INFO : Flush of table filtertest initiated...
username@instance filtertest> compact
06 10:42:36,781 [shell.Shell] INFO : Compaction of table filtertest started for given range
username@instance filtertest> flush -t filtertest -w
06 10:42:52,881 [shell.Shell] INFO : Flush of table filtertest completed.
username@instance filtertest> compact -t filtertest -w
06 10:43:00,632 [shell.Shell] INFO : Compacting table ...
06 10:43:01,307 [shell.Shell] INFO : Compaction of table filtertest completed for given range
username@instance filtertest>
By default, flush and compact execute in the background, but with the -w flag
they will wait to return until the operation has completed. Both are
demonstrated above, though only one call to each would be necessary. A
specific table can be specified with -t.
After the compaction runs, the newly created files will not contain any data
that should have been aged off, and the Accumulo garbage collector will remove
the old files.
To see the iterator settings for a table, use config.
username@instance filtertest> config -t filtertest -f iterator
---------+---------------------------------------------+---------------------------------------------------------------------------
SCOPE | NAME | VALUE
---------+---------------------------------------------+---------------------------------------------------------------------------
table | table.iterator.majc.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter
table | table.iterator.majc.myfilter.opt.ttl ...... | 30000
table | table.iterator.majc.vers .................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table | table.iterator.majc.vers.opt.maxVersions .. | 1
table | table.iterator.minc.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter
table | table.iterator.minc.myfilter.opt.ttl ...... | 30000
table | table.iterator.minc.vers .................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table | table.iterator.minc.vers.opt.maxVersions .. | 1
table | table.iterator.scan.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter
table | table.iterator.scan.myfilter.opt.ttl ...... | 30000
table | table.iterator.scan.vers .................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table | table.iterator.scan.vers.opt.maxVersions .. | 1
---------+---------------------------------------------+---------------------------------------------------------------------------
username@instance filtertest>
When setting new iterators, make sure to order their priority numbers
(specified with -p) in the order you would like the iterators to be applied.
Also, each iterator must have a unique name and priority within each scope.