Accumulo has an iterator called the intersecting iterator, which supports querying a term index that is partitioned by document, or “sharded”. This example shows how to use the intersecting iterator through four programs: Index.java, Query.java, Reverse.java, and ContinuousQuery.java.
To run these example programs, first create two tables in the Accumulo shell, as shown below.
username@instance> createtable shard
username@instance shard> createtable doc2term
After creating the tables, index some files. The following command indexes all of the Java files under the Accumulo core source directory.
$ find /path/to/accumulo/core -name "*.java" | xargs ./bin/runex shard.Index -t shard --partitions 30
The following command queries the index to find all files containing ‘foo’ and ‘bar’.
$ ./bin/runex shard.Query -t shard foo bar
/local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/security/ColumnVisibilityTest.java
/local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/client/mock/MockConnectorTest.java
/local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/security/VisibilityEvaluatorTest.java
/local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/data/KeyExtentTest.java
/local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/iterators/WholeRowIteratorTest.java
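To make the query step more concrete, here is a minimal in-memory sketch of how a document-partitioned index can answer a multi-term query. It is not Accumulo's intersecting iterator and does not use the Accumulo API. The class `ShardedIndexSketch`, its method names, and the choice to pick a partition by hashing the document name are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// In-memory illustration only; this is NOT Accumulo's IntersectingIterator.
class ShardedIndexSketch {
    // One map per partition: term -> sorted set of document names in that partition.
    private final List<Map<String, TreeSet<String>>> partitions = new ArrayList<>();

    ShardedIndexSketch(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // All terms of a document go to the single partition chosen from the
    // document name (an assumed partitioning rule for this sketch).
    void index(String docId, String... terms) {
        int p = Math.floorMod(docId.hashCode(), partitions.size());
        for (String term : terms) {
            partitions.get(p).computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Intersect the document sets of every term inside each partition,
    // then collect the per-partition matches into one sorted result.
    TreeSet<String> query(String... terms) {
        TreeSet<String> result = new TreeSet<>();
        for (Map<String, TreeSet<String>> partition : partitions) {
            TreeSet<String> matches = null;
            for (String term : terms) {
                TreeSet<String> docs = partition.getOrDefault(term, new TreeSet<>());
                if (matches == null) {
                    matches = new TreeSet<>(docs); // copy so the index is not modified
                } else {
                    matches.retainAll(docs);
                }
            }
            if (matches != null) {
                result.addAll(matches);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ShardedIndexSketch index = new ShardedIndexSketch(3);
        index.index("a.java", "foo", "bar", "baz");
        index.index("b.java", "foo");
        index.index("c.java", "bar", "foo");
        System.out.println(index.query("foo", "bar")); // prints [a.java, c.java]
    }
}
```

Because every term of a document lives in the same partition, each partition can be intersected on its own and the results simply combined, which is the property a sharded term index relies on.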
Before running ContinuousQuery, run Reverse.java to populate the doc2term table.
$ ./bin/runex shard.Reverse --shardTable shard --doc2Term doc2term
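Conceptually, this step inverts the index: a term-to-documents mapping becomes a document-to-terms mapping. The sketch below shows that inversion on plain Java maps. It is an illustration under assumptions, not the code in Reverse.java; the class `ReverseSketch` and the use of in-memory maps in place of the shard and doc2term tables are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustration only; in-memory maps stand in for the shard and doc2term tables.
class ReverseSketch {
    // Turn term -> documents into document -> terms.
    static Map<String, TreeSet<String>> reverse(Map<String, List<String>> termToDocs) {
        Map<String, TreeSet<String>> docToTerms = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : termToDocs.entrySet()) {
            String term = entry.getKey();
            for (String doc : entry.getValue()) {
                docToTerms.computeIfAbsent(doc, d -> new TreeSet<>()).add(term);
            }
        }
        return docToTerms;
    }

    public static void main(String[] args) {
        Map<String, List<String>> termToDocs = new TreeMap<>();
        termToDocs.put("foo", Arrays.asList("a.java", "b.java"));
        termToDocs.put("bar", Arrays.asList("a.java"));
        System.out.println(reverse(termToDocs)); // prints {a.java=[bar, foo], b.java=[foo]}
    }
}
```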
Below, ContinuousQuery is run with 5 terms. It first selects 5 random terms from each document, then repeatedly picks one of those 5-term sets at random and queries for it. For each query it prints the terms, the number of matching documents, and the query time in seconds.
$ ./bin/runex shard.ContinuousQuery --shardTable shard --doc2Term doc2term --terms 5
[public, core, class, binarycomparable, b] 2 0.081
[wordtodelete, unindexdocument, doctablename, putdelete, insert] 1 0.041
[import, columnvisibilityinterpreterfactory, illegalstateexception, cv, columnvisibility] 1 0.049
[getpackage, testversion, util, version, 55] 1 0.048
[for, static, println, public, the] 55 0.211
[sleeptime, wrappingiterator, options, long, utilwaitthread] 1 0.057
[string, public, long, 0, wait] 12 0.132
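The select-then-query loop can be sketched on in-memory data as follows. This is a rough illustration, not the code in ContinuousQuery.java: the class `ContinuousQuerySketch`, its helper methods, the brute-force matching, and the decision to skip documents with fewer than the requested number of terms are all assumptions. The timings it prints depend on the machine, so no sample output is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustration only; a map stands in for the doc2term table and a scan for the query.
class ContinuousQuerySketch {
    // Pick n random terms from each document that has at least n terms.
    static List<List<String>> selectTermSets(Map<String, Set<String>> docToTerms, int n, Random rand) {
        List<List<String>> termSets = new ArrayList<>();
        for (Set<String> terms : docToTerms.values()) {
            if (terms.size() < n) {
                continue; // assumed behavior: skip documents with too few terms
            }
            List<String> shuffled = new ArrayList<>(terms);
            Collections.shuffle(shuffled, rand);
            termSets.add(new ArrayList<>(shuffled.subList(0, n)));
        }
        return termSets;
    }

    // Count the documents that contain every term in the set.
    static int countMatches(Map<String, Set<String>> docToTerms, List<String> terms) {
        int count = 0;
        for (Set<String> docTerms : docToTerms.values()) {
            if (docTerms.containsAll(terms)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> docToTerms = new TreeMap<>();
        docToTerms.put("a.java", new TreeSet<>(Arrays.asList("public", "class", "foo")));
        docToTerms.put("b.java", new TreeSet<>(Arrays.asList("public", "class", "bar")));
        docToTerms.put("c.java", new TreeSet<>(Arrays.asList("foo")));

        Random rand = new Random();
        List<List<String>> termSets = selectTermSets(docToTerms, 2, rand);

        // A few iterations stand in for the continuous loop.
        for (int i = 0; i < 5; i++) {
            List<String> terms = termSets.get(rand.nextInt(termSets.size()));
            long start = System.nanoTime();
            int matches = countMatches(docToTerms, terms);
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("%s %d %.3f%n", terms, matches, seconds);
        }
    }
}
```

Since each term set is drawn from a real document, every query in this sketch matches at least one document, which is consistent with the counts of 1 or more in the output above.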