doc/SASI.md - cassandra - Git at Google

 <!--
 #
 # Licensed to the Apache Software Foundation (ASF) under one
 # or more contributor license agreements.  See the NOTICE file
 # distributed with this work for additional information
 # regarding copyright ownership.  The ASF licenses this file
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
 -->

 # SASIIndex

 [`SASIIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/SASIIndex.java),
 or "SASI" for short, is an implementation of Cassandra's
 `Index` interface that can be used as an alternative to the
 existing implementations. SASI's indexing and querying improves on
 existing implementations by tailoring it specifically to Cassandra's
 needs. SASI has superior performance in cases where queries would
 previously require filtering. In achieving this performance, SASI aims
 to be significantly less resource intensive than existing
 implementations, in memory, disk, and CPU usage. In addition, SASI
 supports prefix and contains queries on strings (similar to SQL's
 `LIKE = "foo*"` or `LIKE = "*foo*"'`).

 The following goes on describe how to get up and running with SASI,
 demonstrates usage with examples, and provides some details on its
 implementation.

 ## Using SASI

 The examples below walk through creating a table and indexes on its
 columns, and performing queries on some inserted data.

 The examples below assume the `demo` keyspace has been created and is
 in use.

 ```
 cqlsh> CREATE KEYSPACE demo WITH replication = {
    ... 'class': 'SimpleStrategy',
    ... 'replication_factor': '1'
    ... };
 cqlsh> USE demo;
 ```

 All examples are performed on the `sasi` table:

 ```
 cqlsh:demo> CREATE TABLE sasi (id uuid, first_name text, last_name text,
         ... age int, height int, created_at bigint, primary key (id));
 ```

 #### Creating Indexes

 To create SASI indexes use CQLs `CREATE CUSTOM INDEX` statement:

 ```
 cqlsh:demo> CREATE CUSTOM INDEX ON sasi (first_name) USING 'org.apache.cassandra.index.sasi.SASIIndex'
         ... WITH OPTIONS = {
         ... 'analyzer_class':
         ...   'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
         ... 'case_sensitive': 'false'
         ... };

 cqlsh:demo> CREATE CUSTOM INDEX ON sasi (last_name) USING 'org.apache.cassandra.index.sasi.SASIIndex'
         ... WITH OPTIONS = {'mode': 'CONTAINS'};

 cqlsh:demo> CREATE CUSTOM INDEX ON sasi (age) USING 'org.apache.cassandra.index.sasi.SASIIndex';

 cqlsh:demo> CREATE CUSTOM INDEX ON sasi (created_at) USING 'org.apache.cassandra.index.sasi.SASIIndex'
         ...  WITH OPTIONS = {'mode': 'SPARSE'};
 ```

 The indexes created have some options specified that customize their
 behaviour and potentially performance. The index on `first_name` is
 case-insensitive. The analyzers are discussed more in a subsequent
 example. The `NonTokenizingAnalyzer` performs no analysis on the
 text. Each index has a mode: `PREFIX`, `CONTAINS`, or `SPARSE`, the
 first being the default. The `last_name` index is created with the
 mode `CONTAINS` which matches terms on suffixes instead of prefix
 only. Examples of this are available below and more detail can be
 found in the section on
 [OnDiskIndex](#ondiskindexbuilder).The
 `created_at` column is created with its mode set to `SPARSE`, which is
 meant to improve performance of querying large, dense number ranges
 like timestamps for data inserted every millisecond. Details of the
 `SPARSE` implementation can also be found in the section on the
 [OnDiskIndex](#ondiskindexbuilder). The `age`
 index is created with the default `PREFIX` mode and no
 case-sensitivity or text analysis options are specified since the
 field is numeric.

 After inserting the following data and performing a `nodetool flush`,
 SASI performing index flushes to disk can be seen in Cassandra's logs
 -- although the direct call to flush is not required (see
 [IndexMemtable](#indexmemtable) for more details).

 ```
 cqlsh:demo> INSERT INTO sasi (id, first_name, last_name, age, height, created_at)
         ... VALUES (556ebd54-cbe5-4b75-9aae-bf2a31a24500, 'Pavel', 'Yaskevich', 27, 181, 1442959315018);

 cqlsh:demo> INSERT INTO sasi (id, first_name, last_name, age, height, created_at)
         ... VALUES (5770382a-c56f-4f3f-b755-450e24d55217, 'Jordan', 'West', 26, 173, 1442959315019);

 cqlsh:demo> INSERT INTO sasi (id, first_name, last_name, age, height, created_at)
         ... VALUES (96053844-45c3-4f15-b1b7-b02c441d3ee1, 'Mikhail', 'Stepura', 36, 173, 1442959315020);

 cqlsh:demo> INSERT INTO sasi (id, first_name, last_name, age, height, created_at)
         ... VALUES (f5dfcabe-de96-4148-9b80-a1c41ed276b4, 'Michael', 'Kjellman', 26, 180, 1442959315021);

 cqlsh:demo> INSERT INTO sasi (id, first_name, last_name, age, height, created_at)
         ... VALUES (2970da43-e070-41a8-8bcb-35df7a0e608a, 'Johnny', 'Zhang', 32, 175, 1442959315022);

 cqlsh:demo> INSERT INTO sasi (id, first_name, last_name, age, height, created_at)
         ... VALUES (6b757016-631d-4fdb-ac62-40b127ccfbc7, 'Jason', 'Brown', 40, 182, 1442959315023);

 cqlsh:demo> INSERT INTO sasi (id, first_name, last_name, age, height, created_at)
         ... VALUES (8f909e8a-008e-49dd-8d43-1b0df348ed44, 'Vijay', 'Parthasarathy', 34, 183, 1442959315024);

 cqlsh:demo> SELECT first_name, last_name, age, height, created_at FROM sasi;

  first_name | last_name     | age | height | created_at
 ------------+---------------+-----+--------+---------------
     Michael |      Kjellman |  26 |    180 | 1442959315021
     Mikhail |       Stepura |  36 |    173 | 1442959315020
       Jason |         Brown |  40 |    182 | 1442959315023
       Pavel |     Yaskevich |  27 |    181 | 1442959315018
       Vijay | Parthasarathy |  34 |    183 | 1442959315024
      Jordan |          West |  26 |    173 | 1442959315019
      Johnny |         Zhang |  32 |    175 | 1442959315022

 (7 rows)
 ```

 #### Equality & Prefix Queries

 SASI supports all queries already supported by CQL, including LIKE statement
 for PREFIX, CONTAINS and SUFFIX searches.

 ```
 cqlsh:demo> SELECT first_name, last_name, age, height, created_at FROM sasi
         ... WHERE first_name = 'Pavel';

   first_name | last_name | age | height | created_at
 -------------+-----------+-----+--------+---------------
        Pavel | Yaskevich |  27 |    181 | 1442959315018

 (1 rows)
 ```

 ```
 cqlsh:demo> SELECT first_name, last_name, age, height, created_at FROM sasi
        ... WHERE first_name = 'pavel';

   first_name | last_name | age | height | created_at
 -------------+-----------+-----+--------+---------------
        Pavel | Yaskevich |  27 |    181 | 1442959315018

 (1 rows)
 ```

 ```
 cqlsh:demo> SELECT first_name, last_name, age, height, created_at FROM sasi
         ... WHERE first_name LIKE 'M%';

  first_name | last_name | age | height | created_at
 ------------+-----------+-----+--------+---------------
     Michael |  Kjellman |  26 |    180 | 1442959315021
     Mikhail |   Stepura |  36 |    173 | 1442959315020

 (2 rows)
 ```

 Of course, the case of the query does not matter for the `first_name`
 column because of the options provided at index creation time.

 ```
 cqlsh:demo> SELECT first_name, last_name, age, height, created_at FROM sasi
         ... WHERE first_name LIKE 'm%';

  first_name | last_name | age | height | created_at
 ------------+-----------+-----+--------+---------------
     Michael |  Kjellman |  26 |    180 | 1442959315021
     Mikhail |   Stepura |  36 |    173 | 1442959315020

 (2 rows)
 ```

 #### Compound Queries

 SASI supports queries with multiple predicates, however, due to the
 nature of the default indexing implementation, CQL requires the user
 to specify `ALLOW FILTERING` to opt in to the potential performance
 pitfalls of such a query. With SASI, while the requirement to include
 `ALLOW FILTERING` remains, to reduce modifications to the grammar, the
 performance pitfalls do not exist because filtering is not
 performed. Details on how SASI joins data from multiple predicates is
 available below in the
 [Implementation Details](#implementation-details)
 section.

 ```
 cqlsh:demo> SELECT first_name, last_name, age, height, created_at FROM sasi
         ... WHERE first_name LIKE 'M%' and age < 30 ALLOW FILTERING;

  first_name | last_name | age | height | created_at
 ------------+-----------+-----+--------+---------------
     Michael |  Kjellman |  26 |    180 | 1442959315021

 (1 rows)
 ```

 #### Suffix Queries

 The next example demonstrates `CONTAINS` mode on the `last_name`
 column. By using this mode, predicates can search for any strings
 containing the search string as a sub-string. In this case the strings
 containing "a" or "an".

 ```
 cqlsh:demo> SELECT * FROM sasi WHERE last_name LIKE '%a%';

  id                                   | age | created_at    | first_name | height | last_name
 --------------------------------------+-----+---------------+------------+--------+---------------
  f5dfcabe-de96-4148-9b80-a1c41ed276b4 |  26 | 1442959315021 |    Michael |    180 |      Kjellman
  96053844-45c3-4f15-b1b7-b02c441d3ee1 |  36 | 1442959315020 |    Mikhail |    173 |       Stepura
  556ebd54-cbe5-4b75-9aae-bf2a31a24500 |  27 | 1442959315018 |      Pavel |    181 |     Yaskevich
  8f909e8a-008e-49dd-8d43-1b0df348ed44 |  34 | 1442959315024 |      Vijay |    183 | Parthasarathy
  2970da43-e070-41a8-8bcb-35df7a0e608a |  32 | 1442959315022 |     Johnny |    175 |         Zhang

 (5 rows)

 cqlsh:demo> SELECT * FROM sasi WHERE last_name LIKE '%an%';

  id                                   | age | created_at    | first_name | height | last_name
 --------------------------------------+-----+---------------+------------+--------+-----------
  f5dfcabe-de96-4148-9b80-a1c41ed276b4 |  26 | 1442959315021 |    Michael |    180 |  Kjellman
  2970da43-e070-41a8-8bcb-35df7a0e608a |  32 | 1442959315022 |     Johnny |    175 |     Zhang

 (2 rows)
 ```

 #### Expressions on Non-Indexed Columns

 SASI also supports filtering on non-indexed columns like `height`. The
 expression can only narrow down an existing query using `AND`.

 ```
 cqlsh:demo> SELECT * FROM sasi WHERE last_name LIKE '%a%' AND height >= 175 ALLOW FILTERING;

  id                                   | age | created_at    | first_name | height | last_name
 --------------------------------------+-----+---------------+------------+--------+---------------
  f5dfcabe-de96-4148-9b80-a1c41ed276b4 |  26 | 1442959315021 |    Michael |    180 |      Kjellman
  556ebd54-cbe5-4b75-9aae-bf2a31a24500 |  27 | 1442959315018 |      Pavel |    181 |     Yaskevich
  8f909e8a-008e-49dd-8d43-1b0df348ed44 |  34 | 1442959315024 |      Vijay |    183 | Parthasarathy
  2970da43-e070-41a8-8bcb-35df7a0e608a |  32 | 1442959315022 |     Johnny |    175 |         Zhang

 (4 rows)
 ```

 #### Delimiter based Tokenization Analysis

 A simple text analysis provided is delimiter based tokenization. This provides an alternative to indexing collections,
 as delimiter separated text can be indexed without the overhead of `CONTAINS` mode nor using `PREFIX` or `SUFFIX` queries.

 ```
 cqlsh:demo> ALTER TABLE sasi ADD aliases text;
 cqlsh:demo> CREATE CUSTOM INDEX on sasi (aliases) USING 'org.apache.cassandra.index.sasi.SASIIndex'
         ... WITH OPTIONS = {
         ... 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer',
         ... 'delimiter': ',',
         ... 'mode': 'prefix',
         ... 'analyzed': 'true'};
 cqlsh:demo> UPDATE sasi SET aliases = 'Mike,Mick,Mikey,Mickey' WHERE id = f5dfcabe-de96-4148-9b80-a1c41ed276b4;
 cqlsh:demo> SELECT * FROM sasi WHERE aliases LIKE 'Mikey' ALLOW FILTERING;

  id                                   | age | aliases                | created_at    | first_name | height | last_name
 --------------------------------------+-----+------------------------+---------------+------------+--------+-----------
  f5dfcabe-de96-4148-9b80-a1c41ed276b4 |  26 | Mike,Mick,Mikey,Mickey | 1442959315021 |    Michael |    180 |  Kjellman
 ```

 #### Text Analysis (Tokenization and Stemming)

 Lastly, to demonstrate text analysis an additional column is needed on
 the table. Its definition, index, and statements to update rows are shown below.

 ```
 cqlsh:demo> ALTER TABLE sasi ADD bio text;
 cqlsh:demo> CREATE CUSTOM INDEX ON sasi (bio) USING 'org.apache.cassandra.index.sasi.SASIIndex'
         ... WITH OPTIONS = {
         ... 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
         ... 'tokenization_enable_stemming': 'true',
         ... 'analyzed': 'true',
         ... 'tokenization_normalize_lowercase': 'true',
         ... 'tokenization_locale': 'en'
         ... };
 cqlsh:demo> UPDATE sasi SET bio = 'Software Engineer, who likes distributed systems, doesnt like to argue.' WHERE id = 5770382a-c56f-4f3f-b755-450e24d55217;
 cqlsh:demo> UPDATE sasi SET bio = 'Software Engineer, works on the freight distribution at nights and likes arguing' WHERE id = 556ebd54-cbe5-4b75-9aae-bf2a31a24500;
 cqlsh:demo> SELECT * FROM sasi;

  id                                   | age | bio                                                                              | created_at    | first_name | height | last_name
 --------------------------------------+-----+----------------------------------------------------------------------------------+---------------+------------+--------+---------------
  f5dfcabe-de96-4148-9b80-a1c41ed276b4 |  26 |                                                                             null | 1442959315021 |    Michael |    180 |      Kjellman
  96053844-45c3-4f15-b1b7-b02c441d3ee1 |  36 |                                                                             null | 1442959315020 |    Mikhail |    173 |       Stepura
  6b757016-631d-4fdb-ac62-40b127ccfbc7 |  40 |                                                                             null | 1442959315023 |      Jason |    182 |         Brown
  556ebd54-cbe5-4b75-9aae-bf2a31a24500 |  27 | Software Engineer, works on the freight distribution at nights and likes arguing | 1442959315018 |      Pavel |    181 |     Yaskevich
  8f909e8a-008e-49dd-8d43-1b0df348ed44 |  34 |                                                                             null | 1442959315024 |      Vijay |    183 | Parthasarathy
  5770382a-c56f-4f3f-b755-450e24d55217 |  26 |          Software Engineer, who likes distributed systems, doesnt like to argue. | 1442959315019 |     Jordan |    173 |          West
  2970da43-e070-41a8-8bcb-35df7a0e608a |  32 |                                                                             null | 1442959315022 |     Johnny |    175 |         Zhang

 (7 rows)
 ```

 Index terms and query search strings are stemmed for the `bio` column
 because it was configured to use the
 [`StandardAnalyzer`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/analyzer/StandardAnalyzer.java)
 and `analyzed` is set to `true`. The
 `tokenization_normalize_lowercase` is similar to the `case_sensitive`
 property but for the
 [`StandardAnalyzer`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/analyzer/StandardAnalyzer.java). These
 query demonstrates the stemming applied by [`StandardAnalyzer`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/analyzer/StandardAnalyzer.java).

 ```
 cqlsh:demo> SELECT * FROM sasi WHERE bio LIKE 'distributing';

  id                                   | age | bio                                                                              | created_at    | first_name | height | last_name
 --------------------------------------+-----+----------------------------------------------------------------------------------+---------------+------------+--------+-----------
  556ebd54-cbe5-4b75-9aae-bf2a31a24500 |  27 | Software Engineer, works on the freight distribution at nights and likes arguing | 1442959315018 |      Pavel |    181 | Yaskevich
  5770382a-c56f-4f3f-b755-450e24d55217 |  26 |          Software Engineer, who likes distributed systems, doesnt like to argue. | 1442959315019 |     Jordan |    173 |      West

 (2 rows)

 cqlsh:demo> SELECT * FROM sasi WHERE bio LIKE 'they argued';

  id                                   | age | bio                                                                              | created_at    | first_name | height | last_name
 --------------------------------------+-----+----------------------------------------------------------------------------------+---------------+------------+--------+-----------
  556ebd54-cbe5-4b75-9aae-bf2a31a24500 |  27 | Software Engineer, works on the freight distribution at nights and likes arguing | 1442959315018 |      Pavel |    181 | Yaskevich
  5770382a-c56f-4f3f-b755-450e24d55217 |  26 |          Software Engineer, who likes distributed systems, doesnt like to argue. | 1442959315019 |     Jordan |    173 |      West

 (2 rows)

 cqlsh:demo> SELECT * FROM sasi WHERE bio LIKE 'working at the company';

  id                                   | age | bio                                                                              | created_at    | first_name | height | last_name
 --------------------------------------+-----+----------------------------------------------------------------------------------+---------------+------------+--------+-----------
  556ebd54-cbe5-4b75-9aae-bf2a31a24500 |  27 | Software Engineer, works on the freight distribution at nights and likes arguing | 1442959315018 |      Pavel |    181 | Yaskevich

 (1 rows)

 cqlsh:demo> SELECT * FROM sasi WHERE bio LIKE 'soft eng';

  id                                   | age | bio                                                                              | created_at    | first_name | height | last_name
 --------------------------------------+-----+----------------------------------------------------------------------------------+---------------+------------+--------+-----------
  556ebd54-cbe5-4b75-9aae-bf2a31a24500 |  27 | Software Engineer, works on the freight distribution at nights and likes arguing | 1442959315018 |      Pavel |    181 | Yaskevich
  5770382a-c56f-4f3f-b755-450e24d55217 |  26 |          Software Engineer, who likes distributed systems, doesnt like to argue. | 1442959315019 |     Jordan |    173 |      West

 (2 rows)
 ```

 ## Implementation Details

 While SASI, at the surface, is simply an implementation of the
 `Index` interface, at its core there are several data
 structures and algorithms used to satisfy it. These are described
 here. Additionally, the changes internal to Cassandra to support SASI's
 integration are described.

 The `Index` interface divides responsibility of the
 implementer into two parts: Indexing and Querying. Further, Cassandra
 makes it possible to divide those responsibilities into the memory and
 disk components. SASI takes advantage of Cassandra's write-once,
 immutable, ordered data model to build indexes along with the flushing
 of the memtable to disk -- this is the origin of the name "SSTable
 Attached Secondary Index".

 The SASI index data structures are built in memory as the SSTable is
 being written, and they are flushed to disk before the writing of the
 SSTable completes. The writing of each index file only requires
 sequential writes to disk. In some cases, partial flushes are
 performed, and later stitched back together, to reduce memory
 usage. These data structures are optimized for this use case.

 Taking advantage of Cassandra's ordered data model, at query time,
 candidate indexes are narrowed down for searching, minimizing the amount
 of work done. Searching is then performed using an efficient method
 that streams data off disk as needed.

 ### Indexing

 Per SSTable, SASI writes an index file for each indexed column. The
 data for these files is built in memory using the
 [`OnDiskIndexBuilder`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/OnDiskIndexBuilder.java). Once
 flushed to disk, the data is read using the
 [`OnDiskIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/OnDiskIndex.java)
 class. These are composed of bytes representing indexed terms,
 organized for efficient writing or searching respectively. The keys
 and values they hold represent tokens and positions in an SSTable and
 these are stored per-indexed term in
 [`TokenTreeBuilder`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/TokenTreeBuilder.java)s
 for writing, and
 [`TokenTree`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java)s
 for querying. These index files are memory mapped after being written
 to disk, for quicker access. For indexing data in the memtable, SASI
 uses its
 [`IndexMemtable`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/IndexMemtable.java)
 class.

 #### OnDiskIndex(Builder)

 Each
 [`OnDiskIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/OnDiskIndex.java)
 is an instance of a modified
 [Suffix Array](https://en.wikipedia.org/wiki/Suffix_array) data
 structure. The
 [`OnDiskIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/OnDiskIndex.java)
 is comprised of page-size blocks of sorted terms and pointers to the
 terms' associated data, as well as the data itself, stored also in one
 or more page-sized blocks. The
 [`OnDiskIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/OnDiskIndex.java)
 is structured as a tree of arrays, where each level describes the
 terms in the level below, the final level being the terms
 themselves. The `PointerLevel`s and their `PointerBlock`s contain
 terms and pointers to other blocks that *end* with those terms. The
 `DataLevel`, the final level, and its `DataBlock`s contain terms and
 point to the data itself, contained in [`TokenTree`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java)s.

 The terms written to the
 [`OnDiskIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/OnDiskIndex.java)
 vary depending on its "mode": either `PREFIX`, `CONTAINS`, or
 `SPARSE`. In the `PREFIX` and `SPARSE` cases, terms' exact values are
 written exactly once per `OnDiskIndex`. For example, when using a `PREFIX` index
 with terms `Jason`, `Jordan`, `Pavel`, all three will be included in
 the index. A `CONTAINS` index writes additional terms for each suffix of
 each term recursively. Continuing with the example, a `CONTAINS` index
 storing the previous terms would also store `ason`, `ordan`, `avel`,
 `son`, `rdan`, `vel`, etc. This allows for queries on the suffix of
 strings. The `SPARSE` mode differs from `PREFIX` in that for every 64
 blocks of terms a
 [`TokenTree`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java)
 is built merging all the
 [`TokenTree`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java)s
 for each term into a single one. This copy of the data is used for
 efficient iteration of large ranges of e.g. timestamps. The index
 "mode" is configurable per column at index creation time.

 #### TokenTree(Builder)

 The
 [`TokenTree`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java)
 is an implementation of the well-known
 [B+-tree](https://en.wikipedia.org/wiki/B%2B_tree) that has been
 modified to optimize for its use-case. In particular, it has been
 optimized to associate tokens, longs, with a set of positions in an
 SSTable, also longs. Allowing the set of long values accommodates
 the possibility of a hash collision in the token, but the data
 structure is optimized for the unlikely possibility of such a
 collision.

 To optimize for its write-once environment the
 [`TokenTreeBuilder`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/TokenTreeBuilder.java)
 completely loads its interior nodes as the tree is built, and it uses
 the well-known algorithm optimized for bulk-loading the data
 structure.

 [`TokenTree`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java)s provide the means to iterate over tokens, and file
 positions, that match a given term, and to skip forward in that
 iteration, an operation used heavily at query time.

 #### IndexMemtable

 The
 [`IndexMemtable`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/IndexMemtable.java)
 handles indexing the in-memory data held in the memtable. The
 [`IndexMemtable`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/IndexMemtable.java)
 in turn manages either a
 [`TrieMemIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/TrieMemIndex.java)
 or a
 [`SkipListMemIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/SkipListMemIndex.java)
 per-column. The choice of which index type is used is data
 dependent. The
 [`TrieMemIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/TrieMemIndex.java)
 is used for literal types. `AsciiType` and `UTF8Type` are literal
 types by default but any column can be configured as a literal type
 using the `is_literal` option at index creation time. For non-literal
 types the
 [`SkipListMemIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/SkipListMemIndex.java)
 is used. The
 [`TrieMemIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/TrieMemIndex.java)
 is an implementation that can efficiently support prefix queries on
 character-like data. The
 [`SkipListMemIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/SkipListMemIndex.java),
 conversely, is better suited for other Cassandra data types like
 numbers.

 The
 [`TrieMemIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/TrieMemIndex.java)
 is built using either the `ConcurrentRadixTree` or
 `ConcurrentSuffixTree` from the `com.goooglecode.concurrenttrees`
 package. The choice between the two is made based on the indexing
 mode, `PREFIX` or other modes, and `CONTAINS` mode, respectively.

 The
 [`SkipListMemIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/SkipListMemIndex.java)
 is built on top of `java.util.concurrent.ConcurrentSkipListSet`.

 ### Querying

 Responsible for converting the internal `IndexExpression`
 representation into SASI's
 [`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java)
 and
 [`Expression`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Expression.java)
 trees, optimizing the trees to reduce the amount of work done, and
 driving the query itself, the
 [`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
 is the work horse of SASI's querying implementation. To efficiently
 perform union and intersection operations, SASI provides several
 iterators similar to Cassandra's `MergeIterator`, but tailored
 specifically for SASI's use while including more features. The
 [`RangeUnionIterator`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeUnionIterator.java),
 like its name suggests, performs set unions over sets of tokens/keys
 matching the query, only reading as much data as it needs from each
 set to satisfy the query. The
 [`RangeIntersectionIterator`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java),
 similar to its counterpart, performs set intersections over its data.

 #### QueryPlan

 The
 [`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
 instantiated per search query is at the core of SASI's querying
 implementation. Its work can be divided in two stages: analysis and
 execution.

 During the analysis phase,
 [`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
 converts from Cassandra's internal representation of
 `IndexExpression`s, which has also been modified to support encoding
 queries that contain ORs and groupings of expressions using
 parentheses (see the
 [Cassandra Internal Changes](#cassandra-internal-changes)
 section below for more details). This process produces a tree of
 [`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java)s, which in turn may contain [`Expression`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Expression.java)s, all of which
 provide an alternative, more efficient, representation of the query.

 During execution, the
 [`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
 uses the `DecoratedKey`-generating iterator created from the
 [`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java) tree. These keys are read from disk and a final check to
 ensure they satisfy the query is made, once again using the
 [`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java) tree. At the point the desired amount of matching data has
 been found, or there is no more matching data, the result set is
 returned to the coordinator through the existing internal components.

 The number of queries (total/failed/timed-out), and their latencies,
 are maintained per-table/column family.

 SASI also supports concurrently iterating terms for the same index
 across SSTables. The concurrency factor is controlled by the
 `cassandra.search_concurrency_factor` system property. The default is
 `1`.

 ##### QueryController

 Each
 [`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
 references a
 [`QueryController`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryController.java)
 used throughout the execution phase. The
 [`QueryController`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryController.java)
 has two responsibilities: to manage and ensure the proper cleanup of
 resources (indexes), and to strictly enforce the time bound per query,
 specified by the user via the range slice timeout. All indexes are
 accessed via the
 [`QueryController`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryController.java)
 so that they can be safely released by it later. The
 [`QueryController`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryController.java)'s
 `checkpoint` function is called in specific places in the execution
 path to ensure the time-bound is enforced.

 ##### QueryPlan Optimizations

 While in the analysis phase, the
 [`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
 performs several potential optimizations to the query. The goal of
 these optimizations is to reduce the amount of work performed during
 the execution phase.

 The simplest optimization performed is compacting multiple expressions
 joined by logical intersections (`AND`) into a single [`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java) with
 three or more [`Expression`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Expression.java)s. For example, the query `WHERE age < 100 AND
 fname = 'p*' AND first_name != 'pa*' AND age > 21` would,
 without modification, have the following tree:

                           ┌───────┐
                  ┌────────│  AND  │──────┐
                  │        └───────┘      │
                  ▼                       ▼
               ┌───────┐             ┌──────────┐
         ┌─────│  AND  │─────┐       │age < 100 │
         │     └───────┘     │       └──────────┘
         ▼                   ▼
     ┌──────────┐          ┌───────┐
     │ fname=p* │        ┌─│  AND  │───┐
     └──────────┘        │ └───────┘   │
                         ▼             ▼
                     ┌──────────┐  ┌──────────┐
                     │fname!=pa*│  │ age > 21 │
                     └──────────┘  └──────────┘

 [`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
 will remove the redundant right branch whose root is the final `AND`
 and has leaves `fname != pa*` and `age > 21`. These [`Expression`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Expression.java)s will
 be compacted into the parent `AND`, a safe operation due to `AND`
 being associative and commutative. The resulting tree looks like the
 following:

                                   ┌───────┐
                          ┌────────│  AND  │──────┐
                          │        └───────┘      │
                          ▼                       ▼
                       ┌───────┐             ┌──────────┐
           ┌───────────│  AND  │────────┐    │age < 100 │
           │           └───────┘        │    └──────────┘
           ▼               │            ▼
     ┌──────────┐          │      ┌──────────┐
     │ fname=p* │          ▼      │ age > 21 │
     └──────────┘    ┌──────────┐ └──────────┘
                     │fname!=pa*│
                     └──────────┘

 When excluding results from the result set, using `!=`, the
 [`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
 determines the best method for handling it. For range queries, for
 example, it may be optimal to divide the range into multiple parts
 with a hole for the exclusion. For string queries, such as this one,
 it is more optimal, however, to simply note which data to skip, or
 exclude, while scanning the index. Following this optimization the
 tree looks like this:

                                    ┌───────┐
                           ┌────────│  AND  │──────┐
                           │        └───────┘      │
                           ▼                       ▼
                        ┌───────┐             ┌──────────┐
                ┌───────│  AND  │────────┐    │age < 100 │
                │       └───────┘        │    └──────────┘
                ▼                        ▼
         ┌──────────────────┐         ┌──────────┐
         │     fname=p*     │         │ age > 21 │
         │ exclusions=[pa*] │         └──────────┘
         └──────────────────┘

 The last type of optimization applied, for this query, is to merge
 range expressions across branches of the tree -- without modifying the
 meaning of the query, of course. In this case, because the query
 contains all `AND`s the `age` expressions can be collapsed. Along with
 this optimization, the initial collapsing of unneeded `AND`s can also
 be applied once more to result in this final tree using to execute the
 query:

                             ┌───────┐
                      ┌──────│  AND  │───────┐
                      │      └───────┘       │
                      ▼                      ▼
            ┌──────────────────┐    ┌────────────────┐
            │     fname=p*     │    │ 21 < age < 100 │
            │ exclusions=[pa*] │    └────────────────┘
            └──────────────────┘

 #### Operations and Expressions

 As discussed, the
 [`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
 optimizes a tree represented by
 [`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java)s
 as interior nodes, and
 [`Expression`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Expression.java)s
 as leaves. The
 [`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java)
 class, more specifically, can have zero, one, or two
 [`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java)s
 as children and an unlimited number of expressions. The iterators used
 to perform the queries, discussed below in the
 "Range(Union|Intersection)Iterator" section, implement the necessary
 logic to merge results transparently regardless of the
 [`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java)s
 children.

 Besides participating in the optimizations performed by the
 [`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java),
 [`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java)
 is also responsible for taking a row that has been returned by the
 query and performing a final validation that it in fact does match. This
 `satisfiesBy` operation is performed recursively from the root of the
 [`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java)
 tree for a given query. These checks are performed directly on the
 data in a given row. For more details on how `satisfiesBy` works, see
 the documentation
 [in the code](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java#L87-L123).

 #### Range(Union|Intersection)Iterator

 The abstract `RangeIterator` class provides a unified interface over
 the two main operations performed by SASI at various layers in the
 execution path: set intersection and union. These operations are
 performed in an iterated, or "streaming", fashion to prevent unneeded
 reads of elements from either set. In both the intersection and union
 cases the algorithms take advantage of the data being pre-sorted using
 the same sort order, e.g. term or token order.

 The
 [`RangeUnionIterator`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeUnionIterator.java)
 performs the "Merge-Join" portion of the
 [Sort-Merge-Join](https://en.wikipedia.org/wiki/Sort-merge_join)
 algorithm, with the properties of an outer-join, or union. It is
 implemented with several optimizations to improve its performance over
 a large number of iterators -- sets to union. Specifically, the
 iterator exploits the likely case of the data having many subgroups
 of overlapping ranges and the unlikely case that all ranges will
 overlap each other. For more details see the
 [javadoc](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeUnionIterator.java#L9-L21).

 The
 [`RangeIntersectionIterator`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java)
 itself is not a subclass of `RangeIterator`. It is a container for
 several classes, one of which, `AbstractIntersectionIterator`,
 sub-classes `RangeIterator`. SASI supports two methods of performing
 the intersection operation, and the ability to be adaptive in choosing
 between them based on some properties of the data.

 `BounceIntersectionIterator`, and the `BOUNCE` strategy, works like
 the
 [`RangeUnionIterator`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeUnionIterator.java)
 in that it performs a "Merge-Join", however, its nature is similar to
 an inner-join, where like values are merged by a data-specific merge
 function (e.g. merging two tokens in a list to lookup in a SSTable
 later). See the
 [javadoc](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java#L88-L101)
 for more details on its implementation.

 `LookupIntersectionIterator`, and the `LOOKUP` strategy, performs a
 different operation, more similar to a lookup in an associative data
 structure, or "hash lookup" in database terminology. Once again,
 details on the implementation can be found in the
 [javadoc](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java#L199-L208).

 The choice between the two iterators, or the `ADAPTIVE` strategy, is
 based upon the ratio of data set sizes of the minimum and maximum
 range of the sets being intersected. If the number of the elements in
 minimum range divided by the number of elements is the maximum range
 is less than or equal to `0.01`, then the `ADAPTIVE` strategy chooses
 the `LookupIntersectionIterator`, otherwise the
 `BounceIntersectionIterator` is chosen.

 ### The SASIIndex Class

 The above components are glued together by the
 [`SASIIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/SASIIndex.java)
 class which implements `Index`, and is instantiated
 per-table containing SASI indexes. It manages all indexes for a table
 via the
 [`sasi.conf.DataTracker`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/conf/DataTracker.java)
 and
 [`sasi.conf.view.View`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/conf/view/View.java)
 components, controls writing of all indexes for an SSTable via its
 [`PerSSTableIndexWriter`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/PerSSTableIndexWriter.java), and initiates searches with
 `Searcher`. These classes glue the previously
 mentioned indexing components together with Cassandra's SSTable
 life-cycle ensuring indexes are not only written when Memtable's flush,
 but also as SSTable's are compacted. For querying, the
 `Searcher` does little but defer to
 [`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
 and update e.g. latency metrics exposed by SASI.

 ### Cassandra Internal Changes

 To support the above changes and integrate them into Cassandra a few
 minor internal changes were made to Cassandra itself. These are
 described here.

 #### SSTable Write Life-cycle Notifications

 The `SSTableFlushObserver` is an observer pattern-like interface,
 whose sub-classes can register to be notified about events in the
 life-cycle of writing out a SSTable. Sub-classes can be notified when a
 flush begins and ends, as well as when each next row is about to be
 written, and each next column. SASI's `PerSSTableIndexWriter`,
 discussed above, is the only current subclass.

 ### Limitations and Caveats

 The following are items that can be addressed in future updates but are not
 available in this repository or are not currently implemented.

 * The cluster must be configured to use a partitioner that produces
   `LongToken`s, e.g. `Murmur3Partitioner`. Other existing partitioners which
   don't produce LongToken e.g. `ByteOrderedPartitioner` and `RandomPartitioner`
   will not work with SASI.
 * Not Equals and OR support have been removed in this release while
   changes are made to Cassandra itself to support them.

 ### Contributors

 * [Pavel Yaskevich](https://github.com/xedin)
 * [Jordan West](https://github.com/jrwest)
 * [Michael Kjellman](https://github.com/mkjellman)
 * [Jason Brown](https://github.com/jasobrown)
 * [Mikhail Stepura](https://github.com/mishail)