
Storage-Attached Indexing

Overview

Storage-attached indexing is a column-based local secondary index implementation for Cassandra.

The project was inspired by SASI (SSTable-Attached Secondary Indexes) and retains some of its high-level architectural character (and even some actual code), but makes significant improvements in a number of areas:

  • The on-disk/SSTable index formats for both string and numeric data have been completely replaced. Strings are indexed on disk using a byte-ordered trie data structure, while numeric types are indexed using a block-oriented balanced tree.
  • While indexes continue to be managed at the column level from the user's perspective, the storage design at the column index level is row-based, with related offset and token information stored only once at the SSTable level. This drastically reduces our on-disk footprint when several columns are indexed on the same table.
  • Tracing, metrics, virtual table-based metadata and snapshot-based backup/restore are supported out of the box.
  • On-disk index components can be streamed completely when entire SSTable streaming is enabled.
  • Incremental index building is supported, and on-disk index components are included in snapshots.
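
The row-based layout described in the second bullet above can be pictured with a small in-memory model. This is only an illustrative sketch and not SAI's on-disk format: the class `SSTableIndexSketch` and all of its fields and methods are hypothetical, and plain Java collections stand in for the trie and balanced-tree structures. It shows per-column indexes holding only SSTable-local row IDs, while the token and offset of each row are stored once and shared by every indexed column.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model; not the real SAI classes or file format.
public class SSTableIndexSketch {
    // Stored once per SSTable: row ID -> partition token and data-file offset.
    private final List<Long> tokens = new ArrayList<>();
    private final List<Long> offsets = new ArrayList<>();

    // One entry per indexed column: term -> row IDs (in insertion order).
    private final Map<String, TreeMap<String, List<Integer>>> columnIndexes = new HashMap<>();

    // Registers a row and returns its SSTable-local row ID.
    public int addRow(long token, long offset) {
        tokens.add(token);
        offsets.add(offset);
        return tokens.size() - 1;
    }

    // Records that the given row holds `term` in `column`.
    public void index(String column, String term, int rowId) {
        columnIndexes.computeIfAbsent(column, c -> new TreeMap<>())
                     .computeIfAbsent(term, t -> new ArrayList<>())
                     .add(rowId);
    }

    // Returns the token stored for a row ID in the shared table.
    public long tokenFor(int rowId) {
        return tokens.get(rowId);
    }

    // Looks up matching row IDs, then resolves offsets through the shared table.
    public List<Long> offsetsFor(String column, String term) {
        List<Long> result = new ArrayList<>();
        TreeMap<String, List<Integer>> index = columnIndexes.get(column);
        if (index == null) {
            return result;
        }
        for (int rowId : index.getOrDefault(term, List.of())) {
            result.add(offsets.get(rowId));
        }
        return result;
    }

    public static void main(String[] args) {
        SSTableIndexSketch sstable = new SSTableIndexSketch();
        int row = sstable.addRow(-42L, 128L);   // token and offset are made-up values
        sstable.index("name", "john", row);     // two columns index the same row
        sstable.index("age", "21", row);
        System.out.println(sstable.offsetsFor("name", "john"));
        System.out.println(sstable.offsetsFor("age", "21"));
    }
}
```

In this model, indexing an additional column adds only another term-to-row-ID map; the token and offset lists do not grow, which is the intuition behind the reduced footprint when several columns of one table are indexed.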

Many similarities with standard secondary indexes remain:

  • The full set of C* consistency levels is supported for both reads and writes.
  • Index updates are synchronous with mutations and do not require any kind of read-before-write.
  • Global queries are implemented on the back of C* range reads.
  • Paging is supported.
  • Only token ordering of results is supported.
  • Index builds are visible to operators as compactions and are executed on compaction threads.
  • All DML and DDL statements are CQL-based.
  • Single-node management operations are available via nodetool (e.g., stop and rebuild_index).

Quick Start

The following short tutorial will get you up and running with storage-attached indexing.

Build and Start Cassandra

Follow the instructions in README.asc, in the root folder of the Cassandra repository, to build and start Cassandra.

Create a Simple Data Model

1.) Run the following DDL statements to create a table and two indexes:

CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};

USE test;

CREATE TABLE person (id int, name text, age int, PRIMARY KEY (id));

CREATE INDEX ON person (name) USING 'sai' WITH OPTIONS = {'case_sensitive': false};

CREATE INDEX ON person (age) USING 'sai';

2.) Add some data.

INSERT INTO person (id, name, age) VALUES (1, 'John', 21);

INSERT INTO person (id, name, age) VALUES (2, 'john', 50);

INSERT INTO person (id, name, age) VALUES (3, 'Boris', 43);

INSERT INTO person (id, name, age) VALUES (4, 'Caleb', 34);

Make Some Queries

1.) Query for everyone named “John”, ignoring case.

SELECT * FROM person WHERE name = 'John';

 id | age | name
----+-----+------
  1 |  21 | John
  2 |  50 | john
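
Both rows match because the index on name was created with 'case_sensitive': false. A rough way to picture this, assuming the index simply lower-cases terms at both write time and query time (the real analyzer may normalize differently), is the toy index below. `CaseInsensitiveIndexSketch` and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy index; it only illustrates normalizing terms on both sides.
public class CaseInsensitiveIndexSketch {
    // Normalized term -> ids of the rows that contain it.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Assumed normalization: plain lower-casing with a fixed locale.
    private static String normalize(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    // Indexes a row under the normalized form of its name.
    public void add(int id, String name) {
        postings.computeIfAbsent(normalize(name), k -> new ArrayList<>()).add(id);
    }

    // Normalizes the query term the same way before the lookup.
    public List<Integer> search(String name) {
        return postings.getOrDefault(normalize(name), List.of());
    }

    public static void main(String[] args) {
        CaseInsensitiveIndexSketch index = new CaseInsensitiveIndexSketch();
        index.add(1, "John");
        index.add(2, "john");
        index.add(3, "Boris");
        index.add(4, "Caleb");
        System.out.println(index.search("John")); // ids indexed under "john"
    }
}
```

Only the indexed term is normalized in this sketch; the stored column values keep their original case, which matches the query output above, where John and john are returned as written.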

2.) Query for everyone between the ages of 18 and 35.

SELECT * FROM person WHERE age >= 18 AND age <= 35;

 id | age | name
----+-----+-------
  1 |  21 |  John
  4 |  34 | Caleb
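
The inclusive range predicate above can be mimicked with a sorted map. This is a minimal sketch in which `java.util.TreeMap` stands in for the numeric index; it is not the block-oriented balanced tree that SAI writes to disk, and `AgeRangeSketch` with its methods is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy index over the age column; TreeMap is a stand-in only.
public class AgeRangeSketch {
    // age -> ids of the rows with that age, kept sorted by age.
    private final TreeMap<Integer, List<Integer>> idsByAge = new TreeMap<>();

    public void add(int id, int age) {
        idsByAge.computeIfAbsent(age, k -> new ArrayList<>()).add(id);
    }

    // Both bounds are inclusive, mirroring age >= low AND age <= high.
    // Like TreeMap.subMap, this throws IllegalArgumentException if low > high.
    public List<Integer> between(int low, int high) {
        List<Integer> ids = new ArrayList<>();
        for (List<Integer> matches : idsByAge.subMap(low, true, high, true).values()) {
            ids.addAll(matches);
        }
        return ids;
    }

    public static void main(String[] args) {
        AgeRangeSketch index = new AgeRangeSketch();
        index.add(1, 21);
        index.add(2, 50);
        index.add(3, 43);
        index.add(4, 34);
        System.out.println(index.between(18, 35)); // ids of rows aged 18 to 35
    }
}
```

Note that the sketch returns ids in ascending age order, whereas the index itself returns results in token order, as stated in the overview.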

Contributors

  • Marc Selwan
  • Caleb Rackliffe
  • Zhao Yang
  • Jason Rutherglen
  • Maciej Zasada
  • Andres de la Peña
  • Mike Adamson
  • Zahir Patni
  • Tomek Lasica
  • Berenguer Blasi
  • Rocco Varela
  • Piotr Kołaczkowski