Core Persistence Query Index Module

This module defines an EntityCollectionIndex interface for indexing, de-indexing and querying Entities within a Collection. Queries are expressed in Usergrid's SQL-like query syntax.

Implementation

This module also provides an implementation of EntityCollectionIndex that uses the open source ElasticSearch as its index and query engine.

Here are the important parts of the QueryIndex module:

  • EntityCollectionIndex: the interface that defines methods for indexing, de-indexing and querying an index.
  • EntityCollectionIndexFactory: factory for obtaining an index for an Entity Collection.
  • IndexFig: defines configuration needed for this module to operate.
  • org.apache.usergrid.persistence.index.impl: provides an implementation using ElasticSearch via its Java API.
  • Query, Results and EntityRefs: these classes were “ported” from Usergrid 1.0 to support Usergrid query syntax. We define a grammar and use ANTLR to generate a parser and a lexer.
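How these parts fit together can be sketched with a toy in-memory index. Every name below (SketchCollectionIndex, InMemorySketchIndex, SketchIndexFactory and their methods) is hypothetical and only illustrates the index / de-index / query roles described above. The real EntityCollectionIndex signatures are not shown in this document, and the real implementation delegates to ElasticSearch rather than a map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the role EntityCollectionIndex plays; the real
// interface's method names and signatures are not shown in this document.
interface SketchCollectionIndex {
    void index(String entityId, Map<String, Object> fields);
    void deindex(String entityId);
    List<String> query(String field, Object value);
}

// Toy in-memory implementation, for illustration only.
class InMemorySketchIndex implements SketchCollectionIndex {
    // LinkedHashMap keeps insertion order so query results are deterministic.
    private final Map<String, Map<String, Object>> docs = new LinkedHashMap<>();

    @Override
    public void index(String entityId, Map<String, Object> fields) {
        // Copy the fields so later changes by the caller do not affect the index.
        docs.put(entityId, new LinkedHashMap<>(fields));
    }

    @Override
    public void deindex(String entityId) {
        docs.remove(entityId);
    }

    @Override
    public List<String> query(String field, Object value) {
        // Exact-match lookup on a single field; returns matching entity ids.
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> e : docs.entrySet()) {
            if (value.equals(e.getValue().get(field))) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }
}

// Hypothetical factory: one index per collection name, created on first use.
class SketchIndexFactory {
    private final Map<String, SketchCollectionIndex> indexes = new LinkedHashMap<>();

    SketchCollectionIndex createCollectionIndex(String collectionName) {
        return indexes.computeIfAbsent(collectionName, n -> new InMemorySketchIndex());
    }
}
```

The sketch matches on a single field only. In the module itself, queries go through the Query class and the ANTLR-generated parser instead.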

100 Legacy Tests

These 100 tests help us ensure that Usergrid 1.0 query syntax is fully supported by this module. To enable re-use of tests from Usergrid 1.0, this module's tests include some “legacy” test infrastructure classes, e.g. Application, Core Application. They also include a partial implementation of the old Entity Manager interface.

In package org.apache.usergrid.persistence.index.impl:

  • GeoIT
  • IndexIT
  • CollectionIT

In package: org.apache.usergrid.persistence.query

In package: org.apache.usergrid.persistence.query.tree

Stress Tests

Coming soon...

Issues to consider

  • We have to set a Query Cursor Timeout. Is that a problem?
    • No, but how does it work? Does the timeout reset on each query?
  • We need to set a Refresh Frequency. How do we design around that?
    • To be determined...
  • Is it better to have one index for all, or one per organization?
    • Do more indexes mean more complexity (number of shards, etc.)?
    • Do smaller indexes mean quicker queries?
  • For each index, how many shards? Is the default of five good enough?
    • The number of shards = the maximum number of nodes possible

Naming Configuration

clusterName = config{usergrid.cluster_name}
keyspaceName = config{cassandra.keyspace.application}
managementName = config{elasticsearch.managment_index}
indexRoot = {clusterName}{keyspaceName}
managementIndexName = {indexRoot}{managementName}
managementAliasName = {indexRoot}_{managementName}read_alias || {indexRoot}{managementName}read_alias
applicationIndexName = {indexRoot}applications{bucketId}
applicationAliasName = {indexRoot}{appId}read_alias || {indexRoot}{appId}_write_alias
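As a rough illustration, the naming rules above could be composed like this. The class IndexNamingSketch and its methods are hypothetical, not part of the module. The rules as written show an underscore between only some parts, so the sketch assumes an underscore separator between every part; check this against the actual implementation before relying on it.

```java
// Sketch of the naming rules; not the module's actual code.
final class IndexNamingSketch {
    // ASSUMPTION: every name part is joined with "_".
    private static final String SEP = "_";

    private final String indexRoot;
    private final String managementName;

    // The three arguments correspond to the config values
    // usergrid.cluster_name, cassandra.keyspace.application and
    // elasticsearch.managment_index.
    IndexNamingSketch(String clusterName, String keyspaceName, String managementName) {
        this.indexRoot = clusterName + SEP + keyspaceName;
        this.managementName = managementName;
    }

    String indexRoot() {
        return indexRoot;
    }

    String managementIndexName() {
        return indexRoot + SEP + managementName;
    }

    String managementReadAlias() {
        return managementIndexName() + SEP + "read_alias";
    }

    // bucketId is kept as a String because its type is not stated in the rules.
    String applicationIndexName(String bucketId) {
        return indexRoot + SEP + "applications" + SEP + bucketId;
    }

    String applicationReadAlias(String appId) {
        return indexRoot + SEP + appId + SEP + "read_alias";
    }

    String applicationWriteAlias(String appId) {
        return indexRoot + SEP + appId + SEP + "write_alias";
    }
}
```

Deriving every name from indexRoot in one place keeps index and alias names consistent if the cluster or keyspace name changes.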