 Core Persistence Query Index Module
 ===
 This module defines an __EntityCollectionIndex__ interface for indexing, de-indexing and querying Entities within a Collection. Queries are expressed in Usergrid's SQL-like query syntax.

 Implementation
 ---
 This module also provides an implementation of EntityCollectionIndex that uses the open source ElasticSearch as its index and query engine.

 Here are the important parts of the QueryIndex module:

 * __EntityCollectionIndex__: the interface that defines methods for indexing, de-indexing and querying an index.
 * __EntityCollectionIndexFactory__: factory for obtaining an index for an Entity Collection.
 * __IndexFig__: defines configuration needed for this module to operate.
 * __org.apache.usergrid.persistence.index.impl__: provides an implementation using ElasticSearch via its Java API.
 * __Query, Results and EntityRefs__: these classes were "ported" from Usergrid 1.0 to support Usergrid query syntax. We define a grammar and use ANTLR to generate a parser and a lexer.
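
The index, de-index and query lifecycle described above can be illustrated with a toy in-memory stand-in. This is only a sketch: `SketchIndex`, `InMemoryIndex` and their method signatures are hypothetical names made up for illustration, not the actual EntityCollectionIndex API. The search is a plain field-equality match rather than Usergrid query syntax or ElasticSearch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IndexLifecycleSketch {

    /** Hypothetical, simplified interface; NOT the module's real EntityCollectionIndex. */
    interface SketchIndex {
        void index(String entityId, Map<String, Object> fields);
        void deindex(String entityId);
        List<String> search(String field, Object value);
    }

    /** Toy in-memory implementation; the real module delegates to ElasticSearch. */
    static class InMemoryIndex implements SketchIndex {
        private final Map<String, Map<String, Object>> docs = new LinkedHashMap<>();

        public void index(String entityId, Map<String, Object> fields) {
            docs.put(entityId, fields); // re-indexing an id overwrites the old document
        }

        public void deindex(String entityId) {
            docs.remove(entityId);
        }

        public List<String> search(String field, Object value) {
            // Return ids of all documents whose field equals the given value.
            List<String> hits = new ArrayList<>();
            for (Map.Entry<String, Map<String, Object>> e : docs.entrySet()) {
                if (value.equals(e.getValue().get(field))) {
                    hits.add(e.getKey());
                }
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        SketchIndex idx = new InMemoryIndex();
        idx.index("e1", Map.of("name", "fred"));
        idx.index("e2", Map.of("name", "wilma"));
        System.out.println(idx.search("name", "fred")); // prints [e1]
        idx.deindex("e1");
        System.out.println(idx.search("name", "fred")); // prints []
    }
}
```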

 100 Legacy Tests
 ---
 These 100 tests help ensure that this module fully supports Usergrid 1.0 query syntax. To enable re-use of tests from Usergrid 1.0, this module's tests include some "legacy" test infrastructure classes, e.g. Application, Core Application. They also include a partial implementation of the old Entity Manager interface.

 In package org.apache.usergrid.persistence.index.impl:

 * GeoIT
 * IndexIT
 * CollectionIT

 In package: org.apache.usergrid.persistence.query

 In package: org.apache.usergrid.persistence.query.tree

 Stress Tests
 ---
 Coming soon...


 Issues to consider
 ---

 * We have to set a Query Cursor Timeout. Is that a problem?
     * No, but how does it work? Does the timeout reset on each query?
 * We need to set a Refresh Frequency. How do we design around that?
     * To be determined...
 * Is it better to have one index for everything, or one per organization?
     * Do more indexes mean more complexity (number of shards, etc.)?
     * Do smaller indexes mean quicker queries?
 * For each index, how many shards? Is the default of five good enough?
     * The number of shards = the maximum number of nodes possible

 Naming Configuration
 ---
 clusterName = config{usergrid.cluster_name}
 keyspaceName = config{cassandra.keyspace.application}
 managementName = config{elasticsearch.managment_index}
 indexRoot = {clusterName}_{keyspaceName}
 managementIndexName = {indexRoot}_{managementName}
 managementAliasName = {indexRoot}_{managementName}_read_alias || {indexRoot}_{managementName}_write_alias
 applicationIndexName = {indexRoot}_applications_{bucketId}
 applicationAliasName = {indexRoot}_{appId}_read_alias || {indexRoot}_{appId}_write_alias
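
As a rough illustration of how these names compose, here is a minimal Java sketch. The class `IndexNamingSketch`, its method names, the `int` type for `bucketId` and the sample values (`ug`, `apps`, `mgmt`, `a1`) are all assumptions made for the example; the module's own naming code may look quite different.

```java
/** Hypothetical helper that composes index and alias names from the rules above. */
class IndexNamingSketch {
    private final String indexRoot;
    private final String managementName;

    IndexNamingSketch(String clusterName, String keyspaceName, String managementName) {
        // indexRoot = {clusterName}_{keyspaceName}
        this.indexRoot = clusterName + "_" + keyspaceName;
        this.managementName = managementName;
    }

    String managementIndexName() {
        return indexRoot + "_" + managementName;
    }

    String managementReadAlias() {
        return managementIndexName() + "_read_alias";
    }

    // bucketId is assumed to be an int here; the source does not state its type.
    String applicationIndexName(int bucketId) {
        return indexRoot + "_applications_" + bucketId;
    }

    String applicationReadAlias(String appId) {
        return indexRoot + "_" + appId + "_read_alias";
    }

    String applicationWriteAlias(String appId) {
        return indexRoot + "_" + appId + "_write_alias";
    }

    public static void main(String[] args) {
        // Sample values are made up; real ones come from the config keys listed above.
        IndexNamingSketch names = new IndexNamingSketch("ug", "apps", "mgmt");
        System.out.println(names.managementIndexName());      // ug_apps_mgmt
        System.out.println(names.managementReadAlias());      // ug_apps_mgmt_read_alias
        System.out.println(names.applicationIndexName(1));    // ug_apps_applications_1
        System.out.println(names.applicationReadAlias("a1")); // ug_apps_a1_read_alias
        System.out.println(names.applicationWriteAlias("a1")); // ug_apps_a1_write_alias
    }
}
```

Note that the application index name is keyed by `bucketId` while the aliases are keyed by `appId`, so readers and writers address an application through its aliases rather than through the underlying index name.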