SOLR-13152 documentation

commit: 5bda03c2b108f73fe196f68470cc52130ef8d48f [log] [tgz]
author: Gus Heck <gus@apache.org> Mon Mar 04 01:58:51 2019 -0500
committer: Gus Heck <gus@apache.org> Mon Mar 04 01:58:51 2019 -0500
tree: c7b58c875a8c49065b9e9a7058904eab9d690fd3
parent: da21eae589e118c10ab8a33a75f8587911c52e9d [diff]
diff --git a/solr/solr-ref-guide/src/aliases.adoc b/solr/solr-ref-guide/src/aliases.adoc
new file mode 100644
index 0000000..52ce75d
--- /dev/null
+++ b/solr/solr-ref-guide/src/aliases.adoc

@@ -0,0 +1,267 @@
+= Aliases
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+== Standard Aliases
+
+Since version 6, SolrCloud has had the ability to query one or more collections via an alternative name. These
+alternative names for collections are known as aliases, and are useful when you want to:
+
+1. Atomically switch to using a newly (re)indexed collection with zero down time (by re-defining the alias)
+1. Insulate the client programming versus changes in collection names
+1. Issue a single query against several collections with identical schemas
+
+It's also possible to send update commands to aliases, but this is rarely useful if the
+  alias refers to more than one collection (as in case 3 above).
+Since there is no logic by which to distribute documents among the collections, all updates will simply be
+  directed to the first collection in the list.
+
+Standard aliases are created and updated using the <<collections-api.adoc#createalias,CREATEALIAS>> command.
+The current list of collections that are members of an alias can be verified via the
+  <<collections-api.adoc#clusterstatus,CLUSTERSTATUS>> command.
+The full definition of all aliases including metadata about that alias (in the case of routed aliases, see below)
+  can be verified via the <<collections-api.adoc#listaliases,LISTALIASES>> command.
+Alternatively this information is available by checking `/aliases.json` in zookeeper via a zookeeper
+  client or in the <<cloud-screens.adoc#tree-view,tree page>> of the cloud menu in the admin UI.
+Aliases may be deleted via the <<collections-api.adoc#deletealias,DELETEALIAS>> command.
+The underlying collections are *unaffected* by this command.
+
+TIP: Any alias (standard or routed) that references multiple collections may complicate relevancy.
+By default, SolrCloud scores documents on a per shard basis.
+With multiple collections in an alias this is always a problem, so if you have a use case for which BM25 or
+  TF/IDF relevancy is important you will want to turn on one of the
+  <<distributed-requests.adoc#distributedidf,ExactStatsCache>> implementations.
+However, for analytical use cases where results are sorted on numeric, date or alphanumeric field values rather
+  than relevancy calculations this is not a problem.
+
+== Routed Aliases
+
+To address the update limitations associated with standard aliases and provide additional useful features, the concept of
+  RoutedAliases has been developed.
+There are presently two types of Routed Alias time routed and category routed. These are described in detail below,
+  but share some common behavior.
+
+When processing an update for a routed alias, Solr initializes its
+  <<update-request-processors.adoc#update-request-processors,UpdateRequestProcessor>> chain as usual, but
+  when `DistributedUpdateProcessor` (DUP) initializes, it detects that the update targets a routed alias and injects
+  `RoutedAliasUpdateProcessor` (RAUP) in front of itself.
+RAUP, in coordination with the Overseer, is the main part of a routed alias, and must immediately precede DUP. It is not
+  possible to configure custom chains with other types of UpdateRequestProcessors between RAUP and DUP.
+
+Ideally, as a user of a routed alias, you needn't concern yourself with the particulars of the collection naming pattern
+  since both queries and updates may be done via the alias.
+When adding data, you should usually direct documents to the alias (e.g., reference the alias name instead of any collection).
+The Solr server and CloudSolrClient will direct an update request to the first collection that an alias points to.
+Once the server receives the data it will perform the necessary routing.
+
+WARNING: It is possible to update the collections
+  directly, but there is no safeguard against putting data in the incorrect collection if the alias is circumvented
+  in this manner.
+
+CAUTION: It's probably a bad idea to use "data driven" mode with routed aliases, as duplicate schema mutations might happen
+concurrently leading to errors.
+
+
+== Time Routed Aliases
+
+Starting in Solr 7.4, Time Routed Aliases (TRAs) are a SolrCloud feature that manages an alias and a time sequential
+ series of collections.
+
+It automatically creates new collections and (optionally) deletes old ones as it routes documents to the correct
+  collection based on its timestamp.
+This approach allows for indefinite indexing of data without degradation of performance otherwise experienced due to the
+  continuous growth of a single index.
+
+If you need to store a lot of timestamped data in Solr, such as logs or IoT sensor data, then this feature probably
+  makes more sense than creating one sharded hash-routed collection.
+
+=== How It Works
+
+First you create a time routed aliases using the <<collections-api.adoc#createalias,CREATEALIAS>> command with some
+  router settings.
+Most of the settings are editable at a later time using the <<collections-api.adoc#aliasprop,ALIASPROP>> command.
+
+The first collection will be created automatically, along with an alias pointing to it.
+Each underlying Solr "core" in a collection that is a member of a TRA has a special core property referencing the alias.
+The name of each collection is comprised of the TRA name and the start timestamp (UTC), with trailing zeros and symbols
+  truncated.
+
+The collections list for a TRA is always reverse sorted, and thus the connection path of the request will route to the
+  lead collection. Using CloudSolrClient is preferable as it can reduce the number of underlying physical HTTP requests by one.
+If you know that a particular set of documents to be delivered is going to a particular older collection then you could
+  direct it there from the client side as an optimization but it's not necessary. CloudSolrClient does not (yet) do this.
+
+
+TRUP first reads TRA configuration from the alias properties when it is initialized.  As it sees each document, it checks for
+  changes to TRA properties, updates its cached configuration if needed and then determines which collection the
+  document belongs to:
+
+* If TRUP needs to send it to a time segment represented by a collection other than the one that
+  the client chose to communicate with, then it will do so using mechanisms shared with DUP.
+  Once the document is forwarded to the correct collection (i.e., the correct TRA time segment), it skips directly to
+  DUP on the target collection and continues normally, potentially being routed again to the correct shard & replica
+  within the target collection.
+
+* If it belongs in the current collection (which is usually the case if processing events as they occur), the document
+  passes through to DUP. DUP does it's normal collection-level processing that may involve routing the document
+  to another shard & replica.
+
+* If the time stamp on the document is more recent than the most recent TRA segment, then a new collection needs to be
+  added at the front of the TRA.
+  TRUP will create this collection, add it to the alias and then forward the document to the collection it just created.
+  This can happen recursively if more than one collection needs to be created.
++
+Each time a new collection is added, the oldest collections in the TRA are examined for possible deletion, if that has
+    been configured.
+All this happens synchronously, potentially adding seconds to the update request and indexing latency.
+If `router.preemptiveCreateMath` is configured and if the document arrives within this window then it will occur
+asynchronously.
+
+Any other type of update like a commit or delete is routed by TRUP to all collections.
+Generally speaking, this is not a performance concern. When Solr receives a delete or commit wherein nothing is deleted
+or nothing needs to be committed, then it's pretty cheap.
+
+
+=== Limitations & Assumptions
+
+* Only *time* routed aliases are supported.  If you instead have some other sequential number, you could fake it
+  as a time (e.g., convert to a timestamp assuming some epoch and increment).
++
+The smallest possible interval is one second.
+No other routing scheme is supported, although this feature was developed with considerations that it could be
+  extended/improved to other schemes.
+
+* The underlying collections form a contiguous sequence without gaps.  This will not be suitable when there are
+  large gaps in the underlying data, as Solr will insist that there be a collection for each increment.  This
+  is due in part on Solr calculating the end time of each interval collection based on the timestamp of
+  the next collection, since it is otherwise not stored in any way.
+
+* Avoid sending updates to the oldest collection if you have also configured that old collections should be
+  automatically deleted.  It could lead to exceptions bubbling back to the indexing client.
+
+== Category Routed Aliases
+
+Starting in Solr 8.1, Category Routed Aliases (CRAs) are a feature to manage aliases and a set of dependent collections
+based on the value of a single field.
+
+CRAs automatically create new collections but because the partitioning is on categorical information rather than continuous
+numerically based values there's no logic for automatic deletion. This approach allows for simplified indexing of data
+that must be segregated into collections for cluster management or security reasons.
+
+=== How It Works
+
+First you create a time routed aliases using the <<collections-api.adoc#createalias,CREATEALIAS>> command with some
+  router settings.
+ Most of the settings are editable at a later time using the <<collections-api.adoc#aliasprop,ALIASPROP>> command.
+
+The alias will be created with a special place-holder collection which will always be named
+ `myAlias__CRA__NEW_CATEGORY_ROUTED_ALIAS_WAITING_FOR_DATA__TEMP`. The first document indexed into the CRA
+ will create a second collection named `myAlias__CRA__foo` (for a routed field value of `foo`). The second document
+ indexed will cause the temporary place holder collection to be deleted. Thereafter collections will be created whenever
+ a new value for the field is encountered.
+
+CAUTION: To guard against runaway collection creation options for limiting the total number of categories, and for
+rejecting values that don't match a regular expression are provided (see <<collections-api.adoc#createalias,CREATEALIAS>> for
+details). Note that by providing very large or very permissive values for these options you are accepting the risk that
+garbled data could potentially create thousands of collections and bring your cluster to a grinding halt.
+
+Please note that the values (and thus the collection names) are case sensitive. As elsewhere in Solr manipulation and
+cleaning of the data is expected to be done by external processes before data is sent to Solr with one exception.
+Throughout Solr there are limitations on the allowable characters in collection names. Any characters other than ASCII
+alphanumeric characters (`A-Za-z0-9`), hyphen (`-`) or underscore (`_`) are replaced with an underscore when calculating
+the collection name for a category. For a CRA named `myAlias` the following table shows how collection names would be
+calculated:
+
+|===
+|Value |CRA Collection Name
+
+|foo
+|+myAlias__CRA__foo+
+
+|Foo
+|+myAlias__CRA__Foo+
+
+|foo bar
+|+myAlias__CRA__foo_bar+
+
+|+FOÓB&R+
+|+myAlias__CRA__FO_B_R+
+
+|+中文的东西+
+|+myAlias__CRA_______+
+
+|+foo__CRA__bar+
+|*Causes 400 Bad Request*
+
+|+<null>+
+|*Causes 400 Bad Request*
+
+|===
+
+Since collection creation can take upwards of 1-3 seconds, systems inserting data in a CRA should be
+ constructed to handle such pauses whenever a new collection is created.
+Unlike time routed aliases, there is no way to predict the next value so such pauses are unavoidable.
+
+There is no automated means of removing a category. If a category needs to be removed from a CRA
+the following procedure is recommended:
+
+1. Ensure that no documents with the value corresponding to the category to be removed will be sent
+   either by stopping indexing or by fixing the incoming data stream
+1. Modify the alias definition in zookeeper, removing the collection corresponding to the category.
+1. Delete the collection corresponding to the category. Note that if the collection is not removed
+   from the alias first, this step will fail.
+
+=== Limitations & Assumptions
+
+* CRAs are presently unsuitable for non-english data values due to the limits on collection names.
+  This can be worked around by duplicating the route value to a *_url safe_* base 64 encoded field
+  and routing on that value instead.
+
+* The check for the __CRA__ infix is independent of the regular expression validation and occurs after
+  the name of the collection to be created has been calculated. It may not be avoided and is necessary
+  to support future features.
+
+== Improvement Possibilities
+
+Routed aliases are a relatively new feature of SolrCloud that can be expected to be improved.
+Some _potential_ areas for improvement that _are not implemented yet_ are:
+
+* *TRAs*: Searches with time filters should only go to applicable collections.
+
+* *TRAs*: Ways to automatically optimize (or reduce the resources of) older collections that aren't expected to receive more
+  updates, and might have less search demand.
+
+* *CRAs*: Intrinsic support for non-english text via base64 encoding
+
+* *CRAs*: Supply an initial list of values for cases where these are known before hand to reduce pauses during indexing
+
+* CloudSolrClient could route documents to the correct collection based on the route value instead always picking the
+  latest/first.
+
+* Presently only updates are routed and queries are distributed to all collections in the alias, but future
+  features might enable routing of the query to the single appropriate collection based on a special parameter or perhaps
+  a filter on the routed field.
+
+* Collections might be constrained by their size instead of or in addition to time or category value.
+  This might be implemented as another type of routed alias, or possibly as an option on the existing routed aliases
+
+* Compatibility with CDCR.
+
+* Option for deletion of aliases that also deletes the underlying collections in one step. Routed Aliases may quickly
+  create more collections than expected during initial testing. Removing them after such events is overly tedious.
+
+As always, patches and pull requests are welcome!
\ No newline at end of file

diff --git a/solr/solr-ref-guide/src/collections-api.adoc b/solr/solr-ref-guide/src/collections-api.adoc
index c78db6d..192adff 100644
--- a/solr/solr-ref-guide/src/collections-api.adoc
+++ b/solr/solr-ref-guide/src/collections-api.adoc

@@ -538,7 +538,7 @@
 *Standard aliases* are simple:  CREATEALIAS registers the alias name with the names of one or more collections provided
   by the command.
 If an existing alias exists, it is replaced/updated.
-A standard alias can serve to have the appearance of renaming a collection, and can be used to atomically swap
+A standard alias can serve as a means to rename a collection, and can be used to atomically swap
 which backing/underlying collection is "live" for various purposes.
 When Solr searches an alias pointing to multiple collections, Solr will search all shards of all the collections as an
   aggregated whole.
@@ -547,23 +547,18 @@
 
 `/admin/collections?action=CREATEALIAS&name=_name_&collections=_collectionlist_`
 
-*Routed aliases* are aliases with additional capabilities to act as a kind of super-collection -- routing
-  updates to the correct collection.
-Since the only routing strategy at present is time oriented, these are also called *Time Routed Aliases* (TRAs).
-A TRA manages an alias and a time sequential series of collections that it will both create and optionally delete on-demand.
-See <<time-routed-aliases.adoc#time-routed-aliases,Time Routed Aliases>> for some important high-level information
+*Routed aliases* are aliases with additional capabilities to act as a kind of super-collection that route
+  updates to the correct collection. Routing is data driven and may be based on a temporal field or on categories
+  specified in a field (normally string based).
+See <<aliases.adoc#routed-aliases,Routed Aliases>> for some important high-level information
   before getting started.
 
-NOTE: Presently this is only supported for temporal fields stored as a
-<<field-types-included-with-solr.adoc#field-types-included-with-solr,DatePointField or TrieDateField>> type. Other
-well ordered field types may be added in future versions.
-
 [source,text]
 ----
 localhost:8983/solr/admin/collections?action=CREATEALIAS&name=timedata&router.start=NOW/DAY&router.field=evt_dt&router.name=time&router.interval=%2B1DAY&router.maxFutureMs=3600000&create-collection.collection.configName=myConfig&create-collection.numShards=2
 ----
 
-If run on Jan 15, 2018, the above will create an alias named timedata, that contains collections with names prefixed
+If run on Jan 15, 2018, the above will create an time routed alias named timedata, that contains collections with names prefixed
 with `timedata` and an initial collection named `timedata_2018_01_15` will be created immediately. Updates sent to this
 alias with a (required) value in `evt_dt` that is before or after 2018-01-15 will be rejected, until the last 60
 minutes of 2018-01-15. After 2018-01-15T23:00:00 documents for either 2018-01-15 or 2018-01-16 will be accepted.
@@ -605,6 +600,21 @@
 
 Most routed alias parameters become _alias properties_ that can subsequently be inspected and <<aliasprop,modified>>.
 
+`router.name`::
+The type of routing to use. Presently only `time` and `category` are valid.  This parameter is required.
+
+`router.field`::
+The field to inspect to determine which underlying collection an incoming document should be routed to.
+This field is required on all incoming documents.
+
+`create-collection.*`::
+The * wildcard can be replaced with any parameter from the <<create,CREATE>> command except `name`. All other fields
+are identical in requirements and naming except that we insist that the configset be explicitly specified.
+The configset must be created beforehand, either uploaded or copied and modified.
+It's probably a bad idea to use "data driven" mode as schema mutations might happen concurrently leading to errors.
+
+==== Time Routed Alias Parameters
+
 `router.start`::
 The start date/time of data for this time routed alias in Solr's standard date/time format (i.e., ISO-8601 or "NOW"
 optionally with <<working-with-dates.adoc#date-math,date math>>).
@@ -625,13 +635,6 @@
 +
 The default timezone is UTC.
 
-`router.field`::
-The date field to inspect to determine which underlying collection an incoming document should be routed to.
-This field is required on all incoming documents.
-
-`router.name`::
-The type of routing to use. Presently only `time` is valid.  This parameter is required.
-
 `router.interval`::
 A date math expression that will be appended to a timestamp to determine the next collection in the series.
 Any date math expression that can be evaluated if appended to a timestamp of the form 2018-01-15T16:17:18 will
@@ -677,11 +680,18 @@
 +
 The default is not to delete.
 
-`create-collection.*`::
-The * wildcard can be replaced with any parameter from the <<create,CREATE>> command except `name`. All other fields
-are identical in requirements and naming except that we insist that the configset be explicitly specified.
-The configset must be created beforehand, either uploaded or copied and modified.
-It's probably a bad idea to use "data driven" mode as schema mutations might happen concurrently leading to errors.
+==== Category Routed Alias Parameters
+
+`router.maxCardinality`::
+The maximum number of categories allowed for this alias.
+This setting safeguards against the inadvertent creation of an infinite number of collections in the event of bad data.
+
+`router.mustMatch`::
+A regular expression that the value of the field specified by `router.field` must match before a corresponding
+collection will be created. Note that changing this setting after data has been added will not alter the data already
+indexed. Any valid Java regular expression pattern may be specified. This expression is pre-compiled at the start of
+each request so batching of updates is strongly recommended. Overly complex patterns will produce cpu
+or garbage collecting overhead during indexing as determined by the JVM's implementation of regular expressions.
 
 === CREATEALIAS Response
 
@@ -836,6 +846,10 @@
 
 `/admin/collections?action=ALIASPROP&name=_name_&property.someKey=somevalue`
 
+WARNING: This command allows you to revise any property. No alias specific validation is performed.
+         Routed aliases may cease to function, function incorrectly or cause errors if property values
+         are set carelessly.
+
 === ALIASPROP Parameters
 
 `name`::

diff --git a/solr/solr-ref-guide/src/distributed-requests.adoc b/solr/solr-ref-guide/src/distributed-requests.adoc
index be129ed..3422540 100644
--- a/solr/solr-ref-guide/src/distributed-requests.adoc
+++ b/solr/solr-ref-guide/src/distributed-requests.adoc

@@ -127,6 +127,7 @@
 +
 NOTE: In SolrCloud mode, if at least one node is included in the whitelist, then the `live_nodes` will no longer be used as source for the list. This means that if you need to do a cross-cluster request using the `shards` parameter in SolrCloud mode (in addition to regular within-cluster requests), you'll need to add all nodes (local cluster + remote nodes) to the whitelist.
 
+[[distributedidf]]
 == Configuring statsCache (Distributed IDF)
 
 Document and term statistics are needed in order to calculate relevancy. Solr provides four implementations out of the box when it comes to document stats calculation:

diff --git a/solr/solr-ref-guide/src/how-solrcloud-works.adoc b/solr/solr-ref-guide/src/how-solrcloud-works.adoc
index b0e42e4..7f4db3b 100644
--- a/solr/solr-ref-guide/src/how-solrcloud-works.adoc
+++ b/solr/solr-ref-guide/src/how-solrcloud-works.adoc

@@ -1,5 +1,5 @@
 = How SolrCloud Works
-:page-children: shards-and-indexing-data-in-solrcloud, distributed-requests, time-routed-aliases
+:page-children: shards-and-indexing-data-in-solrcloud, distributed-requests, aliases
 // Licensed to the Apache Software Foundation (ASF) under one
 // or more contributor license agreements.  See the NOTICE file
 // distributed with this work for additional information
@@ -21,7 +21,7 @@
 
 * <<shards-and-indexing-data-in-solrcloud.adoc#shards-and-indexing-data-in-solrcloud,Shards and Indexing Data in SolrCloud>>
 * <<distributed-requests.adoc#distributed-requests,Distributed Requests>>
-* <<time-routed-aliases.adoc#time-routed-aliases,Time Routed Aliases>>
+* <<aliases.adoc#aliases,Standard and Routed Aliases>>
 
 If you are already familiar with SolrCloud concepts and basic functionality, you can skip to the section covering <<solrcloud-configuration-and-parameters.adoc#solrcloud-configuration-and-parameters,SolrCloud Configuration and Parameters>>.
 

diff --git a/solr/solr-ref-guide/src/time-routed-aliases.adoc b/solr/solr-ref-guide/src/time-routed-aliases.adoc
deleted file mode 100644
index 3ef0e19..0000000
--- a/solr/solr-ref-guide/src/time-routed-aliases.adoc
+++ /dev/null

@@ -1,119 +0,0 @@
-= Time Routed Aliases
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-Time Routed Aliases (TRAs) is a SolrCloud feature that manages an alias and a time sequential series of collections.
-
-It automatically creates new collections and (optionally) deletes old ones as it routes documents to the correct
-  collection based on its timestamp.
-This approach allows for indefinite indexing of data without degradation of performance otherwise experienced due to the
-  continuous growth of a single index.
-
-If you need to store a lot of timestamped data in Solr, such as logs or IoT sensor data, then this feature probably
-  makes more sense than creating one sharded hash-routed collection.
-
-== How It Works
-
-First you create a time routed aliases using the <<collections-api.adoc#createalias,CREATEALIAS>> command with some
-  router settings.
-Most of the settings are editable at a later time using the <<collections-api.adoc#aliasprop,ALIASPROP>> command.
-
-The first collection will be created automatically, along with an alias pointing to it.
-Each underlying Solr "core" in a collection that is a member of a TRA has a special core property referencing the alias.
-The name of each collection is comprised of the TRA name and the start timestamp (UTC), with trailing zeros and symbols
-  truncated.
-Ideally, as a user of this feature, you needn't concern yourself with the particulars of the collection naming pattern
-  since both queries and updates may be done via the alias.
-
-When adding data, you should usually direct documents to the alias (e.g., reference the alias name instead of any collection).
-The Solr server and CloudSolrClient will direct an update request to the first collection that an alias points to.
-
-The collections list for a TRA is always reverse sorted, and thus the connection path of the request will route to the
-  lead collection. Using CloudSolrClient is preferable as it can reduce the number of underlying physical HTTP requests by one.
-If you know that a particular set of documents to be delivered is going to a particular older collection then you could
-  direct it there from the client side as an optimization but it's not necessary. CloudSolrClient does not (yet) do this.
-
-When processing an update for a TRA, Solr initializes its
-  <<update-request-processors.adoc#update-request-processors,UpdateRequestProcessor>> chain as usual, but
-  when `DistributedUpdateProcessor` (DUP) initializes, it detects that the update targets a TRA and injects
-  `TimeRoutedUpdateProcessor` (TRUP) in front of itself.
-TRUP, in coordination with the Overseer, is the main part of a TRA, and must immediately precede DUP. It is not
-  possible to configure custom chains with other types of UpdateRequestProcessors between TRUP and DUP.
-
-TRUP first reads TRA configuration from the alias properties when it is initialized.  As it sees each document, it checks for
-  changes to TRA properties, updates its cached configuration if needed and then determines which collection the
-  document belongs to:
-
-* If TRUP needs to send it to a time segment represented by a collection other than the one that
-  the client chose to communicate with, then it will do so using mechanisms shared with DUP.
-  Once the document is forwarded to the correct collection (i.e., the correct TRA time segment), it skips directly to
-  DUP on the target collection and continues normally, potentially being routed again to the correct shard & replica
-  within the target collection.
-
-* If it belongs in the current collection (which is usually the case if processing events as they occur), the document
-  passes through to DUP. DUP does it's normal collection-level processing that may involve routing the document
-  to another shard & replica.
-
-* If the time stamp on the document is more recent than the most recent TRA segment, then a new collection needs to be
-  added at the front of the TRA.
-  TRUP will create this collection, add it to the alias and then forward the document to the collection it just created.
-  This can happen recursively if more than one collection needs to be created.
-+
-Each time a new collection is added, the oldest collections in the TRA are examined for possible deletion, if that has
-    been configured.
-All this happens synchronously, potentially adding seconds to the update request and indexing latency.
-If `router.preemptiveCreateMath` is configured and if the document arrives within this window then it will occur
-asynchronously.
-
-Any other type of update like a commit or delete is routed by TRUP to all collections.
-Generally speaking, this is not a performance concern. When Solr receives a delete or commit wherein nothing is deleted
-or nothing needs to be committed, then it's pretty cheap.
-
-== Improvement Possibilities
-
-This is a new feature of SolrCloud that can be expected to be improved.
-Some _potential_ areas for improvement that _are not implemented yet_ are:
-
-* Searches with time filters should only go to applicable collections.
-
-* Collections ought to be constrained by their size instead of or in addition to time.
-  Based on the underlying design, this would only apply to the lead collection.
-
-* Ways to automatically optimize (or reduce the resources of) older collections that aren't expected to receive more
-  updates, and might have less search demand.
-
-* CloudSolrClient could route documents to the correct collection based on a timestamp instead always picking the
-  latest.
-
-* Compatibility with CDCR.
-
-== Limitations & Assumptions
-
-* Only *time* routed aliases are supported.  If you instead have some other sequential number, you could fake it
-  as a time (e.g., convert to a timestamp assuming some epoch and increment).
-+
-The smallest possible interval is one second.
-No other routing scheme is supported, although this feature was developed with considerations that it could be
-  extended/improved to other schemes.
-
-* The underlying collections form a contiguous sequence without gaps.  This will not be suitable when there are
-  large gaps in the underlying data, as Solr will insist that there be a collection for each increment.  This
-  is due in part on Solr calculating the end time of each interval collection based on the timestamp of
-  the next collection, since it is otherwise not stored in any way.
-
-* Avoid sending updates to the oldest collection if you have also configured that old collections should be
-  automatically deleted.  It could lead to exceptions bubbling back to the indexing client.
commit	5bda03c2b108f73fe196f68470cc52130ef8d48f	[log] [tgz]
author	Gus Heck <gus@apache.org>	Mon Mar 04 01:58:51 2019 -0500
committer	Gus Heck <gus@apache.org>	Mon Mar 04 01:58:51 2019 -0500
tree	c7b58c875a8c49065b9e9a7058904eab9d690fd3
parent	da21eae589e118c10ab8a33a75f8587911c52e9d [diff]