= SolrCloud Autoscaling Triggers
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
[WARNING]
.Autoscaling is deprecated
====
The autoscaling framework in its current form is deprecated and will be removed in Solr 9.0.
Some features relating to replica placement will be replaced in 9.0, but other features are not likely to be replaced.
There is no planned replacement for triggers.
====
Triggers are used in autoscaling to watch for cluster events such as nodes joining or leaving, a search or indexing rate breaching a threshold, a scheduled point in time, or any other metric crossing a configured threshold.
Trigger implementations verify the state of resources that they monitor. When they detect a
change that merits attention they generate _events_, which are then queued and processed by configured
`TriggerAction` implementations. This usually involves computing and executing a plan to do something (e.g., move replicas). Solr provides predefined implementations of triggers for <<Event Types,specific event types>>.
Triggers execute on the node that runs `Overseer`. They are scheduled to run periodically, at a default interval of 1 second between executions (note that not every execution of a trigger produces events).
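
This interval corresponds to the cluster-wide `triggerScheduleDelaySeconds` property of the <<solrcloud-autoscaling-api.adoc#,Autoscaling API>>. For example, the following sketch raises it to 5 seconds with the `set-properties` command:

[source,json]
----
{
  "set-properties": {
    "triggerScheduleDelaySeconds": 5
  }
}
----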
== Event Types
Currently the following event types (and corresponding trigger implementations) are defined:
* `nodeAdded`: generated when a node joins the cluster. See <<Node Added Trigger>>.
* `nodeLost`: generated when a node leaves the cluster. See <<Node Lost Trigger>> and <<Auto Add Replicas Trigger>>.
* `metric`: generated when the configured metric crosses a configured lower or upper threshold value. See <<Metric Trigger>>.
* `indexSize`: generated when a shard size (defined as index size in bytes or number of documents)
exceeds upper or lower threshold values. See <<Index Size Trigger>>.
* `searchRate`: generated when the search rate exceeds configured upper or lower thresholds. See <<Search Rate Trigger>>.
* `scheduled`: generated according to a scheduled time period such as every 24 hours, etc. See <<Scheduled Trigger>>.
Events are not necessarily generated immediately after the corresponding state change occurs; the
maximum rate of events is controlled by the `waitFor` configuration parameter (see <<Trigger Configuration>> below for more explanation).
The following properties are common to all event types:
`id`:: (string) A unique time-based event id.
`eventType`:: (string) The type of event.
`source`:: (string) The name of the trigger that produced this event.
`eventTime`:: (long) Unix time when the condition that caused this event occurred. For example, for a
`nodeAdded` event this will be the time when the node was added and not when the event was actually
generated, which may significantly differ due to the rate limits set by `waitFor`.
`properties`:: (map, optional) Any additional properties. For example, this currently includes the `nodeNames` property, which
indicates the nodes that were lost or added.
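
For illustration, a serialized `nodeAdded` event might look like the following sketch (all field values, including the event id and node name, are hypothetical):

[source,json]
----
{
  "id": "14f70d14cc51Tq0waebqyuvmi",
  "source": "node_added_trigger",
  "eventType": "NODEADDED",
  "eventTime": 1514324576599,
  "properties": {
    "nodeNames": ["localhost:8983_solr"]
  }
}
----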
== Trigger Configuration
Trigger configurations are managed using the <<solrcloud-autoscaling-api.adoc#write-api,Autoscaling Write API>> with the commands `<<solrcloud-autoscaling-api.adoc#create-or-update-a-trigger,set-trigger>>`, `<<solrcloud-autoscaling-api.adoc#remove-trigger,remove-trigger>>`,
`suspend-trigger`, and `resume-trigger`.
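
For example, a trigger named `node_added_trigger` can be temporarily disabled as follows; the `resume-trigger` command takes the same payload to re-enable it:

[source,json]
----
{
  "suspend-trigger": {
    "name": "node_added_trigger"
  }
}
----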
=== Trigger Properties
Trigger configuration consists of the following properties:
`name`:: (string, required) A unique trigger configuration name.
`event`:: (string, required) One of the predefined event types, as described in <<Event Types>> above.
`actions`:: (list of action configs, optional) An ordered list of actions to execute when the event is fired.
`waitFor`:: (string, optional) The time to wait between generating new events, expressed as an integer immediately
followed by a unit symbol, one of `s` (seconds), `m` (minutes), or `h` (hours). Default is `0s`. A condition must
persist for at least the `waitFor` period to generate an event.
`enabled`:: (boolean, optional) When `true` the trigger is enabled. Default is `true`.
Additional implementation-specific properties may be provided, as described in the sections for individual triggers below.
=== Action Properties
Action configuration consists of the following properties:
`name`:: (string, required) A unique name of the action configuration.
`class`:: (string, required) The action implementation class.
Additional implementation-specific properties may be provided, as described in the sections for individual triggers below.
If the `actions` configuration is omitted, then by default, the `ComputePlanAction` and the `ExecutePlanAction` are automatically added to the trigger configuration.
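
In other words, a trigger that omits `actions` behaves as if it had been configured with the following list (a sketch of the implied defaults; the action names shown are illustrative):

[source,json]
----
"actions": [
  {"name": "compute_plan", "class": "solr.ComputePlanAction"},
  {"name": "execute_plan", "class": "solr.ExecutePlanAction"}
]
----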
=== Example Trigger Configuration
This simple example shows the configuration for adding (or updating) a trigger for `nodeAdded` events.
[source,json]
----
{
"set-trigger": {
"name" : "node_added_trigger",
"event" : "nodeAdded",
"waitFor" : "1s",
"enabled" : true,
"actions" : [
{
"name" : "compute_plan",
"class": "solr.ComputePlanAction"
},
{
"name" : "custom_action",
"class": "com.example.CustomAction"
},
{
"name" : "execute_plan",
"class": "solr.ExecutePlanAction"
}
]
}
}
----
This trigger configuration will compute and execute a plan to allocate the resources available on the new node. A custom action could also be used to possibly modify the plan.
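
If the trigger is no longer needed, it can be removed by name with the `remove-trigger` command:

[source,json]
----
{
  "remove-trigger": {
    "name": "node_added_trigger"
  }
}
----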
== Available Triggers
As described earlier, there are several triggers available to watch for events.
=== Node Added Trigger
The `NodeAddedTrigger` generates `nodeAdded` events when a node joins the cluster. It can be used to either move replicas
from other nodes to the new node or to add new replicas.
In addition to the parameters described at <<Trigger Configuration>>, this trigger supports two more parameters:
`preferredOperation`:: (string, optional, defaults to `movereplica`) The operation to be performed in response to an event generated by this trigger. By default, replicas will be moved from other nodes to the added node. The only other supported value is `addreplica` which adds more replicas of the existing collections on the new node.
`replicaType`:: (string, optional, defaults to `NRT`) The replica type that will be used to add replicas in response to an event generated by this trigger when `preferredOperation` is `addreplica`. By default, the replica(s) will be of type `NRT`. The only other supported values are `PULL` and `TLOG`, which will add more replicas of the specified type to the existing collections on the new node.
.Example: Node Added Trigger to move replicas to new node
[source,json]
----
{
"set-trigger": {
"name": "node_added_trigger",
"event": "nodeAdded",
"waitFor": "5s"
}
}
----
.Example: Node Added Trigger to add replicas on new node with replica type PULL
[source,json]
----
{
"set-trigger": {
"name": "node_added_trigger",
"event": "nodeAdded",
"waitFor": "5s",
"preferredOperation": "ADDREPLICA",
"replicaType": "PULL"
}
}
----
=== Node Lost Trigger
The `NodeLostTrigger` generates `nodeLost` events when a node leaves the cluster. It can be used to either move replicas
that were hosted by the lost node to other nodes or to delete them from the cluster.
In addition to the parameters described at <<Trigger Configuration>>, this trigger supports one more parameter:
`preferredOperation`:: (string, optional, defaults to `MOVEREPLICA`) The operation to be performed in response to an event generated by this trigger. By default, replicas will be moved from the lost nodes to the other nodes in the cluster. The only other supported value is `DELETENODE` which deletes all information about replicas that were hosted by the lost node.
.Example: Node Lost Trigger to move replicas to new node
[source,json]
----
{
"set-trigger": {
"name": "node_lost_trigger",
"event": "nodeLost",
"waitFor": "120s"
}
}
----
.Example: Node Lost Trigger to delete replicas
[source,json]
----
{
"set-trigger": {
"name": "node_lost_trigger",
"event": "nodeLost",
"waitFor": "120s",
"preferredOperation": "DELETENODE"
}
}
----
TIP: It is recommended that the `waitFor` value for the node lost trigger be larger than 1 minute, so that long full garbage collection pauses do not cause this trigger to generate events and needlessly move or delete replicas in the cluster.
=== Auto Add Replicas Trigger
When a collection has the parameter `autoAddReplicas` set to true then a trigger configuration named `.auto_add_replicas` is automatically created to watch for nodes going away. This trigger produces `nodeLost` events,
which are then processed by configured actions (usually resulting in computing and executing a plan
to add replicas on the live nodes to maintain the expected replication factor).
Refer to the section <<solrcloud-autoscaling-auto-add-replicas.adoc#,Autoscaling Automatically Adding Replicas>> to learn more about how the `.auto_add_replicas` trigger works.
In addition to the parameters described at <<Trigger Configuration>>, this trigger supports one parameter, which is defined in the `<solrcloud>` section of `solr.xml`:
`autoReplicaFailoverWaitAfterExpiration`::
The minimum time in milliseconds to wait for initiating replacement of a replica after first noticing it not being live. This is important to prevent false positives while stopping or starting the cluster. The default is `120000` (2 minutes). The value provided for this parameter is used as the value for the `waitFor` parameter in the `.auto_add_replicas` trigger.
TIP: See <<format-of-solr-xml.adoc#the-solrcloud-element,The <solrcloud> Element>> for more details about how to work with `solr.xml`.
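
For illustration, the auto-created trigger is roughly equivalent to the following configuration (a sketch; the action list shown here, including `solr.AutoAddReplicasPlanAction`, is an implementation detail that may differ between versions):

[source,json]
----
{
  "name": ".auto_add_replicas",
  "event": "nodeLost",
  "waitFor": "120s",
  "enabled": true,
  "actions": [
    {"name": "auto_add_replicas_plan", "class": "solr.AutoAddReplicasPlanAction"},
    {"name": "execute_plan", "class": "solr.ExecutePlanAction"}
  ]
}
----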
=== Metric Trigger
The metric trigger can be used to monitor any metric exposed by the <<metrics-reporting.adoc#,Metrics API>>. It supports lower and upper threshold configurations, as well as optional filters to limit operation to specific collections, shards, and nodes.
In addition to the parameters described at <<Trigger Configuration>>, this trigger supports the following parameters:
`metrics`::
(string, required) The metric property name to be watched in the format metrics:__group__:__prefix__, e.g., `metrics:solr.node:CONTAINER.fs.coreRoot.usableSpace`.
`below`::
(double, optional) The lower threshold for the metric value. The trigger produces a metric breached event if the metric's value falls below this value.
`above`::
(double, optional) The upper threshold for the metric value. The trigger produces a metric breached event if the metric's value crosses above this value.
`collection`::
(string, optional) The collection used to limit the nodes on which the given metric is watched. When the metric is breached, trigger actions will limit operations to this collection only.
`shard`::
(string, optional) The shard used to limit the nodes on which the given metric is watched. When the metric is breached, trigger actions will limit operations to this shard only.
`node`::
(string, optional) The node on which the given metric is watched. Trigger actions will operate on this node only.
`preferredOperation`::
(string, optional, defaults to `MOVEREPLICA`) The operation to be performed in response to an event generated by this trigger. By default, replicas will be moved from the hot node to others. The only other supported value is `ADDREPLICA` which adds more replicas if the metric is breached.
.Example: a metric trigger that fires when total usable space on a node having replicas of "mycollection" falls below 100GB
[source,json]
----
{
"set-trigger": {
"name": "metric_trigger",
"event": "metric",
"waitFor": "5s",
"metrics": "metric:solr.node:CONTAINER.fs.coreRoot.usableSpace",
"below": 107374182400,
"collection": "mycollection"
}
}
----
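
Conversely, `above` can be combined with `preferredOperation` to react to a rising metric. The following sketch (the metric choice and threshold value are illustrative) adds replicas when a node's JVM system load average exceeds 20:

[source,json]
----
{
  "set-trigger": {
    "name": "high_load_trigger",
    "event": "metric",
    "waitFor": "60s",
    "metrics": "metrics:solr.jvm:os.systemLoadAverage",
    "above": 20.0,
    "preferredOperation": "ADDREPLICA"
  }
}
----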
=== Index Size Trigger
This trigger can be used for monitoring the size of collection shards, measured either by the
number of documents in a shard or the physical size of the shard's index in bytes.
When either of the upper thresholds is exceeded for a particular shard the trigger will generate
an event with a (configurable) requested operation to perform on the offending shards; by default
this is a `SPLITSHARD` operation.
Similarly, when either of the lower thresholds is exceeded the trigger will generate an
event with a (configurable) requested operation to perform on two of the smallest
shards. By default this is a `MERGESHARDS` operation, which is currently ignored because
that operation is not yet implemented (see https://issues.apache.org/jira/browse/SOLR-9407[SOLR-9407]).
When `splitMethod=link` is used the resulting sub-shards will initially have nearly the same size
as the parent shard due to the hard-linking of parent index files, and will differ just in the lists of
deleted documents. In order to correctly recognize the effectively reduced index size an estimate
is calculated using a simple formula: `indexCommitSize * numDocs / maxDoc`. This value is then
compared with `aboveBytes` and `belowBytes` limits.
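For example, a shard whose latest commit point occupies 50GB with `numDocs` = 60 million and `maxDoc` = 100 million is estimated at 50GB * 60 / 100 = 30GB, and it is this estimate that is checked against the limits.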
Additionally, monitoring can be restricted to a list of collections; by default
all collections are monitored.
In addition to the parameters described at <<Trigger Configuration>>, this trigger supports the following configuration parameters (all thresholds are exclusive):
`aboveBytes`::
An upper threshold in bytes. This value is compared to the `SEARCHER.searcher.indexCommitSize` metric, which
reports the size of the latest commit point (ignoring any data related to earlier commit points, which may
still be present for replication or snapshot purposes). See also the note above about how this value is used with
`splitMethod=link`.
`belowBytes`::
A lower threshold in bytes. Note that this value should be at least 2x smaller than
`aboveBytes`.
`aboveDocs`::
An upper threshold expressed as the number of documents. This value is compared with the `SEARCHER.searcher.numDocs` metric.
+
NOTE: Due to the way Lucene indexes work, a shard may exceed the `aboveBytes` threshold
on disk even if the number of documents is relatively small, because replaced and deleted documents keep
occupying disk space until they are actually removed during Lucene index merging.
`belowDocs`::
A lower threshold expressed as the number of documents.
`aboveOp`::
The operation to request when an upper threshold is exceeded. If not specified the
default value is `SPLITSHARD`.
`belowOp`::
The operation to request when a lower threshold is exceeded. If not specified
the default value is `MERGESHARDS` (but see the note above).
`collections`::
A comma-separated list of collection names that this trigger should monitor. If not
specified or empty all collections are monitored.
`maxOps`::
Maximum number of operations requested in a single event. Limiting the speed of changes this way may
allow threshold violations to grow more serious in a highly dynamic situation,
but it also limits the maximum load on the cluster that a large number of requested
operations might otherwise cause. The default value is 10.
`splitMethod`::
One of the supported methods for index splitting to use. Default value is `rewrite`, which is
slow and puts a high CPU load on the shard leader but results in optimized sub-shard indexes.
The `link` method is much faster and puts very little load on the shard leader but results in
indexes that are initially as large as the parent shard's index, which slows down replication and
may lead to excessive initial disk space consumption on replicas.
`splitFuzz`::
A float value (default is 0.0f, must be smaller than 0.5f) that allows the sub-shard ranges to vary
by this percentage of the total shard range, odd shards being larger and even shards being smaller.
Non-zero values are useful for large indexes with aggressively growing size, as they help to prevent
avalanches of split shard requests when the total size of the index
reaches even multiples of the maximum shard size thresholds.
`splitByPrefix`::
A boolean value (default is false) that specifies whether the `aboveOp` shard split should try to
calculate sub-shard hash ranges according to document prefixes, or do a traditional shard split (i.e.,
split the hash range into n sub-ranges).
Events generated by this trigger contain additional details about the shards
that exceeded thresholds and the types of violations (upper / lower bounds, bytes / docs metrics).
.Example: Index Size Trigger
This configuration specifies an index size trigger that monitors collections "test1" and "test2",
with both bytes (1GB) and number of docs (1 million) upper limits, and a custom `belowOp`
operation `NONE` (which still can be monitored and acted upon by an appropriate trigger listener):
[source,json]
----
{
"set-trigger": {
"name" : "index_size_trigger",
"event" : "indexSize",
"collections" : "test1,test2",
"aboveBytes" : 1000000000,
"aboveDocs" : 1000000000,
"belowBytes" : 200000,
"belowDocs" : 200000,
"belowOp" : "NONE",
"waitFor" : "1m",
"enabled" : true,
"actions" : [
{
"name" : "compute_plan",
"class": "solr.ComputePlanAction"
},
{
"name" : "execute_plan",
"class": "solr.ExecutePlanAction"
}
]
}
}
----
=== Search Rate Trigger
The search rate trigger can be used for monitoring search rates in a selected
collection (1-min average rate by default), and request that either replicas be moved from
"hot nodes" to different nodes, or new replicas be added to "hot shards" to reduce the
per-replica search rate for a collection or shard with hot spots.
Similarly, if the search rate falls below a threshold then the trigger may request that some
replicas are deleted from "cold" shards. It can also optionally issue node-level action requests
when a cumulative node-level rate falls below a threshold.
Per-shard rates are calculated as the arithmetic average of the rates of all searchable replicas in a given shard.
This method was chosen to avoid generating false events when a simple client keeps sending requests
to a single specific replica, because adding or removing other replicas can't solve this situation;
only proper load balancing can, either by using `CloudSolrClient` or another load-balancing client.
This trigger calculates node-level cumulative rates using per-replica rates reported by
replicas that are part of monitored collections / shards on each node. This means that it may report
some nodes as "cold" (underutilized) because it ignores other, perhaps more active, replicas
belonging to other collections. Also, nodes that don't host any of the monitored replicas or
those that are explicitly excluded by `node` configuration property won't be reported at all.
.Calculating `waitFor`
[CAUTION]
====
Special care should be taken when configuring the `waitFor` property. By default the trigger
monitors a 1-minute average search rate of a replica. Changes to the number of replicas, which should in turn
change per-replica search rates, may be requested and executed relatively quickly if
`waitFor` is set to a comparable value of 1 minute or shorter.
However, the metric value, being a moving average, will always lag behind the new "momentary" rate after the changes. This in turn means that the monitored metric may not change quickly enough to prevent the
trigger from firing again, because it will continue to measure the average rate as still violating
the threshold for some time after the change was executed. As a result the trigger may keep
requesting that even more replicas be added (or removed) and thus "overshoot" the optimal number of replicas.
For this reason it's recommended to always set `waitFor` to values several
times longer than the time constant of the used metric. For example, with the default 1-minute average the
`waitFor` should be set to at least `2m` (2 minutes) or more.
====
In addition to the parameters described at <<Trigger Configuration>>, this trigger supports the following configuration properties:
`collections`::
(string, optional) A comma-separated list of collection names to monitor, or any collection if empty or not set.
`shard`::
(string, optional) A shard name within the collection (requires `collections` to be set to exactly one name), or any shard if empty.
`node`::
(string, optional) A node name to monitor, or any if empty.
`metric`::
(string, optional) A metric name that represents the search rate. The default is `QUERY./select.requestTimes:1minRate`. This name has to identify a single numeric metric value, and it may use the colon syntax for selecting one property of
a complex metric. This value is collected from all replicas for a shard, and then an arithmetic average is calculated
per shard to determine shard-level violations.
`maxOps`::
(integer, optional) The maximum number of `ADDREPLICA` or `DELETEREPLICA` operations
requested in a single autoscaling event. The default value is `3` and it helps to smooth out
the changes to the number of replicas during periods of large search rate fluctuations.
`minReplicas`::
(integer, optional) The minimum acceptable number of searchable replicas (i.e., replicas other
than `PULL` type). The trigger will not generate any `DELETEREPLICA` requests when the number of
searchable replicas in a shard reaches this threshold.
+
When this value is not set (the default)
the `replicationFactor` property of the collection is used, and if that property is not set then
the value is set to `1`. Note also that shard leaders are never deleted.
`aboveRate`::
(float) The upper bound for the request rate metric value. At least one of
`aboveRate` or `belowRate` must be set.
`belowRate`::
(float) The lower bound for the request rate metric value. At least one of
`aboveRate` or `belowRate` must be set.
`aboveNodeRate`::
(float) The upper bound for the total request rate metric value per node. If not
set then cumulative per-node rates will be ignored.
`belowNodeRate`::
(float) The lower bound for the total request rate metric value per node. If not
set then cumulative per-node rates will be ignored.
`aboveOp`::
(string, optional) A collection action to request when the upper threshold for a shard is
exceeded. Default action is `ADDREPLICA` and the trigger will request from 1 up to `maxOps` operations
per shard per event, proportionally to how much the rate is exceeded. This property can be set to `NONE`
to effectively disable the action but still report it to the listeners.
`aboveNodeOp`::
(string, optional) The collection action to request when the upper threshold for a node (`aboveNodeRate`) is exceeded.
Default action is `MOVEREPLICA`, and the trigger will request 1 replica operation per hot node per event.
If both `aboveOp` and `aboveNodeOp` conditions are met, `aboveNodeOp` operations are requested only when no
`aboveOp` (shard level) operations are to be requested, because `aboveOp` operations will change
node-level rates anyway. This property can be set to `NONE` to effectively disable
the action but still report it to the listeners.
`belowOp`::
(string, optional) The collection action to request when the lower threshold for a shard is
exceeded. Default action is `DELETEREPLICA`, and the trigger will request at most `maxOps` replicas
to be deleted from eligible cold shards. This property can be set to `NONE`
to effectively disable the action but still report it to the listeners.
`belowNodeOp`::
(string, optional) The action to request when the lower threshold for a node (`belowNodeRate`) is exceeded.
Default action is null (not set) and the condition is ignored, because in many cases the
trigger will monitor only some selected resources (replicas from selected
collections or shards) so setting this by default to e.g., `DELETENODE` could interfere with
these non-monitored resources. The trigger will request 1 operation per cold node per event.
If both `belowOp` and `belowNodeOp` operations are requested then `belowOp` operations are
always requested first.
.Example:
A search rate trigger that monitors collection "test" and adds new replicas if 5-minute
average request rate of "/select" handler exceeds 100 requests/sec, and the condition persists
for over 20 minutes. If the rate falls below 0.01 and persists for 20 minutes, the trigger will
request not only replica deletions (leaving at most 1 replica per shard) but may also
request node deletion.
[source,json]
----
{
"set-trigger": {
"name" : "search_rate_trigger",
"event" : "searchRate",
"collections" : "test",
"metric" : "QUERY./select.requestTimes:5minRate",
"aboveRate" : 100.0,
"belowRate" : 0.01,
"belowNodeRate" : 0.01,
"belowNodeOp" : "DELETENODE",
"minReplicas" : 1,
"waitFor" : "20m",
"enabled" : true,
"actions" : [
{
"name" : "compute_plan",
"class": "solr.ComputePlanAction"
},
{
"name" : "execute_plan",
"class": "solr.ExecutePlanAction"
}
]
}
}
----
[[scheduledtrigger]]
=== Scheduled Trigger
The Scheduled trigger generates events according to a fixed rate schedule.
In addition to the parameters described at <<Trigger Configuration>>, this trigger supports the following configuration:
`startTime`::
(string, required) The start date/time of the schedule. This should be either a date math string (e.g., `NOW`), an ISO-8601 date time string (the same standard used during search and indexing in Solr, which defaults to UTC), or a date time string without the trailing 'Z' accompanied by the `timeZone` parameter. For example, each of the following values is acceptable:
* `2018-01-31T15:30:00Z`: ISO-8601 date time string. The trailing `Z` signals that the time is in UTC
* `NOW+5MINUTES`: Solr's date math string
* `2018-01-31T15:30:00`: No trailing 'Z' signals that the `timeZone` parameter must be specified to avoid ambiguity
`every`::
(string, required) A positive Solr date math string which is added to the `startTime` or the last run time to arrive at the next scheduled time.
`graceTime`::
(string, optional) A positive Solr date math string. This is the additional grace time over the scheduled time within which the trigger is allowed to generate an event.
`timeZone`::
(string, optional) A time zone string which is used for calculating the scheduled times.
`preferredOperation`::
(string, optional, defaults to `MOVEREPLICA`) The preferred operation to perform in response to an event generated by this trigger. The only supported values are `MOVEREPLICA` or `ADDREPLICA`.
This trigger applies the `every` date math expression to the `startTime` or the last event time to derive the next scheduled time; if the current time is past the next scheduled time but still within `graceTime`, an event is generated.
Apart from the common event properties described in the <<Event Types>> section, the trigger adds an additional `actualEventTime` event property which has the actual event time as opposed to the scheduled time.
For example, if the scheduled time was `2018-01-31T15:30:00Z` and grace time was `+15MINUTES` then an event may be fired at `2018-01-31T15:45:00Z`. Such an event will have `eventTime` as `2018-01-31T15:30:00Z`, the scheduled time, but the `actualEventTime` property will have a value of `2018-01-31T15:45:00Z`, the actual time.
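
Putting these properties together, the following sketch defines a scheduled trigger that fires once per day (the trigger name and `graceTime` are illustrative):

[source,json]
----
{
  "set-trigger": {
    "name": "daily_trigger",
    "event": "scheduled",
    "startTime": "NOW",
    "every": "+1DAY",
    "graceTime": "+15MINUTES",
    "enabled": true
  }
}
----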
.Frequently scheduled events and trigger starvation
[CAUTION]
====
Be cautious with scheduled triggers that are set to run as frequently as, or more frequently than, the trigger cooldown period (which defaults to 5 seconds).
Solr pauses all triggers for a cooldown period after a trigger fires so that the system has some time to stabilize. An aggressive scheduled trigger can starve all other triggers from
ever executing if a new scheduled event is ready as soon as the cooldown period is over. The same starvation scenario can happen to the scheduled trigger as well.
Solr randomizes the order in which the triggers are resumed after the cooldown period to mitigate this problem. However, it is recommended that scheduled triggers
not be used with low `every` values; an external scheduling process such as cron should be used for such cases instead.
====
== Default Triggers
A fresh installation of SolrCloud always creates some default triggers. If these triggers are missing (e.g., they were
deleted) they are re-created on any autoscaling configuration change or Overseer restart. These triggers can be
suspended if their functionality somehow interferes with other configuration but they can't be permanently deleted.
=== Auto-add Replicas Trigger
The default configuration and functionality of this trigger is described in detail in the
section titled <<solrcloud-autoscaling-auto-add-replicas.adoc#,Automatically Adding Replicas>>.
=== Scheduled Maintenance Trigger
This is a <<scheduledtrigger>> named `.scheduled_maintenance` and it's configured to run once per day.
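
Its definition is roughly equivalent to the following configuration (a sketch; the exact action names are an implementation detail):

[source,json]
----
{
  "name": ".scheduled_maintenance",
  "event": "scheduled",
  "startTime": "NOW",
  "every": "+1DAY",
  "enabled": true,
  "actions": [
    {"name": "inactive_shard_plan", "class": "solr.InactiveShardPlanAction"},
    {"name": "inactive_markers_plan", "class": "solr.InactiveMarkersPlanAction"},
    {"name": "execute_plan", "class": "solr.ExecutePlanAction"}
  ]
}
----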
It executes the following actions:
==== `solr.InactiveShardPlanAction`
This action checks existing collections for any shards in `INACTIVE` state, which indicates that they
are the original parent shards remaining after a successful `SPLITSHARD` operation.
These shards are not immediately deleted because shard splitting is a complex operation that may fail in
non-obvious ways, so keeping the original parent shard gives users a chance to recover from potential failures.
However, keeping these shards indefinitely doesn't make sense either because they still use system
resources (their Solr cores are still being loaded, and their indexes still occupy disk space).
This scheduled action is responsible for removing such inactive parent shards after their
time-to-live expires. By default the TTL is set to 48 hours after the shard state was set to
`INACTIVE`. When this TTL elapses, this scheduled action requests that the shard be deleted; the request is then
executed by the `solr.ExecutePlanAction` configured for this trigger.
==== `solr.InactiveMarkersPlanAction`
When a node is added or lost an event is generated, but if the lost node was the one running the
Overseer leader, that event may not be properly processed by the triggers (which run in the Overseer leader context).
For this reason a special marker is created in ZooKeeper, so that when the next Overseer leader is elected the
triggers will be able to learn about and process these past events.
Triggers don't delete these markers once they are done processing them (because several triggers may need them, and e.g.,
scheduled triggers may run at arbitrary times with arbitrary delays), so Solr needs a mechanism to clean up
old markers for such events so that they don't accumulate over time. This trigger action performs the clean-up:
it deletes markers older than the configured time-to-live (48 hours by default).
==== `solr.ExecutePlanAction`
This action simply executes any collection admin requests generated by other
actions. In particular, in the default configuration it executes the `DELETESHARD` requests produced by
`solr.InactiveShardPlanAction`, as described above.