blob: 4d4ff58d972e3d49b261fc90ee4d439e8b7f6b83 [file] [log] [blame]
= Autoscaling API
:toclevels: 2
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
The Autoscaling API is used to manage autoscaling policies, preferences, triggers, listeners and to get diagnostics on the state of the cluster.
== Read API
The autoscaling Read API is available at `/solr/admin/autoscaling` or `/api/cluster/autoscaling` (v2 API style). It returns information about the configured cluster preferences, cluster policy, collection-specific policies triggers and listeners.
This API does not take any parameters.
=== Read API Response
The output will contain cluster preferences, cluster policy and collection specific policies.
=== Examples using Read API
*Output*
[source,json]
----
{
"responseHeader": {
"status": 0,
"QTime": 2
},
"cluster-policy": [
{
"replica": "<2",
"shard": "#EACH",
"node": "#ANY"
}
],
"WARNING": "This response format is experimental. It is likely to change in the future."
}
----
== Diagnostics API
The diagnostics API shows the violations, if any, of all conditions in the cluster and, if applicable, the collection-specific policy. It is available at the `/admin/autoscaling/diagnostics` path.
This API does not take any parameters.
=== Diagnostics API Response
The output will contain `sortedNodes` which is a list of nodes in the cluster sorted according to overall load in descending order (as determined by the preferences) and `violations` which is a list of nodes along with the conditions that they violate.
=== Examples Using Diagnostics API
Here is an example with no violations but in the `sortedNodes` section, we can see that the first node is most loaded (according to number of cores):
[source,json]
----
{
"responseHeader": {
"status": 0,
"QTime": 65
},
"diagnostics": {
"sortedNodes": [
{
"node": "127.0.0.1:8983_solr",
"cores": 3
},
{
"node": "127.0.0.1:7574_solr",
"cores": 2
}
],
"violations": []
},
"WARNING": "This response format is experimental. It is likely to change in the future."
}
----
Suppose we added a condition to the cluster policy as follows:
[source,json]
----
{"replica": "<2", "shard": "#EACH", "node": "#ANY"}
----
However, since the first node in the first example had more than 1 replica for a shard already, then the diagnostics API will return:
[source,json]
----
{
"responseHeader": {
"status": 0,
"QTime": 45
},
"diagnostics": {
"sortedNodes": [
{
"node": "127.0.0.1:8983_solr",
"cores": 3
},
{
"node": "127.0.0.1:7574_solr",
"cores": 2
}
],
"violations": [
{
"collection": "gettingstarted",
"shard": "shard1",
"node": "127.0.0.1:8983_solr",
"tagKey": "127.0.0.1:8983_solr",
"violation": {
"replica": "2",
"delta": 0
},
"clause": {
"replica": "<2",
"shard": "#EACH",
"node": "#ANY",
"collection": "gettingstarted"
}
}
]
},
"WARNING": "This response format is experimental. It is likely to change in the future."
}
----
In the above example the node with port 8983 has two replicas for `shard1` in violation of our policy.
=== Inline Policy Configuration
If there is no autoscaling policy configured or if you wish to use a configuration other than the default, it is possible to send the autoscaling policy JSON as an inline payload as follows:
[source,bash]
----
curl -X POST -H 'Content-type:application/json' -d '{
"cluster-policy": [
{"replica": 0, "put" : "on-each", "nodeset": {"port" : "7574"}}]
}' http://localhost:8983/api/cluster/autoscaling/diagnostics?omitHeader=true
----
*Output*
[source,json]
----
{
"diagnostics":{
"sortedNodes":[{
"node":"10.0.0.80:7574_solr",
"isLive":true,
"cores":2.0,
"freedisk":567.4989128112793,
"port":7574,
"totaldisk":1044.122688293457,
"replicas":{"mycoll":{
"shard2":[{
"core_node7":{
"core":"mycoll_shard2_replica_n4",
"shard":"shard2",
"collection":"mycoll",
"node_name":"10.0.0.80:7574_solr",
"type":"NRT",
"base_url":"http://10.0.0.80:7574/solr",
"state":"active",
"force_set_state":"false",
"INDEX.sizeInGB":6.426125764846802E-8}}],
"shard1":[{
"core_node3":{
"core":"mycoll_shard1_replica_n1",
"shard":"shard1",
"collection":"mycoll",
"node_name":"10.0.0.80:7574_solr",
"type":"NRT",
"base_url":"http://10.0.0.80:7574/solr",
"state":"active",
"force_set_state":"false",
"INDEX.sizeInGB":6.426125764846802E-8}}]}}}
,{
"node":"10.0.0.80:8983_solr",
"isLive":true,
"cores":2.0,
"freedisk":567.498908996582,
"port":8983,
"totaldisk":1044.122688293457,
"replicas":{"mycoll":{
"shard2":[{
"core_node8":{
"core":"mycoll_shard2_replica_n6",
"shard":"shard2",
"collection":"mycoll",
"node_name":"10.0.0.80:8983_solr",
"type":"NRT",
"leader":"true",
"base_url":"http://10.0.0.80:8983/solr",
"state":"active",
"force_set_state":"false",
"INDEX.sizeInGB":6.426125764846802E-8}}],
"shard1":[{
"core_node5":{
"core":"mycoll_shard1_replica_n2",
"shard":"shard1",
"collection":"mycoll",
"node_name":"10.0.0.80:8983_solr",
"type":"NRT",
"leader":"true",
"base_url":"http://10.0.0.80:8983/solr",
"state":"active",
"force_set_state":"false",
"INDEX.sizeInGB":6.426125764846802E-8}}]}}}],
"liveNodes":["10.0.0.80:7574_solr",
"10.0.0.80:8983_solr"],
"violations":[{
"collection":"mycoll",
"tagKey":7574,
"violation":{
"replica":{
"NRT":2,
"count":2},
"delta":2.0},
"clause":{
"replica":0,
"port":"7574",
"collection":"mycoll"},
"violatingReplicas":[{
"core_node7":{
"core":"mycoll_shard2_replica_n4",
"shard":"shard2",
"collection":"mycoll",
"node_name":"10.0.0.80:7574_solr",
"type":"NRT",
"base_url":"http://10.0.0.80:7574/solr",
"state":"active",
"force_set_state":"false",
"INDEX.sizeInGB":6.426125764846802E-8}}
,{
"core_node3":{
"core":"mycoll_shard1_replica_n1",
"shard":"shard1",
"collection":"mycoll",
"node_name":"10.0.0.80:7574_solr",
"type":"NRT",
"base_url":"http://10.0.0.80:7574/solr",
"state":"active",
"force_set_state":"false",
"INDEX.sizeInGB":6.426125764846802E-8}}]}],
"config":{
"cluster-policy":[{
"replica":0,
"port":"7574"}]}},
"WARNING":"This response format is experimental. It is likely to change in the future."}
----
== Suggestions API
Suggestions are operations recommended by the system according to the policies and preferences the user has set.
Suggestions are made only if there are `violations` to active policies. The `operation` section of the response uses the defined preferences to identify the target node.
The API is available at `/admin/autoscaling/suggestions`. Here is an example output from a suggestion request:
[source,json]
----
{
"responseHeader":{
"status":0,
"QTime":101},
"suggestions":[{
"type":"violation",
"violation":{
"collection":"mycoll",
"shard":"shard2",
"tagKey":"7574",
"violation":{ "delta":-1},
"clause":{
"replica":"0",
"shard":"#EACH",
"port":7574,
"collection":"mycoll"}},
"operation":{
"method":"POST",
"path":"/c/mycoll",
"command":{"move-replica":{
"targetNode":"192.168.43.37:8983_solr",
"replica":"core_node7"}}}},
{
"type":"violation",
"violation":{
"collection":"mycoll",
"shard":"shard2",
"tagKey":"7574",
"violation":{ "delta":-1},
"clause":{
"replica":"0",
"shard":"#EACH",
"port":7574,
"collection":"mycoll"}},
"operation":{
"method":"POST",
"path":"/c/mycoll",
"command":{"move-replica":{
"targetNode":"192.168.43.37:7575_solr",
"replica":"core_node15"}}}}],
"WARNING":"This response format is experimental. It is likely to change in the future."}
----
The suggested `operation` is an API call that can be invoked to remedy the current violation.
The types of suggestions available are
* `violation`: Fixes a violation to one or more policy rules
* `repair`: Add missing replicas
* `improvement`: move replicas around so that the load is more evenly balanced according to the autoscaling preferences
By default, the suggestions API returns all of the above, in that order. However it is possible to fetch only certain types by adding a request parameter `type`. e.g: `type=violation&type=repair`
=== Inline Policy Configuration
If there is no autoscaling policy configured or if you wish to use a configuration other than the default, it is possible to send the autoscaling policy JSON as an inline payload as follows:
[source,bash]
----
curl -X POST -H 'Content-type:application/json' -d '{
"cluster-policy": [
{"replica": 0, "put" : "on-each-node", "nodeset": {"port" : "7574"}}
]
}' http://localhost:8983/solr/admin/autoscaling/suggestions?omitHeader=true
----
*Output*
[source,json]
----
{
"suggestions":[{
"type":"violation",
"violation":{
"collection":"mycoll",
"tagKey":7574,
"violation":{
"replica":{
"NRT":2,
"count":2},
"delta":2.0},
"clause":{
"replica":0,
"port":"7574",
"collection":"mycoll"}},
"operation":{
"method":"POST",
"path":"/c/mycoll",
"command":{"move-replica":{
"targetNode":"10.0.0.80:8983_solr",
"inPlaceMove":"true",
"replica":"core_node8"}}}},
{
"type":"violation",
"violation":{
"collection":"mycoll",
"tagKey":7574,
"violation":{
"replica":{
"NRT":2,
"count":2},
"delta":2.0},
"clause":{
"replica":0,
"port":"7574",
"collection":"mycoll"}},
"operation":{
"method":"POST",
"path":"/c/mycoll",
"command":{"move-replica":{
"targetNode":"10.0.0.80:8983_solr",
"inPlaceMove":"true",
"replica":"core_node5"}}}}],
"WARNING":"This response format is experimental. It is likely to change in the future."}
----
== History API
The history of autoscaling events is available at `/admin/autoscaling/history`. It returns information
about past autoscaling events and details about their processing. This history is kept in
the `.system` collection, and is populated by a trigger listener `SystemLogListener`. By default this
listener is added to all new triggers.
History events are regular Solr documents so they can be also accessed directly by
searching on the `.system` collection. The history handler acts as a regular search handler, so all
query parameters supported by `/select` handler for that collection are supported here too.
However, the history handler makes this
process easier by offering a simpler syntax and knowledge of field names
used by `SystemLogListener` for serialization of event data.
History documents contain the action context, if it was available, which gives
further insight into e.g., exact operations that were computed and/or executed.
Specifically, the following query parameters can be used (they are turned into
filter queries, so an implicit AND is applied):
`trigger`::
The name of the trigger.
`eventType`::
The event type or trigger type (e.g., `nodeAdded`).
`collection`::
The name of the collection involved in event processing.
`stage`::
An event processing stage.
`action`::
A trigger action.
`node`::
A node name that the event refers to.
`beforeAction`::
A `beforeAction` stage.
`afterAction`::
An `afterAction` stage.
// TODO someday add an input example also
.Example output
[source,json]
----
{
"responseHeader": {
"status": 0,
"QTime": 64
},
"response": {
"numFound": 2,
"start": 0,
"docs": [
{
"type": "autoscaling_event",
"source_s": "SystemLogListener",
"id": "15f53efdf4bT2qlmj80580yuu997vktddfob3",
"event.id_s": "14f0d67fe7b97d80T2qlmj80580yuu997vktddfob2",
"event.type_s": "NODELOST",
"event.source_s": ".auto_add_replicas",
"event.time_l": 1508941720006000000,
"timestamp": "2017-10-25T14:29:10.091Z",
"event.property.eventTimes_ss": [
"1508941720006000000"
],
"event.property._enqueue_time__ss": [
"1508941750088000000"
],
"event.property.nodeNames_ss": [
"192.168.1.104:7574_solr"
],
"stage_s": "STARTED",
"event_str": "{\n \"id\":\"14f0d67fe7b97d80T2qlmj80580yuu997vktddfob2\",\n \"source\":\".auto_add_replicas\",\n \"eventTime\":1508941720006000000,\n \"eventType\":\"NODELOST\",\n \"properties\":{\n \"eventTimes\":[1508941720006000000],\n \"_enqueue_time_\":1508941750088000000,\n \"nodeNames\":[\"192.168.1.104:7574_solr\"]}}",
"_version_": 1582240104552857600
},
{
"type": "autoscaling_event",
"source_s": "SystemLogListener",
"id": "15f53eff316T2qlmj80580yuu997vktddfob6",
"event.id_s": "14f0d67fe7b97d80T2qlmj80580yuu997vktddfob2",
"event.type_s": "NODELOST",
"event.source_s": ".auto_add_replicas",
"event.time_l": 1508941720006000000,
"timestamp": "2017-10-25T14:29:15.158Z",
"event.property.eventTimes_ss": [
"1508941720006000000"
],
"event.property._enqueue_time__ss": [
"1508941750088000000"
],
"event.property.nodeNames_ss": [
"192.168.1.104:7574_solr"
],
"stage_s": "SUCCEEDED",
"event_str": "{\n \"id\":\"14f0d67fe7b97d80T2qlmj80580yuu997vktddfob2\",\n \"source\":\".auto_add_replicas\",\n \"eventTime\":1508941720006000000,\n \"eventType\":\"NODELOST\",\n \"properties\":{\n \"eventTimes\":[1508941720006000000],\n \"_enqueue_time_\":1508941750088000000,\n \"nodeNames\":[\"192.168.1.104:7574_solr\"]}}",
"_version_": 1582240109859700736
}
]
}
}
----
== Write API
The Write API is available at the same `/admin/autoscaling` and `/api/cluster/autoscaling` endpoints as the Read API but can only be used with the *POST* HTTP verb.
The payload of the POST request is a JSON message with commands to set and remove components. Multiple commands can be specified together in the payload. The commands are executed in the order specified and the changes are atomic, i.e., either all succeed or none.
=== Create and Modify Cluster Preferences
Cluster preferences are specified as a list of sort preferences. Multiple sorting preferences can be specified and they are applied in the order they are set.
They are defined using the `set-cluster-preferences` command.
Each preference is a JSON map having the following syntax:
`{'<sort_order>':'<sort_param>', 'precision':'<precision_val>'}`
See the section <<solrcloud-autoscaling-policy-preferences.adoc#cluster-preferences-specification,Cluster Preferences Specification>> for details about the allowed values for the `sort_order`, `sort_param` and `precision` parameters.
Changing the cluster preferences after the cluster is already built doesn't automatically reconfigure the cluster. However, all future cluster management operations will use the changed preferences.
*Input*
[source,json]
----
{
"set-cluster-preferences" : [
{"minimize": "cores"}
]
}
----
*Output*
The output has a key named `result` which will return either `success` or `failure` depending on whether the command succeeded or failed.
[source,json]
----
{
"responseHeader": {
"status": 0,
"QTime": 138
},
"result": "success",
"WARNING": "This response format is experimental. It is likely to change in the future."
}
----
==== Example Setting Cluster Preferences
In this example we add cluster preferences that sort on three different parameters:
[source,json]
----
{
"set-cluster-preferences": [
{
"minimize": "cores",
"precision": 2
},
{
"maximize": "freedisk",
"precision": 100
},
{
"minimize": "sysLoadAvg",
"precision": 10
}
]
}
----
We can remove all cluster preferences by setting preferences to an empty list.
[source,json]
----
{
"set-cluster-preferences": []
}
----
[[cluster-specific-policies]]
=== Create and Modify Cluster Policies
Cluster policies are set using the `set-cluster-policy` command.
Like `set-cluster-preferences`, the policy definition is a JSON map defining the desired attributes and values.
Refer to the <<solrcloud-autoscaling-policy-preferences.adoc#policy-specification,Policy Specification>> section for details of the allowed values for each condition in the policy.
*Input*:
[source,json]
----
{
"set-cluster-policy": [
{"replica": "<2", "shard": "#EACH", "node": "#ANY"}
]
}
----
*Output*:
[source,json]
----
{
"responseHeader": {
"status": 0,
"QTime": 47
},
"result": "success",
"WARNING": "This response format is experimental. It is likely to change in the future."
}
----
We can remove all cluster policy conditions by setting policy to an empty list.
[source,json]
----
{
"set-cluster-policy": []
}
----
Changing the cluster policy after the cluster is already built doesn't automatically reconfigure the cluster. However, all future cluster management operations will use the changed cluster policy.
=== Create and Modify Collection-Specific Policy
The `set-policy` command accepts a map of policy names to the list of conditions for that policy. Multiple named policies can be specified together. A named policy that does not exist already is created and if the named policy accepts already then it is replaced.
Refer to the <<solrcloud-autoscaling-policy-preferences.adoc#policy-specification,Policy Specification>> section for details of the allowed values for each condition in the policy.
*Input*
[source,json]
----
{
"set-policy": {
"policy1": [
{"replica": "1", "shard": "#EACH", "nodeset":{"port": "8983"}}
]
}
}
----
*Output*
[source,json]
----
{
"responseHeader": {
"status": 0,
"QTime": 246
},
"result": "success",
"WARNING": "This response format is experimental. It is likely to change in the future."
}
----
Changing the policy after the collection is already built doesn't automatically reconfigure the collection. However, all future cluster management operations will use the changed policy.
=== Remove a Collection-Specific Policy
The `remove-policy` command accepts a policy name to be removed from Solr. The policy being removed must not be attached to any collection otherwise the command will fail.
*Input*
[source,json]
----
{"remove-policy": "policy1"}
----
*Output*
[source,json]
----
{
"responseHeader": {
"status": 0,
"QTime": 42
},
"result": "success",
"WARNING": "This response format is experimental. It is likely to change in the future."
}
----
If you attempt to remove a policy that is being used by a collection, this command will fail to delete the policy until the collection itself is deleted.
=== Create or Update a Trigger
The `set-trigger` command can be used to create a new trigger or overwrite an existing one.
You can see the section <<solrcloud-autoscaling-triggers.adoc#trigger-configuration,Trigger Configuration>> for a full list of configuration options.
.Creating a nodeAdded Trigger
[source,json]
----
{
"set-trigger": {
"name" : "node_added_trigger",
"event" : "nodeAdded",
"waitFor" : "1s"
}
}
----
.Updating Trigger with waitFor set to 5 seconds
[source,json]
----
{
"set-trigger": {
"name" : "node_added_trigger",
"event" : "nodeAdded",
"waitFor" : "5s",
}
}
----
.Creating a nodeLost Trigger
[source,json]
----
{
"set-trigger": {
"name" : "node_lost_trigger1",
"event" : "nodeLost",
"waitFor" : "60s",
}
}
----
=== Remove Trigger
The `remove-trigger` command can be used to remove a trigger. It accepts a single parameter: the name of the trigger.
.Removing the nodeLost Trigger
[source,json]
----
{
"remove-trigger": {
"name" : "node_lost_trigger1"
}
}
----
=== Create or Update a Trigger Listener
The `set-listener` command can be used to create or modify a listener for a trigger.
You can see the section <<solrcloud-autoscaling-listeners.adoc#listener-configuration,Trigger Listener Configuration>> for a full list of configuration options.
.Creating a listener for the nodeAdded Trigger
[source,json]
----
{
"set-listener": {
"name": "foo",
"trigger": "node_added_trigger",
"stage": ["STARTED", "ABORTED", "SUCCEEDED", "FAILED"],
"class": "com.example.Listener"
}
}
----
=== Remove Trigger Listener
The `remove-listener` command can be used to remove an existing listener. It accepts a single parameter: the name of the listener.
.Removing the foo listener
[source,json]
----
{
"remove-listener": {
"name": "foo"
}
}
----
=== Change Autoscaling Properties
The `set-properties` command can be used to change the default properties used by the Autoscaling framework.
The following properties can be specified in the payload:
`triggerScheduleDelaySeconds`::
This is the delay in seconds between two executions of a trigger. Every trigger is scheduled using Java's ScheduledThreadPoolExecutor with this delay. The default is `1` second.
`triggerCooldownPeriodSeconds`::
Solr pauses all other triggers for this cool down period after a trigger fires so that the system can stabilize before running triggers again. The default is `5` seconds.
`triggerCorePoolSize`::
The core pool size of the `ScheduledThreadPoolExecutor` used to schedule triggers. The default is `4` threads.
The command allows setting arbitrary properties in addition to the above properties. Such arbitrary properties can be useful in custom `TriggerAction` instances.
.Change default `triggerScheduleDelaySeconds`
[source.json]
----
{
"set-properties": {
"triggerScheduleDelaySeconds": 8
}
}
----
The `set-properties` command replaces older values if present. So using `set-properties` to set the same value twice will overwrite the old value.
If a property is not specified then it retains the last set value or the default, if no change was made.
A changed value can be unset by using a null value.
.Revert changed value of `triggerScheduleDelaySeconds` to default
[source.json]
----
{
"set-properties": {
"triggerScheduleDelaySeconds": null
}
}
----
The changed values of these properties, if any, can be read using the Autoscaling <<Read API>> in the `properties` section.