
 = Request Rate Limiters
 // Licensed to the Apache Software Foundation (ASF) under one
 // or more contributor license agreements.  See the NOTICE file
 // distributed with this work for additional information
 // regarding copyright ownership.  The ASF licenses this file
 // to you under the Apache License, Version 2.0 (the
 // "License"); you may not use this file except in compliance
 // with the License.  You may obtain a copy of the License at
 //
 //   http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing,
 // software distributed under the License is distributed on an
 // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 // KIND, either express or implied.  See the License for the
 // specific language governing permissions and limitations
 // under the License.

 Solr allows rate limiting per request type. Each request type can be allocated a maximum number of concurrent requests
 that can be active at any time. The default rate limiting is implemented for updates and searches.

 Once the quota for a request type is exhausted, further incoming requests of that type are rejected with HTTP status code 429 (Too Many Requests).
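
 Clients therefore need to be prepared for a 429 response. The following is a minimal client-side sketch, not part of Solr: it retries with exponential backoff while the status code is 429. The `send` callable, the attempt count, and the delays are all hypothetical placeholders for whatever HTTP client and retry policy the application uses.

```python
import time
from typing import Callable


def send_with_retry(
    send: Callable[[], int],
    max_attempts: int = 4,
    base_delay_s: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a status other than 429 or attempts run out.

    `send` is a hypothetical callable that performs one HTTP request
    and returns the HTTP status code of the response.
    """
    status = 429
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        # Back off before the next attempt, but not after the last one.
        if attempt < max_attempts - 1:
            sleep(base_delay_s * (2 ** attempt))
    return status
```

 Injecting `sleep` keeps the sketch easy to exercise in a unit test without real delays.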

 Note that rate limiting works at the instance (JVM) level, not at the core or collection level. Take this into account when planning capacity.
 Finer-grained control is planned as future work (https://issues.apache.org/jira/browse/SOLR-14710[SOLR-14710]).

 == When To Use Rate Limiters
 Use rate limiters when you want to allocate a guaranteed share of the request threadpool to a specific
 request type. Indexing and search requests mostly compete with each other for CPU resources, and this contention
 becomes especially pronounced under heavy production workloads. The current implementation provides a query rate
 limiter, which can free up resources for indexing.

 == Rate Limiter Configurations
 The default rate limiter is the search rate limiter. It can be configured using the following command:

  curl -X POST -H 'Content-type:application/json' -d '{
    "set-ratelimiter": {
      "enabled": true,
      "guaranteedSlots":5,
      "allowedRequests":20,
      "slotBorrowingEnabled":true,
      "slotAcquisitionTimeoutInMS":70
    }
  }' http://localhost:8983/api/cluster
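
 For scripted setups, the same request can be assembled programmatically. The sketch below mirrors the curl command above using only the Python standard library; the function name and the `base_url` parameter are illustrative, and the payload values are simply the ones shown in the command.

```python
import json
import urllib.request


def build_ratelimiter_request(base_url: str = "http://localhost:8983") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "set-ratelimiter": {
            "enabled": True,
            "guaranteedSlots": 5,
            "allowedRequests": 20,
            "slotBorrowingEnabled": True,
            "slotAcquisitionTimeoutInMS": 70,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/api/cluster",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )


# To send it against a running Solr instance:
#   with urllib.request.urlopen(build_ratelimiter_request()) as resp:
#       print(resp.status)
```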

 === Enable Query Rate Limiter
 Controls whether the query rate limiter is enabled. Default value is `false`.

   "enabled": true

 === Maximum Number Of Concurrent Requests
 Sets the maximum number of search requests that can be active at any given time. Default value is the number of cores * 3.

  "allowedRequests":20

 === Request Slot Allocation Wait Time
 Time in milliseconds that a request waits for a slot to become available when all slots are full,
 before the request is put into the wait queue. This gives requests a chance to proceed if
 the unavailability of request slots for this rate limiter is only transient. Default value
 is -1, indicating no wait; 0 has the same effect. Note that higher values
 can lead to longer queue times and, potentially, longer wait times for queries.

  "slotAcquisitionTimeoutInMS":70
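
 One way to picture this setting is as a counting semaphore with a bounded wait. The sketch below is an illustration only and does not reproduce Solr's internal classes: `SlotPool` and its method names are hypothetical, and what happens after a failed acquisition (queuing or rejection) is left to the caller.

```python
import threading


class SlotPool:
    """Hypothetical slot pool; not Solr's actual implementation."""

    def __init__(self, allowed_requests: int, slot_acquisition_timeout_ms: int = -1):
        self._slots = threading.Semaphore(allowed_requests)
        self._timeout_ms = slot_acquisition_timeout_ms

    def try_acquire(self) -> bool:
        """Return True if a slot was obtained within the configured wait time."""
        if self._timeout_ms <= 0:
            # -1 and 0 both mean "do not wait".
            return self._slots.acquire(blocking=False)
        # Wait up to the configured number of milliseconds for a free slot.
        return self._slots.acquire(timeout=self._timeout_ms / 1000.0)

    def release(self) -> None:
        """Return a slot to the pool once the request has finished."""
        self._slots.release()
```

 With a timeout of 70 ms, a request arriving while every slot is taken would wait at most 70 ms before `try_acquire` reports failure.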

 === Slot Borrowing Enabled
 Whether slot borrowing (described below) is enabled. Default value is `false`.

 NOTE: This feature is experimental and can cause slots to be blocked if the
 borrowing request is long-lived.

  "slotBorrowingEnabled":true

 === Guaranteed Slots
 The number of slots that the query rate limiter reserves for itself irrespective
 of the load of query requests. This is used only if slot borrowing is enabled and acts
 as the threshold beyond which the query rate limiter will not allow other request types to
 borrow slots from its quota. Default value is the allowed number of concurrent requests / 2.

 NOTE: This feature is experimental and can cause slots to be blocked if the
 borrowing request is long-lived.

  "guaranteedSlots":5
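
 The defaults described above can be worked through in a few lines. This sketch rests on assumptions that the text does not confirm: "number of cores" is read as CPU cores, the division by 2 is treated as integer division, and the slots available for lending are taken to be the allowed requests minus the guaranteed slots. The function name is hypothetical.

```python
from typing import Dict, Optional


def effective_settings(
    cores: int,
    allowed_requests: Optional[int] = None,
    guaranteed_slots: Optional[int] = None,
) -> Dict[str, int]:
    """Resolve defaults for unset values (a sketch, not Solr's code)."""
    # Default: number of cores * 3 (assumed here to mean CPU cores).
    allowed = allowed_requests if allowed_requests is not None else cores * 3
    # Default: allowed number of concurrent requests / 2 (integer division assumed).
    guaranteed = guaranteed_slots if guaranteed_slots is not None else allowed // 2
    return {
        "allowedRequests": allowed,
        "guaranteedSlots": guaranteed,
        # Assumption: only slots above the guaranteed threshold can be lent.
        "borrowableSlots": allowed - guaranteed,
    }
```

 Under these assumptions, an 8-core machine with no explicit settings would get 24 allowed requests, 12 of them guaranteed; with the values from the command above (20 and 5), up to 15 slots could be lent to other request types.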

 == Salient Points

 These are some of the things to keep in mind when using rate limiters.

 === Over Subscribing
 It is possible to define a quota for a request type that exceeds the size
 of the available threadpool. Solr does not enforce rules on the size of a quota that
 can be defined for a request type. This is intentional, to give users full
 control over their quota allocation. However, if the quota exceeds the available threadpool's
 size, the standard queuing policies of the threadpool will kick in.

 === Slot Borrowing
 If a quota does not have a backlog but other quotas do, the busier quotas can
 "borrow" slots from the less busy one. Borrowing is currently done on a round-robin basis; a
 pending task proposes making it a priority-based model (https://issues.apache.org/jira/browse/SOLR-14709[SOLR-14709]).

 NOTE: This feature is experimental and gives no guarantee of borrowed slots being
 returned in time.
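
 The following single-threaded sketch illustrates one possible reading of round-robin borrowing combined with guaranteed slots. It is not Solr's implementation: the `Quota` and `RoundRobinBorrower` classes, the `update` quota, and the rule that a request type borrows only after its own slots are exhausted are all assumptions made for illustration. Releasing slots is omitted for brevity.

```python
from typing import List, Optional


class Quota:
    """Hypothetical per-request-type quota."""

    def __init__(self, name: str, allowed_requests: int, guaranteed_slots: int):
        self.name = name
        self.allowed_requests = allowed_requests
        self.guaranteed_slots = guaranteed_slots
        self.own_active = 0  # slots used by this quota's own requests
        self.lent = 0        # slots currently lent to other request types

    def try_acquire_own(self) -> bool:
        if self.own_active + self.lent < self.allowed_requests:
            self.own_active += 1
            return True
        return False

    def try_lend(self) -> bool:
        # Never lend so many slots that the guaranteed ones become unavailable.
        lendable = self.allowed_requests - self.guaranteed_slots
        if self.lent < lendable and self.own_active + self.lent < self.allowed_requests:
            self.lent += 1
            return True
        return False


class RoundRobinBorrower:
    """Asks the other quotas for a spare slot in rotating order."""

    def __init__(self, quotas: List[Quota]):
        self._quotas = quotas
        self._next = 0  # index of the quota to ask first next time

    def acquire(self, request_type: str) -> Optional[str]:
        """Return the name of the quota that supplied a slot, or None."""
        own = next(q for q in self._quotas if q.name == request_type)
        if own.try_acquire_own():
            return own.name
        for offset in range(len(self._quotas)):
            idx = (self._next + offset) % len(self._quotas)
            lender = self._quotas[idx]
            if lender is not own and lender.try_lend():
                self._next = (idx + 1) % len(self._quotas)
                return lender.name
        return None
```

 In this model, a `query` quota with 4 allowed requests and 2 guaranteed slots lends at most 2 slots, so 2 slots always remain available for query requests regardless of how many other request types are waiting.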