| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| ## Request throttling |
| |
| ### Quick overview |
| |
| Limit session throughput. |
| |
* `advanced.throttler` in the configuration; defaults to pass-through (no throttling). Also
  available: concurrency-based (max simultaneous requests), rate-based (max requests per time
  unit), or write your own.
| * metrics: `throttling.delay`, `throttling.queue-size`, `throttling.errors`. |
| |
| ----- |
| |
Throttling allows you to limit the throughput of a session: either the number of requests that it
can execute concurrently, or the rate at which requests are started. This is useful if you have
multiple applications connecting to the same Cassandra cluster, and want to enforce some kind of
SLA to ensure fair resource allocation.
| |
| The request throttler tracks the level of utilization of the session, and lets requests proceed as |
| long as it is under a predefined threshold. When that threshold is exceeded, requests are enqueued |
| and will be allowed to proceed when utilization goes back to normal. |
| |
| From a user's perspective, this process is mostly transparent: any time spent in the queue is |
| included in the `session.execute()` or `session.executeAsync()` call. Similarly, the request timeout |
| encompasses throttling: it starts ticking before the request is passed to the throttler; in other |
| words, a request may time out while it is still in the throttler's queue, before the driver has even |
| tried to send it to a node. |
| |
| The only visible effect is that a request may fail with a [RequestThrottlingException], if the |
| throttler has determined that it can neither allow the request to proceed now, nor enqueue it; |
| this indicates that your session is overloaded. How you react to that is specific to your |
| application; typically, you could display an error asking the end user to retry later. |
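
For example, with the synchronous API you could catch the exception and translate it into a
"please retry later" response (with the asynchronous API, the returned stage completes
exceptionally with the same error). This is only a minimal sketch: the query, method name and
error handling below are placeholders, not part of the driver's API:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.RequestThrottlingException;
import com.datastax.oss.driver.api.core.cql.ResultSet;

ResultSet executeOrFailFast(CqlSession session) {
  try {
    return session.execute("SELECT release_version FROM system.local");
  } catch (RequestThrottlingException e) {
    // The throttler could neither run nor enqueue the request: the session is overloaded.
    // Degrade gracefully, e.g. surface a "service busy, retry later" error to the caller.
    throw new RuntimeException("Service busy, please retry later", e);
  }
}
```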
| |
| Note that the following requests are also affected by throttling: |
| |
| * preparing a statement (either directly, or indirectly when the driver reprepares on other nodes, |
| or when a node comes back up -- see |
| [how the driver prepares](../statements/prepared/#how-the-driver-prepares)); |
* fetching the next page of a result set (which happens in the background when you iterate the
  synchronous variant, `ResultSet`);
| * fetching a [query trace](../tracing/). |
| |
| ### Configuration |
| |
| Request throttling is parameterized in the [configuration](../configuration/) under |
| `advanced.throttler`. There are various implementations, detailed in the following sections: |
| |
| #### Pass through |
| |
| ``` |
| datastax-java-driver { |
| advanced.throttler { |
| class = PassThroughRequestThrottler |
| } |
| } |
| ``` |
| |
| This is a no-op implementation: requests are simply allowed to proceed all the time, never enqueued. |
| |
| Note that you will still hit a limit if all your connections run out of stream ids. In that case, |
| requests will fail with an [AllNodesFailedException], with the `getErrors()` method returning a |
| [BusyConnectionException] for each node. See the [connection pooling](../pooling/) page. |
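
If you need to tell that situation apart from other failures, you can inspect the exception's
per-node errors. A rough sketch, using the `getErrors()` accessor mentioned above (the method name
and the reaction, left as a comment, are placeholders):

```java
import com.datastax.oss.driver.api.core.AllNodesFailedException;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.connection.BusyConnectionException;

void executeChecked(CqlSession session, String query) {
  try {
    session.execute(query);
  } catch (AllNodesFailedException e) {
    boolean outOfStreamIds =
        !e.getErrors().isEmpty()
            && e.getErrors().values().stream()
                .allMatch(error -> error instanceof BusyConnectionException);
    if (outOfStreamIds) {
      // Every node failed because its connections had no free stream ids: consider enabling a
      // throttler, or resizing the connection pools.
    }
    throw e;
  }
}
```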
| |
| #### Concurrency-based |
| |
| ``` |
| datastax-java-driver { |
| advanced.throttler { |
| class = ConcurrencyLimitingRequestThrottler |
| |
| # Note: the values below are for illustration purposes only, not prescriptive |
| max-concurrent-requests = 10000 |
| max-queue-size = 100000 |
| } |
| } |
| ``` |
| |
| This implementation limits the number of requests that are allowed to execute simultaneously. |
| Additional requests get enqueued up to the configured limit. Every time an active request completes |
| (either by succeeding, failing or timing out), the oldest enqueued request is allowed to proceed. |
| |
| Make sure you pick a threshold that is consistent with your pooling settings; the driver should |
| never run out of stream ids before reaching the maximum concurrency, otherwise requests will fail |
| with [BusyConnectionException] instead of being throttled. The total number of stream ids is a |
| function of the number of connected nodes and the `connection.pool.*.size` and |
| `connection.max-requests-per-connection` configuration options. Keep in mind that aggressive |
| speculative executions and timeout options can inflate stream id consumption, so keep a safety |
| margin. One good way to get this right is to track the `pool.available-streams` [metric](../metrics) |
| on every node, and make sure it never reaches 0. See the [connection pooling](../pooling/) page. |
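
The same options can also be set programmatically, if you build your configuration in code rather
than from a file. A minimal sketch with the programmatic config loader (the values are, again,
purely illustrative):

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;

DriverConfigLoader loader =
    DriverConfigLoader.programmaticBuilder()
        .withString(
            DefaultDriverOption.REQUEST_THROTTLER_CLASS, "ConcurrencyLimitingRequestThrottler")
        // Illustrative values only: size them against your pooling settings, as explained above.
        .withInt(DefaultDriverOption.REQUEST_THROTTLER_MAX_CONCURRENT_REQUESTS, 10_000)
        .withInt(DefaultDriverOption.REQUEST_THROTTLER_MAX_QUEUE_SIZE, 100_000)
        .build();

CqlSession session = CqlSession.builder().withConfigLoader(loader).build();
```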
| |
| #### Rate-based |
| |
| ``` |
| datastax-java-driver { |
| advanced.throttler { |
| class = RateLimitingRequestThrottler |
| |
| # Note: the values below are for illustration purposes only, not prescriptive |
| max-requests-per-second = 5000 |
| max-queue-size = 50000 |
| drain-interval = 1 millisecond |
| } |
| } |
| ``` |
| |
This implementation tracks the rate at which requests start, and enqueues new requests when that
rate exceeds the configured threshold.
| |
With this approach, we can't dequeue when requests complete, because having fewer active requests
does not necessarily mean that the rate is back to normal. So instead, the throttler re-checks the
rate periodically and dequeues when possible; this is controlled by the `drain-interval` option.
Picking the right interval is a matter of balance: an interval that is too low might consume too
many resources and only dequeue a few requests at a time, while one that is too high will delay
your requests too much. Start with a few milliseconds and use the `cql-requests`
[metric](../metrics/) to check the impact on your latencies.
| |
As with the concurrency-based throttler, you should make sure that your target rate is in line
with the pooling options; see the recommendations in the previous section.
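
The rate-based throttler can also be configured programmatically, in the same way as the
concurrency-based one shown earlier; only the set of options differs (note the `Duration` for the
drain interval). The values are still only illustrative:

```java
import java.time.Duration;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;

DriverConfigLoader loader =
    DriverConfigLoader.programmaticBuilder()
        .withString(DefaultDriverOption.REQUEST_THROTTLER_CLASS, "RateLimitingRequestThrottler")
        // Illustrative values only, mirroring the file-based example above.
        .withInt(DefaultDriverOption.REQUEST_THROTTLER_MAX_REQUESTS_PER_SECOND, 5_000)
        .withInt(DefaultDriverOption.REQUEST_THROTTLER_MAX_QUEUE_SIZE, 50_000)
        .withDuration(DefaultDriverOption.REQUEST_THROTTLER_DRAIN_INTERVAL, Duration.ofMillis(1))
        .build();

CqlSession session = CqlSession.builder().withConfigLoader(loader).build();
```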
| |
| ### Monitoring |
| |
| Enable the following [metrics](../metrics/) to monitor how the throttler is performing: |
| |
| ``` |
| datastax-java-driver { |
| advanced.metrics.session.enabled = [ |
| # How long requests are being throttled (exposed as a Timer). |
| # |
| # This is the time between the start of the session.execute() call, and the moment when the |
| # throttler allows the request to proceed. |
| throttling.delay, |
| |
| # The size of the throttling queue (exposed as a Gauge<Integer>). |
| # |
| # This is the number of requests that the throttler is currently delaying in order to |
| # preserve its SLA. This metric only works with the built-in concurrency- and rate-based |
| # throttlers; in other cases, it will always be 0. |
| throttling.queue-size, |
| |
| # The number of times a request was rejected with a RequestThrottlingException (exposed as a |
| # Counter) |
| throttling.errors, |
| ] |
| } |
| ``` |
| |
| If you enable `throttling.delay`, make sure to also check the associated extra options to correctly |
| size the underlying histograms (`metrics.session.throttling.delay.*`). |
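
These metrics are exposed through whatever metrics backend you have configured; with the default
Dropwizard backend, they can also be read directly from the session. A quick sketch, assuming
`throttling.errors` has been enabled as shown above (the method name and the logging are
placeholders):

```java
import com.codahale.metrics.Counter;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.metrics.DefaultSessionMetric;

void logThrottlingErrors(CqlSession session) {
  session
      .getMetrics()
      .flatMap(m -> m.<Counter>getSessionMetric(DefaultSessionMetric.THROTTLING_ERRORS))
      .ifPresent(counter -> System.out.println("Requests rejected so far: " + counter.getCount()));
}
```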
| |
| [RequestThrottlingException]: https://docs.datastax.com/en/drivers/java/4.17/com/datastax/oss/driver/api/core/RequestThrottlingException.html |
| [AllNodesFailedException]: https://docs.datastax.com/en/drivers/java/4.17/com/datastax/oss/driver/api/core/AllNodesFailedException.html |
| [BusyConnectionException]: https://docs.datastax.com/en/drivers/java/4.17/com/datastax/oss/driver/api/core/connection/BusyConnectionException.html |