= Denylisting Partitions
Due to access patterns and data modeling, sometimes there are specific partitions
that are "hot" and can cause instability in a Cassandra cluster. This often occurs
when your data model includes many update or insert operations on a single partition,
causing the partition to grow very large over time and in turn making it very expensive
to read and maintain.
Cassandra supports "denylisting" these problematic partitions so that when clients
issue point reads (`SELECT` statements with the partition key specified) or range
reads (`SELECT *` and similar statements that pull a range of data) that intersect with a denylisted
partition key, the query will be immediately rejected with an `InvalidQueryException`.
== How to denylist a partition key
The ``system_distributed.denylisted_partitions`` table can be used to denylist partitions.
There are two ways to interact with and mutate this data: via the JMX operations
listed under JMX Interface, or directly via CQL by inserting a record with the following fields:
- Keyspace name (ks_name)
- Table name (table_name)
- Partition Key (partition_key)
The partition key format needs to be in the same form required by ``nodetool getendpoints``.
The following examples denylist partition keys in keyspace `ks` and
table `table1` for different data types of the partition key `Id`:
- Id is a simple type - `INSERT INTO system_distributed.denylisted_partitions (ks_name, table_name, partition_key) VALUES ('ks','table1','1');`
- Id is a blob - `INSERT INTO system_distributed.denylisted_partitions (ks_name, table_name, partition_key) VALUES ('ks','table1','12345f');`
- Id has a colon - `INSERT INTO system_distributed.denylisted_partitions (ks_name, table_name, partition_key) VALUES ('ks','table1','1\:2');`
In the case of composite partition keys (Key1, Key2), separate the components with a colon:
- `INSERT INTO system_distributed.denylisted_partitions (ks_name, table_name, partition_key) VALUES ('ks', 'table1', 'k11:k21');`
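
The formatting rules shown in the examples above can be sketched as a small helper. This is a minimal illustration, not part of Cassandra: the class `DenylistKeyFormat` and its methods are hypothetical, and it assumes the only rules are the two visible in the examples (a literal colon is escaped as `\:`, and composite components are joined with `:`). It does not handle blob encoding or CQL string quoting.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical helper: builds the partition_key string for a denylist row. */
final class DenylistKeyFormat {
    private DenylistKeyFormat() {}

    /** Escapes literal colons so they are not read as component separators. */
    static String escape(String component) {
        return component.replace(":", "\\:");
    }

    /** Escapes each partition key component, then joins them with ':'. */
    static String format(String... components) {
        return Arrays.stream(components)
                     .map(DenylistKeyFormat::escape)
                     .collect(Collectors.joining(":"));
    }

    public static void main(String[] args) {
        System.out.println(format("1"));          // simple key
        System.out.println(format("1:2"));        // key containing a colon
        System.out.println(format("k11", "k21")); // composite key (Key1, Key2)
    }
}
```

The resulting string is what goes into the `partition_key` column, or into the `partitionKeyAsString` argument of the JMX calls.
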
=== Special considerations
Keep the denylist cache (see below) and the CQL data on a replica set as closely
in sync as possible, so that different nodes in your cluster do not deny or allow
different keys. To best achieve this, the workflow for a denylist change (addition
or deletion) should always be as follows:
JMX PATH (preferred for single changes):
1. Call the JMX hook for ``denylistKey()`` with the desired key
2. Double-check the cache reloaded with ``isKeyDenylisted()``
3. Check for warnings about unrecognized keyspace/table combinations, limits, or
consistency level. If you get a message that nodes are down and the denylist could
not achieve its consistency level, recover the downed nodes and then trigger a reload
of the cache on each node with ``loadPartitionDenylist()``
CQL PATH (preferred for bulk changes):
1. Mutate the denylisted partition lists via CQL
2. Trigger a reload of the denylist cache on each node via JMX ``loadPartitionDenylist()`` (see below)
3. Check for warnings about lack of availability for a denylist refresh. If nodes are down, recover them, then return to step 2.
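
Both workflows can be driven from a plain Java JMX client. The sketch below is one possible shape, under stated assumptions: the class `DenylistWorkflow` is hypothetical, the object name `org.apache.cassandra.db:type=StorageProxy` is assumed to be where `StorageProxyMBean` is registered, JMX is assumed to be reachable without authentication or TLS, and `isKeyDenylisted` is assumed to return a boolean. Verify each of these against your own cluster before relying on it.

```java
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical operator helper; none of these names come from Cassandra itself. */
final class DenylistWorkflow {
    // Assumption: StorageProxyMBean is registered under this object name.
    static final String STORAGE_PROXY = "org.apache.cassandra.db:type=StorageProxy";
    // JMX signature for (String keyspace, String table, String partitionKeyAsString).
    static final String[] KEY_SIGNATURE =
        {"java.lang.String", "java.lang.String", "java.lang.String"};

    /** Builds a standard RMI JMX URL for the given host and JMX port. */
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /** JMX path, steps 1 and 2: denylist one key, then ask the node whether it is denied. */
    static boolean denylistAndVerify(MBeanServerConnection conn,
                                     String ks, String table, String key) throws Exception {
        ObjectName proxy = new ObjectName(STORAGE_PROXY);
        Object[] params = {ks, table, key};
        conn.invoke(proxy, "denylistKey", params, KEY_SIGNATURE);
        Object denied = conn.invoke(proxy, "isKeyDenylisted", params, KEY_SIGNATURE);
        return Boolean.TRUE.equals(denied);
    }

    /** CQL path, step 2: ask every listed node to reload its cache from the table. */
    static void reloadOnEachNode(List<String> hosts, int port) throws Exception {
        for (String host : hosts) {
            JMXServiceURL url = new JMXServiceURL(jmxUrl(host, port));
            try (JMXConnector jmx = JMXConnectorFactory.connect(url)) {
                jmx.getMBeanServerConnection().invoke(
                    new ObjectName(STORAGE_PROXY), "loadPartitionDenylist",
                    new Object[0], new String[0]);
            }
        }
    }
}
```

Neither method inspects the server logs, so step 3 of each workflow (checking for warnings) still has to be done separately.
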
Because range slices that are known to be unavailable can cause alert storms on
startup, the denylist cache will not load on node start unless it can achieve the
consistency level configured by `denylist_consistency_level` in `cassandra.yaml`.
The JMX call to `loadPartitionDenylist` will, however, load the cache regardless
of the number of nodes available. This leaves the control for denylisting or not
denylisting during degraded cluster states in the hands of the operator.
== Denylisted Partitions Cache
Cassandra internally maintains an on-heap cache of denylisted partitions loaded
from ``system_distributed.denylisted_partitions``. The values for a table will be
automatically repopulated once every ``denylist_refresh`` interval, as specified in the
`conf/cassandra.yaml` file, defaulting to `600s`, or 10 minutes. Invalid records
(unknown keyspaces, tables, or keys) will be ignored and not cached on load.
The cache can be refreshed in the following ways:
- During Cassandra node startup
- Via the automatic on-heap cache refresh mechanism. Note: the refresh is triggered
asynchronously by a query once the ``denylist_refresh`` interval has elapsed.
- Via the JMX command ``loadPartitionDenylist`` at the
``org.apache.cassandra.service.StorageProxyMBean`` invocation point.
The cache size is bounded by the following two configuration properties:
- `denylist_max_keys_per_table`
- `denylist_max_keys_total`
On cache load, if a table exceeds the value allowed in `denylist_max_keys_per_table` (defaults to 1000),
a warning will be printed to the logs and the remainder of the keys will not be cached.
Similarly, if the total allowed size is exceeded, subsequent ks_name + table_name
combinations (in clustering / lexicographical order) will be skipped as well, and a
warning logged to the server logs.
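
To make the effect of the two limits concrete, here is a toy model of the truncation described above. It is not Cassandra's implementation: the class `DenylistLimitModel` is hypothetical, tables are identified by a plain `"ks.table"` string, and the handling of the table that straddles the total limit (it is loaded partially) is an assumption, since the text only says that subsequent tables are skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the cache limits; illustrative only, not Cassandra code. */
final class DenylistLimitModel {
    private DenylistLimitModel() {}

    static Map<String, List<String>> applyLimits(Map<String, List<String>> keysByTable,
                                                 int maxPerTable, int maxTotal) {
        Map<String, List<String>> cached = new LinkedHashMap<>();
        int total = 0;
        // TreeMap iterates the "ks.table" names in lexicographical order.
        for (Map.Entry<String, List<String>> e : new TreeMap<>(keysByTable).entrySet()) {
            int room = Math.min(maxPerTable, maxTotal - total);
            if (room <= 0) {
                break; // total limit reached: every later table is skipped
            }
            List<String> keys = e.getValue();
            // Keep only the first `room` keys; the remainder is not cached.
            List<String> kept = new ArrayList<>(keys.subList(0, Math.min(room, keys.size())));
            cached.put(e.getKey(), kept);
            total += kept.size();
        }
        return cached;
    }
}
```

With a per-table limit of 2 and a total limit of 3, three tables holding 3, 2, and 1 keys end up caching 2, 1, and 0 keys respectively in this model.
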
[NOTE]
====
Given the required workflow of 1) mutate, 2) reload cache, the automatic reload
may seem superfluous. It exists so that, should an operator denylist (or
undenylist) a key but forget to reload the cache, that change is still picked up
on the next automatic cache reload.
====
== JMX Interface
[cols="1,1"]
|===
| Command | Effect
| loadPartitionDenylist()
| Reloads cached denylist from CQL table
| getPartitionDenylistLoadAttempts()
| Gets the count of cache reload attempts
| getPartitionDenylistLoadSuccesses()
| Gets the count of cache reload successes
| setEnablePartitionDenylist(boolean enabled)
| Enables or disables the partition denylisting functionality
| setEnableDenylistWrites(boolean enabled)
| Enables or disables write denylisting functionality
| setEnableDenylistReads(boolean enabled)
| Enables or disables read denylisting functionality
| setEnableDenylistRangeReads(boolean enabled)
| Enables or disables range read denylisting functionality
| denylistKey(String keyspace, String table, String partitionKeyAsString)
| Adds a specific keyspace, table, and partition key combo to the denylist
| removeDenylistKey(String keyspace, String cf, String partitionKeyAsString)
| Removes a specific keyspace, table, and partition key combo from the denylist
| setDenylistMaxKeysPerTable(int value)
| Limits count of allowed keys per table in the denylist
| setDenylistMaxKeysTotal(int value)
| Limits the total count of allowable denylisted keys in the system
| isKeyDenylisted(String keyspace, String table, String partitionKeyAsString)
| Indicates whether the given partition key is denylisted for the given keyspace and table
|===
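
When calling these from a generic JMX client rather than through the MBean interface, note that under the standard MBean naming rules zero-argument `get...()` methods and single-argument `set...()` methods are exposed as attributes, while the remaining methods are operations. The sketch below reads the two reload counters and toggles the feature that way. It assumes `StorageProxyMBean` is a standard MBean registered as `org.apache.cassandra.db:type=StorageProxy` and that the counters are numeric; the class `DenylistJmxAttributes` and its methods are hypothetical.

```java
import javax.management.Attribute;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical helper for the attribute-style entries in the table above. */
final class DenylistJmxAttributes {
    // Assumption: StorageProxyMBean is registered under this object name.
    static final String STORAGE_PROXY = "org.apache.cassandra.db:type=StorageProxy";

    private DenylistJmxAttributes() {}

    /** Maps an accessor such as getFoo or setFoo to the JMX attribute name Foo. */
    static String attributeName(String accessor) {
        if (accessor.startsWith("get") || accessor.startsWith("set")) {
            return accessor.substring(3);
        }
        throw new IllegalArgumentException("not a getter or setter: " + accessor);
    }

    /** Returns reload attempts minus reload successes as reported by one node. */
    static long unsuccessfulReloads(MBeanServerConnection conn) throws Exception {
        ObjectName proxy = new ObjectName(STORAGE_PROXY);
        Number attempts = (Number) conn.getAttribute(
            proxy, attributeName("getPartitionDenylistLoadAttempts"));
        Number successes = (Number) conn.getAttribute(
            proxy, attributeName("getPartitionDenylistLoadSuccesses"));
        return attempts.longValue() - successes.longValue();
    }

    /** Enables or disables partition denylisting on one node. */
    static void setEnabled(MBeanServerConnection conn, boolean enabled) throws Exception {
        conn.setAttribute(new ObjectName(STORAGE_PROXY),
            new Attribute(attributeName("setEnablePartitionDenylist"), enabled));
    }
}
```

A growing gap between attempts and successes on a node suggests its cache reloads are failing, which is a cue to check that node's logs for the warnings described earlier.
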