| = Denylisting Partitions |
| |
| Due to access patterns and data modeling, sometimes there are specific partitions |
| that are "hot" and can cause instability in a Cassandra cluster. This often occurs |
| when your data model includes many update or insert operations on a single partition, |
| causing the partition to grow very large over time and in turn making it very expensive |
| to read and maintain. |
| |
| Cassandra supports "denylisting" these problematic partitions so that when clients |
| issue point reads (`SELECT` statements with the partition key specified) or range |
| reads (`SELECT *` and similar statements that pull a range of data) that intersect |
| with a denylisted partition key, the query is immediately rejected with an `InvalidQueryException`. |
| |
| == How to denylist a partition key |
| |
| The ``system_distributed.denylisted_partitions`` table can be used to denylist partitions. |
| You can change this data either through JMX (see below) or directly through CQL. |
| To use CQL, insert a record with the following details: |
| |
| - Keyspace name (ks_name) |
| - Table name (table_name) |
| - Partition Key (partition_key) |
| |
| The partition key format needs to be in the same form required by ``nodetool getendpoints``. |
| |
| The following examples denylist partition keys in keyspace `ks` and |
| table `table1` for different data types of the primary key `Id`: |
| |
| - Id is a simple type - `INSERT INTO system_distributed.denylisted_partitions (ks_name, table_name, partition_key) VALUES ('ks','table1','1');` |
| - Id is a blob - `INSERT INTO system_distributed.denylisted_partitions (ks_name, table_name, partition_key) VALUES ('ks','table1','12345f');` |
| - Id has a colon - `INSERT INTO system_distributed.denylisted_partitions (ks_name, table_name, partition_key) VALUES ('ks','table1','1\:2');` |
| |
| For a composite partition key (Key1, Key2), separate the components with a colon: |
| |
| - `INSERT INTO system_distributed.denylisted_partitions (ks_name, table_name, partition_key) VALUES ('ks', 'table1', 'k11:k21');` |
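
The key formats shown above can be produced programmatically. The following is a minimal sketch, not part of Cassandra: the class `DenylistKeyFormat` and its methods are hypothetical helpers, and the sketch assumes that a literal colon is the only character that needs escaping (as `\:`) and that composite components are joined with `:`, as in the examples above.

```java
import java.util.List;
import java.util.stream.Collectors;

final class DenylistKeyFormat {
    private DenylistKeyFormat() {}

    /** Joins partition key components with ':' and escapes literal colons as '\:'. */
    static String partitionKey(List<String> components) {
        return components.stream()
                .map(component -> component.replace(":", "\\:"))
                .collect(Collectors.joining(":"));
    }

    /** Doubles single quotes so the value is safe inside a CQL string literal. */
    static String cqlQuote(String value) {
        return value.replace("'", "''");
    }

    /** Builds the INSERT statement for one denylist entry (hypothetical helper). */
    static String insertStatement(String keyspace, String table, List<String> components) {
        return String.format(
                "INSERT INTO system_distributed.denylisted_partitions "
                        + "(ks_name, table_name, partition_key) VALUES ('%s', '%s', '%s');",
                cqlQuote(keyspace), cqlQuote(table), cqlQuote(partitionKey(components)));
    }
}
```

For example, `insertStatement("ks", "table1", Arrays.asList("k11", "k21"))` yields the composite-key statement shown above. In application code, a prepared statement with bound values is usually preferable to string building.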
| |
| |
| === Special considerations |
| |
| Keep the denylist cache (see below) and the CQL data on a replica set as closely |
| in sync as possible, so that different nodes in your cluster do not deny or allow |
| different keys. To best achieve this, the workflow for a denylist change (addition |
| or deletion) should always be as follows: |
| |
| JMX PATH (preferred for single changes): |
| |
| 1. Call the JMX hook for ``denylistKey()`` with the desired key |
| 2. Confirm that the cache reloaded by calling ``isKeyDenylisted()`` |
| 3. Check for warnings about unrecognized keyspace/table combinations, limits, or |
| consistency level. If a message reports that nodes are down and the denylist did not |
| reach its consistency level, recover the down nodes and then trigger a reload of the cache on each |
| node with ``loadPartitionDenylist()`` |
| |
| CQL PATH (preferred for bulk changes): |
| |
| 1. Mutate the denylisted partitions table via CQL |
| 2. Trigger a reload of the denylist cache on each node via JMX ``loadPartitionDenylist()`` (see below) |
| 3. Check for warnings about lack of availability for a denylist refresh. If nodes are down, recover them, then return to step 2. |
| |
| Because known-unavailable range slices can cause alert storms on startup, the |
| denylist cache does not load on node start unless it can achieve the consistency |
| level configured by `denylist_consistency_level` in `cassandra.yaml`. |
| The JMX call to `loadPartitionDenylist` will, however, load the cache regardless |
| of the number of nodes available. This leaves the decision to denylist or not |
| during degraded cluster states in the hands of the operator. |
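
The JMX path described above might be scripted along these lines. This is a sketch under stated assumptions, not an official client: the class `DenylistJmxWorkflow` is hypothetical, and the MBean object name `org.apache.cassandra.db:type=StorageProxy` is assumed and should be confirmed against your Cassandra version. Only the operation names `denylistKey`, `isKeyDenylisted`, and `loadPartitionDenylist` come from the JMX interface documented on this page.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

final class DenylistJmxWorkflow {
    // Assumption: StorageProxy is registered under this name; confirm it on your cluster.
    static final String STORAGE_PROXY_MBEAN = "org.apache.cassandra.db:type=StorageProxy";

    // JMX signature shared by denylistKey and isKeyDenylisted: three String arguments.
    private static final String[] KEY_SIGNATURE = {
        String.class.getName(), String.class.getName(), String.class.getName()
    };

    private DenylistJmxWorkflow() {}

    /** Builds a standard RMI JMX service URL string for the given node. */
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /** Steps 1 and 2 of the JMX path: denylist the key, then confirm the cache sees it. */
    static boolean denylistAndVerify(MBeanServerConnection mbs, String keyspace,
                                     String table, String partitionKey) throws Exception {
        ObjectName storageProxy = new ObjectName(STORAGE_PROXY_MBEAN);
        Object[] args = {keyspace, table, partitionKey};
        mbs.invoke(storageProxy, "denylistKey", args, KEY_SIGNATURE);
        return (Boolean) mbs.invoke(storageProxy, "isKeyDenylisted", args, KEY_SIGNATURE);
    }

    /** Forces a reload of the denylist cache on the node behind this connection. */
    static void reloadCache(MBeanServerConnection mbs) throws Exception {
        mbs.invoke(new ObjectName(STORAGE_PROXY_MBEAN), "loadPartitionDenylist",
                new Object[0], new String[0]);
    }
}
```

To use it, open a connection with `JMXConnectorFactory.connect(new JMXServiceURL(serviceUrl(host, port)))` and pass the result of `getMBeanServerConnection()` to these methods; the JMX port depends on your deployment (7199 is the usual Cassandra default). Because the cache is per node, `reloadCache` would need to be called against every node in turn.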
| |
| == Denylisted Partitions Cache |
| |
| Cassandra internally maintains an on-heap cache of denylisted partitions loaded |
| from ``system_distributed.denylisted_partitions``. The values for a table are |
| automatically repopulated at the interval set by ``denylist_refresh`` in the |
| `conf/cassandra.yaml` file, which defaults to `600s` (10 minutes). Invalid records |
| (unknown keyspaces, tables, or keys) are ignored and not cached on load. |
| |
| The cache can be refreshed in the following ways: |
| |
| - During Cassandra node startup |
| - Via the automatic on-heap cache refresh mechanism. Note: this occurs asynchronously |
| on query once the ``denylist_refresh`` interval has elapsed. |
| - Via the JMX command ``loadPartitionDenylist`` at the |
| ``org.apache.cassandra.service.StorageProxyMBean`` invocation point. |
| |
| The cache size is bounded by the following two configuration properties: |
| |
| - denylist_max_keys_per_table |
| - denylist_max_keys_total |
| |
| On cache load, if a table exceeds the value allowed in `denylist_max_keys_per_table` (defaults to 1000), |
| a warning is printed to the logs and the remaining keys are not cached. |
| Similarly, if the total allowed size is exceeded, subsequent `ks_name` + `table_name` |
| combinations (in clustering / lexicographical order) are skipped as well, and a |
| warning is written to the server logs. |
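
To make the interaction of the two limits concrete, the following sketch models the behavior described above. It is an illustration only and not Cassandra's implementation: the class `DenylistCacheLimits` is hypothetical, tables are identified by a plain `ks_name.table_name` string, and it assumes that the table that crosses the total limit is truncated while later tables are skipped entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DenylistCacheLimits {
    private DenylistCacheLimits() {}

    /**
     * Returns the keys that would be cached for each "ks_name.table_name" entry
     * after applying the per-table and total limits (a model, not Cassandra code).
     */
    static Map<String, List<String>> applyLimits(Map<String, List<String>> rows,
                                                 int maxKeysPerTable, int maxKeysTotal) {
        Map<String, List<String>> cached = new LinkedHashMap<>();
        int total = 0;
        // TreeMap iterates the tables in lexicographical order of their names.
        for (Map.Entry<String, List<String>> entry : new TreeMap<>(rows).entrySet()) {
            int remaining = maxKeysTotal - total;
            if (remaining <= 0) {
                System.err.println("WARN: total limit reached, skipping " + entry.getKey());
                continue;
            }
            List<String> keys = entry.getValue();
            int allowed = Math.min(Math.min(keys.size(), maxKeysPerTable), remaining);
            if (allowed < keys.size()) {
                System.err.println("WARN: caching " + allowed + " of " + keys.size()
                        + " keys for " + entry.getKey());
            }
            cached.put(entry.getKey(), new ArrayList<>(keys.subList(0, allowed)));
            total += allowed;
        }
        return cached;
    }
}
```

With a per-table limit of 2 and a total limit of 3, tables holding 3, 2, and 1 keys would end up with 2, 1, and 0 cached keys respectively in this model.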
| |
| [NOTE] |
| ==== |
| Given the required workflow of 1) Mutate, 2) Reload cache, the auto-reload |
| property may seem superfluous. It exists to ensure that, should an operator make a |
| mistake and denylist (or undenylist) a key but forget to reload the cache, that |
| intent is still captured on the next automatic cache reload. |
| ==== |
| |
| == JMX Interface |
| |
| [cols="1,1"] |
| |=== |
| | Command | Effect |
| |
| | loadPartitionDenylist() |
| | Reloads cached denylist from CQL table |
| |
| | getPartitionDenylistLoadAttempts() |
| | Gets the count of cache reload attempts |
| |
| | getPartitionDenylistLoadSuccesses() |
| | Gets the count of cache reload successes |
| |
| | setEnablePartitionDenylist(boolean enabled) |
| | Enables or disables the partition denylisting functionality |
| |
| | setEnableDenylistWrites(boolean enabled) |
| | Enables or disables write denylisting functionality |
| |
| | setEnableDenylistReads(boolean enabled) |
| | Enables or disables read denylisting functionality |
| |
| | setEnableDenylistRangeReads(boolean enabled) |
| | Enables or disables range read denylisting functionality |
| |
| | denylistKey(String keyspace, String table, String partitionKeyAsString) |
| | Adds a specific keyspace, table, and partition key combo to the denylist |
| |
| | removeDenylistKey(String keyspace, String cf, String partitionKeyAsString) |
| | Removes a specific keyspace, table, and partition key combo from the denylist |
| |
| | setDenylistMaxKeysPerTable(int value) |
| | Limits count of allowed keys per table in the denylist |
| |
| | setDenylistMaxKeysTotal(int value) |
| | Limits the total count of allowable denylisted keys in the system |
| |
| | isKeyDenylisted(String keyspace, String table, String partitionKeyAsString) |
| | Indicates whether the keyspace.table has the input partition key denied |
| |=== |