= Use Nodetool

Cassandra's `nodetool` allows you to narrow problems from the cluster
down to a particular node and gives a lot of insight into the state of
the Cassandra process itself. There are dozens of useful commands (see
`nodetool help` for the full list), but some of the most useful ones for
troubleshooting are covered below.

[[nodetool-status]]
== Cluster Status

You can use `nodetool status` to assess the status of the cluster:

[source, bash]
----
$ nodetool status <optional keyspace>

Datacenter: dc1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.1.1  4.69 GiB  1       100.0%            35ea8c9f-b7a2-40a7-b9c5-0ee8b91fdd0e  r1
UN  127.0.1.2  4.71 GiB  1       100.0%            752e278f-b7c5-4f58-974b-9328455af73f  r2
UN  127.0.1.3  4.69 GiB  1       100.0%            9dc1a293-2cc0-40fa-a6fd-9e6054da04a7  r3
----

In this case we can see that we have three nodes in one datacenter with
about 4.7 GiB of data each and they are all "up". The up/down status of
a node is independently determined by every node in the cluster, so you
may have to run `nodetool status` on multiple nodes in a cluster to see
the full view.
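
If you want the view from several nodes at once, a small shell loop
works well. This is only a sketch: it assumes passwordless SSH access
and hypothetical hostnames (`cassandra1` through `cassandra3`), so adapt
it to however you reach your nodes:

[source, bash]
----
# Ask each node for its own view of the cluster; disagreements between
# nodes can point at gossip or network problems.
for host in cassandra1 cassandra2 cassandra3; do
    echo "=== ${host} ==="
    ssh "${host}" nodetool status
done
----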

You can use `nodetool status` plus a little grep to see which nodes are
down:

[source, bash]
----
$ nodetool status | grep -v '^UN'
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
DN  127.0.0.5  105.73 KiB  1       33.3%             df303ac7-61de-46e9-ac79-6e630115fd75  r1
----

In this case there are two datacenters and there is one node down in
datacenter `dc2` and rack `r1`. This may indicate an issue on
`127.0.0.5` warranting investigation.
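
If you just want the addresses of the down nodes, for example to feed
into other tooling, a little awk over the same output works. This is
only a sketch based on the status line format shown above (the first
column is `D` for down followed by the state letter):

[source, bash]
----
# Print the address and rack of every node reported as down.
nodetool status | awk '/^D[NLJM]/ {print $2, $NF}'
----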

[[nodetool-proxyhistograms]]
== Coordinator Query Latency

You can view the latency distributions of coordinator reads and writes
to help narrow down latency issues using `nodetool proxyhistograms`:

[source, bash]
----
$ nodetool proxyhistograms
Percentile       Read Latency      Write Latency      Range Latency   CAS Read Latency  CAS Write Latency  View Write Latency
                     (micros)           (micros)           (micros)           (micros)           (micros)             (micros)
50%                    454.83             219.34               0.00               0.00               0.00                 0.00
75%                    545.79             263.21               0.00               0.00               0.00                 0.00
95%                    654.95             315.85               0.00               0.00               0.00                 0.00
98%                    785.94             379.02               0.00               0.00               0.00                 0.00
99%                   3379.39            2346.80               0.00               0.00               0.00                 0.00
Min                     42.51             105.78               0.00               0.00               0.00                 0.00
Max                  25109.16           43388.63               0.00               0.00               0.00                 0.00
----

Here you can see the full latency distribution of reads, writes, range
requests (e.g. `select * from keyspace.table`), CAS read (compare phase
of CAS) and CAS write (set phase of compare and set). These can be
useful for narrowing down high-level latency problems. For example, if a
client had a 20 millisecond timeout on its reads, it might experience
the occasional timeout from this node (the maximum read latency is ~25
milliseconds) but on less than 1% of requests, since the 99th percentile
read latency is ~3.4 milliseconds, well under 20 milliseconds.
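
If you want to see how coordinator latency evolves while you reproduce a
problem, a simple loop that samples the histograms periodically can
help; this is just a sketch, adjust the interval to taste:

[source, bash]
----
# Sample coordinator latencies every 10 seconds while reproducing an issue.
while true; do
    date
    nodetool proxyhistograms
    sleep 10
done
----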

[[nodetool-tablehistograms]]
== Local Query Latency

If you know which table is having latency/error issues, you can use
`nodetool tablehistograms` to get a better idea of what is happening
locally on a node:

[source, bash]
----
$ nodetool tablehistograms keyspace table
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             0.00             73.46            182.79             17084               103
75%             1.00             88.15            315.85             17084               103
95%             2.00            126.93            545.79             17084               103
98%             2.00            152.32            654.95             17084               103
99%             2.00            182.79            785.94             17084               103
Min             0.00             42.51             24.60             14238                87
Max             2.00          12108.97          17436.92             17084               103
----

This shows you percentile breakdowns of several particularly critical
metrics.

The first column shows how many sstables were read per logical read. A
very high number here indicates that you may have chosen the wrong
compaction strategy: for update-heavy workloads,
`SizeTieredCompactionStrategy` typically touches many more sstables per
read than `LeveledCompactionStrategy` does.
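
If you want to confirm which compaction strategy the table is actually
using, you can inspect the schema. A quick sketch using `cqlsh` (the
keyspace/table names and any host or credential flags are placeholders):

[source, bash]
----
# Dump the table definition and pull out its compaction settings.
cqlsh -e "DESCRIBE TABLE keyspace.table" | grep compaction
----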

The second column shows a breakdown of _local_ write latency. In this
case we see that while the p50 is quite good at 73 microseconds, the
maximum latency is quite slow at 12 milliseconds. High maximum write
latencies often indicate a slow commitlog volume (slow to fsync) or
large writes that quickly saturate commitlog segments.
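
If you suspect the commitlog, it can be worth double-checking how it is
configured. A sketch that assumes the config lives at the common default
location `/etc/cassandra/cassandra.yaml` (yours may differ):

[source, bash]
----
# Show the commitlog sync mode, sync period and segment sizing options.
grep -E '^\s*commitlog_' /etc/cassandra/cassandra.yaml
----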

The third column shows a breakdown of _local_ read latency. We can see
that local Cassandra reads are (as expected) slower than local writes,
and that read speed correlates strongly with the number of sstables read
per read.

The fourth and fifth columns show distributions of partition size and
cell count per partition. These are useful for determining if the table
has on average skinny or wide partitions and can help you isolate bad
data patterns. For example, if you have a single cell that is 2
megabytes, that is probably going to cause some heap pressure when it is
read.
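
For a quick look at partition sizes without the percentile breakdown,
`nodetool tablestats` reports the compacted partition minimum, mean and
maximum bytes for a table; a small sketch to pull out just those lines:

[source, bash]
----
# Show the compacted partition size summary for a single table.
nodetool tablestats keyspace.table | grep -i 'compacted partition'
----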

[[nodetool-tpstats]]
== Threadpool State

You can use `nodetool tpstats` to view the current outstanding requests
on a particular node. This is useful for trying to find out which
resource (read threads, write threads, compaction, request response
threads) the Cassandra process lacks. For example:

[source, bash]
----
$ nodetool tpstats
Pool Name                         Active   Pending  Completed   Blocked  All time blocked
ReadStage                              2         0         12         0                 0
MiscStage                              0         0          0         0                 0
CompactionExecutor                     0         0       1940         0                 0
MutationStage                          0         0          0         0                 0
GossipStage                            0         0      10293         0                 0
Repair-Task                            0         0          0         0                 0
RequestResponseStage                   0         0         16         0                 0
ReadRepairStage                        0         0          0         0                 0
CounterMutationStage                   0         0          0         0                 0
MemtablePostFlush                      0         0         83         0                 0
ValidationExecutor                     0         0          0         0                 0
MemtableFlushWriter                    0         0         30         0                 0
ViewMutationStage                      0         0          0         0                 0
CacheCleanupExecutor                   0         0          0         0                 0
MemtableReclaimMemory                  0         0         30         0                 0
PendingRangeCalculator                 0         0         11         0                 0
SecondaryIndexManagement               0         0          0         0                 0
HintsDispatcher                        0         0          0         0                 0
Native-Transport-Requests              0         0        192         0                 0
MigrationStage                         0         0         14         0                 0
PerDiskMemtableFlushWriter_0           0         0         30         0                 0
Sampler                                0         0          0         0                 0
ViewBuildExecutor                      0         0          0         0                 0
InternalResponseStage                  0         0          0         0                 0
AntiEntropyStage                       0         0          0         0                 0

Message type        Dropped  Latency waiting in queue (micros)
                                  50%      95%      99%      Max
READ                      0       N/A      N/A      N/A      N/A
RANGE_SLICE               0      0.00     0.00     0.00     0.00
_TRACE                    0       N/A      N/A      N/A      N/A
HINT                      0       N/A      N/A      N/A      N/A
MUTATION                  0       N/A      N/A      N/A      N/A
COUNTER_MUTATION          0       N/A      N/A      N/A      N/A
BATCH_STORE               0       N/A      N/A      N/A      N/A
BATCH_REMOVE              0       N/A      N/A      N/A      N/A
REQUEST_RESPONSE          0      0.00     0.00     0.00     0.00
PAGED_RANGE               0       N/A      N/A      N/A      N/A
READ_REPAIR               0       N/A      N/A      N/A      N/A
----

This command shows you all kinds of interesting statistics. The first
section shows a detailed breakdown of threadpools for each Cassandra
stage, including how many threads are currently executing (Active) and
how many are waiting to run (Pending). Typically, if you see pending
executions in a particular threadpool, that indicates a problem
localized to that type of operation. For example, if the
`RequestResponseStage` queue is backing up, that means that the
coordinators are waiting on a lot of downstream replica requests, which
may indicate a lack of token awareness or very high consistency levels
being used on read requests (for example, reading at `ALL` ties up RF
`RequestResponseStage` threads whereas `LOCAL_ONE` only uses a single
thread in the `ReadStage` threadpool). On the other hand, if you see a
lot of pending compactions, that may indicate that your compaction
threads cannot keep up with the volume of writes and you may need to
tune either the compaction strategy or the `concurrent_compactors` or
`compaction_throughput` options.
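
When you are hunting a transient backlog it helps to watch the pools
over time and only surface the ones with queued work. This is just a
sketch; the column positions can shift slightly between Cassandra
versions, so check them against your own `tpstats` output:

[source, bash]
----
# Every 5 seconds, print any threadpool with a non-zero Pending count.
while true; do
    date
    nodetool tpstats | awk 'NR > 1 && $3 ~ /^[0-9]+$/ && $3 > 0 {print $1, "pending:", $3}'
    sleep 5
done
----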

The second section shows drops (errors) and latency distributions for
all the major request types. Drops are cumulative since process start,
but having any at all can indicate a serious problem, as the default
timeouts to qualify as a drop are quite high (~5-10 seconds). Dropped
messages often warrant further investigation.
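
A quick way to spot non-zero drop counters in that second section (again
only a sketch, since the exact layout varies a little between versions):

[source, bash]
----
# Print any message type with a non-zero Dropped count.
nodetool tpstats | awk '/^[A-Z_]+ / && $2 > 0 {print $1, "dropped:", $2}'
----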

[[nodetool-compactionstats]]
== Compaction State

As Cassandra is an LSM datastore, it sometimes has to compact sstables
together, which can have adverse effects on performance. In particular,
compaction uses a reasonable quantity of CPU resources, invalidates
large quantities of the OS
https://en.wikipedia.org/wiki/Page_cache[page cache], and can put a lot
of load on your disk drives. There are great OS tools, such as `iostat`,
to determine if this is the case, but often it's a good idea to check
whether compactions are even running using `nodetool compactionstats`:

[source, bash]
----
$ nodetool compactionstats
pending tasks: 2
- keyspace.table: 2

id                                    compaction type  keyspace  table  completed  total     unit   progress
2062b290-7f3a-11e8-9358-cd941b956e60  Compaction       keyspace  table  21848273   97867583  bytes  22.32%
Active compaction remaining time : 0h00m04s
----

In this case there is a single compaction running on the
`keyspace.table` table, which has completed 21.8 megabytes of 97.9, and
Cassandra estimates (based on the configured compaction throughput) that
this will take 4 seconds. You can also pass `-H` to get the units in a
human readable format.
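
For example, the same view with sizes reported in KiB/MiB/GiB instead of
raw bytes:

[source, bash]
----
# Same information, but sizes in human readable units.
nodetool compactionstats -H
----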

Generally each running compaction can consume a single core, but the
more you do in parallel the faster data compacts. Compaction is crucial
to ensuring good read performance, so it is important to strike a
balance: enough concurrent compactions that they complete quickly, but
not so many that they take resources away from query threads. If you
notice compaction is unable to keep up, try tuning Cassandra's
`concurrent_compactors` or `compaction_throughput` options, as sketched
below.
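
Both settings live in `cassandra.yaml`, but they can also be adjusted at
runtime via `nodetool` without a restart (the change does not persist
across restarts, and which subcommands are available depends on your
Cassandra version). A sketch with example values:

[source, bash]
----
# Check the current compaction throughput cap, then raise it to 64 MB/s.
nodetool getcompactionthroughput
nodetool setcompactionthroughput 64

# On newer versions you can also adjust the number of compaction threads.
nodetool setconcurrentcompactors 4
----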