.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at
..
.. http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS,
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. See the License for the specific language governing permissions and
.. limitations under the License.
.. _use-nodetool:
Use Nodetool
============
Cassandra's ``nodetool`` allows you to narrow problems from the cluster down
to a particular node and gives a lot of insight into the state of the Cassandra
process itself. There are dozens of useful commands (see ``nodetool help``
for the full list), but briefly, here are some of the most useful for troubleshooting:
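For example, you can list every command, or print the detailed help for a
single command such as ``tablehistograms``, with::

    $ nodetool help
    $ nodetool help tablehistograms
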
.. _nodetool-status:
Cluster Status
--------------
You can use ``nodetool status`` to assess the status of the cluster::
$ nodetool status <optional keyspace>
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 127.0.1.1 4.69 GiB 1 100.0% 35ea8c9f-b7a2-40a7-b9c5-0ee8b91fdd0e r1
UN 127.0.1.2 4.71 GiB 1 100.0% 752e278f-b7c5-4f58-974b-9328455af73f r2
UN 127.0.1.3 4.69 GiB 1 100.0% 9dc1a293-2cc0-40fa-a6fd-9e6054da04a7 r3
In this case we can see that we have three nodes in one datacenter, each holding
about 4.7 GiB of data, and all of them are "up". The up/down status of a node is
determined independently by every node in the cluster, so you may have to run
``nodetool status`` on multiple nodes in a cluster to see the full view.
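A minimal shell sketch for collecting that view from every node, assuming a
hypothetical ``hosts.txt`` file with one node address per line and that
``nodetool`` can reach each node's JMX port, might look like::

    $ for host in $(cat hosts.txt); do
          echo "== status as seen by $host =="
          nodetool -h "$host" status    # -h/--host targets a remote node's JMX endpoint
      done
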
You can use ``nodetool status`` plus a little grep to see which nodes are
down::
$ nodetool status | grep -v '^UN'
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
DN 127.0.0.5 105.73 KiB 1 33.3% df303ac7-61de-46e9-ac79-6e630115fd75 r1
In this case there are two datacenters, and one node is down in datacenter
``dc2``, rack ``r1``. This may indicate an issue on ``127.0.0.5`` that warrants
investigation.
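If you only want the down nodes themselves (the lines whose status starts with
``DN``), a slightly tighter filter is::

    $ nodetool status | grep '^DN'
    DN 127.0.0.5 105.73 KiB 1 33.3% df303ac7-61de-46e9-ac79-6e630115fd75 r1
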
.. _nodetool-proxyhistograms:
Coordinator Query Latency
-------------------------
You can view the latency distributions of coordinator reads and writes using
``nodetool proxyhistograms`` to help narrow down coordinator-level latency issues::
$ nodetool proxyhistograms
Percentile Read Latency Write Latency Range Latency CAS Read Latency CAS Write Latency View Write Latency
(micros) (micros) (micros) (micros) (micros) (micros)
50% 454.83 219.34 0.00 0.00 0.00 0.00
75% 545.79 263.21 0.00 0.00 0.00 0.00
95% 654.95 315.85 0.00 0.00 0.00 0.00
98% 785.94 379.02 0.00 0.00 0.00 0.00
99% 3379.39 2346.80 0.00 0.00 0.00 0.00
Min 42.51 105.78 0.00 0.00 0.00 0.00
Max 25109.16 43388.63 0.00 0.00 0.00 0.00
Here you can see the full latency distribution of reads, writes, range requests
(e.g. ``select * from keyspace.table``), CAS reads (the compare phase of compare
and set) and CAS writes (the set phase of compare and set). These can be useful
for narrowing down high-level latency problems. For example, in this case if a
client had a 20 millisecond timeout on their reads they might experience the
occasional timeout from this node, but on fewer than 1% of requests (since the
99% read latency is roughly 3.4 milliseconds < 20 milliseconds).
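To watch how these coordinator latencies change over time while you
investigate, one simple approach (assuming the standard ``watch`` utility is
available on the host) is to poll the command periodically::

    $ watch -d -n 30 nodetool proxyhistograms    # refresh every 30s and highlight changes
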
.. _nodetool-tablehistograms:
Local Query Latency
-------------------
If you know which table is having latency/error issues, you can use
``nodetool tablehistograms`` to get a better idea of what is happening
locally on a node::
$ nodetool tablehistograms keyspace table
Percentile SSTables Write Latency Read Latency Partition Size Cell Count
(micros) (micros) (bytes)
50% 0.00 73.46 182.79 17084 103
75% 1.00 88.15 315.85 17084 103
95% 2.00 126.93 545.79 17084 103
98% 2.00 152.32 654.95 17084 103
99% 2.00 182.79 785.94 17084 103
Min 0.00 42.51 24.60 14238 87
Max 2.00 12108.97 17436.92 17084 103
This shows you percentile breakdowns of several particularly critical metrics.
The first column shows how many sstables were read per logical read. A very
high number here indicates that you may have chosen the wrong compaction
strategy; e.g. ``SizeTieredCompactionStrategy`` typically reads many more
sstables per logical read than ``LeveledCompactionStrategy`` does for
update-heavy workloads.
The second column shows a breakdown of *local* write latency. In this case we
see that while the p50 is quite good at 73 microseconds, the maximum latency is
quite slow at 12 milliseconds. High maximum write latencies often indicate a
slow commitlog volume (slow to fsync) or large writes that quickly saturate
commitlog segments.
The third column shows a breakdown of *local* read latency. We can see that
local Cassandra reads are (as expected) slower than local writes, and that read
latency correlates strongly with the number of sstables read per read.
The fourth and fifth columns show distributions of partition size and cell
count per partition. These are useful for determining whether the table has, on
average, skinny or wide partitions, and they can help you isolate bad data
patterns. For example, if you have a single cell that is 2 megabytes, that is
probably going to cause some heap pressure when it's read.
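To cross-check these distributions against per-table summary statistics (for
example the minimum, mean and maximum compacted partition size), one option is
``nodetool tablestats``; a quick sketch (the exact field names vary slightly by
version)::

    $ nodetool tablestats keyspace.table | grep -i partition
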
.. _nodetool-tpstats:
Threadpool State
----------------
You can use ``nodetool tpstats`` to view the current outstanding requests on
a particular node. This is useful for trying to find out which resource
(read threads, write threads, compaction, request response threads) the
Cassandra process lacks. For example::
$ nodetool tpstats
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 2 0 12 0 0
MiscStage 0 0 0 0 0
CompactionExecutor 0 0 1940 0 0
MutationStage 0 0 0 0 0
GossipStage 0 0 10293 0 0
Repair-Task 0 0 0 0 0
RequestResponseStage 0 0 16 0 0
ReadRepairStage 0 0 0 0 0
CounterMutationStage 0 0 0 0 0
MemtablePostFlush 0 0 83 0 0
ValidationExecutor 0 0 0 0 0
MemtableFlushWriter 0 0 30 0 0
ViewMutationStage 0 0 0 0 0
CacheCleanupExecutor 0 0 0 0 0
MemtableReclaimMemory 0 0 30 0 0
PendingRangeCalculator 0 0 11 0 0
SecondaryIndexManagement 0 0 0 0 0
HintsDispatcher 0 0 0 0 0
Native-Transport-Requests 0 0 192 0 0
MigrationStage 0 0 14 0 0
PerDiskMemtableFlushWriter_0 0 0 30 0 0
Sampler 0 0 0 0 0
ViewBuildExecutor 0 0 0 0 0
InternalResponseStage 0 0 0 0 0
AntiEntropyStage 0 0 0 0 0
Message type Dropped Latency waiting in queue (micros)
50% 95% 99% Max
READ 0 N/A N/A N/A N/A
RANGE_SLICE 0 0.00 0.00 0.00 0.00
_TRACE 0 N/A N/A N/A N/A
HINT 0 N/A N/A N/A N/A
MUTATION 0 N/A N/A N/A N/A
COUNTER_MUTATION 0 N/A N/A N/A N/A
BATCH_STORE 0 N/A N/A N/A N/A
BATCH_REMOVE 0 N/A N/A N/A N/A
REQUEST_RESPONSE 0 0.00 0.00 0.00 0.00
PAGED_RANGE 0 N/A N/A N/A N/A
READ_REPAIR 0 N/A N/A N/A N/A
This command shows you all kinds of interesting statistics. The first section
shows a detailed breakdown of threadpools for each Cassandra stage, including
how many threads are currently executing (Active) and how many are waiting to
run (Pending). Typically, if you see pending executions in a particular
threadpool, that indicates a problem localized to that type of operation. For
example, if the ``RequestResponseStage`` queue is backing up, that means the
coordinators are waiting on a lot of downstream replica requests, which may
indicate a lack of token awareness, or very high consistency levels being
used on read requests (for example, reading at ``ALL`` ties up RF
``RequestResponseStage`` threads whereas ``LOCAL_ONE`` only uses a single
thread in the ``ReadStage`` threadpool). On the other hand, if you see a lot of
pending compactions, that may indicate that your compaction threads cannot keep
up with the volume of writes and you may need to tune either the compaction
strategy or the ``concurrent_compactors`` or ``compaction_throughput`` options.
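Since the interesting signal in the first section is usually a non-zero Pending
count, a quick (if crude) filter that prints the header plus only the pools that
currently have work queued might look like this sketch (assuming a POSIX
``awk``)::

    $ nodetool tpstats | awk 'NR == 1 || ($3 ~ /^[0-9]+$/ && $3 + 0 > 0)'
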
The second section shows drops (errors) and latency distributions for all the
major request types. Drops are cumulative since process start, but any drops at
all may indicate a serious problem, as the default timeouts to qualify as a
drop are quite high (~5-10 seconds). Dropped messages often warrant further
investigation.
.. _nodetool-compactionstats:
Compaction State
----------------
As Cassandra is an LSM datastore, it sometimes has to compact sstables
together, which can have adverse effects on performance. In particular,
compaction uses a significant amount of CPU, invalidates large
quantities of the OS `page cache <https://en.wikipedia.org/wiki/Page_cache>`_,
and can put a lot of load on your disk drives. There are great
:ref:`os tools <os-iostat>` to determine if this is the case, but often it's a
good idea to check whether compactions are even running using
``nodetool compactionstats``::
$ nodetool compactionstats
pending tasks: 2
- keyspace.table: 2
id compaction type keyspace table completed total unit progress
2062b290-7f3a-11e8-9358-cd941b956e60 Compaction keyspace table 21848273 97867583 bytes 22.32%
Active compaction remaining time : 0h00m04s
In this case there is a single compaction running on the ``keyspace.table``
table, which has completed 21.8 megabytes of the 97.9 megabytes total, and
Cassandra estimates (based on the configured compaction throughput) that it
will take about 4 seconds to finish. You can also pass ``-H`` to get the units
in a human readable format.
Generally each running compaction can consume a single core, but the more
compactions you run in parallel the faster data compacts. Compaction is crucial
to good read performance, so striking the right balance matters: concurrent
compactions should complete quickly, but without taking too many resources away
from query threads. If you notice that compaction is unable to keep up, try
tuning Cassandra's ``concurrent_compactors`` or ``compaction_throughput``
options, for example as in the sketch below.
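A quick way to experiment with the throughput limit at runtime, without
restarting the node, is via ``nodetool`` (these changes are not persisted to
``cassandra.yaml``, so make them permanent there once you settle on a value)::

    $ nodetool getcompactionthroughput        # show the current limit in MB/s
    $ nodetool setcompactionthroughput 64     # raise the limit to 64 MB/s (0 disables throttling)
    $ nodetool setconcurrentcompactors 4      # if your version provides it, adjust compactor threads
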