| --- |
| layout: docpage |
| |
| title: "Documentation" |
| |
| is_homepage: false |
| is_sphinx_doc: true |
| |
| doc-parent: "Troubleshooting" |
| |
| doc-title: "Use Nodetool" |
| doc-header-links: ' |
| <link rel="top" title="Apache Cassandra Documentation v4.0-rc1" href="../index.html"/> |
| <link rel="up" title="Troubleshooting" href="index.html"/> |
| <link rel="next" title="Diving Deep, Use External Tools" href="use_tools.html"/> |
| <link rel="prev" title="Cassandra Logs" href="reading_logs.html"/> |
| ' |
| doc-search-path: "../search.html" |
| |
| extra-footer: ' |
| <script type="text/javascript"> |
| var DOCUMENTATION_OPTIONS = { |
| URL_ROOT: "", |
| VERSION: "", |
| COLLAPSE_INDEX: false, |
| FILE_SUFFIX: ".html", |
| HAS_SOURCE: false, |
| SOURCELINK_SUFFIX: ".txt" |
| }; |
| </script> |
| ' |
| |
| --- |
| <div class="container-fluid"> |
| <div class="row"> |
| <div class="col-md-3"> |
| <div class="doc-navigation"> |
| <div class="doc-menu" role="navigation"> |
| <div class="navbar-header"> |
| <button type="button" class="pull-left navbar-toggle" data-toggle="collapse" data-target=".sidebar-navbar-collapse"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| </div> |
| <div class="navbar-collapse collapse sidebar-navbar-collapse"> |
| <form id="doc-search-form" class="navbar-form" action="../search.html" method="get" role="search"> |
| <div class="form-group"> |
| <input type="text" size="30" class="form-control input-sm" name="q" placeholder="Search docs"> |
| <input type="hidden" name="check_keywords" value="yes" /> |
| <input type="hidden" name="area" value="default" /> |
| </div> |
| </form> |
| |
| |
| |
| <ul class="current"> |
| <li class="toctree-l1"><a class="reference internal" href="../getting_started/index.html">Getting Started</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../new/index.html">New Features in Apache Cassandra 4.0</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../architecture/index.html">Architecture</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../cql/index.html">The Cassandra Query Language (CQL)</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../data_modeling/index.html">Data Modeling</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../configuration/index.html">Configuring Cassandra</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../operating/index.html">Operating Cassandra</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../tools/index.html">Cassandra Tools</a></li> |
| <li class="toctree-l1 current"><a class="reference internal" href="index.html">Troubleshooting</a><ul class="current"> |
| <li class="toctree-l2"><a class="reference internal" href="finding_nodes.html">Find The Misbehaving Nodes</a></li> |
| <li class="toctree-l2"><a class="reference internal" href="reading_logs.html">Cassandra Logs</a></li> |
| <li class="toctree-l2 current"><a class="current reference internal" href="#">Use Nodetool</a><ul> |
| <li class="toctree-l3"><a class="reference internal" href="#cluster-status">Cluster Status</a></li> |
| <li class="toctree-l3"><a class="reference internal" href="#coordinator-query-latency">Coordinator Query Latency</a></li> |
| <li class="toctree-l3"><a class="reference internal" href="#local-query-latency">Local Query Latency</a></li> |
| <li class="toctree-l3"><a class="reference internal" href="#threadpool-state">Threadpool State</a></li> |
| <li class="toctree-l3"><a class="reference internal" href="#compaction-state">Compaction State</a></li> |
| </ul> |
| </li> |
| <li class="toctree-l2"><a class="reference internal" href="use_tools.html">Diving Deep, Use External Tools</a></li> |
| </ul> |
| </li> |
| <li class="toctree-l1"><a class="reference internal" href="../development/index.html">Contributing to Cassandra</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../faq/index.html">Frequently Asked Questions</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../plugins/index.html">Third-Party Plugins</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../bugs.html">Reporting Bugs</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../contactus.html">Contact us</a></li> |
| </ul> |
| |
| |
| |
| </div><!--/.nav-collapse --> |
| </div> |
| </div> |
| </div> |
| <div class="col-md-8"> |
| <div class="content doc-content"> |
| <div class="content-container"> |
| |
| <div class="section" id="use-nodetool"> |
| <span id="id1"></span><h1>Use Nodetool<a class="headerlink" href="#use-nodetool" title="Permalink to this headline">¶</a></h1> |
<p>Cassandra’s <code class="docutils literal notranslate"><span class="pre">nodetool</span></code> allows you to narrow a cluster-wide problem down
to a particular node and gives a lot of insight into the state of the Cassandra
process itself. There are dozens of useful commands (see <code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">help</span></code>
for the full list), but the following are some of the most useful for troubleshooting:</p>
| <div class="section" id="cluster-status"> |
| <span id="nodetool-status"></span><h2>Cluster Status<a class="headerlink" href="#cluster-status" title="Permalink to this headline">¶</a></h2> |
<p>You can use <code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">status</span></code> to assess the status of the cluster:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ nodetool status &lt;optional keyspace&gt;
| |
| Datacenter: dc1 |
| ======================= |
| Status=Up/Down |
| |/ State=Normal/Leaving/Joining/Moving |
| -- Address Load Tokens Owns (effective) Host ID Rack |
| UN 127.0.1.1 4.69 GiB 1 100.0% 35ea8c9f-b7a2-40a7-b9c5-0ee8b91fdd0e r1 |
| UN 127.0.1.2 4.71 GiB 1 100.0% 752e278f-b7c5-4f58-974b-9328455af73f r2 |
| UN 127.0.1.3 4.69 GiB 1 100.0% 9dc1a293-2cc0-40fa-a6fd-9e6054da04a7 r3 |
| </pre></div> |
| </div> |
<p>In this case we can see that we have three nodes in one datacenter with about
4.7 GiB of data each and they are all &#8220;up&#8221;. The up/down status of a node is
independently determined by every node in the cluster, so you may have to run
<code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">status</span></code> on multiple nodes in a cluster to see the full view.</p>
| <p>You can use <code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">status</span></code> plus a little grep to see which nodes are |
| down:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ nodetool status | grep -v '^UN' |
| Datacenter: dc1 |
| =============== |
| Status=Up/Down |
| |/ State=Normal/Leaving/Joining/Moving |
| -- Address Load Tokens Owns (effective) Host ID Rack |
| Datacenter: dc2 |
| =============== |
| Status=Up/Down |
| |/ State=Normal/Leaving/Joining/Moving |
| -- Address Load Tokens Owns (effective) Host ID Rack |
| DN 127.0.0.5 105.73 KiB 1 33.3% df303ac7-61de-46e9-ac79-6e630115fd75 r1 |
| </pre></div> |
| </div> |
| <p>In this case there are two datacenters and there is one node down in datacenter |
| <code class="docutils literal notranslate"><span class="pre">dc2</span></code> and rack <code class="docutils literal notranslate"><span class="pre">r1</span></code>. This may indicate an issue on <code class="docutils literal notranslate"><span class="pre">127.0.0.5</span></code> |
| warranting investigation.</p> |
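<p>Once you have found a down node you can probe it directly (assuming its JMX port
is still reachable) to distinguish a dead process from a transient gossip problem:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ nodetool -h 127.0.0.5 info   # a connection failure suggests the process itself is down
$ nodetool gossipinfo          # run from a live node: what gossip believes about each peer
</pre></div>
</div>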
| </div> |
| <div class="section" id="coordinator-query-latency"> |
| <span id="nodetool-proxyhistograms"></span><h2>Coordinator Query Latency<a class="headerlink" href="#coordinator-query-latency" title="Permalink to this headline">¶</a></h2> |
| <p>You can view latency distributions of coordinator read and write latency |
| to help narrow down latency issues using <code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">proxyhistograms</span></code>:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ nodetool proxyhistograms |
| Percentile Read Latency Write Latency Range Latency CAS Read Latency CAS Write Latency View Write Latency |
| (micros) (micros) (micros) (micros) (micros) (micros) |
| 50% 454.83 219.34 0.00 0.00 0.00 0.00 |
| 75% 545.79 263.21 0.00 0.00 0.00 0.00 |
| 95% 654.95 315.85 0.00 0.00 0.00 0.00 |
| 98% 785.94 379.02 0.00 0.00 0.00 0.00 |
| 99% 3379.39 2346.80 0.00 0.00 0.00 0.00 |
| Min 42.51 105.78 0.00 0.00 0.00 0.00 |
| Max 25109.16 43388.63 0.00 0.00 0.00 0.00 |
| </pre></div> |
| </div> |
<p>Here you can see the full latency distribution of reads, writes, range requests
(e.g. <code class="docutils literal notranslate"><span class="pre">select</span> <span class="pre">*</span> <span class="pre">from</span> <span class="pre">keyspace.table</span></code>), CAS reads (the compare phase of CAS) and
CAS writes (the set phase of compare and set). These can be useful for narrowing
down high-level latency problems. For example, if a client had a
20 millisecond timeout on its reads, it might experience the occasional
timeout from this node, but on fewer than 1% of requests (since the 99th
percentile read latency of 3.3 milliseconds is well under 20 milliseconds).</p>
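<p>A single snapshot can hide intermittent spikes, so it often helps to sample the
histograms repeatedly while reproducing the problem; a minimal sketch, assuming
the common <code class="docutils literal notranslate"><span class="pre">watch</span></code> utility is installed:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ # re-run every 5 seconds and highlight values that changed between samples
$ watch -d -n 5 nodetool proxyhistograms
</pre></div>
</div>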
| </div> |
| <div class="section" id="local-query-latency"> |
| <span id="nodetool-tablehistograms"></span><h2>Local Query Latency<a class="headerlink" href="#local-query-latency" title="Permalink to this headline">¶</a></h2> |
| <p>If you know which table is having latency/error issues, you can use |
| <code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">tablehistograms</span></code> to get a better idea of what is happening |
| locally on a node:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ nodetool tablehistograms keyspace table |
| Percentile SSTables Write Latency Read Latency Partition Size Cell Count |
| (micros) (micros) (bytes) |
| 50% 0.00 73.46 182.79 17084 103 |
| 75% 1.00 88.15 315.85 17084 103 |
| 95% 2.00 126.93 545.79 17084 103 |
| 98% 2.00 152.32 654.95 17084 103 |
| 99% 2.00 182.79 785.94 17084 103 |
| Min 0.00 42.51 24.60 14238 87 |
| Max 2.00 12108.97 17436.92 17084 103 |
| </pre></div> |
| </div> |
<p>This shows you percentile breakdowns of several particularly critical metrics.</p>
<p>The first column contains how many sstables were read per logical read. A very
high number here indicates that you may have chosen the wrong compaction
strategy, e.g. <code class="docutils literal notranslate"><span class="pre">SizeTieredCompactionStrategy</span></code> typically reads many more sstables
per logical read than <code class="docutils literal notranslate"><span class="pre">LeveledCompactionStrategy</span></code> does for update-heavy workloads.</p>
| <p>The second column shows you a latency breakdown of <em>local</em> write latency. In |
| this case we see that while the p50 is quite good at 73 microseconds, the |
| maximum latency is quite slow at 12 milliseconds. High write max latencies |
| often indicate a slow commitlog volume (slow to fsync) or large writes |
| that quickly saturate commitlog segments.</p> |
| <p>The third column shows you a latency breakdown of <em>local</em> read latency. We can |
| see that local Cassandra reads are (as expected) slower than local writes, and |
| the read speed correlates highly with the number of sstables read per read.</p> |
<p>The fourth and fifth columns show distributions of partition size and cell
count per partition. These are useful for determining if the table has on
average skinny or wide partitions and can help you isolate bad data patterns.
For example if you have a single cell that is 2 megabytes, that is probably
going to cause some heap pressure when it’s read.</p>
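<p>If the partition size distribution looks suspicious, Cassandra also logs a warning
whenever compaction writes a partition larger than
<code class="docutils literal notranslate"><span class="pre">compaction_large_partition_warning_threshold_mb</span></code>, so you can find the
offending partition keys in the logs (assuming the default log location):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ grep -i 'large partition' /var/log/cassandra/system.log
</pre></div>
</div>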
| </div> |
| <div class="section" id="threadpool-state"> |
| <span id="nodetool-tpstats"></span><h2>Threadpool State<a class="headerlink" href="#threadpool-state" title="Permalink to this headline">¶</a></h2> |
| <p>You can use <code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">tpstats</span></code> to view the current outstanding requests on |
| a particular node. This is useful for trying to find out which resource |
| (read threads, write threads, compaction, request response threads) the |
| Cassandra process lacks. For example:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ nodetool tpstats |
| Pool Name Active Pending Completed Blocked All time blocked |
| ReadStage 2 0 12 0 0 |
| MiscStage 0 0 0 0 0 |
| CompactionExecutor 0 0 1940 0 0 |
| MutationStage 0 0 0 0 0 |
| GossipStage 0 0 10293 0 0 |
| Repair-Task 0 0 0 0 0 |
| RequestResponseStage 0 0 16 0 0 |
| ReadRepairStage 0 0 0 0 0 |
| CounterMutationStage 0 0 0 0 0 |
| MemtablePostFlush 0 0 83 0 0 |
| ValidationExecutor 0 0 0 0 0 |
| MemtableFlushWriter 0 0 30 0 0 |
| ViewMutationStage 0 0 0 0 0 |
| CacheCleanupExecutor 0 0 0 0 0 |
| MemtableReclaimMemory 0 0 30 0 0 |
| PendingRangeCalculator 0 0 11 0 0 |
| SecondaryIndexManagement 0 0 0 0 0 |
| HintsDispatcher 0 0 0 0 0 |
| Native-Transport-Requests 0 0 192 0 0 |
| MigrationStage 0 0 14 0 0 |
| PerDiskMemtableFlushWriter_0 0 0 30 0 0 |
| Sampler 0 0 0 0 0 |
| ViewBuildExecutor 0 0 0 0 0 |
| InternalResponseStage 0 0 0 0 0 |
| AntiEntropyStage 0 0 0 0 0 |
| |
| Message type Dropped Latency waiting in queue (micros) |
| 50% 95% 99% Max |
| READ 0 N/A N/A N/A N/A |
| RANGE_SLICE 0 0.00 0.00 0.00 0.00 |
| _TRACE 0 N/A N/A N/A N/A |
| HINT 0 N/A N/A N/A N/A |
| MUTATION 0 N/A N/A N/A N/A |
| COUNTER_MUTATION 0 N/A N/A N/A N/A |
| BATCH_STORE 0 N/A N/A N/A N/A |
| BATCH_REMOVE 0 N/A N/A N/A N/A |
| REQUEST_RESPONSE 0 0.00 0.00 0.00 0.00 |
| PAGED_RANGE 0 N/A N/A N/A N/A |
| READ_REPAIR 0 N/A N/A N/A N/A |
| </pre></div> |
| </div> |
<p>This command shows you all kinds of interesting statistics. The first section
shows a detailed breakdown of threadpools for each Cassandra stage, including
how many threads are currently executing (Active) and how many are waiting to
run (Pending). Typically, if you see pending executions in a particular
threadpool, that indicates a problem localized to that type of operation. For
example if the <code class="docutils literal notranslate"><span class="pre">RequestResponseStage</span></code> queue is backing up, that means
that the coordinators are waiting on a lot of downstream replica requests and
may indicate a lack of token awareness, or very high consistency levels being
used on read requests (for example reading at <code class="docutils literal notranslate"><span class="pre">ALL</span></code> ties up RF
<code class="docutils literal notranslate"><span class="pre">RequestResponseStage</span></code> threads whereas <code class="docutils literal notranslate"><span class="pre">LOCAL_ONE</span></code> only uses a single
thread in the <code class="docutils literal notranslate"><span class="pre">ReadStage</span></code> threadpool). On the other hand if you see a lot of
pending compactions that may indicate that your compaction threads cannot keep
up with the volume of writes and you may need to tune either the compaction
strategy or the <code class="docutils literal notranslate"><span class="pre">concurrent_compactors</span></code> or <code class="docutils literal notranslate"><span class="pre">compaction_throughput</span></code> options.</p>
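<p>Because Active and Pending are point-in-time snapshots, a growing backlog is
easiest to spot by sampling a few key pools over time; a minimal sketch (pool
names as they appear in the output above):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ while true; do date +%T; nodetool tpstats | grep -E 'ReadStage|MutationStage|Native-Transport-Requests'; sleep 5; done
</pre></div>
</div>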
<p>The second section shows drops (errors) and latency distributions for all the
major request types. Drops are cumulative since process start, and if you have
any they indicate a serious problem, as the default timeouts to qualify as a
drop are quite high (~5-10 seconds). Dropped messages often warrant further
investigation.</p>
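<p>When you do see drops, the logs usually record them as they happen, which makes
it easier to correlate drops with garbage collection pauses or overload; for
example (assuming the default log location):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ grep -i 'messages were dropped' /var/log/cassandra/system.log | tail
</pre></div>
</div>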
| </div> |
| <div class="section" id="compaction-state"> |
| <span id="nodetool-compactionstats"></span><h2>Compaction State<a class="headerlink" href="#compaction-state" title="Permalink to this headline">¶</a></h2> |
<p>As Cassandra is an LSM datastore, it sometimes has to compact sstables
together, which can have adverse effects on performance. In particular,
compaction uses a reasonable quantity of CPU resources, invalidates large
quantities of the OS <a class="reference external" href="https://en.wikipedia.org/wiki/Page_cache">page cache</a>,
and can put a lot of load on your disk drives. There are great
<a class="reference internal" href="use_tools.html#os-iostat"><span class="std std-ref">os tools</span></a> to determine if this is the case, but often it’s a
good idea to check if compactions are even running using
<code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">compactionstats</span></code>:</p>
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ nodetool compactionstats |
| pending tasks: 2 |
| - keyspace.table: 2 |
| |
| id compaction type keyspace table completed total unit progress |
| 2062b290-7f3a-11e8-9358-cd941b956e60 Compaction keyspace table 21848273 97867583 bytes 22.32% |
| Active compaction remaining time : 0h00m04s |
| </pre></div> |
| </div> |
<p>In this case there is a single compaction running on the <code class="docutils literal notranslate"><span class="pre">keyspace.table</span></code>
table, which has completed 21.8 megabytes of 97.9, and Cassandra estimates (based on
the configured compaction throughput) that it will take 4 seconds to finish. You can
also pass <code class="docutils literal notranslate"><span class="pre">-H</span></code> to get the units in a human readable format.</p>
<p>Generally each running compaction can consume a single core, but the more
you do in parallel the faster data compacts. Compaction is crucial to ensuring
good read performance, so it is important to strike a balance where concurrent
compactions complete quickly but do not take too many resources away from
query threads. If you notice that compaction is unable to keep up, try tuning
Cassandra’s <code class="docutils literal notranslate"><span class="pre">concurrent_compactors</span></code>
or <code class="docutils literal notranslate"><span class="pre">compaction_throughput</span></code> options.</p>
| </div> |
| </div> |
| |
| |
| |
| |
| <div class="doc-prev-next-links" role="navigation" aria-label="footer navigation"> |
| |
| <a href="use_tools.html" class="btn btn-default pull-right " role="button" title="Diving Deep, Use External Tools" accesskey="n">Next <span class="glyphicon glyphicon-circle-arrow-right" aria-hidden="true"></span></a> |
| |
| |
| <a href="reading_logs.html" class="btn btn-default" role="button" title="Cassandra Logs" accesskey="p"><span class="glyphicon glyphicon-circle-arrow-left" aria-hidden="true"></span> Previous</a> |
| |
| </div> |
| |
| </div> |
| </div> |
| </div> |
| </div> |
| </div> |