| --- |
| layout: docpage |
| |
| title: "Documentation" |
| |
| is_homepage: false |
| is_sphinx_doc: true |
| |
| doc-parent: "Troubleshooting" |
| |
| doc-title: "Use Nodetool" |
| doc-header-links: ' |
| <link rel="top" title="Apache Cassandra Documentation v4.0-rc1" href="../index.html"/> |
| <link rel="up" title="Troubleshooting" href="index.html"/> |
| <link rel="next" title="Diving Deep, Use External Tools" href="use_tools.html"/> |
| <link rel="prev" title="Cassandra Logs" href="reading_logs.html"/> |
| ' |
| doc-search-path: "../search.html" |
| |
| extra-footer: ' |
| <script type="text/javascript"> |
| var DOCUMENTATION_OPTIONS = { |
| URL_ROOT: "", |
| VERSION: "", |
| COLLAPSE_INDEX: false, |
| FILE_SUFFIX: ".html", |
| HAS_SOURCE: false, |
| SOURCELINK_SUFFIX: ".txt" |
| }; |
| </script> |
| ' |
| |
| --- |
| <div class="container-fluid"> |
| <div class="row"> |
| <div class="col-md-3"> |
| <div class="doc-navigation"> |
| <div class="doc-menu" role="navigation"> |
| <div class="navbar-header"> |
| <button type="button" class="pull-left navbar-toggle" data-toggle="collapse" data-target=".sidebar-navbar-collapse"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| </div> |
| <div class="navbar-collapse collapse sidebar-navbar-collapse"> |
| <form id="doc-search-form" class="navbar-form" action="../search.html" method="get" role="search"> |
| <div class="form-group"> |
| <input type="text" size="30" class="form-control input-sm" name="q" placeholder="Search docs"> |
| <input type="hidden" name="check_keywords" value="yes" /> |
| <input type="hidden" name="area" value="default" /> |
| </div> |
| </form> |
| |
| |
| |
| <ul class="current"> |
| <li class="toctree-l1"><a class="reference internal" href="../getting_started/index.html">Getting Started</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../new/index.html">New Features in Apache Cassandra 4.0</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../architecture/index.html">Architecture</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../cql/index.html">The Cassandra Query Language (CQL)</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../data_modeling/index.html">Data Modeling</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../configuration/index.html">Configuring Cassandra</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../operating/index.html">Operating Cassandra</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../tools/index.html">Cassandra Tools</a></li> |
| <li class="toctree-l1 current"><a class="reference internal" href="index.html">Troubleshooting</a><ul class="current"> |
| <li class="toctree-l2"><a class="reference internal" href="finding_nodes.html">Find The Misbehaving Nodes</a></li> |
| <li class="toctree-l2"><a class="reference internal" href="reading_logs.html">Cassandra Logs</a></li> |
| <li class="toctree-l2 current"><a class="current reference internal" href="#">Use Nodetool</a><ul> |
| <li class="toctree-l3"><a class="reference internal" href="#cluster-status">Cluster Status</a></li> |
| <li class="toctree-l3"><a class="reference internal" href="#coordinator-query-latency">Coordinator Query Latency</a></li> |
| <li class="toctree-l3"><a class="reference internal" href="#local-query-latency">Local Query Latency</a></li> |
| <li class="toctree-l3"><a class="reference internal" href="#threadpool-state">Threadpool State</a></li> |
| <li class="toctree-l3"><a class="reference internal" href="#compaction-state">Compaction State</a></li> |
| </ul> |
| </li> |
| <li class="toctree-l2"><a class="reference internal" href="use_tools.html">Diving Deep, Use External Tools</a></li> |
| </ul> |
| </li> |
| <li class="toctree-l1"><a class="reference internal" href="../development/index.html">Contributing to Cassandra</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../faq/index.html">Frequently Asked Questions</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../plugins/index.html">Third-Party Plugins</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../bugs.html">Reporting Bugs</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../contactus.html">Contact us</a></li> |
| </ul> |
| |
| |
| |
| </div><!--/.nav-collapse --> |
| </div> |
| </div> |
| </div> |
| <div class="col-md-8"> |
| <div class="content doc-content"> |
| <div class="content-container"> |
| |
| <div class="section" id="use-nodetool"> |
| <span id="id1"></span><h1>Use Nodetool<a class="headerlink" href="#use-nodetool" title="Permalink to this headline">¶</a></h1> |
<p>Cassandra’s <code class="docutils literal notranslate"><span class="pre">nodetool</span></code> allows you to narrow a cluster-wide problem down
to a particular node and gives a lot of insight into the state of the Cassandra
process itself. There are dozens of useful commands (see <code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">help</span></code>
for the full list), but the following are some of the most useful for troubleshooting:</p>
| <div class="section" id="cluster-status"> |
| <span id="nodetool-status"></span><h2>Cluster Status<a class="headerlink" href="#cluster-status" title="Permalink to this headline">¶</a></h2> |
<p>You can use <code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">status</span></code> to assess the status of the cluster:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ nodetool status &lt;optional keyspace&gt;
| |
| Datacenter: dc1 |
| ======================= |
| Status=Up/Down |
| |/ State=Normal/Leaving/Joining/Moving |
| -- Address Load Tokens Owns (effective) Host ID Rack |
| UN 127.0.1.1 4.69 GiB 1 100.0% 35ea8c9f-b7a2-40a7-b9c5-0ee8b91fdd0e r1 |
| UN 127.0.1.2 4.71 GiB 1 100.0% 752e278f-b7c5-4f58-974b-9328455af73f r2 |
| UN 127.0.1.3 4.69 GiB 1 100.0% 9dc1a293-2cc0-40fa-a6fd-9e6054da04a7 r3 |
| </pre></div> |
| </div> |
<p>In this case we can see that we have three nodes in one datacenter with about
4.7 GiB of data each and they are all &#8220;up&#8221;. The up/down status of a node is
independently determined by every node in the cluster, so you may have to run
<code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">status</span></code> on multiple nodes in a cluster to see the full view.</p>
| <p>You can use <code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">status</span></code> plus a little grep to see which nodes are |
| down:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ nodetool status | grep -v '^UN' |
| Datacenter: dc1 |
| =============== |
| Status=Up/Down |
| |/ State=Normal/Leaving/Joining/Moving |
| -- Address Load Tokens Owns (effective) Host ID Rack |
| Datacenter: dc2 |
| =============== |
| Status=Up/Down |
| |/ State=Normal/Leaving/Joining/Moving |
| -- Address Load Tokens Owns (effective) Host ID Rack |
| DN 127.0.0.5 105.73 KiB 1 33.3% df303ac7-61de-46e9-ac79-6e630115fd75 r1 |
| </pre></div> |
| </div> |
| <p>In this case there are two datacenters and there is one node down in datacenter |
| <code class="docutils literal notranslate"><span class="pre">dc2</span></code> and rack <code class="docutils literal notranslate"><span class="pre">r1</span></code>. This may indicate an issue on <code class="docutils literal notranslate"><span class="pre">127.0.0.5</span></code> |
| warranting investigation.</p> |
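<p>Once you have found a down node you can probe it directly (assuming its JMX port
is still reachable) to distinguish a dead process from a transient gossip problem:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ nodetool -h 127.0.0.5 info   # a connection failure suggests the process itself is down
$ nodetool gossipinfo          # run from a live node: what gossip believes about each peer
</pre></div>
</div>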
| </div> |
| <div class="section" id="coordinator-query-latency"> |
| <span id="nodetool-proxyhistograms"></span><h2>Coordinator Query Latency<a class="headerlink" href="#coordinator-query-latency" title="Permalink to this headline">¶</a></h2> |
| <p>You can view latency distributions of coordinator read and write latency |
| to help narrow down latency issues using <code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">proxyhistograms</span></code>:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ nodetool proxyhistograms |
| Percentile Read Latency Write Latency Range Latency CAS Read Latency CAS Write Latency View Write Latency |
| (micros) (micros) (micros) (micros) (micros) (micros) |
| 50% 454.83 219.34 0.00 0.00 0.00 0.00 |
| 75% 545.79 263.21 0.00 0.00 0.00 0.00 |
| 95% 654.95 315.85 0.00 0.00 0.00 0.00 |
| 98% 785.94 379.02 0.00 0.00 0.00 0.00 |
| 99% 3379.39 2346.80 0.00 0.00 0.00 0.00 |
| Min 42.51 105.78 0.00 0.00 0.00 0.00 |
| Max 25109.16 43388.63 0.00 0.00 0.00 0.00 |
| </pre></div> |
| </div> |
<p>Here you can see the full latency distribution of reads, writes, range requests
(e.g. <code class="docutils literal notranslate"><span class="pre">select</span> <span class="pre">*</span> <span class="pre">from</span> <span class="pre">keyspace.table</span></code>), CAS reads (the compare phase of CAS) and
CAS writes (the set phase of compare and set). These can be useful for narrowing
down high-level latency problems. For example, if a client had a
20 millisecond timeout on its reads, it might experience the occasional
timeout from this node, but on fewer than 1% of requests (since the 99th
percentile read latency of 3.3 milliseconds is well under 20 milliseconds).</p>
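<p>A single snapshot can hide intermittent spikes, so it often helps to sample the
histograms repeatedly while reproducing the problem; a minimal sketch, assuming
the common <code class="docutils literal notranslate"><span class="pre">watch</span></code> utility is installed:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ # re-run every 5 seconds and highlight values that changed between samples
$ watch -d -n 5 nodetool proxyhistograms
</pre></div>
</div>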
| </div> |
| <div class="section" id="local-query-latency"> |
| <span id="nodetool-tablehistograms"></span><h2>Local Query Latency<a class="headerlink" href="#local-query-latency" title="Permalink to this headline">¶</a></h2> |
| <p>If you know which table is having latency/error issues, you can use |
| <code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">tablehistograms</span></code> to get a better idea of what is happening |
| locally on a node:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ nodetool tablehistograms keyspace table |
| Percentile SSTables Write Latency Read Latency Partition Size Cell Count |
| (micros) (micros) (bytes) |
| 50% 0.00 73.46 182.79 17084 103 |
| 75% 1.00 88.15 315.85 17084 103 |
| 95% 2.00 126.93 545.79 17084 103 |
| 98% 2.00 152.32 654.95 17084 103 |
| 99% 2.00 182.79 785.94 17084 103 |
| Min 0.00 42.51 24.60 14238 87 |
| Max 2.00 12108.97 17436.92 17084 103 |
| </pre></div> |
| </div> |
<p>This shows you percentile breakdowns of several particularly critical metrics.</p>
<p>The first column contains how many sstables were read per logical read. A very
high number here indicates that you may have chosen the wrong compaction
strategy, e.g. <code class="docutils literal notranslate"><span class="pre">SizeTieredCompactionStrategy</span></code> typically reads many more sstables
per logical read than <code class="docutils literal notranslate"><span class="pre">LeveledCompactionStrategy</span></code> does for update-heavy workloads.</p>
| <p>The second column shows you a latency breakdown of <em>local</em> write latency. In |
| this case we see that while the p50 is quite good at 73 microseconds, the |
| maximum latency is quite slow at 12 milliseconds. High write max latencies |
| often indicate a slow commitlog volume (slow to fsync) or large writes |
| that quickly saturate commitlog segments.</p> |
| <p>The third column shows you a latency breakdown of <em>local</em> read latency. We can |
| see that local Cassandra reads are (as expected) slower than local writes, and |
| the read speed correlates highly with the number of sstables read per read.</p> |
<p>The fourth and fifth columns show distributions of partition size and cell
count per partition. These are useful for determining if the table has on
average skinny or wide partitions and can help you isolate bad data patterns.
For example if you have a single cell that is 2 megabytes, that is probably
going to cause some heap pressure when it’s read.</p>
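<p>If the partition size distribution looks suspicious, Cassandra also logs a warning
whenever compaction writes a partition larger than
<code class="docutils literal notranslate"><span class="pre">compaction_large_partition_warning_threshold_mb</span></code>, so you can find the
offending partition keys in the logs (assuming the default log location):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ grep -i 'large partition' /var/log/cassandra/system.log
</pre></div>
</div>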
| </div> |
| <div class="section" id="threadpool-state"> |
| <span id="nodetool-tpstats"></span><h2>Threadpool State<a class="headerlink" href="#threadpool-state" title="Permalink to this headline">¶</a></h2> |
| <p>You can use <code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">tpstats</span></code> to view the current outstanding requests on |
| a particular node. This is useful for trying to find out which resource |
| (read threads, write threads, compaction, request response threads) the |
| Cassandra process lacks. For example:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ nodetool tpstats |
| Pool Name Active Pending Completed Blocked All time blocked |
| ReadStage 2 0 12 0 0 |
| MiscStage 0 0 0 0 0 |
| CompactionExecutor 0 0 1940 0 0 |
| MutationStage 0 0 0 0 0 |
| GossipStage 0 0 10293 0 0 |
| Repair-Task 0 0 0 0 0 |
| RequestResponseStage 0 0 16 0 0 |
| ReadRepairStage 0 0 0 0 0 |
| CounterMutationStage 0 0 0 0 0 |
| MemtablePostFlush 0 0 83 0 0 |
| ValidationExecutor 0 0 0 0 0 |
| MemtableFlushWriter 0 0 30 0 0 |
| ViewMutationStage 0 0 0 0 0 |
| CacheCleanupExecutor 0 0 0 0 0 |
| MemtableReclaimMemory 0 0 30 0 0 |
| PendingRangeCalculator 0 0 11 0 0 |
| SecondaryIndexManagement 0 0 0 0 0 |
| HintsDispatcher 0 0 0 0 0 |
| Native-Transport-Requests 0 0 192 0 0 |
| MigrationStage 0 0 14 0 0 |
| PerDiskMemtableFlushWriter_0 0 0 30 0 0 |
| Sampler 0 0 0 0 0 |
| ViewBuildExecutor 0 0 0 0 0 |
| InternalResponseStage 0 0 0 0 0 |
| AntiEntropyStage 0 0 0 0 0 |
| |
| Message type Dropped Latency waiting in queue (micros) |
| 50% 95% 99% Max |
| READ 0 N/A N/A N/A N/A |
| RANGE_SLICE 0 0.00 0.00 0.00 0.00 |
| _TRACE 0 N/A N/A N/A N/A |
| HINT 0 N/A N/A N/A N/A |
| MUTATION 0 N/A N/A N/A N/A |
| COUNTER_MUTATION 0 N/A N/A N/A N/A |
| BATCH_STORE 0 N/A N/A N/A N/A |
| BATCH_REMOVE 0 N/A N/A N/A N/A |
| REQUEST_RESPONSE 0 0.00 0.00 0.00 0.00 |
| PAGED_RANGE 0 N/A N/A N/A N/A |
| READ_REPAIR 0 N/A N/A N/A N/A |
| </pre></div> |
| </div> |
<p>This command shows you all kinds of interesting statistics. The first section
shows a detailed breakdown of threadpools for each Cassandra stage, including
how many threads are currently executing (Active) and how many are waiting to
run (Pending). Typically, if you see pending executions in a particular
threadpool, that indicates a problem localized to that type of operation. For
example if the <code class="docutils literal notranslate"><span class="pre">RequestResponseStage</span></code> queue is backing up, that means
that the coordinators are waiting on a lot of downstream replica requests and
may indicate a lack of token awareness, or very high consistency levels being
used on read requests (for example reading at <code class="docutils literal notranslate"><span class="pre">ALL</span></code> ties up RF
<code class="docutils literal notranslate"><span class="pre">RequestResponseStage</span></code> threads whereas <code class="docutils literal notranslate"><span class="pre">LOCAL_ONE</span></code> only uses a single
thread in the <code class="docutils literal notranslate"><span class="pre">ReadStage</span></code> threadpool). On the other hand if you see a lot of
pending compactions that may indicate that your compaction threads cannot keep
up with the volume of writes and you may need to tune either the compaction
strategy or the <code class="docutils literal notranslate"><span class="pre">concurrent_compactors</span></code> or <code class="docutils literal notranslate"><span class="pre">compaction_throughput</span></code> options.</p>
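<p>Because Active and Pending are point-in-time snapshots, a growing backlog is
easiest to spot by sampling a few key pools over time; a minimal sketch (pool
names as they appear in the output above):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ while true; do date +%T; nodetool tpstats | grep -E 'ReadStage|MutationStage|Native-Transport-Requests'; sleep 5; done
</pre></div>
</div>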
<p>The second section shows drops (errors) and latency distributions for all the
major request types. Drops are cumulative since process start, and if you have
any they indicate a serious problem, as the default timeouts to qualify as a
drop are quite high (~5-10 seconds). Dropped messages often warrant further
investigation.</p>
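<p>When you do see drops, the logs usually record them as they happen, which makes
it easier to correlate drops with garbage collection pauses or overload; for
example (assuming the default log location):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ grep -i 'messages were dropped' /var/log/cassandra/system.log | tail
</pre></div>
</div>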
| </div> |
| <div class="section" id="compaction-state"> |
| <span id="nodetool-compactionstats"></span><h2>Compaction State<a class="headerlink" href="#compaction-state" title="Permalink to this headline">¶</a></h2> |
<p>As Cassandra is an LSM datastore, it sometimes has to compact sstables
together, which can have adverse effects on performance. In particular,
compaction uses a reasonable quantity of CPU resources, invalidates large
quantities of the OS <a class="reference external" href="https://en.wikipedia.org/wiki/Page_cache">page cache</a>,
and can put a lot of load on your disk drives. There are great
<a class="reference internal" href="use_tools.html#os-iostat"><span class="std std-ref">os tools</span></a> to determine if this is the case, but often it’s a
good idea to check if compactions are even running using
<code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">compactionstats</span></code>:</p>
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ nodetool compactionstats |
| pending tasks: 2 |
| - keyspace.table: 2 |
| |
| id compaction type keyspace table completed total unit progress |
| 2062b290-7f3a-11e8-9358-cd941b956e60 Compaction keyspace table 21848273 97867583 bytes 22.32% |
| Active compaction remaining time : 0h00m04s |
| </pre></div> |
| </div> |
<p>In this case there is a single compaction running on the <code class="docutils literal notranslate"><span class="pre">keyspace.table</span></code>
table, which has completed 21.8 megabytes of 97.9, and Cassandra estimates (based on
the configured compaction throughput) that it will take 4 seconds to finish. You can
also pass <code class="docutils literal notranslate"><span class="pre">-H</span></code> to get the units in a human readable format.</p>
<p>Generally each running compaction can consume a single core, but the more
you do in parallel the faster data compacts. Compaction is crucial to ensuring
good read performance, so it is important to strike a balance where concurrent
compactions complete quickly but do not take too many resources away from
query threads. If you notice that compaction is unable to keep up, try tuning
Cassandra’s <code class="docutils literal notranslate"><span class="pre">concurrent_compactors</span></code>
or <code class="docutils literal notranslate"><span class="pre">compaction_throughput</span></code> options.</p>
| </div> |
| </div> |
| |
| |
| |
| |
| <div class="doc-prev-next-links" role="navigation" aria-label="footer navigation"> |
| |
| <a href="use_tools.html" class="btn btn-default pull-right " role="button" title="Diving Deep, Use External Tools" accesskey="n">Next <span class="glyphicon glyphicon-circle-arrow-right" aria-hidden="true"></span></a> |
| |
| |
| <a href="reading_logs.html" class="btn btn-default" role="button" title="Cassandra Logs" accesskey="p"><span class="glyphicon glyphicon-circle-arrow-left" aria-hidden="true"></span> Previous</a> |
| |
| </div> |
| |
| </div> |
| </div> |
| </div> |
| </div> |
| </div> |