| --- |
| layout: docpage |
| |
| title: "Documentation" |
| |
| is_homepage: false |
| is_sphinx_doc: true |
| |
| doc-parent: "Troubleshooting" |
| |
| doc-title: "Diving Deep, Use External Tools" |
| doc-header-links: ' |
| <link rel="top" title="Apache Cassandra Documentation v4.0-alpha3" href="../index.html"/> |
| <link rel="up" title="Troubleshooting" href="index.html"/> |
| <link rel="next" title="Contributing to Cassandra" href="../development/index.html"/> |
| <link rel="prev" title="Use Nodetool" href="use_nodetool.html"/> |
| ' |
| doc-search-path: "../search.html" |
| |
| extra-footer: ' |
| <script type="text/javascript"> |
| var DOCUMENTATION_OPTIONS = { |
| URL_ROOT: "", |
| VERSION: "", |
| COLLAPSE_INDEX: false, |
| FILE_SUFFIX: ".html", |
| HAS_SOURCE: false, |
| SOURCELINK_SUFFIX: ".txt" |
| }; |
| </script> |
| ' |
| |
| --- |
| <div class="container-fluid"> |
| <div class="row"> |
| <div class="col-md-3"> |
| <div class="doc-navigation"> |
| <div class="doc-menu" role="navigation"> |
| <div class="navbar-header"> |
| <button type="button" class="pull-left navbar-toggle" data-toggle="collapse" data-target=".sidebar-navbar-collapse"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| </div> |
| <div class="navbar-collapse collapse sidebar-navbar-collapse"> |
| <form id="doc-search-form" class="navbar-form" action="../search.html" method="get" role="search"> |
| <div class="form-group"> |
| <input type="text" size="30" class="form-control input-sm" name="q" placeholder="Search docs"> |
| <input type="hidden" name="check_keywords" value="yes" /> |
| <input type="hidden" name="area" value="default" /> |
| </div> |
| </form> |
| |
| |
| |
| <ul class="current"> |
| <li class="toctree-l1"><a class="reference internal" href="../getting_started/index.html">Getting Started</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../new/index.html">New Features in Apache Cassandra 4.0</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../architecture/index.html">Architecture</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../cql/index.html">The Cassandra Query Language (CQL)</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../data_modeling/index.html">Data Modeling</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../configuration/index.html">Configuring Cassandra</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../operating/index.html">Operating Cassandra</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../tools/index.html">Cassandra Tools</a></li> |
| <li class="toctree-l1 current"><a class="reference internal" href="index.html">Troubleshooting</a><ul class="current"> |
| <li class="toctree-l2"><a class="reference internal" href="finding_nodes.html">Find The Misbehaving Nodes</a></li> |
| <li class="toctree-l2"><a class="reference internal" href="reading_logs.html">Cassandra Logs</a></li> |
| <li class="toctree-l2"><a class="reference internal" href="use_nodetool.html">Use Nodetool</a></li> |
| <li class="toctree-l2 current"><a class="current reference internal" href="#">Diving Deep, Use External Tools</a><ul> |
| <li class="toctree-l3"><a class="reference internal" href="#jvm-tooling">JVM Tooling</a></li> |
| <li class="toctree-l3"><a class="reference internal" href="#basic-os-tooling">Basic OS Tooling</a></li> |
| <li class="toctree-l3"><a class="reference internal" href="#advanced-tools">Advanced tools</a></li> |
| </ul> |
| </li> |
| </ul> |
| </li> |
| <li class="toctree-l1"><a class="reference internal" href="../development/index.html">Contributing to Cassandra</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../faq/index.html">Frequently Asked Questions</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../plugins/index.html">Third-Party Plugins</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../bugs.html">Reporting Bugs</a></li> |
| <li class="toctree-l1"><a class="reference internal" href="../contactus.html">Contact us</a></li> |
| </ul> |
| |
| |
| |
| </div><!--/.nav-collapse --> |
| </div> |
| </div> |
| </div> |
| <div class="col-md-8"> |
| <div class="content doc-content"> |
| <div class="content-container"> |
| |
| <div class="section" id="diving-deep-use-external-tools"> |
| <span id="use-os-tools"></span><h1>Diving Deep, Use External Tools<a class="headerlink" href="#diving-deep-use-external-tools" title="Permalink to this headline">¶</a></h1> |
| <p>Machine access allows operators to dive even deeper than logs and <code class="docutils literal notranslate"><span class="pre">nodetool</span></code> |
| allow. While every Cassandra operator may have their own favorite toolset for |
| troubleshooting issues, this page covers some of the most common operator |
| techniques and examples of those tools. Many of these commands work only on |
| Linux, but if you are deploying on a different operating system you may have |
| access to substantially similar tools that assess the same OS-level metrics |
| and processes.</p> |
| <div class="section" id="jvm-tooling"> |
| <h2>JVM Tooling<a class="headerlink" href="#jvm-tooling" title="Permalink to this headline">¶</a></h2> |
| <p>The JVM ships with a number of useful tools. Some of them are useful for |
| debugging Cassandra issues, especially related to heap and execution stacks.</p> |
| <p><strong>NOTE</strong>: There are two common gotchas with JVM tooling and Cassandra:</p> |
| <ol class="arabic simple"> |
| <li>By default Cassandra ships with <code class="docutils literal notranslate"><span class="pre">-XX:+PerfDisableSharedMem</span></code> set to prevent |
| long pauses (see <code class="docutils literal notranslate"><span class="pre">CASSANDRA-9242</span></code> and <code class="docutils literal notranslate"><span class="pre">CASSANDRA-9483</span></code> for details). If |
| you want to use JVM tooling you can remove this flag and instead mount <code class="docutils literal notranslate"><span class="pre">/tmp</span></code> on an in |
| memory <code class="docutils literal notranslate"><span class="pre">tmpfs</span></code>, which also effectively works around <code class="docutils literal notranslate"><span class="pre">CASSANDRA-9242</span></code>.</li> |
| <li>Make sure you run the tools as the same user that Cassandra runs as: |
| if the database is running as <code class="docutils literal notranslate"><span class="pre">cassandra</span></code> the tool also has to be |
| run as <code class="docutils literal notranslate"><span class="pre">cassandra</span></code>, for example via <code class="docutils literal notranslate"><span class="pre">sudo</span> <span class="pre">-u</span> <span class="pre">cassandra</span> <span class="pre"><cmd></span></code>.</li> |
| </ol> |
| <div class="section" id="garbage-collection-state-jstat"> |
| <h3>Garbage Collection State (jstat)<a class="headerlink" href="#garbage-collection-state-jstat" title="Permalink to this headline">¶</a></h3> |
| <p>If you suspect heap pressure you can use <code class="docutils literal notranslate"><span class="pre">jstat</span></code> to dive deep into the |
| garbage collection state of a Cassandra process. This command is always |
| safe to run and yields detailed heap information, including eden heap usage (E), |
| old generation heap usage (O), count of eden collections (YGC), time spent in |
| eden collections (YGCT), old/mixed generation collections (FGC) and time spent |
| in old/mixed generation collections (FGCT):</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">jstat</span> <span class="o">-</span><span class="n">gcutil</span> <span class="o"><</span><span class="n">cassandra</span> <span class="n">pid</span><span class="o">></span> <span class="mi">500</span><span class="n">ms</span> |
| <span class="n">S0</span> <span class="n">S1</span> <span class="n">E</span> <span class="n">O</span> <span class="n">M</span> <span class="n">CCS</span> <span class="n">YGC</span> <span class="n">YGCT</span> <span class="n">FGC</span> <span class="n">FGCT</span> <span class="n">GCT</span> |
| <span class="mf">0.00</span> <span class="mf">0.00</span> <span class="mf">81.53</span> <span class="mf">31.16</span> <span class="mf">93.07</span> <span class="mf">88.20</span> <span class="mi">12</span> <span class="mf">0.151</span> <span class="mi">3</span> <span class="mf">0.257</span> <span class="mf">0.408</span> |
| <span class="mf">0.00</span> <span class="mf">0.00</span> <span class="mf">82.36</span> <span class="mf">31.16</span> <span class="mf">93.07</span> <span class="mf">88.20</span> <span class="mi">12</span> <span class="mf">0.151</span> <span class="mi">3</span> <span class="mf">0.257</span> <span class="mf">0.408</span> |
| <span class="mf">0.00</span> <span class="mf">0.00</span> <span class="mf">82.36</span> <span class="mf">31.16</span> <span class="mf">93.07</span> <span class="mf">88.20</span> <span class="mi">12</span> <span class="mf">0.151</span> <span class="mi">3</span> <span class="mf">0.257</span> <span class="mf">0.408</span> |
| <span class="mf">0.00</span> <span class="mf">0.00</span> <span class="mf">83.19</span> <span class="mf">31.16</span> <span class="mf">93.07</span> <span class="mf">88.20</span> <span class="mi">12</span> <span class="mf">0.151</span> <span class="mi">3</span> <span class="mf">0.257</span> <span class="mf">0.408</span> |
| <span class="mf">0.00</span> <span class="mf">0.00</span> <span class="mf">83.19</span> <span class="mf">31.16</span> <span class="mf">93.07</span> <span class="mf">88.20</span> <span class="mi">12</span> <span class="mf">0.151</span> <span class="mi">3</span> <span class="mf">0.257</span> <span class="mf">0.408</span> |
| <span class="mf">0.00</span> <span class="mf">0.00</span> <span class="mf">84.19</span> <span class="mf">31.16</span> <span class="mf">93.07</span> <span class="mf">88.20</span> <span class="mi">12</span> <span class="mf">0.151</span> <span class="mi">3</span> <span class="mf">0.257</span> <span class="mf">0.408</span> |
| <span class="mf">0.00</span> <span class="mf">0.00</span> <span class="mf">84.19</span> <span class="mf">31.16</span> <span class="mf">93.07</span> <span class="mf">88.20</span> <span class="mi">12</span> <span class="mf">0.151</span> <span class="mi">3</span> <span class="mf">0.257</span> <span class="mf">0.408</span> |
| <span class="mf">0.00</span> <span class="mf">0.00</span> <span class="mf">85.03</span> <span class="mf">31.16</span> <span class="mf">93.07</span> <span class="mf">88.20</span> <span class="mi">12</span> <span class="mf">0.151</span> <span class="mi">3</span> <span class="mf">0.257</span> <span class="mf">0.408</span> |
| <span class="mf">0.00</span> <span class="mf">0.00</span> <span class="mf">85.03</span> <span class="mf">31.16</span> <span class="mf">93.07</span> <span class="mf">88.20</span> <span class="mi">12</span> <span class="mf">0.151</span> <span class="mi">3</span> <span class="mf">0.257</span> <span class="mf">0.408</span> |
| <span class="mf">0.00</span> <span class="mf">0.00</span> <span class="mf">85.94</span> <span class="mf">31.16</span> <span class="mf">93.07</span> <span class="mf">88.20</span> <span class="mi">12</span> <span class="mf">0.151</span> <span class="mi">3</span> <span class="mf">0.257</span> <span class="mf">0.408</span> |
| </pre></div> |
| </div> |
| <p>In this case we see a relatively healthy heap profile, with 31.16% old |
| generation heap usage and 83% eden usage. If the old generation is routinely |
| above 75% then you probably need more heap (assuming CMS with a 75% occupancy |
| threshold). Persistently high old generation usage often means that you have |
| either under-provisioned the old generation heap, or that there is too much |
| live data on the heap for Cassandra to collect (e.g. because of memtables). |
| Another thing to watch is the time between young garbage collections (YGC), |
| which indicates how frequently the eden heap is collected. Each young gc pause |
| is about 20-50ms, so if you have a lot of them your clients will notice in |
| their high percentile latencies.</p> |
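<p>The 75% rule of thumb above can be checked mechanically. The following is a minimal sketch (not part of Cassandra's tooling) that applies it to captured <code>jstat -gcutil</code> output; a pasted sample stands in for a live capture, and the O column is assumed to be the fourth field, as in the output above:</p>

```shell
# Warn when old generation usage (the O column, field 4 of `jstat -gcutil`
# output) exceeds the 75% rule of thumb. A pasted sample stands in for
# `jstat -gcutil <pid>` output from a live node.
awk 'NR > 1 && $4 + 0 > 75 { over++ }
     END { print (over ? over " sample(s) over 75% old gen" : "old gen OK") }' <<'EOF'
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00   0.00  81.53  31.16  93.07  88.20     12    0.151     3    0.257    0.408
  0.00   0.00  85.94  31.16  93.07  88.20     12    0.151     3    0.257    0.408
EOF
```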
| </div> |
| <div class="section" id="thread-information-jstack"> |
| <h3>Thread Information (jstack)<a class="headerlink" href="#thread-information-jstack" title="Permalink to this headline">¶</a></h3> |
| <p>To get a point-in-time snapshot of exactly what Cassandra is doing, run |
| <code class="docutils literal notranslate"><span class="pre">jstack</span></code> against the Cassandra PID. <strong>Note</strong> that this does pause the JVM for |
| a very brief period (&lt;20ms):</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ jstack <cassandra pid> > threaddump |
| |
| # display the threaddump |
| $ cat threaddump |
| ... |
| |
| # look at runnable threads |
| $ grep RUNNABLE threaddump -B 1 |
| "Attach Listener" #15 daemon prio=9 os_prio=0 tid=0x00007f829c001000 nid=0x3a74 waiting on condition [0x0000000000000000] |
| java.lang.Thread.State: RUNNABLE |
| -- |
| "DestroyJavaVM" #13 prio=5 os_prio=0 tid=0x00007f82e800e000 nid=0x2a19 waiting on condition [0x0000000000000000] |
| java.lang.Thread.State: RUNNABLE |
| -- |
| "JPS thread pool" #10 prio=5 os_prio=0 tid=0x00007f82e84d0800 nid=0x2a2c runnable [0x00007f82d0856000] |
| java.lang.Thread.State: RUNNABLE |
| -- |
| "Service Thread" #9 daemon prio=9 os_prio=0 tid=0x00007f82e80d7000 nid=0x2a2a runnable [0x0000000000000000] |
| java.lang.Thread.State: RUNNABLE |
| -- |
| "C1 CompilerThread3" #8 daemon prio=9 os_prio=0 tid=0x00007f82e80cc000 nid=0x2a29 waiting on condition [0x0000000000000000] |
| java.lang.Thread.State: RUNNABLE |
| -- |
| ... |
| |
| # Note that the nid is the Linux thread id |
| </pre></div> |
| </div> |
| <p>Some of the most important information in a threaddump is the set of |
| waiting/blocked threads, including which locks or monitors each thread is |
| blocked on or waiting for.</p> |
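<p>One quick way to triage a thread dump is to count thread states; a pile-up of BLOCKED threads usually points at lock contention. A minimal sketch, run here against a few sample lines rather than a real dump (on a live node, pipe in the saved <code>threaddump</code> file instead):</p>

```shell
# Count java.lang.Thread.State occurrences in a jstack thread dump.
# The heredoc below is a stand-in sample for a real `jstack <pid>` capture.
grep -o 'java\.lang\.Thread\.State: [A-Z_]*' <<'EOF' | sort | uniq -c | sort -rn
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: BLOCKED (on object monitor)
   java.lang.Thread.State: WAITING (parking)
EOF
```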
| </div> |
| </div> |
| <div class="section" id="basic-os-tooling"> |
| <h2>Basic OS Tooling<a class="headerlink" href="#basic-os-tooling" title="Permalink to this headline">¶</a></h2> |
| <p>A great place to start when debugging a Cassandra issue is understanding how |
| Cassandra is interacting with system resources. The following are all |
| resources that Cassandra makes heavy use of:</p> |
| <ul class="simple"> |
| <li>CPU cores. For executing concurrent user queries</li> |
| <li>CPU processing time. For query activity (data decompression, row merging, |
| etc…)</li> |
| <li>CPU processing time (low priority). For background tasks (compaction, |
| streaming, etc …)</li> |
| <li>RAM for Java Heap. Used to hold internal data structures and, by default, |
| the Cassandra memtables. Heap space is a crucial component of write |
| performance, and of performance in general.</li> |
| <li>RAM for OS disk cache. Used to cache frequently accessed SSTable blocks. OS |
| disk cache is a crucial component of read performance.</li> |
| <li>Disks. Cassandra cares a lot about disk read latency, disk write throughput, |
| and of course disk space.</li> |
| <li>Network latency. Cassandra makes many internode requests, so network latency |
| between nodes can directly impact performance.</li> |
| <li>Network throughput. Cassandra (like other databases) frequently has the |
| so called “incast” problem, where a small request (e.g. <code class="docutils literal notranslate"><span class="pre">SELECT</span> <span class="pre">*</span> <span class="pre">from</span> |
| <span class="pre">foo.bar</span></code>) returns a very large result set (e.g. the entire dataset). |
| In such situations outgoing bandwidth is crucial.</li> |
| </ul> |
| <p>Often troubleshooting Cassandra comes down to figuring out which resource |
| the machine or cluster is running out of, and then either adding more of that |
| resource or changing the query pattern to use less of it.</p> |
| <div class="section" id="high-level-resource-usage-top-htop"> |
| <h3>High Level Resource Usage (top/htop)<a class="headerlink" href="#high-level-resource-usage-top-htop" title="Permalink to this headline">¶</a></h3> |
| <p>Cassandra makes significant use of system resources, and often the very first |
| useful action is to run <code class="docutils literal notranslate"><span class="pre">top</span></code> or <code class="docutils literal notranslate"><span class="pre">htop</span></code> (<a class="reference external" href="https://hisham.hm/htop/">website</a>) to see the state of the machine.</p> |
| <p>Useful things to look at:</p> |
| <ul class="simple"> |
| <li>System load levels. While these numbers can be confusing, generally speaking |
| if the load average is greater than the number of CPU cores, Cassandra |
| probably won’t have very good (sub 100 millisecond) latencies. See |
| <a class="reference external" href="http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html">Linux Load Averages</a> |
| for more information.</li> |
| <li>CPU utilization. <code class="docutils literal notranslate"><span class="pre">htop</span></code> in particular can help break down CPU utilization |
| into <code class="docutils literal notranslate"><span class="pre">user</span></code> (low and normal priority), <code class="docutils literal notranslate"><span class="pre">system</span></code> (kernel), and |
| <code class="docutils literal notranslate"><span class="pre">io-wait</span></code>. Cassandra query threads execute as normal priority <code class="docutils literal notranslate"><span class="pre">user</span></code> threads, while |
| compaction threads execute as low priority <code class="docutils literal notranslate"><span class="pre">user</span></code> threads. High <code class="docutils literal notranslate"><span class="pre">system</span></code> |
| time could indicate problems like thread contention, and high <code class="docutils literal notranslate"><span class="pre">io-wait</span></code> |
| may indicate slow disk drives. This can help you understand what Cassandra |
| is spending its processing resources on.</li> |
| <li>Memory usage. Look for which programs have the most resident memory; it is |
| probably Cassandra. The number for Cassandra is likely inaccurately high due |
| to how Linux (as of 2018) accounts for memory-mapped file memory.</li> |
| </ul> |
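<p>As a scriptable stand-in for eyeballing the resident-memory column in <code>top</code>, the sketch below ranks processes by RSS. The process list here is a hypothetical pasted sample; on a live node you would feed it <code>ps -eo rss,comm</code> output instead:</p>

```shell
# Rank processes by resident set size (RSS, in KB) and convert to GB.
# The sample below is invented for illustration; on a real node pipe in
# `ps -eo rss,comm` output instead of the heredoc.
sort -rn <<'EOF' | head -n 3 | awk '{ printf "%s %.1fGB\n", $2, $1 / 1048576 }'
16322816 java
1048576 mysqld
204800 sshd
EOF
```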
| </div> |
| <div class="section" id="io-usage-iostat"> |
| <span id="os-iostat"></span><h3>IO Usage (iostat)<a class="headerlink" href="#io-usage-iostat" title="Permalink to this headline">¶</a></h3> |
| <p>Use <code class="docutils literal notranslate"><span class="pre">iostat</span></code> to determine how data drives are faring, including latency |
| distributions, throughput, and utilization:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ sudo iostat -xdm 2 |
| Linux 4.13.0-13-generic (hostname) 07/03/2018 _x86_64_ (8 CPU) |
| |
| Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util |
| sda 0.00 0.28 0.32 5.42 0.01 0.13 48.55 0.01 2.21 0.26 2.32 0.64 0.37 |
| sdb 0.00 0.00 0.00 0.00 0.00 0.00 79.34 0.00 0.20 0.20 0.00 0.16 0.00 |
| sdc 0.34 0.27 0.76 0.36 0.01 0.02 47.56 0.03 26.90 2.98 77.73 9.21 1.03 |
| |
| Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util |
| sda 0.00 0.00 2.00 32.00 0.01 4.04 244.24 0.54 16.00 0.00 17.00 1.06 3.60 |
| sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 |
| sdc 0.00 24.50 0.00 114.00 0.00 11.62 208.70 5.56 48.79 0.00 48.79 1.12 12.80 |
| </pre></div> |
| </div> |
| <p>In this case we can see that <code class="docutils literal notranslate"><span class="pre">/dev/sdc</span></code> is a very slow drive, having an |
| <code class="docutils literal notranslate"><span class="pre">await</span></code> close to 50 milliseconds and an <code class="docutils literal notranslate"><span class="pre">avgqu-sz</span></code> close to 5 ios. The |
| drive is not particularly saturated (utilization is only 12.8%), but we should |
| still be concerned about how this would affect our p99 latency, since 50ms is |
| quite long for typical Cassandra operations. That said, in this case most of |
| the latency is in writes (writes are typically more latent than reads), which |
| due to the LSM nature of Cassandra is often hidden from the user.</p> |
| <p>Important metrics to assess using iostat:</p> |
| <ul class="simple"> |
| <li>Reads and writes per second. These numbers will change with the workload, |
| but generally speaking the more reads Cassandra has to do from disk the |
| slower Cassandra read latencies are. Large numbers of reads per second |
| can be a dead giveaway that the cluster has insufficient memory for OS |
| page caching.</li> |
| <li>Write throughput. Cassandra’s LSM model defers user writes and batches them |
| together, which means that throughput to the underlying medium is the most |
| important write metric for Cassandra.</li> |
| <li>Read latency (<code class="docutils literal notranslate"><span class="pre">r_await</span></code>). When Cassandra misses the OS page cache and reads |
| from SSTables, the read latency directly determines how fast Cassandra can |
| respond with the data.</li> |
| <li>Write latency. Cassandra is less sensitive to write latency except when it |
| syncs the commit log. This typically enters into the very high percentiles of |
| write latency.</li> |
| </ul> |
| <p>Note that to get detailed latency breakdowns you will need a more advanced |
| tool such as <a class="reference internal" href="#use-bcc-tools"><span class="std std-ref">bcc-tools</span></a>.</p> |
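<p>The await columns can also be pulled out programmatically from captured output. A minimal sketch, assuming the column layout shown above (<code>r_await</code> is field 11 and <code>w_await</code> field 12; positions vary between sysstat versions, so check your own header line first):</p>

```shell
# Extract per-device read/write latency from `iostat -x` style output.
# Assumes r_await is field 11 and w_await field 12, as in the header below;
# sysstat versions differ, so verify against your own header line.
awk 'NR > 1 { printf "%s r_await=%sms w_await=%sms\n", $1, $11, $12 }' <<'EOF'
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 2.00 32.00 0.01 4.04 244.24 0.54 16.00 0.00 17.00 1.06 3.60
sdc 0.00 24.50 0.00 114.00 0.00 11.62 208.70 5.56 48.79 0.00 48.79 1.12 12.80
EOF
```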
| </div> |
| <div class="section" id="os-page-cache-usage"> |
| <h3>OS Page Cache Usage<a class="headerlink" href="#os-page-cache-usage" title="Permalink to this headline">¶</a></h3> |
| <p>As Cassandra makes heavy use of memory mapped files, the health of the |
| operating system’s <a class="reference external" href="https://en.wikipedia.org/wiki/Page_cache">Page Cache</a> is |
| crucial to performance. Start by finding how much available cache is in the |
| system:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ free -g |
| total used free shared buff/cache available |
| Mem: 15 9 2 0 3 5 |
| Swap: 0 0 0 |
| </pre></div> |
| </div> |
| <p>In this case 9GB of memory is used by user processes (the Cassandra heap) and |
| about 6GB is left for the OS page cache, of which 3GB is actually used to |
| cache files. If most memory is used and unavailable to the page cache, |
| Cassandra performance can suffer significantly. This is why Cassandra starts |
| with a reasonably small amount of memory reserved for the heap.</p> |
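<p>The arithmetic above can be scripted for monitoring. A minimal sketch that derives the non-heap remainder from the <code>free -g</code> sample (on a live node, pipe <code>free -g</code> straight in instead of the pasted output):</p>

```shell
# Derive how much RAM is left over for the OS page cache from `free -g`
# output: total minus used approximates free memory plus buff/cache.
awk '/^Mem:/ { printf "total=%dG used=%dG left_for_cache=%dG\n", $2, $3, $2 - $3 }' <<'EOF'
              total        used        free      shared  buff/cache   available
Mem:             15           9           2           0           3           5
Swap:             0           0           0
EOF
```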
| <p>If you suspect that you are missing the OS page cache frequently you can use |
| advanced tools like <a class="reference internal" href="#use-bcc-tools"><span class="std std-ref">cachestat</span></a> or |
| <a class="reference internal" href="#use-vmtouch"><span class="std std-ref">vmtouch</span></a> to dive deeper.</p> |
| </div> |
| <div class="section" id="network-latency-and-reliability"> |
| <h3>Network Latency and Reliability<a class="headerlink" href="#network-latency-and-reliability" title="Permalink to this headline">¶</a></h3> |
| <p>Whenever Cassandra does writes or reads that involve other replicas, |
| <code class="docutils literal notranslate"><span class="pre">LOCAL_QUORUM</span></code> reads for example, one of the dominant effects on latency is |
| network latency. When trying to debug issues with multi-machine operations, |
| the network can be an important resource to investigate. You can determine |
| internode latency using tools like <code class="docutils literal notranslate"><span class="pre">ping</span></code> and <code class="docutils literal notranslate"><span class="pre">traceroute</span></code> or most |
| effectively <code class="docutils literal notranslate"><span class="pre">mtr</span></code>:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mtr -nr www.google.com |
| Start: Sun Jul 22 13:10:28 2018 |
| HOST: hostname Loss% Snt Last Avg Best Wrst StDev |
| 1.|-- 192.168.1.1 0.0% 10 2.0 1.9 1.1 3.7 0.7 |
| 2.|-- 96.123.29.15 0.0% 10 11.4 11.0 9.0 16.4 1.9 |
| 3.|-- 68.86.249.21 0.0% 10 10.6 10.7 9.0 13.7 1.1 |
| 4.|-- 162.141.78.129 0.0% 10 11.5 10.6 9.6 12.4 0.7 |
| 5.|-- 162.151.78.253 0.0% 10 10.9 12.1 10.4 20.2 2.8 |
| 6.|-- 68.86.143.93 0.0% 10 12.4 12.6 9.9 23.1 3.8 |
| 7.|-- 96.112.146.18 0.0% 10 11.9 12.4 10.6 15.5 1.6 |
| 9.|-- 209.85.252.250 0.0% 10 13.7 13.2 12.5 13.9 0.0 |
| 10.|-- 108.170.242.238 0.0% 10 12.7 12.4 11.1 13.0 0.5 |
| 11.|-- 74.125.253.149 0.0% 10 13.4 13.7 11.8 19.2 2.1 |
| 12.|-- 216.239.62.40 0.0% 10 13.4 14.7 11.5 26.9 4.6 |
| 13.|-- 108.170.242.81 0.0% 10 14.4 13.2 10.9 16.0 1.7 |
| 14.|-- 72.14.239.43 0.0% 10 12.2 16.1 11.0 32.8 7.1 |
| 15.|-- 216.58.195.68 0.0% 10 25.1 15.3 11.1 25.1 4.8 |
| </pre></div> |
| </div> |
| <p>In this example of <code class="docutils literal notranslate"><span class="pre">mtr</span></code>, we can rapidly assess the path that packets |
| are taking, as well as their typical loss and latency. Packet loss |
| typically leads to between <code class="docutils literal notranslate"><span class="pre">200ms</span></code> and <code class="docutils literal notranslate"><span class="pre">3s</span></code> of additional latency, so it |
| can be a common cause of latency issues.</p> |
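<p>To spot lossy hops quickly in a long <code>mtr</code> report, the Loss% column can be filtered. A minimal sketch over a hypothetical two-hop sample (the 10% loss line is invented for illustration; pipe in real <code>mtr -nr</code> output on a live node):</p>

```shell
# Flag any hop in `mtr -nr` report output whose Loss% (field 3) is non-zero.
# The sample below is invented; awk's numeric coercion strips the trailing %.
awk '$1 ~ /\|--/ && $3 + 0 > 0 { print "loss at hop", $2, "=", $3 }' <<'EOF'
  1.|-- 192.168.1.1    0.0%  10   2.0   1.9   1.1   3.7   0.7
  2.|-- 96.123.29.15  10.0%  10  11.4  11.0   9.0  16.4   1.9
EOF
```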
| </div> |
| <div class="section" id="network-throughput"> |
| <h3>Network Throughput<a class="headerlink" href="#network-throughput" title="Permalink to this headline">¶</a></h3> |
| <p>As Cassandra is sensitive to outgoing bandwidth limitations, sometimes it is |
| useful to determine if network throughput is limited. One handy tool to do |
| this is <a class="reference external" href="https://www.systutorials.com/docs/linux/man/8-iftop/">iftop</a> which |
| shows both bandwidth usage as well as connection information at a glance. An |
| example showing traffic during a stress run against a local <code class="docutils literal notranslate"><span class="pre">ccm</span></code> cluster:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ # remove the -t for ncurses instead of pure text |
| $ sudo iftop -nNtP -i lo |
| interface: lo |
| IP address is: 127.0.0.1 |
| MAC address is: 00:00:00:00:00:00 |
| Listening on lo |
| # Host name (port/service if enabled) last 2s last 10s last 40s cumulative |
| -------------------------------------------------------------------------------------------- |
| 1 127.0.0.1:58946 => 869Kb 869Kb 869Kb 217KB |
| 127.0.0.3:9042 <= 0b 0b 0b 0B |
| 2 127.0.0.1:54654 => 736Kb 736Kb 736Kb 184KB |
| 127.0.0.1:9042 <= 0b 0b 0b 0B |
| 3 127.0.0.1:51186 => 669Kb 669Kb 669Kb 167KB |
| 127.0.0.2:9042 <= 0b 0b 0b 0B |
| 4 127.0.0.3:9042 => 3.30Kb 3.30Kb 3.30Kb 845B |
| 127.0.0.1:58946 <= 0b 0b 0b 0B |
| 5 127.0.0.1:9042 => 2.79Kb 2.79Kb 2.79Kb 715B |
| 127.0.0.1:54654 <= 0b 0b 0b 0B |
| 6 127.0.0.2:9042 => 2.54Kb 2.54Kb 2.54Kb 650B |
| 127.0.0.1:51186 <= 0b 0b 0b 0B |
| 7 127.0.0.1:36894 => 1.65Kb 1.65Kb 1.65Kb 423B |
| 127.0.0.5:7000 <= 0b 0b 0b 0B |
| 8 127.0.0.1:38034 => 1.50Kb 1.50Kb 1.50Kb 385B |
| 127.0.0.2:7000 <= 0b 0b 0b 0B |
| 9 127.0.0.1:56324 => 1.50Kb 1.50Kb 1.50Kb 383B |
| 127.0.0.1:7000 <= 0b 0b 0b 0B |
| 10 127.0.0.1:53044 => 1.43Kb 1.43Kb 1.43Kb 366B |
| 127.0.0.4:7000 <= 0b 0b 0b 0B |
| -------------------------------------------------------------------------------------------- |
| Total send rate: 2.25Mb 2.25Mb 2.25Mb |
| Total receive rate: 0b 0b 0b |
| Total send and receive rate: 2.25Mb 2.25Mb 2.25Mb |
| -------------------------------------------------------------------------------------------- |
| Peak rate (sent/received/total): 2.25Mb 0b 2.25Mb |
| Cumulative (sent/received/total): 576KB 0B 576KB |
| ============================================================================================ |
| </pre></div> |
| </div> |
| <p>In this case we can see that bandwidth is shared fairly evenly between many |
| peers, but if the total were getting close to the rated capacity of the NIC, |
| or were focused on a single client, that could be a clue to the issue at hand.</p> |
| </div> |
| </div> |
| <div class="section" id="advanced-tools"> |
| <h2>Advanced tools<a class="headerlink" href="#advanced-tools" title="Permalink to this headline">¶</a></h2> |
| <p>Sometimes as an operator you may need to really dive deep. This is where |
| advanced OS tooling can come in handy.</p> |
| <div class="section" id="bcc-tools"> |
| <span id="use-bcc-tools"></span><h3>bcc-tools<a class="headerlink" href="#bcc-tools" title="Permalink to this headline">¶</a></h3> |
| <p>Most modern Linux distributions (kernels newer than <code class="docutils literal notranslate"><span class="pre">4.1</span></code>) support <a class="reference external" href="https://github.com/iovisor/bcc">bcc-tools</a> for diving deep into performance problems. |
| First install <code class="docutils literal notranslate"><span class="pre">bcc-tools</span></code>, e.g. via <code class="docutils literal notranslate"><span class="pre">apt</span></code> on Debian:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ apt install bcc-tools |
| </pre></div> |
| </div> |
| <p>Then you can use all the tools that <code class="docutils literal notranslate"><span class="pre">bcc-tools</span></code> contains. One of the most |
| useful tools is <code class="docutils literal notranslate"><span class="pre">cachestat</span></code> |
| (<a class="reference external" href="https://github.com/iovisor/bcc/blob/master/tools/cachestat_example.txt">cachestat examples</a>) |
| which allows you to determine exactly how many OS page cache hits and misses |
| are happening:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ sudo /usr/share/bcc/tools/cachestat -T 1 |
| TIME TOTAL MISSES HITS DIRTIES BUFFERS_MB CACHED_MB |
| 18:44:08 66 66 0 64 88 4427 |
| 18:44:09 40 40 0 75 88 4427 |
| 18:44:10 4353 45 4308 203 88 4427 |
| 18:44:11 84 77 7 13 88 4428 |
| 18:44:12 2511 14 2497 14 88 4428 |
| 18:44:13 101 98 3 18 88 4428 |
| 18:44:14 16741 0 16741 58 88 4428 |
| 18:44:15 1935 36 1899 18 88 4428 |
| 18:44:16 89 34 55 18 88 4428 |
| </pre></div> |
| </div> |
| <p>In this case there are not too many page cache <code class="docutils literal notranslate"><span class="pre">MISSES</span></code> which indicates a |
| reasonably sized cache. These metrics are the most direct measurement of your |
| Cassandra node’s “hot” dataset. If you don’t have enough cache, <code class="docutils literal notranslate"><span class="pre">MISSES</span></code> will |
| be high and performance will be slow. If you have enough cache, <code class="docutils literal notranslate"><span class="pre">MISSES</span></code> will |
| be low and performance will be fast (as almost all reads are being served out |
| of memory).</p> |
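<p>A rough overall hit rate can be computed from captured <code>cachestat</code> output. A minimal sketch, assuming TOTAL is field 2 and HITS field 4, matching the column layout above (a few sample rows are pasted in):</p>

```shell
# Compute an aggregate page cache hit rate from cachestat output rows.
# Assumes TOTAL is field 2 and HITS field 4, as in the header below.
awk 'NR > 1 { total += $2; hits += $4 }
     END { if (total) printf "hit rate: %.1f%%\n", 100 * hits / total }' <<'EOF'
TIME      TOTAL  MISSES  HITS  DIRTIES  BUFFERS_MB  CACHED_MB
18:44:10   4353      45  4308      203          88       4427
18:44:12   2511      14  2497       14          88       4428
18:44:14  16741       0 16741       58          88       4428
EOF
```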
| <p>You can also measure disk latency distributions using <code class="docutils literal notranslate"><span class="pre">biolatency</span></code> |
| (<a class="reference external" href="https://github.com/iovisor/bcc/blob/master/tools/biolatency_example.txt">biolatency examples</a>) |
| to get an idea of how slow Cassandra will be when reads miss the OS page cache |
| and have to hit disks:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ sudo /usr/share/bcc/tools/biolatency -D 10 |
| Tracing block device I/O... Hit Ctrl-C to end. |
| |
| |
| disk = 'sda' |
| usecs : count distribution |
| 0 -> 1 : 0 | | |
| 2 -> 3 : 0 | | |
| 4 -> 7 : 0 | | |
| 8 -> 15 : 0 | | |
| 16 -> 31 : 12 |****************************************| |
| 32 -> 63 : 9 |****************************** | |
| 64 -> 127 : 1 |*** | |
| 128 -> 255 : 3 |********** | |
| 256 -> 511 : 7 |*********************** | |
| 512 -> 1023 : 2 |****** | |
| |
| disk = 'sdc' |
| usecs : count distribution |
| 0 -> 1 : 0 | | |
| 2 -> 3 : 0 | | |
| 4 -> 7 : 0 | | |
| 8 -> 15 : 0 | | |
| 16 -> 31 : 0 | | |
| 32 -> 63 : 0 | | |
| 64 -> 127 : 41 |************ | |
| 128 -> 255 : 17 |***** | |
| 256 -> 511 : 13 |*** | |
| 512 -> 1023 : 2 | | |
| 1024 -> 2047 : 0 | | |
| 2048 -> 4095 : 0 | | |
| 4096 -> 8191 : 56 |***************** | |
| 8192 -> 16383 : 131 |****************************************| |
| 16384 -> 32767 : 9 |** | |
| </pre></div> |
| </div> |
| <p>In this case most IOs on the data drive (<code class="docutils literal notranslate"><span class="pre">sdc</span></code>) are fast, but many take |
| between 8 and 16 milliseconds.</p> |
| <p>Finally <code class="docutils literal notranslate"><span class="pre">biosnoop</span></code> (<a class="reference external" href="https://github.com/iovisor/bcc/blob/master/tools/biosnoop_example.txt">examples</a>) |
| can be used to dive even deeper and see per-IO latencies:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ sudo /usr/share/bcc/tools/biosnoop | grep java | head |
| 0.000000000 java 17427 sdc R 3972458600 4096 13.58 |
| 0.000818000 java 17427 sdc R 3972459408 4096 0.35 |
| 0.007098000 java 17416 sdc R 3972401824 4096 5.81 |
| 0.007896000 java 17416 sdc R 3972489960 4096 0.34 |
| 0.008920000 java 17416 sdc R 3972489896 4096 0.34 |
| 0.009487000 java 17427 sdc R 3972401880 4096 0.32 |
| 0.010238000 java 17416 sdc R 3972488368 4096 0.37 |
| 0.010596000 java 17427 sdc R 3972488376 4096 0.34 |
| 0.011236000 java 17410 sdc R 3972488424 4096 0.32 |
| 0.011825000 java 17427 sdc R 3972488576 16384 0.65 |
| ... time passes |
| 8.032687000 java 18279 sdc R 10899712 122880 3.01 |
| 8.033175000 java 18279 sdc R 10899952 8192 0.46 |
| 8.073295000 java 18279 sdc R 23384320 122880 3.01 |
| 8.073768000 java 18279 sdc R 23384560 8192 0.46 |
| </pre></div> |
| </div> |
| <p>With <code class="docutils literal notranslate"><span class="pre">biosnoop</span></code> you see every single IO and how long it takes. This data |
| can be used to construct the latency distributions shown by <code class="docutils literal notranslate"><span class="pre">biolatency</span></code>, but it can |
| also be used to better understand how disk latency affects performance. For |
| example, this particular drive takes ~3ms to service a memory-mapped read due to |
| the large default value (<code class="docutils literal notranslate"><span class="pre">128kb</span></code>) of <code class="docutils literal notranslate"><span class="pre">read_ahead_kb</span></code>. To improve point read |
| performance you may want to decrease <code class="docutils literal notranslate"><span class="pre">read_ahead_kb</span></code> on fast data volumes |
| such as SSDs, while a higher value like <code class="docutils literal notranslate"><span class="pre">128kb</span></code> is probably |
| right for HDs. There are tradeoffs involved; see the <a class="reference external" href="https://www.kernel.org/doc/Documentation/block/queue-sysfs.txt">queue-sysfs</a> docs for more |
| information, but regardless <code class="docutils literal notranslate"><span class="pre">biosnoop</span></code> is useful for understanding <em>how</em> |
| Cassandra uses drives.</p> |
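<p>For reference, readahead is controlled per block device through sysfs. A minimal sketch of inspecting and temporarily lowering it for the example data drive <code class="docutils literal notranslate"><span class="pre">sdc</span></code> above; the value <code class="docutils literal notranslate"><span class="pre">16</span></code> is only an illustration, and changes made this way do not survive a reboot:</p>

```shell
# Show the current readahead for the data drive, in kb (often 128 by default).
cat /sys/block/sdc/queue/read_ahead_kb

# Temporarily lower it on an SSD-backed volume; use udev rules or your init
# system if you want the setting to persist across reboots.
echo 16 | sudo tee /sys/block/sdc/queue/read_ahead_kb
```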
| </div> |
| <div class="section" id="vmtouch"> |
| <span id="use-vmtouch"></span><h3>vmtouch<a class="headerlink" href="#vmtouch" title="Permalink to this headline">¶</a></h3> |
| <p>Sometimes it’s useful to know how much of the Cassandra data files the OS is |
| caching. A great tool for answering this question is |
| <a class="reference external" href="https://github.com/hoytech/vmtouch">vmtouch</a>.</p> |
| <p>First install it:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ git clone https://github.com/hoytech/vmtouch.git |
| $ cd vmtouch |
| $ make |
| </pre></div> |
| </div> |
| <p>Then run it on the Cassandra data directory:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ ./vmtouch /var/lib/cassandra/data/ |
| Files: 312 |
| Directories: 92 |
| Resident Pages: 62503/64308 244M/251M 97.2% |
| Elapsed: 0.005657 seconds |
| </pre></div> |
| </div> |
| <p>In this case almost the entire dataset is hot in the OS page cache. Generally |
| speaking the percentage doesn’t really matter unless reads are missing the |
| cache (as measured by, e.g., <a class="reference internal" href="#use-bcc-tools"><span class="std std-ref">cachestat</span></a>), in which case having |
| additional memory may help read performance.</p> |
| </div> |
| <div class="section" id="cpu-flamegraphs"> |
| <h3>CPU Flamegraphs<a class="headerlink" href="#cpu-flamegraphs" title="Permalink to this headline">¶</a></h3> |
| <p>Cassandra often uses a lot of CPU, but telling <em>what</em> it is doing can prove |
| difficult. One of the best ways to analyze Cassandra’s on-CPU time is to use |
| <a class="reference external" href="http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html">CPU Flamegraphs</a> |
| which display which areas of Cassandra code are using CPU in a useful way. This |
| may help narrow a generic “compaction problem” down to a “compaction problem |
| dropping tombstones”, or just generally help you pin down what Cassandra is |
| doing while it is having an issue. To get CPU flamegraphs follow the instructions for |
| <a class="reference external" href="http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html#Java">Java Flamegraphs</a>.</p> |
| <p>Generally:</p> |
| <ol class="arabic simple"> |
| <li>Enable the <code class="docutils literal notranslate"><span class="pre">-XX:+PreserveFramePointer</span></code> option in Cassandra’s |
| <code class="docutils literal notranslate"><span class="pre">jvm.options</span></code> configuration file. This has a negligible performance impact |
| but allows you to actually see what Cassandra is doing.</li> |
| <li>Run <code class="docutils literal notranslate"><span class="pre">perf</span></code> to get some data.</li> |
| <li>Send that data through the relevant scripts in the FlameGraph toolset and |
| convert the data into a pretty flamegraph. View the resulting SVG image in |
| a web browser or other image viewer.</li> |
| </ol> |
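<p>Step 1 amounts to adding a single line to Cassandra’s <code class="docutils literal notranslate"><span class="pre">jvm.options</span></code> file, for example:</p>

```
# conf/jvm.options: preserve frame pointers so perf can walk Java stacks
-XX:+PreserveFramePointer
```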
| <p>For example, cloning straight off GitHub, we first install |
| <code class="docutils literal notranslate"><span class="pre">perf-map-agent</span></code> in the location of our JVMs (assumed to be |
| <code class="docutils literal notranslate"><span class="pre">/usr/lib/jvm</span></code>):</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ sudo bash |
| $ export JAVA_HOME=/usr/lib/jvm/java-8-oracle/ |
| $ cd /usr/lib/jvm |
| $ git clone --depth=1 https://github.com/jvm-profiling-tools/perf-map-agent |
| $ cd perf-map-agent |
| $ cmake . |
| $ make |
| </pre></div> |
| </div> |
| <p>Now to get a flamegraph:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ git clone --depth=1 https://github.com/brendangregg/FlameGraph |
| $ sudo bash |
| $ cd FlameGraph |
| $ # Record traces of Cassandra and map symbols for all java processes |
| $ perf record -F 49 -a -g -p <CASSANDRA PID> -- sleep 30; ./jmaps |
| $ # Translate the data |
| $ perf script > cassandra_stacks |
| $ cat cassandra_stacks | ./stackcollapse-perf.pl | grep -v cpu_idle | \ |
| ./flamegraph.pl --color=java --hash > cassandra_flames.svg |
| </pre></div> |
| </div> |
| <p>The resulting SVG is searchable, zoomable, and generally easy to introspect |
| using a browser.</p> |
| </div> |
| <div class="section" id="packet-capture"> |
| <span id="id4"></span><h3>Packet Capture<a class="headerlink" href="#packet-capture" title="Permalink to this headline">¶</a></h3> |
| <p>Sometimes you have to understand what queries a Cassandra node is performing |
| <em>right now</em> to troubleshoot an issue. For these times trusty tools like |
| <code class="docutils literal notranslate"><span class="pre">tcpdump</span></code> and <a class="reference external" href="https://www.wireshark.org/">Wireshark</a> can be very helpful for dissecting what is on the wire. |
| Wireshark even has native <a class="reference external" href="https://www.wireshark.org/docs/dfref/c/cql.html">CQL support</a>, although it sometimes has |
| compatibility issues with newer Cassandra protocol releases.</p> |
| <p>First capture some packets:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ sudo tcpdump -U -s0 -i <INTERFACE> -w cassandra.pcap -n "tcp port 9042" |
| </pre></div> |
| </div> |
| <p>Now open it up with Wireshark:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ wireshark cassandra.pcap |
| </pre></div> |
| </div> |
| <p>If you don’t see CQL-like statements, try telling Wireshark to decode the |
| traffic as CQL: right click on a packet going to port 9042, select <code class="docutils literal notranslate"><span class="pre">Decode</span> <span class="pre">as</span></code>, and |
| choose CQL from the dropdown for port 9042.</p> |
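<p>If you would rather stay on the command line, Wireshark’s <code class="docutils literal notranslate"><span class="pre">tshark</span></code> can apply the same decode-as mapping. A sketch, assuming the capture file from above and a tshark build that includes the CQL dissector:</p>

```shell
# Decode traffic on port 9042 as CQL and print a one-line summary per frame.
tshark -r cassandra.pcap -d 'tcp.port==9042,cql' | head
```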
| <p>If you don’t want to do this manually or use a GUI, you can also use something |
| like <a class="reference external" href="https://github.com/jolynch/cqltrace">cqltrace</a> to ease obtaining and |
| parsing CQL packet captures.</p> |
| </div> |
| </div> |
| </div> |
| |
| |
| |
| |
| <div class="doc-prev-next-links" role="navigation" aria-label="footer navigation"> |
| |
| <a href="../development/index.html" class="btn btn-default pull-right " role="button" title="Contributing to Cassandra" accesskey="n">Next <span class="glyphicon glyphicon-circle-arrow-right" aria-hidden="true"></span></a> |
| |
| |
| <a href="use_nodetool.html" class="btn btn-default" role="button" title="Use Nodetool" accesskey="p"><span class="glyphicon glyphicon-circle-arrow-left" aria-hidden="true"></span> Previous</a> |
| |
| </div> |
| |
| </div> |
| </div> |
| </div> |
| </div> |
| </div> |