blob: 70857db4ec33851b112abd64dea5d55f80b7c4a4 [file] [log] [blame]
<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Improved Streaming &mdash; Apache Cassandra Documentation v4.0-rc2</title>
<script type="text/javascript" src="../_static/js/modernizr.min.js"></script>
<script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
<script type="text/javascript" src="../_static/jquery.js"></script>
<script type="text/javascript" src="../_static/underscore.js"></script>
<script type="text/javascript" src="../_static/doctools.js"></script>
<script type="text/javascript" src="../_static/language_data.js"></script>
<script async="async" type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/latest.js?config=TeX-AMS-MML_HTMLorMML"></script>
<script type="text/javascript" src="../_static/js/theme.js"></script>
<link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../_static/extra.css" type="text/css" />
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Transient Replication" href="transientreplication.html" />
<link rel="prev" title="Improved Internode Messaging" href="messaging.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../index.html" class="icon icon-home"> Apache Cassandra
</a>
<div class="version">
4.0-rc2
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../getting_started/index.html">Getting Started</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="index.html">New Features in Apache Cassandra 4.0</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="java11.html">Support for Java 11</a></li>
<li class="toctree-l2"><a class="reference internal" href="virtualtables.html">Virtual Tables</a></li>
<li class="toctree-l2"><a class="reference internal" href="auditlogging.html">Audit Logging</a></li>
<li class="toctree-l2"><a class="reference internal" href="fqllogging.html">Full Query Logging (FQL)</a></li>
<li class="toctree-l2"><a class="reference internal" href="messaging.html">Improved Internode Messaging</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">Improved Streaming</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#streaming-based-on-netty">Streaming based on Netty</a></li>
<li class="toctree-l3"><a class="reference internal" href="#zero-copy-streaming">Zero Copy Streaming</a><ul>
<li class="toctree-l4"><a class="reference internal" href="#high-availability">High Availability</a></li>
<li class="toctree-l4"><a class="reference internal" href="#enabling-zero-copy-streaming">Enabling Zero Copy Streaming</a></li>
<li class="toctree-l4"><a class="reference internal" href="#sstables-eligible-for-zero-copy-streaming">SSTables Eligible for Zero Copy Streaming</a></li>
<li class="toctree-l4"><a class="reference internal" href="#benefits-of-zero-copy-streaming">Benefits of Zero Copy Streaming</a></li>
<li class="toctree-l4"><a class="reference internal" href="#configuring-for-zero-copy-streaming">Configuring for Zero Copy Streaming</a></li>
<li class="toctree-l4"><a class="reference internal" href="#sstable-components-streamed-with-zero-copy-streaming">SSTable Components Streamed with Zero Copy Streaming</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="#repair-streaming-preview">Repair Streaming Preview</a></li>
<li class="toctree-l3"><a class="reference internal" href="#parallelizing-of-streaming-of-keyspaces">Parallelizing of Streaming of Keyspaces</a></li>
<li class="toctree-l3"><a class="reference internal" href="#unique-nodes-for-streaming-in-multi-dc-deployment">Unique nodes for Streaming in Multi-DC deployment</a></li>
<li class="toctree-l3"><a class="reference internal" href="#stream-operation-types">Stream Operation Types</a></li>
<li class="toctree-l3"><a class="reference internal" href="#disallow-decommission-when-number-of-replicas-will-drop-below-configured-rf">Disallow Decommission when number of Replicas will drop below configured RF</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="transientreplication.html">Transient Replication</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../architecture/index.html">Architecture</a></li>
<li class="toctree-l1"><a class="reference internal" href="../cql/index.html">The Cassandra Query Language (CQL)</a></li>
<li class="toctree-l1"><a class="reference internal" href="../data_modeling/index.html">Data Modeling</a></li>
<li class="toctree-l1"><a class="reference internal" href="../configuration/index.html">Configuring Cassandra</a></li>
<li class="toctree-l1"><a class="reference internal" href="../operating/index.html">Operating Cassandra</a></li>
<li class="toctree-l1"><a class="reference internal" href="../tools/index.html">Cassandra Tools</a></li>
<li class="toctree-l1"><a class="reference internal" href="../troubleshooting/index.html">Troubleshooting</a></li>
<li class="toctree-l1"><a class="reference internal" href="../development/index.html">Contributing to Cassandra</a></li>
<li class="toctree-l1"><a class="reference internal" href="../faq/index.html">Frequently Asked Questions</a></li>
<li class="toctree-l1"><a class="reference internal" href="../plugins/index.html">Third-Party Plugins</a></li>
<li class="toctree-l1"><a class="reference internal" href="../bugs.html">Reporting Bugs</a></li>
<li class="toctree-l1"><a class="reference internal" href="../contactus.html">Contact us</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../index.html">Apache Cassandra</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../index.html">Docs</a> &raquo;</li>
<li><a href="index.html">New Features in Apache Cassandra 4.0</a> &raquo;</li>
<li>Improved Streaming</li>
<li class="wy-breadcrumbs-aside">
<a href="../_sources/new/streaming.rst.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="improved-streaming">
<h1>Improved Streaming<a class="headerlink" href="#improved-streaming" title="Permalink to this headline"></a></h1>
<p>Apache Cassandra 4.0 has made several improvements to streaming. Streaming is the process used by nodes of a cluster to exchange data in the form of SSTables. Streaming of SSTables is performed for several operations, such as:</p>
<ul class="simple">
<li><p>SSTable Repair</p></li>
<li><p>Host Replacement</p></li>
<li><p>Range movements</p></li>
<li><p>Bootstrapping</p></li>
<li><p>Rebuild</p></li>
<li><p>Cluster expansion</p></li>
</ul>
<div class="section" id="streaming-based-on-netty">
<h2>Streaming based on Netty<a class="headerlink" href="#streaming-based-on-netty" title="Permalink to this headline"></a></h2>
<p>Streaming in Cassandra 4.0 is based on Non-blocking Input/Output (NIO) with Netty (<a class="reference external" href="https://issues.apache.org/jira/browse/CASSANDRA-12229">CASSANDRA-12229</a>). It replaces the single-threaded (or sequential), synchronous, blocking model of streaming messages and transfer of files. Netty supports non-blocking, asynchronous, multi-threaded streaming with which multiple connections are opened simultaneously. Non-blocking implies that threads are not blocked as they don’t wait for a response for a sent request. A response could be returned in a different thread. With asynchronous, connections and threads are decoupled and do not have a 1:1 relation. Several more connections than threads may be opened.</p>
</div>
<div class="section" id="zero-copy-streaming">
<h2>Zero Copy Streaming<a class="headerlink" href="#zero-copy-streaming" title="Permalink to this headline"></a></h2>
<p>Pre-4.0, during streaming Cassandra reifies the SSTables into objects. This creates unnecessary garbage and slows down the whole streaming process as some SSTables can be transferred as a whole file rather than individual partitions. Cassandra 4.0 has added support for streaming entire SSTables when possible (<a class="reference external" href="https://issues.apache.org/jira/browse/CASSANDRA-14556">CASSANDRA-14556</a>) for faster Streaming using ZeroCopy APIs. If enabled, Cassandra will use ZeroCopy for eligible SSTables significantly speeding up transfers and increasing throughput. A zero-copy path avoids bringing data into user-space on both sending and receiving side. Any streaming related operations will notice corresponding improvement. Zero copy streaming is hardware bound; only limited by the hardware limitations (Network and Disk IO ).</p>
<div class="section" id="high-availability">
<h3>High Availability<a class="headerlink" href="#high-availability" title="Permalink to this headline"></a></h3>
<p>In benchmark tests Zero Copy Streaming is 5x faster than partitions based streaming. Faster streaming provides the benefit of improved availability. A cluster’s recovery mainly depends on the streaming speed, Cassandra clusters with failed nodes will be able to recover much more quickly (5x faster). If a node fails, SSTables need to be streamed to a replacement node. During the replacement operation, the new Cassandra node streams SSTables from the neighboring nodes that hold copies of the data belonging to this new node’s token range. Depending on the amount of data stored, this process can require substantial network bandwidth, taking some time to complete. The longer these range movement operations take, the more the cluster availability is lost. Failure of multiple nodes would reduce high availability greatly. The faster the new node completes streaming its data, the faster it can serve traffic, increasing the availability of the cluster.</p>
</div>
<div class="section" id="enabling-zero-copy-streaming">
<h3>Enabling Zero Copy Streaming<a class="headerlink" href="#enabling-zero-copy-streaming" title="Permalink to this headline"></a></h3>
<p>Zero copy streaming is enabled by setting the following setting in <code class="docutils literal notranslate"><span class="pre">cassandra.yaml</span></code>.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">stream_entire_sstables</span><span class="p">:</span> <span class="n">true</span>
</pre></div>
</div>
<p>It is enabled by default.</p>
<p>This feature is automatically disabled if internode encryption is enabled.</p>
</div>
<div class="section" id="sstables-eligible-for-zero-copy-streaming">
<h3>SSTables Eligible for Zero Copy Streaming<a class="headerlink" href="#sstables-eligible-for-zero-copy-streaming" title="Permalink to this headline"></a></h3>
<p>Zero copy streaming is used if all partitions within the SSTable need to be transmitted. This is common when using <code class="docutils literal notranslate"><span class="pre">LeveledCompactionStrategy</span></code> or when partitioning SSTables by token range has been enabled. All partition keys in the SSTables are iterated over to determine the eligibility for Zero Copy streaming.</p>
</div>
<div class="section" id="benefits-of-zero-copy-streaming">
<h3>Benefits of Zero Copy Streaming<a class="headerlink" href="#benefits-of-zero-copy-streaming" title="Permalink to this headline"></a></h3>
<p>When enabled, it permits Cassandra to zero-copy stream entire eligible SSTables between nodes, including every component. This speeds up the network transfer significantly subject to throttling specified by <code class="docutils literal notranslate"><span class="pre">stream_throughput_outbound_megabits_per_sec</span></code>.</p>
<p>Enabling zero copy streaming also reduces the GC pressure on the sending and receiving nodes.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>While this feature tries to keep the disks balanced, it cannot guarantee it.
For instance, it is expected that some of the SSTables do not fit entirely in their disk boundaries, when bootstraping a new node having multiple data directoris.</p>
</div>
</div>
<div class="section" id="configuring-for-zero-copy-streaming">
<h3>Configuring for Zero Copy Streaming<a class="headerlink" href="#configuring-for-zero-copy-streaming" title="Permalink to this headline"></a></h3>
<p>Throttling would reduce the streaming speed. The <code class="docutils literal notranslate"><span class="pre">stream_throughput_outbound_megabits_per_sec</span></code> throttles all outbound streaming file transfers on a node to the given total throughput in Mbps. When unset, the default is 200 Mbps or 25 MB/s.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">stream_throughput_outbound_megabits_per_sec</span><span class="p">:</span> <span class="mi">200</span>
</pre></div>
</div>
<p>To run any Zero Copy streaming benchmark the <code class="docutils literal notranslate"><span class="pre">stream_throughput_outbound_megabits_per_sec</span></code> must be set to a really high value otherwise, throttling will be significant and the benchmark results will not be meaningful.</p>
<p>The <code class="docutils literal notranslate"><span class="pre">inter_dc_stream_throughput_outbound_megabits_per_sec</span></code> throttles all streaming file transfer between the datacenters, this setting allows users to throttle inter dc stream throughput in addition to throttling all network stream traffic as configured with <code class="docutils literal notranslate"><span class="pre">stream_throughput_outbound_megabits_per_sec</span></code>. When unset, the default is 200 Mbps or 25 MB/s.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">inter_dc_stream_throughput_outbound_megabits_per_sec</span><span class="p">:</span> <span class="mi">200</span>
</pre></div>
</div>
</div>
<div class="section" id="sstable-components-streamed-with-zero-copy-streaming">
<h3>SSTable Components Streamed with Zero Copy Streaming<a class="headerlink" href="#sstable-components-streamed-with-zero-copy-streaming" title="Permalink to this headline"></a></h3>
<p>Zero Copy Streaming streams entire SSTables. SSTables are made up of multiple components in separate files. SSTable components streamed are listed in Table 1.</p>
<p>Table 1. SSTable Components</p>
<table class="docutils align-default">
<colgroup>
<col style="width: 26%" />
<col style="width: 74%" />
</colgroup>
<tbody>
<tr class="row-odd"><td><p>SSTable Component</p></td>
<td><p>Description</p></td>
</tr>
<tr class="row-even"><td><p>Data.db</p></td>
<td><p>The base data for an SSTable: the remaining
components can be regenerated based on the data
component.</p></td>
</tr>
<tr class="row-odd"><td><p>Index.db</p></td>
<td><p>Index of the row keys with pointers to their
positions in the data file.</p></td>
</tr>
<tr class="row-even"><td><p>Filter.db</p></td>
<td><p>Serialized bloom filter for the row keys in the
SSTable.</p></td>
</tr>
<tr class="row-odd"><td><p>CompressionInfo.db</p></td>
<td><p>File to hold information about uncompressed
data length, chunk offsets etc.</p></td>
</tr>
<tr class="row-even"><td><p>Statistics.db</p></td>
<td><p>Statistical metadata about the content of the
SSTable.</p></td>
</tr>
<tr class="row-odd"><td><p>Digest.crc32</p></td>
<td><p>Holds CRC32 checksum of the data file
size_bytes.</p></td>
</tr>
<tr class="row-even"><td><p>CRC.db</p></td>
<td><p>Holds the CRC32 for chunks in an uncompressed file.</p></td>
</tr>
<tr class="row-odd"><td><p>Summary.db</p></td>
<td><p>Holds SSTable Index Summary
(sampling of Index component)</p></td>
</tr>
<tr class="row-even"><td><p>TOC.txt</p></td>
<td><p>Table of contents, stores the list of all
components for the SSTable.</p></td>
</tr>
</tbody>
</table>
<p>Custom component, used by e.g. custom compaction strategy may also be included.</p>
</div>
</div>
<div class="section" id="repair-streaming-preview">
<h2>Repair Streaming Preview<a class="headerlink" href="#repair-streaming-preview" title="Permalink to this headline"></a></h2>
<p>Repair with <code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">repair</span></code> involves streaming of repaired SSTables and a repair preview has been added to provide an estimate of the amount of repair streaming that would need to be performed. Repair preview (<a class="reference external" href="https://issues.apache.org/jira/browse/CASSANDRA-13257">CASSANDRA-13257</a>) is invoke with <code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">repair</span> <span class="pre">--preview</span></code> using option:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="o">-</span><span class="n">prv</span><span class="p">,</span> <span class="o">--</span><span class="n">preview</span>
</pre></div>
</div>
<p>It determines ranges and amount of data to be streamed, but doesn’t actually perform repair.</p>
</div>
<div class="section" id="parallelizing-of-streaming-of-keyspaces">
<h2>Parallelizing of Streaming of Keyspaces<a class="headerlink" href="#parallelizing-of-streaming-of-keyspaces" title="Permalink to this headline"></a></h2>
<p>The streaming of the different keyspaces for bootstrap and rebuild has been parallelized in Cassandra 4.0 (<a class="reference external" href="https://issues.apache.org/jira/browse/CASSANDRA-4663">CASSANDRA-4663</a>).</p>
</div>
<div class="section" id="unique-nodes-for-streaming-in-multi-dc-deployment">
<h2>Unique nodes for Streaming in Multi-DC deployment<a class="headerlink" href="#unique-nodes-for-streaming-in-multi-dc-deployment" title="Permalink to this headline"></a></h2>
<p>Range Streamer picks unique nodes to stream data from when number of replicas in each DC is three or more (<a class="reference external" href="https://issues.apache.org/jira/browse/CASSANDRA-4650">CASSANDRA-4650</a>). What the optimization does is to even out the streaming load across the cluster. Without the optimization, some node can be picked up to stream more data than others. This patch allows to select dedicated node to stream only one range.</p>
<p>This will increase the performance of bootstrapping a node and will also put less pressure on nodes serving the data. This does not affect if N &lt; 3 in each DC as then it streams data from only 2 nodes.</p>
</div>
<div class="section" id="stream-operation-types">
<h2>Stream Operation Types<a class="headerlink" href="#stream-operation-types" title="Permalink to this headline"></a></h2>
<p>It is important to know the type or purpose of a certain stream. Version 4.0 (<a class="reference external" href="https://issues.apache.org/jira/browse/CASSANDRA-13064">CASSANDRA-13064</a>) adds an <code class="docutils literal notranslate"><span class="pre">enum</span></code> to distinguish between the different types of streams. Stream types are available both in a stream request and a stream task. The different stream types are:</p>
<ul class="simple">
<li><p>Restore replica count</p></li>
<li><p>Unbootstrap</p></li>
<li><p>Relocation</p></li>
<li><p>Bootstrap</p></li>
<li><p>Rebuild</p></li>
<li><p>Bulk Load</p></li>
<li><p>Repair</p></li>
</ul>
</div>
<div class="section" id="disallow-decommission-when-number-of-replicas-will-drop-below-configured-rf">
<h2>Disallow Decommission when number of Replicas will drop below configured RF<a class="headerlink" href="#disallow-decommission-when-number-of-replicas-will-drop-below-configured-rf" title="Permalink to this headline"></a></h2>
<p><a class="reference external" href="https://issues.apache.org/jira/browse/CASSANDRA-12510">CASSANDRA-12510</a> guards against decommission that will drop # of replicas below configured replication factor (RF), and adds the <code class="docutils literal notranslate"><span class="pre">--force</span></code> option that allows decommission to continue if intentional; force decommission of this node even when it reduces the number of replicas to below configured RF.</p>
</div>
</div>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="transientreplication.html" class="btn btn-neutral float-right" title="Transient Replication" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
<a href="messaging.html" class="btn btn-neutral float-left" title="Improved Internode Messaging" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<p>
&copy; Copyright 2020, The Apache Cassandra team
</p>
</div>
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/rtfd/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>