blob: f690f378aadaffe20da57b40d658fd463bb8e54f [file] [log] [blame]
<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Compression &mdash; Apache Cassandra Documentation v3.11.11</title>
<script type="text/javascript" src="../_static/js/modernizr.min.js"></script>
<script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
<script type="text/javascript" src="../_static/jquery.js"></script>
<script type="text/javascript" src="../_static/underscore.js"></script>
<script type="text/javascript" src="../_static/doctools.js"></script>
<script type="text/javascript" src="../_static/language_data.js"></script>
<script type="text/javascript" src="../_static/js/theme.js"></script>
<link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../_static/extra.css" type="text/css" />
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Change Data Capture" href="cdc.html" />
<link rel="prev" title="Bloom Filters" href="bloom_filters.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../index.html" class="icon icon-home"> Apache Cassandra
</a>
<div class="version">
3.11.11
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../getting_started/index.html">Getting Started</a></li>
<li class="toctree-l1"><a class="reference internal" href="../architecture/index.html">Architecture</a></li>
<li class="toctree-l1"><a class="reference internal" href="../data_modeling/index.html">Data Modeling</a></li>
<li class="toctree-l1"><a class="reference internal" href="../cql/index.html">The Cassandra Query Language (CQL)</a></li>
<li class="toctree-l1"><a class="reference internal" href="../configuration/index.html">Configuring Cassandra</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="index.html">Operating Cassandra</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="snitch.html">Snitch</a></li>
<li class="toctree-l2"><a class="reference internal" href="topo_changes.html">Adding, replacing, moving and removing nodes</a></li>
<li class="toctree-l2"><a class="reference internal" href="repair.html">Repair</a></li>
<li class="toctree-l2"><a class="reference internal" href="read_repair.html">Read repair</a></li>
<li class="toctree-l2"><a class="reference internal" href="hints.html">Hints</a></li>
<li class="toctree-l2"><a class="reference internal" href="compaction.html">Compaction</a></li>
<li class="toctree-l2"><a class="reference internal" href="bloom_filters.html">Bloom Filters</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">Compression</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#configuring-compression">Configuring Compression</a></li>
<li class="toctree-l3"><a class="reference internal" href="#benefits-and-uses">Benefits and Uses</a></li>
<li class="toctree-l3"><a class="reference internal" href="#operational-impact">Operational Impact</a></li>
<li class="toctree-l3"><a class="reference internal" href="#advanced-use">Advanced Use</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="cdc.html">Change Data Capture</a></li>
<li class="toctree-l2"><a class="reference internal" href="backups.html">Backups</a></li>
<li class="toctree-l2"><a class="reference internal" href="bulk_loading.html">Bulk Loading</a></li>
<li class="toctree-l2"><a class="reference internal" href="metrics.html">Monitoring</a></li>
<li class="toctree-l2"><a class="reference internal" href="security.html">Security</a></li>
<li class="toctree-l2"><a class="reference internal" href="hardware.html">Hardware Choices</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../tools/index.html">Cassandra Tools</a></li>
<li class="toctree-l1"><a class="reference internal" href="../troubleshooting/index.html">Troubleshooting</a></li>
<li class="toctree-l1"><a class="reference internal" href="../development/index.html">Cassandra Development</a></li>
<li class="toctree-l1"><a class="reference internal" href="../faq/index.html">Frequently Asked Questions</a></li>
<li class="toctree-l1"><a class="reference internal" href="../bugs.html">Reporting Bugs and Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../contactus.html">Contact us</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../index.html">Apache Cassandra</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../index.html">Docs</a> &raquo;</li>
<li><a href="index.html">Operating Cassandra</a> &raquo;</li>
<li>Compression</li>
<li class="wy-breadcrumbs-aside">
<a href="../_sources/operating/compression.rst.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="compression">
<h1>Compression<a class="headerlink" href="#compression" title="Permalink to this headline"></a></h1>
<p>Cassandra offers operators the ability to configure compression on a per-table basis. Compression reduces the size of
data on disk by compressing the SSTable in user-configurable compression <code class="docutils literal notranslate"><span class="pre">chunk_length_in_kb</span></code>. Because Cassandra
SSTables are immutable, the CPU cost of compressing is only necessary when the SSTable is written - subsequent updates
to data will land in different SSTables, so Cassandra will not need to decompress, overwrite, and recompress data when
UPDATE commands are issued. On reads, Cassandra will locate the relevant compressed chunks on disk, decompress the full
chunk, and then proceed with the remainder of the read path (merging data from disks and memtables, read repair, and so
on).</p>
<div class="section" id="configuring-compression">
<h2>Configuring Compression<a class="headerlink" href="#configuring-compression" title="Permalink to this headline"></a></h2>
<p>Compression is configured on a per-table basis as an optional argument to <code class="docutils literal notranslate"><span class="pre">CREATE</span> <span class="pre">TABLE</span></code> or <code class="docutils literal notranslate"><span class="pre">ALTER</span> <span class="pre">TABLE</span></code>. By
default, three options are relevant:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">class</span></code> specifies the compression class - Cassandra provides three classes (<code class="docutils literal notranslate"><span class="pre">LZ4Compressor</span></code>,
<code class="docutils literal notranslate"><span class="pre">SnappyCompressor</span></code>, and <code class="docutils literal notranslate"><span class="pre">DeflateCompressor</span></code> ). The default is <code class="docutils literal notranslate"><span class="pre">LZ4Compressor</span></code>.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">chunk_length_in_kb</span></code> specifies the number of kilobytes of data per compression chunk. The default is 64KB.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">crc_check_chance</span></code> determines how likely Cassandra is to verify the checksum on each compression chunk during
reads. The default is 1.0.</p></li>
</ul>
<p>Users can set compression using the following syntax:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>CREATE TABLE keyspace.table (id int PRIMARY KEY) WITH compression = {&#39;class&#39;: &#39;LZ4Compressor&#39;};
</pre></div>
</div>
<p>Or</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>ALTER TABLE keyspace.table WITH compression = {&#39;class&#39;: &#39;SnappyCompressor&#39;, &#39;chunk_length_in_kb&#39;: 128, &#39;crc_check_chance&#39;: 0.5};
</pre></div>
</div>
<p>Once enabled, compression can be disabled with <code class="docutils literal notranslate"><span class="pre">ALTER</span> <span class="pre">TABLE</span></code> setting <code class="docutils literal notranslate"><span class="pre">enabled</span></code> to <code class="docutils literal notranslate"><span class="pre">false</span></code>:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>ALTER TABLE keyspace.table WITH compression = {&#39;enabled&#39;:&#39;false&#39;};
</pre></div>
</div>
<p>Operators should be aware, however, that changing compression is not immediate. The data is compressed when the SSTable
is written, and as SSTables are immutable, the compression will not be modified until the table is compacted. Upon
issuing a change to the compression options via <code class="docutils literal notranslate"><span class="pre">ALTER</span> <span class="pre">TABLE</span></code>, the existing SSTables will not be modified until they
are compacted - if an operator needs compression changes to take effect immediately, the operator can trigger an SSTable
rewrite using <code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">scrub</span></code> or <code class="docutils literal notranslate"><span class="pre">nodetool</span> <span class="pre">upgradesstables</span> <span class="pre">-a</span></code>, both of which will rebuild the SSTables on disk,
re-compressing the data in the process.</p>
</div>
<div class="section" id="benefits-and-uses">
<h2>Benefits and Uses<a class="headerlink" href="#benefits-and-uses" title="Permalink to this headline"></a></h2>
<p>Compression’s primary benefit is that it reduces the amount of data written to disk. Not only does the reduced size save
in storage requirements, it often increases read and write throughput, as the CPU overhead of compressing data is faster
than the time it would take to read or write the larger volume of uncompressed data from disk.</p>
<p>Compression is most useful in tables comprised of many rows, where the rows are similar in nature. Tables containing
similar text columns (such as repeated JSON blobs) often compress very well.</p>
</div>
<div class="section" id="operational-impact">
<h2>Operational Impact<a class="headerlink" href="#operational-impact" title="Permalink to this headline"></a></h2>
<ul class="simple">
<li><p>Compression metadata is stored off-heap and scales with data on disk. This often requires 1-3GB of off-heap RAM per
terabyte of data on disk, though the exact usage varies with <code class="docutils literal notranslate"><span class="pre">chunk_length_in_kb</span></code> and compression ratios.</p></li>
<li><p>Streaming operations involve compressing and decompressing data on compressed tables - in some code paths (such as
non-vnode bootstrap), the CPU overhead of compression can be a limiting factor.</p></li>
<li><p>The compression path checksums data to ensure correctness - while the traditional Cassandra read path does not have a
way to ensure correctness of data on disk, compressed tables allow the user to set <code class="docutils literal notranslate"><span class="pre">crc_check_chance</span></code> (a float from
0.0 to 1.0) to allow Cassandra to probabilistically validate chunks on read to verify bits on disk are not corrupt.</p></li>
</ul>
</div>
<div class="section" id="advanced-use">
<h2>Advanced Use<a class="headerlink" href="#advanced-use" title="Permalink to this headline"></a></h2>
<p>Advanced users can provide their own compression class by implementing the interface at
<code class="docutils literal notranslate"><span class="pre">org.apache.cassandra.io.compress.ICompressor</span></code>.</p>
</div>
</div>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="cdc.html" class="btn btn-neutral float-right" title="Change Data Capture" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
<a href="bloom_filters.html" class="btn btn-neutral float-left" title="Bloom Filters" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<p>
&copy; Copyright 2016, The Apache Cassandra team
</p>
</div>
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/rtfd/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>