= Compression

Cassandra offers operators the ability to configure compression on a
per-table basis. Compression reduces the size of data on disk by
compressing each SSTable in user-configurable chunks (see
`chunk_length_in_kb` below). As Cassandra SSTables are immutable, the
CPU cost of compression is only paid when the SSTable is written -
subsequent updates to data will land in different SSTables, so
Cassandra does not need to decompress, overwrite, and recompress data
when `UPDATE` commands are issued. On reads, Cassandra will locate the
relevant compressed chunks on disk, decompress the full chunk, and then
proceed with the remainder of the read path (merging data from disk and
memtables, read repair, and so on).

Compression algorithms typically trade off between the following three
areas:

* *Compression speed*: How fast the compression algorithm compresses
data. This is critical in the flush and compaction paths because data
must be compressed before it is written to disk.
* *Decompression speed*: How fast the compression algorithm
decompresses data. This is critical in the read and compaction paths as
data must be read off disk in a full chunk and decompressed before it
can be returned.
* *Ratio*: How much the uncompressed data is reduced in size. Cassandra
typically measures this as the size of data on disk relative to the
uncompressed size. For example, a ratio of `0.5` means that the data on
disk is 50% the size of the uncompressed data. Cassandra exposes this
ratio per table as the `SSTable Compression Ratio` field of
`nodetool tablestats`.

Cassandra offers five compression algorithms by default that make
different tradeoffs in these areas. While benchmarking compression
algorithms depends on many factors (algorithm parameters such as
compression level, the compressibility of the input data, underlying
processor class, etc.), the following table should help you pick a
starting point based on your application's requirements, with an
extremely rough grading of the different choices by their performance
in these areas (`A` is relatively good, `F` is relatively bad):

[width="100%",cols="40%,19%,11%,13%,6%,11%",options="header",]
|===
|Compression Algorithm |Cassandra Class |Compression Speed
|Decompression Speed |Ratio |C* Version

|https://lz4.github.io/lz4/[LZ4] |`LZ4Compressor` | A+ | A+ | C+ | `>= 1.2.2`

|https://lz4.github.io/lz4/[LZ4HC] |`LZ4Compressor` | C+ | A+ | B+ | `>= 3.6`

|https://facebook.github.io/zstd/[Zstd] |`ZstdCompressor` | A- | A- | A+ | `>= 4.0`

|http://google.github.io/snappy/[Snappy] |`SnappyCompressor` | A- | A | C | `>= 1.0`

|https://zlib.net[Deflate (zlib)] |`DeflateCompressor` | C | C | A | `>= 1.0`
|===

Generally speaking, for a performance-critical (latency or throughput)
application `LZ4` is the right choice, as it achieves an excellent
ratio per CPU cycle spent. This is why it is the default choice in
Cassandra.

For storage-critical applications (disk footprint), however, `Zstd` may
be a better choice, as it can achieve a significantly better ratio than
`LZ4`.

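For example, a storage-critical table could opt into `Zstd` using the
configuration syntax covered below; `keyspace.table` is a placeholder
matching the later examples:

[source,cql]
----
-- Trade some write and compaction CPU for a better compression ratio
-- by switching from the default LZ4 to Zstd.
ALTER TABLE keyspace.table
WITH compression = {'class': 'ZstdCompressor'};
----
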
`Snappy` is kept for backwards compatibility and `LZ4` will typically be
preferable.

`Deflate` is kept for backwards compatibility and `Zstd` will typically
be preferable.

== Configuring Compression

Compression is configured on a per-table basis as an optional argument
to `CREATE TABLE` or `ALTER TABLE`. Three options are available for all
compressors:

* `class` (default: `LZ4Compressor`): specifies the compression class to
use. The two "fast" compressors are `LZ4Compressor` and
`SnappyCompressor` and the two "good" ratio compressors are
`ZstdCompressor` and `DeflateCompressor`.
* `chunk_length_in_kb` (default: `16KiB`): specifies the number of
kilobytes of data per compression chunk. The main tradeoff here is that
larger chunk sizes give compression algorithms more context and improve
their ratio, but require reads to deserialize and read more off disk.
* `crc_check_chance` (default: `1.0`): determines how likely Cassandra
is to verify the checksum on each compression chunk during reads to
protect against data corruption. Unless you have profiles indicating
that this is a performance problem, it is highly encouraged not to turn
this off, as it is Cassandra's only protection against bitrot.

The `LZ4Compressor` supports the following additional options:

* `lz4_compressor_type` (default `fast`): specifies whether to use the
`high` (a.k.a. `LZ4HC`) ratio version or the `fast` (a.k.a. `LZ4`)
version of `LZ4`. The `high` mode supports a configurable level, which
allows operators to tune the performance vs. ratio tradeoff via the
`lz4_high_compressor_level` option. Note that in `4.0` and above it may
be preferable to use the `Zstd` compressor.
* `lz4_high_compressor_level` (default `9`): A number between `1` and
`17` inclusive that represents how much CPU time to spend trying to get
more compression ratio. Lower levels are faster but achieve less ratio;
higher levels are slower but compress better (see the example below).

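As a sketch, using the placeholder `keyspace.table` naming from the
examples below, the high-ratio mode can be enabled like so:

[source,cql]
----
-- Switch to the LZ4HC mode at level 9 (the default high level);
-- higher levels spend more CPU per write for a better ratio.
ALTER TABLE keyspace.table
WITH compression = {'class': 'LZ4Compressor',
                    'lz4_compressor_type': 'high',
                    'lz4_high_compressor_level': 9};
----
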
The `ZstdCompressor` supports the following additional option:

* `compression_level` (default `3`): A number between `-131072` and `22`
inclusive that represents how much CPU time to spend trying to get more
compression ratio. The lower the level, the faster the speed (at the
cost of ratio). Values from 20 to 22 are called "ultra levels" and
should be used with caution, as they require more memory. The default of
`3` is a good choice for competing with `Deflate` ratios and `1` is a
good choice for competing with `LZ4` (see the example below).

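For example, to spend more CPU in exchange for additional ratio
(`keyspace.table` is again a placeholder, and level `10` is an
arbitrary illustrative choice, not a recommendation):

[source,cql]
----
-- Raise Zstd's effort above the default of 3; higher levels spend
-- more CPU per chunk in exchange for a better ratio.
ALTER TABLE keyspace.table
WITH compression = {'class': 'ZstdCompressor',
                    'compression_level': 10};
----
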
Users can set compression using the following syntax:

[source,cql]
----
CREATE TABLE keyspace.table (id int PRIMARY KEY)
WITH compression = {'class': 'LZ4Compressor'};
----

Or

[source,cql]
----
ALTER TABLE keyspace.table
WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 64, 'crc_check_chance': 0.5};
----

Once enabled, compression can be disabled with `ALTER TABLE` setting
`enabled` to `false`:

[source,cql]
----
ALTER TABLE keyspace.table
WITH compression = {'enabled': 'false'};
----

Operators should be aware, however, that changing compression is not
immediate. The data is compressed when the SSTable is written, and as
SSTables are immutable, the compression will not be modified until the
table is compacted. Upon issuing a change to the compression options via
`ALTER TABLE`, the existing SSTables will not be modified until they are
compacted - if an operator needs compression changes to take effect
immediately, they can trigger an SSTable rewrite using
`nodetool scrub` or `nodetool upgradesstables -a`, both of which will
rebuild the SSTables on disk, re-compressing the data in the process.

== Benefits and Uses

Compression's primary benefit is that it reduces the amount of data
written to disk. Not only does the reduced size save in storage
requirements, it often increases read and write throughput, as the CPU
time spent compressing data is less than the time it would take to
read or write the larger volume of uncompressed data from disk.

Compression is most useful for tables composed of many rows, where the
rows are similar in nature. Tables containing similar text columns (such
as repeated JSON blobs) often compress very well. Tables containing data
that has already been compressed or random data (e.g. benchmark
datasets) do not typically compress well.

== Operational Impact

* Compression metadata is stored off-heap and scales with data on disk.
This often requires 1-3GB of off-heap RAM per terabyte of data on disk,
though the exact usage varies with `chunk_length_in_kb` and compression
ratios.
* Streaming operations involve compressing and decompressing data on
compressed tables - in some code paths (such as non-vnode bootstrap),
the CPU overhead of compression can be a limiting factor.
* To prevent slow compressors (`Zstd`, `Deflate`, `LZ4HC`) from blocking
flushes for too long, all three flush with the default fast `LZ4`
compressor and then rely on normal compaction to re-compress the data
into the desired compression strategy. See
https://issues.apache.org/jira/browse/CASSANDRA-15379[CASSANDRA-15379]
for more details.
* The compression path checksums data to ensure correctness - while the
traditional Cassandra read path does not have a way to ensure
correctness of data on disk, compressed tables allow the user to set
`crc_check_chance` (a float from 0.0 to 1.0), causing Cassandra to
probabilistically validate chunks on read and verify that the bits on
disk are not corrupt.

== Advanced Use

Advanced users can provide their own compression class by implementing
the interface at `org.apache.cassandra.io.compress.ICompressor`.
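
Once compiled onto the classpath of every node, a custom class can be
referenced by its fully qualified name; `com.example.MyCompressor`
below is a hypothetical placeholder, not a class Cassandra ships:

[source,cql]
----
-- 'com.example.MyCompressor' stands in for a custom ICompressor
-- implementation available on every node's classpath.
ALTER TABLE keyspace.table
WITH compression = {'class': 'com.example.MyCompressor'};
----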