= Compression
Cassandra offers operators the ability to configure compression on a
per-table basis. Compression reduces the size of data on disk by
compressing each SSTable in user-configurable chunks of
`chunk_length_in_kb` kilobytes. As Cassandra SSTables are immutable, the
CPU cost of compression is only paid when the SSTable is written -
subsequent updates to data land in different SSTables, so Cassandra
never needs to decompress, overwrite, and recompress data when `UPDATE`
commands are issued. On reads, Cassandra locates the relevant compressed
chunks on disk, decompresses each full chunk, and then proceeds with the
remainder of the read path (merging data from disk and memtables, read
repair, and so on).
Compression algorithms typically trade off between the following three
areas:
* *Compression speed*: How fast the compression algorithm compresses
data. This is critical in the flush and compaction paths, because data
must be compressed before it is written to disk.
* *Decompression speed*: How fast the compression algorithm decompresses
data. This is critical in the read and compaction paths, as data must be
read off disk as a full chunk and decompressed before it can be
returned.
* *Ratio*: The ratio by which the uncompressed data is reduced. Cassandra
typically measures this as the size of data on disk relative to the
uncompressed size. For example, a ratio of `0.5` means that the data on
disk is 50% the size of the uncompressed data. Cassandra exposes this
ratio per table as the `SSTable Compression Ratio` field of
`nodetool tablestats`.
Cassandra offers five compression algorithms by default that make
different tradeoffs in these areas. While benchmarking compression
algorithms depends on many factors (algorithm parameters such as
compression level, the compressibility of the input data, the underlying
processor class, etc.), the following table should help you pick a
starting point based on your application's requirements, with an
extremely rough grading of the different choices by their performance in
these areas (A is relatively good, F is relatively bad):
[width="100%",cols="40%,19%,11%,13%,6%,11%",options="header",]
|===
|Compression Algorithm |Cassandra Class |Compression |Decompression
|Ratio |C* Version
|https://lz4.github.io/lz4/[LZ4] |`LZ4Compressor` | A+ | A+ | C+ | `>=1.2.2`
|https://lz4.github.io/lz4/[LZ4HC] |`LZ4Compressor` | C+ | A+ | B+ | `>= 3.6`
|https://facebook.github.io/zstd/[Zstd] |`ZstdCompressor` | A- | A- | A+ | `>= 4.0`
|http://google.github.io/snappy/[Snappy] |`SnappyCompressor` | A- | A | C | `>= 1.0`
|https://zlib.net[Deflate (zlib)] |`DeflateCompressor` | C | C | A | `>= 1.0`
|===
Generally speaking, for a performance-critical (latency or throughput)
application `LZ4` is the right choice, as it gets excellent ratio per
CPU cycle spent. This is why it is the default choice in Cassandra.

For storage-critical applications (disk footprint), however, `Zstd` may
be a better choice, as it can achieve significantly better ratio than
`LZ4`.

`Snappy` is kept for backwards compatibility and `LZ4` will typically be
preferable.

`Deflate` is kept for backwards compatibility and `Zstd` will typically
be preferable.
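As an illustration (the full configuration syntax is covered in the next
section), a storage-critical table could opt into `Zstd` when it is
created; the keyspace, table, and schema below are placeholders, not a
definitive recommendation:

[source,cql]
----
CREATE TABLE keyspace.table (id int PRIMARY KEY, payload text)
    WITH compression = {'class': 'ZstdCompressor'};
----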
== Configuring Compression
Compression is configured on a per-table basis as an optional argument
to `CREATE TABLE` or `ALTER TABLE`. Three options are available for all
compressors:
* `class` (default: `LZ4Compressor`): specifies the compression class to
use. The two "fast" compressors are `LZ4Compressor` and
`SnappyCompressor` and the two "good" ratio compressors are
`ZstdCompressor` and `DeflateCompressor`.
* `chunk_length_in_kb` (default: `16KiB`): specifies the number of
kilobytes of data per compression chunk. The main tradeoff here is that
larger chunk sizes give compression algorithms more context and improve
their ratio, but require reads to fetch and decompress more data off
disk.
* `crc_check_chance` (default: `1.0`): determines how likely Cassandra
is to verify the checksum on each compression chunk during reads to
protect against data corruption. Unless you have profiles indicating
that this is a performance problem, it is highly encouraged not to turn
this off, as it is Cassandra's only protection against bitrot.
The `LZ4Compressor` supports the following additional options:
* `lz4_compressor_type` (default `fast`): specifies whether to use the
`high` (a.k.a. `LZ4HC`) ratio version or the `fast` (a.k.a. `LZ4`)
version of `LZ4`. The `high` mode supports a configurable level, which
allows operators to tune the performance <-> ratio tradeoff via the
`lz4_high_compressor_level` option, as shown in the example after this
list. Note that in `4.0` and above it may be preferable to use the
`Zstd` compressor.
* `lz4_high_compressor_level` (default `9`): A number between `1` and
`17` inclusive that represents how much CPU time to spend trying to get
more compression ratio. Generally, lower levels are faster but achieve
less ratio, while higher levels are slower but achieve more compression
ratio.
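As a sketch of the two `LZ4` options above, an existing table could be
switched to the `high` mode at the default level of `9` as follows
(`keyspace.table` is a placeholder, as in the syntax examples below):

[source,cql]
----
ALTER TABLE keyspace.table
    WITH compression = {'class': 'LZ4Compressor', 'lz4_compressor_type': 'high', 'lz4_high_compressor_level': 9};
----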
The `ZstdCompressor` supports the following additional option:
* `compression_level` (default `3`): A number between `-131072` and `22`
inclusive that represents how much CPU time to spend trying to get more
compression ratio. The lower the level, the faster the speed (at the
cost of ratio). Values from 20 to 22 are called "ultra levels" and
should be used with caution, as they require more memory. The default of
`3` is a good choice for competing with `Deflate` ratios and `1` is a
good choice for competing with `LZ4`.
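For example, following the guidance above, a table could use `Zstd` at
level `1` to stay closer to `LZ4`-like speed while still improving ratio
(`keyspace.table` is again a placeholder):

[source,cql]
----
ALTER TABLE keyspace.table
    WITH compression = {'class': 'ZstdCompressor', 'compression_level': 1};
----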
Users can set compression using the following syntax:
[source,cql]
----
CREATE TABLE keyspace.table (id int PRIMARY KEY)
WITH compression = {'class': 'LZ4Compressor'};
----
Or
[source,cql]
----
ALTER TABLE keyspace.table
WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 64, 'crc_check_chance': 0.5};
----
Once enabled, compression can be disabled with `ALTER TABLE` setting
`enabled` to `false`:
[source,cql]
----
ALTER TABLE keyspace.table
WITH compression = {'enabled':'false'};
----
Operators should be aware, however, that changing compression is not
immediate. The data is compressed when the SSTable is written, and as
SSTables are immutable, the compression will not be modified until the
table is compacted. Upon issuing a change to the compression options via
`ALTER TABLE`, the existing SSTables will not be modified until they are
compacted - if an operator needs compression changes to take effect
immediately, the operator can trigger an SSTable rewrite using
`nodetool scrub` or `nodetool upgradesstables -a`, both of which will
rebuild the SSTables on disk, re-compressing the data in the process.
== Benefits and Uses
Compression's primary benefit is that it reduces the amount of data
written to disk. Not only does the reduced size save in storage
requirements, it often increases read and write throughput, as the CPU
cost of compressing data is typically lower than the cost of reading or
writing the larger volume of uncompressed data from disk.
Compression is most useful in tables composed of many rows, where the
rows are similar in nature. Tables containing similar text columns (such
as repeated JSON blobs) often compress very well. Tables containing data
that has already been compressed or random data (e.g. benchmark
datasets) do not typically compress well.
== Operational Impact
* Compression metadata is stored off-heap and scales with data on disk.
This often requires 1-3GB of off-heap RAM per terabyte of data on disk,
though the exact usage varies with `chunk_length_in_kb` and compression
ratios.
* Streaming operations involve compressing and decompressing data on
compressed tables - in some code paths (such as non-vnode bootstrap),
the CPU overhead of compression can be a limiting factor.
* To prevent slow compressors (`Zstd`, `Deflate`, `LZ4HC`) from blocking
flushes for too long, all three flush with the default fast `LZ4`
compressor and then rely on normal compaction to re-compress the data
into the desired compression strategy. See
https://issues.apache.org/jira/browse/CASSANDRA-15379[CASSANDRA-15379]
for more details.
* The compression path checksums data to ensure correctness - while the
traditional Cassandra read path does not have a way to ensure
correctness of data on disk, compressed tables allow the user to set
`crc_check_chance` (a float from 0.0 to 1.0) so that Cassandra
probabilistically validates chunks on read and verifies that the bits on
disk are not corrupt.
== Advanced Use
Advanced users can provide their own compression class by implementing
the interface at `org.apache.cassandra.io.compress.ICompressor`.
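Assuming the implementation is available on the classpath of every node,
it can then be referenced by its fully qualified class name in the
`class` option; `com.example.MyCompressor` below is a hypothetical
example:

[source,cql]
----
ALTER TABLE keyspace.table
    WITH compression = {'class': 'com.example.MyCompressor'};
----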