doc/modules/cassandra/pages/operating/compaction/lcs.adoc - cassandra - Git at Google

 = Leveled Compaction Strategy

 [[lcs]]
 The idea of `LeveledCompactionStrategy` (LCS) is that all sstables are
 put into different levels where we guarantee that no overlapping
 sstables are in the same level. By overlapping we mean that the
 first/last token of a single sstable are never overlapping with other
 sstables. This means that for a SELECT we will only have to look for the
 partition key in a single sstable per level. Each level is 10x the size
 of the previous one and each sstable is 160MB by default. L0 is where
 sstables are streamed/flushed - no overlap guarantees are given here.

 When picking compaction candidates we have to make sure that the
 compaction does not create overlap in the target level. This is done by
 always including all overlapping sstables in the next level. For example
 if we select an sstable in L3, we need to guarantee that we pick all
 overlapping sstables in L4 and make sure that no currently ongoing
 compactions will create overlap if we start that compaction. We can
 start many parallel compactions in a level if we guarantee that we wont
 create overlap. For L0 -> L1 compactions we almost always need to
 include all L1 sstables since most L0 sstables cover the full range. We
 also can't compact all L0 sstables with all L1 sstables in a single
 compaction since that can use too much memory.

 When deciding which level to compact LCS checks the higher levels first
 (with LCS, a "higher" level is one with a higher number, L0 being the
 lowest one) and if the level is behind a compaction will be started in
 that level.

 == Major compaction

 It is possible to do a major compaction with LCS - it will currently
 start by filling out L1 and then once L1 is full, it continues with L2
 etc. This is sub optimal and will change to create all the sstables in a
 high level instead, CASSANDRA-11817.

 == Bootstrapping

 During bootstrap sstables are streamed from other nodes. The level of
 the remote sstable is kept to avoid many compactions after the bootstrap
 is done. During bootstrap the new node also takes writes while it is
 streaming the data from a remote node - these writes are flushed to L0
 like all other writes and to avoid those sstables blocking the remote
 sstables from going to the correct level, we only do STCS in L0 until
 the bootstrap is done.

 == STCS in L0

 If LCS gets very many L0 sstables reads are going to hit all (or most)
 of the L0 sstables since they are likely to be overlapping. To more
 quickly remedy this LCS does STCS compactions in L0 if there are more
 than 32 sstables there. This should improve read performance more
 quickly compared to letting LCS do its L0 -> L1 compactions. If you keep
 getting too many sstables in L0 it is likely that LCS is not the best
 fit for your workload and STCS could work out better.

 == Starved sstables

 If a node ends up with a leveling where there are a few very high level
 sstables that are not getting compacted they might make it impossible
 for lower levels to drop tombstones etc. For example, if there are
 sstables in L6 but there is only enough data to actually get a L4 on the
 node the left over sstables in L6 will get starved and not compacted.
 This can happen if a user changes sstable_size_in_mb from 5MB to 160MB
 for example. To avoid this LCS tries to include those starved high level
 sstables in other compactions if there has been 25 compaction rounds
 where the highest level has not been involved.

 [[lcs_options]]
 == LCS options

 `sstable_size_in_mb` (default: 160MB)::
   The target compressed (if using compression) sstable size - the
   sstables can end up being larger if there are very large partitions on
   the node.
 `fanout_size` (default: 10)::
   The target size of levels increases by this fanout_size multiplier.
   You can reduce the space amplification by tuning this option.

 LCS also support the `cassandra.disable_stcs_in_l0` startup option
 (`-Dcassandra.disable_stcs_in_l0=true`) to avoid doing STCS in L0.
	= Leveled Compaction Strategy

	[[lcs]]
	The idea of `LeveledCompactionStrategy` (LCS) is that all sstables are
	put into different levels where we guarantee that no overlapping
	sstables are in the same level. By overlapping we mean that the
	first/last token of a single sstable are never overlapping with other
	sstables. This means that for a SELECT we will only have to look for the
	partition key in a single sstable per level. Each level is 10x the size
	of the previous one and each sstable is 160MB by default. L0 is where
	sstables are streamed/flushed - no overlap guarantees are given here.

	When picking compaction candidates we have to make sure that the
	compaction does not create overlap in the target level. This is done by
	always including all overlapping sstables in the next level. For example
	if we select an sstable in L3, we need to guarantee that we pick all
	overlapping sstables in L4 and make sure that no currently ongoing
	compactions will create overlap if we start that compaction. We can
	start many parallel compactions in a level if we guarantee that we wont
	create overlap. For L0 -> L1 compactions we almost always need to
	include all L1 sstables since most L0 sstables cover the full range. We
	also can't compact all L0 sstables with all L1 sstables in a single
	compaction since that can use too much memory.

	When deciding which level to compact LCS checks the higher levels first
	(with LCS, a "higher" level is one with a higher number, L0 being the
	lowest one) and if the level is behind a compaction will be started in
	that level.

	== Major compaction

	It is possible to do a major compaction with LCS - it will currently
	start by filling out L1 and then once L1 is full, it continues with L2
	etc. This is sub optimal and will change to create all the sstables in a
	high level instead, CASSANDRA-11817.

	== Bootstrapping

	During bootstrap sstables are streamed from other nodes. The level of
	the remote sstable is kept to avoid many compactions after the bootstrap
	is done. During bootstrap the new node also takes writes while it is
	streaming the data from a remote node - these writes are flushed to L0
	like all other writes and to avoid those sstables blocking the remote
	sstables from going to the correct level, we only do STCS in L0 until
	the bootstrap is done.

	== STCS in L0

	If LCS gets very many L0 sstables reads are going to hit all (or most)
	of the L0 sstables since they are likely to be overlapping. To more
	quickly remedy this LCS does STCS compactions in L0 if there are more
	than 32 sstables there. This should improve read performance more
	quickly compared to letting LCS do its L0 -> L1 compactions. If you keep
	getting too many sstables in L0 it is likely that LCS is not the best
	fit for your workload and STCS could work out better.

	== Starved sstables

	If a node ends up with a leveling where there are a few very high level
	sstables that are not getting compacted they might make it impossible
	for lower levels to drop tombstones etc. For example, if there are
	sstables in L6 but there is only enough data to actually get a L4 on the
	node the left over sstables in L6 will get starved and not compacted.
	This can happen if a user changes sstable_size_in_mb from 5MB to 160MB
	for example. To avoid this LCS tries to include those starved high level
	sstables in other compactions if there has been 25 compaction rounds
	where the highest level has not been involved.

	[[lcs_options]]
	== LCS options

	`sstable_size_in_mb` (default: 160MB)::
	The target compressed (if using compression) sstable size - the
	sstables can end up being larger if there are very large partitions on
	the node.
	`fanout_size` (default: 10)::
	The target size of levels increases by this fanout_size multiplier.
	You can reduce the space amplification by tuning this option.

	LCS also support the `cassandra.disable_stcs_in_l0` startup option
	(`-Dcassandra.disable_stcs_in_l0=true`) to avoid doing STCS in L0.