blob: 21c44f51769d7ec876d95d6b548baab9e26013b7 [file] [log] [blame]
= Time Window CompactionStrategy
[[twcs]]
`TimeWindowCompactionStrategy` (TWCS) is designed specifically for
workloads where it's beneficial to have data on disk grouped by the
timestamp of the data, a common goal when the workload is time-series in
nature or when all data is written with a TTL. In an expiring/TTL
workload, the contents of an entire SSTable likely expire at
approximately the same time, allowing them to be dropped completely, and
space reclaimed much more reliably than when using
`SizeTieredCompactionStrategy` or `LeveledCompactionStrategy`. The basic
concept is that `TimeWindowCompactionStrategy` will create one sstable per
file for a given window, where a window is simply calculated as the
combination of two primary options:
[[twcs_options]]
`compaction_window_unit` (default: DAYS)::
A Java TimeUnit (MINUTES, HOURS, or DAYS).
`compaction_window_size` (default: 1)::
The number of units that make up a window.
`unsafe_aggressive_sstable_expiration` (default: false)::
Expired sstables will be dropped without checking its data is
shadowing other sstables. This is a potentially risky option that can
lead to data loss or deleted data re-appearing, going beyond what
unchecked_tombstone_compaction does for single sstable
compaction. Due to the risk the jvm must also be started with
`-Dcassandra.unsafe_aggressive_sstable_expiration=true`.
Taken together, the operator can specify windows of virtually any size,
and `TimeWindowCompactionStrategy` will work to create a
single sstable for writes within that window. For efficiency during
writing, the newest window will be compacted using
`SizeTieredCompactionStrategy`.
Ideally, operators should select a `compaction_window_unit` and
`compaction_window_size` pair that produces approximately 20-30 windows
- if writing with a 90 day TTL, for example, a 3 Day window would be a
reasonable choice
(`'compaction_window_unit':'DAYS','compaction_window_size':3`).
== TimeWindowCompactionStrategy Operational Concerns
The primary motivation for TWCS is to separate data on disk by timestamp
and to allow fully expired SSTables to drop more efficiently. One
potential way this optimal behavior can be subverted is if data is
written to SSTables out of order, with new data and old data in the same
SSTable. Out of order data can appear in two ways:
* If the user mixes old data and new data in the traditional write path,
the data will be comingled in the memtables and flushed into the same
SSTable, where it will remain comingled.
* If the user's read requests for old data cause read repairs that pull
old data into the current memtable, that data will be comingled and
flushed into the same SSTable.
While TWCS tries to minimize the impact of comingled data, users should
attempt to avoid this behavior. Specifically, users should avoid queries
that explicitly set the timestamp via CQL `USING TIMESTAMP`.
Additionally, users should run frequent repairs (which streams data in
such a way that it does not become comingled).
== Changing TimeWindowCompactionStrategy Options
Operators wishing to enable `TimeWindowCompactionStrategy` on existing
data should consider running a major compaction first, placing all
existing data into a single (old) window. Subsequent newer writes will
then create typical SSTables as expected.
Operators wishing to change `compaction_window_unit` or
`compaction_window_size` can do so, but may trigger additional
compactions as adjacent windows are joined together. If the window size
is decrease d (for example, from 24 hours to 12 hours), then the
existing SSTables will not be modified - TWCS can not split existing
SSTables into multiple windows.