| = Change Data Capture |
| |
| == Overview |
| |
| Change data capture (CDC) provides a mechanism to flag specific tables |
| for archival and to reject writes to those tables once a |
| configurable size-on-disk for the CDC log is reached. An operator can |
| enable CDC on a table by setting the table property `cdc=true` (either |
| when xref:cql/ddl.adoc#create-table[`creating the table`] or |
| xref:cql/ddl.adoc#alter-table[`altering it`]). Upon CommitLogSegment creation, |
| a hard-link to the segment is created in the directory specified by |
| `cdc_raw_directory` in `cassandra.yaml`. On segment fsync to disk, if CDC data is present |
| anywhere in the segment, a `<segment_name>_cdc.idx` file is also created |
| containing the integer offset of how much data in the original segment is |
| persisted to disk. Upon final segment flush, a second line with the |
| human-readable word "COMPLETED" is added to the `_cdc.idx` file, |
| indicating that Cassandra has completed all processing on the file. |
| |
| We use an index file rather than just encouraging clients to parse |
| the log in real time off a memory-mapped handle because data can be reflected in |
| a kernel buffer that is not yet persisted to disk. Parsing only up to |
| the offset listed in the `_cdc.idx` file ensures that you only parse |
| CDC data that is durable. |
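
The index file layout described above can be read with very little code. The following is a minimal sketch, not Cassandra's own tooling: it assumes the first line holds the durable offset in bytes and an optional second line holds the word `COMPLETED`. The names `CdcIndex`, `read_cdc_index` and `read_durable_bytes` are hypothetical and chosen for illustration only.

```python
from pathlib import Path
from typing import NamedTuple


class CdcIndex(NamedTuple):
    durable_offset: int  # bytes of the segment known to be persisted to disk
    completed: bool      # True once the "COMPLETED" line is present


def read_cdc_index(idx_path: Path) -> CdcIndex:
    """Parse a <segment_name>_cdc.idx file (assumed layout: offset, then optional COMPLETED)."""
    lines = idx_path.read_text().splitlines()
    if not lines:
        raise ValueError(f"{idx_path} is empty")
    offset = int(lines[0].strip())
    completed = len(lines) > 1 and lines[1].strip() == "COMPLETED"
    return CdcIndex(offset, completed)


def read_durable_bytes(segment_path: Path, idx_path: Path) -> bytes:
    """Return only the prefix of the segment that the index marks as durable."""
    index = read_cdc_index(idx_path)
    with segment_path.open("rb") as segment:
        return segment.read(index.durable_offset)
```

A consumer would typically re-read the index before each pass and stop at `durable_offset`, since bytes beyond it may not have reached disk yet.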
| |
| `cassandra.yaml` specifies a threshold for the total disk space CDC may use. |
| Once that threshold is reached, newly allocated CommitLogSegments will not allow CDC data |
| until a consumer parses and removes files from the specified `cdc_raw` |
| directory. |
| |
| == Configuration |
| |
| === Enabling or disabling CDC on a table |
| |
| CDC is enabled or disabled through the `cdc` table property, |
| for instance: |
| |
| [source,cql] |
| ---- |
| CREATE TABLE foo (a int, b text, PRIMARY KEY(a)) WITH cdc=true; |
| |
| ALTER TABLE foo WITH cdc=true; |
| |
| ALTER TABLE foo WITH cdc=false; |
| ---- |
| |
| === cassandra.yaml parameters |
| |
| The following cassandra.yaml options are available for CDC: |
| |
| `cdc_enabled` (default: false):: |
| Enable or disable CDC operations node-wide. |
| `cdc_raw_directory` (default: `$CASSANDRA_HOME/data/cdc_raw`):: |
| Destination for CommitLogSegments to be moved after all corresponding |
| memtables are flushed. |
| `cdc_free_space_in_mb` (default: min of 4096 and 1/8th volume space):: |
| Total disk space allowed for CDC data. Usage is calculated as the sum of |
| all active CommitLogSegments that permit CDC plus |
| all flushed CDC segments in `cdc_raw_directory`. |
| `cdc_free_space_check_interval_ms` (default: 250):: |
| When at capacity, we limit the frequency with which we re-calculate |
| the space taken up by `cdc_raw_directory` to prevent burning CPU |
| cycles unnecessarily. Default is to check 4 times per second. |
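
To make the default for `cdc_free_space_in_mb` concrete, here is a rough sketch of the arithmetic and of a monitoring check an operator might run. It is an approximation under stated assumptions: one MB is taken as 1024 * 1024 bytes, and only files in the `cdc_raw` directory are measured, whereas Cassandra's own accounting also includes active segments that permit CDC. All function names are hypothetical.

```python
import shutil
from pathlib import Path

MB = 1024 * 1024  # assumption: the *_in_mb settings use binary megabytes


def default_cdc_space_mb(volume_total_bytes: int) -> int:
    """Documented default: the smaller of 4096 MB and 1/8th of the volume."""
    return min(4096, volume_total_bytes // 8 // MB)


def default_for_volume(path: Path) -> int:
    """Apply the default rule to the volume that holds `path`."""
    return default_cdc_space_mb(shutil.disk_usage(path).total)


def directory_size_bytes(directory: Path) -> int:
    """Sum the sizes of the regular files directly inside `directory`."""
    return sum(p.stat().st_size for p in directory.iterdir() if p.is_file())


def cdc_raw_over_limit(cdc_raw: Path, limit_mb: int) -> bool:
    """True when the files in cdc_raw alone already meet or exceed the limit."""
    return directory_size_bytes(cdc_raw) >= limit_mb * MB
```

For example, an 8 GiB volume yields a default of 1024 MB, while any volume of 32 GiB or more is capped at 4096 MB.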
| |
| == Reading CommitLogSegments |
| |
| Use a |
| https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReader.java[CommitLogReader]. |
| Usage is |
| https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L132-L140[fairly |
| straightforward] with a |
| https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReader.java#L71-L103[variety |
| of signatures] available for use. In order to handle mutations read from |
| disk, implement |
| https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReadHandler.java[CommitLogReadHandler]. |
| |
| == Warnings |
| |
| *Do not enable CDC without some kind of consumption process in place.* |
| |
| If CDC is enabled on a node and then on a table, the space allowed by |
| `cdc_free_space_in_mb` will fill up, and writes to CDC-enabled |
| tables will then be rejected unless some consumption process is in place. |
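
As an illustration of what a minimal consumption process might look like, the sketch below sweeps the `cdc_raw` directory once, hands each completed segment to a caller-supplied callback, and then deletes the segment and its index so that space is freed. It assumes each `<segment_name>_cdc.idx` file sits next to a segment named `<segment_name>.log`; that naming, the function `consume_completed_segments`, and the callback signature are assumptions made for this example, not part of Cassandra's API.

```python
from pathlib import Path
from typing import Callable

IDX_SUFFIX = "_cdc.idx"


def consume_completed_segments(
    cdc_raw: Path,
    handle_segment: Callable[[Path, int], None],
) -> int:
    """Process and delete every segment whose index is marked COMPLETED.

    Returns the number of index files removed. Segments that are still
    being written (no COMPLETED line) are left untouched.
    """
    removed = 0
    for idx_path in sorted(cdc_raw.glob("*" + IDX_SUFFIX)):
        lines = idx_path.read_text().splitlines()
        if len(lines) < 2 or lines[1].strip() != "COMPLETED":
            continue  # Cassandra has not finished with this segment yet
        durable_offset = int(lines[0].strip())
        # Assumed naming: <segment_name>_cdc.idx pairs with <segment_name>.log
        segment_name = idx_path.name[: -len(IDX_SUFFIX)]
        segment_path = idx_path.with_name(segment_name + ".log")
        if segment_path.exists():
            handle_segment(segment_path, durable_offset)
            segment_path.unlink()
        idx_path.unlink()
        removed += 1
    return removed
```

A real consumer would run such a sweep on a schedule and would record its progress before deleting anything, so that a crash between processing and deletion does not lose data.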
| |
| == Further Reading |
| |
| * https://issues.apache.org/jira/browse/CASSANDRA-8844[JIRA ticket CASSANDRA-8844] |
| * https://issues.apache.org/jira/browse/CASSANDRA-12148[JIRA ticket CASSANDRA-12148] |