.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at
..
.. http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS,
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. See the License for the specific language governing permissions and
.. limitations under the License.
.. highlight:: none
Change Data Capture
-------------------
Overview
^^^^^^^^
Change data capture (CDC) provides a mechanism to flag specific tables for archival and to reject writes to those
tables once the combined flushed and unflushed CDC log reaches a configurable size on disk. An operator
enables CDC on a table by setting the table property ``cdc=true`` (either when :ref:`creating the table
<create-table-statement>` or :ref:`altering it <alter-table-statement>`). From then on, any CommitLogSegment containing
data for a CDC-enabled table is moved to the directory specified in ``cassandra.yaml`` when the segment is discarded.
The yaml also sets a threshold on the total disk space allowed. Once that threshold is reached, newly allocated
CommitLogSegments do not accept CDC data until a consumer parses and removes data from the destination archival
directory.
Configuration
^^^^^^^^^^^^^
Enabling or disabling CDC on a table
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CDC is enabled or disabled through the ``cdc`` table property, for instance::

    CREATE TABLE foo (a int, b text, PRIMARY KEY(a)) WITH cdc=true;
    ALTER TABLE foo WITH cdc=true;
    ALTER TABLE foo WITH cdc=false;

cassandra.yaml parameters
~~~~~~~~~~~~~~~~~~~~~~~~~
The following ``cassandra.yaml`` options are available for CDC:

``cdc_enabled`` (default: false)
   Enable or disable CDC operations node-wide.
``cdc_raw_directory`` (default: ``$CASSANDRA_HOME/data/cdc_raw``)
   Destination for CommitLogSegments to be moved after all corresponding memtables are flushed.
``cdc_free_space_in_mb`` (default: min of 4096 and 1/8th volume space)
   Limit on the disk space that CDC data may occupy. Usage is calculated as the sum of all active CommitLogSegments
   that permit CDC plus all flushed CDC segments in ``cdc_raw_directory``.
``cdc_free_space_check_interval_ms`` (default: 250)
   When at capacity, the space taken up by ``cdc_raw_directory`` is recalculated at most once per interval to
   avoid burning CPU cycles unnecessarily. The default is to check 4 times per second.

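
As a worked illustration of the ``cdc_free_space_in_mb`` default, the following is a minimal sketch of the "min of
4096 and 1/8th volume space" rule. It assumes that "volume space" means the total capacity of the file store holding
``cdc_raw_directory``. The class and method names are hypothetical and are not part of Cassandra.

.. code-block:: java

    import java.io.IOException;
    import java.nio.file.FileStore;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    // Hypothetical helper, not a Cassandra class: illustrates the documented default only.
    class CdcSpaceDefault {
        static final long PREFERRED_MB = 4096;

        // Returns min(4096 MB, 1/8 of the volume), given the volume size in bytes.
        static long defaultCdcSpaceMb(long volumeBytes) {
            long volumeMb = volumeBytes / (1024L * 1024L);
            return Math.min(PREFERRED_MB, volumeMb / 8);
        }

        public static void main(String[] args) throws IOException {
            // Assumption: the volume is the one holding the directory given as the first argument.
            Path dir = Paths.get(args.length > 0 ? args[0] : ".");
            FileStore store = Files.getFileStore(dir);
            System.out.println(defaultCdcSpaceMb(store.getTotalSpace()) + " MB");
        }
    }

Under this reading, a 16 GiB volume yields 2048 MB, while any volume of 32 GiB or more yields the 4096 MB cap.
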
.. _reading-commitlogsegments:
Reading CommitLogSegments
^^^^^^^^^^^^^^^^^^^^^^^^^
This implementation included a refactor of CommitLogReplayer into `CommitLogReader.java
<https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReader.java>`__.
Usage is `fairly straightforward
<https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L132-L140>`__
with a `variety of signatures
<https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReader.java#L71-L103>`__
available for use. In order to handle mutations read from disk, implement `CommitLogReadHandler
<https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReadHandler.java>`__.
Warnings
^^^^^^^^
**Do not enable CDC without some kind of consumption process in place.**

The initial implementation of Change Data Capture does not include a parser (see :ref:`reading-commitlogsegments` above),
so if CDC is enabled on a node and then on a table, the space allowed by ``cdc_free_space_in_mb`` will fill up and
writes to CDC-enabled tables will then be rejected until some consumption process frees space.
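
To make the notion of a consumption process concrete, the following is a minimal sketch of one pass over
``cdc_raw_directory``: each file is handed to a caller-supplied handler and deleted once the handler returns, which
frees CDC space. The class ``CdcRawDrainer`` and its ``drainOnce`` method are hypothetical, use only ``java.nio``, and
do not parse segments. A real consumer would presumably do its parsing inside the handler, for example through
``CommitLogReader`` with a ``CommitLogReadHandler`` implementation.

.. code-block:: java

    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.function.Consumer;

    // Hypothetical consumer skeleton, not a Cassandra class.
    class CdcRawDrainer {
        // Hands every regular file in cdcRawDir to the handler, sorted by file name,
        // and deletes each file after the handler returns normally.
        // Returns the number of files consumed in this pass.
        static int drainOnce(Path cdcRawDir, Consumer<Path> handler) throws IOException {
            List<Path> segments = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(cdcRawDir)) {
                for (Path p : stream) {
                    if (Files.isRegularFile(p)) {
                        segments.add(p);
                    }
                }
            }
            Collections.sort(segments);
            int consumed = 0;
            for (Path segment : segments) {
                handler.accept(segment);   // parse or ship the segment here
                Files.delete(segment);     // removing the file is what frees CDC space
                consumed++;
            }
            return consumed;
        }

        public static void main(String[] args) throws IOException {
            if (args.length == 0) {
                System.out.println("usage: CdcRawDrainer <cdc_raw_directory>");
                return;
            }
            int n = drainOnce(Paths.get(args[0]), s -> System.out.println("consumed " + s));
            System.out.println(n + " file(s) consumed");
        }
    }

If the handler throws, the file is left in place, so a failed segment is retried on the next pass rather than lost.
A long-running consumer would call ``drainOnce`` periodically.
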
Further Reading
^^^^^^^^^^^^^^^
- `Design doc <https://docs.google.com/document/d/1ZxCWYkeZTquxsvf5hdPc0fiUnUHna8POvgt6TIzML4Y/edit>`__
- `JIRA ticket <https://issues.apache.org/jira/browse/CASSANDRA-8844>`__