This is a Sidecar for the highly scalable Apache Cassandra database. For more information, see the Apache Cassandra web site and CIP-1.
This project is still a work in progress (WIP).
We depend on the Cassandra in-jvm dtest framework for testing. Because these jars are not published, you must manually build the dtest jars before you can build the project.
./scripts/build-dtest-jars.sh
The build script supports two parameters:
- REPO - the Cassandra git repository to use for the source files. This is helpful if you need to test with a fork of the Cassandra codebase.
  default: git@github.com:apache/cassandra.git
- BRANCHES - a space-delimited list of branches to build.
  default: "cassandra-4.1 trunk"

Remove any versions you may not want to test with. We recommend at least the latest (released) 4.X series and trunk. See Testing for more details on how to choose which Cassandra versions to use while testing.
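As a sketch of how the two parameters might be supplied, the following assumes the script reads REPO and BRANCHES from the environment. That is an assumption, so check the script itself before relying on it. The example builds only the cassandra-4.1 branch.

```shell
# Assumption: build-dtest-jars.sh picks up REPO and BRANCHES from the environment.
export REPO="git@github.com:apache/cassandra.git"
export BRANCHES="cassandra-4.1"   # build a single branch instead of the default list

# Run from the repository root; print a hint if the script is not found.
if [ -x ./scripts/build-dtest-jars.sh ]; then
  ./scripts/build-dtest-jars.sh
else
  echo "scripts/build-dtest-jars.sh not found; run this from the repository root" >&2
fi
```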
For multi-node in-jvm dtests, network aliases need to be set up for each Cassandra node. The tests assume each node's IP address is 127.0.0.x, where x is the node id.
For example, if you populated your cluster with 3 nodes, create interfaces for 127.0.0.2 and 127.0.0.3 (the first node uses 127.0.0.1).
To get up and running, create a temporary alias for every node except the first:
for i in {2..20}; do sudo ifconfig lo0 alias "127.0.0.${i}"; done
Note that this does not persist across reboots, so you'll have to run it every time you restart.
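If you prefer to create aliases only for the nodes you actually use, the node-id-to-address mapping described above can be sketched as a small helper. The function name alias_ips is hypothetical and not part of the project. The lo0 interface name matches the command above and is the loopback name on macOS/BSD.

```shell
# Hypothetical helper: print the loopback addresses that need an alias
# for a cluster of N nodes. Node 1 uses 127.0.0.1, which already exists,
# so the list starts at 127.0.0.2.
alias_ips() {
  nodes="$1"
  i=2
  while [ "$i" -le "$nodes" ]; do
    echo "127.0.0.${i}"
    i=$((i + 1))
  done
}

# Usage (not executed here): create aliases for a 3-node cluster.
# for ip in $(alias_ips 3); do sudo ifconfig lo0 alias "$ip"; done
```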
After you clone the git repo, you can use the gradle wrapper to build and run the project. Make sure you have Apache Cassandra running on the host & port specified in conf/sidecar.yaml.
$ ./gradlew run
Alternatively, you can run against a local CCM cluster. Cassandra Sidecar provides a configuration for a 3-node CCM cluster named sidecardemo. You can use the gradle wrapper to run the project connected to a 3-node CCM cluster as follows:
$ ./gradlew run -Dsidecar.config=file:///$PWD/examples/conf/sidecar-ccm.yaml
Please see samples for details.
When setting up a Cassandra instance, make sure its data directories match the paths configured in sidecar.yaml. Otherwise, update the data directory paths in sidecar.yaml to point to the correct directories so that the stream APIs work.
Apache Cassandra Sidecar supports Change Data Capture (CDC) to stream table mutations to Apache Kafka. This section describes how to configure and run Sidecar with CDC enabled.
Edit your cassandra.yaml configuration file and enable CDC:
cdc_enabled: true
Restart your Cassandra instance for this change to take effect.
Edit your sidecar.yaml configuration file with the following settings:
sidecar:
  # Enable schema management (required for CDC)
  schema:
    is_enabled: true
    keyspace: sidecar_internal
    replication_strategy: SimpleStrategy
    replication_factor: 3

  # Enable CDC feature
  cdc:
    enabled: true
    config_refresh_time: 10s
    table_schema_refresh_time: 60s
    segment_hardlink_cache_expiry: 1m
Configuration Parameters:
- schema.is_enabled: Must be true for CDC to function. Creates the sidecar_internal keyspace for CDC state management.
- cdc.enabled: Enables the CDC feature in Sidecar.
- cdc.config_refresh_time: How frequently CDC configuration is refreshed from the database.
- cdc.table_schema_refresh_time: How frequently table schemas are refreshed for CDC-enabled tables.
- cdc.segment_hardlink_cache_expiry: Cache expiration time for CDC segment hard links.

For each table you want to capture changes from, enable the CDC property using CQL:
-- For a new table
CREATE TABLE my_keyspace.my_table (
    id text PRIMARY KEY,
    name text,
    value int
) WITH cdc = true;

-- For an existing table
ALTER TABLE my_keyspace.my_table WITH cdc = true;
Use the CDC configuration API endpoint to set up CDC parameters:
curl --request PUT \
  --url http://localhost:9043/api/v1/services/cdc/config \
  --header 'content-type: application/json' \
  --data '{
    "config": {
      "datacenter": "datacenter1",
      "env": "production",
      "topic_format_type": "STATIC",
      "topic": "cdc-events"
    }
  }'
CDC Configuration Parameters:
- datacenter: The datacenter name for this Sidecar instance.
- env: Environment identifier (e.g., production, staging, dev).
- topic_format_type: Determines how Kafka topic names are generated. Options:
  - STATIC: Use a single fixed topic name specified in topic field
  - KEYSPACE: Format as {topic}-{keyspace}
  - KEYSPACETABLE: Format as {topic}-{keyspace}-{table}
  - TABLE: Format as {topic}-{table}
  - MAP: Use custom topic mapping (advanced)
- topic: Base Kafka topic name for CDC events.

Configure the Kafka producer settings using the Kafka configuration API endpoint:
curl --request PUT \
  --url http://localhost:9043/api/v1/services/kafka/config \
  --header 'content-type: application/json' \
  --data '{
    "config": {
      "bootstrap.servers": "localhost:9092",
      "key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
      "value.serializer": "org.apache.kafka.common.serialization.ByteArraySerializer",
      "acks": "all",
      "retries": "3",
      "retry.backoff.ms": "200",
      "enable.idempotence": "true",
      "batch.size": "16384",
      "linger.ms": "5",
      "buffer.memory": "33554432",
      "compression.type": "snappy",
      "request.timeout.ms": "30000",
      "delivery.timeout.ms": "120000",
      "max.in.flight.requests.per.connection": "5",
      "client.id": "cdc-producer"
    }
  }'
Key Kafka Producer Parameters:
- bootstrap.servers: Comma-separated list of Kafka broker addresses.
- key.serializer: Serializer for the message key (use StringSerializer).
- value.serializer: Serializer for the message value (use ByteArraySerializer for Avro).
- acks: Number of acknowledgments the producer requires (all for maximum durability).
- enable.idempotence: When set to true, prevents duplicate messages caused by producer retries.
- compression.type: Compression algorithm (snappy, gzip, lz4, zstd, or none).

For a complete list of Kafka producer configurations, see the Apache Kafka Producer Configuration Documentation.
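To illustrate the topic_format_type options listed under CDC Configuration Parameters, here is a minimal sketch of how a topic name would be derived from each documented format. The function cdc_topic_name is hypothetical and only mirrors the format strings given above. It is not Sidecar's implementation, and MAP is left out because the custom mapping is not described here.

```shell
# Hypothetical helper mirroring the documented naming formats.
# Arguments: format type, base topic, keyspace, table.
cdc_topic_name() {
  format="$1"; topic="$2"; keyspace="$3"; table="$4"
  case "$format" in
    STATIC)        echo "$topic" ;;
    KEYSPACE)      echo "${topic}-${keyspace}" ;;
    KEYSPACETABLE) echo "${topic}-${keyspace}-${table}" ;;
    TABLE)         echo "${topic}-${table}" ;;
    *)             echo "unsupported format: $format" >&2; return 1 ;;
  esac
}

# Example: names for my_keyspace.my_table with the base topic "cdc-events".
cdc_topic_name STATIC cdc-events my_keyspace my_table
cdc_topic_name KEYSPACETABLE cdc-events my_keyspace my_table
```

With STATIC, every table publishes to the same topic. The other formats spread events across per-keyspace or per-table topics.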
CDC events are serialized in Apache Avro format. Sidecar includes a built-in schema store (CachingSchemaStore) that:
- table_schema_refresh_time configuration

Each CDC event published to Kafka contains:
After completing the configuration:
Check Sidecar Logs: Verify CDC is enabled and connected to Kafka:
grep -i "cdc" /path/to/sidecar.log
Verify Configuration: Retrieve current CDC and Kafka configurations:
# Get CDC configuration
curl http://localhost:9043/api/v1/services/cdc/config

# Get Kafka configuration
curl http://localhost:9043/api/v1/services/kafka/config

# Get all service configurations
curl http://localhost:9043/api/v1/services
While Sidecar includes a built-in schema store, you can integrate with external schema registries by:
- SchemaStore interface

CDC not starting:
- schema.is_enabled: true in sidecar.yaml
- cdc_enabled: true
- sidecar_internal keyspace exists and is accessible

No messages in Kafka:
- cdc = true property
- grep -i "kafka\|cdc" /path/to/sidecar.log

Schema errors:
- table_schema_refresh_time is appropriate for your use case

The test framework is set up to run 4.1 and 5.1 (Trunk) tests (see TestVersionSupplier.java) by default. You can change this via the Java property cassandra.sidecar.versions_to_test by supplying a comma-delimited string. For example, -Dcassandra.sidecar.versions_to_test=4.0,4.1,5.1.
You will need to use the “Add Projects” function of CircleCI to set up CircleCI on your fork. When prompted to create a branch, do not replace the CircleCI config; choose the option to do it manually. CircleCI will pick up the in-project configuration.
We warmly welcome and appreciate contributions from the community. Please see CONTRIBUTING.md if you wish to submit pull requests.
1 The Sidecar Client offers Java 1.8 compatibility, and produces artifacts for both Java 1.8 and Java 11.