CASSANDRA-19048 - Audit table properties passed through Analytics CqlUtils

The following properties have an effect on the files generated by the
bulk writer, and therefore need to be retained when cleaning the table
schema (a sketch of the resulting whitelist follows the list):

bloom_filter_fp_chance
cdc
compression
default_time_to_live
min_index_interval
max_index_interval
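
A minimal sketch of the whitelist approach, assuming the cleaning step
receives the parsed table options as a simple name-to-value map (the map
shape and helper name here are hypothetical, not the actual
implementation):

// Table properties that influence the SSTables the bulk writer produces.
val retainedProperties = Set(
  "bloom_filter_fp_chance",
  "cdc",
  "compression",
  "default_time_to_live",
  "min_index_interval",
  "max_index_interval")

// Hypothetical helper: drop every table option not on the whitelist.
def cleanTableOptions(options: Map[String, String]): Map[String, String] =
  options.filter { case (name, _) => retainedProperties.contains(name.toLowerCase) }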

Additionally, this commit adds tests to make sure all available TTL
paths, including table default TTLs and constant/per-row options, work
as designed.
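
Sketched below are the TTL paths those tests cover, expressed as CQL
wrapped in Scala strings (the keyspace, table, and column names are
hypothetical):

// Path 1: the table-level default TTL, carried by the
// default_time_to_live schema property retained above.
val createWithDefaultTtl =
  """CREATE TABLE ks.events (
    |    id uuid PRIMARY KEY,
    |    payload text
    |) WITH default_time_to_live = 86400""".stripMargin

// Paths 2 and 3: a constant TTL applied to every row, or a TTL supplied
// per row; both surface in CQL as the USING TTL clause.
val insertWithExplicitTtl =
  "INSERT INTO ks.events (id, payload) VALUES (?, ?) USING TTL 300"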

Patch by Doug Rohrer; Reviewed by Francisco Guerrero Hernandez, Yifan Cai,
Dinesh Joshi for CASSANDRA-19048
8 files changed
tree: 8c94402f41c5a27918c9079c955711d83e348955
  .circleci/
  cassandra-analytics-core/
  cassandra-analytics-core-example/
  cassandra-analytics-integration-framework/
  cassandra-analytics-integration-tests/
  cassandra-bridge/
  cassandra-four-zero/
  cassandra-three-zero/
  config/
  githooks/
  gradle/
  ide/
  profiles/
  scripts/
  .asf.yaml
  .gitignore
  build.gradle
  CHANGES.txt
  code_version.sh
  DEV-README.md
  gradle.properties
  gradlew
  LICENSE.txt
  NOTICE.txt
  README.md
  settings.gradle
README.md

Cassandra Analytics

Cassandra Spark Bulk Reader

The open-source repository for the Cassandra Spark Bulk Reader. This library enables integration between Cassandra and Spark, allowing users to run arbitrary Spark jobs against a Cassandra cluster securely and consistently.

This project contains the necessary open-source implementations to connect to a Cassandra cluster and read the data into Spark.

For example usage, see the example repository; sample steps:

import org.apache.cassandra.spark.sparksql.CassandraDataSource
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder.getOrCreate()
val df = sparkSession.read.format("org.apache.cassandra.spark.sparksql.CassandraDataSource")
                          .option("sidecar_instances", "localhost,localhost2,localhost3") // Sidecar hosts fronting the cluster
                          .option("keyspace", "sbr_tests")   // keyspace to read from
                          .option("table", "basic_test")     // table to read from
                          .option("DC", "datacenter1")       // local datacenter to read from
                          .option("createSnapshot", true)    // snapshot the table before reading
                          .option("numCores", 4)             // parallelism for the read
                          .load()

Cassandra Spark Bulk Writer

The Cassandra Spark Bulk Writer enables high-speed data ingestion into clusters running Cassandra 3.0 and 4.0.
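
A writer sketch in the same style as the reader example above, assuming a DataFrame df to persist; the sink class and option keys follow the example repository's bulk writer job, but treat the exact keys as assumptions rather than a settled API:

import org.apache.spark.sql.SaveMode

df.write.format("org.apache.cassandra.spark.sparksql.CassandraDataSink")
        .option("sidecar_instances", "localhost,localhost2,localhost3") // Sidecar hosts fronting the cluster
        .option("keyspace", "sbr_tests")   // target keyspace
        .option("table", "basic_test")     // target table
        .option("local_dc", "datacenter1") // local datacenter (assumed key)
        .mode(SaveMode.Append)
        .save()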

Developers interested in contributing to the Analytics library should see the DEV-README.

Getting Started

For example usage, see the example repository. The example covers setting up Cassandra 4.0 and Apache Sidecar, as well as running Spark Bulk Reader and Spark Bulk Writer jobs.