CASSANDRA-19048 - Audit table properties passed through Analytics CqlUtils

The following properties have an effect on the files generated by the
bulk writer, and therefore need to be retained when cleaning the table
schema (a sketch of the resulting whitelist follows the list):

bloom_filter_fp_chance
cdc
compression
default_time_to_live
min_index_interval
max_index_interval
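
A minimal sketch of the whitelist approach, assuming the cleaning step
receives the parsed table options as a simple name-to-value map (the map
shape and helper name here are hypothetical, not the actual
implementation):

// Table properties that influence the SSTables the bulk writer produces.
val retainedProperties = Set(
  "bloom_filter_fp_chance",
  "cdc",
  "compression",
  "default_time_to_live",
  "min_index_interval",
  "max_index_interval")

// Hypothetical helper: drop every table option not on the whitelist.
def cleanTableOptions(options: Map[String, String]): Map[String, String] =
  options.filter { case (name, _) => retainedProperties.contains(name.toLowerCase) }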

Additionally, this commit adds tests to make sure all available TTL
paths, including table default TTLs and constant/per-row options, work
as designed.
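
Sketched below are the TTL paths those tests cover, expressed as CQL
wrapped in Scala strings (the keyspace, table, and column names are
hypothetical):

// Path 1: the table-level default TTL, carried by the
// default_time_to_live schema property retained above.
val createWithDefaultTtl =
  """CREATE TABLE ks.events (
    |    id uuid PRIMARY KEY,
    |    payload text
    |) WITH default_time_to_live = 86400""".stripMargin

// Paths 2 and 3: a constant TTL applied to every row, or a TTL supplied
// per row; both surface in CQL as the USING TTL clause.
val insertWithExplicitTtl =
  "INSERT INTO ks.events (id, payload) VALUES (?, ?) USING TTL 300"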

Patch by Doug Rohrer; Reviewed by Francisco Guerrero Hernandez, Yifan Cai,
Dinesh Joshi for CASSANDRA-19048
8 files changed
tree: 8c94402f41c5a27918c9079c955711d83e348955
  .circleci/
  cassandra-analytics-core/
  cassandra-analytics-core-example/
  cassandra-analytics-integration-framework/
  cassandra-analytics-integration-tests/
  cassandra-bridge/
  cassandra-four-zero/
  cassandra-three-zero/
  config/
  githooks/
  gradle/
  ide/
  profiles/
  scripts/
  .asf.yaml
  .gitignore
  build.gradle
  CHANGES.txt
  code_version.sh
  DEV-README.md
  gradle.properties
  gradlew
  LICENSE.txt
  NOTICE.txt
  README.md
  settings.gradle
README.md

Cassandra Analytics

Cassandra Spark Bulk Reader

The open-source repository for the Cassandra Spark Bulk Reader. This library enables integration between Cassandra and Spark, allowing users to run arbitrary Spark jobs against a Cassandra cluster securely and consistently.

This project contains the necessary open-source implementations to connect to a Cassandra cluster and read the data into Spark.

For example usage, see the example repository; sample steps:

import org.apache.cassandra.spark.sparksql.CassandraDataSource
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder.getOrCreate()
val df = sparkSession.read.format("org.apache.cassandra.spark.sparksql.CassandraDataSource")
                          .option("sidecar_instances", "localhost,localhost2,localhost3") // Sidecar hosts fronting the cluster
                          .option("keyspace", "sbr_tests")   // keyspace to read from
                          .option("table", "basic_test")     // table to read from
                          .option("DC", "datacenter1")       // local datacenter to read from
                          .option("createSnapshot", true)    // snapshot the table before reading
                          .option("numCores", 4)             // parallelism for the read
                          .load()

Cassandra Spark Bulk Writer

The Cassandra Spark Bulk Writer enables high-speed data ingestion into clusters running Cassandra 3.0 and 4.0.
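
A writer sketch in the same style as the reader example above, assuming a DataFrame df to persist; the sink class and option keys follow the example repository's bulk writer job, but treat the exact keys as assumptions rather than a settled API:

import org.apache.spark.sql.SaveMode

df.write.format("org.apache.cassandra.spark.sparksql.CassandraDataSink")
        .option("sidecar_instances", "localhost,localhost2,localhost3") // Sidecar hosts fronting the cluster
        .option("keyspace", "sbr_tests")   // target keyspace
        .option("table", "basic_test")     // target table
        .option("local_dc", "datacenter1") // local datacenter (assumed key)
        .mode(SaveMode.Append)
        .save()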

Developers interested in contributing to the Analytics library should see the DEV-README.

Getting Started

For example usage, see the example repository. The example covers setting up Cassandra 4.0 and Apache Sidecar, as well as running Spark Bulk Reader and Spark Bulk Writer jobs.