commit c7c3bbca2c7cb415b39689e924fa2357c239f043
author: Francisco Guerrero <frankgh@apache.org> — Tue Nov 14 16:28:14 2023 -0800
committer: Francisco Guerrero <frankgh@apache.org> — Fri Dec 08 10:57:26 2023 -0800
tree: 43d79702ea4efcf4378553ba479d36ed438d73c1
parent: 457b36bcb3c8a865cca83ca6c402246798113ab4
CASSANDRA-19031: Fix bulk writing when using identifiers that need quotes

Cassandra treats all identifiers (i.e. keyspace names, table names, column names, etc.) as lower case unless they are explicitly quoted by the user. A case-sensitive identifier, or a reserved word used as an identifier, can be defined by quoting it during DDL creation. In the analytics library, bulk writing fails when it encounters these identifiers.

This commit fixes the issue by properly propagating the information about whether identifiers need to be quoted, exposing a new dataframe option (`quote_identifiers`). When set to `true`, the keyspace/table/column names are quoted as needed, and data is written correctly when identifiers use mixed case or reserved words.

Patch by Francisco Guerrero; Reviewed by Yifan Cai for CASSANDRA-19031
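The quote-if-needed rule described above can be sketched as follows. This is an illustrative sketch only, not the library's actual implementation, and the reserved-word set is abbreviated for brevity:

```scala
object IdentifierQuoting {
  // Abbreviated, illustrative subset of CQL reserved words
  private val ReservedWords = Set("table", "keyspace", "select", "order", "group")

  // An identifier can stay unquoted only if it is all lower case,
  // starts with a letter, contains only [a-z0-9_], and is not reserved
  def needsQuoting(identifier: String): Boolean =
    !identifier.matches("[a-z][a-z0-9_]*") ||
      ReservedWords.contains(identifier.toLowerCase)

  // Quote (escaping embedded double quotes) only when necessary
  def maybeQuote(identifier: String): String =
    if (needsQuoting(identifier)) "\"" + identifier.replace("\"", "\"\"") + "\""
    else identifier
}
```

Under this sketch, `maybeQuote("basic_test")` stays unquoted, while a mixed-case name such as `MixedCase` or a reserved word such as `table` would be wrapped in double quotes.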
The open-source repository for the Cassandra Spark Bulk Reader. This library enables integration between Cassandra and Spark, allowing users to run arbitrary Spark jobs against a Cassandra cluster securely and consistently.
This project contains the necessary open-source implementations to connect to a Cassandra cluster and read the data into Spark.
For example usage, see the example repository. Sample steps:

```scala
import org.apache.cassandra.spark.sparksql.CassandraDataSource
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder.getOrCreate()
val df = sparkSession.read.format("org.apache.cassandra.spark.sparksql.CassandraDataSource")
                     .option("sidecar_instances", "localhost,localhost2,localhost3")
                     .option("keyspace", "sbr_tests")
                     .option("table", "basic_test")
                     .option("DC", "datacenter1")
                     .option("createSnapshot", true)
                     .option("numCores", 4)
                     .load()
```
The Cassandra Spark Bulk Writer allows high-speed data ingestion into clusters running Cassandra 3.0 and 4.0.
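A writer invocation might look like the following sketch. The data-sink format class name and the `.mode` setting are assumptions for illustration, not confirmed by this README; only `sidecar_instances`, `keyspace`, `table`, and `quote_identifiers` are options named elsewhere in this document, so consult the example repository for the exact API:

```scala
// Hypothetical sketch of a Bulk Writer job; the format class name below
// is an assumption, not confirmed by this README
df.write.format("org.apache.cassandra.spark.sparksql.CassandraDataSink")
  .option("sidecar_instances", "localhost,localhost2,localhost3")
  .option("keyspace", "sbr_tests")
  .option("table", "basic_test")
  // Quote mixed-case or reserved-word identifiers (CASSANDRA-19031)
  .option("quote_identifiers", "true")
  .mode("append")
  .save()
```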
Developers interested in contributing to the Analytics library should see the DEV-README.
For example usage, see the example repository. The example covers setting up Cassandra 4.0 and Apache Sidecar, as well as running Spark Bulk Reader and Spark Bulk Writer jobs.