commit	d28442ae712c1597052493aa3d2353a2de2495c2	[log] [tgz]
author	Francisco Guerrero <frankgh@apache.org>	Wed Mar 27 13:32:39 2024 -0700
committer	GitHub <noreply@github.com>	Wed Mar 27 13:32:39 2024 -0700
tree	c791d6f661820b205e4d06b22205e83a9aa3a615
parent	164243e78f1557a34bc699ebc716b532781d6422 [diff]

CASSANDRA-19500 Fix XXHash32Digest calculated digest value (#46)

This PR bumps the Sidecar version to the current latest HEAD of Sidecar. Bumping the version surfaced an issue with the way we are producing digest strings for the XXHash32 implementation. The hash value is not masked and this causes the negative sign to be forwarded producing the incorrect hash result.

Patch by Francisco Guerrero; Reviewed by Yifan Cai for CASSANDRA-19500
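
The masking issue described in the commit message can be illustrated with a minimal sketch. It assumes the 32-bit hash is held in a signed Java int and converted to a hex string through a long. The class and method names below are hypothetical and are not taken from the actual XXHash32Digest code.

```java
// Illustrative sketch only: DigestHexSketch, unmaskedHex and maskedHex are
// hypothetical names, not the project's actual XXHash32Digest implementation.
final class DigestHexSketch {
    // Widening a negative int to long sign-extends it, so the hex string
    // gains eight leading 'f' digits that are not part of the 32-bit hash.
    static String unmaskedHex(int hash) {
        return Long.toHexString(hash);
    }

    // Masking with 0xFFFFFFFFL keeps only the low 32 bits, so the value is
    // rendered as an unsigned 32-bit hash.
    static String maskedHex(int hash) {
        return Long.toHexString(hash & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        int hash = 0x80000001; // high bit set, so negative as a Java int
        System.out.println(unmaskedHex(hash)); // ffffffff80000001
        System.out.println(maskedHex(hash));   // 80000001
    }
}
```

For hashes whose high bit is clear, both methods return the same string, which is why a bug of this kind may only surface for some inputs.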

README.md

Cassandra Analytics

Cassandra Spark Bulk Reader

The open-source repository for the Cassandra Spark Bulk Reader. This library integrates Cassandra with Spark, allowing users to run arbitrary Spark jobs against a Cassandra cluster securely and consistently.

This project contains the necessary open-source implementations to connect to a Cassandra cluster and read the data into Spark.

For example usage, see the example repository. A sample read looks like this:

import org.apache.cassandra.spark.sparksql.CassandraDataSource
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder.getOrCreate()
val df = sparkSession.read.format("org.apache.cassandra.spark.sparksql.CassandraDataSource")
                          .option("sidecar_instances", "localhost,localhost2,localhost3")
                          .option("keyspace", "sbr_tests")
                          .option("table", "basic_test")
                          .option("DC", "datacenter1")
                          .option("createSnapshot", true)
                          .option("numCores", 4)
                          .load()

Cassandra Spark Bulk Writer

The Cassandra Spark Bulk Writer allows for high-speed data ingest to Cassandra clusters running Cassandra 3.0 and 4.0.

Developers interested in contributing to the Analytics library should see the DEV-README.

Getting Started

For example usage, see the example repository. This example covers setting up Cassandra 4.0 and Apache Sidecar, and running a Spark Bulk Reader job and a Spark Bulk Writer job.