commit d1d0dd70951c9997ca7f9eeb184da64a0eb8fed7
author: Francisco Guerrero <frankgh@apache.org> Tue Apr 02 12:01:49 2024 -0700
committer: Francisco Guerrero <frankgh@apache.org> Tue Apr 02 12:03:21 2024 -0700
parent: 98baab1b8f0d5d7eb93f8d13db3b0a7a985fb03a

Ninja fix for CASSANDRA-19340

Revert "Make sure bridge exists"

This reverts commit 98baab1b8f0d5d7eb93f8d13db3b0a7a985fb03a. We revert this commit because the commit message was lost during the merge. We immediately re-add the same commit with the correct commit message, to avoid rewriting git history.
The open-source repository for the Cassandra Spark Bulk Reader. This library integrates Cassandra with Spark, allowing users to run arbitrary Spark jobs against a Cassandra cluster securely and consistently.
This project contains the necessary open-source implementations to connect to a Cassandra cluster and read the data into Spark.
For example usage, see the example repository; sample steps:
```scala
import org.apache.cassandra.spark.sparksql.CassandraDataSource
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder.getOrCreate()
val df = sparkSession.read.format("org.apache.cassandra.spark.sparksql.CassandraDataSource")
                     .option("sidecar_instances", "localhost,localhost2,localhost3")
                     .option("keyspace", "sbr_tests")
                     .option("table", "basic_test")
                     .option("DC", "datacenter1")
                     .option("createSnapshot", true)
                     .option("numCores", 4)
                     .load()
```
The Cassandra Spark Bulk Writer allows for high-speed data ingestion into clusters running Cassandra 3.0 and 4.0.
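A Bulk Writer job might look like the following sketch. It mirrors the reader example above, but note that the sink class name (`CassandraDataSink`) and the option keys beyond `sidecar_instances`, `keyspace`, and `table` are assumptions to verify against the example repository, not confirmed API:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of a Bulk Writer job, assuming the sink class and
// option names below; check the example repository for the real API.
val spark = SparkSession.builder.getOrCreate()

// A toy DataFrame whose columns are assumed to match the target table schema.
val df = spark.range(0, 1000).selectExpr("id", "cast(id as string) as value")

df.write
  .format("org.apache.cassandra.spark.sparksql.CassandraDataSink") // assumed sink class
  .option("sidecar_instances", "localhost,localhost2,localhost3")
  .option("keyspace", "sbr_tests")
  .option("table", "basic_test")
  .option("bulk_writer_cl", "LOCAL_QUORUM") // assumed consistency-level option
  .mode("append")
  .save()
```

Running this requires a live Cassandra cluster with Sidecar instances at the listed addresses; the DataFrame's columns must match the target table's schema.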
Developers interested in contributing to the Analytics library should see the DEV-README.
For example usage, see the example repository. This example covers setting up Cassandra 4.0 and Apache Sidecar, and running a Spark Bulk Reader and a Spark Bulk Writer job.