commit d1d0dd70951c9997ca7f9eeb184da64a0eb8fed7
author: Francisco Guerrero <frankgh@apache.org> Tue Apr 02 12:01:49 2024 -0700
committer: Francisco Guerrero <frankgh@apache.org> Tue Apr 02 12:03:21 2024 -0700
parent: 98baab1b8f0d5d7eb93f8d13db3b0a7a985fb03a

Ninja fix for CASSANDRA-19340

Revert "Make sure bridge exists"

This reverts commit 98baab1b8f0d5d7eb93f8d13db3b0a7a985fb03a. We revert this commit because the commit message was lost during the merge. We immediately re-add the same commit with the correct commit message, to avoid rewriting git history.
The open-source repository for the Cassandra Spark Bulk Reader. This library integrates Cassandra with Spark, allowing users to run arbitrary Spark jobs against a Cassandra cluster securely and consistently.
This project contains the necessary open-source implementations to connect to a Cassandra cluster and read the data into Spark.
For example usage, see the example repository; sample steps:
```scala
import org.apache.cassandra.spark.sparksql.CassandraDataSource
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder.getOrCreate()
val df = sparkSession.read.format("org.apache.cassandra.spark.sparksql.CassandraDataSource")
                     .option("sidecar_instances", "localhost,localhost2,localhost3")
                     .option("keyspace", "sbr_tests")
                     .option("table", "basic_test")
                     .option("DC", "datacenter1")
                     .option("createSnapshot", true)
                     .option("numCores", 4)
                     .load()
```
The Cassandra Spark Bulk Writer allows for high-speed data ingestion into clusters running Cassandra 3.0 and 4.0.
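A Bulk Writer job might look like the following sketch. It mirrors the reader example above, but note that the sink class name (`CassandraDataSink`) and the option keys beyond `sidecar_instances`, `keyspace`, and `table` are assumptions to verify against the example repository, not confirmed API:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of a Bulk Writer job, assuming the sink class and
// option names below; check the example repository for the real API.
val spark = SparkSession.builder.getOrCreate()

// A toy DataFrame whose columns are assumed to match the target table schema.
val df = spark.range(0, 1000).selectExpr("id", "cast(id as string) as value")

df.write
  .format("org.apache.cassandra.spark.sparksql.CassandraDataSink") // assumed sink class
  .option("sidecar_instances", "localhost,localhost2,localhost3")
  .option("keyspace", "sbr_tests")
  .option("table", "basic_test")
  .option("bulk_writer_cl", "LOCAL_QUORUM") // assumed consistency-level option
  .mode("append")
  .save()
```

Running this requires a live Cassandra cluster with Sidecar instances at the listed addresses; the DataFrame's columns must match the target table's schema.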
Developers interested in contributing to the Analytics library should see the DEV-README.
For example usage, see the example repository. This example covers setting up Cassandra 4.0 and Apache Sidecar, and running a Spark Bulk Reader and a Spark Bulk Writer job.