commit | 1633cd9c6c3d88d5c66825fab76a369266509f7e | |
---|---|---|
author | Dinesh Joshi <djoshi@apache.org> | Fri May 19 14:57:47 2023 -0700 |
committer | Dinesh Joshi <djoshi@apache.org> | Fri May 19 15:28:54 2023 -0700 |
tree | 81bc3a00a833a6ef597312f9fd0ee213b85dd0d3 |

CEP-28: Apache Cassandra Analytics

This is the initial commit for the Apache Cassandra Analytics project where we support reading and writing bulk data from Apache Cassandra from Spark.

Patch by James Berragan, Doug Rohrer; Reviewed by Dinesh Joshi, Yifan Cai for CASSANDRA-16222

Co-authored-by: James Berragan <jberragan@apple.com>
Co-authored-by: Doug Rohrer <drohrer@apple.com>
Co-authored-by: Saranya Krishnakumar <saranya_k@apple.com>
Co-authored-by: Francisco Guerrero <francisco.guerrero@apple.com>
Co-authored-by: Yifan Cai <ycai@apache.org>
Co-authored-by: Jyothsna Konisa <jkonisa@apple.com>
Co-authored-by: Yuriy Semchyshyn <ysemchyshyn@apple.com>
Co-authored-by: Dinesh Joshi <djoshi@apache.org>

This is the open-source repository for the Cassandra Spark Bulk Reader. The library integrates Cassandra with Spark, allowing users to run arbitrary Spark jobs against a Cassandra cluster securely and consistently.
This project contains the necessary open-source implementations to connect to a Cassandra cluster and read the data into Spark.
For example usage, see the example repository; sample steps:

    import org.apache.cassandra.spark.sparksql.CassandraDataSource
    import org.apache.spark.sql.SparkSession

    val sparkSession = SparkSession.builder.getOrCreate()
    val df = sparkSession.read.format("org.apache.cassandra.spark.sparksql.CassandraDataSource")
                         .option("sidecar_instances", "localhost,localhost2,localhost3")
                         .option("keyspace", "sbr_tests")
                         .option("table", "basic_test")
                         .option("DC", "datacenter1")
                         .option("createSnapshot", true)
                         .option("numCores", 4)
                         .load()

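When the same read is issued against several tables or clusters, it can help to gather the reader options in one place. The following is a minimal sketch, not part of the library: `BulkReaderOptions` and its `build` method are hypothetical names introduced here for illustration, and the option keys and default values are copied from the sample steps above. It relies only on the fact that Spark stores reader options as strings, so boolean and integer values are converted with `toString`.

```scala
// Hypothetical helper (not part of the Cassandra Analytics library):
// collects the reader options from the sample above into a single map.
object BulkReaderOptions {
  // Data source class name, as used in the sample above.
  val DataSource = "org.apache.cassandra.spark.sparksql.CassandraDataSource"

  def build(sidecarInstances: Seq[String],
            keyspace: String,
            table: String,
            dc: String,
            createSnapshot: Boolean = true,
            numCores: Int = 4): Map[String, String] = {
    // Basic sanity checks; these are assumptions of this sketch,
    // not validation rules documented by the library.
    require(sidecarInstances.nonEmpty, "at least one Sidecar instance is required")
    require(numCores > 0, "numCores must be positive")

    Map(
      // The sample passes Sidecar hosts as one comma-separated string.
      "sidecar_instances" -> sidecarInstances.mkString(","),
      "keyspace"          -> keyspace,
      "table"             -> table,
      "DC"                -> dc,
      // Spark keeps option values as strings, so convert explicitly.
      "createSnapshot"    -> createSnapshot.toString,
      "numCores"          -> numCores.toString
    )
  }
}

val opts = BulkReaderOptions.build(
  sidecarInstances = Seq("localhost", "localhost2", "localhost3"),
  keyspace = "sbr_tests",
  table = "basic_test",
  dc = "datacenter1"
)
```

With a `SparkSession` available, the map can be passed in one call through `DataFrameReader.options`, for example `sparkSession.read.format(BulkReaderOptions.DataSource).options(opts).load()`.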
The Cassandra Spark Bulk Writer allows for high-speed data ingest to Cassandra clusters running Cassandra 3.0 and 4.0.
If you are a consumer of the Cassandra Spark Bulk Writer, please see our end-user documentation: usage instructions, FAQs, troubleshooting guides, and release notes.
Developers interested in contributing to the Spark Bulk Writer (SBW) should see the DEV-README.
For example usage, see the example repository. The example covers setting up Cassandra 4.0 and Apache Sidecar, and running Spark Bulk Reader and Spark Bulk Writer jobs.