CASSANDRA-18574: Fix sample job documentation after Sidecar changes

This commit updates the README documentation for setting up and running the sample job provided in the repository.
During the Sidecar review, it was suggested to rename the YAML property `uploads_staging_dir` to `staging_dir`.
That change, however, was not reflected in the sample job README.md.
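For reference, the renamed property would appear in the Sidecar YAML configuration roughly as below. The surrounding structure and the example path are illustrative assumptions, not taken from this commit:

```yaml
# Sidecar configuration (illustrative sketch; surrounding keys and path are assumptions)
sidecar:
  # Previously `uploads_staging_dir`; renamed to `staging_dir` during Sidecar review
  staging_dir: /var/lib/cassandra/sidecar/staging
```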

patch by Francisco Guerrero; reviewed by Dinesh Joshi, Yifan Cai for CASSANDRA-18574
1 file changed
README.md

Cassandra Analytics

Cassandra Spark Bulk Reader

The open-source repository for the Cassandra Spark Bulk Reader. This library enables integration between Cassandra and Spark, allowing users to run arbitrary Spark jobs against a Cassandra cluster securely and consistently.

This project contains the necessary open-source implementations to connect to a Cassandra cluster and read the data into Spark.

For example usage, see the example repository. Sample steps:

import org.apache.cassandra.spark.sparksql.CassandraDataSource
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder.getOrCreate()
val df = sparkSession.read.format("org.apache.cassandra.spark.sparksql.CassandraDataSource")
                          .option("sidecar_instances", "localhost,localhost2,localhost3")
                          .option("keyspace", "sbr_tests")
                          .option("table", "basic_test")
                          .option("DC", "datacenter1")
                          .option("createSnapshot", true)
                          .option("numCores", 4)
                          .load()

Cassandra Spark Bulk Writer

The Cassandra Spark Bulk Writer allows for high-speed data ingest to Cassandra clusters running Cassandra 3.0 and 4.0.
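A write path analogous to the reader example above might look like the following sketch. The data-sink class name and option keys here are assumptions for illustration only and are not confirmed by this README; consult the end-user documentation for the exact Bulk Writer options:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder.getOrCreate()

// Any DataFrame whose schema matches the target Cassandra table
val df = spark.read.parquet("/path/to/source/data")

// Hypothetical Bulk Writer invocation; the format class and option names
// are illustrative assumptions mirroring the reader example above
df.write.format("org.apache.cassandra.spark.sparksql.CassandraDataSink")
  .option("sidecar_instances", "localhost,localhost2,localhost3")
  .option("keyspace", "sbr_tests")
  .option("table", "basic_test")
  .mode(SaveMode.Append)
  .save()
```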

If you are a consumer of the Cassandra Spark Bulk Writer, please see our end-user documentation: usage instructions, FAQs, troubleshooting guides, and release notes.

Developers interested in contributing to the SBW should see the DEV-README.

Getting Started

For example usage, see the example repository. This example covers setting up Cassandra 4.0 and Apache Sidecar, and running Spark Bulk Reader and Spark Bulk Writer jobs.