Cassandra Analytics

Cassandra Analytics supports Spark 2 (Scala 2.11 and 2.12) and Spark 3 (Scala 2.12).

This project uses Gradle as the dependency management and build framework.
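All commands below go through the Gradle wrapper at the project root; for example, to list the available tasks:

./gradlew tasks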

Dependencies

This library depends on the Cassandra Sidecar (for both test and production) and on the shaded in-jvm dtest jars from Cassandra (for testing only). Because these artifacts are not published by the Cassandra project, we provide a script to build them locally.

NOTE: If you are working on multiple projects that depend on the Cassandra Sidecar and in-jvm dtest dependencies, you can share those artifacts by setting the CASSANDRA_DEP_DIR environment variable to a shared directory and dependencies will build there instead of local to the project.
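For example, to share the built artifacts across projects (the directory path below is illustrative):

export CASSANDRA_DEP_DIR=~/cassandra-artifacts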

In order to build the necessary dependencies, please run the following:

./scripts/build-dependencies.sh

This builds both the dtest jars and the Sidecar libraries/package needed for building and testing. You can skip either the dtest jar build or the Sidecar build by setting the following environment variables to true:

SKIP_DTEST_JAR_BUILD=true SKIP_SIDECAR_BUILD=true ./scripts/build-dependencies.sh

Note that build-dependencies.sh pulls the latest commits from the branches listed in the BRANCHES environment variable for the Cassandra dtest jars, and from trunk for the Sidecar.
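For example, to build dtest jars from specific branches (the branch list and its format below are illustrative; check the script for the exact format it expects):

BRANCHES="cassandra-4.1 trunk" ./scripts/build-dependencies.sh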

Building

Once you've built the dependencies, you're ready to build the analytics project.

Cassandra Analytics will build for Spark 2 and Scala 2.11 by default.

Navigate to the top-level directory for this project and run:

./gradlew clean assemble

Spark 2 and Scala 2.12

To build for Scala 2.12, set the profile by exporting SCALA_VERSION=2.12:

export SCALA_VERSION=2.12
./gradlew clean assemble

Spark 3 and Scala 2.12

To build for Spark 3 and Scala 2.12, export both SCALA_VERSION=2.12 and SPARK_VERSION=3:

export SCALA_VERSION=2.12
export SPARK_VERSION=3
./gradlew clean assemble

Git hooks (optional)

To enable git hooks, run the following command at the project root:

git config core.hooksPath githooks

Running Integration Tests

To run integration tests, first build the dependencies as described in the Dependencies section, then configure the IP aliases the tests require.

macOS network aliases

Create a temporary alias for every node except the first:

for i in {2..20}; do sudo ifconfig lo0 alias "127.0.0.${i}"; done
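The aliases do not survive a reboot; to remove them sooner, run:

for i in {2..20}; do sudo ifconfig lo0 -alias "127.0.0.${i}"; done

With the dependencies built and the aliases in place, run the integration tests through Gradle (the task name below is an assumption; check the project's Gradle tasks for the exact name):

./gradlew integrationTest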

IntelliJ

The project is well-supported in IntelliJ.

Run the following task to copy the code style used for this project:

./gradlew copyCodeStyle

The project has different sources for Spark 2 and Spark 3.

Spark 2 uses the org.apache.spark.sql.sources.v2 APIs that have been deprecated in Spark 3.

Spark 3 uses new APIs that live in the org.apache.spark.sql.connector.read namespace.

By default, the project will load Spark 2 sources, but you can switch between sources by modifying the gradle.properties file.

For Spark 3, use the following in gradle.properties:

scala=2.12
spark=3

Then load the Gradle changes (on macOS, the shortcut to load Gradle changes is Command + Shift + I).

This will make the IDE pick up the Spark 3 sources, and you should now be able to develop against Spark 3 as well.
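To switch back to the default Spark 2 / Scala 2.11 sources, revert gradle.properties (assuming the same property names, the defaults correspond to):

scala=2.11
spark=2

and load the Gradle changes again.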