
Performance monitoring

The Spark Cassandra Connector uses the Codahale Metrics library to expose information about the latency and throughput of Cassandra operations.

Internal Spark metrics

Spark's internal metrics are visible in the Spark UI, where you can browse application stages and individual tasks, along with how much data each one read and wrote and how long it took.

Because of a limitation in the Spark metrics system, the amount of data read or written by a task is reported as a Hadoop operation. Spark doesn't allow custom labels in its metrics system, and the Connector does not actually pass data through Hadoop. This does not affect the reported values; it is only a label.

Codahale metrics

The Connector's metrics are also exposed through Spark's metrics system on both the executors and the driver. To access them, register a source named cassandra-connector in your metrics.properties file, for example:

executor.source.cassandra-connector.class=org.apache.spark.metrics.CassandraConnectorSource
driver.source.cassandra-connector.class=org.apache.spark.metrics.CassandraConnectorSource
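If you would rather not maintain a separate metrics.properties file, recent Spark versions also accept metrics settings directly in the Spark configuration, as keys prefixed with spark.metrics.conf. (check the monitoring documentation for your Spark release). The following is a minimal sketch, assuming that prefix is supported: a hypothetical Python helper, not part of the Connector or Spark, that turns the two source entries above into spark-submit --conf arguments.

```python
import shlex

# Fully qualified class name of the Connector's metrics source
# (as in the metrics.properties example above).
SOURCE_CLASS = "org.apache.spark.metrics.CassandraConnectorSource"

# Assumption: Spark copies configuration keys with this prefix into the
# metrics configuration, with the prefix stripped.
METRICS_PREFIX = "spark.metrics.conf."


def cassandra_source_properties(instances=("executor", "driver")):
    """Return metrics.properties entries that register the cassandra-connector source."""
    return {
        f"{instance}.source.cassandra-connector.class": SOURCE_CLASS
        for instance in instances
    }


def as_spark_submit_args(properties):
    """Turn metrics.properties entries into spark-submit --conf arguments."""
    args = []
    for key, value in properties.items():
        args += ["--conf", f"{METRICS_PREFIX}{key}={value}"]
    return args


if __name__ == "__main__":
    # Print a shell-safe argument string registering the source on executors and driver.
    print(" ".join(shlex.quote(arg) for arg in as_spark_submit_args(cassandra_source_properties())))
```

The entries produced are the same two lines as in the file-based example, so either mechanism registers the same source; the helper only changes where the settings live.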

Performance impact

Collecting metrics should have only a minimal performance impact, but collection can be disabled. Codahale metrics are not collected unless CassandraConnectorSource is specified in the metrics configuration file. To disable task metrics, set these properties in the Spark configuration:

  • spark.cassandra.input.metrics - set to false to disable collection of input task metrics
  • spark.cassandra.output.metrics - set to false to disable collection of output task metrics
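These are ordinary Spark properties, so they can be set in spark-defaults.conf, with set on a SparkConf, or on the spark-submit command line. The sketch below uses a hypothetical Python helper (task_metrics_conf_args is not part of the Connector or Spark) to build the command-line form; by default it disables both kinds of task metrics.

```python
def task_metrics_conf_args(input_enabled=False, output_enabled=False):
    """Build spark-submit --conf arguments for the Connector's task-metrics switches.

    Hypothetical helper: the defaults disable both input and output task metrics.
    """
    settings = {
        "spark.cassandra.input.metrics": input_enabled,
        "spark.cassandra.output.metrics": output_enabled,
    }
    args = []
    for key, enabled in settings.items():
        # Spark property values are strings, so booleans become "true"/"false".
        args += ["--conf", f"{key}={str(enabled).lower()}"]
    return args


if __name__ == "__main__":
    # prints: --conf spark.cassandra.input.metrics=false --conf spark.cassandra.output.metrics=false
    print(" ".join(task_metrics_conf_args()))
```

Passing input_enabled=True keeps input task metrics while still disabling output task metrics.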

Available metrics

Metric name                 | Description
--------------------------- | --------------------------------------------------------------------
write-byte-meter            | Number of bytes written to Cassandra
write-row-meter             | Number of rows written to Cassandra
write-batch-timer           | Time taken to write a batch
write-batch-wait-timer      | Time a batch waits in the queue before being submitted to Cassandra
write-batch-size-histogram  | Distribution of batch sizes, in rows per batch
write-task-timer            | Time taken to write a single partition
write-success-counter       | Number of successfully written batches
write-failure-counter       | Number of failed batches
read-byte-meter             | Number of bytes read from Cassandra
read-row-meter              | Number of rows read from Cassandra
read-task-timer             | Time taken to read a single partition
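Because the byte and row meters count the same writes (or reads), their counts can be combined into simple derived figures. The sketch below is a hypothetical Python helper, assuming you have already collected each metric's cumulative count into a dictionary keyed by the metric names in the table; how you obtain those counts depends on the sink you configure, and real sink output usually adds an application- and instance-specific prefix to each name.

```python
from typing import Mapping, Optional


def _ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def summarize(counts: Mapping[str, float]) -> dict:
    """Derive summary figures from cumulative counts of the Connector metrics.

    `counts` maps metric names from the table (for example "write-byte-meter")
    to their cumulative counts; metrics that are absent are treated as 0.
    """
    def count(name: str) -> float:
        return counts.get(name, 0)

    succeeded = count("write-success-counter")
    failed = count("write-failure-counter")
    return {
        # Average size of a written row, in bytes.
        "avg_bytes_per_row_written": _ratio(count("write-byte-meter"), count("write-row-meter")),
        # Average size of a read row, in bytes.
        "avg_bytes_per_row_read": _ratio(count("read-byte-meter"), count("read-row-meter")),
        # Fraction of batches that failed, among those counted as succeeded or failed.
        "batch_failure_ratio": _ratio(failed, succeeded + failed),
    }
```

Ratios are returned as None when the denominator is zero, for example when a job only writes and the read metrics are empty.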
