# Configuration Reference
## Alternative Connection Configuration Options
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Description</th></tr>
<tr>
<td><code>spark.cassandra.connection.config.cloud.path</code></td>
<td>None</td>
<td>Path to the Secure Connect Bundle to be used for this connection. Accepts URLs and references to files
distributed via the spark.files (--files) setting.<br/>
The provided URL must be accessible from every executor.<br/>
Using spark.files is recommended as it relies on Spark to distribute the bundle to every executor and
leverages Spark capabilities to access files located in distributed file systems like HDFS, S3, etc.
For example, to use a bundle located in HDFS in spark-shell:
<pre>
spark-shell --conf spark.files=hdfs:///some_dir/bundle.zip \
  --conf spark.cassandra.connection.config.cloud.path=bundle.zip \
  --conf spark.cassandra.auth.username=&lt;name&gt; \
  --conf spark.cassandra.auth.password=&lt;pass&gt; ...
</pre>
</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.config.profile.path</code></td>
<td>None</td>
<td>Specifies a default Java Driver 4.0 Profile file to be used for this connection. Accepts
URLs and references to files distributed via spark.files (--files) setting.</td>
</tr>
</table>
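The cloud bundle and profile settings above are ordinary Spark properties, so they can also be set when the session is built. Below is a minimal Scala sketch equivalent to the spark-shell example above; the HDFS path and credentials are placeholders.

```scala
import org.apache.spark.sql.SparkSession

// A minimal sketch: the bundle is shipped to executors via spark.files and then
// referenced by file name; the HDFS path and credentials are placeholders.
val spark = SparkSession.builder()
  .appName("secure-connect-bundle-example")
  .config("spark.files", "hdfs:///some_dir/bundle.zip")
  .config("spark.cassandra.connection.config.cloud.path", "bundle.zip")
  .config("spark.cassandra.auth.username", "<name>")
  .config("spark.cassandra.auth.password", "<pass>")
  .getOrCreate()
```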
## Cassandra Authentication Parameters
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Description</th></tr>
<tr>
<td><code>spark.cassandra.auth.conf.factory</code></td>
<td>DefaultAuthConfFactory</td>
<td>Name of a Scala module or class implementing AuthConfFactory providing custom authentication configuration</td>
</tr>
</table>
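A custom factory is selected purely by its fully qualified class name. In the sketch below, `com.example.MyAuthConfFactory` is a hypothetical class that would have to implement AuthConfFactory and be available on the driver and executor classpath.

```scala
import org.apache.spark.sql.SparkSession

// "com.example.MyAuthConfFactory" is a hypothetical AuthConfFactory implementation.
val spark = SparkSession.builder()
  .config("spark.cassandra.auth.conf.factory", "com.example.MyAuthConfFactory")
  .getOrCreate()
```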
## Cassandra Connection Parameters
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Description</th></tr>
<tr>
<td><code>spark.cassandra.connection.compression</code></td>
<td>NONE</td>
<td>Compression to use (LZ4, SNAPPY or NONE)</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.factory</code></td>
<td>DefaultConnectionFactory</td>
<td>Name of a Scala module or class implementing
CassandraConnectionFactory providing connections to the Cassandra cluster</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.host</code></td>
<td>localhost</td>
<td>Contact point to connect to the Cassandra cluster. A comma-separated list
may also be used. Ports may be provided but are optional; if a port is missing, spark.cassandra.connection.port will
be used (e.g. "127.0.0.1:9042,192.168.0.1:9051")
</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.keepAliveMS</code></td>
<td>3600000</td>
<td>Period of time to keep unused connections open</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.localConnectionsPerExecutor</code></td>
<td>None</td>
<td>Number of local connections set on each executor JVM. Defaults to the number
of available CPU cores on the local node if not specified and not running in a Spark environment</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.localDC</code></td>
<td>None</td>
<td>The local DC to connect to (other nodes will be ignored)</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.port</code></td>
<td>9042</td>
<td>Cassandra native connection port, will be set to all hosts if no individual ports are given</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.quietPeriodBeforeCloseMS</code></td>
<td>0</td>
<td>The time in seconds that must pass without any additional request after requesting connection close (see Netty quiet period)</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.reconnectionDelayMS.max</code></td>
<td>60000</td>
<td>Maximum period of time to wait before reconnecting to a dead node</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.reconnectionDelayMS.min</code></td>
<td>1000</td>
<td>Minimum period of time to wait before reconnecting to a dead node</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.remoteConnectionsPerExecutor</code></td>
<td>None</td>
<td>Minimum number of remote connections per host set on each executor JVM. The default value is
estimated automatically based on the total number of executors in the cluster</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.resolveContactPoints</code></td>
<td>true</td>
<td>Controls whether contact points are resolved at startup (true) or at each reconnection (false).
Helpful when used with Kubernetes or other systems with dynamic endpoints that may change
while the application is running.</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.timeoutBeforeCloseMS</code></td>
<td>15000</td>
<td>The time in seconds for all in-flight connections to finish after requesting connection close</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.timeoutMS</code></td>
<td>5000</td>
<td>Maximum period of time to attempt connecting to a node</td>
</tr>
<tr>
<td><code>spark.cassandra.query.retry.count</code></td>
<td>60</td>
<td>Number of times to retry a timed-out query.<br/>
Setting this to -1 means unlimited retries.
</td>
</tr>
<tr>
<td><code>spark.cassandra.read.timeoutMS</code></td>
<td>120000</td>
<td>Maximum period of time to wait for a read to return </td>
</tr>
</table>
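The connection parameters above are set like any other Spark property. A minimal sketch with illustrative values follows; the hosts, datacenter name, and timeouts are placeholders, not recommendations.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative values only; hosts without an explicit port fall back to
// spark.cassandra.connection.port.
val spark = SparkSession.builder()
  .config("spark.cassandra.connection.host", "127.0.0.1:9042,192.168.0.1:9051")
  .config("spark.cassandra.connection.localDC", "dc1")
  .config("spark.cassandra.connection.timeoutMS", "5000")
  .config("spark.cassandra.connection.keepAliveMS", "3600000")
  .config("spark.cassandra.query.retry.count", "10")
  .getOrCreate()
```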
## Cassandra Datasource Parameters
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Description</th></tr>
<tr>
<td><code>spark.cassandra.sql.inClauseToFullScanConversionThreshold</code></td>
<td>20000000</td>
<td>Queries with `IN` clause(s) are not converted to a JoinWithCassandraTable operation if the size of the cross
product of all `IN` value sets exceeds this value. It is meant to stop conversion for huge `IN` value sets
that may cause memory problems. If this limit is exceeded, a full table scan is performed.
This setting takes precedence over spark.cassandra.sql.inClauseToJoinConversionThreshold.
The query `select * from t where k1 in (1,2,3) and k2 in (1,2) and k3 in (1,2,3,4)` has 3 sets of `IN` values;
the cross product of these values has a size of 24.
</td>
</tr>
<tr>
<td><code>spark.cassandra.sql.inClauseToJoinConversionThreshold</code></td>
<td>2500</td>
<td>Queries with `IN` clause(s) are converted to a JoinWithCassandraTable operation if the size of the cross
product of all `IN` value sets exceeds this value. To disable `IN` clause conversion, set this to 0.
The query `select * from t where k1 in (1,2,3) and k2 in (1,2) and k3 in (1,2,3,4)` has 3 sets of `IN` values;
the cross product of these values has a size of 24.
</td>
</tr>
<tr>
<td><code>spark.cassandra.sql.pushdown.additionalClasses</code></td>
<td></td>
<td>A comma-separated list of classes to be used (in order) to apply additional
pushdown rules for Cassandra DataFrames. Classes must implement CassandraPredicateRules
</td>
</tr>
<tr>
<td><code>spark.cassandra.table.size.in.bytes</code></td>
<td>None</td>
<td>Used internally by DataFrames; will be updated in a future release to
retrieve the size from Cassandra. For now it can be set manually</td>
</tr>
<tr>
<td><code>spark.sql.dse.search.autoRatio</code></td>
<td>0.03</td>
<td>When Search Predicate Optimization is set to auto, Search optimizations will be performed if this parameter * the total number of rows is greater than the number of rows to be returned by the Solr query</td>
</tr>
<tr>
<td><code>spark.sql.dse.search.enableOptimization</code></td>
<td>auto</td>
<td>Enables SparkSQL to automatically replace Cassandra Pushdowns with DSE Search
Pushdowns utilizing Lucene indexes. Valid options are On, Off, and Auto. Auto enables
optimizations when the Solr query will pull less than spark.sql.dse.search.autoRatio * the
total table record count</td>
</tr>
</table>
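For the example query shown in the table above, the cross product of the `IN` sets is 3 × 2 × 4 = 24; since 24 does not exceed the default inClauseToJoinConversionThreshold of 2500, that query would not be converted. The thresholds can be adjusted at session build time, as in the sketch below; the values are arbitrary illustrations.

```scala
import org.apache.spark.sql.SparkSession

// Arbitrary threshold values chosen only for illustration.
val spark = SparkSession.builder()
  .config("spark.cassandra.sql.inClauseToJoinConversionThreshold", "5000")
  .config("spark.cassandra.sql.inClauseToFullScanConversionThreshold", "1000000")
  .config("spark.sql.dse.search.enableOptimization", "auto")
  .getOrCreate()
```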
## Cassandra Datasource Table Options
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Description</th></tr>
<tr>
<td><code>directJoinSetting</code></td>
<td>auto</td>
<td>Acceptable values, "on", "off", "auto"
"on" causes a direct join to happen if possible regardless of size ratio.
"off" disables direct join even when possible
"auto" only does a direct join when the size ratio is satisfied see directJoinSizeRatio
</td>
</tr>
<tr>
<td><code>directJoinSizeRatio</code></td>
<td>0.9</td>
<td>
Sets the threshold for performing a DirectJoin in place of a full table scan. When
(size of the Cassandra source * this parameter) > the size of the other side of the join, a direct
join will be performed if possible.
</td>
</tr>
<tr>
<td><code>ignoreMissingMetaColumns</code></td>
<td>false</td>
<td>Acceptable values, "true", "false"
"true" ignore missing meta properties
"false" throw error if missing property is requested
</td>
</tr>
<tr>
<td><code>ttl</code></td>
<td>None</td>
<td>Surfaces the Cassandra row TTL as a column
with the name specified. When reading, use ttl.columnName=aliasForTTL; this
can be done for every column with a TTL. When writing, use ttl=columnName and that
column will be used to set the TTL for that row.</td>
</tr>
<tr>
<td><code>writetime</code></td>
<td>None</td>
<td>Surfaces the Cassandra row writetime as a column
with the name specified. When reading, use writetime.columnName=aliasForWritetime; this
can be done for every column with a writetime. When writing, use writetime=columnName and that
column will be used to set the writetime for that row.</td>
</tr>
</table>
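These options apply per read or write rather than globally. Below is a minimal sketch of a read that uses them; the keyspace `ks`, table `kv`, and column `value` are placeholder names.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Placeholders: keyspace "ks", table "kv", and column "value" are illustrative names.
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .option("keyspace", "ks")
  .option("table", "kv")
  .option("directJoinSetting", "on")        // force a direct join where possible
  .option("ttl.value", "value_ttl")         // surface the TTL of column "value" as "value_ttl"
  .option("writetime.value", "value_wt")    // surface the writetime of column "value" as "value_wt"
  .load()
```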
## Cassandra SSL Connection Options
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Description</th></tr>
<tr>
<td><code>spark.cassandra.connection.ssl.clientAuth.enabled</code></td>
<td>false</td>
<td>Enable 2-way secure connection to Cassandra cluster</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.ssl.enabled</code></td>
<td>false</td>
<td>Enable secure connection to Cassandra cluster</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.ssl.enabledAlgorithms</code></td>
<td>Set(TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA)</td>
<td>SSL cipher suites</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.ssl.keyStore.password</code></td>
<td>None</td>
<td>Key store password</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.ssl.keyStore.path</code></td>
<td>None</td>
<td>Path for the key store being used</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.ssl.keyStore.type</code></td>
<td>JKS</td>
<td>Key store type</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.ssl.protocol</code></td>
<td>TLS</td>
<td>SSL protocol</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.ssl.trustStore.password</code></td>
<td>None</td>
<td>Trust store password</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.ssl.trustStore.path</code></td>
<td>None</td>
<td>Path for the trust store being used</td>
</tr>
<tr>
<td><code>spark.cassandra.connection.ssl.trustStore.type</code></td>
<td>JKS</td>
<td>Trust store type</td>
</tr>
</table>
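A minimal sketch of a one-way TLS connection follows; the trust store path and password are placeholders. Two-way (client) authentication would additionally require the keyStore properties and clientAuth.enabled.

```scala
import org.apache.spark.sql.SparkSession

// Placeholder trust store path and password.
val spark = SparkSession.builder()
  .config("spark.cassandra.connection.ssl.enabled", "true")
  .config("spark.cassandra.connection.ssl.trustStore.path", "/path/to/truststore.jks")
  .config("spark.cassandra.connection.ssl.trustStore.password", "<pass>")
  .config("spark.cassandra.connection.ssl.trustStore.type", "JKS")
  .getOrCreate()
```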
## Continuous Paging
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Description</th></tr>
<tr>
<td><code>spark.dse.continuousPagingEnabled</code></td>
<td>true</td>
<td>Enables DSE Continuous Paging which improves scanning performance</td>
</tr>
</table>
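Continuous paging is on by default; the sketch below simply shows the property being turned off for a session.

```scala
import org.apache.spark.sql.SparkSession

// Disable DSE Continuous Paging for this session.
val spark = SparkSession.builder()
  .config("spark.dse.continuousPagingEnabled", "false")
  .getOrCreate()
```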
## Default Authentication Parameters
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Description</th></tr>
<tr>
<td><code>spark.cassandra.auth.password</code></td>
<td>None</td>
<td>Password for password authentication</td>
</tr>
<tr>
<td><code>spark.cassandra.auth.username</code></td>
<td>None</td>
<td>Login name for password authentication</td>
</tr>
</table>
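A minimal sketch of plain username/password authentication; the credentials are placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Placeholder credentials.
val spark = SparkSession.builder()
  .config("spark.cassandra.auth.username", "<name>")
  .config("spark.cassandra.auth.password", "<pass>")
  .getOrCreate()
```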
## Read Tuning Parameters
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Description</th></tr>
<tr>
<td><code>spark.cassandra.concurrent.reads</code></td>
<td>512</td>
<td>Sets read parallelism for joinWithCassandraTable operations</td>
</tr>
<tr>
<td><code>spark.cassandra.input.consistency.level</code></td>
<td>LOCAL_ONE</td>
<td>Consistency level to use when reading </td>
</tr>
<tr>
<td><code>spark.cassandra.input.fetch.sizeInRows</code></td>
<td>1000</td>
<td>Number of CQL rows fetched per driver request</td>
</tr>
<tr>
<td><code>spark.cassandra.input.metrics</code></td>
<td>true</td>
<td>Sets whether to record connector-specific metrics on read</td>
</tr>
<tr>
<td><code>spark.cassandra.input.readsPerSec</code></td>
<td>None</td>
<td>Sets max requests or pages per core per second, unlimited by default.</td>
</tr>
<tr>
<td><code>spark.cassandra.input.split.sizeInMB</code></td>
<td>512</td>
<td>Approx amount of data to be fetched into a Spark partition. Minimum number of resulting Spark partitions is <code>1 + 2 * SparkContext.defaultParallelism</code></td>
</tr>
<tr>
<td><code>spark.cassandra.input.throughputMBPerSec</code></td>
<td>None</td>
<td>*(Floating point values allowed)* <br> Maximum read throughput allowed
per single core in MB/s. Affects point lookups as well as full
scans.</td>
</tr>
</table>
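A sketch of read tuning at session build time follows; the values are illustrative, not recommendations.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative values: a smaller split size produces more (smaller) Spark partitions,
// and readsPerSec caps requests or pages per core per second.
val spark = SparkSession.builder()
  .config("spark.cassandra.input.split.sizeInMB", "128")
  .config("spark.cassandra.input.fetch.sizeInRows", "2000")
  .config("spark.cassandra.input.consistency.level", "LOCAL_QUORUM")
  .config("spark.cassandra.input.readsPerSec", "1000")
  .getOrCreate()
```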
## Write Tuning Parameters
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Description</th></tr>
<tr>
<td><code>spark.cassandra.output.batch.grouping.buffer.size</code></td>
<td>1000</td>
<td> How many batches per single Spark task can be stored in
memory before sending to Cassandra</td>
</tr>
<tr>
<td><code>spark.cassandra.output.batch.grouping.key</code></td>
<td>Partition</td>
<td>Determines how insert statements are grouped into batches. Available values are
<ul>
<li> <code> none </code> : a batch may contain any statements </li>
<li> <code> replica_set </code> : a batch may contain only statements to be written to the same replica set </li>
<li> <code> partition </code> : a batch may contain only statements for rows sharing the same partition key value </li>
</ul>
</td>
</tr>
<tr>
<td><code>spark.cassandra.output.batch.size.bytes</code></td>
<td>1024</td>
<td>Maximum total size of the batch in bytes. Overridden by
spark.cassandra.output.batch.size.rows
</td>
</tr>
<tr>
<td><code>spark.cassandra.output.batch.size.rows</code></td>
<td>None</td>
<td>Number of rows per single batch. The default is 'auto'
which means the connector will adjust the number
of rows based on the amount of data
in each row</td>
</tr>
<tr>
<td><code>spark.cassandra.output.concurrent.writes</code></td>
<td>5</td>
<td>Maximum number of batches executed in parallel by a
single Spark task</td>
</tr>
<tr>
<td><code>spark.cassandra.output.consistency.level</code></td>
<td>LOCAL_QUORUM</td>
<td>Consistency level for writing</td>
</tr>
<tr>
<td><code>spark.cassandra.output.ifNotExists</code></td>
<td>false</td>
<td>If set to true, the INSERT operation is not performed when a row with the same primary
key already exists. Using this feature incurs a performance hit.</td>
</tr>
<tr>
<td><code>spark.cassandra.output.ignoreNulls</code></td>
<td>false</td>
<td> In Cassandra >= 2.2 null values can be left as unset in bound statements. Setting
this to true will cause all null values to be left as unset rather than bound. For
finer control see the CassandraOption class</td>
</tr>
<tr>
<td><code>spark.cassandra.output.metrics</code></td>
<td>true</td>
<td>Sets whether to record connector-specific metrics on write</td>
</tr>
<tr>
<td><code>spark.cassandra.output.throughputMBPerSec</code></td>
<td>None</td>
<td>*(Floating point values allowed)* <br> Maximum write throughput allowed
per single core in MB/s. <br> For stability, limit this on long (8+ hour) runs to 70% of the maximum throughput
seen on a smaller job.</td>
</tr>
<tr>
<td><code>spark.cassandra.output.timestamp</code></td>
<td>0</td>
<td>Timestamp (microseconds since epoch) to use for the write. A value of 0 (the default)
means the time the write occurs is used.</td>
</tr>
<tr>
<td><code>spark.cassandra.output.ttl</code></td>
<td>0</td>
<td>Time To Live (TTL) assigned to writes to Cassandra. A value of 0 means no TTL</td>
</tr>
</table>
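A sketch of write tuning at session build time follows; the values are illustrative, not recommendations.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative values for a long-running bulk load: throttled throughput,
// explicit consistency level, and a one-day TTL on every written row.
val spark = SparkSession.builder()
  .config("spark.cassandra.output.consistency.level", "LOCAL_QUORUM")
  .config("spark.cassandra.output.concurrent.writes", "10")
  .config("spark.cassandra.output.batch.grouping.key", "partition")
  .config("spark.cassandra.output.throughputMBPerSec", "50")
  .config("spark.cassandra.output.ttl", "86400")
  .getOrCreate()
```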