Apache HBase™ Spark Connector

Spark, Scala and Configurable Options

To build an artifact against different Spark, Scala, Hadoop, or HBase versions, pass the versions as command-line properties, for example:

$ mvn -Dspark.version=3.1.2 -Dscala.version=2.12.10 -Dhadoop-three.version=3.2.0 -Dscala.binary.version=2.12 -Dhbase.version=2.4.8 clean install

Note: to build the connector for Spark 2.x, compile it with -Dscala.binary.version=2.11 and add the profile switch -Dhadoop.profile=2.0
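Putting the note above together, a Spark 2.x build might look like the following (the Spark, Scala, and HBase version numbers here are illustrative assumptions; substitute the versions you target):

```shell
# Example Spark 2.x build: Scala 2.11 binary version plus the Hadoop 2 profile.
# Version numbers below are placeholders, not recommendations.
mvn -Dspark.version=2.4.8 \
    -Dscala.version=2.11.12 \
    -Dscala.binary.version=2.11 \
    -Dhadoop.profile=2.0 \
    -Dhbase.version=2.4.8 \
    clean install
```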

Configuration and Installation

Client-side (Spark) configuration:

  • The HBase configuration file hbase-site.xml must be made available to Spark; copy it to $SPARK_CONF_DIR (default: $SPARK_HOME/conf)
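As an alternative to copying the file into $SPARK_CONF_DIR, hbase-site.xml can also be shipped with the job via spark-submit's --files option. A hedged sketch (the application jar name and paths are assumptions for illustration):

```shell
# Option 1: place hbase-site.xml on the Spark configuration path.
cp /etc/hbase/conf/hbase-site.xml "${SPARK_CONF_DIR:-$SPARK_HOME/conf}/"

# Option 2: distribute it with the job so executors pick it up.
# my-hbase-app.jar is a hypothetical application jar.
spark-submit \
  --files /etc/hbase/conf/hbase-site.xml \
  --class com.example.MyHBaseApp \
  my-hbase-app.jar
```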

Server-side (HBase region servers) configuration:

  • The following jars need to be in the CLASSPATH of the HBase region servers:
    • scala-library, hbase-spark, and hbase-spark-protocol-shaded.
  • The server-side configuration is required for column filter pushdown
    • If you cannot deploy the server-side jars, disable pushdown with .option("hbase.spark.pushdown.columnfilter", false)
  • The Scala library version must match the Scala version (2.11 or 2.12) used for compiling the connector.
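To illustrate the pushdown option mentioned above, here is a minimal read through the connector with column filter pushdown disabled. This is a sketch: the table name, column mapping, and SparkSession setup are assumptions, not part of this document.

```scala
import org.apache.spark.sql.SparkSession

object HBaseReadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hbase-spark-read")   // hypothetical app name
      .getOrCreate()

    // "person" and its column mapping are illustrative placeholders.
    val df = spark.read
      .format("org.apache.hadoop.hbase.spark")
      .option("hbase.columns.mapping",
        "name STRING :key, email STRING c:email")
      .option("hbase.table", "person")
      // Disable server-side column filter pushdown when the
      // hbase-spark jars are not on the region server CLASSPATH.
      .option("hbase.spark.pushdown.columnfilter", false)
      .load()

    df.show()
    spark.stop()
  }
}
```

With pushdown disabled, filtering happens on the Spark side, so the region servers need no extra jars; at the cost of shipping more data from HBase to Spark.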