The Polaris Spark plugin provides a SparkCatalog class, which communicates with the Polaris REST endpoints and provides implementations of Apache Spark's TableCatalog, SupportsNamespaces, and ViewCatalog interfaces.
Right now, the plugin only supports Spark 3.5 with Scala versions 2.12 and 2.13, and depends on iceberg-spark-runtime 1.9.1.
The following command starts a Polaris server for local testing. It runs on localhost:8181 with the default realm POLARIS and root credentials root:s3cr3t:

```shell
./gradlew run
```
Once the local server is running, you can start Spark with the Polaris Spark plugin using either the --packages option with the Polaris Spark package, or the --jars option with the Polaris Spark bundle JAR.
The following sections explain how to build and run Spark with both the Polaris package and the bundle JAR.
The Polaris Spark client source code is located in plugins/spark/v3.5/spark. To use the Polaris Spark package with Spark, you first need to publish the source JAR to your local Maven repository.
Run the following commands to build the Polaris Spark project and publish the source JAR to your local Maven repository:

```shell
# build the whole Polaris project without running tests
./gradlew assemble

# publish the Polaris project source JAR to your local Maven repository
./gradlew publishToMavenLocal
```

Then start Spark with the --packages option:

```shell
bin/spark-shell \
  --packages org.apache.polaris:polaris-spark-<spark_version>_<scala_version>:<polaris_version>,org.apache.iceberg:iceberg-aws-bundle:1.10.0,io.delta:delta-spark_2.12:3.3.1 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
  --conf spark.sql.catalog.<catalog-name>.warehouse=<catalog-name> \
  --conf spark.sql.catalog.<catalog-name>.header.X-Iceberg-Access-Delegation=vended-credentials \
  --conf spark.sql.catalog.<catalog-name>=org.apache.polaris.spark.SparkCatalog \
  --conf spark.sql.catalog.<catalog-name>.uri=http://localhost:8181/api/catalog \
  --conf spark.sql.catalog.<catalog-name>.credential="root:s3cr3t" \
  --conf spark.sql.catalog.<catalog-name>.scope='PRINCIPAL_ROLE:ALL' \
  --conf spark.sql.catalog.<catalog-name>.token-refresh-enabled=true \
  --conf spark.sql.sources.useV1SourceList=''
```
The Polaris version is defined in the versions.txt file located in the root directory of the Polaris project. Assume the following values:

- spark_version: 3.5
- scala_version: 2.12
- polaris_version: 1.2.0-incubating-SNAPSHOT
- catalog-name: polaris
The Spark command would then look like the following:

```shell
bin/spark-shell \
  --packages org.apache.polaris:polaris-spark-3.5_2.12:1.2.0-incubating-SNAPSHOT,org.apache.iceberg:iceberg-aws-bundle:1.10.0,io.delta:delta-spark_2.12:3.3.1 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
  --conf spark.sql.catalog.polaris.warehouse=polaris \
  --conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
  --conf spark.sql.catalog.polaris=org.apache.polaris.spark.SparkCatalog \
  --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
  --conf spark.sql.catalog.polaris.credential="root:s3cr3t" \
  --conf spark.sql.catalog.polaris.scope='PRINCIPAL_ROLE:ALL' \
  --conf spark.sql.catalog.polaris.token-refresh-enabled=true \
  --conf spark.sql.sources.useV1SourceList=''
```
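Once the shell is up, you can sanity-check the catalog connection with a couple of SQL statements. This is a minimal sketch: it assumes a catalog named polaris has already been created on the Polaris server, and the namespace name quickstart_ns is a hypothetical name used only for illustration.

```scala
// Paste into the spark-shell session started above.
// Assumes a catalog named `polaris` already exists on the Polaris server;
// `quickstart_ns` is a hypothetical namespace name.
spark.sql("CREATE NAMESPACE IF NOT EXISTS polaris.quickstart_ns")
spark.sql("SHOW NAMESPACES IN polaris").show()
```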
The polaris-spark project also provides a Spark bundle JAR for the --jars use case. The resulting JAR follows this naming format: polaris-spark-<spark_version>_<scala_version>-<polaris_version>-bundle.jar

For example: polaris-spark-3.5_2.12-1.2.0-incubating-SNAPSHOT-bundle.jar
Run ./gradlew assemble to build the entire Polaris project without running tests. After the build completes, the bundle JAR can be found under plugins/spark/v3.5/spark/build/<scala_version>/libs/. To start Spark using the bundle JAR, specify it with the --jars option as shown below:
```shell
bin/spark-shell \
  --jars <path-to-spark-client-jar> \
  --packages org.apache.iceberg:iceberg-aws-bundle:1.10.0,io.delta:delta-spark_2.12:3.3.1 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
  --conf spark.sql.catalog.<catalog-name>.warehouse=<catalog-name> \
  --conf spark.sql.catalog.<catalog-name>.header.X-Iceberg-Access-Delegation=vended-credentials \
  --conf spark.sql.catalog.<catalog-name>=org.apache.polaris.spark.SparkCatalog \
  --conf spark.sql.catalog.<catalog-name>.uri=http://localhost:8181/api/catalog \
  --conf spark.sql.catalog.<catalog-name>.credential="root:s3cr3t" \
  --conf spark.sql.catalog.<catalog-name>.scope='PRINCIPAL_ROLE:ALL' \
  --conf spark.sql.catalog.<catalog-name>.token-refresh-enabled=true \
  --conf spark.sql.sources.useV1SourceList=''
```
The Polaris Spark client supports catalog management for both Iceberg and Delta tables: it routes all Iceberg table requests to the Iceberg REST endpoints and all Delta table requests to the Generic Table REST endpoints.
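For example, both formats can be managed through the same catalog, with the format selected by the USING clause. A minimal sketch, reusing the hypothetical polaris catalog and quickstart_ns namespace from the example above; the Delta table's explicit LOCATION and its path are assumptions made for this sketch:

```scala
// Iceberg table: requests are routed to the Iceberg REST endpoints.
spark.sql("""CREATE TABLE IF NOT EXISTS polaris.quickstart_ns.events (id BIGINT, ts TIMESTAMP) USING ICEBERG""")

// Delta table: requests are routed to the Generic Table REST endpoints.
// The explicit LOCATION and its path are assumptions for this sketch.
spark.sql("""CREATE TABLE IF NOT EXISTS polaris.quickstart_ns.clicks (id BIGINT) USING DELTA LOCATION 'file:///tmp/polaris/clicks'""")
```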
The Spark client requires at least Delta 3.2.1 to work with Delta tables, which in turn requires at least Apache Spark 3.5.3. The following describes the current functional limitations of the Polaris Spark client:

- CREATE TABLE AS SELECT (CTAS) is not supported for Delta tables. As a result, the saveAsTable method of DataFrame.write is also not supported, since it relies on CTAS support; see the sketch after this list.
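Where CTAS would normally be used, you can create the table first and populate it with an explicit insert instead. A minimal sketch, reusing the hypothetical polaris.quickstart_ns.clicks table from above; the target table's LOCATION is again an assumption:

```scala
// Not supported: spark.table("polaris.quickstart_ns.clicks").write.saveAsTable(...)
// because saveAsTable relies on CTAS. Create the table explicitly, then insert.
spark.sql("""CREATE TABLE polaris.quickstart_ns.clicks_copy (id BIGINT) USING DELTA LOCATION 'file:///tmp/polaris/clicks_copy'""")
spark.sql("INSERT INTO polaris.quickstart_ns.clicks_copy SELECT id FROM polaris.quickstart_ns.clicks")
```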