Apache Spark Connect Client for Swift

Apache Spark™ Connect for Swift is a subproject of Apache Spark. It aims to provide a modern Swift library that lets Swift developers use Apache Spark for distributed data processing, machine learning, and analytical workloads directly from their Swift applications. For example, a user can develop and ship a lightweight Swift-based SparkPi app.

Docker Image Size

| Name | Image Size |
| ------------- | ---------- |
| apache/spark:4.1.1-python3-based SparkPi | (size badge) |
| pyspark-connect-based SparkPi | (size badge) |
| Swift-based SparkPi | (size badge) |

Resources

Requirement

So far, this library tracks the upstream changes to the Apache Arrow project's Swift support.

How to use in your apps

Create a Swift project.

mkdir SparkConnectSwiftApp
cd SparkConnectSwiftApp
swift package init --name SparkConnectSwiftApp --type executable

Add the SparkConnect package as a dependency, as follows.

$ cat Package.swift
// swift-tools-version: 6.0
import PackageDescription

let package = Package(
  name: "SparkConnectSwiftApp",
  platforms: [
    .macOS(.v15)
  ],
  dependencies: [
    .package(url: "https://github.com/apache/spark-connect-swift.git", branch: "main")
  ],
  targets: [
    .executableTarget(
      name: "SparkConnectSwiftApp",
      dependencies: [.product(name: "SparkConnect", package: "spark-connect-swift")]
    )
  ]
)

Use SparkSession from the SparkConnect module in your Swift code.

$ cat Sources/main.swift

import SparkConnect

let spark = try await SparkSession.builder.getOrCreate()
print("Connected to Apache Spark \(await spark.version) Server")

let statements = [
  "DROP TABLE IF EXISTS t",
  "CREATE TABLE IF NOT EXISTS t(a INT) USING ORC",
  "INSERT INTO t VALUES (1), (2), (3)",
]

for s in statements {
  print("EXECUTE: \(s)")
  _ = try await spark.sql(s).count()
}
print("SELECT * FROM t")
try await spark.sql("SELECT * FROM t").cache().show()

try await spark.range(10).filter("id % 2 == 0").write.mode("overwrite").orc("/tmp/orc")
try await spark.read.orc("/tmp/orc").show()

await spark.stop()

Run your Swift application.

$ swift run
...
Connected to Apache Spark 4.1.1 Server
EXECUTE: DROP TABLE IF EXISTS t
EXECUTE: CREATE TABLE IF NOT EXISTS t(a INT) USING ORC
EXECUTE: INSERT INTO t VALUES (1), (2), (3)
SELECT * FROM t
+---+
|  a|
+---+
|  1|
|  3|
|  2|
+---+

+---+
| id|
+---+
|  6|
|  8|
|  4|
|  2|
|  0|
+---+

You can find more complete examples, including Spark SQL REPL, web server, and streaming applications, in the Examples directory.

This library also supports the SPARK_REMOTE environment variable, which specifies the Spark Connect connection string and so lets you pass additional connection options.
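
As a minimal sketch of setting this variable from a shell, assuming a Spark Connect server is reachable on localhost at port 15002 (the default Spark Connect port). The SPARK_HOST and SPARK_PORT names below are hypothetical helpers for illustration only; the library itself reads SPARK_REMOTE.

```shell
# Hypothetical helper variables (not read by the library); override as needed.
SPARK_HOST="${SPARK_HOST:-localhost}"
SPARK_PORT="${SPARK_PORT:-15002}"

# Spark Connect connection strings use the sc:// scheme.
export SPARK_REMOTE="sc://${SPARK_HOST}:${SPARK_PORT}"

# Show the value that child processes started from this shell will inherit.
echo "Using SPARK_REMOTE=${SPARK_REMOTE}"
```

With SPARK_REMOTE exported, running swift run from the same shell should make the application connect to that endpoint instead of the default one. The Spark Connect documentation lists the parameters a connection string can carry.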