Apache Spark Connect Client for Swift

Clone this repo:
  1. 6076c74 [SPARK-55381] Use Spark `4.0.2` instead of `4.0.1` by Dongjoon Hyun · 10 hours ago main
  2. 5020fb7 [SPARK-55316] Upgrade `gRPC Swift NIO Transport` to 2.4.1 by Dongjoon Hyun · 3 days ago
  3. a8c7c3b [SPARK-55195] Add `MacOS` integration test with Apache Spark `4.2.0-preview1` by Dongjoon Hyun · 10 days ago
  4. 108ec0a [SPARK-55101] Add `Swift 6.3` build test CI by Dongjoon Hyun · 2 weeks ago
  5. 1e31947 [SPARK-55066] Use `Spark 3.5.8` for Spark 3 integration tests by Dongjoon Hyun · 3 weeks ago

Apache Spark Connect Client for Swift

GitHub Actions Build Swift Version Compatibility Platform Compatibility

This is an experimental Swift library to show how to connect to a remote Apache Spark Connect Server and run SQL statements to manipulate remote data.

So far, this library project is tracking the upstream changes of Apache Arrow project's Swift-support.

Resources

Requirement

How to use in your apps

Create a Swift project.

mkdir SparkConnectSwiftApp
cd SparkConnectSwiftApp
swift package init --name SparkConnectSwiftApp --type executable

Add SparkConnect package to the dependency like the following

$ cat Package.swift
import PackageDescription

let package = Package(
  name: "SparkConnectSwiftApp",
  platforms: [
    .macOS(.v15)
  ],
  dependencies: [
    .package(url: "https://github.com/apache/spark-connect-swift.git", branch: "main")
  ],
  targets: [
    .executableTarget(
      name: "SparkConnectSwiftApp",
      dependencies: [.product(name: "SparkConnect", package: "spark-connect-swift")]
    )
  ]
)

Use SparkSession of SparkConnect module in Swift.

$ cat Sources/main.swift

import SparkConnect

let spark = try await SparkSession.builder.getOrCreate()
print("Connected to Apache Spark \(await spark.version) Server")

let statements = [
  "DROP TABLE IF EXISTS t",
  "CREATE TABLE IF NOT EXISTS t(a INT) USING ORC",
  "INSERT INTO t VALUES (1), (2), (3)",
]

for s in statements {
  print("EXECUTE: \(s)")
  _ = try await spark.sql(s).count()
}
print("SELECT * FROM t")
try await spark.sql("SELECT * FROM t").cache().show()

try await spark.range(10).filter("id % 2 == 0").write.mode("overwrite").orc("/tmp/orc")
try await spark.read.orc("/tmp/orc").show()

await spark.stop()

Run your Swift application.

$ swift run
...
Connected to Apache Spark 4.1.1 Server
EXECUTE: DROP TABLE IF EXISTS t
EXECUTE: CREATE TABLE IF NOT EXISTS t(a INT) USING ORC
EXECUTE: INSERT INTO t VALUES (1), (2), (3)
SELECT * FROM t
+---+
|  a|
+---+
|  1|
|  3|
|  2|
+---+

+---+
| id|
+---+
|  6|
|  8|
|  4|
|  2|
|  0|
+---+

You can find more complete examples including Spark SQL REPL, Web Server and Streaming applications in the Examples directory.

This library also supports SPARK_REMOTE environment variable to specify the Spark Connect connection string in order to provide more options.