This is an experimental Swift library to show how to connect to a remote Apache Spark Connect Server and run SQL statements to manipulate remote data.
So far, this library tracks upstream changes such as the Apache Spark 4.0.0 RC6 release and the Apache Arrow project's Swift support.
Create a Swift project.
```bash
mkdir SparkConnectSwiftApp
cd SparkConnectSwiftApp
swift package init --name SparkConnectSwiftApp --type executable
```
Add the `SparkConnect` package to the dependencies like the following. Note that the manifest needs a `swift-tools-version` declaration on its first line; Swift 6.0 is assumed here because the `.macOS(.v15)` platform setting requires it.
```swift
$ cat Package.swift
// swift-tools-version: 6.0
import PackageDescription

let package = Package(
  name: "SparkConnectSwiftApp",
  platforms: [
    .macOS(.v15)
  ],
  dependencies: [
    .package(url: "https://github.com/apache/spark-connect-swift.git", branch: "main")
  ],
  targets: [
    .executableTarget(
      name: "SparkConnectSwiftApp",
      dependencies: [.product(name: "SparkConnect", package: "spark-connect-swift")]
    )
  ]
)
```
Use `SparkSession` from the `SparkConnect` module in Swift.
```swift
$ cat Sources/main.swift
import SparkConnect

// Connect to the Spark Connect server and print its version.
let spark = try await SparkSession.builder.getOrCreate()
print("Connected to Apache Spark \(await spark.version) Server")

// Run a few SQL statements to create and populate a table.
let statements = [
  "DROP TABLE IF EXISTS t",
  "CREATE TABLE IF NOT EXISTS t(a INT) USING ORC",
  "INSERT INTO t VALUES (1), (2), (3)",
]
for s in statements {
  print("EXECUTE: \(s)")
  _ = try await spark.sql(s).count()
}

// Query the table and show the result.
print("SELECT * FROM t")
try await spark.sql("SELECT * FROM t").cache().show()

// Write a filtered range to ORC files and read them back.
try await spark.range(10).filter("id % 2 == 0").write.mode("overwrite").orc("/tmp/orc")
try await spark.read.orc("/tmp/orc").show()

await spark.stop()
```
Run your Swift application.
```bash
$ swift run
...
Connected to Apache Spark 4.0.0 Server
EXECUTE: DROP TABLE IF EXISTS t
EXECUTE: CREATE TABLE IF NOT EXISTS t(a INT)
EXECUTE: INSERT INTO t VALUES (1), (2), (3)
SELECT * FROM t
+---+
| a |
+---+
| 2 |
| 1 |
| 3 |
+---+
+----+
| id |
+----+
| 2  |
| 6  |
| 0  |
| 8  |
| 4  |
+----+
```
You can find more complete examples, including Spark SQL REPL, web server, and streaming applications, in the `Examples` directory.
This library also supports the `SPARK_REMOTE` environment variable, which specifies the Spark Connect connection string and provides more connection options.
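For example, a minimal sketch, assuming a Spark Connect server is listening on localhost at the default Spark Connect port 15002 (the endpoint below is a placeholder; replace it with your own server):

```bash
# Point the application at a specific Spark Connect server via SPARK_REMOTE.
# sc://localhost:15002 is an example endpoint; substitute your server's host and port.
SPARK_REMOTE="sc://localhost:15002" swift run
```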