| commit | 07a276e0b77135894509d5637d8e5447345afe26 | |
|---|---|---|
| author | Dongjoon Hyun <dongjoon@apache.org> | Thu Sep 25 18:44:05 2025 -0700 |
| committer | Dongjoon Hyun <dongjoon@apache.org> | Thu Sep 25 18:44:05 2025 -0700 |
| tree | 6d869800052f641f379dafe7f490eeb2bbe3c425 | |
| parent | e02cfb8f7d332c0d9713d3d90de2094515d5ac70 | |

[SPARK-53724] Update `Examples` and documentations to use `4.0.1`

### What changes were proposed in this pull request?

This PR aims to update `Examples` and documentations to use `4.0.1`.

### Why are the changes needed?

Apache Spark community highly recommends to use `4.0.1` for all Spark 4.0 users.

### Does this PR introduce _any_ user-facing change?

No behavior change.

### How was this patch tested?

Pass the CIs and manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #245 from dongjoon-hyun/SPARK-53724.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

This is an experimental Swift library to show how to connect to a remote Apache Spark Connect Server and run SQL statements to manipulate remote data.
This library currently tracks upstream changes to the Apache Arrow project's Swift support.
Create a Swift project.

    mkdir SparkConnectSwiftApp
    cd SparkConnectSwiftApp
    swift package init --name SparkConnectSwiftApp --type executable

Add the SparkConnect package as a dependency, as follows.

    $ cat Package.swift
    // swift-tools-version: 6.0
    import PackageDescription

    let package = Package(
      name: "SparkConnectSwiftApp",
      platforms: [
        .macOS(.v15)
      ],
      dependencies: [
        .package(url: "https://github.com/apache/spark-connect-swift.git", branch: "main")
      ],
      targets: [
        .executableTarget(
          name: "SparkConnectSwiftApp",
          dependencies: [.product(name: "SparkConnect", package: "spark-connect-swift")]
        )
      ]
    )

Use `SparkSession` from the `SparkConnect` module in Swift.

    $ cat Sources/main.swift
    import SparkConnect

    let spark = try await SparkSession.builder.getOrCreate()
    print("Connected to Apache Spark \(await spark.version) Server")

    let statements = [
      "DROP TABLE IF EXISTS t",
      "CREATE TABLE IF NOT EXISTS t(a INT) USING ORC",
      "INSERT INTO t VALUES (1), (2), (3)",
    ]

    for s in statements {
      print("EXECUTE: \(s)")
      _ = try await spark.sql(s).count()
    }
    print("SELECT * FROM t")
    try await spark.sql("SELECT * FROM t").cache().show()

    try await spark.range(10).filter("id % 2 == 0").write.mode("overwrite").orc("/tmp/orc")
    try await spark.read.orc("/tmp/orc").show()

    await spark.stop()

Run your Swift application.

    $ swift run
    ...
    Connected to Apache Spark 4.0.1 Server
    EXECUTE: DROP TABLE IF EXISTS t
    EXECUTE: CREATE TABLE IF NOT EXISTS t(a INT) USING ORC
    EXECUTE: INSERT INTO t VALUES (1), (2), (3)
    SELECT * FROM t
    +---+
    | a |
    +---+
    | 2 |
    | 1 |
    | 3 |
    +---+
    +----+
    | id |
    +----+
    | 2  |
    | 6  |
    | 0  |
    | 8  |
    | 4  |
    +----+

You can find more complete examples including Spark SQL REPL, Web Server and Streaming applications in the Examples directory.
This library also supports the `SPARK_REMOTE` environment variable, which specifies the Spark Connect connection string and allows additional connection options to be set.
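
To make the shape of a connection string concrete, here is a minimal, standalone sketch that reads `SPARK_REMOTE` and splits it into host, port, and options. It assumes the general Spark Connect form `sc://host:port/;key=value;key=value` with a default port of 15002. The `SparkRemote` type and `parseSparkRemote` function are hypothetical names used only for illustration; they are not part of the SparkConnect module, and the library's own parsing may differ.

```swift
import Foundation

/// Hypothetical container for this sketch; not a SparkConnect type.
struct SparkRemote {
  var host: String
  var port: Int
  var params: [String: String]
}

/// Parses `sc://host[:port][/;key=value;...]`.
/// Assumption: port defaults to 15002 when omitted. IPv6 hosts are not handled.
func parseSparkRemote(_ s: String) -> SparkRemote? {
  let prefix = "sc://"
  guard s.hasPrefix(prefix) else { return nil }
  let rest = String(s.dropFirst(prefix.count))

  // Separate "host:port" from the optional ";key=value" option list.
  let parts = rest.split(separator: "/", maxSplits: 1, omittingEmptySubsequences: false)
  let hostPort = parts[0].split(separator: ":", maxSplits: 1)
  guard let hostPart = hostPort.first else { return nil }

  var port = 15002
  if hostPort.count == 2 {
    guard let p = Int(hostPort[1]) else { return nil }
    port = p
  }

  // Collect options; entries without "=" are ignored in this sketch.
  var params: [String: String] = [:]
  if parts.count == 2 {
    for pair in parts[1].split(separator: ";") {
      let kv = pair.split(separator: "=", maxSplits: 1)
      if kv.count == 2 { params[String(kv[0])] = String(kv[1]) }
    }
  }
  return SparkRemote(host: String(hostPart), port: port, params: params)
}

// Fall back to an assumed local endpoint when the variable is unset.
let remote = ProcessInfo.processInfo.environment["SPARK_REMOTE"] ?? "sc://localhost:15002"
if let parsed = parseSparkRemote(remote) {
  print("host=\(parsed.host) port=\(parsed.port) options=\(parsed.params.count)")
} else {
  print("Unrecognized connection string: \(remote)")
}
```

With the application above, the variable would typically be set at launch, for example `SPARK_REMOTE=sc://localhost:15002 swift run`, where the endpoint shown is an assumed local server.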