Spark SQL REPL Application with Apache Spark Connect Swift ClientThis is an example Swift application to show how to develop a Spark SQL REPL(Read-eval-print Loop) with Apache Spark Connect Swift Client library.
Prepare Spark Connect Server via running Docker image.
docker run -it --rm -p 15002:15002 apache/spark:4.0.0-preview2 bash -c "/opt/spark/sbin/start-connect-server.sh --wait"
Build an application Docker image.
$ docker build -t apache/spark-connect-swift:spark-sql . $ docker images apache/spark-connect-swift:spark-sql REPOSITORY TAG IMAGE ID CREATED SIZE apache/spark-connect-swift spark-sql 265ddfec650d 7 seconds ago 390MB
Run spark-sql docker image.
$ docker run -it --rm -e SPARK_REMOTE=sc://host.docker.internal:15002 apache/spark-connect-swift:spark-sql Connected to Apache Spark 4.0.0-preview2 Server spark-sql (default)> SHOW DATABASES; +---------+ |namespace| +---------+ |default | +---------+ Time taken: 30 ms spark-sql (default)> CREATE DATABASE db1; ++ || ++ ++ Time taken: 31 ms spark-sql (default)> USE db1; ++ || ++ ++ Time taken: 27 ms spark-sql (db1)> CREATE TABLE t1 AS SELECT * FROM RANGE(10); ++ || ++ ++ Time taken: 99 ms spark-sql (db1)> SELECT * FROM t1; +---+ | id| +---+ | 1| | 5| | 3| | 0| | 6| | 9| | 4| | 8| | 7| | 2| +---+ Time taken: 80 ms spark-sql (db1)> USE default; ++ || ++ ++ Time taken: 26 ms spark-sql (default)> DROP DATABASE db1 CASCADE; ++ || ++ ++ spark-sql (default)> exit;
Apache Spark 4 supports SQL Pipe Syntax.
$ swift run ... Build of product 'SparkSQLRepl' complete! (2.33s) Connected to Apache Spark 4.0.0 Server spark-sql (default)> FROM ORC.`/opt/spark/examples/src/main/resources/users.orc` |> AGGREGATE COUNT(*) cnt GROUP BY name |> ORDER BY cnt DESC, name ASC ; +------+---+ | name|cnt| +------+---+ |Alyssa| 1| | Ben| 1| +------+---+ Time taken: 159 ms
Run from source code.
$ swift run ... Connected to Apache Spark 4.0.0.9-apple-SNAPSHOT Server spark-sql (default)>