commit | bb281260f059eab2a87dded24e1ebfec015d188e | |
---|---|---|
author | hiboyang <14280154+hiboyang@users.noreply.github.com> | Mon Sep 11 09:31:42 2023 +0900 |
committer | Hyukjin Kwon <gurwls223@apache.org> | Mon Sep 11 09:31:42 2023 +0900 |
tree | 1e9b855191e6b9fec0f096238a16537187182864 | |
parent | f2c9478d33e92fbc2d376367709b573a2412514a | |
[DOC] Add Quick Start Guide for user to use this repo as a library

### What changes were proposed in this pull request?

Add Quick Start Guide for user to use this repo as a library

### Why are the changes needed?

Document to help user to write Spark Connect Go application in their own project

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Document change review

Closes #15 from hiboyang/bo-dev-06.

Authored-by: hiboyang <14280154+hiboyang@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
This project houses the experimental client for Spark Connect for Apache Spark written in Golang.
Currently, the Spark Connect client for Golang is highly experimental and should not be used in any production setting. In addition, the PMC of the Apache Spark project reserves the right to withdraw and abandon the development of this project if it is not sustainable.
git clone https://github.com/apache/spark-connect-go.git
cd spark-connect-go
git submodule update --init --recursive
make gen && make test
Ensure you have installed the buf CLI (see the buf documentation for more info).
A very simple example in Go looks like the following:
func main() {
	remote := "localhost:15002"
	spark, _ := sql.SparkSession.Builder.Remote(remote).Build()
	defer spark.Stop()

	df, _ := spark.Sql("select 'apple' as word, 123 as count union all select 'orange' as word, 456 as count")
	df.Show(100, false)
}
The following diagram shows the main code in the current prototype:
    +-------------------+
    |                   |
    |   dataFrameImpl   |
    |                   |
    +-------------------+
              |
              |
              +
    +-------------------+
    |                   |
    | sparkSessionImpl  |
    |                   |
    +-------------------+
              |
              |
              +
+---------------------------+               +----------------+
|                           |               |                |
| SparkConnectServiceClient |--------------+|  Spark Driver  |
|                           |               |                |
+---------------------------+               +----------------+
SparkConnectServiceClient is the gRPC client that talks to the Spark Driver. sparkSessionImpl generates dataFrameImpl instances. dataFrameImpl uses the gRPC client in sparkSessionImpl to communicate with the Spark Driver.
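The relationship between the three components can be sketched in a few lines of Go. This is only an illustration of the ownership structure described above, not the project's code: `planExecutor`, `fakeExecutor`, `session`, `dataFrame`, and `Collect` are hypothetical names, and the fake executor answers locally so the sketch runs without a Spark server.

```go
package main

import (
	"errors"
	"fmt"
)

// planExecutor is a hypothetical stand-in for SparkConnectServiceClient:
// the one component that actually talks to the Spark Driver.
type planExecutor interface {
	Execute(query string) ([]string, error)
}

// fakeExecutor answers locally so the sketch runs without a Spark server.
type fakeExecutor struct {
	queries []string // every query it was asked to run
}

func (f *fakeExecutor) Execute(query string) ([]string, error) {
	if query == "" {
		return nil, errors.New("empty query")
	}
	f.queries = append(f.queries, query)
	return []string{"row for: " + query}, nil
}

// session plays the role of sparkSessionImpl: it owns the client.
type session struct {
	client planExecutor
}

// Sql creates a dataFrame that keeps a reference back to its session.
func (s *session) Sql(query string) *dataFrame {
	return &dataFrame{session: s, query: query}
}

// dataFrame plays the role of dataFrameImpl.
type dataFrame struct {
	session *session
	query   string
}

// Collect reaches the driver through the client held by the session.
func (d *dataFrame) Collect() ([]string, error) {
	return d.session.client.Execute(d.query)
}

func main() {
	s := &session{client: &fakeExecutor{}}
	rows, err := s.Sql("select 1").Collect()
	if err != nil {
		fmt.Println("query failed:", err)
		return
	}
	fmt.Println(rows)
}
```

Keeping the client on the session rather than on each data frame means every data frame created from one session shares a single connection, which matches the arrows in the diagram.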
We will mimic the logic of the Spark Connect Scala implementation and adopt common Go practices, e.g. returning an error object for error handling.
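As a minimal sketch of the Go practice mentioned above, the function below returns an error value alongside its results instead of throwing an exception, and the caller checks it explicitly. `parseRemote` is a hypothetical helper written for this illustration and is not part of the library; it only uses the Go standard library.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// parseRemote is a hypothetical helper (not part of the library) that splits
// a "host:port" remote address. It returns an error instead of panicking.
func parseRemote(remote string) (string, int, error) {
	host, portText, err := net.SplitHostPort(remote)
	if err != nil {
		// %w wraps the cause so callers can still inspect it.
		return "", 0, fmt.Errorf("invalid remote %q: %w", remote, err)
	}
	port, err := strconv.Atoi(portText)
	if err != nil {
		return "", 0, fmt.Errorf("invalid port in remote %q: %w", remote, err)
	}
	return host, port, nil
}

func main() {
	for _, remote := range []string{"localhost:15002", "localhost"} {
		host, port, err := parseRemote(remote)
		if err != nil {
			// The caller decides what to do with the failure.
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(host, port)
	}
}
```

The same pattern applies to calls such as `Build()` and `Sql()` in the example earlier, where the second return value is the error.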
Install Golang: https://go.dev/doc/install.
Download a Spark distribution (3.4.0+) and unzip the folder.
Start the Spark Connect server by running the command:
sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0
go run cmd/spark-connect-example-spark-session/main.go
Please review the Contribution to Spark guide for information on how to get started contributing to the project.