This project houses the experimental client for Spark Connect for Apache Spark written in Golang.
Currently, the Spark Connect client for Golang is highly experimental and should not be used in any production setting. In addition, the PMC of the Apache Spark project reserves the right to withdraw and abandon the development of this project if it is not sustainable.
This section explains how to run Spark Connect Go locally.
Step 1: Install Golang: https://go.dev/doc/install.
Step 2: Ensure you have the buf CLI installed; see the buf documentation for installation instructions.
Step 3: Run the following commands to setup the Spark Connect client.
```
git clone https://github.com/apache/spark-connect-go.git
cd spark-connect-go
git submodule update --init --recursive
make gen && make test
```
Step 4: Setup the Spark Driver on localhost.
Download a Spark distribution (3.4.0 or later) and unpack it.
Start the Spark Connect server with the following command (make sure to use a package version that matches your Spark distribution):
```
sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0
```
Step 5: Run the example Go application.
```
go run cmd/spark-connect-example-spark-session/main.go
```
The following diagram shows the main components of the current prototype:

```
+-------------------+
|                   |
|   dataFrameImpl   |
|                   |
+-------------------+
          |
          |
          +
+-------------------+
|                   |
| sparkSessionImpl  |
|                   |
+-------------------+
          |
          |
          +
+---------------------------+        +----------------+
|                           |        |                |
| SparkConnectServiceClient |--------+| Spark Driver   |
|                           |        |                |
+---------------------------+        +----------------+
```
`SparkConnectServiceClient` is a gRPC client that talks to the Spark Driver. `sparkSessionImpl` creates `dataFrameImpl` instances, and `dataFrameImpl` uses the gRPC client held by `sparkSessionImpl` to communicate with the Spark Driver.
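The layering above can be sketched in plain Go. This is a hypothetical, simplified sketch of the relationship between the three components; the type names mirror the diagram, but the methods (`Sql`, `Collect`, `ExecutePlan`) and the `fakeClient` stand-in for the Spark Driver are illustrative assumptions, not the real spark-connect-go API.

```go
package main

import (
	"errors"
	"fmt"
)

// connectClient stands in for the gRPC SparkConnectServiceClient.
// The real client speaks protobuf over gRPC; this sketch uses strings.
type connectClient interface {
	ExecutePlan(plan string) (string, error)
}

// sparkSessionImpl holds the gRPC client and creates DataFrames.
type sparkSessionImpl struct {
	client connectClient
}

// Sql returns a dataFrameImpl bound to this session (hypothetical method).
func (s *sparkSessionImpl) Sql(query string) *dataFrameImpl {
	return &dataFrameImpl{session: s, plan: query}
}

// dataFrameImpl reaches the Spark Driver through its session's client.
type dataFrameImpl struct {
	session *sparkSessionImpl
	plan    string
}

// Collect sends the plan to the driver via the session's gRPC client.
func (df *dataFrameImpl) Collect() (string, error) {
	if df.session == nil || df.session.client == nil {
		return "", errors.New("dataFrameImpl: no client available")
	}
	return df.session.client.ExecutePlan(df.plan)
}

// fakeClient simulates the Spark Driver so the sketch runs standalone.
type fakeClient struct{}

func (fakeClient) ExecutePlan(plan string) (string, error) {
	return "result of: " + plan, nil
}

func main() {
	session := &sparkSessionImpl{client: fakeClient{}}
	df := session.Sql("select 1")
	out, err := df.Collect()
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // result of: select 1
}
```

The key design point the diagram conveys is that `dataFrameImpl` never owns a connection itself; it always goes through the session's single client.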
We will mimic the logic of the Spark Connect Scala implementation while adopting common Go practices, e.g. returning an `error` value for error handling.
Please review the Contribution to Spark guide for information on how to get started contributing to the project.