[SPARK-48777][BUILD] Migrate build system to Bazel

### What changes were proposed in this pull request?
Until now, the repository used a hand-written Makefile to compile and build the Spark Connect Go client. This made it awkward to generate, for example, the protobuf and gRPC dependencies, and it required committing the generated code to the repository.

This patch fixes that by migrating the build to Bazel and leveraging its built-in protobuf generation.
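To sketch what Bazel's built-in protobuf generation can look like with rules_go, a `BUILD.bazel` fragment might resemble the following. This is illustrative only: the target names, proto paths, and `importpath` are assumptions, not the actual files introduced by this PR.

```starlark
# Hypothetical BUILD.bazel fragment (bzlmod-style labels via @rules_go);
# target names and paths are illustrative, not this PR's actual files.
load("@rules_proto//proto:defs.bzl", "proto_library")
load("@rules_go//proto:def.bzl", "go_proto_library")

proto_library(
    name = "spark_connect_proto",
    srcs = glob(["spark/connect/*.proto"]),
)

go_proto_library(
    name = "spark_connect_go_proto",
    # The go_grpc compiler generates both message types and gRPC stubs.
    compilers = ["@rules_go//proto:go_grpc"],
    importpath = "github.com/apache/spark-connect-go/internal/generated",  # illustrative
    proto = ":spark_connect_proto",
)
```

Because Bazel produces these sources at build time, the generated protobuf and gRPC code no longer needs to be committed to the repository.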

### Why are the changes needed?
Build stability

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
Existing Tests

Closes #23 from grundprinzip/SPARK-48777.

Authored-by: Martin Grund <martin.grund@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
34 files changed
README.md

# Apache Spark Connect Client for Golang

This project houses the experimental Spark Connect client for Apache Spark, written in Golang.

## Current State of the Project

Currently, the Spark Connect client for Golang is highly experimental and should not be used in any production setting. In addition, the PMC of the Apache Spark project reserves the right to withdraw and abandon the development of this project if it is not sustainable.

## Getting started

This section explains how to run Spark Connect Go locally.

Step 1: Install Golang: https://go.dev/doc/install.

Step 2: Install Bazel to build the code: https://bazel.build/install.

Step 3: Run the following commands to set up the Spark Connect client.

```
git clone https://github.com/apache/spark-connect-go.git
git submodule update --init --recursive

bazel test //...
```

Step 4: Set up the Spark Driver on localhost.

  1. Download the Spark distribution (3.5.1) and unpack it.

  2. Start the Spark Connect server with the following command (make sure to use a package version that matches your Spark distribution):

```
sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.5.1
```

Step 5: Run the example Go application.

```
go run cmd/spark-connect-example-spark-session/main.go
```

## How to write a Spark Connect Go application in your own project

See the Quick Start Guide (quick-start.md).

## High Level Design

The following diagram shows the main components of the current prototype:

```
    +-------------------+
    |                   |
    |   dataFrameImpl   |
    |                   |
    +-------------------+
              |
              |
              v
    +-------------------+
    |                   |
    | sparkSessionImpl  |
    |                   |
    +-------------------+
              |
              |
              v
+---------------------------+               +----------------+
|                           |               |                |
| SparkConnectServiceClient |-------------->|  Spark Driver  |
|                           |               |                |
+---------------------------+               +----------------+
```

SparkConnectServiceClient is a gRPC client that talks to the Spark Driver. sparkSessionImpl creates dataFrameImpl instances, and dataFrameImpl uses the gRPC client held by sparkSessionImpl to communicate with the Spark Driver.

We will mimic the logic of the Spark Connect Scala implementation while adopting common Go practices, e.g. returning an error value for error handling.

## Contributing

Please review the Contribution to Spark guide for information on how to get started contributing to the project.