# Spark Connect
This module contains the implementation of Spark Connect, a logical plan
facade for the implementation in Spark. Spark Connect is integrated directly into the
Spark build.
The documentation linked here is written for developers of Spark Connect; it is not
intended as end-user documentation.
## Development Topics
### Guidelines for new clients
When contributing a new client, please be aware that we strive for a common
user experience across all languages. Please follow the guidelines below:
* [Connection string configuration](docs/client-connection-string.md)
* [Adding new messages](docs/adding-proto-messages.md) in the Spark Connect protocol.
### Python client development
Python-specific development guidelines are located in [python/docs/source/development/testing.rst](https://github.com/apache/spark/blob/master/python/docs/source/development/testing.rst), which is published under the [Development tab](https://spark.apache.org/docs/latest/api/python/development/index.html) of the PySpark documentation.
To generate the Python client code from the proto files:
First, make sure you have a Python environment with the required dependencies installed.
Specifically, you need `black` and the dependencies from the "Spark Connect python proto generation plugin (optional)" section of the requirements file:
```bash
pip install -r dev/requirements.txt
```
Install [buf](https://github.com/bufbuild/buf), for example with Homebrew:
```bash
brew install bufbuild/buf/buf
```
Generate the Python files by running:
```bash
dev/connect-gen-protos.sh
```
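Before running the generation script, it can help to confirm that the required tools are actually on your `PATH`. The following is a minimal sketch, not part of the Spark repository: the helper name `missing_tools` is hypothetical, and it assumes the tools are exposed as the commands `buf` and `black`.

```bash
# Hypothetical helper: print each given command name that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Assumed command names for the tools installed in the steps above.
missing="$(missing_tools buf black)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing >&2
else
  echo "All tools found; dev/connect-gen-protos.sh can be run."
fi
```

The check only reports what is missing; it does not install anything or verify tool versions.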
### Build with user-defined `protoc` and `protoc-gen-grpc-java`
If the official `protoc` and `protoc-gen-grpc-java` binaries cannot be used to build the `connect` module in your compilation environment
(for example, on CentOS 6 or CentOS 7, where the default `glibc` version is older than 2.14), you can compile and test by
specifying user-defined `protoc` and `protoc-gen-grpc-java` binaries as follows:
```bash
export SPARK_PROTOC_EXEC_PATH=/path-to-protoc-exe
export CONNECT_PLUGIN_EXEC_PATH=/path-to-protoc-gen-grpc-java-exe
./build/mvn -Phive -Puser-defined-protoc clean package
```
or, using SBT:
```bash
export SPARK_PROTOC_EXEC_PATH=/path-to-protoc-exe
export CONNECT_PLUGIN_EXEC_PATH=/path-to-protoc-gen-grpc-java-exe
./build/sbt -Puser-defined-protoc clean package
```
The user-defined `protoc` and `protoc-gen-grpc-java` binaries can be built from source in your own compilation environment.
For the build steps, refer to [protobuf](https://github.com/protocolbuffers/protobuf) and [grpc-java](https://github.com/grpc/grpc-java).
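A quick sanity check on the two environment variables can catch a wrong path before a long build starts. This is a minimal sketch under stated assumptions: the helper name `check_exec` is hypothetical and is not provided by Spark, and the check only confirms that each path is an executable file, not that the binary is a compatible version.

```bash
# Hypothetical helper: succeed only if the path is a regular, executable file.
# $1: variable name (used only in messages), $2: the variable's value.
check_exec() {
  if [ -z "$2" ]; then
    echo "$1 is not set" >&2
    return 1
  fi
  if [ ! -f "$2" ] || [ ! -x "$2" ]; then
    echo "$1 does not point to an executable file: $2" >&2
    return 1
  fi
}

# Check both variables used by the user-defined-protoc profile.
if check_exec SPARK_PROTOC_EXEC_PATH "${SPARK_PROTOC_EXEC_PATH:-}" &&
   check_exec CONNECT_PLUGIN_EXEC_PATH "${CONNECT_PLUGIN_EXEC_PATH:-}"; then
  echo "Both binaries look usable."
fi
```

The variable name is passed separately from its value so that the function stays portable across POSIX shells without indirect expansion.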