Connecting to Spark Connect using Clients

From the client's perspective, Spark Connect mostly behaves like any other gRPC client and can be configured as such. However, to make it easy to use from different programming languages and to provide a homogeneous connection surface, this document proposes the user-facing surface for connecting to a Spark Connect endpoint.

Background

Similar to JDBC or other database connections, Spark Connect uses a connection string whose parameters are interpreted to connect to the Spark Connect endpoint.

Connection String

Generally, the connection string follows the standard URI definitions. The URI scheme is fixed and set to sc://. The full string has to be a valid URI that most systems can parse correctly. For example, hostnames have to be valid and cannot contain arbitrary characters. Configuration parameters are passed in the style of the HTTP URL path parameter syntax, similar to JDBC connection strings. The path component must be empty. All parameters are interpreted case-sensitively.

sc://hostname:port/;param1=value;param2=value
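The rules above can be sketched as a small parser. This is a minimal illustration using only the Python standard library, not the actual client implementation. The function name parse_connection_string, the DEFAULT_PORT constant, and the choice to percent-decode values are assumptions made for this example. The default of 15002 is the port this document names for a connection string without an explicit port.

```python
from urllib.parse import urlsplit, unquote

DEFAULT_PORT = 15002  # port used when the connection string names none


def parse_connection_string(url):
    """Split an sc:// connection string into (host, port, params).

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    if parts.scheme != "sc":
        raise ValueError(f"scheme must be 'sc', got {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("missing hostname")
    # Everything after the first ';' in the path holds the parameters.
    path, _, raw_params = parts.path.partition(";")
    if path not in ("", "/"):
        raise ValueError(f"path component must be empty, got {path!r}")
    params = {}
    for item in raw_params.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed parameter {item!r}")
        # Keys are kept exactly as written, so they stay case-sensitive.
        # Percent-decoding the value is an assumption of this sketch.
        params[key] = unquote(value)
    return parts.hostname, parts.port or DEFAULT_PORT, params
```

With this sketch, parse_connection_string("sc://myhost.com:443/;use_ssl=true") returns ("myhost.com", 443, {"use_ssl": "true"}), while a string with a non-empty path such as "sc://myhost.com/prefix/;a=b" raises ValueError.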

Examples

Valid Examples

Below we capture valid configuration examples, explaining how the connection string will be used when configuring the Spark Connect client.

The example below connects to myhost.com on port 15002, the default when no port is given.

server_url = "sc://myhost.com/"

The next examples configure the connection to use a different port with SSL; the second one additionally passes a token.

server_url = "sc://myhost.com:443/;use_ssl=true"
server_url = "sc://myhost.com:443/;use_ssl=true;token=ABCDEFG"
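Connection strings like the ones above can also be assembled from their parts rather than written by hand. The following is a hedged sketch: build_connection_string is a hypothetical helper, not part of any Spark Connect client, and percent-encoding the parameter values is an assumption made here so that a ';' or '=' inside a value cannot break the syntax.

```python
from urllib.parse import quote


def build_connection_string(host, port=None, **params):
    """Assemble sc://host[:port]/;key=value;... from its parts.

    Hypothetical helper for illustration only.
    """
    authority = host if port is None else f"{host}:{port}"
    # Each parameter becomes ';key=value'; values are percent-encoded
    # (an assumption of this sketch), keys are emitted as given.
    suffix = "".join(
        f";{key}={quote(str(value), safe='')}" for key, value in params.items()
    )
    # The path component stays empty: only '/' precedes the parameters.
    return f"sc://{authority}/{suffix}"
```

Calling build_connection_string("myhost.com", 443, use_ssl="true", token="ABCDEFG") reproduces the last valid example, and build_connection_string("myhost.com") yields "sc://myhost.com/".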

Invalid Examples

As mentioned above, Spark Connect uses a regular gRPC client. To remain compatible with the gRPC standard and HTTP, the server path cannot be configured. For example, the following connection string is invalid.

server_url = "sc://myhost.com:443/mypathprefix/;token=AAAAAAA"