<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Comet Kubernetes Support
## Comet Docker Images
Run the following command from the root of this repository to build the Comet Docker image, or use a [published
Docker image](https://hub.docker.com/r/apache/datafusion-comet).
```shell
docker build -t apache/datafusion-comet -f kube/Dockerfile .
```
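If you prefer not to build locally, a published image can be pulled instead. The tag below matches the image referenced in the example manifest later on this page; check Docker Hub for the tags available for your Spark and Scala versions.
```shell
# Pull a published Comet image (tag taken from the example manifest below)
docker pull apache/datafusion-comet:0.9.1-spark3.5.5-scala2.12-java11
```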
## Example Spark Submit
The exact syntax will vary depending on the Kubernetes distribution, but an example `spark-submit` command can be
found [here](https://github.com/apache/datafusion-comet/tree/main/benchmarks).
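As a rough sketch (not the exact command from the benchmarks page), a cluster-mode `spark-submit` against Kubernetes with Comet enabled might look like the following. The API server address is a placeholder, and the image tag, jar paths, and Comet settings mirror the example manifest further down; adjust them for your cluster.
```shell
# Sketch only: replace <api-server-host> and adjust paths/tags for your environment
spark-submit \
  --master k8s://https://<api-server-host>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.namespace=default \
  --conf spark.kubernetes.container.image=apache/datafusion-comet:0.9.1-spark3.5.5-scala2.12-java11 \
  --conf spark.driver.extraClassPath=/opt/spark/jars/comet-spark-spark3.5_2.12-0.9.1.jar \
  --conf spark.executor.extraClassPath=/opt/spark/jars/comet-spark-spark3.5_2.12-0.9.1.jar \
  --conf spark.plugins=org.apache.spark.CometPlugin \
  --conf spark.comet.enabled=true \
  --conf spark.comet.exec.enabled=true \
  --conf spark.comet.exec.shuffle.enabled=true \
  --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.5.jar
```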
## Helm chart
Install the Spark operator for Kubernetes using Helm
```bash
# Add the Helm repository
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update
# Install the operator into the spark-operator namespace and wait for deployments to be ready
helm install spark-operator spark-operator/spark-operator --namespace spark-operator --create-namespace --wait
```
Check that the operator is deployed
```bash
helm status --namespace spark-operator spark-operator
NAME: spark-operator
NAMESPACE: spark-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
```
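Optionally, confirm the operator pod itself is running; the pod names and columns will vary with the chart version.
```bash
# List the operator pods created by the Helm release
kubectl get pods --namespace spark-operator
```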
Create an example Spark application file `spark-pi.yaml`
```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: apache/datafusion-comet:0.9.1-spark3.5.5-scala2.12-java11
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.5.jar
  sparkConf:
    "spark.executor.extraClassPath": "/opt/spark/jars/comet-spark-spark3.5_2.12-0.9.1.jar"
    "spark.driver.extraClassPath": "/opt/spark/jars/comet-spark-spark3.5_2.12-0.9.1.jar"
    "spark.plugins": "org.apache.spark.CometPlugin"
    "spark.comet.enabled": "true"
    "spark.comet.exec.enabled": "true"
    "spark.comet.cast.allowIncompatible": "true"
    "spark.comet.exec.shuffle.enabled": "true"
    "spark.comet.exec.shuffle.mode": "auto"
    "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
  sparkVersion: 3.5.5
  driver:
    labels:
      version: 3.5.5
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
  executor:
    labels:
      version: 3.5.5
    instances: 1
    cores: 1
    coreLimit: 1200m
    memory: 512m
```
Refer to [Comet Docker Images](#comet-docker-images) for building or pulling the image used in the manifest.
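The manifest assumes the `spark-operator-spark` service account created by the Helm chart is available in the `default` namespace; if your chart version places it elsewhere, adjust `serviceAccount` accordingly. A quick check:
```bash
# Verify the service account referenced by the manifest exists
kubectl get serviceaccount spark-operator-spark --namespace default
```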
Run the Spark application with Comet enabled
```bash
kubectl apply -f spark-pi.yaml
sparkapplication.sparkoperator.k8s.io/spark-pi created
```
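The operator then launches the driver and executor pods in the `default` namespace; you can watch them start and finish with a standard pod listing.
```bash
# Watch the driver and executor pods created for the application
kubectl get pods --namespace default -w
```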
Check the application status
```bash
kubectl get sparkapp spark-pi
NAME STATUS ATTEMPTS START FINISH AGE
spark-pi RUNNING 1 2025-03-18T21:19:48Z <no value> 65s
```
To see more runtime details
```bash
kubectl describe sparkapplication spark-pi
....
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SparkApplicationSubmitted 8m15s spark-application-controller SparkApplication spark-pi was submitted successfully
Normal SparkDriverRunning 7m18s spark-application-controller Driver spark-pi-driver is running
Normal SparkExecutorPending 7m11s spark-application-controller Executor [spark-pi-68732195ab217303-exec-1] is pending
Normal SparkExecutorRunning 7m10s spark-application-controller Executor [spark-pi-68732195ab217303-exec-1] is running
Normal SparkExecutorCompleted 7m5s spark-application-controller Executor [spark-pi-68732195ab217303-exec-1] completed
Normal SparkDriverCompleted 7m4s spark-application-controller Driver spark-pi-driver completed
```
Get the driver logs
```bash
kubectl logs spark-pi-driver
```
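To confirm that Comet was actually loaded and to see the SparkPi result, it can help to filter the driver log; the exact log lines vary with the Comet and Spark versions.
```bash
# Look for Comet plugin activity and the job result in the driver log
kubectl logs spark-pi-driver | grep -iE "comet|pi is roughly"
```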
For more information, see the [Kubernetes Spark operator documentation](https://www.kubeflow.org/docs/components/spark-operator/getting-started/)