 ---
 layout: page
 title: Velox GPU
 nav_order: 9
 parent: Getting-Started
 ---


 # GPU Acceleration in Velox/Gluten
 *Unified execution engine leveraging CUDF for hardware-accelerated Spark SQL queries*

 ---

 ## **1. Overview**
 - **Purpose**: Accelerate Velox operators via CUDF APIs, replacing CPU execution when enabled.
 - **Status**: Experimental (validated on TPC-H SF1). Integrates the RAPIDS ecosystem with Apache Spark via Gluten.
 - **Key Benefit**: Some queries achieved up to an **8.1x speedup** on x86 compared with the Spark Java engine.

 ---

 ## **2. Prerequisites**
 - **CUDA Toolkit**: 12.8.0 ([download](https://developer.nvidia.com/cuda-downloads?target_os=Linux)).
 - **NVIDIA Drivers**: Compatible with CUDA 12.8.
 - **Container Toolkit**: Install `nvidia-container-toolkit` ([guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)).
 - **System Reboot**: Required after driver installation.
 - **Environment Setup**: Use [`start_cudf.sh`](https://github.com/apache/incubator-gluten/tree/main/dev/start_cudf.sh) for host configuration.
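
 A quick way to confirm the prerequisites above is to check that the relevant tools are on `PATH`. The following is a minimal sketch, not part of Gluten: `check_tool` is a hypothetical helper, and the script assumes that `nvidia-smi` comes with the driver, `nvcc` with the CUDA Toolkit, and `nvidia-ctk` with `nvidia-container-toolkit`.

 ```shell
 #!/bin/sh
 # Hypothetical helper: report whether a command is available on PATH.
 # Returns 0 when the command is found, 1 otherwise.
 check_tool() {
   if command -v "$1" >/dev/null 2>&1; then
     echo "found: $1"
     return 0
   fi
   echo "missing: $1"
   return 1
 }

 # Driver: print the GPU name and driver version.
 if check_tool nvidia-smi; then
   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
     || echo "nvidia-smi failed; check the driver installation"
 fi

 # CUDA Toolkit: the release line should report 12.8.
 if check_tool nvcc; then
   nvcc --version | grep release || echo "could not read the nvcc release line"
 fi

 # Container toolkit CLI.
 if check_tool nvidia-ctk; then
   nvidia-ctk --version || echo "nvidia-ctk failed to report a version"
 fi
 ```

 The script only reports what it finds; it does not install or change anything on the host.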

 ---

 ## **3. Implementation Mechanics**
 - **Operator Conversion**:
     - Velox PlanNodes → **GPU operators** when `spark.gluten.sql.columnar.cudf=true`.
     - Falls back to CPU operators when an operator is not supported on the GPU; the fallback triggers a row/columnar data conversion.
 - **Debugging**: Enable `spark.gluten.debug.enabled.cudf=true` to log operator replacements.
 - **Memory**: Uses a global [RMM](https://docs.rapids.ai/api/librmm/stable/) memory manager, which cannot be aligned with the Spark memory system.
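
 As an illustration of how the two settings above might be passed to Spark, here is a minimal sketch that assembles the `--conf` flags and prints the resulting command instead of running it. Only the two `cudf` keys come from this page. The plugin class and off-heap settings are assumptions based on a typical Gluten setup, and the `4g` size is a placeholder.

 ```shell
 #!/bin/sh
 # Settings named in this section: enable cuDF operators and log replacements.
 CUDF_CONF="--conf spark.gluten.sql.columnar.cudf=true"
 CUDF_CONF="$CUDF_CONF --conf spark.gluten.debug.enabled.cudf=true"

 # Assumption: general Gluten settings that this page does not list.
 # The off-heap size is a placeholder; adjust it to your environment.
 GLUTEN_CONF="--conf spark.plugins=org.apache.gluten.GlutenPlugin"
 GLUTEN_CONF="$GLUTEN_CONF --conf spark.memory.offHeap.enabled=true"
 GLUTEN_CONF="$GLUTEN_CONF --conf spark.memory.offHeap.size=4g"

 # Print the command so it can be reviewed before it is run.
 echo "spark-shell $GLUTEN_CONF $CUDF_CONF"
 ```

 With the debug flag set, the operator replacement logs should show which plan nodes were converted to GPU operators and which stayed on the CPU.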

 ---

 ## **4. Docker Deployment**
 ```bash
 docker pull apache/gluten:centos-9-jdk8-cudf  # Pre-built GPU image
 docker run --name gpu_gluten_container --gpus all -it apache/gluten:centos-9-jdk8-cudf
 ```
 - **Image Includes**: Native build cache, Gluten dependencies, Spark 3.4 environment.

 ---

 ## **5. Build & Deployment**
 ### **Dependencies**
 The OS, Spark version, and Java version requirements are the same as for the Gluten CPU build.

 ### **Compilation Commands**
 When building inside the Docker image, there is no need to run the setup script or build Arrow:
 ```bash
 ./dev/buildbundle-veloxbe.sh --run_setup_script=OFF --build_arrow=OFF --enable_cudf=ON
 ```

 ---

 ## **6. GPU Operator Support Status**
 | **Operator**    | **Status**       | **Notes**                    |
 |-----------------|------------------|------------------------------|
 | **Scan**        | ❌ Not supported | In development               |
 | **Project**     | ⚠️ Partial       | Functions: TPC-H-compatible  |
 | **Filter**      | ✅ Implemented   | Core operator                |
 | **OrderBy**     | ✅ Implemented   | Merged in Velox #12735       |
 | **Aggregation** | ⚠️ Partial       | TPC-H-compatible             |
 | **Join**        | ⚠️ Partial       | TPC-H-compatible             |
 | **Spill**       | ❌ Not supported | In planning                  |

 ---

 ## **7. Performance Validation**

 GPU execution performs better on the HashJoin and HashAggregation operators.
 A single operator such as HashAggregation shows a 5x speedup.

 ---

 ## **8. Relevant Resources**
 1. [CUDF Docs](https://docs.rapids.ai/api/cudf/stable/libcudf_docs/) - GPU operator APIs.
 2. [Gluten GPU Issue #9098](https://github.com/apache/incubator-gluten/issues/8851) - Development tracker.