---
layout: page
title: Velox GPU
nav_order: 9
parent: Getting-Started
---
# GPU Acceleration in Velox/Gluten
*Unified execution engine leveraging CUDF for hardware-accelerated Spark SQL queries*

---
## **1. Overview**
- **Purpose**: Accelerate Velox operators by executing them on the GPU through CUDF APIs, replacing CPU execution when enabled.
- **Status**: Experimental (validated on TPC-H SF1). Integrates the RAPIDS ecosystem with Apache Spark via Gluten.
- **Key Benefit**: Some queries achieved up to an **8.1x speedup** on x86 compared with the Spark Java engine.
---
## **2. Prerequisites**
- **CUDA Toolkit**: 12.8.0 ([download](https://developer.nvidia.com/cuda-downloads?target_os=Linux)).
- **NVIDIA Drivers**: Compatible with CUDA 12.8.
- **Container Toolkit**: Install `nvidia-container-toolkit` ([guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)).
- **System Reboot**: Required after driver installation.
- **Environment Setup**: Use [`start_cudf.sh`](https://github.com/apache/incubator-gluten/tree/main/dev/start_cudf.sh) for host configuration.
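
Before pulling the image, it can help to confirm that the driver, CUDA Toolkit, and container toolkit are visible on the host. The sketch below is a hypothetical helper (`check_tool` and `check_gpu_host` are not part of Gluten); it only checks that each CLI is on `PATH`, not that the installed versions match CUDA 12.8.

```bash
# Hypothetical pre-flight check: report whether the host tools from the
# Prerequisites list are on PATH.
#   nvidia-smi -> installed with the NVIDIA driver (also prints driver version)
#   nvcc       -> installed with the CUDA Toolkit; only found if the CUDA bin
#                 directory (e.g. /usr/local/cuda/bin) is on PATH
#   nvidia-ctk -> installed with nvidia-container-toolkit
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found:   $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Checks all three tools; returns non-zero if any of them is missing.
check_gpu_host() {
  local status=0
  local tool
  for tool in nvidia-smi nvcc nvidia-ctk; do
    check_tool "$tool" || status=1
  done
  return "$status"
}

# Usage: check_gpu_host && nvidia-smi
```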
---
## **3. Implementation Mechanics**
- **Operator Conversion**:
  - Velox PlanNodes are converted to **GPU operators** when `spark.gluten.sql.columnar.cudf=true`.
  - Falls back to CPU operators when an operator is not supported on the GPU, which triggers row/columnar data conversion.
- **Debugging**: Set `spark.gluten.debug.enabled.cudf=true` to log which operators are replaced.
- **Memory**: Uses a global [RMM](https://docs.rapids.ai/api/librmm/stable/) memory manager, which is not integrated with Spark's memory management system.
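
As a minimal sketch, the two switches above can be set once in `spark-defaults.conf` instead of on every job. The function `enable_gluten_cudf` is hypothetical, and the sketch assumes the regular Gluten CPU configuration (plugin registration, off-heap memory) is already in place.

```bash
# Hypothetical helper: append the GPU switches to a Spark config file
# (e.g. "$SPARK_HOME/conf/spark-defaults.conf", format "key value").
# Assumes the usual Gluten CPU settings are already present in that file.
enable_gluten_cudf() {
  local conf_file="$1"
  local debug="${2:-false}"   # pass "true" to log operator replacement
  {
    echo "spark.gluten.sql.columnar.cudf true"
    echo "spark.gluten.debug.enabled.cudf ${debug}"
  } >> "$conf_file"
}

# Usage:
#   enable_gluten_cudf "$SPARK_HOME/conf/spark-defaults.conf" true
# The same keys can also be passed per job, e.g.
#   spark-shell --conf spark.gluten.sql.columnar.cudf=true
```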
---
## **4. Docker Deployment**
```bash
docker pull apache/gluten:centos-9-jdk8-cudf # Pre-built GPU image
docker run --name gpu_gluten_container --gpus all -it apache/gluten:centos-9-jdk8-cudf
```
- **Image Includes**: Native build cache, Gluten dependencies, Spark 3.4 environment.
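
Because `docker run --name` refuses to create a second container with a name that is already in use, re-running the command above fails once the container exists. A hypothetical wrapper (`gluten_gpu_shell` is illustrative only, not shipped with Gluten) could create the container on first use and re-attach afterwards:

```bash
# Hypothetical convenience wrapper around the docker commands above.
GLUTEN_GPU_IMAGE="apache/gluten:centos-9-jdk8-cudf"
GLUTEN_GPU_NAME="gpu_gluten_container"

gluten_gpu_shell() {
  if docker container inspect "$GLUTEN_GPU_NAME" >/dev/null 2>&1; then
    # Container already exists: start it and attach an interactive session.
    docker start -ai "$GLUTEN_GPU_NAME"
  else
    # First use: create the container with access to all GPUs.
    docker run --name "$GLUTEN_GPU_NAME" --gpus all -it "$GLUTEN_GPU_IMAGE"
  fi
}
```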
---
## **5. Build & Deployment**
### **Dependencies**
The supported OS, Spark, and Java versions are the same as for Gluten on CPU.
### **Compilation Commands**
When building inside the Docker image, the setup script and the Arrow build can be skipped, since the image already contains these dependencies.
```bash
./dev/buildbundle-veloxbe.sh --run_setup_script=OFF --build_arrow=OFF --enable_cudf=ON
```
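
The two build variants differ only in whether dependencies are set up and Arrow is rebuilt. The hypothetical `gluten_gpu_build_args` below assembles the flags; it assumes that `--run_setup_script` and `--build_arrow` default to `ON` in `buildbundle-veloxbe.sh`, so outside the image it simply leaves them out.

```bash
# Hypothetical wrapper: choose build flags for the GPU bundle.
# Inside apache/gluten:centos-9-jdk8-cudf the setup script and Arrow build are
# skipped; on a bare host this sketch assumes both should run (default ON).
gluten_gpu_build_args() {
  local in_image="$1"   # "yes" when building inside the pre-built image
  local args=(--enable_cudf=ON)
  if [ "$in_image" = "yes" ]; then
    args+=(--run_setup_script=OFF --build_arrow=OFF)
  fi
  printf '%s\n' "${args[@]}"   # one flag per line
}

# Usage (bash 4+ for mapfile):
#   mapfile -t ARGS < <(gluten_gpu_build_args yes)
#   ./dev/buildbundle-veloxbe.sh "${ARGS[@]}"
```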
---
## **6. GPU Operator Support Status**
| **Operator**    | **Status**       | **Notes**                        |
|-----------------|------------------|----------------------------------|
| **Scan**        | ❌ Not supported | In development                   |
| **Project**     | ⚠️ Partial       | TPC-H-compatible functions       |
| **Filter**      | ✅ Implemented   | Core operator                    |
| **OrderBy**     | ✅ Implemented   | Merged in Velox #12735           |
| **Aggregation** | ⚠️ Partial       | TPC-H-compatible                 |
| **Join**        | ⚠️ Partial       | TPC-H-compatible                 |
| **Spill**       | ❌ Not supported | In planning                      |
---
## **7. Performance Validation**
The GPU shows the largest gains on the HashJoin and HashAggregation operators. In single-operator tests, HashAggregation shows a 5x speedup.

---
## **8. Relevant Resources**
1. [CUDF Docs](https://docs.rapids.ai/api/cudf/stable/libcudf_docs/) - GPU operator APIs.
2. [Gluten GPU issue](https://github.com/apache/incubator-gluten/issues/8851) - Development tracker.