layout: page title: Velox GPU nav_order: 9 parent: Getting-Started
GPU Acceleration in Velox/Gluten
Unified execution engine leveraging CUDF for hardware-accelerated Spark SQL queries
1. Overview
- Purpose: Accelerate Velox operators via CUDF APIs, replacing CPU execution when enabled.
- Status: Experimental (TPC-H SF1 validated). Integrates RAPIDS ecosystem with Apache Spark via Gluten .
- Key Benefit: Some queries achieved up to 8.1x speedup on x86 vs. Spark Java engine .
2. Prerequisites
- CUDA Toolkit: 12.8.0 (download).
- NVIDIA Drivers: Compatible with CUDA 12.8.
- Container Toolkit: Install
nvidia-container-toolkit
(guide). - System Reboot: Required after driver installation.
- Environment Setup: Use
start_cudf.sh
for host configuration .
3. Implementation Mechanics
- Operator Conversion:
- Velox PlanNodes → GPU operators when
spark.gluten.sql.columnar.cudf=true
. - Falls back to CPU operators if GPU unsupported (triggers row/columnar data conversion) .
- Debugging: Enable
spark.gluten.debug.enabled.cudf=true
for operator replacement logs. - Memory: Global RMM memory manager, cannot align with Spark memory system.
4. Docker Deployment
docker pull apache/gluten:centos-9-jdk8-cudf # Pre-built GPU image
docker run --name gpu_gluten_container --gpus all -it apache/gluten:centos-9-jdk8-cudf
- Image Includes: Native build cache, Gluten dependencies, Spark 3.4 environment.
5. Build & Deployment
Dependencies
The OS, Spark version, Java version aligns with Gluten CPU.
Compilation Commands
If building in the docker image, no need to set up script and build arrow.
./dev/buildbundle-veloxbe.sh --run_setup_script=OFF --build_arrow=OFF --enable_cudf=ON
6. GPU Operator Support Status
Operator | Status | Notes |
---|
Scan | ❌ Not supported | In Development |
Project | ⚠️ Partial | Function TPCH-compatible |
Filter | ✅ Implemented | Core operator |
OrderBy | ✅ Implemented | Merged in Velox #12735 |
Aggregation | ⚠️ Partial | TPCH-compatible |
Join | ⚠️ Partial | TPCH-compatible |
Spill | ❌ Not supported | In Planning |
7. Performance Validation
GPU performs better on operator HashJoin and HashAggregation. Single Operator like Hash Agg shows 5x speedup.
8. Relevant Resources
- CUDF Docs - GPU operator APIs.
- Gluten GPU Issue #9098 - Development tracker.