Unified execution engine leveraging CUDF for hardware-accelerated Spark SQL queries
- Install nvidia-container-toolkit (guide).
- Run `start-cudf.sh` for host configuration.
- Set `spark.gluten.sql.columnar.cudf=true`.
- Set `spark.gluten.debug.enabled.cudf=true` for operator replacement logs.
- Pull the pre-built GPU image: `docker pull apache/gluten:centos-9-jdk8-cudf`
- Start a container: `docker run --name gpu_gluten_container --gpus all -it apache/gluten:centos-9-jdk8-cudf`
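
The two cuDF settings listed above can be passed to Spark as `--conf` flags. The following is a minimal sketch, assuming a POSIX shell. `gluten_cudf_confs` is a hypothetical helper name, not part of Gluten, and the other options a Gluten job needs are not shown.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gluten): prints the cuDF-related --conf
# flags so they can be appended to a spark-submit or spark-shell command.
gluten_cudf_confs() {
  # Always enable the cuDF path.
  printf '%s' "--conf spark.gluten.sql.columnar.cudf=true"
  # Passing "debug" as $1 additionally enables operator replacement logs.
  if [ "${1:-}" = "debug" ]; then
    printf ' %s' "--conf spark.gluten.debug.enabled.cudf=true"
  fi
  printf '\n'
}

# Print the flags with debug logging enabled.
gluten_cudf_confs debug
```

In practice the output would be spliced into a launch command, for example `spark-shell $(gluten_cudf_confs debug)`, left unquoted so that each flag and value becomes its own argument.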
The supported OS, Spark version, and Java version are the same as for Gluten CPU.
When building inside the Docker image, there is no need to run the setup script or build Arrow:

`./dev/buildbundle-veloxbe.sh --run_setup_script=OFF --build_arrow=OFF --enable_gpu=ON`
| Operator | Status | Notes |
|---|---|---|
| Scan | ❌ Not supported | In Development |
| Project | ⚠️ Partial | TPCH-compatible functions |
| Filter | ✅ Implemented | Core operator |
| OrderBy | ✅ Implemented | |
| Aggregation | ⚠️ Partial | TPCH-compatible |
| Join | ⚠️ Partial | TPCH-compatible |
| Spill | ❌ Not supported | In Planning |
The first stage contains a TableScan operator and is IO bound, so it is scheduled to a CPU node. The second stage contains a join and is computation intensive, so it is scheduled to a GPU node.
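
The placement rule described above can be sketched as a small shell function. This only illustrates the rule and is not Gluten's scheduler. `node_for_stage` and the operator names passed to it are hypothetical, and sending a stage without a join to a CPU node is an assumption.

```shell
#!/bin/sh
# Illustrative only: choose a node type from the operators in a stage.
# Arguments are operator names, e.g. `node_for_stage TableScan Filter`.
node_for_stage() {
  for op in "$@"; do
    case "$op" in
      *Join*) echo gpu; return 0 ;;  # computation-intensive stage
    esac
  done
  # IO-bound scan stages and, by assumption, every other stage.
  echo cpu
}

node_for_stage TableScan Filter   # first stage  -> cpu
node_for_stage HashJoin Project   # second stage -> gpu
```

A stage that contains both a scan and a join would be placed on a GPU node by this sketch, because the join check runs first. A real policy would have to decide that case explicitly.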
The GPU performs better on the HashJoin and HashAggregation operators. A single operator such as HashAggregation shows a 5x speedup.