Apache Gluten is a plugin that accelerates Apache Spark SQL by offloading execution to native engines. It requires no changes to your existing Spark SQL queries or DataFrame API code and only needs configuration changes.
Gluten supports two native backends:
| Backend | Description | Guide |
|---|---|---|
| Velox | Meta's C++ execution library | Velox Backend |
| ClickHouse | Column-oriented DBMS ported as a native library | ClickHouse Backend |
```shell
# Set JAVA_HOME (JDK 8 or 17)
export JAVA_HOME=/path/to/your/jdk
export PATH=$JAVA_HOME/bin:$PATH

# Clone and build
git clone https://github.com/apache/gluten.git
cd gluten
./dev/buildbundle-veloxbe.sh
```
See the Build Guide for build options and parameters.
Add the Gluten jar and required configuration to your Spark session. Off-heap memory must be enabled, and the columnar shuffle manager is needed for native shuffle support:
```shell
spark-shell \
  --master yarn --deploy-mode client \
  --conf spark.plugins=org.apache.gluten.GlutenPlugin \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=20g \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
  --jars /path/to/gluten-velox-bundle-*.jar
```
Adjust `spark.memory.offHeap.size` to match the memory available in your environment.
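If you would rather not repeat these flags on every launch, the same properties can be kept in a Spark properties file. The following is a minimal sketch, not part of Gluten: `append_gluten_defaults` is a hypothetical helper name, and it assumes your deployment reads a `spark-defaults.conf`-style file of whitespace-separated key/value pairs. The Gluten jar still has to be supplied separately, for example with `--jars`.

```shell
# Hypothetical helper (not shipped with Gluten): append the Gluten settings
# from the spark-shell example to a Spark properties file, for example
# $SPARK_HOME/conf/spark-defaults.conf.
#   $1 = path to the properties file
#   $2 = off-heap size (optional; defaults to the 20g used in the example)
append_gluten_defaults() {
  conf_file="$1"
  offheap_size="${2:-20g}"
  cat >> "$conf_file" <<EOF
spark.plugins                  org.apache.gluten.GlutenPlugin
spark.memory.offHeap.enabled   true
spark.memory.offHeap.size      ${offheap_size}
spark.shuffle.manager          org.apache.spark.shuffle.sort.ColumnarShuffleManager
EOF
}
```

Calling `append_gluten_defaults "$SPARK_HOME/conf/spark-defaults.conf" 8g` would append the four properties with an 8g off-heap size. Because the helper appends rather than overwrites, running it twice leaves duplicate entries in the file.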
Alternatively, you can enable dynamic off-heap sizing (experimental) to let Gluten manage the off-heap/on-heap split automatically based on spark.executor.memory:
```shell
spark-shell \
  --master yarn --deploy-mode client \
  --conf spark.plugins=org.apache.gluten.GlutenPlugin \
  --conf spark.gluten.memory.dynamic.offHeap.sizing.enabled=true \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
  --jars /path/to/gluten-velox-bundle-*.jar
```
With dynamic sizing, `spark.memory.offHeap.enabled` and `spark.memory.offHeap.size` are not needed: Velox uses the on-heap size as its memory budget. See Dynamic Off-Heap Sizing for details.
See the Velox Backend guide for a complete example with executor and driver settings.
Run a simple query and check its query plan in the Spark UI. Nodes whose names contain `Transformer`, `Velox`, or `Columnar` indicate native execution.
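The same check can be scripted against a plan saved as text. The sketch below is an illustration only: `has_native_nodes` is a hypothetical helper that looks for the three substrings named above in whatever plan text it receives on standard input. It is only as precise as those substrings and does not inspect the plan structure.

```shell
# Hypothetical helper: succeed (exit status 0) if the plan text on stdin
# contains any of the node-name markers listed above, otherwise fail.
has_native_nodes() {
  grep -Eq 'Transformer|Velox|Columnar'
}
```

For example, after saving the output of `EXPLAIN` for a query to a file (here assumed to be `plan.txt`), running `has_native_nodes < plan.txt && echo "native nodes found"` prints the message only when at least one marker is present.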