| commit | f21199ddcc98a7cc80a0195bbc820a7954581370 | |
|---|---|---|
| author | zhangli20 <richselian@gmail.com> | Thu Aug 21 17:42:12 2025 +0800 |
| committer | zhangli20 <richselian@gmail.com> | Thu Aug 21 18:27:38 2025 +0800 |
| tree | 3ea8f681c676ca1ce0789a86a9d0a92f8a42b0bf | |
| parent | 718078fa87191ff5d091b22a19e392c9644e3b18 | |
move all configurations to AuronConf
The Auron accelerator for Apache Spark leverages native vectorized execution to accelerate query processing. It combines the power of the Apache DataFusion library and the scale of the Spark distributed computing framework.
Auron takes a fully optimized physical plan from Spark, maps it into DataFusion's execution plan, and performs native plan computation in Spark executors.
Auron is composed of the following high-level components:
Based on the inherent well-defined extensibility of DataFusion, Auron can be easily extended to support:
We encourage you to extend DataFusion's capabilities directly and add the corresponding support in Auron with simple modifications in plan-serde and extension translation.
To build Auron, please follow the steps below:
The native execution library is written in Rust, so you must install the Rust (nightly) toolchain first for compilation. We recommend installing it via rustup.
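As a sketch, the toolchain can be set up like this (these are the standard rustup installer commands, not Auron-specific ones):

```shell
# Install rustup via the official installer, then add the nightly toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
rustup toolchain install nightly
rustup default nightly   # make nightly the default toolchain for this build
```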
Auron has been well tested on JDK 8, 11, and 17.
```shell
git clone git@github.com:kwai/auron.git
cd auron
```
Use `./auron-build.sh` to build the project; run `./auron-build.sh --help` for usage information.
After the build finishes, a fat JAR package containing all the dependencies is generated in the `target` directory.
You can use the following command to build a centos-7 compatible release:
```shell
SHIM=spark-3.3 MODE=release JAVA_VERSION=8 SCALA_VERSION=2.12 ./release-docker.sh
```
This section describes how to submit and configure a Spark Job with Auron support.
Move the Auron JAR package to the Spark client classpath (normally `spark-xx.xx.xx/jars/`).
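For example, assuming the fat JAR was built under `target/` and `SPARK_HOME` points at your Spark client (both the JAR name pattern and the variable are illustrative):

```shell
# Copy the built Auron fat JAR onto the Spark client classpath (names are illustrative)
cp target/auron-*.jar "$SPARK_HOME/jars/"
```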
Add the following configurations to `spark-xx.xx.xx/conf/spark-defaults.conf`:
```
spark.auron.enable true
spark.sql.extensions org.apache.spark.sql.auron.AuronSparkSessionExtension
spark.shuffle.manager org.apache.spark.sql.execution.auron.shuffle.AuronShuffleManager
spark.memory.offHeap.enabled false

# suggested executor memory configuration
spark.executor.memory 4g
spark.executor.memoryOverhead 4096
```
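Alternatively, the same settings can be passed per job via `--conf` flags instead of editing the defaults file (a sketch using the property names listed above):

```shell
# Per-job equivalent of the spark-defaults.conf settings above
spark-sql \
  --conf spark.auron.enable=true \
  --conf spark.sql.extensions=org.apache.spark.sql.auron.AuronSparkSessionExtension \
  --conf spark.shuffle.manager=org.apache.spark.sql.execution.auron.shuffle.AuronShuffleManager \
  --conf spark.memory.offHeap.enabled=false \
  -f tpcds/q01.sql
```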
```shell
spark-sql -f tpcds/q01.sql
```
Auron now supports Celeborn integration; use the following configurations to enable shuffling with Celeborn:
```
# change celeborn endpoint and storage directory to the correct location
spark.shuffle.manager org.apache.spark.sql.execution.auron.shuffle.celeborn.AuronCelebornShuffleManager
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.celeborn.master.endpoints localhost:9097
spark.celeborn.client.spark.shuffle.writer hash
spark.celeborn.client.push.replicate.enabled false
spark.celeborn.storage.availableTypes HDFS
spark.celeborn.storage.hdfs.dir hdfs:///home/celeborn
spark.sql.adaptive.localShuffleReader.enabled false
```
Auron supports integration with Apache Uniffle, a high-performance remote shuffle service for Apache Spark.
To enable Uniffle as the shuffle manager in Auron, configure your Spark application with the following settings in spark-defaults.conf or via Spark submit options:
```
spark.shuffle.manager org.apache.spark.sql.execution.auron.shuffle.uniffle.AuronUniffleShuffleManager
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.rss.coordinator.quorum <coordinatorIp1>:19999,<coordinatorIp2>:19999
spark.rss.enabled true
```
Notes:

- Ensure the Uniffle client JAR (e.g. `rss-client-spark3-shaded-0.9.2.jar` for Uniffle 0.9.2 or later) is included in your Spark application's classpath.
- Replace `<coordinator-host>:19999` with the actual Uniffle coordinator address in your cluster.

TPC-DS 1TB Benchmark (for details, see https://auron-project.github.io/documents/benchmarks.html):
We also encourage you to benchmark Auron and share the results with us. 🤗
We're using Discussions to connect with other members of our community. We hope that you:
Auron is licensed under the Apache 2.0 License. A copy of the license can be found here.