<!--
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
-->
# Apache Auron (Incubating)
[![TPC-DS](https://github.com/apache/auron/actions/workflows/tpcds.yml/badge.svg?branch=master)](https://github.com/apache/auron/actions/workflows/tpcds.yml)
[![master-ce7-builds](https://github.com/apache/auron/actions/workflows/build-ce7-releases.yml/badge.svg?branch=master)](https://github.com/apache/auron/actions/workflows/build-ce7-releases.yml)
<p align="center"><img src="./dev/auron-logo.png" /></p>

The Auron accelerator for big data engines (e.g., Spark, Flink) uses native vectorized execution to speed up query processing. It combines
the power of the [Apache DataFusion](https://arrow.apache.org/datafusion/) library with the scale of distributed
computing frameworks.

Auron takes a fully optimized physical plan from the distributed computing framework, maps it into a DataFusion execution plan, and executes
that plan natively.
The key capabilities of Auron include:

- **Native execution**: Implemented in Rust, eliminating JVM overhead and enabling predictable performance.
- **Vectorized computation**: Built on Apache Arrow's columnar format, fully leveraging SIMD instructions for batch processing.
- **Pluggable architecture**: Integrates seamlessly with Apache Spark and is designed to be extended to other engines in the future.
- **Production-hardened optimizations**: Multi-level memory management, compacted shuffle formats, and adaptive execution strategies developed through large-scale deployment.

Thanks to DataFusion's well-defined extensibility, Auron can easily be extended to support:

- Various object stores.
- Operators.
- Simple and aggregate functions.
- File formats.

We encourage you to extend the capabilities of [DataFusion](https://github.com/apache/arrow-datafusion) directly and then add
support for them in Auron with small modifications to plan-serde and extension translation.
## Build from source

To build Auron from source, follow the steps below:

1. Install Rust.

   Auron's native execution library is written in Rust, so you need a nightly Rust toolchain before compiling.
   We recommend installing it with [rustup](https://rustup.rs/).

2. Install a JDK.

   Auron has been well tested with JDK 8, 11, and 17.
   Make sure `JAVA_HOME` is set and points to the version you want to use.

3. Check out the source code.

4. Build the project.

   You can build Auron either *locally* or *inside Docker with CentOS 7* using a single script, `auron-build.sh`.
   Run `./auron-build.sh --help` to see all available options.
   After the build completes, a fat JAR containing all dependencies is generated in the `target/` directory for local builds
   or in the `target-docker/` directory for Docker builds.
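
The prerequisite checks from steps 1 and 2 and the output location from step 4 can be scripted roughly as follows. This is a hypothetical sketch: `is_nightly_rustc`, `check_java_home`, and `find_auron_jar` are not part of Auron, and `find_auron_jar` assumes the fat JAR is the most recently modified `*.jar` directly under `target/` or `target-docker/`, which may not match the actual artifact layout.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Auron) for the build steps above.

# Succeeds if the given `rustc --version` output describes a nightly toolchain,
# i.e. output of the form "rustc 1.80.0-nightly (<hash> <date>)".
is_nightly_rustc() {
  case "$1" in
    *-nightly*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds if JAVA_HOME is set and contains an executable bin/java.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "JAVA_HOME ($JAVA_HOME) has no executable bin/java" >&2
    return 1
  fi
}

# Prints the most recently modified *.jar directly under the given directory
# (target/ for local builds, target-docker/ for Docker builds).
# Assumption: the fat JAR is the newest JAR in that directory.
find_auron_jar() {
  local dir="$1" newest="" f
  for f in "$dir"/*.jar; do
    [ -e "$f" ] || continue  # the glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  if [ -z "$newest" ]; then
    echo "no JAR found in $dir" >&2
    return 1
  fi
  printf '%s\n' "$newest"
}

# Example usage:
#   is_nightly_rustc "$(rustc --version)" && check_java_home && ./auron-build.sh --help
#   find_auron_jar target
```

Running the checks before `auron-build.sh` fails fast on a stable Rust toolchain or a missing JDK instead of partway through compilation.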
## Run Spark Job with Auron Accelerator
This section describes how to configure and submit a Spark job with Auron enabled.

1. Copy the Auron JAR into the Spark client classpath (normally `spark-xx.xx.xx/jars/`).
2. Add the following settings to the Spark configuration file `spark-xx.xx.xx/conf/spark-defaults.conf`:

   ```properties
   spark.auron.enable true
   spark.sql.extensions org.apache.spark.sql.auron.AuronSparkSessionExtension
   spark.shuffle.manager org.apache.spark.sql.execution.auron.shuffle.AuronShuffleManager
   spark.memory.offHeap.enabled false
   # suggested executor memory configuration (memoryOverhead is in MiB)
   spark.executor.memory 4g
   spark.executor.memoryOverhead 4096
   ```

3. Submit a query with `spark-sql` or another tool such as the Spark Thrift Server:

   ```shell
   spark-sql -f tpcds/q01.sql
   ```
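
Steps 1 and 2 can also be scripted. The sketch below is hypothetical (`add_auron_conf`, `key_defined`, and `install_auron_jar` are not part of Auron) and assumes `spark-defaults.conf` holds one whitespace- or `=`-separated setting per line. It only appends keys that are missing, so an existing value is left for you to merge by hand; for example, if `spark.sql.extensions` is already set, add the Auron class to its comma-separated list yourself. The suggested memory settings are omitted because they depend on your workload.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Auron) for steps 1 and 2 above.

# Required settings from step 2, one "key value" pair per entry.
AURON_SPARK_CONF=(
  "spark.auron.enable true"
  "spark.sql.extensions org.apache.spark.sql.auron.AuronSparkSessionExtension"
  "spark.shuffle.manager org.apache.spark.sql.execution.auron.shuffle.AuronShuffleManager"
  "spark.memory.offHeap.enabled false"
)

# Succeeds if key $1 is already set in properties file $2
# (accepts "key value" and "key=value"; comment lines never match).
key_defined() {
  awk -v k="$1" '
    { sub(/^[ \t]+/, ""); split($0, f, "[ \t=]+"); if (f[1] == k) found = 1 }
    END { exit !found }
  ' "$2"
}

# Appends every Auron setting whose key is missing from the given
# spark-defaults.conf; existing values are left untouched.
add_auron_conf() {
  local conf="$1" line
  touch "$conf"
  # Make sure appended settings start on a fresh line.
  if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
    printf '\n' >> "$conf"
  fi
  for line in "${AURON_SPARK_CONF[@]}"; do
    if ! key_defined "${line%% *}" "$conf"; then
      printf '%s\n' "$line" >> "$conf"
    fi
  done
}

# Copies the Auron JAR ($1) into the Spark client classpath ($2/jars).
install_auron_jar() {
  cp "$1" "$2/jars/"
}

# Example usage (the JAR name is a placeholder):
#   install_auron_jar target/<auron-jar>.jar "$SPARK_HOME"
#   add_auron_conf "$SPARK_HOME/conf/spark-defaults.conf"
#   "$SPARK_HOME/bin/spark-sql" -f tpcds/q01.sql
```

Because `add_auron_conf` skips keys that are already present, running it more than once leaves the file unchanged.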
## Performance
TPC-DS 1TB Benchmark Results:
![tpcds-benchmark-echarts.png](./benchmark-results/tpcds-benchmark-echarts.png)
For methodology and additional results, please refer to [benchmark documentation](https://auron.apache.org/documents/benchmarks.html).
We also encourage you to benchmark Auron and share the results with us. 🤗
## Community
### Subscribe to the Mailing List

Mailing lists are the most widely recognized form of communication in the Apache community.
Contact us through the following mailing list:

| Name                                                       | Scope                           | Subscribe                                                | Unsubscribe                                                   |
|:-----------------------------------------------------------|:--------------------------------|:---------------------------------------------------------|:--------------------------------------------------------------|
| [dev@auron.apache.org](mailto:dev@auron.apache.org)        | Development-related discussions | [Subscribe](mailto:dev-subscribe@auron.apache.org)       | [Unsubscribe](mailto:dev-unsubscribe@auron.apache.org)        |
### Report Issues or Submit Pull Requests

If you run into any problems, contact us, or fix them yourself by submitting a 🔗[Pull Request](https://github.com/apache/auron/pulls).
## License
Auron is licensed under the Apache 2.0 License. A copy of the license
[can be found here.](LICENSE)