See the architecture guide for more details.
We run simple benchmarks comparing Ballista with Apache Spark to track the progress of performance optimizations. These benchmarks are derived from TPC-H and are not official TPC-H benchmarks. The results come from running individual queries at scale factor 10 (10 GB) on a single node with a single executor and 24 concurrent tasks.
The tracking issue for improving these results is #339.
The easiest way to get started is to run one of the standalone or distributed examples. After that, refer to the Getting Started Guide.
Ballista supports a wide range of SQL, including CTEs, joins, and subqueries, and can execute complex queries at scale.
Refer to the DataFusion SQL Reference for more information on supported SQL.
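As a rough illustration of the kind of query described above, the following sketch combines a CTE, a join, and a scalar subquery in one statement. It runs against an in-memory SQLite database purely so that it is self-contained; it does not use Ballista or DataFusion, and the `customer` and `orders` tables and their data are hypothetical, loosely modeled on TPC-H naming. Whether a given query runs unchanged on Ballista should be checked against the DataFusion SQL Reference.

```python
import sqlite3

# Hypothetical tables and data, used only to make the sketch runnable.
# SQLite stands in for a SQL engine here; this is not Ballista's API.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_name TEXT);
    CREATE TABLE orders (
        o_orderkey INTEGER PRIMARY KEY,
        o_custkey INTEGER,
        o_totalprice REAL
    );
    INSERT INTO customer VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO orders VALUES
        (10, 1, 100.0), (11, 1, 300.0), (12, 2, 50.0), (13, 3, 250.0);
""")

QUERY = """
WITH customer_totals AS (                -- CTE: total spend per customer
    SELECT o_custkey, SUM(o_totalprice) AS total
    FROM orders
    GROUP BY o_custkey
)
SELECT c.c_name, t.total
FROM customer AS c
JOIN customer_totals AS t                -- join against the CTE
  ON c.c_custkey = t.o_custkey
WHERE t.total > (                        -- scalar subquery: average order price
    SELECT AVG(o_totalprice) FROM orders
)
ORDER BY t.total DESC
"""

# Customers whose total spend exceeds the average order price.
rows = conn.execute(QUERY).fetchall()
print(rows)
```

Only standard SQL constructs are used, so the query text itself is a reasonable starting point for experimenting with a distributed engine once the tables are registered there.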
Ballista is maturing quickly and is now working towards production readiness. See the roadmap for more details.
Please see the Contribution Guide for information about contributing to Ballista.