Ballista is a distributed compute platform implemented primarily in Rust and powered by Apache Arrow. Its architecture allows other programming languages (such as Python, C++, and Java) to be supported as first-class citizens without paying a serialization penalty.
The foundational technologies in Ballista are:
Ballista can be deployed as a standalone cluster or on Kubernetes. In either case, the scheduler can be configured to use etcd as a backing store to (eventually) provide redundancy if a scheduler fails.
Although Ballista is largely inspired by Apache Spark, there are some key differences.
The Ballista project was donated to Apache Arrow in April 2021 and work is underway to integrate more tightly with DataFusion.
One of the goals is to implement a common scheduler that can seamlessly scale queries across cores in DataFusion and across nodes in Ballista.
Ballista issues are tracked in ASF JIRA.