Docker Compose is a convenient way to launch a cluster when testing locally.
There is no officially published Docker image, so the image must currently be built from source.
Run the following commands to clone the source repository and build the Docker image.
    git clone git@github.com:apache/arrow-datafusion.git -b 5.1.0
    cd arrow-datafusion
    ./dev/build-ballista-docker.sh
This will create an image with the tag ballista:0.6.0.
The following Docker Compose example demonstrates how to start a cluster using one scheduler process and one executor process, with the scheduler using etcd as a backing store. A data volume is mounted into each container so that Ballista can access the host file system.
    version: "2.2"
    services:
      etcd:
        image: quay.io/coreos/etcd:v3.4.9
        command: "etcd -advertise-client-urls http://etcd:2379 -listen-client-urls http://0.0.0.0:2379"
      ballista-scheduler:
        image: ballista:0.6.0
        command: "/scheduler --config-backend etcd --etcd-urls etcd:2379 --bind-host 0.0.0.0 --bind-port 50050"
        ports:
          - "50050:50050"
        environment:
          - RUST_LOG=info
        volumes:
          - ./data:/data
        depends_on:
          - etcd
      ballista-executor:
        image: ballista:0.6.0
        command: "/executor --bind-host 0.0.0.0 --bind-port 50051 --scheduler-host ballista-scheduler"
        ports:
          - "50051:50051"
        environment:
          - RUST_LOG=info
        volumes:
          - ./data:/data
        depends_on:
          - ballista-scheduler
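Because the Compose file mounts the host directory `./data` at `/data` in both the scheduler and the executor, any file placed in `./data` is visible to both containers under `/data`. The following is a minimal sketch of preparing that directory next to the Compose file; the file name `example.csv` and its contents are hypothetical and only illustrate the path mapping.

```shell
# Create the host directory that the Compose file mounts at /data.
mkdir -p data

# Write a small, hypothetical CSV file into it.
# Inside the containers this file would appear as /data/example.csv.
printf 'id,name\n1,alice\n2,bob\n' > data/example.csv
```

Paths used when working with the cluster should refer to the container-side location (`/data/...`), not the host-side one (`./data/...`).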
With the Compose configuration saved to a docker-compose.yaml file, the following command starts the single-node cluster.
    docker-compose up
This should show output similar to the following:
    $ docker-compose up
    Creating network "ballista-benchmarks_default" with the default driver
    Creating ballista-benchmarks_etcd_1 ... done
    Creating ballista-benchmarks_ballista-scheduler_1 ... done
    Creating ballista-benchmarks_ballista-executor_1 ... done
    Attaching to ballista-benchmarks_etcd_1, ballista-benchmarks_ballista-scheduler_1, ballista-benchmarks_ballista-executor_1
    ballista-executor_1 | [2021-08-28T15:55:22Z INFO ballista_executor] Running with config:
    ballista-executor_1 | [2021-08-28T15:55:22Z INFO ballista_executor] work_dir: /tmp/.tmpLVx39c
    ballista-executor_1 | [2021-08-28T15:55:22Z INFO ballista_executor] concurrent_tasks: 4
    ballista-scheduler_1 | [2021-08-28T15:55:22Z INFO ballista_scheduler] Ballista v0.6.0 Scheduler listening on 0.0.0.0:50050
    ballista-executor_1 | [2021-08-28T15:55:22Z INFO ballista_executor] Ballista v0.6.0 Rust Executor listening on 0.0.0.0:50051
The scheduler listens on port 50050, which is the port that clients need to connect to.
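Before pointing a client at the cluster, it can help to confirm that the scheduler port is reachable from the host. The following is a minimal sketch, assuming Bash is available and that port 50050 is published on localhost as in the Compose file; `wait_for_port` is a hypothetical helper, not part of Ballista or Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ballista): poll a TCP port until it accepts
# connections, giving up after a fixed number of attempts, one second apart.
wait_for_port() {
  local host="$1" port="$2" retries="${3:-30}"
  local attempt=0
  while [ "$attempt" -lt "$retries" ]; do
    # Bash's /dev/tcp pseudo-path opens a TCP connection; running it in a
    # subshell means the descriptor is closed again as soon as the check ends.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  return 1
}
```

For example, `wait_for_port localhost 50050 && echo "scheduler is accepting connections"` returns once the port opens. A successful TCP connection only shows that something is listening on the port; it does not prove that the scheduler is healthy or that an executor has registered.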