There is no officially published Docker image, so the image must currently be built from source.
Run the following commands to clone the source repository and build the Docker image.
git clone git@github.com:apache/arrow-datafusion.git -b 5.1.0
cd arrow-datafusion
./dev/build-ballista-docker.sh
This will create an image with the tag ballista:0.6.0.
Start a scheduler using the following syntax:
docker run --network=host \
  -d ballista:0.6.0 \
  /scheduler --bind-port 50050
Run docker ps to check that the process is running:
$ docker ps
CONTAINER ID   IMAGE            COMMAND                  CREATED         STATUS         PORTS     NAMES
1f3f8b5ed93a   ballista:0.6.0   "/scheduler --bind-p…"   2 seconds ago   Up 1 second              tender_archimedes
Run docker logs CONTAINER_ID to check the output from the process:
$ docker logs 1f3f8b5ed93a
[2021-08-28T15:45:11Z INFO ballista_scheduler] Ballista v0.6.0 Scheduler listening on 0.0.0.0:50050
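Rather than reading the log by eye, the startup line can be checked with a small shell helper. The following is a minimal sketch, not part of Ballista: is_listening is a hypothetical helper name, and it assumes the log format shown above, where the startup line ends in "listening on HOST:PORT".

```shell
# Hypothetical helper: succeeds if the log text on stdin contains a line
# ending in "listening on <host>:<port>" for the given port.
is_listening() {
  grep -q "listening on .*:${1}\$"
}

# Example with a real container (replace CONTAINER_ID). The 2>&1 is included
# on the assumption that the process may write its log to stderr:
#   docker logs CONTAINER_ID 2>&1 | is_listening 50050 && echo "scheduler is up"
```

The same helper can be used for an executor by passing that executor's port instead of 50050.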
Start one or more executor processes. Each executor process will need to listen on a different port.
docker run --network=host \
  -d ballista:0.6.0 \
  /executor --external-host localhost --bind-port 50051
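Because each executor needs its own port, starting several of them is easy to script. The sketch below repeats the executor command shown above while incrementing the port. IMAGE, BASE_PORT, NUM_EXECUTORS, DRY_RUN, and the run helper are illustrative names, not Ballista settings. By default the script only prints the commands it would run.

```shell
IMAGE="ballista:0.6.0"
BASE_PORT=50051            # port of the first executor
NUM_EXECUTORS=3            # illustrative value; adjust as needed
DRY_RUN="${DRY_RUN:-1}"    # set DRY_RUN=0 to actually launch containers

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

i=0
while [ "$i" -lt "$NUM_EXECUTORS" ]; do
  port=$((BASE_PORT + i))
  run docker run --network=host -d "$IMAGE" \
    /executor --external-host localhost --bind-port "$port"
  i=$((i + 1))
done
```

With the values above, the loop covers ports 50051 through 50053. Running the script with DRY_RUN=0 starts the containers instead of printing the commands.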
Use docker ps to check that both the scheduler and executor(s) are now running:
$ docker ps
CONTAINER ID   IMAGE            COMMAND                  CREATED          STATUS          PORTS     NAMES
7c6941bb8dc0   ballista:0.6.0   "/executor --externa…"   3 seconds ago    Up 2 seconds              tender_goldberg
1f3f8b5ed93a   ballista:0.6.0   "/scheduler --bind-p…"   50 seconds ago   Up 49 seconds             tender_archimedes
Use docker logs CONTAINER_ID to check the output from the executor(s):
$ docker logs 7c6941bb8dc0
[2021-08-28T15:45:58Z INFO ballista_executor] Running with config:
[2021-08-28T15:45:58Z INFO ballista_executor] work_dir: /tmp/.tmpeyEM76
[2021-08-28T15:45:58Z INFO ballista_executor] concurrent_tasks: 4
[2021-08-28T15:45:58Z INFO ballista_executor] Ballista v0.6.0 Rust Executor listening on 0.0.0.0:50051
NOTE: This functionality is currently experimental.
Ballista can optionally use etcd as a backing store for the scheduler. Use the following command to launch the scheduler with this option enabled.
docker run --network=host \
  -d ballista:0.6.0 \
  /scheduler --bind-port 50050 \
  --config-backend etcd \
  --etcd-urls etcd:2379
Please refer to the etcd website for installation instructions. Version 3.4.9 or later of etcd is recommended.