Ballista is a distributed query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
The Ballista CLI allows SQL queries to be executed by an in-process DataFusion context, or by a distributed Ballista context.
USAGE:
    ballista-cli [OPTIONS]

OPTIONS:
    -c, --batch-size <BATCH_SIZE>
            The batch size of each query, or use Ballista default
        --color
            Enables console syntax highlighting
        --concurrent-tasks <CONCURRENT_TASKS>
            The max concurrent tasks, only for Ballista local mode. Default: all available cores
    -f, --file <FILE>...
            Execute commands from file(s), then exit
        --format <FORMAT>
            [default: table] [possible values: csv, tsv, table, json, nd-json, automatic]
    -h, --help
            Print help information
        --host <HOST>
            Ballista scheduler host
    -p, --data-path <DATA_PATH>
            Path to your data, default to current directory
        --port <PORT>
            Ballista scheduler port
    -q, --quiet
            Reduce printing other than the results and work quietly
    -r, --rc <RC>...
            Run the provided files on startup instead of ~/.ballistarc
    -V, --version
            Print version information
Create a CSV file to query.
$ echo "1,2" > data.csv
$ ballista-cli
Ballista CLI v0.12.0

> CREATE EXTERNAL TABLE foo (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
0 rows in set. Query took 0.001 seconds.

> SELECT * FROM foo;
+---+---+
| a | b |
+---+---+
| 1 | 2 |
+---+---+
1 row in set. Query took 0.017 seconds.

> \q
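The same statements can also be run non-interactively. The sketch below assumes ballista-cli is already on your PATH and uses only the -f, --format and -q options from the usage text above. The script name query.sql is a hypothetical name chosen for illustration, and the exact output depends on your installed version.

```shell
# Create the same one-row CSV file used in the interactive session.
echo "1,2" > data.csv

# Write the statements to a script file (query.sql is a hypothetical name).
cat > query.sql <<'EOF'
CREATE EXTERNAL TABLE foo (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
SELECT * FROM foo;
EOF

# Execute the script, print results as CSV, then exit.
# The guard skips the call when the CLI has not been installed yet.
if command -v ballista-cli >/dev/null 2>&1; then
  ballista-cli -f query.sql --format csv -q
else
  echo "ballista-cli not found on PATH; build and install it first"
fi
```

Keeping the statements in a file makes the run repeatable and lets the output be redirected to another tool.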
To run SQL against Ballista with ballista-cli, you must first build and install ballista-cli.
cd datafusion-ballista/ballista-cli
cargo build
cargo install --path .
The Ballista CLI can connect to a Ballista scheduler for query execution.
ballista-cli --host localhost --port 50050
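As a rough sketch, the scheduler address can be parameterized so the same command works across environments. The variable names SCHEDULER_HOST and SCHEDULER_PORT and the file name remote.sql are hypothetical; only the --host, --port and -f options come from the usage text. Combining -f with a remote scheduler is assumed to behave as it does in local mode, and a scheduler must already be listening at the given address.

```shell
# Scheduler address with defaults matching the example above.
# (SCHEDULER_HOST and SCHEDULER_PORT are hypothetical variable names.)
SCHEDULER_HOST="${SCHEDULER_HOST:-localhost}"
SCHEDULER_PORT="${SCHEDULER_PORT:-50050}"

# A one-statement script to send to the scheduler (remote.sql is a hypothetical name).
echo "SELECT 1;" > remote.sql

# Connect to the scheduler and run the script; skip if the CLI is not installed.
if command -v ballista-cli >/dev/null 2>&1; then
  ballista-cli --host "$SCHEDULER_HOST" --port "$SCHEDULER_PORT" -f remote.sql
else
  echo "ballista-cli not found on PATH; build and install it first"
fi
```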