DataFusion

DataFusion is an in-memory query engine that uses Apache Arrow as its memory model. It supports executing SQL queries against CSV and Parquet files, as well as directly against in-memory data.

Using DataFusion as a library

To use DataFusion as a library, add the following to your Cargo.toml file:

[dependencies]
datafusion = "0.17.1"

Using DataFusion as a binary

DataFusion includes a simple command-line interactive SQL utility. See the CLI reference for more information.

Status

General

[x] SQL Parser
[x] SQL Query Planner
[x] Query Optimizer
[x] Projection push down
[ ] Predicate push down
[x] Type coercion
[x] Parallel query execution
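
For readers unfamiliar with the term, projection push down means that a scan reads only the columns a query actually references. The following is a minimal, self-contained sketch of that idea on a toy columnar table. It is not DataFusion's implementation and uses no DataFusion or Arrow APIs; the names `Table`, `scan`, and `sample_table` are hypothetical and exist only for illustration.

```rust
use std::collections::HashMap;

/// A toy columnar table: each column is stored as its own vector.
/// (Hypothetical type for illustration; not part of DataFusion.)
struct Table {
    columns: HashMap<String, Vec<i64>>,
}

impl Table {
    /// Scan that materializes only the requested columns, which is the
    /// effect projection push down has on a data source.
    /// Columns are returned in the requested order; unknown names are skipped.
    fn scan(&self, projection: &[&str]) -> Vec<(String, Vec<i64>)> {
        projection
            .iter()
            .filter_map(|name| {
                self.columns
                    .get(*name)
                    .map(|values| (name.to_string(), values.clone()))
            })
            .collect()
    }
}

/// Builds a small three-column table with made-up values.
fn sample_table() -> Table {
    let mut columns = HashMap::new();
    columns.insert("id".to_string(), vec![1, 2, 3]);
    columns.insert("price".to_string(), vec![10, 20, 30]);
    columns.insert("quantity".to_string(), vec![5, 6, 7]);
    Table { columns }
}

fn main() {
    let table = sample_table();
    // For a query like `SELECT price FROM t`, only `price` is read;
    // `id` and `quantity` are never copied out of the table.
    let batch = table.scan(&["price"]);
    for (name, values) in &batch {
        println!("{}: {:?}", name, values);
    }
}
```

In a real engine the saving comes from the data source skipping unneeded columns on disk (for example, Parquet column chunks), rather than from avoiding an in-memory copy as in this toy version.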

SQL Support

[x] Projection
[x] Selection
[x] Limit
[x] Aggregate
[x] UDFs
[x] Common math functions
[ ] Common string functions
[ ] Common date/time functions
[ ] Sorting
[ ] Nested types
[ ] Lists
[ ] Subqueries
[ ] Joins

Data Sources

[x] CSV
[x] Parquet primitive types
[ ] Parquet nested types