DataFusion
DataFusion is an in-memory query engine that uses Apache Arrow as the memory model. It supports executing SQL queries against CSV and Parquet files as well as querying directly against in-memory data.
Using DataFusion as a library
DataFusion can be used as a library by adding the following to your Cargo.toml file:
[dependencies]
datafusion = "0.17.1"
Using DataFusion as a binary
DataFusion includes a simple command-line interactive SQL utility. See the CLI reference for more information.
Status
General
- [x] SQL Parser
- [x] SQL Query Planner
- [x] Query Optimizer
- [x] Projection push down
- [ ] Predicate push down
- [x] Type coercion
- [x] Parallel query execution
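
For readers unfamiliar with the term, projection push down means telling the scan which columns the rest of the query needs, so that unused columns are never read. The following is a minimal sketch of the idea on a toy plan, not DataFusion's actual optimizer: the `Plan` enum, `push_down`, and `example_plan` are hypothetical names invented for this illustration, and the filter is simplified to a single "column is not null" test.

```rust
use std::collections::BTreeSet;

/// Toy logical plan (hypothetical, for illustration only; not DataFusion's types).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Read a table. `columns: None` means "read every column".
    Scan { table: String, columns: Option<Vec<String>> },
    /// Keep only the named columns of the input.
    Projection { columns: Vec<String>, input: Box<Plan> },
    /// Keep rows where `column` is not null (stand-in for a real predicate).
    Filter { column: String, input: Box<Plan> },
}

/// Walk the plan top-down, tracking which columns are required by the
/// operators above, and record that set on the scan at the bottom.
fn push_down(plan: Plan, required: Option<BTreeSet<String>>) -> Plan {
    match plan {
        Plan::Projection { columns, input } => {
            // Below a projection, only the projected columns are needed.
            let needed: BTreeSet<String> = columns.iter().cloned().collect();
            Plan::Projection {
                columns,
                input: Box::new(push_down(*input, Some(needed))),
            }
        }
        Plan::Filter { column, input } => {
            // The filter also needs the column it tests.
            let required = required.map(|mut set| {
                set.insert(column.clone());
                set
            });
            Plan::Filter {
                column,
                input: Box::new(push_down(*input, required)),
            }
        }
        Plan::Scan { table, columns } => Plan::Scan {
            table,
            // If nothing above restricts the columns, leave the scan as it was.
            columns: match required {
                Some(set) => Some(set.into_iter().collect()),
                None => columns,
            },
        },
    }
}

/// Toy plan for: SELECT name FROM people WHERE age IS NOT NULL
fn example_plan() -> Plan {
    Plan::Projection {
        columns: vec!["name".to_string()],
        input: Box::new(Plan::Filter {
            column: "age".to_string(),
            input: Box::new(Plan::Scan {
                table: "people".to_string(),
                columns: None,
            }),
        }),
    }
}
```

Calling `push_down(example_plan(), None)` leaves the projection and filter in place but rewrites the scan to read only `age` and `name`, however many other columns the `people` table has.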
SQL Support
- [x] Projection
- [x] Selection
- [x] Limit
- [x] Aggregate
- [x] UDFs
- [x] Common math functions
- [ ] Common string functions
- [ ] Common date/time functions
- [ ] Sorting
- [ ] Nested types
- [ ] Lists
- [ ] Subqueries
- [ ] Joins
Data Sources
- [x] CSV
- [x] Parquet primitive types
- [ ] Parquet nested types
Examples