Features
General
- [x] SQL Parser
- [x] SQL Query Planner
- [x] DataFrame API
- [x] Parallel query execution
- [x] Streaming Execution
Optimizations
- [x] Query Optimizer
- [x] Constant folding
- [x] Join Reordering
- [x] Limit Pushdown
- [x] Projection push down
- [x] Predicate push down
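
To make one of the optimizations above concrete, here is a minimal sketch of constant folding over a toy expression tree. The `Expr` enum and `fold_constants` function are hypothetical names invented for this illustration; they are not DataFusion's own types or its optimizer code, and only show the general idea of evaluating literal-only subexpressions at plan time.

```rust
// Toy expression tree for illustration only; NOT DataFusion's `Expr` type.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    Column(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Recursively replace subtrees whose operands are all literals with a
// single literal. Subtrees that reference a column are left in place.
fn fold_constants(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (fold_constants(*l), fold_constants(*r)) {
            // Only fold when the addition does not overflow.
            (Expr::Literal(a), Expr::Literal(b)) if a.checked_add(b).is_some() => {
                Expr::Literal(a + b)
            }
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Mul(l, r) => match (fold_constants(*l), fold_constants(*r)) {
            // Only fold when the multiplication does not overflow.
            (Expr::Literal(a), Expr::Literal(b)) if a.checked_mul(b).is_some() => {
                Expr::Literal(a * b)
            }
            (l, r) => Expr::Mul(Box::new(l), Box::new(r)),
        },
        // Literals and column references are already in simplest form.
        other => other,
    }
}

fn main() {
    // price * (2 + 3) should become price * 5.
    let expr = Expr::Mul(
        Box::new(Expr::Column("price".to_string())),
        Box::new(Expr::Add(
            Box::new(Expr::Literal(2)),
            Box::new(Expr::Literal(3)),
        )),
    );
    let folded = fold_constants(expr);
    assert_eq!(
        folded,
        Expr::Mul(
            Box::new(Expr::Column("price".to_string())),
            Box::new(Expr::Literal(5)),
        )
    );
    println!("{:?}", folded);
}
```

Folding bottom-up means a single pass is enough for this toy tree: by the time a parent node is inspected, its children are already as simple as they can be.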
SQL Support
- [x] Type coercion
- [x] Projection (SELECT)
- [x] Filter (WHERE)
- [x] Filter post-aggregate (HAVING)
- [x] Sorting (ORDER BY)
- [x] Limit (LIMIT)
- [x] Aggregate (GROUP BY)
- [x] cast / try_cast
- [x] VALUES lists
- [x] String Functions
- [x] Conditional Functions
- [x] Time and Date Functions
- [x] Math Functions
- [x] Aggregate Functions (SUM, MEDIAN, and many more)
- [x] Schema Queries
- [x] Support for nested types (ARRAY/LIST and STRUCT)
- [x] Subqueries
- [x] Common Table Expressions (CTE)
- [x] Set Operations (UNION [ALL], INTERSECT [ALL], EXCEPT [ALL])
- [x] Joins (INNER, LEFT, RIGHT, FULL, CROSS)
- [x] Window Functions
- [x] Empty (OVER())
- [x] Partitioning and ordering (OVER(PARTITION BY <..> ORDER BY <..>))
- [x] Custom Window (ORDER BY time ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING)
- [x] User Defined Window and Aggregate Functions
- [x] Catalogs
- [x] Schemas (CREATE / DROP SCHEMA)
- [x] Tables (CREATE / DROP TABLE, CREATE TABLE AS SELECT)
- [x] Data Insert
- [x] INSERT INTO
- [x] COPY .. INTO ..
- [x] CSV
- [x] JSON
- [x] Parquet
- [ ] Avro
Runtime
- [x] Streaming Grouping
- [x] Streaming Window Evaluation
- [x] Memory limits enforced
- [x] Spilling (to disk) Sort
- [x] Spilling (to disk) Grouping
- [x] Spilling (to disk) Sort Merge Join
- [ ] Spilling (to disk) Hash Join
Data Sources
In addition to allowing arbitrary data sources via the TableProvider trait, DataFusion includes built-in support for the following formats:
- [x] CSV
- [x] Parquet
- [x] Primitive and Nested Types
- [x] Row Group and Data Page pruning on min/max statistics
- [x] Row Group pruning on Bloom Filters
- [x] Predicate push down (late materialization), not enabled by default
- [x] JSON
- [x] Avro
- [x] Arrow
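
As a rough illustration of the min/max statistics pruning listed for Parquet above, the sketch below decides which row groups can be skipped for a simple predicate. `RowGroupStats`, `prune_greater_than`, and `prune_equal_to` are hypothetical names made up for this example and assume a single integer column; they do not reflect DataFusion's or the Parquet reader's actual API.

```rust
// Hypothetical per-row-group statistics for one integer column.
// Illustration only; not a DataFusion or Parquet crate type.
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    min: i64,
    max: i64,
}

// Indices of row groups that may contain rows with `column > threshold`.
// A group is skipped when even its maximum value fails the predicate.
fn prune_greater_than(stats: &[RowGroupStats], threshold: i64) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, s)| s.max > threshold)
        .map(|(i, _)| i)
        .collect()
}

// Indices of row groups that may contain rows with `column = value`.
// A group is skipped when the value lies outside its [min, max] range.
fn prune_equal_to(stats: &[RowGroupStats], value: i64) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, s)| s.min <= value && value <= s.max)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let stats = [
        RowGroupStats { min: 0, max: 99 },
        RowGroupStats { min: 100, max: 199 },
        RowGroupStats { min: 200, max: 299 },
    ];

    // WHERE column > 150: the first group's max (99) rules it out.
    assert_eq!(prune_greater_than(&stats, 150), vec![1, 2]);

    // WHERE column = 42: only the first group's range covers 42.
    assert_eq!(prune_equal_to(&stats, 42), vec![0]);

    println!("pruning sketch ok");
}
```

Statistics can only prove that a row group has no matching rows; a group that survives pruning may still contain none, so the filter must still be applied to the rows that are read.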