Status
General
- [x] SQL Parser
- [x] SQL Query Planner
- [x] Query Optimizer
- [x] Constant folding
- [x] Join Reordering
- [x] Limit Pushdown
- [x] Projection Pushdown
- [x] Predicate Pushdown
- [x] Type coercion
- [x] Parallel query execution
SQL Support
- [x] Projection (SELECT)
- [x] Filter (WHERE)
- [x] Filter post-aggregate (HAVING)
- [x] Sorting (ORDER BY)
- [x] Limit (LIMIT)
- [x] Aggregate (GROUP BY)
- [x] cast / try_cast
- [x] VALUES lists
- [x] String Functions
- [x] Conditional Functions
- [x] Time and Date Functions
- [x] Math Functions
- [x] Aggregate Functions (SUM, MEDIAN, and many more)
- [x] Schema Queries
- [ ] Support for nested types (ARRAY/LIST and STRUCT; see #2326 for details)
- [x] Subqueries
- [x] Common Table Expressions (CTE)
- [x] Set Operations (UNION [ALL], INTERSECT [ALL], EXCEPT [ALL])
- [x] Joins (INNER, LEFT, RIGHT, FULL, CROSS)
- [x] Window Functions
- [x] Empty (OVER())
- [x] Partitioning and ordering (OVER(PARTITION BY <..> ORDER BY <..>))
- [x] Custom Window (ORDER BY time ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING)
- [x] User Defined Window and Aggregate Functions
- [x] Catalogs
- [x] Schemas (CREATE / DROP SCHEMA)
- [x] Tables (CREATE / DROP TABLE, CREATE TABLE AS SELECT)
- [ ] Data Insert
- [x] INSERT INTO
- [ ] COPY .. INTO ..
- [x] CSV
- [ ] JSON
- [ ] Parquet
- [ ] Avro
Runtime
- [x] Streaming Grouping
- [x] Streaming Window Evaluation
- [x] Memory limits enforced
- [x] Spilling (to disk) Sort
- [ ] Spilling (to disk) Grouping
- [ ] Spilling (to disk) Joins
Data Sources
In addition to allowing arbitrary data sources via the TableProvider trait, DataFusion includes built-in support for the following formats:
- [x] CSV
- [x] Parquet (for all primitive and nested types)
- [x] JSON
- [x] Avro
- [x] Arrow