Status

General

  • [x] SQL Parser
  • [x] SQL Query Planner
  • [x] Query Optimizer
  • [x] Constant folding
  • [x] Join Reordering
  • [x] Limit Pushdown
  • [x] Projection pushdown
  • [x] Predicate pushdown
  • [x] Type coercion
  • [x] Parallel query execution
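
To make one of the optimizer items above concrete, here is a minimal sketch of constant folding on a toy expression tree. `ToyExpr` and `fold_constants` are hypothetical names invented for this illustration; they are not DataFusion's expression type or optimizer rule, and the real implementation handles far more cases (types, nulls, functions).

```rust
// Toy expression tree for illustration only; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Literal(i64),
    Column(String),
    Add(Box<ToyExpr>, Box<ToyExpr>),
    Mul(Box<ToyExpr>, Box<ToyExpr>),
}

// Fold children first, then collapse a node whose operands are both literals.
// If the arithmetic would overflow, the node is left unfolded.
fn fold_constants(expr: ToyExpr) -> ToyExpr {
    match expr {
        ToyExpr::Add(l, r) => match (fold_constants(*l), fold_constants(*r)) {
            (ToyExpr::Literal(a), ToyExpr::Literal(b)) if a.checked_add(b).is_some() => {
                ToyExpr::Literal(a + b)
            }
            (l, r) => ToyExpr::Add(Box::new(l), Box::new(r)),
        },
        ToyExpr::Mul(l, r) => match (fold_constants(*l), fold_constants(*r)) {
            (ToyExpr::Literal(a), ToyExpr::Literal(b)) if a.checked_mul(b).is_some() => {
                ToyExpr::Literal(a * b)
            }
            (l, r) => ToyExpr::Mul(Box::new(l), Box::new(r)),
        },
        // Literals and column references are already as simple as they get.
        other => other,
    }
}
```

With this sketch, an expression such as `x + 2 * 3` is rewritten to `x + 6` once at plan time, so the multiplication is not repeated for every row.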

SQL Support

  • [x] Projection (SELECT)
  • [x] Filter (WHERE)
  • [x] Filter post-aggregate (HAVING)
  • [x] Sorting (ORDER BY)
  • [x] Limit (LIMIT)
  • [x] Aggregate (GROUP BY)
  • [x] cast / try_cast
  • [x] VALUES lists
  • [x] String Functions
  • [x] Conditional Functions
  • [x] Time and Date Functions
  • [x] Math Functions
  • [x] Aggregate Functions (SUM, MEDIAN, and many more)
  • [x] Schema Queries
  • [ ] Support for nested types (ARRAY/LIST and STRUCT); see #2326 for details
  • [x] Subqueries
  • [x] Common Table Expressions (CTE)
  • [x] Set Operations (UNION [ALL], INTERSECT [ALL], EXCEPT [ALL])
  • [x] Joins (INNER, LEFT, RIGHT, FULL, CROSS)
  • [x] Window Functions
    • [x] Empty (OVER())
    • [x] Partitioning and ordering (OVER(PARTITION BY <..> ORDER BY <..>))
    • [x] Custom window frame (OVER(ORDER BY time ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING))
    • [x] User Defined Window and Aggregate Functions
  • [x] Catalogs
    • [x] Schemas (CREATE / DROP SCHEMA)
    • [x] Tables (CREATE / DROP TABLE, CREATE TABLE AS SELECT)
  • [ ] Data Insert
    • [x] INSERT INTO
    • [ ] COPY .. INTO ..
      • [x] CSV
      • [ ] JSON
      • [ ] Parquet
      • [ ] Avro

Runtime

  • [x] Streaming Grouping
  • [x] Streaming Window Evaluation
  • [x] Memory limits enforced
  • [x] Spilling (to disk) Sort
  • [ ] Spilling (to disk) Grouping
  • [ ] Spilling (to disk) Joins

Data Sources

In addition to allowing arbitrary data sources via the TableProvider trait, DataFusion includes built-in support for the following formats:

  • [x] CSV
  • [x] Parquet (for all primitive and nested types)
  • [x] JSON
  • [x] Avro
  • [x] Arrow
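
The idea of a pluggable data source can be sketched with a toy trait. Everything below (`ToyTableSource`, `InMemoryCsv`, `split_fields`) is hypothetical and deliberately simplified; it is not the signature of DataFusion's `TableProvider`, which is richer (for example, its scan is asynchronous and produces an execution plan rather than rows). The sketch only shows how a source can expose a schema and accept a column projection from the engine.

```rust
// Hypothetical, simplified stand-in for a pluggable table source.
// NOT DataFusion's `TableProvider` trait.
trait ToyTableSource {
    // Names of the columns this source exposes.
    fn column_names(&self) -> Vec<String>;
    // Return all rows, keeping only the columns at the `projection` indices
    // (every column when `projection` is `None`).
    fn scan(&self, projection: Option<&[usize]>) -> Vec<Vec<String>>;
}

// A tiny comma-separated source held in memory (no quoting or escaping).
struct InMemoryCsv {
    header: Vec<String>,
    rows: Vec<Vec<String>>,
}

fn split_fields(line: &str) -> Vec<String> {
    line.split(',').map(|f| f.trim().to_string()).collect()
}

impl InMemoryCsv {
    // The first non-empty line is the header; the rest are data rows.
    fn parse(text: &str) -> Self {
        let mut lines = text.lines().filter(|l| !l.trim().is_empty());
        let header = lines.next().map(split_fields).unwrap_or_default();
        let rows = lines.map(split_fields).collect();
        InMemoryCsv { header, rows }
    }
}

impl ToyTableSource for InMemoryCsv {
    fn column_names(&self) -> Vec<String> {
        self.header.clone()
    }

    fn scan(&self, projection: Option<&[usize]>) -> Vec<Vec<String>> {
        match projection {
            None => self.rows.clone(),
            // Only the requested columns are materialized; out-of-range
            // indices are skipped.
            Some(indices) => self
                .rows
                .iter()
                .map(|row| indices.iter().filter_map(|&i| row.get(i).cloned()).collect())
                .collect(),
        }
    }
}
```

Passing the projection into `scan` lets the source skip columns the query never reads, which is the same idea behind the projection pushdown item in the General list.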