Features

General

  • [x] SQL Parser
  • [x] SQL Query Planner
  • [x] DataFrame API
  • [x] Parallel query execution
  • [x] Streaming Execution

Optimizations

  • [x] Query Optimizer
  • [x] Constant folding
  • [x] Join Reordering
  • [x] Limit Pushdown
  • [x] Projection Pushdown
  • [x] Predicate Pushdown
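Constant folding evaluates sub-expressions whose inputs are all literals once, at planning time, instead of once per row. The following is a minimal sketch in Rust. It assumes a hypothetical toy `Expr` enum with integer literals, column references, `+` and `*`. This is not DataFusion's own expression type or simplifier:

```rust
/// Toy expression tree (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Lit(i64),
    Col(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Replace `l op r` with a literal when both sides are literals and the
/// checked operation succeeds; otherwise rebuild the node unchanged.
fn fold_binary(
    l: Expr,
    r: Expr,
    op: fn(i64, i64) -> Option<i64>,
    rebuild: fn(Box<Expr>, Box<Expr>) -> Expr,
) -> Expr {
    if let (Expr::Lit(a), Expr::Lit(b)) = (&l, &r) {
        if let Some(v) = op(*a, *b) {
            return Expr::Lit(v);
        }
    }
    rebuild(Box::new(l), Box::new(r))
}

/// Fold constants bottom-up so nested literal sub-trees collapse first.
pub fn fold(expr: &Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => fold_binary(fold(l), fold(r), i64::checked_add, Expr::Add),
        Expr::Mul(l, r) => fold_binary(fold(l), fold(r), i64::checked_mul, Expr::Mul),
        // Literals and column references cannot be simplified further.
        other => other.clone(),
    }
}
```

For example, `(1 + 2) * x` folds to `3 * x`. Folding uses checked arithmetic, so an expression that would overflow is left as written rather than silently wrapped.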

SQL Support

  • [x] Type coercion
  • [x] Projection (SELECT)
  • [x] Filter (WHERE)
  • [x] Filter post-aggregate (HAVING)
  • [x] Sorting (ORDER BY)
  • [x] Limit (LIMIT)
  • [x] Aggregate (GROUP BY)
  • [x] cast / try_cast
  • [x] VALUES lists
  • [x] String Functions
  • [x] Conditional Functions
  • [x] Time and Date Functions
  • [x] Math Functions
  • [x] Aggregate Functions (SUM, MEDIAN, and many more)
  • [x] Schema Queries
  • [x] Support for nested types (ARRAY/LIST and STRUCT)
  • [x] Subqueries
  • [x] Common Table Expressions (CTE)
  • [x] Set Operations (UNION [ALL], INTERSECT [ALL], EXCEPT [ALL])
  • [x] Joins (INNER, LEFT, RIGHT, FULL, CROSS)
  • [x] Window Functions
    • [x] Empty (OVER())
    • [x] Partitioning and ordering (OVER(PARTITION BY <..> ORDER BY <..>))
    • [x] Custom window frames (OVER(ORDER BY time ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING))
    • [x] User Defined Window and Aggregate Functions
  • [x] Catalogs
    • [x] Schemas (CREATE / DROP SCHEMA)
    • [x] Tables (CREATE / DROP TABLE, CREATE TABLE AS SELECT)
  • [x] Data Insert
    • [x] INSERT INTO
    • [x] COPY .. TO ..
    • [x] CSV
    • [x] JSON
    • [x] Parquet
    • [ ] Avro
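A frame such as `ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING` covers the current row and up to two rows before it, clamped at the partition boundaries. The naive sketch below computes `SUM` over such a frame for a single partition. It assumes the values are already sorted by the `ORDER BY` key. The function name `rows_frame_sum` is hypothetical, and this is not DataFusion's window implementation:

```rust
/// Naive `SUM(v) OVER (ORDER BY ... ROWS BETWEEN p PRECEDING AND f FOLLOWING)`
/// for one partition. Assumes `values` is already sorted by the ORDER BY key.
/// (Hypothetical helper for illustration; not a DataFusion API.)
pub fn rows_frame_sum(values: &[i64], preceding: usize, following: usize) -> Vec<i64> {
    let n = values.len();
    (0..n)
        .map(|i| {
            // Clamp the frame to the first and last row of the partition.
            let start = i.saturating_sub(preceding);
            let end = i.saturating_add(following).min(n - 1);
            let frame_sum: i64 = values[start..=end].iter().sum();
            frame_sum
        })
        .collect()
}
```

With values `[1, 2, 3, 4, 5]` and a frame of 2 preceding and 0 following rows, the result is `[1, 3, 6, 9, 12]`. The sketch recomputes every frame from scratch, which costs O(n × frame width). An engine would more likely maintain a sliding accumulator.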

Runtime

  • [x] Streaming Grouping
  • [x] Streaming Window Evaluation
  • [x] Memory limits enforced
  • [x] Spilling (to disk) Sort
  • [x] Spilling (to disk) Grouping
  • [x] Spilling (to disk) Sort Merge Join
  • [ ] Spilling (to disk) Hash Join
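Spilling sort follows the classic external merge sort pattern. Rows are buffered up to a memory budget. Each full buffer is sorted and written out as a run, and all runs are then merged. The sketch below makes two simplifying assumptions. First, the budget is counted in values rather than bytes. Second, the "spilled" runs are kept in memory as `Vec`s, standing in for temporary files. It therefore shows the control flow only. `external_sort` is a hypothetical name, not a DataFusion API:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// External merge sort sketch: at most `budget` values sit in the sort buffer
/// at once. Runs are kept in memory here as a stand-in for spill files.
pub fn external_sort(input: impl IntoIterator<Item = i64>, budget: usize) -> Vec<i64> {
    assert!(budget > 0, "memory budget must be positive");
    let mut runs: Vec<Vec<i64>> = Vec::new();
    let mut buffer: Vec<i64> = Vec::with_capacity(budget);
    for v in input {
        buffer.push(v);
        if buffer.len() == budget {
            buffer.sort_unstable();
            // A real engine would write this sorted run to a temporary file.
            runs.push(std::mem::take(&mut buffer));
        }
    }
    if !buffer.is_empty() {
        buffer.sort_unstable();
        runs.push(buffer);
    }
    // k-way merge: the min-heap holds the smallest unread value of each run.
    let mut heap = BinaryHeap::new();
    for (run_idx, run) in runs.iter().enumerate() {
        if let Some(&first) = run.first() {
            heap.push(Reverse((first, run_idx, 0usize)));
        }
    }
    let mut out = Vec::with_capacity(runs.iter().map(Vec::len).sum());
    while let Some(Reverse((value, run_idx, pos))) = heap.pop() {
        out.push(value);
        if let Some(&next) = runs[run_idx].get(pos + 1) {
            heap.push(Reverse((next, run_idx, pos + 1)));
        }
    }
    out
}
```

The merge phase only needs to look at one value per run at a time. This is what lets a disk-backed version stream runs back from storage instead of reloading them whole.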

Data Sources

In addition to allowing arbitrary data sources via the TableProvider trait, DataFusion includes built-in support for the following formats:

  • [x] CSV
  • [x] Parquet
    • [x] Primitive and Nested Types
    • [x] Row Group and Data Page pruning on min/max statistics
    • [x] Row Group pruning on Bloom Filters
    • [x] Predicate pushdown (late materialization), not enabled by default
  • [x] JSON
  • [x] Avro
  • [x] Arrow
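Min/max pruning works because Parquet records per-column statistics for each row group. If a predicate cannot be true for any value between a row group's minimum and maximum, that row group can be skipped without being read. The following simplified sketch uses hypothetical `ColumnStats` and `Predicate` types for a single non-null integer column:

```rust
/// Hypothetical per-row-group min/max statistics for one integer column.
#[derive(Debug, Clone, Copy)]
pub struct ColumnStats {
    pub min: i64,
    pub max: i64,
}

/// Hypothetical simple predicate on that column (col = v, col > v, col < v).
#[derive(Debug, Clone, Copy)]
pub enum Predicate {
    Eq(i64),
    Gt(i64),
    Lt(i64),
}

/// True if the row group *might* contain matching rows. Returning false is
/// only safe when the statistics prove that no value can match.
pub fn may_match(stats: ColumnStats, pred: Predicate) -> bool {
    match pred {
        Predicate::Eq(v) => stats.min <= v && v <= stats.max,
        Predicate::Gt(v) => stats.max > v,
        Predicate::Lt(v) => stats.min < v,
    }
}

/// Indices of the row groups that still have to be read.
pub fn prune(row_groups: &[ColumnStats], pred: Predicate) -> Vec<usize> {
    row_groups
        .iter()
        .enumerate()
        .filter(|(_, s)| may_match(**s, pred))
        .map(|(i, _)| i)
        .collect()
}
```

A real reader must also keep any row group whose statistics are missing, and it must account for nulls. The sketch ignores both cases.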