
DataFusion

DataFusion is an in-memory query engine that uses Apache Arrow as the memory model. It supports executing SQL queries against CSV and Parquet files as well as querying directly against in-memory data.

Using DataFusion as a library

DataFusion can be used as a library by adding the following to your Cargo.toml file.

[dependencies]
datafusion = "0.17.1"

Using DataFusion as a binary

DataFusion includes a simple command-line interactive SQL utility. See the CLI reference for more information.

Status

General

  • [x] SQL Parser
  • [x] SQL Query Planner
  • [x] Query Optimizer
  • [x] Projection push down
  • [ ] Predicate push down
  • [x] Type coercion
  • [x] Parallel query execution

SQL Support

  • [x] Projection
  • [x] Selection
  • [x] Limit
  • [x] Aggregate
  • [x] UDFs
  • [x] Common math functions
  • [ ] Common string functions
  • [ ] Common date/time functions
  • [ ] Sorting
  • [ ] Nested types
  • [ ] Lists
  • [ ] Subqueries
  • [ ] Joins

Data Sources

  • [x] CSV
  • [x] Parquet primitive types
  • [ ] Parquet nested types

Examples