DataFusion is an in-memory query engine that uses Apache Arrow as the memory model. It supports executing SQL queries against CSV and Parquet files as well as querying directly against in-memory data.
DataFusion can be used as a library by adding the following to your Cargo.toml file:
[dependencies]
datafusion = "1.0.0"
DataFusion includes a simple command-line interactive SQL utility. See the CLI reference for more information.
This library currently supports the following SQL constructs:
- CREATE EXTERNAL TABLE X STORED AS PARQUET LOCATION '...'; to register a table's location
- SELECT ... FROM ... together with any expression
- ALIAS to name an expression
- CAST to change types, including e.g. Timestamp(Nanosecond, None)
- Mathematical and comparison expressions such as +, /, sqrt, tan, >=
- WHERE to filter
- GROUP BY together with one of the following aggregations: MIN, MAX, COUNT, SUM, AVG
- ORDER BY together with an expression and optional DESC
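As a rough illustration of how the constructs listed above might combine, the sketch below collects a few SQL statements as Rust string constants. The table name `trips`, its columns, the file path, and the helper `example_statements` are all hypothetical, and the exact syntax accepted (for instance `AS` for aliases or `DOUBLE` as a cast target) is an assumption rather than something confirmed by this document. The statements are meant to be passed to whatever SQL entry point your DataFusion version exposes, which is deliberately not shown here.

```rust
// Illustrative SQL only: the table `trips`, its columns, and the file path
// are hypothetical and are not taken from the DataFusion documentation.

/// Registers a Parquet file as a queryable table.
pub const CREATE_TABLE: &str =
    "CREATE EXTERNAL TABLE trips STORED AS PARQUET LOCATION '/path/to/trips.parquet';";

/// Projection with CAST, an alias, a math function, a filter, and a descending sort.
pub const FILTER_AND_SORT: &str = "SELECT fare, \
     CAST(distance AS DOUBLE) AS distance_f64, \
     sqrt(distance) AS distance_root \
     FROM trips \
     WHERE fare >= 10 \
     ORDER BY fare DESC";

/// Grouping with the aggregations named in the list above.
pub const AGGREGATE: &str = "SELECT passenger_count, MIN(fare), MAX(fare), AVG(fare) \
     FROM trips \
     GROUP BY passenger_count";

/// Returns the statements in the order they would be submitted:
/// the table must be registered before it can be queried.
pub fn example_statements() -> Vec<&'static str> {
    vec![CREATE_TABLE, FILTER_AND_SORT, AGGREGATE]
}
```

Keeping the statements as plain strings keeps the sketch independent of any particular DataFusion API version; only the SQL text itself is being illustrated.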