commit | 860918d17b6bde396b04d718ee1c76d93054bf11 | |
---|---|---|
author | Andy Grove <andygrove73@gmail.com> | Fri Mar 10 16:05:51 2023 -0700 |
committer | GitHub <noreply@github.com> | Fri Mar 10 18:05:51 2023 -0500 |
tree | 293d80d6aff66353d45ae7612281345bd126f9bb | |
parent | a76800408263454735d6613758ae20de8d25863a |
Manual changelog for 20.0.0 (#5551)
* manual changelog
* add bullets
* add link
* fix link
DataFusion is a very fast, extensible query engine for building high-quality data-centric systems in Rust, using the Apache Arrow in-memory format.
DataFusion offers SQL and DataFrame APIs, excellent performance, built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community.
Custom data sources can be plugged in through the TableProvider trait, and files are read from storage systems through the ObjectStore trait.
DataFusion can be used without modification as an embedded SQL engine or can be customized and used as a foundation for building new systems. Here are some examples of systems built using DataFusion:
By using DataFusion, these projects are free to focus on their specific features and avoid reimplementing general (but still necessary) features such as an expression representation, standard optimizations, execution plans, and file format support.
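To make the idea of an extension point concrete, here is a minimal sketch in plain Rust of an engine that only talks to data sources through a trait. Every name in it (`ToyTableSource`, `MemorySource`, `ToyEngine`, `demo_engine`) is hypothetical and chosen for illustration; it is not DataFusion's API. The real TableProvider trait is asynchronous and works with Arrow data, so treat this only as an outline of the pattern.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a pluggable data source (NOT DataFusion's API).
trait ToyTableSource {
    /// Column names exposed by this source.
    fn schema(&self) -> Vec<String>;
    /// Return every row, keeping only the requested column indices, in order.
    fn scan(&self, projection: &[usize]) -> Vec<Vec<i64>>;
}

/// A simple in-memory source; rows are assumed to match the column count.
struct MemorySource {
    columns: Vec<String>,
    rows: Vec<Vec<i64>>,
}

impl ToyTableSource for MemorySource {
    fn schema(&self) -> Vec<String> {
        self.columns.clone()
    }

    fn scan(&self, projection: &[usize]) -> Vec<Vec<i64>> {
        self.rows
            .iter()
            .map(|row| projection.iter().map(|&i| row[i]).collect())
            .collect()
    }
}

/// Hypothetical engine that knows only the trait, never the concrete source.
struct ToyEngine {
    tables: HashMap<String, Box<dyn ToyTableSource>>,
}

impl ToyEngine {
    fn new() -> Self {
        Self { tables: HashMap::new() }
    }

    /// Register any source implementing the trait under a table name.
    fn register_table(&mut self, name: &str, source: Box<dyn ToyTableSource>) {
        self.tables.insert(name.to_string(), source);
    }

    /// Select named columns from a table; None if the table or a column is unknown.
    fn select(&self, table: &str, columns: &[&str]) -> Option<Vec<Vec<i64>>> {
        let source = self.tables.get(table)?;
        let schema = source.schema();
        // Resolve each column name to its index; any miss yields None.
        let projection: Option<Vec<usize>> = columns
            .iter()
            .map(|c| schema.iter().position(|s| s.as_str() == *c))
            .collect();
        Some(source.scan(&projection?))
    }
}

/// Build an engine with one registered in-memory table named "t".
fn demo_engine() -> ToyEngine {
    let mut engine = ToyEngine::new();
    engine.register_table(
        "t",
        Box::new(MemorySource {
            columns: vec!["a".to_string(), "b".to_string()],
            rows: vec![vec![1, 10], vec![2, 20]],
        }),
    );
    engine
}
```

Because the engine depends only on the trait, a new kind of source can be added without touching the engine code, which is the property that lets a project reuse the shared machinery and supply only its own specifics.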
Here is a comparison with similar projects that may help you understand when DataFusion might be suitable or unsuitable for your needs:
DuckDB is an open source, in-process analytic database. Like DataFusion, it supports very fast execution, both from its custom file format and directly from Parquet files. Unlike DataFusion, it is written in C++ and is primarily used directly by users as a serverless database and query system rather than as a library for building such database systems.
Polars is one of the fastest DataFrame libraries at the time of writing. Like DataFusion, it is also written in Rust and uses the Apache Arrow memory model, but unlike DataFusion it provides neither SQL nor as many extension points.
Facebook Velox is an execution engine. Like DataFusion, Velox aims to provide a reusable foundation for building database-like systems. Unlike DataFusion, it is written in C++ and does not include a SQL frontend or planning/optimization framework.
Databend is a complete database system. Like DataFusion, it is also written in Rust and uses the Apache Arrow memory model, but unlike DataFusion it targets end users rather than developers of other database systems.
There are a number of community projects that extend DataFusion or provide integrations with other systems.
Here are some of the projects known to use DataFusion:
Please see the example usage in the user guide and the datafusion-examples crate for more information on how to use DataFusion.
Please see the Roadmap for information on where the project is headed.
There is no formal document describing DataFusion's architecture yet, but the following presentations offer a good overview of its different components and how they interact.
Please see the User Guide for more information about DataFusion.
Please see the Contributor Guide for information about contributing to DataFusion.