<!DOCTYPE html>
<html lang="en" data-content_root="../">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Introduction &#8212; Apache DataFusion documentation</title>
<link href="../_static/styles/theme.css?digest=1999514e3f237ded88cf" rel="stylesheet">
<link href="../_static/styles/pydata-sphinx-theme.css?digest=1999514e3f237ded88cf" rel="stylesheet">
<link rel="stylesheet"
href="../_static/vendor/fontawesome/5.13.0/css/all.min.css">
<link rel="preload" as="font" type="font/woff2" crossorigin
href="../_static/vendor/fontawesome/5.13.0/webfonts/fa-solid-900.woff2">
<link rel="preload" as="font" type="font/woff2" crossorigin
href="../_static/vendor/fontawesome/5.13.0/webfonts/fa-brands-400.woff2">
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=8f2a1f02" />
<link rel="stylesheet" type="text/css" href="../_static/styles/pydata-sphinx-theme.css?v=1140d252" />
<link rel="stylesheet" type="text/css" href="../_static/theme_overrides.css?v=c6d785ac" />
<link rel="preload" as="script" href="../_static/scripts/pydata-sphinx-theme.js?digest=1999514e3f237ded88cf">
<script src="../_static/documentation_options.js?v=8a448e45"></script>
<script src="../_static/doctools.js?v=9bcbadda"></script>
<script src="../_static/sphinx_highlight.js?v=dc90522c"></script>
<script async="true" defer="true" src="https://buttons.github.io/buttons.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Example Usage" href="example-usage.html" />
<link rel="prev" title="Download" href="../download.html" />
<meta name="docsearch:language" content="en">
<!-- Google Analytics -->
</head>
<body data-spy="scroll" data-target="#bd-toc-nav" data-offset="80">
<div class="container-fluid" id="banner"></div>
<div class="container-xl">
<div class="row">
<!-- Only show if we have sidebars configured, else just a small margin -->
<div class="col-12 col-md-3 bd-sidebar">
<div class="sidebar-start-items">
<form class="bd-search d-flex align-items-center" action="../search.html" method="get">
<i class="icon fas fa-search"></i>
<input type="search" class="form-control" name="q" id="search-input" placeholder="Search the docs ..." aria-label="Search the docs ..." autocomplete="off" >
</form>
<nav class="bd-links" id="bd-docs-nav" aria-label="Main navigation">
<div class="bd-toc-item active">
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
ASF Links
</span>
</p>
<ul class="nav bd-sidenav">
<li class="toctree-l1">
<a class="reference external" href="https://apache.org">
Apache Software Foundation
</a>
</li>
<li class="toctree-l1">
<a class="reference external" href="https://www.apache.org/licenses/">
License
</a>
</li>
<li class="toctree-l1">
<a class="reference external" href="https://www.apache.org/foundation/sponsorship.html">
Donate
</a>
</li>
<li class="toctree-l1">
<a class="reference external" href="https://www.apache.org/foundation/thanks.html">
Thanks
</a>
</li>
<li class="toctree-l1">
<a class="reference external" href="https://www.apache.org/security/">
Security
</a>
</li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
Links
</span>
</p>
<ul class="nav bd-sidenav">
<li class="toctree-l1">
<a class="reference external" href="https://github.com/apache/datafusion">
GitHub and Issue Tracker
</a>
</li>
<li class="toctree-l1">
<a class="reference external" href="https://crates.io/crates/datafusion">
crates.io
</a>
</li>
<li class="toctree-l1">
<a class="reference external" href="https://docs.rs/datafusion/latest/datafusion/">
API Docs
</a>
</li>
<li class="toctree-l1">
<a class="reference external" href="https://datafusion.apache.org/blog/">
Blog
</a>
</li>
<li class="toctree-l1">
<a class="reference external" href="https://github.com/apache/datafusion/blob/main/CODE_OF_CONDUCT.md">
Code of conduct
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../download.html">
Download
</a>
</li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
User Guide
</span>
</p>
<ul class="current nav bd-sidenav">
<li class="toctree-l1 current active">
<a class="current reference internal" href="#">
Introduction
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="example-usage.html">
Example Usage
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="concepts-readings-events.html">
Concepts, Readings, Events
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="crate-configuration.html">
Crate Configuration
</a>
</li>
<li class="toctree-l1 has-children">
<a class="reference internal" href="cli/index.html">
DataFusion CLI
</a>
<input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" type="checkbox"/>
<label for="toctree-checkbox-1">
<i class="fas fa-chevron-down">
</i>
</label>
<ul>
<li class="toctree-l2">
<a class="reference internal" href="cli/overview.html">
Overview
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="cli/installation.html">
Installation
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="cli/usage.html">
Usage
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="cli/datasources.html">
Local Files / Directories
</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<a class="reference internal" href="dataframe.html">
DataFrame API
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="expressions.html">
Expression API
</a>
</li>
<li class="toctree-l1 has-children">
<a class="reference internal" href="sql/index.html">
SQL Reference
</a>
<input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" type="checkbox"/>
<label for="toctree-checkbox-2">
<i class="fas fa-chevron-down">
</i>
</label>
<ul>
<li class="toctree-l2">
<a class="reference internal" href="sql/data_types.html">
Data Types
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="sql/select.html">
SELECT syntax
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="sql/subqueries.html">
Subqueries
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="sql/ddl.html">
DDL
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="sql/dml.html">
DML
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="sql/explain.html">
EXPLAIN
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="sql/information_schema.html">
Information Schema
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="sql/operators.html">
Operators
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="sql/aggregate_functions.html">
Aggregate Functions
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="sql/window_functions.html">
Window Functions
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="sql/scalar_functions.html">
Scalar Functions
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="sql/special_functions.html">
Special Functions
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="sql/sql_status.html">
Status
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="sql/write_options.html">
Write Options
</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<a class="reference internal" href="configs.html">
Configuration Settings
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="explain-usage.html">
Reading Explain Plans
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="faq.html">
Frequently Asked Questions
</a>
</li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
Library User Guide
</span>
</p>
<ul class="nav bd-sidenav">
<li class="toctree-l1">
<a class="reference internal" href="../library-user-guide/index.html">
Introduction
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../library-user-guide/extensions.html">
Extensions List
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../library-user-guide/using-the-sql-api.html">
Using the SQL API
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../library-user-guide/working-with-exprs.html">
Working with
<code class="docutils literal notranslate">
<span class="pre">
Expr
</span>
</code>
s
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../library-user-guide/using-the-dataframe-api.html">
Using the DataFrame API
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../library-user-guide/building-logical-plans.html">
Building Logical Plans
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../library-user-guide/catalogs.html">
Catalogs, Schemas, and Tables
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../library-user-guide/adding-udfs.html">
Adding User Defined Functions: Scalar/Window/Aggregate/Table Functions
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../library-user-guide/custom-table-providers.html">
Custom Table Provider
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../library-user-guide/extending-operators.html">
Extending DataFusion’s operators: custom LogicalPlan and Execution Plans
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../library-user-guide/profiling.html">
Profiling Cookbook
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../library-user-guide/query-optimizer.html">
DataFusion Query Optimizer
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../library-user-guide/api-health.html">
API health policy
</a>
</li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
Contributor Guide
</span>
</p>
<ul class="nav bd-sidenav">
<li class="toctree-l1">
<a class="reference internal" href="../contributor-guide/index.html">
Introduction
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../contributor-guide/communication.html">
Communication
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../contributor-guide/getting_started.html">
Getting Started
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../contributor-guide/architecture.html">
Architecture
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../contributor-guide/testing.html">
Testing
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../contributor-guide/howtos.html">
HOWTOs
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../contributor-guide/roadmap.html">
Roadmap
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../contributor-guide/governance.html">
Governance
</a>
</li>
<li class="toctree-l1">
<a class="reference internal" href="../contributor-guide/inviting.html">
Inviting New Committers and PMC Members
</a>
</li>
<li class="toctree-l1 has-children">
<a class="reference internal" href="../contributor-guide/specification/index.html">
Specifications
</a>
<input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" type="checkbox"/>
<label for="toctree-checkbox-3">
<i class="fas fa-chevron-down">
</i>
</label>
<ul>
<li class="toctree-l2">
<a class="reference internal" href="../contributor-guide/specification/invariants.html">
Invariants
</a>
</li>
<li class="toctree-l2">
<a class="reference internal" href="../contributor-guide/specification/output-field-name-semantic.html">
Output field name semantics
</a>
</li>
</ul>
</li>
</ul>
<p aria-level="2" class="caption" role="heading">
<span class="caption-text">
DataFusion Subprojects
</span>
</p>
<ul class="nav bd-sidenav">
<li class="toctree-l1">
<a class="reference external" href="https://arrow.apache.org/ballista/">
DataFusion Ballista
</a>
</li>
<li class="toctree-l1">
<a class="reference external" href="https://datafusion.apache.org/comet/">
DataFusion Comet
</a>
</li>
<li class="toctree-l1">
<a class="reference external" href="https://datafusion.apache.org/python/">
DataFusion Python
</a>
</li>
</ul>
</div>
<a class="navbar-brand" href="../index.html">
<img src="../_static/images/2x_bgwhite_original.png" class="logo" alt="logo">
</a>
</nav>
</div>
<div class="sidebar-end-items">
</div>
</div>
<div class="d-none d-xl-block col-xl-2 bd-toc">
<div class="toc-item">
<div class="tocsection onthispage pt-5 pb-3">
<i class="fas fa-list"></i> On this page
</div>
<nav id="bd-toc-nav">
<ul class="visible nav section-nav flex-column">
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#project-goals">
Project Goals
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#features">
Features
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#use-cases">
Use Cases
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#known-users">
Known Users
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#integrations-and-extensions">
Integrations and Extensions
</a>
<ul class="nav section-nav flex-column">
<li class="toc-h3 nav-item toc-entry">
<a class="reference internal nav-link" href="#language-bindings">
Language Bindings
</a>
</li>
<li class="toc-h3 nav-item toc-entry">
<a class="reference internal nav-link" href="#integrations">
Integrations
</a>
</li>
</ul>
</li>
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#why-datafusion">
Why DataFusion?
</a>
</li>
</ul>
</nav>
</div>
</div>
<main class="col-12 col-md-9 col-xl-7 py-md-5 pl-md-5 pr-md-4 bd-content" role="main">
<div>
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<section id="introduction">
<h1>Introduction<a class="headerlink" href="#introduction" title="Link to this heading"></a></h1>
<p>DataFusion is a very fast, extensible query engine for building
high-quality data-centric systems in <a class="reference external" href="https://www.rust-lang.org/">Rust</a>,
using the <a class="reference external" href="https://arrow.apache.org">Apache Arrow</a> in-memory format.
DataFusion originated as part of the <a class="reference external" href="https://arrow.apache.org/">Apache Arrow</a>
project.</p>
<p>DataFusion offers SQL and DataFrame APIs, excellent <a class="reference external" href="https://benchmark.clickhouse.com/">performance</a>, built-in support for CSV, Parquet, JSON, and Avro, <a class="reference external" href="https://github.com/apache/datafusion-python">Python bindings</a>, extensive customization, a great community, and more.</p>
<section id="project-goals">
<h2>Project Goals<a class="headerlink" href="#project-goals" title="Link to this heading"></a></h2>
<p>DataFusion aims to be the query engine of choice for new, fast
data-centric systems such as databases, dataframe libraries, and machine
learning and streaming applications by leveraging the unique features
of <a class="reference external" href="https://www.rust-lang.org/">Rust</a> and <a class="reference external" href="https://arrow.apache.org/">Apache
Arrow</a>.</p>
</section>
<section id="features">
<h2>Features<a class="headerlink" href="#features" title="Link to this heading"></a></h2>
<ul class="simple">
<li><p>Feature-rich <a class="reference external" href="https://datafusion.apache.org/user-guide/sql/index.html">SQL support</a> and <a class="reference external" href="https://datafusion.apache.org/user-guide/dataframe.html">DataFrame API</a></p></li>
<li><p>Blazingly fast, vectorized, multi-threaded, streaming execution engine.</p></li>
<li><p>Native support for Parquet, CSV, JSON, and Avro file formats. Support
for custom file formats and non-file data sources via the <code class="docutils literal notranslate"><span class="pre">TableProvider</span></code> trait.</p></li>
<li><p>Many extension points: user defined scalar/aggregate/window functions, DataSources, SQL,
other query languages, custom plan and execution nodes, optimizer passes, and more.</p></li>
<li><p>Streaming, asynchronous IO directly from popular object stores, including AWS S3,
Azure Blob Storage, and Google Cloud Storage (Other storage systems are supported via the
<code class="docutils literal notranslate"><span class="pre">ObjectStore</span></code> trait).</p></li>
<li><p><a class="reference external" href="https://docs.rs/datafusion/latest">Excellent Documentation</a> and a
<a class="reference external" href="https://datafusion.apache.org/contributor-guide/communication.html">welcoming community</a>.</p></li>
<li><p>A state-of-the-art query optimizer with expression coercion and
simplification, projection and filter pushdown, sort- and distribution-aware
optimizations, automatic join reordering, and more.</p></li>
<li><p>Permissive Apache 2.0 License, predictable and well understood
<a class="reference external" href="https://www.apache.org/">Apache Software Foundation</a> governance.</p></li>
<li><p>Implementation in <a class="reference external" href="https://www.rust-lang.org/">Rust</a>, a modern
systems language with development productivity similar to Java or
Golang and the performance of C++, and one that is <a class="reference external" href="https://insights.stackoverflow.com/survey/2021#technology-most-loved-dreaded-and-wanted">loved by programmers
everywhere</a>.</p></li>
<li><p>Support for <a class="reference external" href="https://substrait.io/">Substrait</a> query plans, to
easily pass plans across language and system boundaries.</p></li>
</ul>
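<p>The execution model named in the list above (columnar data processed a batch at a time, with work spread over threads) can be illustrated with a small standard-library sketch. This is not DataFusion's API: the <code>Batch</code> type and both functions are hypothetical names invented for this example, whereas DataFusion itself operates on Arrow record batches and has its own scheduling.</p>

```rust
use std::thread;

/// A toy column batch: one contiguous vector per column.
/// Hypothetical type for illustration only.
struct Batch {
    price: Vec<i64>,
    quantity: Vec<i64>,
}

/// Process one whole batch: keep rows with `quantity >= min_quantity`,
/// then sum `price * quantity` over the surviving rows.
fn revenue_for_batch(batch: &Batch, min_quantity: i64) -> i64 {
    batch
        .price
        .iter()
        .zip(batch.quantity.iter())
        .filter(|(_, q)| **q >= min_quantity)
        .map(|(p, q)| p * q)
        .sum()
}

/// Run each batch on its own scoped thread and add up the partial sums.
fn total_revenue(batches: &[Batch], min_quantity: i64) -> i64 {
    thread::scope(|s| {
        let handles: Vec<_> = batches
            .iter()
            .map(|b| s.spawn(move || revenue_for_batch(b, min_quantity)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let batches = vec![
        Batch { price: vec![10, 20, 30], quantity: vec![1, 5, 2] },
        Batch { price: vec![5, 7], quantity: vec![4, 0] },
    ];
    println!("total revenue: {}", total_revenue(&batches, 2));
}
```

<p>Keeping each column in its own contiguous vector lets the inner loop run over plain slices, and because batches are independent they can be handed to separate threads without locking.</p>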
</section>
<section id="use-cases">
<h2>Use Cases<a class="headerlink" href="#use-cases" title="Link to this heading"></a></h2>
<p>DataFusion can be used without modification as an embedded SQL
engine or can be customized and used as a foundation for
building new systems.</p>
<p>While most current use cases are “analytic” (throughput-oriented), some
components of DataFusion, such as the plan representations, are
suitable for “streaming” and “transaction” style systems (low
latency).</p>
<p>Here are some example systems built using DataFusion:</p>
<ul class="simple">
<li><p>Specialized analytical database systems such as <a class="reference external" href="https://github.com/apache/incubator-horaedb">HoraeDB</a> and more general Apache Spark-like systems such as <a class="reference external" href="https://github.com/apache/datafusion-ballista">Ballista</a>.</p></li>
<li><p>New query language engines such as <a class="reference external" href="https://github.com/prql/prql-query">prql-query</a> and accelerators such as <a class="reference external" href="https://vegafusion.io/" title="if you know of another project, please submit a PR to add a link!">VegaFusion</a></p></li>
<li><p>Research platform for new Database Systems, such as <a class="reference external" href="https://github.com/flock-lab/flock">Flock</a></p></li>
<li><p>SQL support to another library, such as <a class="reference external" href="https://github.com/dask-contrib/dask-sql">dask sql</a></p></li>
<li><p>Streaming data platforms such as <a class="reference external" href="https://synnada.ai/">Synnada</a></p></li>
<li><p>Tools for reading / sorting / transcoding Parquet, CSV, AVRO, and JSON files such as <a class="reference external" href="https://github.com/timvw/qv">qv</a></p></li>
<li><p>Native Spark runtime replacement such as <a class="reference external" href="https://github.com/blaze-init/blaze">Blaze</a></p></li>
</ul>
<p>By using DataFusion, projects are free to focus on their specific
features and avoid reimplementing general (but still necessary)
features such as an expression representation, standard optimizations,
parallelized streaming execution plans, file format support, etc.</p>
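<p>To give a feel for what “an expression representation” and “standard optimizations” involve, here is a minimal standard-library sketch of an expression tree with a constant-folding pass. It is an illustration only: <code>ToyExpr</code> and <code>simplify</code> are hypothetical names invented for this example and are far simpler than DataFusion's own expression type and optimizer.</p>

```rust
/// A tiny expression tree (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Column(String),
    Literal(i64),
    Add(Box<ToyExpr>, Box<ToyExpr>),
    Mul(Box<ToyExpr>, Box<ToyExpr>),
}

/// Simplify bottom-up: fold literal arithmetic and drop `+ 0` and `* 1`.
/// Integer overflow is ignored here to keep the sketch short.
fn simplify(expr: ToyExpr) -> ToyExpr {
    match expr {
        ToyExpr::Add(l, r) => match (simplify(*l), simplify(*r)) {
            (ToyExpr::Literal(a), ToyExpr::Literal(b)) => ToyExpr::Literal(a + b),
            (ToyExpr::Literal(0), e) | (e, ToyExpr::Literal(0)) => e,
            (l, r) => ToyExpr::Add(Box::new(l), Box::new(r)),
        },
        ToyExpr::Mul(l, r) => match (simplify(*l), simplify(*r)) {
            (ToyExpr::Literal(a), ToyExpr::Literal(b)) => ToyExpr::Literal(a * b),
            (ToyExpr::Literal(1), e) | (e, ToyExpr::Literal(1)) => e,
            (l, r) => ToyExpr::Mul(Box::new(l), Box::new(r)),
        },
        // Columns and literals are already as simple as they get.
        other => other,
    }
}

fn main() {
    // (price + 0) * (2 * 3)
    let expr = ToyExpr::Mul(
        Box::new(ToyExpr::Add(
            Box::new(ToyExpr::Column("price".to_string())),
            Box::new(ToyExpr::Literal(0)),
        )),
        Box::new(ToyExpr::Mul(
            Box::new(ToyExpr::Literal(2)),
            Box::new(ToyExpr::Literal(3)),
        )),
    );
    println!("{:?}", simplify(expr));
}
```

<p>Even this toy version needs a tree type, a traversal, and rewrite rules, and a real engine adds type coercion, null handling, and many more rules. That is the kind of general-purpose machinery a project gets to reuse rather than rebuild.</p>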
</section>
<section id="known-users">
<h2>Known Users<a class="headerlink" href="#known-users" title="Link to this heading"></a></h2>
<p>Here are some active projects using DataFusion:</p>
<!-- "Active" means github repositories that had at least one commit in the last 6 months -->
<ul class="simple">
<li><p><a class="reference external" href="https://github.com/ArroyoSystems/arroyo">Arroyo</a> Distributed stream processing engine in Rust</p></li>
<li><p><a class="reference external" href="https://github.com/apache/datafusion-ballista">Ballista</a> Distributed SQL Query Engine</p></li>
<li><p><a class="reference external" href="https://github.com/kwai/blaze">Blaze</a> The Blaze accelerator for Apache Spark leverages native vectorized execution to accelerate query processing</p></li>
<li><p><a class="reference external" href="https://github.com/cnosdb/cnosdb">CnosDB</a> Open Source Distributed Time Series Database</p></li>
<li><p><a class="reference external" href="https://github.com/apache/datafusion-comet">Comet</a> Apache Spark native query execution plugin</p></li>
<li><p><a class="reference external" href="https://github.com/cube-js/cube.js/tree/master/rust">Cube Store</a></p></li>
<li><p><a class="reference external" href="https://github.com/dask-contrib/dask-sql">Dask SQL</a> Distributed SQL query engine in Python</p></li>
<li><p><a class="reference external" href="https://github.com/delta-io/delta-rs">delta-rs</a> Native Rust implementation of Delta Lake</p></li>
<li><p><a class="reference external" href="https://github.com/wheretrue/exon">Exon</a> Analysis toolkit for life-science applications</p></li>
<li><p><a class="reference external" href="https://funnel.io/">Funnel</a> Data Platform powering Marketing Intelligence applications.</p></li>
<li><p><a class="reference external" href="https://github.com/GlareDB/glaredb">GlareDB</a> Fast SQL database for querying and analyzing distributed data.</p></li>
<li><p><a class="reference external" href="https://github.com/GreptimeTeam/greptimedb">GreptimeDB</a> Open Source &amp; Cloud Native Distributed Time Series Database</p></li>
<li><p><a class="reference external" href="https://github.com/apache/incubator-horaedb">HoraeDB</a> Distributed Time-Series Database</p></li>
<li><p><a class="reference external" href="https://github.com/influxdata/influxdb">InfluxDB</a> Time Series Database</p></li>
<li><p><a class="reference external" href="https://github.com/kamu-data/kamu-cli/">Kamu</a> Planet-scale streaming data pipeline</p></li>
<li><p><a class="reference external" href="https://github.com/lakesoul-io/LakeSoul">LakeSoul</a> Open source LakeHouse framework with native IO in Rust.</p></li>
<li><p><a class="reference external" href="https://github.com/lancedb/lance">Lance</a> Modern columnar data format for ML</p></li>
<li><p><a class="reference external" href="https://github.com/openobserve/openobserve">OpenObserve</a> Distributed cloud native observability platform</p></li>
<li><p><a class="reference external" href="https://github.com/paradedb/paradedb">ParadeDB</a> PostgreSQL for Search &amp; Analytics</p></li>
<li><p><a class="reference external" href="https://github.com/parseablehq/parseable">Parseable</a> Log storage and observability platform</p></li>
<li><p><a class="reference external" href="https://github.com/timvw/qv">qv</a> Quickly view your data</p></li>
<li><p><a class="reference external" href="https://github.com/restatedev">Restate</a> Easily build resilient applications using distributed durable async/await</p></li>
<li><p><a class="reference external" href="https://github.com/roapi/roapi">ROAPI</a></p></li>
<li><p><a class="reference external" href="https://github.com/lakehq/sail">Sail</a> Unifying stream, batch, and AI workloads with Apache Spark compatibility</p></li>
<li><p><a class="reference external" href="https://github.com/splitgraph/seafowl">Seafowl</a> CDN-friendly analytical database</p></li>
<li><p><a class="reference external" href="https://github.com/spiceai/spiceai">Spice.ai</a> Unified SQL query interface &amp; materialization engine</p></li>
<li><p><a class="reference external" href="https://synnada.ai/">Synnada</a> Streaming-first framework for data products</p></li>
<li><p><a class="reference external" href="https://vegafusion.io/">VegaFusion</a> Server-side acceleration for the <a class="reference external" href="https://vega.github.io/">Vega</a> visualization grammar</p></li>
<li><p><a class="reference external" href="https://telemetry.sh/">Telemetry</a> Structured logging made easy</p></li>
</ul>
<p>Here are some less active projects that used DataFusion:</p>
<ul class="simple">
<li><p><a class="reference external" href="https://github.com/datafusion-contrib/bdt">bdt</a> Boring Data Tool</p></li>
<li><p><a class="reference external" href="https://github.com/cloudfuse-io/buzz-rust">Cloudfuse Buzz</a></p></li>
<li><p><a class="reference external" href="https://github.com/datafusion-contrib/datafusion-tui">datafusion-tui</a> Text UI for DataFusion</p></li>
<li><p><a class="reference external" href="https://github.com/flock-lab/flock">Flock</a></p></li>
<li><p><a class="reference external" href="https://github.com/tensorbase/tensorbase">Tensorbase</a></p></li>
</ul>
</section>
<section id="integrations-and-extensions">
<h2>Integrations and Extensions<a class="headerlink" href="#integrations-and-extensions" title="Link to this heading"></a></h2>
<p>There are a number of community projects that extend DataFusion or
provide integrations with other systems, some of which are described below:</p>
<section id="language-bindings">
<h3>Language Bindings<a class="headerlink" href="#language-bindings" title="Link to this heading"></a></h3>
<ul class="simple">
<li><p><a class="reference external" href="https://github.com/datafusion-contrib/datafusion-c">datafusion-c</a></p></li>
<li><p><a class="reference external" href="https://github.com/apache/datafusion-python">datafusion-python</a></p></li>
<li><p><a class="reference external" href="https://github.com/datafusion-contrib/datafusion-ruby">datafusion-ruby</a></p></li>
<li><p><a class="reference external" href="https://github.com/datafusion-contrib/datafusion-java">datafusion-java</a></p></li>
</ul>
</section>
<section id="integrations">
<h3>Integrations<a class="headerlink" href="#integrations" title="Link to this heading"></a></h3>
<ul class="simple">
<li><p><a class="reference external" href="https://github.com/datafusion-contrib/datafusion-bigtable">datafusion-bigtable</a></p></li>
<li><p><a class="reference external" href="https://github.com/datafusion-contrib/datafusion-catalogprovider-glue">datafusion-catalogprovider-glue</a></p></li>
<li><p><a class="reference external" href="https://github.com/datafusion-contrib/datafusion-federation">datafusion-federation</a></p></li>
</ul>
</section>
</section>
<section id="why-datafusion">
<h2>Why DataFusion?<a class="headerlink" href="#why-datafusion" title="Link to this heading"></a></h2>
<ul class="simple">
<li><p><em>High Performance</em>: Leveraging Rust and Arrow’s memory model, DataFusion is very fast.</p></li>
<li><p><em>Easy to Connect</em>: Being part of the Apache Arrow ecosystem (Arrow, Parquet and Flight), DataFusion works well with the rest of the big data ecosystem.</p></li>
<li><p><em>Easy to Embed</em>: Allowing extension at almost any point in its design, and published regularly as a crate on <a class="reference external" href="http://crates.io">crates.io</a>, DataFusion can be integrated and tailored for your specific use case.</p></li>
<li><p><em>High Quality</em>: Extensively tested, both by itself and with the rest of the Arrow ecosystem, DataFusion can be, and is, used as the foundation for production systems.</p></li>
</ul>
</section>
</section>
</div>
<!-- Previous / next buttons -->
<div class='prev-next-area'>
<a class='left-prev' id="prev-link" href="../download.html" title="previous page">
<i class="fas fa-angle-left"></i>
<div class="prev-next-info">
<p class="prev-next-subtitle">previous</p>
<p class="prev-next-title">Download</p>
</div>
</a>
<a class='right-next' id="next-link" href="example-usage.html" title="next page">
<div class="prev-next-info">
<p class="prev-next-subtitle">next</p>
<p class="prev-next-title">Example Usage</p>
</div>
<i class="fas fa-angle-right"></i>
</a>
</div>
</main>
</div>
</div>
<script src="../_static/scripts/pydata-sphinx-theme.js?digest=1999514e3f237ded88cf"></script>
<!-- Based on pydata_sphinx_theme/footer.html -->
<footer class="footer mt-5 mt-md-0">
<div class="container">
<div class="footer-item">
<p class="copyright">
&copy; Copyright 2019-2024, Apache Software Foundation.<br>
</p>
</div>
<div class="footer-item">
<p class="sphinx-version">
Created using <a href="http://sphinx-doc.org/">Sphinx</a> 8.1.3.<br>
</p>
</div>
<div class="footer-item">
<p>Apache DataFusion, Apache, the Apache feather logo, and the Apache DataFusion project logo</p>
<p>are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p>
</div>
</div>
</footer>
</body>
</html>