 <!---
   Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied.  See the License for the
   specific language governing permissions and limitations
   under the License.
 -->

 # Frequently Asked Questions

 ## What is the relationship between Apache Arrow, DataFusion, and Ballista?

 Apache Arrow is a library which provides a standardized memory representation for columnar data. It also provides
 "kernels" for performing common operations on this data.

 DataFusion is a library for executing queries in-process using the Apache Arrow memory
 model and computational kernels. It is designed to run within a single process, using threads
 for parallel query execution.
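
 To make the terms above concrete, here is a minimal, standard-library-only Rust sketch. It is a toy illustration, not the Arrow or DataFusion API: `ColumnBatch`, `multiply_kernel`, and `parallel_sum` are hypothetical names invented for this example. It shows data stored column by column, a "kernel" that operates on whole columns, and partitions of a column being processed by threads inside one process.

```rust
use std::thread;

// Toy stand-in for a columnar batch: each field lives in its own contiguous
// vector instead of one struct per row. Hypothetical type, not Arrow's.
struct ColumnBatch {
    price: Vec<f64>,
    quantity: Vec<i64>,
}

// A "kernel" in the loose sense used above: one operation applied to whole
// columns at once. Here: element-wise price * quantity.
fn multiply_kernel(price: &[f64], quantity: &[i64]) -> Vec<f64> {
    price
        .iter()
        .zip(quantity.iter())
        .map(|(p, q)| p * (*q as f64))
        .collect()
}

// Sum a column by splitting it into partitions and giving each partition to
// a thread in the same process, then combining the partial sums.
fn parallel_sum(values: &[f64], partitions: usize) -> f64 {
    let partitions = partitions.max(1);
    // Ceiling division; at least 1 so `chunks` never receives 0.
    let chunk_len = ((values.len() + partitions - 1) / partitions).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = values
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<f64>()))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum::<f64>()
    })
}

fn main() {
    let batch = ColumnBatch {
        price: vec![1.5, 2.0, 4.0, 0.5],
        quantity: vec![2, 3, 1, 8],
    };
    let revenue = multiply_kernel(&batch.price, &batch.quantity);
    let total = parallel_sum(&revenue, 2);
    println!("total revenue = {total}");
}
```

 In practice you would not write any of this yourself: Arrow supplies the array types and kernels, and DataFusion plans the query and schedules the partitions.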

 [Ballista](https://github.com/apache/datafusion-ballista) is a distributed compute platform built on DataFusion.

 ## How does DataFusion compare with `XYZ`?

 Compared to similar systems, DataFusion is typically:

 1. Targeted at developers, rather than end users or data scientists.
 2. Designed to be embedded, rather than a complete file-based SQL system.
 3. Governed by the [Apache Software Foundation](https://www.apache.org/) process, rather than a single company or individual.
 4. Implemented in `Rust`, rather than `C/C++`.

 Here is a comparison with similar projects that may help you understand
 when DataFusion might be suitable or unsuitable for your needs:

 - [DuckDB](https://www.duckdb.org) is an open-source, in-process analytic database.
   Like DataFusion, it supports very fast execution, both from its custom file format
   and directly from Parquet files. Unlike DataFusion, it is written in C/C++ and
   is primarily used directly by users as a serverless database and query system rather
   than as a library for building such database systems.

 - [Polars](http://pola.rs) is one of the fastest DataFrame
   libraries at the time of writing. Like DataFusion, it is
   written in Rust and uses the Apache Arrow memory model, but unlike
   DataFusion, it is not designed with as many extension points.

 - [Facebook Velox](https://github.com/facebookincubator/velox)
   is an execution engine. Like DataFusion, Velox aims to
   provide a reusable foundation for building database-like systems. Unlike DataFusion,
   it is written in C/C++ and does not include a SQL frontend or planning / optimization
   framework.

 - [Databend](https://github.com/datafuselabs/databend) is a complete
   database system. Like DataFusion, it is written in Rust and
   uses the Apache Arrow memory model, but unlike DataFusion, it
   targets end users rather than developers of other database systems.