blob: bba0b36ae5ba66261e21b67705935121b3093b2c [file] [log] [blame] [view]
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# DataFusion Quickstart
You can easily query a DataFusion table with the Python API or with pure SQL.
Let's create a small DataFrame and then run some queries with both APIs.
Start by creating a DataFrame with four rows of data and two columns: `a` and `b`.
```python
from datafusion import SessionContext
ctx = SessionContext()
df = ctx.from_pydict({"a": [1, 2, 3, 1], "b": [4, 5, 6, 7]}, name="my_table")
```
Let's append a column to this DataFrame that adds columns `a` and `b` with the SQL API.
```
ctx.sql("select a, b, a + b as sum_a_b from my_table")
+---+---+---------+
| a | b | sum_a_b |
+---+---+---------+
| 1 | 4 | 5 |
| 2 | 5 | 7 |
| 3 | 6 | 9 |
| 1 | 7 | 8 |
+---+---+---------+
```
DataFusion makes it easy to run SQL queries on DataFrames.
Now let's run the same query with the DataFusion Python API:
```python
from datafusion import col
df.select(
col("a"),
col("b"),
col("a") + col("b"),
)
```
We get the same result as before:
```
+---+---+-------------------------+
| a | b | my_table.a + my_table.b |
+---+---+-------------------------+
| 1 | 4 | 5 |
| 2 | 5 | 7 |
| 3 | 6 | 9 |
| 1 | 7 | 8 |
+---+---+-------------------------+
```
DataFusion also allows you to query data with a well-designed Python interface.
Python users have two great ways to query DataFusion tables.