You can easily query a DataFusion table with the Python API or with pure SQL.
Let's create a small DataFrame and then run some queries with both APIs.
Start by creating a DataFrame with four rows of data and two columns: a and b.
from datafusion import SessionContext

ctx = SessionContext()
df = ctx.from_pydict({"a": [1, 2, 3, 1], "b": [4, 5, 6, 7]}, name="my_table")
Let's use the SQL API to add a column that holds the sum of columns a and b.
ctx.sql("select a, b, a + b as sum_a_b from my_table")
+---+---+---------+
| a | b | sum_a_b |
+---+---+---------+
| 1 | 4 | 5       |
| 2 | 5 | 7       |
| 3 | 6 | 9       |
| 1 | 7 | 8       |
+---+---+---------+
DataFusion makes it easy to run SQL queries on DataFrames.
Now let's run the same query with the DataFusion Python API:
from datafusion import col

df.select(
    col("a"),
    col("b"),
    col("a") + col("b"),
)
We get the same values as before, although the new column is named after the expression because the query does not alias it:
+---+---+-------------------------+
| a | b | my_table.a + my_table.b |
+---+---+-------------------------+
| 1 | 4 | 5                       |
| 2 | 5 | 7                       |
| 3 | 6 | 9                       |
| 1 | 7 | 8                       |
+---+---+-------------------------+
DataFusion also allows you to query data with a well-designed Python interface.
Python users have two great ways to query DataFusion tables.