The queries in DataFusion scan data from tables and return 0 or more rows. Please be aware that column names in queries are made lower-case, but not on the inferred schema. Accordingly, if you want to query against a capitalized field, make sure to use double quotes. Please see this example for clarification. In this documentation we describe the SQL syntax in DataFusion.
DataFusion supports the following syntax for queries:
[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT ] select_expr [, ...]
[ FROM from_item [, ...] ]
[ JOIN join_item [, ...] ]
[ WHERE condition ]
[ GROUP BY grouping_element [, ...] ]
[ HAVING condition]
[ UNION [ ALL | select ]
[ ORDER BY expression [ ASC | DESC ][, ...] ]
[ LIMIT count ]
A with clause allows to give names for queries and reference them by name.
WITH x AS (SELECT a, MAX(b) AS b FROM t GROUP BY a) SELECT a, b FROM x;
Example:
SELECT a, b, a + b FROM table
The DISTINCT quantifier can be added to make the query return all distinct rows. By default ALL will be used, which returns all the rows.
SELECT DISTINCT person, age FROM employees
Example:
SELECT t.a FROM table AS t
Example:
SELECT a FROM table WHERE a > 10
DataFusion supports INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, and CROSS JOIN.
The following examples are based on this table:
select * from x; +----------+----------+ | column_1 | column_2 | +----------+----------+ | 1 | 2 | +----------+----------+
The keywords JOIN or INNER JOIN define a join that only shows rows where there is a match in both tables.
❯ select * from x inner join x y ON x.column_1 = y.column_1; +----------+----------+----------+----------+ | column_1 | column_2 | column_1 | column_2 | +----------+----------+----------+----------+ | 1 | 2 | 1 | 2 | +----------+----------+----------+----------+
The keywords LEFT JOIN or LEFT OUTER JOIN define a join that includes all rows from the left table even if there is not a match in the right table. When there is no match, null values are produced for the right side of the join.
❯ select * from x left join x y ON x.column_1 = y.column_2; +----------+----------+----------+----------+ | column_1 | column_2 | column_1 | column_2 | +----------+----------+----------+----------+ | 1 | 2 | | | +----------+----------+----------+----------+
The keywords RIGHT JOIN or RIGHT OUTER JOIN define a join that includes all rows from the right table even if there is not a match in the left table. When there is no match, null values are produced for the left side of the join.
❯ select * from x right join x y ON x.column_1 = y.column_2; +----------+----------+----------+----------+ | column_1 | column_2 | column_1 | column_2 | +----------+----------+----------+----------+ | | | 1 | 2 | +----------+----------+----------+----------+
The keywords FULL JOIN or FULL OUTER JOIN define a join that is effectively a union of a LEFT OUTER JOIN and RIGHT OUTER JOIN. It will show all rows from the left and right side of the join and will produce null values on either side of the join where there is not a match.
❯ select * from x full outer join x y ON x.column_1 = y.column_2; +----------+----------+----------+----------+ | column_1 | column_2 | column_1 | column_2 | +----------+----------+----------+----------+ | 1 | 2 | | | | | | 1 | 2 | +----------+----------+----------+----------+
A cross join produces a cartesian product that matches every row in the left side of the join with every row in the right side of the join.
❯ select * from x cross join x y; +----------+----------+----------+----------+ | column_1 | column_2 | column_1 | column_2 | +----------+----------+----------+----------+ | 1 | 2 | 1 | 2 | +----------+----------+----------+----------+
Example:
SELECT a, b, MAX(c) FROM table GROUP BY a, b
Example:
SELECT a, b, MAX(c) FROM table GROUP BY a, b HAVING MAX(c) > 10
Example:
SELECT a, b, c FROM table1 UNION ALL SELECT a, b, c FROM table2
Orders the results by the referenced expression. By default it uses ascending order (ASC). This order can be changed to descending by adding DESC after the order-by expressions.
Examples:
SELECT age, person FROM table ORDER BY age; SELECT age, person FROM table ORDER BY age DESC; SELECT age, person FROM table ORDER BY age, person DESC;
Limits the number of rows to be a maximum of count rows. count should be a non-negative integer.
Example:
SELECT age, person FROM table LIMIT 10