| <!--- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| # SELECT syntax |
| |
| The queries in DataFusion scan data from tables and return 0 or more rows. |
| Please be aware that column names in queries are made lower-case, but not on the inferred schema. Accordingly, if you |
| want to query against a capitalized field, make sure to use double quotes. Please see this |
| [example](https://datafusion.apache.org/user-guide/example-usage.html) for clarification. |
| In this documentation we describe the SQL syntax in DataFusion. |
| |
| DataFusion supports the following syntax for queries: |
| <code class="language-sql hljs"> |
| |
| [ [WITH](#with-clause) with_query [, ...] ] <br/> |
| [SELECT](#select-clause) [ ALL | DISTINCT ] select_expr [, ...] <br/> |
| [ [FROM](#from-clause) from_item [, ...] ] <br/> |
| [ [JOIN](#join-clause) join_item [, ...] ] <br/> |
| [ [WHERE](#where-clause) condition ] <br/> |
| [ [GROUP BY](#group-by-clause) grouping_element [, ...] ] <br/> |
| [ [HAVING](#having-clause) condition] <br/> |
| [ [QUALIFY](#qualify-clause) condition] <br/> |
| [ [UNION](#union-clause) [ ALL | select ] <br/> |
| [ [ORDER BY](#order-by-clause) expression [ ASC | DESC ][, ...] ] <br/> |
| [ [LIMIT](#limit-clause) count ] <br/> |
| [ [EXCLUDE | EXCEPT](#exclude-and-except-clause) ] <br/> |
| [Pipe operators](#pipe-operators) <br/> |
| |
| </code> |
| |
| ## WITH clause |
| |
| A with clause allows to give names for queries and reference them by name. |
| |
| ```sql |
| WITH x AS (SELECT a, MAX(b) AS b FROM t GROUP BY a) |
| SELECT a, b FROM x; |
| ``` |
| |
| ## SELECT clause |
| |
| Example: |
| |
| ```sql |
| SELECT a, b, a + b FROM table |
| ``` |
| |
| The `DISTINCT` quantifier can be added to make the query return all distinct rows. |
| By default `ALL` will be used, which returns all the rows. |
| |
| ```sql |
| SELECT DISTINCT person, age FROM employees |
| ``` |
| |
| ## FROM clause |
| |
| Example: |
| |
| ```sql |
| SELECT t.a FROM table AS t |
| ``` |
| |
| ## WHERE clause |
| |
| Example: |
| |
| ```sql |
| SELECT a FROM table WHERE a > 10 |
| ``` |
| |
| ## JOIN clause |
| |
| DataFusion supports `INNER JOIN`, `LEFT OUTER JOIN`, `RIGHT OUTER JOIN`, `FULL OUTER JOIN`, `NATURAL JOIN`, `CROSS JOIN`, `LEFT SEMI JOIN`, `RIGHT SEMI JOIN`, `LEFT ANTI JOIN`, and `RIGHT ANTI JOIN`. |
| |
| The following examples are based on this table: |
| |
| ```sql |
| select * from x; |
| +----------+----------+ |
| | column_1 | column_2 | |
| +----------+----------+ |
| | 1 | 2 | |
| +----------+----------+ |
| ``` |
| |
| ### INNER JOIN |
| |
| The keywords `JOIN` or `INNER JOIN` define a join that only shows rows where there is a match in both tables. |
| |
| ```sql |
| SELECT * FROM x INNER JOIN x y ON x.column_1 = y.column_1; |
| +----------+----------+----------+----------+ |
| | column_1 | column_2 | column_1 | column_2 | |
| +----------+----------+----------+----------+ |
| | 1 | 2 | 1 | 2 | |
| +----------+----------+----------+----------+ |
| ``` |
| |
| ### LEFT OUTER JOIN |
| |
| The keywords `LEFT JOIN` or `LEFT OUTER JOIN` define a join that includes all rows from the left table even if there |
| is not a match in the right table. When there is no match, null values are produced for the right side of the join. |
| |
| ```sql |
| SELECT * FROM x LEFT JOIN x y ON x.column_1 = y.column_2; |
| +----------+----------+----------+----------+ |
| | column_1 | column_2 | column_1 | column_2 | |
| +----------+----------+----------+----------+ |
| | 1 | 2 | | | |
| +----------+----------+----------+----------+ |
| ``` |
| |
| ### RIGHT OUTER JOIN |
| |
| The keywords `RIGHT JOIN` or `RIGHT OUTER JOIN` define a join that includes all rows from the right table even if there |
| is not a match in the left table. When there is no match, null values are produced for the left side of the join. |
| |
| ```sql |
| SELECT * FROM x RIGHT JOIN x y ON x.column_1 = y.column_2; |
| +----------+----------+----------+----------+ |
| | column_1 | column_2 | column_1 | column_2 | |
| +----------+----------+----------+----------+ |
| | | | 1 | 2 | |
| +----------+----------+----------+----------+ |
| ``` |
| |
| ### FULL OUTER JOIN |
| |
| The keywords `FULL JOIN` or `FULL OUTER JOIN` define a join that is effectively a union of a `LEFT OUTER JOIN` and |
| `RIGHT OUTER JOIN`. It will show all rows from the left and right side of the join and will produce null values on |
| either side of the join where there is not a match. |
| |
| ```sql |
| SELECT * FROM x FULL OUTER JOIN x y ON x.column_1 = y.column_2; |
| +----------+----------+----------+----------+ |
| | column_1 | column_2 | column_1 | column_2 | |
| +----------+----------+----------+----------+ |
| | 1 | 2 | | | |
| | | | 1 | 2 | |
| +----------+----------+----------+----------+ |
| ``` |
| |
| ### NATURAL JOIN |
| |
| A `NATURAL JOIN` defines an inner join based on common column names found between the input tables. When no common |
| column names are found, it behaves like a `CROSS JOIN`. |
| |
| ```sql |
| SELECT * FROM x NATURAL JOIN x y; |
| +----------+----------+ |
| | column_1 | column_2 | |
| +----------+----------+ |
| | 1 | 2 | |
| +----------+----------+ |
| ``` |
| |
| ### CROSS JOIN |
| |
| A `CROSS JOIN` produces a cartesian product that matches every row in the left side of the join with every row in the |
| right side of the join. |
| |
| ```sql |
| SELECT * FROM x CROSS JOIN x y; |
| +----------+----------+----------+----------+ |
| | column_1 | column_2 | column_1 | column_2 | |
| +----------+----------+----------+----------+ |
| | 1 | 2 | 1 | 2 | |
| +----------+----------+----------+----------+ |
| ``` |
| |
| ### LEFT SEMI JOIN |
| |
| The `LEFT SEMI JOIN` returns all rows from the left table that have at least one matching row in the right table, and |
| projects only the columns from the left table. |
| |
| ```sql |
| SELECT * FROM x LEFT SEMI JOIN x y ON x.column_1 = y.column_1; |
| +----------+----------+ |
| | column_1 | column_2 | |
| +----------+----------+ |
| | 1 | 2 | |
| +----------+----------+ |
| ``` |
| |
| ### RIGHT SEMI JOIN |
| |
| The `RIGHT SEMI JOIN` returns all rows from the right table that have at least one matching row in the left table, and |
| only projects the columns from the right table. |
| |
| ```sql |
| SELECT * FROM x RIGHT SEMI JOIN x y ON x.column_1 = y.column_1; |
| +----------+----------+ |
| | column_1 | column_2 | |
| +----------+----------+ |
| | 1 | 2 | |
| +----------+----------+ |
| ``` |
| |
| ### LEFT ANTI JOIN |
| |
| The `LEFT ANTI JOIN` returns all rows from the left table that do not have any matching row in the right table, projecting |
| only the left table’s columns. |
| |
| ```sql |
| SELECT * FROM x LEFT ANTI JOIN x y ON x.column_1 = y.column_1; |
| +----------+----------+ |
| | column_1 | column_2 | |
| +----------+----------+ |
| +----------+----------+ |
| ``` |
| |
| ### RIGHT ANTI JOIN |
| |
| The `RIGHT ANTI JOIN` returns all rows from the right table that do not have any matching row in the left table, projecting |
| only the right table’s columns. |
| |
| ```sql |
| SELECT * FROM x RIGHT ANTI JOIN x y ON x.column_1 = y.column_1; |
| +----------+----------+ |
| | column_1 | column_2 | |
| +----------+----------+ |
| +----------+----------+ |
| ``` |
| |
| ## GROUP BY clause |
| |
| Example: |
| |
| ```sql |
| SELECT a, b, MAX(c) FROM table GROUP BY a, b |
| ``` |
| |
| Some aggregation functions accept optional ordering requirement, such as `ARRAY_AGG`. If a requirement is given, |
| aggregation is calculated in the order of the requirement. |
| |
| Example: |
| |
| ```sql |
| SELECT a, b, ARRAY_AGG(c, ORDER BY d) FROM table GROUP BY a, b |
| ``` |
| |
| ## HAVING clause |
| |
| Example: |
| |
| ```sql |
| SELECT a, b, MAX(c) FROM table GROUP BY a, b HAVING MAX(c) > 10 |
| ``` |
| |
| ## QUALIFY clause |
| |
| Example: |
| |
| ```sql |
| SELECT ROW_NUMBER() OVER (PARTITION BY region) AS rk FROM table QUALIFY rk > 1; |
| ``` |
| |
| ## UNION clause |
| |
| Example: |
| |
| ```sql |
| SELECT |
| a, |
| b, |
| c |
| FROM table1 |
| UNION ALL |
| SELECT |
| a, |
| b, |
| c |
| FROM table2 |
| ``` |
| |
| ## ORDER BY clause |
| |
| Orders the results by the referenced expression. By default it uses ascending order (`ASC`). |
| This order can be changed to descending by adding `DESC` after the order-by expressions. |
| |
| Examples: |
| |
| ```sql |
| SELECT age, person FROM table ORDER BY age; |
| SELECT age, person FROM table ORDER BY age DESC; |
| SELECT age, person FROM table ORDER BY age, person DESC; |
| ``` |
| |
| ## LIMIT clause |
| |
| Limits the number of rows to be a maximum of `count` rows. `count` should be a non-negative integer. |
| |
| Example: |
| |
| ```sql |
| SELECT age, person FROM table |
| LIMIT 10 |
| ``` |
| |
| ## EXCLUDE and EXCEPT clause |
| |
| Excluded named columns from query results. |
| |
| Example selecting all columns except for `age` and `person`: |
| |
| ```sql |
| SELECT * EXCEPT(age, person) |
| FROM table; |
| ``` |
| |
| ```sql |
| SELECT * EXCLUDE(age, person) |
| FROM table; |
| ``` |
| |
| ## Pipe operators |
| |
| Some SQL dialects (e.g. BigQuery) support the pipe operator `|>`. |
| The SQL dialect can be set like this: |
| |
| ```sql |
| set datafusion.sql_parser.dialect = 'BigQuery'; |
| ``` |
| |
| DataFusion currently supports the following pipe operators: |
| |
| - [WHERE](#pipe_where) |
| - [ORDER BY](#pipe_order_by) |
| - [LIMIT](#pipe_limit) |
| - [SELECT](#pipe_select) |
| - [EXTEND](#pipe_extend) |
| - [AS](#pipe_as) |
| - [UNION](#pipe_union) |
| - [INTERSECT](#pipe_intersect) |
| - [EXCEPT](#pipe_except) |
| - [AGGREGATE](#pipe_aggregate) |
| - [JOIN](#pipe_join) |
| |
| (pipe_where)= |
| |
| ### WHERE |
| |
| ```sql |
| select * from range(0,10) |
| |> where value < 2; |
| +-------+ |
| | value | |
| +-------+ |
| | 0 | |
| | 1 | |
| +-------+ |
| ``` |
| |
| (pipe_order_by)= |
| |
| ### ORDER BY |
| |
| ```sql |
| select * from range(0,3) |
| |> order by value desc; |
| +-------+ |
| | value | |
| +-------+ |
| | 2 | |
| | 1 | |
| | 0 | |
| +-------+ |
| ``` |
| |
| (pipe_limit)= |
| |
| ### LIMIT |
| |
| ```sql |
| select * from range(0,3) |
| |> order by value desc |
| |> limit 1; |
| +-------+ |
| | value | |
| +-------+ |
| | 2 | |
| +-------+ |
| ``` |
| |
| (pipe_select)= |
| |
| ### SELECT |
| |
| ```sql |
| select * from range(0,3) |
| |> select value + 10; |
| +---------------------------+ |
| | range().value + Int64(10) | |
| +---------------------------+ |
| | 10 | |
| | 11 | |
| | 12 | |
| +---------------------------+ |
| ``` |
| |
| (pipe_extend)= |
| |
| ### EXTEND |
| |
| ```sql |
| select * from range(0,3) |
| |> extend -value AS minus_value; |
| +-------+-------------+ |
| | value | minus_value | |
| +-------+-------------+ |
| | 0 | 0 | |
| | 1 | -1 | |
| | 2 | -2 | |
| +-------+-------------+ |
| ``` |
| |
| (pipe_as)= |
| |
| ### AS |
| |
| ```sql |
| select * from range(0,3) |
| |> as my_range |
| |> SELECT my_range.value; |
| +-------+ |
| | value | |
| +-------+ |
| | 0 | |
| | 1 | |
| | 2 | |
| +-------+ |
| ``` |
| |
| (pipe_union)= |
| |
| ### UNION |
| |
| ```sql |
| select * from range(0,3) |
| |> union all ( |
| select * from range(3,6) |
| ); |
| +-------+ |
| | value | |
| +-------+ |
| | 0 | |
| | 1 | |
| | 2 | |
| | 3 | |
| | 4 | |
| | 5 | |
| +-------+ |
| ``` |
| |
| (pipe_intersect)= |
| |
| ### INTERSECT |
| |
| ```sql |
| select * from range(0,100) |
| |> INTERSECT DISTINCT ( |
| select 3 |
| ); |
| +-------+ |
| | value | |
| +-------+ |
| | 3 | |
| +-------+ |
| ``` |
| |
| (pipe_except)= |
| |
| ### EXCEPT |
| |
| ```sql |
| select * from range(0,10) |
| |> EXCEPT DISTINCT (select * from range(5,10)); |
| +-------+ |
| | value | |
| +-------+ |
| | 0 | |
| | 1 | |
| | 2 | |
| | 3 | |
| | 4 | |
| +-------+ |
| ``` |
| |
| (pipe_aggregate)= |
| |
| ### AGGREGATE |
| |
| ```sql |
| select * from range(0,3) |
| |> aggregate sum(value) AS total; |
| +-------+ |
| | total | |
| +-------+ |
| | 3 | |
| +-------+ |
| ``` |
| |
| (pipe_join)= |
| |
| ### JOIN |
| |
| ```sql |
| ( |
| SELECT 'apples' AS item, 2 AS sales |
| UNION ALL |
| SELECT 'bananas' AS item, 5 AS sales |
| ) |
| |> AS produce_sales |
| |> LEFT JOIN |
| ( |
| SELECT 'apples' AS item, 123 AS id |
| ) AS produce_data |
| ON produce_sales.item = produce_data.item |
| |> SELECT produce_sales.item, sales, id; |
| +--------+-------+------+ |
| | item | sales | id | |
| +--------+-------+------+ |
| | apples | 2 | 123 | |
| | bananas| 5 | NULL | |
| +--------+-------+------+ |
| ``` |