DataFusion supports EXISTS, NOT EXISTS, IN, NOT IN and Scalar Subqueries.
The examples below are based on the following table.
❯ select * from x; +----------+----------+ | column_1 | column_2 | +----------+----------+ | 1 | 2 | +----------+----------+
The EXISTS syntax can be used to find all rows in a relation where a correlated subquery produces one or more matches for that row. Only correlated subqueries are supported.
❯ select * from x y where exists (select * from x where x.column_1 = y.column_1); +----------+----------+ | column_1 | column_2 | +----------+----------+ | 1 | 2 | +----------+----------+ 1 row in set.
The NOT EXISTS syntax can be used to find all rows in a relation where a correlated subquery produces zero matches for that row. Only correlated subqueries are supported.
❯ select * from x y where not exists (select * from x where x.column_1 = y.column_1); 0 rows in set.
The IN syntax can be used to find all rows in a relation where a given expression's value can be found in the results of a correlated subquery.
❯ select * from x where column_1 in (select column_1 from x); +----------+----------+ | column_1 | column_2 | +----------+----------+ | 1 | 2 | +----------+----------+ 1 row in set.
The NOT IN syntax can be used to find all rows in a relation where a given expression's value can not be found in the results of a correlated subquery.
❯ select * from x where column_1 not in (select column_1 from x); 0 rows in set.
A scalar subquery can be used to produce a single value that can be used in many different contexts in a query. Here is an example of a filter using a scalar subquery. Only correlated subqueries are supported.
❯ select * from x y where column_1 < (select sum(column_2) from x where x.column_1 = y.column_1); +----------+----------+ | column_1 | column_2 | +----------+----------+ | 1 | 2 | +----------+----------+ 1 row in set.