Common Table Expression (CTE) define a temporary result set that can be referenced multiple times within the scope of an SQL statement. CTEs are primarily used in SELECT statements.
To specify a CTE, use the WITH clause with one or more comma-separated clauses. Each clause provides a subquery that generates a result set and associates a name with the subquery.
Doris supports nested CTE. Within the statement that contains the WITH clause, you can reference each CTE name to access the corresponding CTE result set. CTE names can be referenced in other CTE, allowing you to define CTE based on other CTE.
Doris DOES NOT support recursive CTE. For more information, please read MySQL manual about recursive CTE
The following example defines CTE named cte1 and cte2 within the WITH clause and refers to them in the top-level SELECT below the WITH clause:
WITH cte1 AS (SELECT a, b FROM table1), cte2 AS (SELECT c, d FROM table2) SELECT b, d FROM cte1 JOIN cte2 WHERE cte1.a = cte2.c;
WITH cte1 AS (SELECT a, b FROM table1), cte2 AS (SELECT c, d FROM cte1) SELECT b, d FROM cte1 JOIN cte2 WHERE cte1.a = cte2.c;
A recursive CTE (Common Table Expression with the RECURSIVE keyword) is used to express self-referential queries within a single SQL statement, and is commonly applied in scenarios such as tree/hierarchy traversal, graph traversal, and hierarchical aggregation. A recursive CTE consists of two parts:
The anchor and recursive parts are typically connected by UNION or UNION ALL. Recursive execution continues until no new rows are generated or a system limit is reached.
WITH [RECURSIVE] cte_name [(col1, col2, ...)] AS ( <anchor_query> -- Non-recursive part (executed once) UNION [ALL] <recursive_query> -- Recursive part that can reference cte_name ) SELECT ... FROM cte_name;
Key Points:
RECURSIVE keyword allows the CTE definition to reference itself.cte_name can be referenced in the recursive_query, usually used in the form of a JOIN.Typical iterative execution flow:
anchor_query, write the results to the output set (Output), and use them as the work set (WorkSet) for the first iteration.recursive_query, execute the recursive_query, and obtain newRows.UNION ALL is used: Directly append newRows to the Output and set newRows as the WorkSet for the next iteration.UNION (deduplication) is used: Compute the difference set between newRows and the existing Output (to remove duplicates), and only add the non-duplicate rows to the Output and the next iteration's WorkSet.newRows is empty or a preset system upper limit is triggered (the Doris session variable cte_max_recursion_depth limits the recursion depth, with a default value of 100; exceeding this will throw an error).Termination occurs when no new rows are generated in the current iteration (or the system's maximum recursion depth limit is reached).
UNION ALL: Retains duplicates and has low execution overhead (no deduplication required). Suitable for scenarios where duplicates are allowed or controlled by business logic in the backend.UNION: Implicitly performs deduplication, which adds sorting/hash-based deduplication overhead per iteration or globally—this cost is significant, especially with large data volumes.Recommendation: Prefer UNION ALL if the semantics allow it and duplicates can be post-processed at the application layer.
CREATE TABLE tree ( id int, parent_id int, data varchar(100) ) DUPLICATE KEY (id) DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES ('replication_num' = '1'); INSERT INTO tree VALUES (0, NULL, 'ROOT'), (1, 0, 'Child_1'), (2, 0, 'Child_2'), (3, 1, 'Child_1_1'); WITH RECURSIVE search_tree AS ( SELECT id, parent_id, data FROM tree t WHERE t.id = 0 UNION ALL SELECT t.id, t.parent_id, t.data FROM tree t, search_tree st WHERE t.parent_id = st.id ) SELECT * FROM search_tree ORDER BY id;
CREATE TABLE graph ( c_from int, c_to int, label varchar(100) ) DUPLICATE KEY (c_from) DISTRIBUTED BY HASH(c_from) BUCKETS 1 PROPERTIES 'replication_num' = '1'; INSERT INTO graph VALUES (1, 2, '1 -> 2'), (1, 3, '1 -> 3'), (2, 3, '2 -> 3'), (1, 4, '1 -> 4'), (4, 5, '4 -> 5'); WITH RECURSIVE search_graph AS ( SELECT c_from, c_to, label FROM graph g UNION ALL SELECT g.c_from, g.c_to, g.label FROM graph g, search_graph sg WHERE g.c_from = sg.c_to ) SELECT DISTINCT * FROM search_graph ORDER BY c_from, c_to;
Note: Using UNION performs deduplication in each iteration, resulting in high overhead.
cte_max_recursion_depth limits the maximum number of recursions to prevent infinite loops (default value: 100).SELECT clauses of the two parts are inconsistent.CAST or explicit column names if necessary.WHERE filter, adjust the system's maximum recursion depth, or correct the query logic if infinite recursion is inherent to the logic.