The Doris query optimizer (Nereids) is a modern query optimizer built on the Cascades framework. It combines RBO (rule-based optimization) and CBO (cost-based optimization) to generate efficient execution plans for complex queries.
Before reading this article, you are recommended to first understand the following:
This article covers the following topics in order:
The current query optimizer faces three categories of challenges:
| Challenge | Specific manifestation |
|---|---|
| High query complexity | User queries are becoming increasingly complex, and query scenarios are becoming increasingly diverse |
| Strict real-time requirements | Users expect to obtain query results immediately |
| Fast iteration speed | The optimizer needs to quickly adapt to constantly emerging new requirements |
Based on this background, Doris launched the development of a brand-new query optimizer. Built on a modern optimizer architecture, this optimizer aims to handle query requests in Doris scenarios more efficiently and to provide a solid foundation for extending to more complex requirements in the future.
The Doris query optimizer has significant advantages over the legacy optimizer in three dimensions: smarter, more stable, and more flexible.
The CBO part is based on the advanced Cascades framework and fully uses the following three categories of information:
With this information, the optimizer can handle complex queries such as multi-table joins with ease.
Take subquery handling as an example: the new optimizer is based on the new data structure and avoids the situation in the legacy optimizer where many rules handle subqueries individually. This reduces the possibility of logical errors in optimization rules.
The optimizer architecture is well-designed and modern, making it convenient to extend optimization rules and processing stages. New features can be quickly added to meet ever-changing requirements.
The execution flow of the optimizer can be roughly divided into the following four steps:
| Step | Stage | Description |
|---|---|---|
| 1 | Syntax analysis | Convert the SQL text into an abstract syntax tree (AST). If the SQL is valid, continue; otherwise, report an error and terminate |
| 2 | Semantic analysis | Check the existence of tables, columns, and functions in the AST and whether their usage complies with syntax and semantic rules. If valid, continue; otherwise, report an error and terminate |
| 3 | Rewrite query plan (RBO) | Rewrite the query plan through predefined rules. Common techniques include column pruning, predicate pushdown, and partition pruning |
| 4 | Optimize query plan (CBO) | Enumerate the set of equivalent plans within the search space, evaluate the execution cost of each plan, and choose the plan with the lowest cost as the final execution plan |
The goals of each step are as follows:
| Item | Content |
|---|---|
| Purpose | Sets the maximum allowed time for query planning. When the planning time exceeds this value, planning is terminated and an error message is returned |
| Default value | 30s |
| Applicable scenario | When a query involves a large number of external tables, or when the query statement is particularly complex, increase this value appropriately to ensure that the query can proceed normally |
Design purpose: During the planning of a query statement, the system acquires read locks on all tables involved in the SQL. The main purposes of setting a timeout mechanism are:
Tuning suggestions:
SET nereids_timeout_second = <seconds>;.