In analytical workloads, the same aggregation query is often executed repeatedly on data that has not changed, for example:
SELECT region, SUM(revenue) FROM orders WHERE dt = '2024-01-01' GROUP BY region; SELECT region, SUM(revenue) FROM orders WHERE dt = '2024-01-01' GROUP BY region;
Each execution re-scans identical tablets and re-computes identical aggregation results, wasting CPU and I/O resources.
To address this, Apache Doris provides a Query Cache mechanism. It caches the intermediate aggregation results produced inside the pipeline execution engine and serves them directly to subsequent queries that share the same execution context, significantly reducing query latency.
:::caution Important Limitations
Query Cache is designed for aggregation queries. Specifically, only fragments whose plan tree matches one of the following patterns are eligible:
AggregationNode → OlapScanNode (single-phase aggregation directly on a scan)AggregationNode → AggregationNode → OlapScanNode (two-phase aggregation on a scan)Intermediate nodes such as FilterNode and ProjectNode are allowed between the aggregation and scan nodes. However, the plan tree must not contain JoinNode, SortNode, UnionNode, WindowNode, or ExchangeNode within the cache-eligible subtree.
The cache key is composed of three parts:
SQL Digest — A SHA-256 hash computed from the normalized plan tree (aggregation functions, grouping expressions, non-partition filter predicates, projections, and result-affecting session variables). The normalization process assigns canonical IDs to all internal identifiers, so two semantically identical queries produce the same digest even if they have different internal plan node / slot IDs.
Tablet IDs — The sorted list of tablet IDs assigned to the current pipeline instance.
Tablet Range — The effective scan range for each tablet, derived from partition predicates (see Partition and Filter Behavior).
A cache entry becomes invalid when any of the following occurs:
query_cache_force_refresh = true, cached results are ignored and the query re-executes.First execution (cache miss):
Subsequent execution (cache hit):
SELECT a, b vs. SELECT b, a with the same digest), columns are reordered automatically.Understanding how partition predicates and filter expressions interact with Query Cache is essential for achieving good hit rates.
For tables with single-column RANGE partitioning, partition predicates receive special treatment:
Example:
Consider a table orders partitioned by dt with daily partitions:
-- Query A SELECT region, SUM(revenue) FROM orders WHERE dt >= '2024-01-01' AND dt < '2024-01-03' GROUP BY region; -- Query B SELECT region, SUM(revenue) FROM orders WHERE dt >= '2024-01-02' AND dt < '2024-01-04' GROUP BY region;
2024-01-01 and 2024-01-02.2024-01-02 and 2024-01-03.2024-01-02 have the same digest and the same tablet range, so Query B can reuse Query A's cache for the 2024-01-02 partition. Only partition 2024-01-03 needs to be computed fresh.For multi-column RANGE partitioning, LIST partitioning, or UNPARTITIONED tables, partition predicates cannot be extracted and are included directly in the digest. In this case, even minor differences in partition predicates produce different digests and cache misses.
Non-partition filter expressions (e.g., WHERE status = 'active') are included in the normalized plan digest. Two queries can share a cache entry only when their non-partition filter expressions are semantically identical after normalization.
WHERE status = 'active' and WHERE status = 'active' — same digest, cache hit.WHERE status = 'active' and WHERE status = 'inactive' — different digest, cache miss.WHERE status = 'active' AND region = 'ASIA' and WHERE region = 'ASIA' AND status = 'active' — the normalization process sorts conjuncts, so they produce the same digest and can hit the cache.Session variables that affect query results (such as time_zone, sql_mode, sql_select_limit, etc.) are included in the digest. Changing any of these variables between queries produces a different cache key and causes a cache miss.
The following conditions cause the planner to skip Query Cache entirely for a fragment:
| Condition | Reason |
|---|---|
| Fragment is a target of runtime filters | Runtime filter values are dynamic and unknown at plan time; caching would produce incorrect results |
Non-deterministic expressions (rand(), now(), uuid(), UDFs, etc.) | Results vary across executions even with identical input |
| Plan contains JOIN, SORT, UNION, or WINDOW nodes in the cache subtree | Only aggregation-over-scan patterns are supported |
Scan node is not OlapScanNode (e.g., external table scan) | Cache depends on tablet IDs and versions, which do not exist for external tables |
Query Cache relies on three properties unique to internal OLAP tables:
Tablet-based data organization — The cache key includes tablet IDs and per-tablet scan ranges. External tables store data in external systems (HDFS, S3, JDBC, etc.) and have no tablet concept.
Version-based invalidation — Each internal tablet has a monotonically increasing version number that changes on data modification. The cache uses this version to detect staleness. External tables do not expose such versioning to Doris.
OlapScanNode requirement — The plan normalization logic only recognizes OlapScanNode as a valid scan node beneath the aggregation cache point. External table scan nodes are not recognized.
For caching needs on external tables, consider using SQL Cache instead.
| Parameter | Description | Default |
|---|---|---|
enable_query_cache | Master switch to enable or disable Query Cache | false |
query_cache_force_refresh | When true, ignores cached results and re-executes the query; the new result is still written to cache | false |
query_cache_entry_max_bytes | Maximum size (in bytes) of a single cache entry. If the aggregation result exceeds this limit, caching is abandoned for that fragment | 5242880 (5 MB) |
query_cache_entry_max_rows | Maximum number of rows for a single cache entry. If the aggregation result exceeds this limit, caching is abandoned for that fragment | 500000 |
| Parameter | Description | Default |
|---|---|---|
query_cache_size | Total memory capacity of the Query Cache on each BE, in MB | 512 |
:::note The parameters query_cache_max_size_mb and query_cache_elasticity_size_mb in be.conf control the older SQL Result Cache, not the pipeline-level Query Cache described here. Do not confuse the two. :::
SET enable_query_cache = true;
-- First execution: cache miss, results are computed and cached SELECT region, SUM(revenue), COUNT(*) FROM orders WHERE dt = '2024-01-15' AND status = 'completed' GROUP BY region; -- Second execution: cache hit, results are served directly from cache SELECT region, SUM(revenue), COUNT(*) FROM orders WHERE dt = '2024-01-15' AND status = 'completed' GROUP BY region;
After executing a query, examine the query profile. Look for the CacheSourceOperator section:
HitCache: true — The query served results from the cache.HitCache: false, InsertCache: true — The query missed the cache but successfully inserted results.HitCache: false, InsertCache: false — The query missed the cache and the result was too large to cache.The profile also shows CacheTabletId to indicate which tablets were involved.
-- Force the next query to bypass cache and re-compute results SET query_cache_force_refresh = true; SELECT region, SUM(revenue) FROM orders WHERE dt = '2024-01-15' GROUP BY region; -- Reset SET query_cache_force_refresh = false;
Query Cache is most effective in the following cases:
Query Cache is not suitable for:
now(), rand(), uuid(), and UDFs disable caching.query_cache_size as needed.Query Cache is a pipeline-level optimization mechanism in Doris that caches intermediate aggregation results per tablet. Its key characteristics: