In large-scale analytical workloads, queries often repeat the same filter conditions, for example:
SELECT * FROM orders WHERE region = 'ASIA';
SELECT count(*) FROM orders WHERE region = 'ASIA';
Such queries repeatedly execute the same filtering logic on identical data segments, leading to redundant CPU and I/O overhead.
To address this, Apache Doris introduces the Condition Cache mechanism. It caches the filtering results of specific conditions on a given segment, allowing subsequent queries to reuse those results directly, thereby reducing unnecessary scans and filtering operations and significantly lowering query latency.
The core concept of the Condition Cache is:
Cached results are stored as compressed bit vectors (std::vector<bool>):
Through this mechanism, Doris can quickly eliminate irrelevant data blocks at a coarse granularity, performing fine-grained filtering only when necessary.
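The idea can be sketched in a few lines of Python. This is an illustrative model only, not Doris's implementation: the 4-row block size, the `(segment_id, condition_digest)` cache key, and the names `scan_segment` and `cache` are assumptions made for the example. One bit is stored per block of rows, and a bit is false only when no row in that block matched the condition, so a later query can skip those blocks and re-evaluate the predicate only on the remaining ones.

```python
from typing import Callable, Dict, List, Tuple

ROWS_PER_BLOCK = 4  # hypothetical granularity, kept small for readability

# Hypothetical cache: (segment_id, condition_digest) -> one bool per block
cache: Dict[Tuple[str, str], List[bool]] = {}


def scan_segment(segment_id: str, rows: List[dict], digest: str,
                 predicate: Callable[[dict], bool]) -> Tuple[List[dict], int]:
    """Return the matching rows and the number of rows actually evaluated."""
    key = (segment_id, digest)
    bits = cache.get(key)
    matched: List[dict] = []
    evaluated = 0
    new_bits: List[bool] = []
    for block_no, start in enumerate(range(0, len(rows), ROWS_PER_BLOCK)):
        block = rows[start:start + ROWS_PER_BLOCK]
        if bits is not None and not bits[block_no]:
            new_bits.append(False)  # cached result: nothing in this block matches
            continue
        hits = [r for r in block if predicate(r)]  # fine-grained filtering
        evaluated += len(block)
        matched.extend(hits)
        new_bits.append(bool(hits))
    cache[key] = new_bits
    return matched, evaluated


# 12 rows: the first block is ASIA, the other two blocks are EU.
rows = [{"region": "ASIA" if i < 4 else "EU"} for i in range(12)]
is_asia = lambda r: r["region"] == "ASIA"

first, first_evaluated = scan_segment("seg-1", rows, "region=ASIA", is_asia)
second, second_evaluated = scan_segment("seg-1", rows, "region=ASIA", is_asia)
```

In this toy run the first scan evaluates every row and populates the cache, while the second scan returns the same rows after evaluating only the one block whose bit is set.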
Condition Cache is most effective in the following cases:
Condition Cache will not be used in the following situations:
SET enable_condition_cache = true;
condition_cache_limit, the least recently used entries are automatically cleared. You can modify the memory limit in be.conf:
condition_cache_limit = 1024 # Unit: MB
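To make the eviction behavior concrete, here is a minimal sketch of a least-recently-used cache bounded by total size. It is a generic LRU illustration, not Doris's cache code: the class `LruByteCache`, its methods, and the 8-byte limit are hypothetical and chosen only so the example is easy to follow.

```python
from collections import OrderedDict


class LruByteCache:
    """Toy LRU cache bounded by the total size of its values in bytes."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.used_bytes = 0
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str):
        value = self._entries.get(key)
        if value is not None:
            self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: bytes) -> None:
        if key in self._entries:
            self.used_bytes -= len(self._entries.pop(key))
        self._entries[key] = value
        self.used_bytes += len(value)
        # Evict from the least recently used end until back under the limit.
        while self.used_bytes > self.limit_bytes and self._entries:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)

    def keys(self):
        return list(self._entries)


lru = LruByteCache(limit_bytes=8)
lru.put("a", b"1234")
lru.put("b", b"1234")
lru.get("a")            # "a" becomes the most recently used entry
lru.put("c", b"1234")   # total would be 12 bytes, so "b" is evicted
```

Because "a" was read after "b" was written, "b" is the least recently used entry and is the one dropped when the limit is exceeded.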
Doris provides comprehensive metrics to help users monitor the effectiveness of Condition Cache:
ConditionCacheSegmentHit: Number of segments that hit the cache
ConditionCacheFilteredRows: Number of rows skipped directly by cached results
/metrics)
condition_cache_search_count: Total cache lookup count
condition_cache_hit_count: Number of successful cache hits
These metrics help evaluate the cache’s benefit and hit ratio.
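As a small worked example, the hit ratio over an interval can be derived from two samples of the lookup and hit counters. The sketch below assumes both counters are cumulative, and the sample values are made up for illustration; how the samples are collected is left out, and the function names are hypothetical.

```python
def hit_ratio(search_count: int, hit_count: int) -> float:
    """Fraction of lookups that hit the cache; 0.0 when there were no lookups."""
    if search_count <= 0:
        return 0.0
    return hit_count / search_count


def interval_hit_ratio(before: dict, after: dict) -> float:
    """Hit ratio between two samples, assuming cumulative counters."""
    searches = (after["condition_cache_search_count"]
                - before["condition_cache_search_count"])
    hits = (after["condition_cache_hit_count"]
            - before["condition_cache_hit_count"])
    return hit_ratio(searches, hits)


# Made-up sample values, for illustration only.
sample_before = {"condition_cache_search_count": 1000,
                 "condition_cache_hit_count": 400}
sample_after = {"condition_cache_search_count": 1500,
                "condition_cache_hit_count": 800}

ratio = interval_hit_ratio(sample_before, sample_after)  # 400 hits / 500 lookups
```

A ratio that stays low for a workload with many repeated conditions may suggest that entries are being evicted before they can be reused.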
Consider the following query:
SELECT order_id, amount FROM orders WHERE region = 'ASIA' AND order_date >= '2023-01-01';
When multiple queries share the same filtering condition (e.g., region = 'ASIA' AND order_date >= '2023-01-01'), they can reuse each other’s Condition Cache entries, reducing overall workload.
Condition Cache is an optimization mechanism in Doris designed for repeated conditional queries. Its advantages include:
By leveraging the Condition Cache effectively, users can achieve significantly faster response times in high-frequency OLAP query scenarios.