docs/query-acceleration/condition-cache.md - doris-website - Git at Google

 ---
 {
     "title": "Condition Cache",
     "language": "en",
     "description": "In large-scale analytical workloads, queries often include repeated filtering conditions (Conditions)"
 }
 ---


 # Condition Cache

 ## Introduction

 In large-scale analytical workloads, queries often include **repeated filtering conditions (Conditions)**, for example:

 ```
 SELECT * FROM orders WHERE region = 'ASIA';
 SELECT count(*) FROM orders WHERE region = 'ASIA';
 ```

 Such queries repeatedly execute the same filtering logic on identical data segments, leading to **redundant CPU and I/O overhead**.

 To address this, **Apache Doris introduces the Condition Cache mechanism**.
  It caches the filtering results of specific conditions on a given segment, allowing subsequent queries to **reuse those results directly**, thereby **reducing unnecessary scans and filtering operations** and significantly lowering query latency.

 ## Working Principle

 The core concept of the Condition Cache is:

 - **The same filtering condition produces the same result on the same data segment.**
 - Doris generates a **64-bit digest** from the combination of “condition expression + key range,” which serves as a unique cache identifier.
 - Each segment can then look up existing filtering results in the cache using this digest.

 Cached results are stored as compressed **bit vectors (`std::vector<bool>`)**:

 - **0** indicates that the row range does not meet the condition and can be skipped directly;
 - **1** indicates that the range may contain matching data and needs further scanning.

 Through this mechanism, Doris can quickly eliminate irrelevant data blocks at a coarse granularity, performing fine-grained filtering only when necessary.

 ## Applicable Scenarios

 Condition Cache is most effective in the following cases:

 - **Repeated conditions**: Identical or similar filter conditions are frequently used.
 - **Relatively stable data**: Data inside a segment is typically immutable (new segments are generated after INSERT/Compaction, naturally invalidating old caches).
 - **High selectivity**: When filters leave only a small subset of rows, it maximizes scan reduction.

 Condition Cache will **not** be used in the following situations:

 - Queries containing **delete predicates** (to ensure correctness, caching is disabled).
 - **TopN runtime filters** generated at runtime (currently unsupported).

 ## Configuration and Management

 ### Enable or Disable

 ```
 SET enable_condition_cache = true;
 ```

 ### Memory Management

 - Condition Cache uses an **LRU policy** for cache eviction.
 - When exceeding `condition_cache_limit`, the least recently used entries are automatically cleared.

 You can modify the memory limit in `be.conf`:

 ```
 condition_cache_limit = 1024  # Unit: MB
 ```

 - After segment compaction, old cache entries are naturally invalidated through LRU eviction.

 ## Cache Statistics

 Doris provides comprehensive metrics to help users monitor the effectiveness of Condition Cache:

 - **Profile-level metrics** (visible in query execution plans)
   - `ConditionCacheSegmentHit`: Number of segments that hit the cache
   - `ConditionCacheFilteredRows`: Number of rows skipped directly by cached results
 - **System metrics** (viewable via the monitoring system or `/metrics`)
   - `condition_cache_search_count`: Total cache lookup count
   - `condition_cache_hit_count`: Number of successful cache hits

 These metrics help evaluate the cache’s benefit and hit ratio.

 ## Usage Example

 ### Typical Scenario

 Consider the following query:

 ```
 SELECT order_id, amount
 FROM orders
 WHERE region = 'ASIA' AND order_date >= '2023-01-01';
 ```

 - **First execution**: The query performs a full scan and evaluates the filter; the Condition Cache stores the result in the LRU cache.
 - **Subsequent identical queries**: They reuse the cached results, skipping most irrelevant row ranges and scanning only potential matches.

 When multiple queries share the same filtering condition (e.g., `region = 'ASIA' AND order_date >= '2023-01-01'`), they can reuse each other’s Condition Cache entries, reducing overall workload.

 ## Notes

 - **Cache is not persistent**: The Condition Cache is cleared upon Doris restart.
 - **Delete operations disable caching**: Segments with delete markers require strict consistency and thus do not use the cache.

 ## Summary

 Condition Cache is an optimization mechanism in Doris designed for **repeated conditional queries**. Its advantages include:

 - Avoiding redundant computation and reducing CPU/I/O overhead
 - Automatically and transparently effective without user intervention
 - Lightweight in memory consumption and highly efficient when hit and filter rates are high

 By leveraging the Condition Cache effectively, users can achieve significantly faster response times in high-frequency OLAP query scenarios.
	---
	{
	"title": "Condition Cache",
	"language": "en",
	"description": "In large-scale analytical workloads, queries often include repeated filtering conditions (Conditions)"
	}
	---


	# Condition Cache

	## Introduction

	In large-scale analytical workloads, queries often include repeated filtering conditions (Conditions), for example:

	```
	SELECT * FROM orders WHERE region = 'ASIA';
	SELECT count(*) FROM orders WHERE region = 'ASIA';
	```

	Such queries repeatedly execute the same filtering logic on identical data segments, leading to redundant CPU and I/O overhead.

	To address this, Apache Doris introduces the Condition Cache mechanism.
	It caches the filtering results of specific conditions on a given segment, allowing subsequent queries to reuse those results directly, thereby reducing unnecessary scans and filtering operations and significantly lowering query latency.

	## Working Principle

	The core concept of the Condition Cache is:

	- The same filtering condition produces the same result on the same data segment.
	- Doris generates a 64-bit digest from the combination of “condition expression + key range,” which serves as a unique cache identifier.
	- Each segment can then look up existing filtering results in the cache using this digest.

	Cached results are stored as compressed bit vectors (`std::vector<bool>`):

	- 0 indicates that the row range does not meet the condition and can be skipped directly;
	- 1 indicates that the range may contain matching data and needs further scanning.

	Through this mechanism, Doris can quickly eliminate irrelevant data blocks at a coarse granularity, performing fine-grained filtering only when necessary.

	## Applicable Scenarios

	Condition Cache is most effective in the following cases:

	- Repeated conditions: Identical or similar filter conditions are frequently used.
	- Relatively stable data: Data inside a segment is typically immutable (new segments are generated after INSERT/Compaction, naturally invalidating old caches).
	- High selectivity: When filters leave only a small subset of rows, it maximizes scan reduction.

	Condition Cache will not be used in the following situations:

	- Queries containing delete predicates (to ensure correctness, caching is disabled).
	- TopN runtime filters generated at runtime (currently unsupported).

	## Configuration and Management

	### Enable or Disable

	```
	SET enable_condition_cache = true;
	```

	### Memory Management

	- Condition Cache uses an LRU policy for cache eviction.
	- When exceeding `condition_cache_limit`, the least recently used entries are automatically cleared.

	You can modify the memory limit in `be.conf`:

	```
	condition_cache_limit = 1024 # Unit: MB
	```

	- After segment compaction, old cache entries are naturally invalidated through LRU eviction.

	## Cache Statistics

	Doris provides comprehensive metrics to help users monitor the effectiveness of Condition Cache:

	- Profile-level metrics (visible in query execution plans)
	- `ConditionCacheSegmentHit`: Number of segments that hit the cache
	- `ConditionCacheFilteredRows`: Number of rows skipped directly by cached results
	- System metrics (viewable via the monitoring system or `/metrics`)
	- `condition_cache_search_count`: Total cache lookup count
	- `condition_cache_hit_count`: Number of successful cache hits

	These metrics help evaluate the cache’s benefit and hit ratio.

	## Usage Example

	### Typical Scenario

	Consider the following query:

	```
	SELECT order_id, amount
	FROM orders
	WHERE region = 'ASIA' AND order_date >= '2023-01-01';
	```

	- First execution: The query performs a full scan and evaluates the filter; the Condition Cache stores the result in the LRU cache.
	- Subsequent identical queries: They reuse the cached results, skipping most irrelevant row ranges and scanning only potential matches.

	When multiple queries share the same filtering condition (e.g., `region = 'ASIA' AND order_date >= '2023-01-01'`), they can reuse each other’s Condition Cache entries, reducing overall workload.

	## Notes

	- Cache is not persistent: The Condition Cache is cleared upon Doris restart.
	- Delete operations disable caching: Segments with delete markers require strict consistency and thus do not use the cache.

	## Summary

	Condition Cache is an optimization mechanism in Doris designed for repeated conditional queries. Its advantages include:

	- Avoiding redundant computation and reducing CPU/I/O overhead
	- Automatically and transparently effective without user intervention
	- Lightweight in memory consumption and highly efficient when hit and filter rates are high

	By leveraging the Condition Cache effectively, users can achieve significantly faster response times in high-frequency OLAP query scenarios.