Doris uses cache slicing and prefetching to optimize cache management and read efficiency. Target files are sliced with 1MB alignment, and each slice is stored as a separate Block file in the local filesystem once it has been fully downloaded. Slicing reduces cache granularity, which improves flexibility and space utilization: Doris caches only the portions of a file that are actually needed instead of wasting space on entire large files. Smaller blocks are also easier to manage and evict, so the cache can track hot data more precisely.
To better manage cached data, Doris adopts a specific local file directory structure. Caches may be distributed across multiple directories on multiple disks. To achieve uniform distribution across directories, Doris calculates a hash value from the target file path and uses this hash as the last-level directory for Block file storage. Each Block file is named based on its offset position in the target file.
For example, if the target file path is /remote/data/datafile1 with a hash value of 12345, the cached Block file might be stored at /cache/123/12345/offset1, where offset1 represents the block's offset position in the original file.
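The slicing and directory layout described above can be sketched in a few lines of Python. This is a minimal illustration, not Doris's implementation: the MD5 hash, the helper names `aligned_blocks` and `block_path`, and the use of the first three hash characters as the first-level directory (inferred from the `/cache/123/12345/offset1` example) are all assumptions.

```python
import hashlib

BLOCK_SIZE = 1 << 20  # 1 MB alignment, as described in the text


def aligned_blocks(offset, length):
    """Return start offsets of the 1 MB-aligned blocks covering [offset, offset + length)."""
    if length <= 0:
        return []
    first = (offset // BLOCK_SIZE) * BLOCK_SIZE
    last = ((offset + length - 1) // BLOCK_SIZE) * BLOCK_SIZE
    return list(range(first, last + 1, BLOCK_SIZE))


def block_path(cache_root, remote_path, block_offset):
    """Build a local Block file path (hypothetical layout: root/prefix/hash/offset)."""
    # Assumption: MD5 stands in for whatever hash Doris actually uses.
    file_hash = hashlib.md5(remote_path.encode("utf-8")).hexdigest()
    return f"{cache_root}/{file_hash[:3]}/{file_hash}/{block_offset}"


# A 2-byte read that straddles the first 1 MB boundary touches two blocks.
blocks = aligned_blocks(1048575, 2)
paths = [block_path("/cache", "/remote/data/datafile1", b) for b in blocks]
```

Because only the blocks that overlap a read are materialized, a small read from a very large remote file occupies at most a couple of megabytes of local cache.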
Doris' file cache uses a multi-queue mechanism to separate different data types, preventing cache pollution and improving hit rates. Cache data is categorized into the following types, each stored in separate queues prioritized by importance:
This multi-queue mechanism enables Doris to allocate cache space rationally based on different data characteristics and usage scenarios, maximizing cache resource utilization.
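To illustrate why separate queues prevent cache pollution, here is a toy multi-queue LRU cache. It is a sketch under stated assumptions: the queue names `index` and `data` are illustrative placeholders rather than Doris's actual categories, capacities are counted in blocks, and each queue is strictly isolated, which may differ from how Doris shares space between queues.

```python
from collections import OrderedDict


class MultiQueueCache:
    """Toy cache with one independent LRU queue per data type."""

    def __init__(self, capacities):
        # capacities: queue name -> maximum number of blocks (illustrative unit)
        self.capacities = dict(capacities)
        self.queues = {name: OrderedDict() for name in capacities}

    def put(self, queue, key, value):
        """Insert a block and return the keys evicted from that queue."""
        q = self.queues[queue]
        q[key] = value
        q.move_to_end(key)  # newest entry sits at the most-recently-used end
        evicted = []
        while len(q) > self.capacities[queue]:
            old_key, _ = q.popitem(last=False)  # evict from this queue's LRU end only
            evicted.append(old_key)
        return evicted

    def get(self, queue, key):
        """Return the cached value and refresh its recency, or None on a miss."""
        q = self.queues[queue]
        if key not in q:
            return None
        q.move_to_end(key)
        return q[key]


cache = MultiQueueCache({"index": 2, "data": 2})
cache.put("index", "i1", b"idx")
cache.put("data", "d1", b"a")
cache.put("data", "d2", b"b")
evicted = cache.put("data", "d3", b"c")  # overflow in "data" never touches "index"
```

In this sketch a burst of inserts into one queue can only evict entries from that same queue, so a large scan of ordinary data cannot push out entries held in another queue.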
The cache eviction mechanism is crucial for file cache management, determining how to select data for eviction when space is limited. Doris' eviction mechanism includes the following triggers and selection strategies:
Eviction Triggers:
Eviction Target Selection:
Eviction Avoidance Recommendations:
Cache warm-up preloads data into cache to accelerate subsequent queries. Doris provides multiple warm-up approaches:
Job status (FINISHED, CANCELLED, RUNNING) can be checked via SHOW WARM UP JOB, including progress for running jobs. Repeated warm-ups of the same tables/partitions do not re-download existing data; they only perform incremental updates.
[…] WARM UP SQL. Instead of running once, these tasks periodically and incrementally sync the specified tables/partitions from one cluster to another.
During queries, file cache reduces remote storage access and accelerates data retrieval:
During imports, file cache prepares data for subsequent queries:
Compaction optimizes storage and query performance by merging small files. Doris has two types:
Cache handling during compaction:
Setting enable_file_cache_keep_base_compaction_output = true keeps base compaction output in the file cache, but this may evict other hot data. Future versions plan to add adaptive strategies that use historical query statistics to decide whether compaction output should be inserted into the cache.
Post-restart cache loading is critical for recovering cache state and responding to queries quickly. Before v3.1, LRU information was not persisted, so queue ordering after a restart did not match the ordering before it, which hurt hit rates.
v3.1 introduces LRU persistence:
Scaling operations are common in cluster management. Doris handles file cache during scaling as follows:
[…] curl "http://BE_IP:WEB_PORT/api/file_cache?op=reset&capacity=123456" to notify BEs.
[…] reset operation. When the new capacity is below the current cache size, eviction follows the standard mechanisms.
[…] doris_fe_tablet_num metrics: when the curve stabilizes, warm-up is complete.
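When the reset call needs to be issued against several BEs, it can help to build the URL programmatically instead of hand-editing the curl command. The sketch below only constructs the URL shown in the text. The helper name `build_reset_url`, the address `127.0.0.1`, and the port `8040` are placeholders, and the unit of `capacity` is whatever the endpoint expects.

```python
from urllib.parse import urlencode


def build_reset_url(be_ip, web_port, capacity):
    """Build the file cache reset URL for one BE (hypothetical helper)."""
    query = urlencode({"op": "reset", "capacity": capacity})
    return f"http://{be_ip}:{web_port}/api/file_cache?{query}"


# Placeholder address and port; substitute each BE's real values.
url = build_reset_url("127.0.0.1", 8040, 123456)
```

The resulting string can be passed to curl in quotes, or fetched with `urllib.request.urlopen(url)`, assuming the endpoint accepts a plain GET as the curl form in the text suggests. Quoting matters on the command line because an unquoted `&` makes most shells run the command in the background and drop the `capacity` parameter.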