In OLAP systems, when performing ETL or large ad-hoc queries, a significant amount of data needs to be read. To speed up data analysis, Doris internally uses multithreading to scan multiple disk files in parallel, which generates a large amount of disk I/O and can negatively impact other queries, such as report analysis.
By using Workload Groups, you can place offline ETL jobs and online report queries in separate groups and cap the I/O bandwidth of the offline group, which reduces its impact on online report analysis.
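A minimal sketch of that setup, driven through the mysql client. The FE host, query port 9030 and user root are placeholders, the group name g2 matches the one used later in this test, and the helper function names are hypothetical. It also assumes the workload_group session variable and the read_bytes_per_second property are available in your Doris version.

```shell
#!/usr/bin/env bash
# Placeholder connection settings; override through the environment.
FE_HOST="${FE_HOST:-127.0.0.1}"
FE_QUERY_PORT="${FE_QUERY_PORT:-9030}"
DORIS_USER="${DORIS_USER:-root}"

# Hypothetical helper: run the SQL read from stdin against the FE.
doris_sql() {
  mysql -h "$FE_HOST" -P "$FE_QUERY_PORT" -u "$DORIS_USER"
}

# Create a dedicated group for offline ETL whose local scan reads
# are capped at 100 MB/s (104857600 bytes per second).
create_etl_group() {
  doris_sql <<'SQL'
CREATE WORKLOAD GROUP IF NOT EXISTS g2 PROPERTIES ('read_bytes_per_second' = '104857600');
SQL
}

# Run one ETL statement with the session bound to group g2.
run_in_etl_group() {
  printf "SET workload_group = 'g2';\n%s\n" "$1" | doris_sql
}
```

Online report queries keep running in their own group, so only sessions that switch to g2 are throttled.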
Test environment: 1 FE and 1 BE (96 cores), using the ClickBench data set.
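First, test without an I/O limit. Clear the OS cache and disable the BE page cache:

# Clear the OS page cache
sync; echo 3 > /proc/sys/vm/drop_caches
# Disable the BE page cache (BE configuration item)
disable_storage_page_cache = true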
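Run a full-table scan with dry_run_query enabled, so that only the row count is returned to the client:

set dry_run_query = true;
select * from hits.hits;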
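The preparation and the scan can be wrapped in shell functions so the run is repeatable. This is a sketch: the connection settings are placeholders, the function names are hypothetical, and dropping the OS cache requires root. The BE page cache setting still has to be changed in the BE configuration separately.

```shell
#!/usr/bin/env bash
# Placeholder connection settings; override through the environment.
FE_HOST="${FE_HOST:-127.0.0.1}"
FE_QUERY_PORT="${FE_QUERY_PORT:-9030}"
DORIS_USER="${DORIS_USER:-root}"

# Drop the OS page cache so the next scan reads from disk (root only).
drop_os_cache() {
  if [[ "$(id -u)" -ne 0 ]]; then
    echo "drop_os_cache: must run as root" >&2
    return 1
  fi
  sync
  echo 3 > /proc/sys/vm/drop_caches
}

# Full scan of hits.hits in one session; dry_run_query makes the FE
# return only the row count instead of sending every row.
run_full_scan() {
  mysql -h "$FE_HOST" -P "$FE_QUERY_PORT" -u "$DORIS_USER" -e \
    "set dry_run_query = true; select * from hits.hits;"
}
```

Usage, as root on the BE host: drop_os_cache && run_full_scan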
While the query runs, sample the group's local scan rate from the workload_group_resource_usage system table:

mysql [information_schema]>select LOCAL_SCAN_BYTES_PER_SECOND / 1024 / 1024 as mb_per_sec from workload_group_resource_usage where WORKLOAD_GROUP_ID=11201;
+--------------------+
| mb_per_sec         |
+--------------------+
| 1146.6208400726318 |
+--------------------+
1 row in set (0.03 sec)

mysql [information_schema]>select LOCAL_SCAN_BYTES_PER_SECOND / 1024 / 1024 as mb_per_sec from workload_group_resource_usage where WORKLOAD_GROUP_ID=11201;
+--------------------+
| mb_per_sec         |
+--------------------+
| 3496.2762966156006 |
+--------------------+
1 row in set (0.04 sec)

mysql [information_schema]>select LOCAL_SCAN_BYTES_PER_SECOND / 1024 / 1024 as mb_per_sec from workload_group_resource_usage where WORKLOAD_GROUP_ID=11201;
+--------------------+
| mb_per_sec         |
+--------------------+
| 2192.7690029144287 |
+--------------------+
1 row in set (0.02 sec)
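Instead of re-running the statement by hand, the same query can be sampled in a loop. A sketch assuming the mysql client and placeholder connection settings; 11201 is the group ID from this test and will differ on other clusters, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Placeholder connection settings and group ID; adjust for your cluster.
FE_HOST="${FE_HOST:-127.0.0.1}"
FE_QUERY_PORT="${FE_QUERY_PORT:-9030}"
DORIS_USER="${DORIS_USER:-root}"
GROUP_ID="${GROUP_ID:-11201}"

# Print the group's current local scan rate in MB/s (-N hides the header).
scan_rate_mb() {
  mysql -h "$FE_HOST" -P "$FE_QUERY_PORT" -u "$DORIS_USER" -N -e \
    "select LOCAL_SCAN_BYTES_PER_SECOND / 1024 / 1024 from information_schema.workload_group_resource_usage where WORKLOAD_GROUP_ID=${GROUP_ID};"
}

# Take COUNT samples, INTERVAL seconds apart (defaults: 3 samples, 1 s).
sample_scan_rate() {
  local count="${1:-3}" interval="${2:-1}" i
  for ((i = 1; i <= count; i++)); do
    printf 'sample %d: %s MB/s\n' "$i" "$(scan_rate_mb)"
    sleep "$interval"
  done
}
```

For example, sample_scan_rate 5 2 prints five readings taken two seconds apart while the scan from the previous step is running.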
Check the I/O with pidstat as well. In the pidstat output shown in the screenshot, the first column is the process ID and the second is the I/O rate in kB/s; the BE reads at about 2 GB/s.
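A sketch of the pidstat check. It assumes sysstat is installed and that the BE process can be found with pgrep -f doris_be (the process name may differ in your deployment); -d selects per-process disk I/O statistics. The hypothetical helper kb_to_gb_per_sec converts a kB/s reading into GB/s for comparison with the figure above.

```shell
#!/usr/bin/env bash
# Convert a kB/s reading to GB/s (1 GB = 1024 * 1024 kB).
kb_to_gb_per_sec() {
  awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / 1024 / 1024 }'
}

# Report disk I/O of the BE process once per second, five times.
# Assumes the first process matching "doris_be" is the BE.
watch_be_io() {
  local pid
  pid="$(pgrep -f doris_be | head -n 1)"
  if [[ -z "$pid" ]]; then
    echo "doris_be process not found" >&2
    return 1
  fi
  pidstat -d -p "$pid" 1 5
}
```

For example, kb_to_gb_per_sec 2097152 prints 2.00.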
Next, test with an I/O limit. Clear the caches again:

# Clear the OS page cache
sync; echo 3 > /proc/sys/vm/drop_caches
# Disable the BE page cache (BE configuration item)
disable_storage_page_cache = true
Limit the read bandwidth of workload group g2 to 100 MB/s (104857600 bytes per second):

alter workload group g2 properties('read_bytes_per_second'='104857600');
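read_bytes_per_second takes a value in bytes per second, so 104857600 is 100 * 1024 * 1024, i.e. 100 MB/s in the same units the system-table query reports. A small, hypothetical shell helper can build the statement from a MB/s figure instead of computing the byte count by hand:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an ALTER statement that caps a workload
# group's local read bandwidth at the given number of MB/s.
read_limit_stmt() {
  local group="$1" mb_per_sec="$2"
  local bytes=$(( mb_per_sec * 1024 * 1024 ))
  printf "alter workload group %s properties('read_bytes_per_second'='%d');\n" \
    "$group" "$bytes"
}

read_limit_stmt g2 100
```

With g2 and 100 it prints exactly the statement used in this test.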
With the limit in place, the scan rate reported by the system table stays at about 98 MB/s, just under the 100 MB/s cap:

mysql [information_schema]>select LOCAL_SCAN_BYTES_PER_SECOND / 1024 / 1024 as mb_per_sec from workload_group_resource_usage where WORKLOAD_GROUP_ID=11201;
+--------------------+
| mb_per_sec         |
+--------------------+
|  97.94296646118164 |
+--------------------+
1 row in set (0.03 sec)

mysql [information_schema]>select LOCAL_SCAN_BYTES_PER_SECOND / 1024 / 1024 as mb_per_sec from workload_group_resource_usage where WORKLOAD_GROUP_ID=11201;
+--------------------+
| mb_per_sec         |
+--------------------+
|  98.37584781646729 |
+--------------------+
1 row in set (0.04 sec)

mysql [information_schema]>select LOCAL_SCAN_BYTES_PER_SECOND / 1024 / 1024 as mb_per_sec from workload_group_resource_usage where WORKLOAD_GROUP_ID=11201;
+--------------------+
| mb_per_sec         |
+--------------------+
|  98.06641292572021 |
+--------------------+
1 row in set (0.02 sec)
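The readings can also be checked against the cap mechanically. A sketch using awk; within_limit is a hypothetical helper, the sample values are the three readings above, and the 100 MB/s limit comes from the ALTER statement.

```shell
#!/usr/bin/env bash
# Succeed only if every sample (MB/s) is at or below the limit (MB/s).
within_limit() {
  local limit="$1"; shift
  printf '%s\n' "$@" |
    awk -v limit="$limit" 'BEGIN { bad = 0 } $1 + 0 > limit + 0 { bad = 1 } END { exit bad }'
}

if within_limit 100 97.94296646118164 98.37584781646729 98.06641292572021; then
  echo "all samples within 100 MB/s"
else
  echo "limit exceeded"
fi
```

Run against the unthrottled readings from the first test (1146, 3496 and 2192 MB/s), the same check fails.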