Doris supports analyzing workloads in the cluster through the Workload system tables, addressing the following issues:
All tables are in the databaseinformation_schema.
active_queries records quereis in FE:
A query is typically divided into multiple fragments and executed on multiple BEs. The backend_active_tasks represents the total amount of CPU and memory resources used by a query on a single BE. If the query has multiple concurrent fragments on a single BE, the data will be aggregated into a single row. Detailed information about the fields is as follows:
workload_group_resource_usagetable provides real-time information on the current resource usage of Workload Groups. The field descriptions are as follows:
When monitoring indicates a decrease in cluster availability, you can follow these steps to address the issue:
:::tip Note that the active_queries table records queries running on the FE, while the backend_active_tasks table records queries running on the BE. Not all queries are registered with the FE during execution; for example, stream loads are not registered with the FE. Therefore, it is normal to get no matching results when performing a LEFT JOIN between backend_active_tasks and active_queries.
When a query is a SELECT query, the queryId recorded in both active_queries and backend_active_tasks is the same. When a query is a stream load, the queryId in the active_queries table is empty, while the queryId in backend_active_tasks is the ID of the stream load. :::
select be_id,workload_group_id,memory_usage_bytes,cpu_usage_percent,local_scan_bytes_per_second from workload_group_resource_usage order by memory_usage_bytes,cpu_usage_percent,local_scan_bytes_per_second desc
Cpu usage TopN Sql.
select
t1.query_id as be_query_id,
t1.query_type,
t2.query_id,
t2.workload_group_id,
t2.`database`,
t1.cpu_time,
t2.`sql`
from
(select query_id, query_type,sum(task_cpu_time_ms) as cpu_time from backend_active_tasks group by query_id, query_type)
t1 left join active_queries t2
on t1.query_id = t2.query_id
order by cpu_time desc limit 10;
Memory usage TopN Sql.
select
t1.query_id as be_query_id,
t1.query_type,
t2.query_id,
t2.workload_group_id,
t1.mem_used
from
(select query_id, query_type, sum(current_used_memory_bytes) as mem_used from backend_active_tasks group by query_id, query_type)
t1 left join active_queries t2
on t1.query_id = t2.query_id
order by mem_used desc limit 10;
Scan bytes/rows TopN Sql.
select
t1.query_id as be_query_id,
t1.query_type,
t2.query_id,
t2.workload_group_id,
t1.scan_rows,
t1.scan_bytes
from
(select query_id, query_type, sum(scan_rows) as scan_rows,sum(scan_bytes) as scan_bytes from backend_active_tasks group by query_id,query_type)
t1 left join active_queries t2
on t1.query_id = t2.query_id
order by scan_rows desc,scan_bytes desc limit 10;
Show workload group's scan rows/bytes.
select
t2.workload_group_id,
sum(t1.scan_rows) as wg_scan_rows,
sum(t1.scan_bytes) as wg_scan_bytes
from
(select query_id, sum(scan_rows) as scan_rows,sum(scan_bytes) as scan_bytes from backend_active_tasks group by query_id)
t1 left join active_queries t2
on t1.query_id = t2.query_id
group by t2.workload_group_id
order by wg_scan_rows desc,wg_scan_bytes desc
Show workload group's query queue details.
select
workload_group_id,
query_id,
query_status,
now() - queue_start_time as queued_time
from
active_queries
where query_status='queued'
order by workload_group_id