Workload Fusing is typically applied in scenarios where certain queries excessively consume resources, leading to a decline in the overall availability of the online cluster. In such cases, the cluster can be restored to normal operation by detecting and blocking the abnormal big queries during runtime.
If users have already grouped their workload based on query latency, such as assigning big queries to one Workload Group and small queries to another, the big query circuit breaking method can also be used to improve the availability of the small query group.
Currently, the big query circuit breaking strategy is mainly implemented through the Workload Policy feature, allowing users to configure specific policies to achieve automatic circuit breaking of big queries.
1FE,1BE(96 cores), test data is clickbench.
|User=root|Ctl=internal|Db=hits|State=EOF|ErrorCode=0|ErrorMessage=|Time(ms)=4614|ScanBytes=13107527680|ScanRows=81032736|ReturnRows=11|StmtId=526|QueryId=e5b6c62d624146e4-b7291221492a7cc2|IsQuery=true|isNereids=true|feIp=10.16.10.8|StmtType=SELECT|Stmt=SELECT REGEXP_REPLACE(Referer, '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length(Referer)) AS l, COUNT(*) AS c, MIN(Referer) FROM hits WHERE Referer <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25|CpuTimeMS=105631|ShuffleSendBytes=0|ShuffleSendRows=0|SqlHash=b03d48a7e6849912003ad1cff9519957|peakMemoryBytes=8741352477|SqlDigest=|cloudClusterName=UNKNOWN|TraceId=|WorkloadGroup=normal|FuzzyVariables=|scanBytesFromLocalStorage=0|scanBytesFromRemoteStorage=0
create workload policy cancel_1g_query conditions(query_be_memory_bytes > 1073741824) actions(cancel_query); mysql [information_schema]>select * from workload_policy; +-------+-----------------+------------------------------------+--------------+----------+---------+---------+----------------+ | ID | NAME | CONDITION | ACTION | PRIORITY | ENABLED | VERSION | WORKLOAD_GROUP | +-------+-----------------+------------------------------------+--------------+----------+---------+---------+----------------+ | 11313 | cancel_1g_query | query_be_memory_bytes > 1073741824 | cancel_query | 0 | 1 | 0 | | +-------+-----------------+------------------------------------+--------------+----------+---------+---------+----------------+ 1 row in set (0.02 sec)
mysql [information_schema]>SELECT REGEXP_REPLACE(Referer, '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length(Referer)) AS l, COUNT(*) AS c, MIN(Referer) FROM hits.hits WHERE Referer <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25; ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]query ceb0553d359e454f-8939599b2e602d07 cancelled by workload policy: cancel_1g_query, id:11313
// disable other policy first.
alter workload policy cancel_1g_query properties('enabled'='false');
create workload policy canel_scan_5kw_query conditions(be_scan_rows > 50000000) actions(cancel_query);
mysql [hits]>SELECT REGEXP_REPLACE(Referer, '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length(Referer)) AS l, COUNT(*) AS c, MIN(Referer) FROM hits.hits WHERE Referer <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25; ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]query f9d15001bfc94322-9b63a8b84aef9dee cancelled by workload policy: canel_scan_5kw_query, id:11314
It is important to note that be_scan_rows mainly tracks the amount of data scanned by a query on a single BE. In a production environment, the data for a table is usually distributed across multiple BEs, and the number of rows scanned by a query as seen in the audit is the sum from all BEs.
Therefore, when a query fails due to the SCAN data volume, the number of scanned rows in the audit is typically much larger than the value configured in the Policy.
// disable other policy first.
alter workload policy canel_scan_5kw_query properties('enabled'='false');
create workload policy cancel_time_1s_query conditions(query_time > 1000) actions(cancel_query);
mysql [hits]>SELECT REGEXP_REPLACE(Referer, '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length(Referer)) AS l, COUNT(*) AS c, MIN(Referer) FROM hits.hits WHERE Referer <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25; ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]query c6aebdf403e24c62-965daf0bdff179f7 cancelled by workload policy: cancel_time_1s_query, id:11315