The Workload Group solves the problem of isolation between different workload, but it cannot solve negative impact of large queries on stability within the same Group. When users encounter large queries that affect cluster stability, they can only manually handle them.
Therefore, Doris has implemented Workload Policy, which supports the automation of query load management, such as automatically canceling queries with running time exceeding 5 seconds.
Provide an example of Workload Policy
create workload policy test_cancel_policy
conditions(query_time > 1000)
actions(cancel_query)
properties('enabled'='true');
Workload Policy mainly includes the following concepts:
set_session_variable).The meaning of the policy in the above example is to create a policy named test_cancel_policy, which will cancel queries in the cluster that have been running for more than 1 second, and it's enabled. CREATE POLICY needs admin_priv.
Due to the fact that some actions can only take effect in FE and others can only take effect in BE, so the policy also needs to be distinguished between the policy of FE and the policy of BE.
set_session_variable, which means that a policy can execute multiple statements that modify the session variable.The policy used in FE is mainly used to modify the session variable, and currently does not support the set global.
Condition
Action
At present, BE's policy is mainly used for managing BE workload, such as canceling a query when the scan data volume is too large or the query time is too long.
Conditions and actions of the same policy are either used in FE or used in BE at the same time, for example, set_session_variable and cancel_query cannot be configured into the same policy. Condition be_scan_rows and condition username cannot be configured into the same policy.
Due to the current policy being executed by asynchronous threads at fixed time intervals, there is a certain lag in the effectiveness of the policy. For example, if a user has configured a strategy of canceling queries when the number of scan rows exceeds one million, and the cluster resources are relatively idle at this time, it is possible that the query may have ended before the cancel policy takes effect. At present, the time interval is 500ms, which means that queries with too short running time may bypass policy checks.
The currently supported workload types include select/insert select/stream load/broker load/route load.
A query may match multiple policies, but only the policy with the highest priority will take effect.
At present, it does not support modifying actions and conditions, and can only be modified by deleting and creating new ones.
create workload policy test_set_var_policy conditions(username='admin') actions(set_session_variable 'workload_group=normal')
create workload policy test_cancel_query conditions(be_scan_rows > 1000) actions(cancel_query)
create workload policy test_cancel_big_query conditions(query_time > 1000, be_scan_bytes > 5368709120) actions(cancel_query)
alter workload policy test_cancel_big_query properties('workload_group'='normal');
mysql [information_schema]>select * from workload_policy; +-------+-----------------------+----------------------------------------------+--------------+----------+---------+---------+----------------+ | ID | NAME | CONDITION | ACTION | PRIORITY | ENABLED | VERSION | WORKLOAD_GROUP | +-------+-----------------------+----------------------------------------------+--------------+----------+---------+---------+----------------+ | 35025 | test_cancel_big_query | query_time > 1000;be_scan_bytes > 5368709120 | cancel_query | 0 | 1 | 1 | normal | +-------+-----------------------+----------------------------------------------+--------------+----------+---------+---------+----------------+ 1 row in set (0.03 sec)
drop workload policy test_cancel_3s_query;
Attempt to modify concurrency related parameters in the session variable of the admin user.
// show variable parallel_fragment_exec_instance_num of admin user. mysql [(none)]>show variables like '%parallel_fragment_exec_instance_num%'; +-------------------------------------+-------+---------------+---------+ | Variable_name | Value | Default_Value | Changed | +-------------------------------------+-------+---------------+---------+ | parallel_fragment_exec_instance_num | 8 | 8 | 0 | +-------------------------------------+-------+---------------+---------+ 1 row in set (0.00 sec) // create a policy which reset session variable. create workload policy test_set_var_policy conditions(username='admin') actions(set_session_variable 'parallel_fragment_exec_instance_num=1') // After a while, check the session variable of the admin. mysql [(none)]>show variables like '%parallel_fragment_exec_instance_num%'; +-------------------------------------+-------+---------------+---------+ | Variable_name | Value | Default_Value | Changed | +-------------------------------------+-------+---------------+---------+ | parallel_fragment_exec_instance_num | 1 | 8 | 1 | +-------------------------------------+-------+---------------+---------+ 1 row in set (0.01 sec)
Test fusing queries that have run for more than 3 seconds. The following is the audit log of a successful execution of q29 in Clickbench. It can be seen that it takes 4.5 seconds for this SQL to run.
mysql [hits]>SELECT REGEXP_REPLACE(Referer, '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length(Referer)) AS l, COUNT(*) AS c, MIN(Referer) FROM hits WHERE Referer <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25; +-----------------------------------------------------------------------+------------------+----------+---------------------------------------------------------------------------------------------------------------------+ | k | l | c | min(Referer) | +-----------------------------------------------------------------------+------------------+----------+---------------------------------------------------------------------------------------------------------------------+ | 1 | 85.4611926713085 | 67259319 | http://%26ad%3D1%25EA%25D0%26utm_source=web&cd=19590&input_onlist/би-2 место будущей кондицин | | http:%2F%2Fwwww.regnancies/search&evL8gE&where=all&filmId=bEmYZc_WTDE | 69 | 207347 | http:%2F%2Fwwww.regnancies/search&evL8gE&where=all&filmId=bEmYZc_WTDE | | http://новострашная | 31 | 740277 | http://новострашная | | http://loveche.html?ctid | 24 | 144901 | http://loveche.html?ctid | | http://rukodeliveresult | 23 | 226135 | http://rukodeliveresult | | http://holodilnik.ru | 20 | 133893 | http://holodilnik.ru | | http://smeshariki.ru | 20 | 210736 | http://smeshariki.ru | | http:%2F%2Fviewtopic | 20 | 391115 | http:%2F%2Fviewtopic | | http:%2F%2Fwwww.ukr | 19 | 655178 | http:%2F%2Fwwww.ukr | | http:%2F%2FviewType | 19 | 148907 | http:%2F%2FviewType | | http://state=2008 | 17 | 139630 | http://state=2008 | +-----------------------------------------------------------------------+------------------+----------+---------------------------------------------------------------------------------------------------------------------+ 11 rows in set (4.50 sec)
Create a policy that cancels queries after running for more than 3 seconds
create workload policy test_cancel_3s_query conditions(query_time > 3000) actions(cancel_query)
Executing SQL again will result in a direct error message.
mysql [hits]>SELECT REGEXP_REPLACE(Referer, '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length(Referer)) AS l, COUNT(*) AS c, MIN(Referer) FROM hits WHERE Referer <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25; ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.8)[CANCELLED]query canceled by workload policy,id:12345