The Distribute hint is used to control the shuffle method for joins.
[shuffle] or [broadcast], and should be written before the right table in the Join.Used in Combination with Ordered Hint
Fix the Join order to the textual sequence, and then specify the expected Distribute method for the Join. For example:
Before using:
mysql> explain shape plan select count(*) from t1 join t2 on t1.c1 = t2.c2; +----------------------------------------------------------------------------------+ | Explain String(Nereids Planner) | +----------------------------------------------------------------------------------+ | PhysicalResultSink | | --hashAgg[GLOBAL] | | ----PhysicalDistribute[DistributionSpecGather] | | ------hashAgg[LOCAL] | | --------PhysicalProject | | ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() | | ------------PhysicalProject | | --------------PhysicalOlapScan[t1] | | ------------PhysicalDistribute[DistributionSpecHash] | | --------------PhysicalProject | | ----------------PhysicalOlapScan[t2] | +----------------------------------------------------------------------------------+
After using:
mysql> explain shape plan select /*+ ordered */ count(*) from t2 join[broadcast] t1 on t1.c1 = t2.c2; +----------------------------------------------------------------------------------+ | Explain String(Nereids Planner) | +----------------------------------------------------------------------------------+ | PhysicalResultSink | | --hashAgg[GLOBAL] | | ----PhysicalDistribute[DistributionSpecGather] | | ------hashAgg[LOCAL] | | --------PhysicalProject | | ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() | | ------------PhysicalProject | | --------------PhysicalOlapScan[t2] | | ------------PhysicalDistribute[DistributionSpecReplicated] | | --------------PhysicalProject | | ----------------PhysicalOlapScan[t1] | | | | Hint log: | | Used: ORDERED | | UnUsed: | | SyntaxError: | +----------------------------------------------------------------------------------+
The Explain Shape Plan will display information related to the Distribute operator. Specifically:
DistributionSpecReplicated indicates that the corresponding data will be replicated to all BE nodes.DistributionSpecGather indicates that the data will be gathered to the FE node.DistributionSpecHash indicates that the data will be distributed to different BE nodes based on a specific hashKey and algorithm. Used in Combination with Leading HintWhen writing SQL queries, you can specify the corresponding DISTRIBUTE method for each JOIN operation while using the LEADING hint. Below is a specific example demonstrating how to mix Distribute Hint and Leading Hint in an SQL query.
explain shape plan select nation, o_year, sum(amount) as sum_profit from ( select /*+ leading(orders shuffle {lineitem shuffle part} shuffle {supplier broadcast nation} shuffle partsupp) */ n_name as nation, extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount from part, supplier, lineitem, partsupp, orders, nation where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like '%green%' ) as profit group by nation, o_year order by nation, o_year desc;
The Distribute hint is a commonly used hint for controlling the join shuffle method, allowing manual specification of shuffle or broadcast distribution methods. Proper use of the Distribute hint can meet on-site tuning needs for join shuffle methods, increasing the flexibility of system control.