tree 9ae71059cd8ef6656bd559e3f7344b86de9f660b
parent a6632ffa16f6907eba96e745920d571924bf4b63
author fred-db <fredrik.klauss@databricks.com> 1715364139 -0700
committer Dongjoon Hyun <dhyun@apple.com> 1715364139 -0700

[SPARK-48144][SQL] Fix `canPlanAsBroadcastHashJoin` to respect shuffle join hints

### What changes were proposed in this pull request?

* Currently, `canPlanAsBroadcastHashJoin` incorrectly reports that a join can be planned as a BHJ even when the join carries a SHJ hint.
* To fix this, add logic that checks whether the join contains a SHJ hint before checking whether the join can be broadcast.
* Also lightly refactored `JoinSelectionHelperSuite` to make it more readable.
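
The ordering of the two checks can be illustrated with a minimal, self-contained sketch. This is a toy model and not the code touched by this PR: `ToyHint`, `ToyJoin`, `ToyJoinSelection`, and the size threshold are hypothetical names introduced only for illustration, and the model attaches a single hint to the whole join, ignoring how hints on the two sides interact.

```scala
// Toy model only: these types are hypothetical and are not Spark classes.
sealed trait ToyHint
case object NoHint extends ToyHint
case object ShuffleHashHint extends ToyHint

// A join described only by the estimated size of each side and one hint.
final case class ToyJoin(leftBytes: Long, rightBytes: Long, hint: ToyHint)

object ToyJoinSelection {
  // Hypothetical stand-in for a size-based broadcast threshold (10 MiB).
  val broadcastThreshold: Long = 10L * 1024 * 1024

  // True if at least one side is small enough to be broadcast.
  private def someSideFits(join: ToyJoin): Boolean =
    join.leftBytes <= broadcastThreshold || join.rightBytes <= broadcastThreshold

  // Behavior described as the bug: the hint is never consulted.
  def canPlanAsBroadcastBefore(join: ToyJoin): Boolean =
    someSideFits(join)

  // Behavior described as the fix: check for the shuffle hint first,
  // and only then fall back to the broadcast eligibility check.
  def canPlanAsBroadcastAfter(join: ToyJoin): Boolean =
    join.hint != ShuffleHashHint && someSideFits(join)
}
```

In this sketch, a join with a small side and a `ShuffleHashHint` is reported as broadcastable by the "before" variant but not by the "after" variant, while joins without a hint are treated identically by both.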

### Why are the changes needed?

* `canPlanAsBroadcastHashJoin` should be in sync with the join selection in `SparkStrategies`. Currently, it is not in sync.

### Does this PR introduce _any_ user-facing change?

Yes. Semi / anti joins that cannot be planned as broadcast joins are no longer pushed through aggregates. This should generally improve performance.

### How was this patch tested?

* Added unit tests to check that a join with a SHJ hint is not reported as plannable as a BHJ.
* Added tests to keep `canPlanAsBroadcastHashJoin` and the `JoinSelection` codepath in sync.

### Was this patch authored or co-authored using generative AI tooling?

* No

Closes #46401 from fred-db/fix-hint.

Authored-by: fred-db <fredrik.klauss@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
