tree a9e4179dd2f0f2fef9a604566f9a64771939903f
parent 9f5bd72e908244b2fe915e8dc39f55afa94bbffa
author Stamatis Zampetakis <zabetak@gmail.com> 1619090414 +0200
committer GitHub <noreply@github.com> 1619090414 +0200
gpgsig -----BEGIN PGP SIGNATURE-----
 
 wsBcBAABCAAQBQJggVvuCRBK7hj4Ov3rIwAAwE8IAKdhyyTJ9yr6lva4fyLP7aJd
 /1lhHodjUQJBUjqFP+Kf/mosZz1uA1RrdaroO8Sy3KuSKwcn4bYiRDGYc1xSTsNL
 rRI6FbJFcl7bude2ELckCdvMLQy1E6edyuf9XV9hOVcKFcG/WfBV3e/nh3h1bS5p
 53cocj6n/DnK8Xgqh+3wtnc6o6IoEv9UAwRHZfw2UqI2hr2flYq16plh331M50I8
 c6nQ7EQOOTo20od7LYKiOFt7IzDHnvWeEBLhQt/zEyOsdl1lZHdG2NZZq848kMNV
 sM5BIAhaqovlJjpp33Phfqq480MKqF+wUmXF8DYlNgLG8hEHrp8T3javideuiNo=
 =QO50
 -----END PGP SIGNATURE-----
 

HIVE-24957 HIVE-24999: Inefficient & wrong CBO plans in the presence of subqueries (Stamatis Zampetakis, reviewed by Krisztian Kasa)

* HIVE-24999: HiveSubQueryRemoveRule generates invalid plan for IN subquery with correlations

1. Add a workaround for CALCITE-4574 in HiveRelBuilder to avoid generating
invalid plans (filters that reference columns which do not exist).

2. Adapt HiveRelDecorrelator to the new plans generated by HiveSubQueryRemoveRule.

2a. Remove the getNewForOldInputRef workaround, which was only needed
because of the invalid plans.

2b. Adapt input references based on the new input operator (frame) inside
the decorrelateInputWithValueGenerator method.

2c. Refactor DecorrelateRexShuttle#visitCall to improve readability and
cover a few more corner cases.

3. Add subquery_in_invalid_intermediate_plan.q with queries that trigger
the problematic plans.

4. Add CBO explain plans to queries related to masking since they are
easier to read and compare. There are a few plan regressions that
will be fixed by HIVE-24957.
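Item 1 refers to invalid plans in which a filter references columns that
do not exist in its input. The invariant such plans break can be sketched
with a toy expression model. This is a hypothetical illustration, not
Calcite's or Hive's API: FilterRefCheck, Expr, InputRef, Literal, Call and
refsAreValid are invented names, and the check only inspects positional
column references.

```java
import java.util.List;

/**
 * Hypothetical, simplified model (not Calcite's RexNode API): an expression
 * tree whose leaves reference input columns by position.
 */
class FilterRefCheck {

    interface Expr {}

    /** Reference to the column at position {@code index} of the filter's input. */
    record InputRef(int index) implements Expr {}

    /** Constant value, e.g. an integer literal. */
    record Literal(Object value) implements Expr {}

    /** Operator call such as =, AND or COALESCE, with its operands. */
    record Call(String op, List<Expr> operands) implements Expr {}

    /**
     * Returns true when every input reference in {@code condition} points to
     * a column that exists in an input with {@code inputFieldCount} columns.
     */
    static boolean refsAreValid(Expr condition, int inputFieldCount) {
        if (condition instanceof InputRef ref) {
            return ref.index() >= 0 && ref.index() < inputFieldCount;
        }
        if (condition instanceof Call call) {
            for (Expr operand : call.operands()) {
                if (!refsAreValid(operand, inputFieldCount)) {
                    return false;
                }
            }
        }
        // Literals (and calls whose operands all passed) reference no missing column.
        return true;
    }

    public static void main(String[] args) {
        // Filter over a 2-column input: $0 = 10 is valid.
        Expr ok = new Call("=", List.of(new InputRef(0), new Literal(10)));
        // $3 refers to a fourth column that the input does not have.
        Expr broken = new Call("=", List.of(new InputRef(3), new Literal(10)));
        System.out.println(refsAreValid(ok, 2));     // true
        System.out.println(refsAreValid(broken, 2)); // false
    }
}
```

A real planner performs this kind of validation on its own expression
classes; the toy model only shows what "references to columns which do
not exist" means.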

* HIVE-24957: Wrong results when subquery has COALESCE in correlation predicate

1. Add plan transformations that run before the core RelDecorrelator logic
and bring the plan into an equivalent but more convenient form, which
decorrelates into more efficient and correct plans.

2. Adapt HiveRelDecorrelator#decorrelateInputWithValueGenerator to avoid
creating a value generator for correlations that are already satisfied in
the input.

3. As a result of the changes above, many plans with subqueries become more
efficient since the value generator is no longer necessary and is dropped.

4. Add subquery_complex_correlation_predicates.q, which includes queries
that generate wrong results without the new transformations.

5. Add CBO plans to a few queries since they are easier to read and make it
simpler to reason about correctness and efficiency.
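Item 2 can be pictured with a deliberately simplified model. This is a
hypothetical sketch, not Hive's implementation: ValueGeneratorCheck,
CorEquality, Result and analyze are invented names, and the sketch assumes
that only a plain equality "$corN.field = $column" inside the input counts
as an already satisfied correlation. Such fields can be read from the input
column directly; only the remaining ones would still need a value generator
(roughly, the distinct outer values joined with the input). An equality
wrapped in a function such as COALESCE is not treated as satisfied here.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of the idea (not Hive's or Calcite's API). */
class ValueGeneratorCheck {

    /** Plain equality between a correlated field (e.g. "$cor0.deptno") and an input column. */
    record CorEquality(String corField, int inputColumn) {}

    /** Correlations satisfied by the input, and those that still need a value generator. */
    record Result(Map<String, Integer> satisfied, Set<String> needValueGenerator) {}

    static Result analyze(Set<String> requiredCorFields, List<CorEquality> inputEqualities) {
        Map<String, Integer> satisfied = new LinkedHashMap<>();
        for (CorEquality eq : inputEqualities) {
            if (requiredCorFields.contains(eq.corField())) {
                // Keep the first input column found for each correlated field.
                satisfied.putIfAbsent(eq.corField(), eq.inputColumn());
            }
        }
        // Whatever is not satisfied by the input would need a value generator.
        Set<String> remaining = new LinkedHashSet<>(requiredCorFields);
        remaining.removeAll(satisfied.keySet());
        return new Result(satisfied, remaining);
    }

    public static void main(String[] args) {
        Set<String> required = new LinkedHashSet<>(List.of("$cor0.deptno", "$cor0.job"));
        // The subquery input already filters on $cor0.deptno = $1.
        List<CorEquality> equalities = List.of(new CorEquality("$cor0.deptno", 1));
        Result r = analyze(required, equalities);
        System.out.println(r.satisfied());          // {$cor0.deptno=1}
        System.out.println(r.needValueGenerator()); // [$cor0.job]
    }
}
```

When needValueGenerator() comes back empty, the model needs no value
generator at all, which corresponds to the efficiency gain described in
item 3.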