tree dca6c3ad7c9847e5b2141c78993683ff1698fc42
parent 01316ad1c9d3d20633ee429ef95f159d2778a747
author wzhou-code <wzhou@cloudera.com> 1598404473 -0700
committer Impala Public Jenkins <impala-public-jenkins@cloudera.com> 1600201910 +0000

IMPALA-9636: Don't run retried query on the blacklisted nodes

A blacklisted node stays on the blacklist only for a limited period of
time. With the current implementation, a retried query can therefore
end up running on a node that was blacklisted during its original
attempt. To avoid hitting the same failure again, the retried query
should not schedule fragment instances on the blacklisted nodes that
caused the original query to fail.

When making the schedule for a retried query, this patch filters out
of the executor group the executors on nodes that were blacklisted
during the original attempt.
It also adds the new test cases
test_retry_exec_rpc_failure_before_admin_delay() and
test_retry_query_failure_all_executors_blacklisted() for retried
queries that are triggered by an RPC failure, where the blacklist
timeout is triggered by adding a delay before admission.
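
The filtering step can be pictured with a minimal Python sketch. It is
an illustration only, not the scheduler code touched by this patch:
filter_executors(), the address strings, and the variable names are all
hypothetical, and executors are modeled as plain "host:port" strings.

```python
from typing import Iterable, List, Set


def filter_executors(executor_group: Iterable[str],
                     blacklisted: Set[str]) -> List[str]:
    """Return the executors that the original attempt did not blacklist.

    Hypothetical helper: the order of the executor group is preserved and
    the input is not modified.
    """
    return [e for e in executor_group if e not in blacklisted]


# Hypothetical addresses, for illustration only.
group = ["host1:27000", "host2:27000", "host3:27000"]
blacklisted_by_first_attempt = {"host2:27000"}

# Only these executors remain candidates for the retried query.
candidates = filter_executors(group, blacklisted_by_first_attempt)
```

In this sketch, blacklisting every executor leaves an empty candidate
list, which is the situation that
test_retry_query_failure_all_executors_blacklisted() appears to target.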

Testing:
 - Passed test_query_retries.py, including the new test cases.
 - Passed core tests.

Change-Id: I00bc1b5026efbd0670ffbe57bcebc457d34cb105
Reviewed-on: http://gerrit.cloudera.org:8080/16369
Reviewed-by: Sahil Takiar <stakiar@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
