commit | faa0cd3fb38fb4e9ffb22969aed0f126f41964cb | |
---|---|---|
author | Jiajun Wang <jjwang@linkedin.com> | Wed May 26 11:29:21 2021 -0700 |
committer | GitHub <noreply@github.com> | Wed May 26 11:29:21 2021 -0700 |
tree | d2036ef849c182dc47650ed227c693222e9327db | |
parent | b12e0257f8d819ec3e7759c516f3173c729ab22f | |

Remove a potential deadlock when shutting down a RoutingTableProvider. (#1751)

This PR avoids a potential deadlock caused by a race condition when shutting down a RoutingTableProvider. The race occurs when the RoutingTableProvider is shut down while the corresponding HelixManager is processing an onLiveInstanceChange event. Both threads need two locks, the HelixManager lock and the RoutingTableProvider._lastSeenSessions lock, and they acquire them in different orders.

To resolve the problem, this PR makes two changes:

1. Remove the use of the _lastSeenSessions lock, which eliminates the deadlock. Correctness is now ensured by the AtomicReference.getAndSet method.
2. Remove the LiveInstanceChangeListener at the start of the shutdown call, before any other shutdown logic runs. This step was missing and is needed regardless of the deadlock problem.

Moreover, this change also avoids a more complicated race condition, introduced by the previous change, that may lead to CurrentStates callback handler leakage.

Test cases in TestZkCallbackHandlerLeak have been updated to validate the logic changes.

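
The idea of replacing a lock with AtomicReference.getAndSet can be illustrated with a minimal sketch. This is not the Helix implementation: the class `SessionTracker`, its method signatures, and the map layout (instance name to session ID) are assumptions made for illustration only. The point it shows is that the callback thread and the shutdown thread each swap the snapshot in one atomic step, so neither holds a second lock while working on the session data.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the session bookkeeping; not the actual Helix class.
final class SessionTracker {
  // Assumed layout: instance name -> session ID seen in the last callback.
  private final AtomicReference<Map<String, String>> lastSeenSessions =
      new AtomicReference<>(Collections.emptyMap());

  // Called from a (hypothetical) live-instance callback.
  // Returns the instances whose session is new or changed.
  Set<String> onLiveInstanceChange(Map<String, String> currentSessions) {
    Map<String, String> snapshot =
        Collections.unmodifiableMap(new HashMap<>(currentSessions));
    // Atomic swap: publish the new snapshot and obtain the previous one
    // without taking any lock on the session data.
    Map<String, String> previous = lastSeenSessions.getAndSet(snapshot);
    Set<String> changed = new HashSet<>();
    for (Map.Entry<String, String> entry : snapshot.entrySet()) {
      if (!entry.getValue().equals(previous.get(entry.getKey()))) {
        changed.add(entry.getKey());
      }
    }
    return changed;
  }

  // Called from shutdown. Returns the instances that were still tracked,
  // so the caller can release whatever it registered for them.
  Set<String> shutdown() {
    Map<String, String> previous =
        lastSeenSessions.getAndSet(Collections.emptyMap());
    return new HashSet<>(previous.keySet());
  }
}
```

Because each call to getAndSet hands the previous snapshot to exactly one caller, a concurrent callback and shutdown cannot both claim the same snapshot for cleanup. In this sketch, a callback that arrives after shutdown would still publish a new snapshot, which is one reason to deregister the listener before the rest of the shutdown logic runs.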
Helix is part of the Apache Software Foundation.
Project page: http://helix.apache.org/
Mailing list: http://helix.apache.org/mail-lists.html
mvn clean install -Dmaven.test.skip.exec=true
Helix is a generic cluster management framework used for automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. Helix provides the following features: