Remove a potential deadlock when shutting down a RoutingTableProvider. (#1751)

This PR avoids a potential deadlock caused by a race condition when shutting down a RoutingTableProvider.

The race condition happens when the RoutingTableProvider is shut down while the corresponding HelixManager is processing an onLiveInstanceChange event. Both threads need the same two locks, the HelixManager lock and RoutingTableProvider._lastSeenSessions, but acquire them in different orders.
To resolve this problem, this PR makes two changes:
1. Remove the lock on _lastSeenSessions, which eliminates the deadlock. Correctness is now ensured by the AtomicReference.getAndSet method.
2. Remove the LiveInstanceChangeListener in the shutdown call before running the rest of the shutdown logic. This step was missing and is needed regardless of the deadlock problem. Moreover, this change also avoids a more complicated race condition, introduced by the previous change, that may lead to leaked CurrentStates callback handlers.
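
The idea behind change 1 can be illustrated with a minimal sketch. This is not the actual RoutingTableProvider code: the class SessionTracker and its methods update and reset are hypothetical names chosen for illustration, and the set-of-session-IDs representation is an assumption. It only shows how AtomicReference.getAndSet can swap a snapshot without taking a second lock, so there is no lock ordering left to get wrong.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the session bookkeeping; not the actual Helix code. */
public class SessionTracker {
  // Immutable snapshot of session IDs, swapped atomically instead of being
  // guarded by a synchronized block.
  private final AtomicReference<Set<String>> lastSeenSessions =
      new AtomicReference<>(Collections.<String>emptySet());

  /**
   * Callback path: publishes the current sessions and returns the ones that
   * were not present in the previous snapshot.
   */
  public Set<String> update(Set<String> currentSessions) {
    Set<String> snapshot =
        Collections.unmodifiableSet(new HashSet<>(currentSessions));
    // getAndSet hands each caller a distinct previous snapshot, so two
    // concurrent callers never process the same "previous" value.
    Set<String> previous = lastSeenSessions.getAndSet(snapshot);
    Set<String> added = new HashSet<>(snapshot);
    added.removeAll(previous);
    return added;
  }

  /**
   * Shutdown path: clears the snapshot and returns what was tracked so the
   * caller can clean up, without acquiring any lock.
   */
  public Set<String> reset() {
    return lastSeenSessions.getAndSet(Collections.<String>emptySet());
  }
}
```

Because neither path blocks on _lastSeenSessions in this sketch, a thread that already holds another lock can call either method without creating a circular wait.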

Test cases in TestZkCallbackHandlerLeak have been updated to validate the logic changes.
2 files changed
tree: d2036ef849c182dc47650ed227c693222e9327db
  1. .github/
  2. helix-admin-webapp/
  3. helix-agent/
  4. helix-common/
  5. helix-core/
  6. helix-front/
  7. helix-lock/
  8. helix-rest/
  9. metadata-store-directory-common/
  10. metrics-common/
  11. recipes/
  12. scripts/
  13. website/
  14. zookeeper-api/
  15. .gitignore
  16. build
  17. bump-up.command
  18. deploySite.sh
  19. helix-style-intellij.xml
  20. helix-style.xml
  21. hpost-review.sh
  22. LICENSE
  23. NOTICE
  24. pom.xml
  25. README.md
README.md

Apache Helix

Helix is part of the Apache Software Foundation.

Project page: http://helix.apache.org/

Mailing list: http://helix.apache.org/mail-lists.html

Build

mvn clean install -Dmaven.test.skip.exec=true

WHAT IS HELIX

Helix is a generic cluster management framework used for automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. Helix provides the following features:

  1. Automatic assignment of resource/partition to nodes
  2. Node failure detection and recovery
  3. Dynamic addition of resources
  4. Dynamic addition of nodes to the cluster
  5. Pluggable distributed state machine to manage the state of a resource via state transitions
  6. Automatic load balancing and throttling of transitions