Adjust the auto rebalancer state assignment logic to reduce top state transitions. (#986)

The old state assignment logic assigns states to the selected nodes according to the priority of the replica state currently on each instance. Moreover, the sorting algorithm prioritizes the current top state and the current secondary states equally. As a result, mastership is handed off prematurely to a host that currently holds a secondary state before the real desired master host is ready.
For example,
1. The current states are: [N1:M, N2:S, N3:S]
2. The desired states are: [N4:M, N2:S, N1:S]
3. Due to the sorting logic based on current states, we get a transient preference list ordered as [N2, N1, N4], in which case the controller assigns master to N2 before N4 has a slave state replica.
4. When N4 finishes the Offline to Slave transition, the same sorting logic sorts the preference list as [N4, N2, N1], and we have another mastership handoff.
To be clear, we don't want the handoff in step 3, only the state transition in step 4.

In this PR, we refactor the sorting logic so that the master moves only once the candidate has a replica in a "ready" state, so only one mastership handoff happens.
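The intended behavior can be illustrated with a minimal sketch. This is not the code changed in this PR: the class TopStateSketch, the method pickMaster, the state name strings, and the rule that "ready" means the node currently holds a master or slave replica are all assumptions made for illustration. It only shows that, for the example above, the master stays on N1 until N4 has a slave replica and then moves once.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and rules here are assumptions,
// not Helix APIs and not the implementation in this change.
class TopStateSketch {
  static final String MASTER = "MASTER";
  static final String SLAVE = "SLAVE";
  static final String OFFLINE = "OFFLINE";

  // Assumption: a replica is "ready" for mastership if it is already
  // in the top state or in the secondary state.
  static boolean isReady(String state) {
    return MASTER.equals(state) || SLAVE.equals(state);
  }

  // desiredOrder: desired preference list, desired master first.
  // currentStates: node name -> current replica state (absent means offline).
  static String pickMaster(List<String> desiredOrder, Map<String, String> currentStates) {
    String desiredMaster = desiredOrder.get(0);
    if (isReady(currentStates.getOrDefault(desiredMaster, OFFLINE))) {
      return desiredMaster; // candidate is ready: hand off directly
    }
    // Candidate is not ready yet: keep the existing master, if any,
    // instead of handing off to an intermediate slave.
    for (Map.Entry<String, String> entry : currentStates.entrySet()) {
      if (MASTER.equals(entry.getValue())) {
        return entry.getKey();
      }
    }
    // No master at all: fall back to the first ready node in desired order.
    for (String node : desiredOrder) {
      if (isReady(currentStates.getOrDefault(node, OFFLINE))) {
        return node;
      }
    }
    return desiredMaster;
  }

  public static void main(String[] args) {
    List<String> desired = List.of("N4", "N2", "N1");
    // Before N4 has a slave replica: the master stays on N1.
    System.out.println(pickMaster(desired, Map.of("N1", MASTER, "N2", SLAVE, "N3", SLAVE)));
    // After N4 finishes Offline to Slave: the master moves to N4.
    System.out.println(pickMaster(desired,
        Map.of("N1", MASTER, "N2", SLAVE, "N3", SLAVE, "N4", SLAVE)));
  }
}
```

Under these assumptions the first call returns N1 and the second returns N4, so N2 never becomes master in between.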
4 files changed
tree: ee85f4ba7476bcc7e7e6fcb518bbdd12a0f1783f
  1. helix-admin-webapp/
  2. helix-agent/
  3. helix-common/
  4. helix-core/
  5. helix-front/
  6. helix-lock/
  7. helix-rest/
  8. metadata-store-directory-common/
  9. metrics-common/
  10. recipes/
  11. scripts/
  12. website/
  13. zookeeper-api/
  14. .gitignore
  15. build
  16. bump-up.command
  17. deploySite.sh
  18. helix-style-intellij.xml
  19. helix-style.xml
  20. hpost-review.sh
  21. LICENSE
  22. NOTICE
  23. pom.xml
  24. README.md
README.md

Apache Helix

Helix is part of the Apache Software Foundation.

Project page: http://helix.apache.org/

Mailing list: http://helix.apache.org/mail-lists.html

Build

mvn clean install package -DskipTests

WHAT IS HELIX

Helix is a generic cluster management framework used for automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. Helix provides the following features:

  1. Automatic assignment of resource/partition to nodes
  2. Node failure detection and recovery
  3. Dynamic addition of Resources
  4. Dynamic addition of nodes to the cluster
  5. Pluggable distributed state machine to manage the state of a resource via state transitions
  6. Automatic load balancing and throttling of transitions