Directly use Best Possible State to calculate DifferenceWithIdealStateGauge metrics instead of relying on the persisted IdealState. (#1697)

This PR removes the undesired dependency between the DifferenceWithIdealStateGauge metric and the PERSIST_XXXXXX_ASSIGNMENT configuration. Before this change, the metric reported incorrect data if the assignment was not persisted.
To fix the issue, the calculation is moved from the ExternalViewComputeStage to the BestPossibleStateCalcStage, because the Best Possible State is only available in the BestPossibleStateCalcStage when the PERSIST_XXXXXX_ASSIGNMENT option is not turned on. In addition to migrating the main logic, the following changes are required to ensure multi-thread safety.
1. Add concurrency control to the EV cache, since it is now read in the BestPossibleStateCalcStage in addition to the ExternalViewComputeStage.
2. Make minor changes to the diff computing logic for corner cases, since the Best Possible State mapping is not exactly the same as the persisted IdealState assignment.
3. Clean up the PersistAssignmentStage logic so that it no longer modifies the IdealState cache, which is supposed to be read-only.
4. Modify the test cases to cover the changes. The metric now also reports the correct result in some corner cases, such as when all the nodes are disabled. Test cases that were written against the previous, incorrect behavior have been fixed.
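
The idea behind the metric can be illustrated with a minimal sketch. This is not the code from this commit and does not use the Helix API: the class `StateDiffSketch`, the method `countDifferences`, and the plain nested maps (partition name to instance name to state) are hypothetical stand-ins for the Best Possible State and the ExternalView. The sketch assumes the gauge counts replicas whose observed state differs from the computed target state.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only; not part of Helix.
public class StateDiffSketch {

  /**
   * Counts the (partition, instance) pairs whose target state differs from
   * the observed state. A replica missing from the observed map counts as
   * a difference. Replicas present only in the observed map are ignored.
   */
  public static int countDifferences(
      Map<String, Map<String, String>> targetStates,
      Map<String, Map<String, String>> observedStates) {
    int diff = 0;
    for (Map.Entry<String, Map<String, String>> partition : targetStates.entrySet()) {
      Map<String, String> observed =
          observedStates.getOrDefault(partition.getKey(), Collections.emptyMap());
      for (Map.Entry<String, String> replica : partition.getValue().entrySet()) {
        if (!Objects.equals(replica.getValue(), observed.get(replica.getKey()))) {
          diff++;
        }
      }
    }
    return diff;
  }

  public static void main(String[] args) {
    // Target: partition p0 on n1 (MASTER) and n2 (SLAVE), p1 on n1 (SLAVE).
    Map<String, Map<String, String>> target = new HashMap<>();
    Map<String, String> p0Target = new HashMap<>();
    p0Target.put("n1", "MASTER");
    p0Target.put("n2", "SLAVE");
    target.put("p0", p0Target);
    Map<String, String> p1Target = new HashMap<>();
    p1Target.put("n1", "SLAVE");
    target.put("p1", p1Target);

    // Observed: p0/n2 is still OFFLINE and p1 has not been reported at all.
    Map<String, Map<String, String>> observed = new HashMap<>();
    Map<String, String> p0Observed = new HashMap<>();
    p0Observed.put("n1", "MASTER");
    p0Observed.put("n2", "OFFLINE");
    observed.put("p0", p0Observed);

    System.out.println(countDifferences(target, observed)); // prints 2
  }
}
```

Because the target mapping is passed in directly, a computation of this shape does not depend on whether the assignment was ever persisted, which is the dependency this change removes.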
7 files changed
tree: 8be88f72cd20a66d70483e2d659565023b7956fe
  1. .github/
  2. helix-admin-webapp/
  3. helix-agent/
  4. helix-common/
  5. helix-core/
  6. helix-front/
  7. helix-lock/
  8. helix-rest/
  9. metadata-store-directory-common/
  10. metrics-common/
  11. recipes/
  12. scripts/
  13. website/
  14. zookeeper-api/
  15. .gitignore
  16. build
  17. bump-up.command
  18. deploySite.sh
  19. helix-style-intellij.xml
  20. helix-style.xml
  21. hpost-review.sh
  22. LICENSE
  23. NOTICE
  24. pom.xml
  25. README.md
README.md

Apache Helix

Helix is part of the Apache Software Foundation.

Project page: http://helix.apache.org/

Mailing list: http://helix.apache.org/mail-lists.html

Build

mvn clean install -Dmaven.test.skip.exec=true

WHAT IS HELIX

Helix is a generic cluster management framework used for automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. Helix provides the following features:

  1. Automatic assignment of resource/partition to nodes
  2. Node failure detection and recovery
  3. Dynamic addition of Resources
  4. Dynamic addition of nodes to the cluster
  5. Pluggable distributed state machine to manage the state of a resource via state transitions
  6. Automatic load balancing and throttling of transitions