ZkClient should not keep retrying getChildren() due to a large number of children (#1109)

For ZkClient's getChildren() operation, if a node has a large number of children and the response packet exceeds the client-side jute.maxbuffer limit (4MB by default), ZkClient gets a ConnectionLossException and keeps retrying the connection to ZK. This infinite retry loop can cause heavy GC on the ZK server and eventually bring the server down.
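To get a feel for how many children it takes to hit the limit, here is a rough Java estimate. It assumes the child list is serialized as a 4-byte count followed by, for each name, a 4-byte length prefix plus the name's UTF-8 bytes, and it ignores the reply header and framing, so the real cutoff is somewhat lower. It also reads "4MB" as 4 × 1024 × 1024 bytes. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class ChildrenPayloadEstimate {
    // Client-side limit from the commit message, interpreted as 4 MiB (assumption).
    public static final long MAX_BUFFER_BYTES = 4L * 1024 * 1024;

    /** Approximate payload: 4-byte count, then per name a 4-byte length prefix plus UTF-8 bytes. */
    public static long estimateBytes(List<String> childNames) {
        long total = 4; // vector length prefix
        for (String name : childNames) {
            total += 4 + name.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    /** How many children with names of a uniform byte length fit under the limit (same assumptions). */
    public static long maxChildren(int nameLengthBytes) {
        return (MAX_BUFFER_BYTES - 4) / (4 + nameLengthBytes);
    }
}
```

For example, with 36-byte child names (the length of a UUID string), `maxChildren(36)` works out to 104,857, so roughly a hundred thousand such children are enough to overflow the buffer.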

This commit implements a workaround that exits the retry loop for getChildren() when a large number of children causes the connection loss.
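One way such a workaround could look is sketched here in Java. It is a minimal sketch, not the code from this commit: `ZkOps`, `numChildren`, `TooManyChildrenException`, `CHILDREN_LIMIT` and the `maxRetries` bound are all hypothetical, and `ConnectionLossException` is a local stand-in for ZooKeeper's exception so the example compiles on its own. On connection loss it reads the child count from node metadata (a small response, unlike the child list itself) and stops retrying if that count is above the assumed threshold.

```java
import java.util.List;

public class GetChildrenRetrySketch {

    /** Hypothetical stand-in for ZooKeeper's ConnectionLossException. */
    public static class ConnectionLossException extends Exception {
        public ConnectionLossException(String msg) { super(msg); }
    }

    /** Hypothetical minimal client view; not a real Helix or ZooKeeper interface. */
    public interface ZkOps {
        List<String> getChildren(String path) throws ConnectionLossException;

        /** Child count read from node metadata, which is a small response. */
        int numChildren(String path) throws ConnectionLossException;
    }

    /** Raised instead of retrying when the child list is judged too large to fetch. */
    public static class TooManyChildrenException extends RuntimeException {
        public TooManyChildrenException(String msg) { super(msg); }
    }

    // Hypothetical cutoff; the commit message does not state which threshold (if any) is used.
    public static final int CHILDREN_LIMIT = 100_000;

    /**
     * Retries getChildren() on connection loss, but gives up immediately when the
     * child count suggests the oversized response itself broke the connection.
     */
    public static List<String> getChildrenWithGuard(ZkOps zk, String path, int maxRetries)
            throws ConnectionLossException {
        for (int attempt = 0; ; attempt++) {
            try {
                return zk.getChildren(path);
            } catch (ConnectionLossException e) {
                int n = zk.numChildren(path);
                if (n > CHILDREN_LIMIT) {
                    throw new TooManyChildrenException(
                        "Path " + path + " has " + n + " children; not retrying getChildren()");
                }
                if (attempt >= maxRetries) {
                    throw e; // ordinary connection trouble that did not clear up
                }
                // Otherwise assume a transient disconnect and try again.
            }
        }
    }
}
```

A caller would treat `TooManyChildrenException` as a sign that the znode layout needs attention rather than as a transient error. The `maxRetries` bound is an extra assumption so the sketch always terminates; the original ZkClient behavior described above retries until connected.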
2 files changed

Apache Helix

Helix is part of the Apache Software Foundation.

Project page: http://helix.apache.org/

Mailing list: http://helix.apache.org/mail-lists.html

Build

mvn clean install package -DskipTests

WHAT IS HELIX

Helix is a generic cluster management framework used for automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. Helix provides the following features:

  1. Automatic assignment of resource/partition to nodes
  2. Node failure detection and recovery
  3. Dynamic addition of resources
  4. Dynamic addition of nodes to the cluster
  5. Pluggable distributed state machine to manage the state of a resource via state transitions
  6. Automatic load balancing and throttling of transitions
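The idea behind a pluggable state machine (feature 5) can be illustrated with a toy Java sketch: a state model is a set of allowed transitions, each with a callback, and any transition the model does not define is rejected. The class and method names are hypothetical and are not Helix's participant API; the state names are borrowed from a master/slave replication scheme purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

public class StateMachineSketch {

    /** Hypothetical pluggable state model: allowed transitions plus per-transition callbacks. */
    public static class StateModel {
        private final Map<String, Consumer<String>> handlers = new HashMap<>();

        private static String key(String from, String to) {
            return from + "->" + to;
        }

        /** Registers the callback invoked when a partition moves from one state to another. */
        public StateModel on(String from, String to, Consumer<String> handler) {
            handlers.put(key(from, to), handler);
            return this;
        }

        public boolean allows(String from, String to) {
            return handlers.containsKey(key(from, to));
        }

        /** Runs the transition for a partition, rejecting any edge the model does not define. */
        public void transition(String partition, String from, String to) {
            Consumer<String> handler = handlers.get(key(from, to));
            if (handler == null) {
                throw new IllegalStateException(
                    "Illegal transition " + key(from, to) + " for " + partition);
            }
            handler.accept(partition);
        }
    }

    /** Example model: OFFLINE <-> SLAVE <-> MASTER; each callback just records what it did. */
    public static StateModel masterSlave(StringBuilder log) {
        return new StateModel()
            .on("OFFLINE", "SLAVE", p -> log.append(p).append(":OFFLINE->SLAVE;"))
            .on("SLAVE", "MASTER", p -> log.append(p).append(":SLAVE->MASTER;"))
            .on("MASTER", "SLAVE", p -> log.append(p).append(":MASTER->SLAVE;"))
            .on("SLAVE", "OFFLINE", p -> log.append(p).append(":SLAVE->OFFLINE;"));
    }
}
```

Because a partition cannot jump straight from OFFLINE to MASTER in this model, a controller driving it has to issue OFFLINE->SLAVE first and then SLAVE->MASTER; swapping in a different `StateModel` changes the legal paths without touching the caller.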