Fixed a bug in WriteLock caused by a read-delete race on a znode.

Bug description:
T1 currently owns a zk lock, as signified by znode n1. T2 creates a znode n2
and sees that n1 is smaller, so T2 goes to register a watcher on n1. At that
same moment T1 releases n1. T2's watcher registration fails, T2 breaks out of
the while loop, and calls wait(). Nobody will ever wake up T2 again.
Consequently all subsequent callers for the same lock are also blocked.
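
The race window described above can be illustrated with a small in-memory sketch. This is not the WriteLock code and does not use the ZooKeeper API: LockDir, existsAndWatch, and LockAttempt are hypothetical names made up for this example. It shows one plausible way to close such a window, which is to re-check the lock directory when the predecessor has vanished instead of waiting without a watch; the actual change in this commit may differ.

```java
import java.util.Optional;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for the lock's parent znode (not the ZooKeeper API). */
class LockDir {
    private final TreeSet<Integer> nodes = new TreeSet<>();
    private int nextSeq = 0;

    /** Creates a sequential child and returns its sequence number. */
    int create() {
        nodes.add(nextSeq);
        return nextSeq++;
    }

    /** Deletes a child, as a lock owner does when it releases the lock. */
    void delete(int seq) {
        nodes.remove(seq);
    }

    /** Largest sequence number below seq, i.e. the node a waiter should watch. */
    Optional<Integer> predecessor(int seq) {
        return Optional.ofNullable(nodes.lower(seq));
    }

    /** Mimics exists-with-watch: false means the node is gone and no watch was set. */
    boolean existsAndWatch(int seq) {
        return nodes.contains(seq);
    }
}

class LockAttempt {
    enum Outcome { OWNER, WAITING }

    /**
     * Acquire loop for the node mySeq. The raceWindow hook runs between reading
     * the predecessor and registering the watch, which is where T1's release
     * can slip in.
     */
    static Outcome tryAcquire(LockDir dir, int mySeq, Runnable raceWindow) {
        while (true) {
            Optional<Integer> pred = dir.predecessor(mySeq);
            if (!pred.isPresent()) {
                return Outcome.OWNER; // no smaller node: we hold the lock
            }
            raceWindow.run();
            if (dir.existsAndWatch(pred.get())) {
                return Outcome.WAITING; // watch is set, so waiting is safe
            }
            // Predecessor vanished before the watch was set: loop and re-check
            // rather than waiting for a notification that will never arrive.
        }
    }
}
```

With n1 deleted inside the race window, the loop comes around again, finds no smaller node, and reports T2 as the owner instead of leaving it parked.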

Test:
Repeated our load test and the bug did not reappear.

For detailed bug report see this post:
http://mail-archives.apache.org/mod_mbox/helix-dev/201605.mbox/%3CCAB-bdySG8Uf6c1fyVHpSu-5pD99VHE=mrL=j3QNkaTWaEtKQ+w@mail.gmail.com%3E
1 file changed
tree: 4fee069e4fe6ef7d5c8848ca96e3f1cbb597bbb4
  1. helix-admin-webapp/
  2. helix-agent/
  3. helix-core/
  4. helix-examples/
  5. helix-ipc/
  6. helix-provisioning/
  7. recipes/
  8. website/
  9. .gitignore
  10. build
  11. bump-up.command
  12. helix-style.xml
  13. hpost-review.sh
  14. LICENSE
  15. NOTICE
  16. pom.xml
  17. README.md
README.md

Apache Helix

Helix is part of the Apache Software Foundation.

Documentation: http://helix.apache.org/

Mailing list: http://helix.apache.org/mail-lists.html

Build

mvn clean install package -DskipTests

What is Helix?

Helix is a generic cluster management framework used for automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. Helix provides the following features:

  1. Automatic assignment of resource/partition to nodes
  2. Node failure detection and recovery
  3. Dynamic addition of Resources
  4. Dynamic addition of nodes to the cluster
  5. Pluggable distributed state machine to manage the state of a resource via state transitions
  6. Automatic load balancing and throttling of transitions

Building the Website

To deploy the web site to Apache infrastructure: sh website/deploySite.sh -Dusername=uid -Dpassword=pwd (add -DskipTests if you don't want to run unit tests). Here uid is your ASF id and pwd is the password.