Bookie client: add quarantine ratio when error count exceeds threshold

### Motivation
When a bookie client reads data from or writes data to bookie servers, it checks the health of each connected server at a specific interval. Once the number of errors reaches the threshold, the bookie client quarantines that bookie server for several minutes (configured by `bookieQuarantineTimeSeconds`).

In most circumstances, a large number of bookie clients, such as Pulsar brokers, are connected to one bookie server. Once the bookie server runs under heavy load, most of the bookie clients receive errors and trigger quarantine at the same time, quarantining the server for several minutes. After those minutes pass, most bookie clients put the quarantined server back at the same time, which leads to periodic oscillation of the server's in/out throughput. This is an obstacle to tuning the throughput of the BookKeeper cluster.

### Changes
I introduce a quarantine probability that determines whether a given client quarantines the server, so that most bookie clients do not quarantine a heavily loaded server at the same time.
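
As a rough illustration of the idea (not the code in this change), the decision could look like the minimal sketch below. The class name `QuarantineSketch`, its fields, and the injected random source are all hypothetical and exist only for this example. It assumes the ratio is a probability in [0, 1] that is applied only after the error count has reached the threshold.

```java
import java.util.function.DoubleSupplier;

/**
 * Hypothetical sketch of a probabilistic quarantine decision.
 * All names here are illustrative assumptions, not BookKeeper APIs.
 */
final class QuarantineSketch {
    private final long errorThreshold;     // errors per check interval that trigger a decision
    private final double quarantineRatio;  // probability in [0, 1] of quarantining once triggered
    private final DoubleSupplier random;   // source of uniform values in [0, 1); injectable for tests

    QuarantineSketch(long errorThreshold, double quarantineRatio, DoubleSupplier random) {
        if (quarantineRatio < 0.0 || quarantineRatio > 1.0) {
            throw new IllegalArgumentException("quarantineRatio must be in [0, 1]");
        }
        this.errorThreshold = errorThreshold;
        this.quarantineRatio = quarantineRatio;
        this.random = random;
    }

    /** Returns true if this client should quarantine the server for this interval. */
    boolean shouldQuarantine(long errorCount) {
        if (errorCount < errorThreshold) {
            return false; // below the threshold: never quarantine
        }
        // At or above the threshold: quarantine only with the configured probability,
        // so that many clients seeing the same overload do not all react together.
        return random.getAsDouble() < quarantineRatio;
    }
}
```

In real use the random source would typically be something like `ThreadLocalRandom.current()::nextDouble`. With a ratio of 1.0 the sketch degenerates to the previous behaviour, where every client that reaches the threshold quarantines the server.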

I also expose the quarantine stats to Prometheus.

Reviewers: Jia Zhai <zhaijia@apache.org>, Sijie Guo <None>

This closes #2327 from hangc0276/bookieClient_Quarantine_ratio
3 files changed
tree: 72b7681b38c9815eb60ee0a26b6d6048af36da27
README.md


Apache BookKeeper

Apache BookKeeper is a scalable, fault-tolerant, and low-latency storage service optimized for append-only workloads.

It is suitable for the following scenarios:

  • WAL (Write-Ahead-Logging), e.g. HDFS NameNode.
  • Message Store, e.g. Apache Pulsar.
  • Offset/Cursor Store, e.g. Apache Pulsar.
  • Object/Blob Store, e.g. storing state machine snapshots.

Get Started

  • Concepts: Start with the basic concepts of Apache BookKeeper. This will help you to fully understand the other parts of the documentation.
  • Getting Started: Set up BookKeeper to write logs.

Documentation

Developers

You can also read Turning Ledgers into Logs to learn how to turn ledgers into continuous log streams. If you are looking for a high-level log stream API, you can check out DistributedLog.

Administrators

Contributors

Get In Touch

Report a Bug

To file bugs, suggest improvements, or request new features, help us out by opening a GitHub issue or an Apache Jira issue.

Need Help?

Subscribe or mail the user@bookkeeper.apache.org list - Ask questions, find answers, and also help other users.

Subscribe or mail the dev@bookkeeper.apache.org list - Join development discussions, propose new ideas and connect with contributors.

Join us on Slack - This is the most immediate way to connect with Apache BookKeeper committers and contributors.

Contributing

We feel that a welcoming open community is important and welcome contributions.

Contributing Code

  1. See Developer Setup to set up your local environment.

  2. Take a look at our open issues: JIRA Issues, GitHub Issues.

  3. Review our coding style and follow our pull requests to learn about our conventions.

  4. Make your changes according to our contribution guide.

Improving Website and Documentation

  1. See Building the website and documentation on how to build the website and documentation.