commit 2f996dcf0159f945f7ec97ce7402e5d293009444
author: Charan Reddy Guttapalem <reddycharan18@gmail.com> | Wed Oct 02 11:10:16 2019 -0700
committer: GitHub <noreply@github.com> | Wed Oct 02 11:10:16 2019 -0700
tree: 8f1a0ec8aeebc4ae1736b786b2140fc4c66bbbff
parent: 1e63d3c83e89d152cb603955e303b5377d8ab8e6
Enhance deferLedgerLockReleaseOfFailedLedger in ReplicationWorker

Descriptions of the changes in this PR:

**Issue:**

Previously, the ReplicationWorker (RW) retry logic was enhanced to back off replication after a threshold number of replication failures for a ledger. This helps in a pathological situation where data (a ledger/entry) is unavailable. But it is a sub-optimal solution, since each RW may try to recover a ledger the threshold number of times before it defers the ledger lock release. Also, on each attempt a RW reads entries/fragments sequentially and writes them to new bookies until it finds a missing (completely unavailable) entry, at which point replication of that ledger fails. This happens on every retry, which bloats storage; the over-replication check that detects and deletes the duplicate copies runs only once a day by default. Because of this, the cluster can run out of storage space and become a read-only (RO) cluster, and the repeated attempts put significant load on the cluster in vain.

**So the new proposal is to:**

- On each RW, remember state in addition to the counter. The state must include the entries the RW failed to read.
- The counter and state must be kept in each RW node, and exponential backoff should be used for deferLedgerLockReleaseOfFailedLedger.
- On the next attempt, the RW first tries to read the failed entries noted in the state. The read must succeed before replication proceeds.
- With this model we avoid duplicate copies on each attempt; at most, each RW will create only one copy.

Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Venkateswararao Jujjuri (JV) <None>

This closes #2166 from reddycharan/enhancereplication
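The per-ledger state and exponential backoff described above can be sketched as follows. This is an illustrative sketch, not BookKeeper's actual implementation: the class name `FailedLedgerState`, its methods, and the delay parameters are all hypothetical, but the mechanics (a failure counter, the set of unreadable entry IDs, and a capped exponential backoff delay) follow the proposal in the commit message.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the retry state a ReplicationWorker could keep
// per failed ledger. Names are illustrative, not BookKeeper's real API.
public class FailedLedgerState {
    private int failureCount = 0;
    // Entries the RW failed to read on the last attempt; these must be
    // readable before the next replication attempt proceeds.
    private final Set<Long> failedEntries = new HashSet<>();
    private final long baseDelayMs;
    private final long maxDelayMs;

    public FailedLedgerState(long baseDelayMs, long maxDelayMs) {
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    // Record one failed replication attempt and the entries that
    // could not be read during it.
    public void recordFailure(Set<Long> unreadableEntries) {
        failureCount++;
        failedEntries.clear();
        failedEntries.addAll(unreadableEntries);
    }

    // Exponential backoff for deferring the ledger lock release:
    // base * 2^(failures - 1), capped at maxDelayMs.
    public long nextDeferDelayMs() {
        if (failureCount == 0) {
            return 0;
        }
        long delay = baseDelayMs << Math.min(failureCount - 1, 20);
        return Math.min(delay, maxDelayMs);
    }

    // Entries to verify as readable before re-replicating, so no
    // duplicate copy is written for a still-unavailable entry.
    public Set<Long> entriesToVerify() {
        return new HashSet<>(failedEntries);
    }

    public static void main(String[] args) {
        FailedLedgerState state = new FailedLedgerState(1000, 60000);
        state.recordFailure(Set.of(42L, 43L));
        System.out.println(state.nextDeferDelayMs()); // 1000
        state.recordFailure(Set.of(42L));
        System.out.println(state.nextDeferDelayMs()); // 2000
        state.recordFailure(Set.of(42L));
        System.out.println(state.nextDeferDelayMs()); // 4000
    }
}
```

Because the state remembers exactly which entries were unreadable, the next attempt can verify those reads first and skip re-copying data that was already replicated, which is what bounds each RW to at most one duplicate copy.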
Apache BookKeeper is a scalable, fault-tolerant, and low-latency storage service optimized for append-only workloads.
It is suitable for use in the following scenarios:
You can also read Turning Ledgers into Logs to learn how to turn ledgers into continuous log streams. If you are looking for a high-level log stream API, you can check out DistributedLog.
For filing bugs, suggesting improvements, or requesting new features, help us out by opening a GitHub issue or an Apache JIRA issue.
Subscribe or mail the user@bookkeeper.apache.org list - Ask questions, find answers, and also help other users.
Subscribe or mail the dev@bookkeeper.apache.org list - Join development discussions, propose new ideas and connect with contributors.
Join us on Slack - This is the most immediate way to connect with Apache BookKeeper committers and contributors.
We believe a welcoming, open community is important, and we welcome contributions.
See Developer Setup to get your local environment set up.
Take a look at our open issues: JIRA Issues, GitHub Issues.
Review our coding style and follow our pull requests to learn about our conventions.
Make your changes according to our contribution guide.