commit c8d4086697f85c9093f4da2c907a13e17c198914

author: Raúl Gracia <raul.gracia@emc.com> | Wed Aug 25 14:05:34 2021 +0200
committer: GitHub <noreply@github.com> | Wed Aug 25 14:05:34 2021 +0200
tree: 01db192464edddad17c6c8a01c2c2cf40ef09ad8
parent: 393834b6131423c0b7c5d7baa54b1bd867afe72a
ISSUE #2482: Added TCP_USER_TIMEOUT to Epoll channel config

### Motivation

Added `TCP_USER_TIMEOUT` to the Epoll channel config to limit the time a connection keeps sending keepalives to a non-responding Bookie.

### Changes

The original issue reported that in scenarios where Bookies may go down unexpectedly and change their IP (e.g., Kubernetes), the Bookkeeper client may be left for some time attempting to connect to the old IP of the restarted Bookie (see #2482 for details). To prevent this problem (in Epoll channels), we introduce the following changes:

- Epoll channels are now configured with `TCP_USER_TIMEOUT`. This parameter takes precedence over the underlying TCP keepalive configuration (see https://datatracker.ietf.org/doc/html/rfc5482), which may default to retrying for too long depending on the environment (e.g., 10-15 minutes in our experience).
- To avoid adding more configuration parameters, the existing `clientConnectTimeoutMillis` value in `ClientConfiguration` is reused to set `TCP_USER_TIMEOUT`, given the similarity between the two settings.

### Validation

We reproduced the original testing environment in which this problem appears consistently:

- A cluster with 4 Bookies and 3 Kubernetes nodes, plus https://pravega.io, which uses the Bookkeeper client.
- Deployed an application to do IO to Pravega (and therefore, to Bookkeeper).
- Periodically shut down a Kubernetes node, so the Bookkeeper pods on it are restarted as well.

With this test procedure, without the proposed PR we consistently observe Bookkeeper clients getting stuck trying to contact old Bookie IPs. With this change, we confirmed via logs that the configuration change takes effect, and we have not been able to reproduce the original problem after multiple node reboots.
Master Issue: #2482

Reviewers: Flavio Junqueira <fpj@apache.org>, Enrico Olivelli <eolivelli@apache.org>

This closes #2761 from RaulGracia/issue-2482-close-idle-bookie-connection, closes #2482
Apache BookKeeper is a scalable, fault-tolerant, and low-latency storage service optimized for append-only workloads.
It is suitable for use in the following scenarios:
Please visit the Documentation on the project website for more information.
For filing bugs, suggesting improvements, or requesting new features, help us out by opening a GitHub issue or an Apache JIRA issue.
Subscribe or mail the user@bookkeeper.apache.org list - Ask questions, find answers, and also help other users.
Subscribe or mail the dev@bookkeeper.apache.org list - Join development discussions, propose new ideas and connect with contributors.
Join us on Slack - This is the most immediate way to connect with Apache BookKeeper committers and contributors.
We believe that a welcoming, open community is important, and we welcome contributions.
See Developer Setup to set up your local environment.
Take a look at our open issues: JIRA Issues or GitHub Issues.
Review our coding style and follow our pull requests to learn about our conventions.
Make your changes according to our contribution guide.