blob: 3d82506da3680dad513abed400ad67344511ae19 [file] [log] [blame]
Title: BookKeeper Metadata Management
Notice: Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License. You may
obtain a copy of the License at "":
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS"
implied. See the License for the specific language governing permissions
and limitations under the License.
h1. Metadata Management
There are two kinds of metadata needs to be managed in BookKeeper: one is the __list of available bookies__, which is used to track server availability (ZooKeeper is designed naturally for this); while the other is __ledger metadata__, which could be handle by different kinds of key/value storages efficiently with __CAS (Compare And Set)__ semantics.
__Ledger metadata__ is handled by __LedgerManager__ and can be plugged with various storage mediums.
h2. Ledger Metadata Management
The operations on the metadata of a ledger are quite straightforward. They are:
* @createLedger@: create an new entry to store given ledger metadata. A unique id should be generated as the ledger id for the new ledger.
* @removeLedgerMetadata@: remove the entry of a ledger from metadata store. A __Version__ object is provided to do conditional remove. If given __Version__ object doesn't match current __Version__ in metadata store, __MetadataVersionException__ should be thrown to indicate version confliction. __NoSuchLedgerExistsException__ should be returned if the ledger metadata entry doesn't exists.
* @readLedgerMetadata@: read the metadata of a ledger from metadata store. The new __version__ should be set to the returned __LedgerMetadata__ object. __NoSuchLedgerExistsException__ should be returned if the entry of the ledger metadata doesn't exists.
* @writeLedgerMetadata@: update the metadata of a ledger matching the given __Version__. The update should be rejected and __MetadataVersionException__ should be returned whe then given __Version__ doesn't match the current __Version__ in metadata store. __NoSuchLedgerExistsException__ should be returned if the entry of the ledger metadata doesn't exists. The version of the __LedgerMetadata__ object should be set to the new __Version__ generated by applying this update.
* @asyncProcessLedgers@: loops through all existed ledgers in metadata store and applies a __Processor__. The __Processor__ provided is executed for each ledger. If a failure happens during iteration, the iteration should be teminated and __final callback__ triggered with failure. Otherwise, __final callback__ is triggered after all ledgers are processed. No ordering nor transactional guarantees need to be provided for in the implementation of this interface.
* @getLedgerRanges@: return a list of ranges for ledgers in the metadata store. The ledger metadata itself does not need to be fetched. Only the ledger ids are needed. No ordering is required, but there must be no overlap between ledger ranges and each ledger range must be contain all the ledgers in the metadata store between the defined endpoint (i.e. a ledger range [x, y], all ledger ids larger or equal to x and smaller or equal to y should exist only in this range). __getLedgerRanges__ is used in the __ScanAndCompare__ gc algorithm.
h1. How to choose a metadata storage medium for BookKeeper.
From the interface, several requirements need to met before choosing a metadata storage medium for BookKeeper:
* @Check and Set (CAS)@: The ability to do strict update according to specific conditional. Etc, a specific version (ZooKeeper) and same content (HBase).
* @Optimized for Writes@: The metadata access pattern for BookKeeper is read first and continuous updates.
* @Optimized for Scans@: Scans are required for a __ScanAndCompare__ gc algorithm.
__ZooKeeper__ is the default implemention for BookKeeper metadata management, __ZooKeeper__ holds data in memory and provides filesystem-like namespace and also meets all the above requirements. __ZooKeeper__ could meet most of usages for BookKeeper. However, if you application needs to manage millions of ledgers, a more scalable solution would be __HBase__, which also meet the above requirements, but it more complicated to set up.