<!DOCTYPE html>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<head>
<title>Apache BookKeeper - BookKeeper Internals</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<!-- Bootstrap -->
<link href="/archives/css/bootstrap.min.css" rel="stylesheet">
<link href="/archives/css/bootstrap-responsive.min.css" rel="stylesheet">
<link href="/archives/css/styles.css" rel="stylesheet">
</head>
<body>
<header class="navbar navbar-inverse navbar-static-top" role="banner">
<div class="container">
<div class="navbar-header hidden-xs hidden-sm">
<a class="navbar-brand navbar-logo" href="/archives/"><img class="img-responsive" src="/archives/img/bookkeeper_blk40.png" alt="Bookkeeper Logo" /></a>
</div>
<div class="navbar-header">
<button class="navbar-toggle collapsed" type="button" data-toggle="collapse" data-target=".bs-navbar-collapse">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/archives/">Apache BookKeeper</a>
</div>
<nav class="collapse navbar-collapse bs-navbar-collapse" role="navigation">
<ul class="nav navbar-nav">
<li><a href="/archives/releases.html">Download</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Documentation<span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<li><a href="/archives/docs/master">Latest (master)</a></li>
<li><ul>
<li><a href="/archives/docs/master/apidocs">Java API docs</a></li>
<li><a href="/archives/docs/master/bookkeeperTutorial.html">Tutorial</a></li>
<li><a href="/archives/docs/master/bookkeeperConfig.html">Admin guide</a></li>
</ul></li>
<li><a href="/archives/docs/r4.4.0">Release 4.4.0</a></li>
<li class="divider"></li>
<li>Older releases</li>
<li><a href="/archives/docs/r4.3.2">Release 4.3.2</a></li>
<li><a href="/archives/docs/r4.3.1">Release 4.3.1</a></li>
<li><a href="/archives/docs/r4.3.0">Release 4.3.0</a></li>
<li><a href="/archives/docs/r4.2.4">Release 4.2.4</a></li>
<li><a href="/archives/docs/r4.2.3">Release 4.2.3</a></li>
<li><a href="/archives/docs/r4.2.2">Release 4.2.2</a></li>
<li><a href="/archives/docs/r4.2.1">Release 4.2.1</a></li>
<li><a href="/archives/docs/r4.2.0">Release 4.2.0</a></li>
<li><a href="/archives/docs/r4.1.0">Release 4.1.0</a></li>
<li><a href="/archives/docs/r4.0.0">Release 4.0.0</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Get Involved<span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<li><a href="/archives/lists.html">Mailing Lists</a></li>
<li><a href="/archives/irc.html">IRC</a></li>
<li><a href="/archives/svn.html">Version Control</a></li>
<li><a href="https://issues.apache.org/jira/browse/BOOKKEEPER">Issue Tracker</a></li>
</ul>
</li>
<li><a href="https://cwiki.apache.org/confluence/display/BOOKKEEPER/Index">Wiki</a></li>
<!--<li><a href="#">Hedwig</a></li>//-->
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Project Info<span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<li><a href="/archives/credits.html">Who are we?</a></li>
<li><a href="/archives/bylaws.html">Bylaws</a></li>
<li><a href="http://www.apache.org/licenses/">License</a></li>
<li class="divider"></li>
<li><a href="/archives/privacy.html">Privacy Policy</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
</ul>
</li>
</ul>
<script>
(function() {
var cx = '017580107654524087317:iqnsyimpydg';
var gcse = document.createElement('script');
gcse.type = 'text/javascript';
gcse.async = true;
gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') +
'//www.google.com/cse/cse.js?cx=' + cx;
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(gcse, s);
})();
</script>
<div class="navbar-form navbar-right visible-lg" id="googlebox">
<gcse:searchbox-only></gcse:searchbox-only>
</div>
</nav>
</div>
</header>
<div class="container">
<h2>Bookie Internals</h2>
<p>A bookie server stores its data in multiple ledger directories and its journal files in a journal directory. Ideally, keeping the journal files in a directory separate from the data files increases throughput and decreases latency.</p>
<h3>The Bookie Journal</h3>
<p>The journal directory contains one kind of file:</p>
<ul>
<li><code>{timestamp}.txn</code> - holds transactions executed in the bookie server.</li>
</ul>
<p>Before persisting ledger index and data to disk, a bookie ensures that the transaction representing the update is written to a journal in non-volatile storage. A new journal file is created, named after the current timestamp, when a bookie starts or when the current journal file reaches its maximum size.</p>
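<p>The rolling rule above can be illustrated with a small in-memory model. This is only a sketch, not BookKeeper's implementation: the class <code>JournalSketch</code>, its fields, and the choice to use a millisecond timestamp as the journal id are assumptions made for illustration.</p>

```python
import time
from dataclasses import dataclass, field


@dataclass
class JournalSketch:
    """Toy in-memory model of journal rolling; hypothetical, not BookKeeper code."""
    max_size_bytes: int
    # Each entry is [journal_id, bytes_written]; the last entry is the active file.
    files: list = field(default_factory=list)

    def _roll(self, now_ms):
        # A new journal file is identified by the current timestamp.
        self.files.append([now_ms, 0])

    def append(self, txn: bytes, now_ms=None):
        now_ms = int(time.time() * 1000) if now_ms is None else now_ms
        # Roll on startup (no file yet) or when the active file has reached its limit.
        if not self.files or self.files[-1][1] >= self.max_size_bytes:
            self._roll(now_ms)
        self.files[-1][1] += len(txn)
        # Return (journal id, position right after the write).
        return self.files[-1][0], self.files[-1][1]
```

<p>In this sketch a write that arrives after the active file has reached <code>max_size_bytes</code> triggers creation of a new file, so a file may slightly exceed the limit before it is closed.</p>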
<p>A bookie supports journal rolling to remove old journal files. To remove old journal files safely, the bookie server records a LastLogMark on the Ledger Device, which indicates that all updates (both index and data) before LastLogMark have been persisted to the Ledger Device.</p>
<p>LastLogMark contains two parts:</p>
<ul>
<li><code>LastLogId</code> - indicates the journal file in which the transaction was persisted.</li>
<li><code>LastLogPos</code> - indicates the position in the <code>LastLogId</code> journal file at which the transaction was persisted.</li>
</ul>
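<p>A minimal sketch of how such a two-part mark can be compared, assuming marks are ordered by journal id first and by position second. The Python names <code>last_log_id</code>, <code>last_log_pos</code> and the helper <code>is_persisted</code> are hypothetical, and the sketch assumes a transaction's mark is the position right after its last byte.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class LastLogMark:
    # Field order matters: marks compare by journal id first, then by position.
    last_log_id: int   # id of the journal file
    last_log_pos: int  # position inside that journal file


def is_persisted(txn_mark: LastLogMark, last_log_mark: LastLogMark) -> bool:
    """True if the transaction ending at txn_mark is covered by last_log_mark."""
    return txn_mark <= last_log_mark
```

<p>With this ordering, any transaction in an older journal file is covered regardless of its position, while a transaction in the same file is covered only up to the recorded position.</p>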
<p>You may use the following settings to fine-tune journalling behavior on bookies:</p>
<table><tr><td><code>journalMaxSizeMB</code></td><td>Maximum journal file size. When a journal file reaches this limit, it is closed and a new journal file is created.</td></tr><tr><td><code>journalMaxBackups</code></td><td>Maximum number of old journal files to keep whose id is less than the journal id of LastLogMark.</td></tr></table>
<blockquote><p><span class="caps">NOTE</span>: keeping a number of old journal files can be useful for manual recovery in special cases.</p></blockquote>
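<p>The interaction between LastLogMark and <code>journalMaxBackups</code> can be sketched as follows. The function <code>journals_to_delete</code> is a hypothetical helper, and it assumes that the newest old journal files are the ones kept as backups.</p>

```python
def journals_to_delete(journal_ids, last_log_id, max_backups):
    """Return the journal ids that may be removed, oldest first (illustrative only)."""
    # Only files strictly older than the LastLogMark journal are candidates;
    # the LastLogMark journal itself and newer files may hold unpersisted updates.
    old = sorted(j for j in journal_ids if j < last_log_id)
    # Keep the newest max_backups of the old files as backups.
    keep_from = max(len(old) - max_backups, 0)
    return old[:keep_from]
```

<p>For example, with journal ids 1 to 5, a LastLogMark in journal 4 and one backup, the sketch would remove journals 1 and 2 and keep journal 3 as the backup.</p>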
<h1>ZooKeeper Metadata</h1>
<p>BookKeeper requires a ZooKeeper installation to store metadata, and the list of ZooKeeper servers must be passed as a parameter to the constructor of the BookKeeper class (<code>org.apache.bookkeeper.client.BookKeeper</code>). To set up ZooKeeper, please check the <a href="http://zookeeper.apache.org/doc/trunk/index.html">ZooKeeper documentation</a>.</p>
<p>BookKeeper provides two mechanisms to organize its metadata in ZooKeeper. By default, the <code>FlatLedgerManager</code> is used, and the vast majority of users should never need anything else. However, when there are many concurrently active ledgers (&gt; 50,000), <code>HierarchicalLedgerManager</code> should be used. With that many ledgers, a hierarchical approach is needed because of the limit ZooKeeper places on packet sizes (see the <a href="https://issues.apache.org/jira/browse/BOOKKEEPER-39"><span class="caps">JIRA</span> Issue</a>).</p>
<table><tr><td><code>FlatLedgerManager</code></td><td>All ledger metadata are placed as children in a single zookeeper path.</td></tr><tr><td><code>HierarchicalLedgerManager</code></td><td>All ledger metadata are partitioned into 2-level znodes.</td></tr></table>
<h2>Flat Ledger Manager</h2>
<p>The metadata of all ledgers is stored under a single ZooKeeper path. Each ledger node is created as a ZooKeeper sequential node, which ensures that ledger ids are unique, and is prefixed with 'L'.</p>
<p>The bookie server manages the active ledgers it owns in a hash map, so it is easy for the bookie server to find which ledgers have been deleted from ZooKeeper and to garbage collect them. The garbage collection flow is as follows:</p>
<ul>
<li>Fetch all existing ledgers from zookeeper (<code>zkActiveLedgers</code>).</li>
<li>Fetch all ledgers currently active within the Bookie (<code>bkActiveLedgers</code>).</li>
<li>Loop over <code>bkActiveLedgers</code> to find those ledgers which do not exist in <code>zkActiveLedgers</code> and garbage collect them.</li>
</ul>
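<p>The three steps above amount to a set difference. A minimal sketch, assuming both ledger lists have already been fetched and are plain collections of ledger ids; <code>find_garbage_ledgers</code> is a hypothetical name, not a BookKeeper API:</p>

```python
def find_garbage_ledgers(zk_active_ledgers, bk_active_ledgers):
    """Return ledgers the bookie still holds that no longer exist in ZooKeeper."""
    zk = set(zk_active_ledgers)  # set gives constant-time membership checks
    return sorted(ledger for ledger in bk_active_ledgers if ledger not in zk)
```

<p>Ledgers that exist in ZooKeeper but not on this bookie are ignored, since they are simply stored elsewhere.</p>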
<h2>Hierarchical Ledger Manager</h2>
<p><code>HierarchicalLedgerManager</code> first obtains a globally unique id from ZooKeeper using an <span class="caps">EPHEMERAL</span>_SEQUENTIAL znode.</p>
<p>Since the ZooKeeper sequential counter has the format %010d, that is, 10 digits with 0 (zero) padding, e.g. "&lt;path&gt;0000000001", <code>HierarchicalLedgerManager</code> splits the generated id into 3 parts:</p>
<p><code>{level1 (2 digits)}{level2 (4 digits)}{level3 (4 digits)}</code></p>
<p>These 3 parts are used to form the actual ledger node path used to store ledger metadata:</p>
<p><code>{ledgers_root_path}/{level1}/{level2}/L{level3}</code></p>
<p>E.g. ledger 0000000001 is split into the 3 parts 00, 0000 and 0001, and is stored in znode {ledgers_root_path}/00/0000/L0001. Each level2 znode can therefore have at most 10000 ledgers as children, which avoids the problem of the child list being larger than the maximum ZooKeeper packet size.</p>
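<p>The split can be sketched in a few lines. The function name <code>hierarchical_ledger_path</code> and the root path <code>/ledgers</code> used in the example are assumptions for illustration only.</p>

```python
def hierarchical_ledger_path(ledgers_root_path: str, ledger_id: int) -> str:
    """Build the znode path for a ledger id using the 2/4/4 digit split."""
    digits = f"{ledger_id:010d}"  # 10 digits with zero padding
    if ledger_id < 0 or len(digits) != 10:
        raise ValueError("ledger id does not fit in 10 decimal digits")
    level1, level2, level3 = digits[:2], digits[2:6], digits[6:]
    return f"{ledgers_root_path}/{level1}/{level2}/L{level3}"


# Example with a hypothetical root path:
print(hierarchical_ledger_path("/ledgers", 1))  # /ledgers/00/0000/L0001
```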
<p>Bookie server manages its active ledgers in a sorted map, which simplifies access to active ledgers in a particular (level1, level2) partition.</p>
<p>Garbage collection in bookie server is processed node by node as follows:</p>
<ul>
<li>Fetch all level1 nodes by calling zk#getChildren(ledgerRootPath).<ul>
<li>For each level1 node, fetch its level2 nodes.</li>
<li>For each partition (level1, level2):<ul>
<li>Fetch all existing ledgers in ZooKeeper that belong to partition (level1, level2) (<code>zkActiveLedgers</code>).</li>
<li>Fetch all ledgers currently active in the bookie that belong to partition (level1, level2) (<code>bkActiveLedgers</code>).</li>
<li>Loop over <code>bkActiveLedgers</code> to find those ledgers which do not exist in <code>zkActiveLedgers</code>, and garbage collect them.</li>
</ul></li>
</ul></li>
</ul>
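<p>The per-partition step can be sketched as below. A sorted list with binary search stands in for the bookie's sorted map; <code>partition_range</code> and <code>gc_partition</code> are hypothetical helpers, and the ZooKeeper child listing is assumed to have been fetched already.</p>

```python
from bisect import bisect_left


def partition_range(level1: str, level2: str):
    """Ledger id range [low, high) covered by partition (level1, level2)."""
    low = int(level1 + level2) * 10000  # level3 contributes the last 4 digits
    return low, low + 10000


def gc_partition(sorted_bk_ledgers, zk_partition_ledgers, level1, level2):
    """Return the bookie's ledgers in one partition that are gone from ZooKeeper."""
    low, high = partition_range(level1, level2)
    # Sorted order lets us slice out one partition without scanning every ledger.
    start = bisect_left(sorted_bk_ledgers, low)
    end = bisect_left(sorted_bk_ledgers, high)
    zk = set(zk_partition_ledgers)
    return [ledger for ledger in sorted_bk_ledgers[start:end] if ledger not in zk]
```

<p>Processing one partition at a time keeps each ZooKeeper child listing bounded at 10000 entries, which is the point of the hierarchical layout.</p>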
<blockquote><p><span class="caps">NOTE</span>: Hierarchical Ledger Manager is better suited to managing a large number of ledgers in BookKeeper.</p></blockquote>
</div>
<footer class="footer">
<div class="container">
<p class="text-muted">Copyright &copy; 2014 The Apache Software Foundation, Licensed under the Apache License, Version 2.0.<br/>
Apache BookKeeper, BookKeeper, Apache, Apache ZooKeeper, ZooKeeper, the Apache feather logo, and the Apache BookKeeper project logo are trademarks of The Apache Software Foundation.</p>
</div>
</footer>
<script src="//code.jquery.com/jquery.js"></script>
<script src="/archives/js/bootstrap.min.js"></script>
</body>
</html>