<!DOCTYPE html>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<head>
<title>Apache BookKeeper - Hedwig Metadata Management</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<!-- Bootstrap -->
<link href="/archives/css/bootstrap.min.css" rel="stylesheet">
<link href="/archives/css/bootstrap-responsive.min.css" rel="stylesheet">
<link href="/archives/css/styles.css" rel="stylesheet">
</head>
<body>
<header class="navbar navbar-inverse navbar-static-top" role="banner">
<div class="container">
<div class="navbar-header hidden-xs hidden-sm">
<a class="navbar-brand navbar-logo" href="/archives/"><img class="img-responsive" src="/archives/img/bookkeeper_blk40.png" alt="Bookkeeper Logo" /></a>
</div>
<div class="navbar-header">
<button class="navbar-toggle collapsed" type="button" data-toggle="collapse" data-target=".bs-navbar-collapse">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/archives/">Apache BookKeeper</a>
</div>
<nav class="collapse navbar-collapse bs-navbar-collapse" role="navigation">
<ul class="nav navbar-nav">
<li><a href="/archives/releases.html">Download</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Documentation<span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<li><a href="/archives/docs/master">Latest (master)</a></li>
<li><ul>
<li><a href="/archives/docs/master/apidocs">Java API docs</a></li>
<li><a href="/archives/docs/master/bookkeeperTutorial.html">Tutorial</a></li>
<li><a href="/archives/docs/master/bookkeeperConfig.html">Admin guide</a></li>
</ul></li>
<li><a href="/archives/docs/r4.4.0">Release 4.4.0</a></li>
<li class="divider"></li>
<li>Older releases</li>
<li><a href="/archives/docs/r4.3.2">Release 4.3.2</a></li>
<li><a href="/archives/docs/r4.3.1">Release 4.3.1</a></li>
<li><a href="/archives/docs/r4.3.0">Release 4.3.0</a></li>
<li><a href="/archives/docs/r4.2.4">Release 4.2.4</a></li>
<li><a href="/archives/docs/r4.2.3">Release 4.2.3</a></li>
<li><a href="/archives/docs/r4.2.2">Release 4.2.2</a></li>
<li><a href="/archives/docs/r4.2.1">Release 4.2.1</a></li>
<li><a href="/archives/docs/r4.2.0">Release 4.2.0</a></li>
<li><a href="/archives/docs/r4.1.0">Release 4.1.0</a></li>
<li><a href="/archives/docs/r4.0.0">Release 4.0.0</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Get Involved<span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<li><a href="/archives/lists.html">Mailing Lists</a></li>
<li><a href="/archives/irc.html">IRC</a></li>
<li><a href="/archives/svn.html">Version Control</a></li>
<li><a href="https://issues.apache.org/jira/browse/BOOKKEEPER">Issue Tracker</a></li>
</ul>
</li>
<li><a href="https://cwiki.apache.org/confluence/display/BOOKKEEPER/Index">Wiki</a></li>
<!--<li><a href="#">Hedwig</a></li>//-->
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Project Info<span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<li><a href="/archives/credits.html">Who are we?</a></li>
<li><a href="/archives/bylaws.html">Bylaws</a></li>
<li><a href="http://www.apache.org/licenses/">License</a></li>
<li class="divider"></li>
<li><a href="/archives/privacy.html">Privacy Policy</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
</ul>
</li>
</ul>
<script>
(function() {
var cx = '017580107654524087317:iqnsyimpydg';
var gcse = document.createElement('script');
gcse.type = 'text/javascript';
gcse.async = true;
gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') +
'//www.google.com/cse/cse.js?cx=' + cx;
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(gcse, s);
})();
</script>
<div class="navbar-form navbar-right visible-lg" id="googlebox">
<gcse:searchbox-only></gcse:searchbox-only>
</div>
</nav>
</div>
</header>
<div class="container">
<h1>Metadata Management</h1>
<p>There are two classes of metadata that need to be managed in Hedwig. The first is the <i>list of available hubs</i>, which is used to track server availability (a task ZooKeeper is naturally suited for). The second comprises the data structures that track <i>topic states</i> and <i>subscription states</i>. This second class can be handled by any key/value store that provides a <i><span class="caps">CAS</span> (Compare And Set)</i> operation. The metadata in this class are:</p>
<ul>
<li><code>Topic Ownership</code>: tracks which hub server is assigned to serve requests for a specific topic.</li>
<li><code>Topic Persistence Info</code>: records what <i>bookkeeper ledgers</i> are used to store messages for a specific topic and their message id ranges.</li>
<li><code>Subscription Data</code>: records the preferences and subscription state for a specific subscription (topic, subscriber).</li>
</ul>
<p>Each kind of metadata is handled by a specific metadata manager. They are <i>TopicOwnershipManager</i>, <i>TopicPersistenceManager</i> and <i>SubscriptionDataManager</i>.</p>
<h2>Topic Ownership Management</h2>
<p>There are two ways to manage topic ownership. The first leverages ZooKeeper's ephemeral znodes, recording the topic's owner info as a child ephemeral znode under the topic znode. When a hub server that owns a specific topic crashes, the ephemeral znode that signifies topic ownership is deleted due to the loss of the ZooKeeper session, and other hubs can then be assigned ownership of the topic. The second leverages the <i><span class="caps">CAS</span></i> operation provided by key/value stores to do leader election. <i><span class="caps">CAS</span></i> doesn't require the underlying key/value store to provide functionality similar to ZooKeeper's ephemeral nodes. With <i><span class="caps">CAS</span></i> it is possible to guarantee that only one hub server gains ownership of a specific topic, which makes it a more scalable and generic solution.</p>
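<p>The CAS-based approach can be sketched with an in-memory map standing in for the key/value store. This is only an illustration and not Hedwig's implementation: <code>CasOwnershipSketch</code>, <code>tryClaim</code>, <code>release</code> and <code>ownerOf</code> are hypothetical names, hubs and topics are plain strings, and <code>ConcurrentHashMap.putIfAbsent</code> plays the role of the store's CAS operation.</p>

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: an in-memory map stands in for a key/value store with CAS.
class CasOwnershipSketch {
    private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

    // Atomically claim the topic if nobody owns it; returns true only for the winner.
    boolean tryClaim(String topic, String hub) {
        return owners.putIfAbsent(topic, hub) == null;
    }

    // Release ownership only if the given hub is still the recorded owner.
    boolean release(String topic, String hub) {
        return owners.remove(topic, hub);
    }

    // Returns the recorded owner, or null if the topic is unowned.
    String ownerOf(String topic) {
        return owners.get(topic);
    }
}

class CasOwnershipDemo {
    public static void main(String[] args) {
        CasOwnershipSketch store = new CasOwnershipSketch();
        System.out.println(store.tryClaim("topic-a", "hub1:4080")); // first claim succeeds
        System.out.println(store.tryClaim("topic-a", "hub2:4080")); // second claim is rejected
        System.out.println(store.ownerOf("topic-a"));
        store.release("topic-a", "hub1:4080");
        System.out.println(store.tryClaim("topic-a", "hub2:4080")); // topic is free again
    }
}
```

<p>In this sketch nothing removes a claim when its owner crashes. A store without ephemeral entries would need some other way to detect a dead owner, for instance by consulting the list of available hubs.</p>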
<p>An implementation of <i>TopicOwnershipManager</i> is required to implement the following methods:</p>
<pre><code>
public void readOwnerInfo(ByteString topic, Callback&lt;Versioned&lt;HubInfo&gt;&gt; callback, Object ctx);
public void writeOwnerInfo(ByteString topic, HubInfo owner, Version version,
Callback&lt;Version&gt; callback, Object ctx);
public void deleteOwnerInfo(ByteString topic, Version version,
Callback&lt;Void&gt; callback, Object ctx);
</code></pre>
<ul>
<li><code>readOwnerInfo</code>: Read the owner info from the underlying key/value store. The implementation is responsible for deserializing the metadata into a <i>HubInfo</i> object identifying a hub server. Its current <i>version</i> also needs to be returned for future updates. If no owner info is found for a topic, a null value is returned.</li>
</ul>
<ul>
<li><code>writeOwnerInfo</code>: Write the owner info into the underlying key/value store with the given <i>version</i>. If the current <i>version</i> in the underlying key/value store doesn't equal the provided <i>version</i>, the write should be rejected with <i>BadVersionException</i>. The new <i>version</i> should be returned on a successful write. <i>NoTopicOwnerInfoException</i> is returned if no owner info is found for the topic.</li>
</ul>
<ul>
<li><code>deleteOwnerInfo</code>: Delete the owner info from the key/value store with the given <i>version</i>. The owner info should be removed if the current <i>version</i> in the key/value store equals the provided <i>version</i>. Otherwise, the deletion should be rejected with <i>BadVersionException</i>. <i>NoTopicOwnerInfoException</i> is returned if no owner info is found for the topic.</li>
</ul>
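<p>The version-checking contract described above can be sketched with a small in-memory store. This is a simplified illustration under several assumptions that are not part of the interface: the methods are synchronous instead of callback-based, versions are plain <code>long</code> values, hubs are strings, the two exception classes are local unchecked stand-ins for the exceptions named above, and the <code>NEW</code> sentinel for creating the first entry is hypothetical.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-ins for the exceptions named in the text (unchecked for brevity).
class BadVersionException extends RuntimeException {
    BadVersionException(String msg) { super(msg); }
}

class NoTopicOwnerInfoException extends RuntimeException {
    NoTopicOwnerInfoException(String msg) { super(msg); }
}

// Hypothetical, synchronous stand-in for the store behind a TopicOwnershipManager.
class InMemoryOwnerStore {
    // Assumed sentinel meaning "no entry exists yet; create one".
    static final long NEW = -1L;

    // An owner paired with the version under which it was stored.
    static final class VersionedOwner {
        final String owner;
        final long version;
        VersionedOwner(String owner, long version) {
            this.owner = owner;
            this.version = version;
        }
    }

    private final Map<String, VersionedOwner> data = new HashMap<>();

    // Returns null when no owner info exists for the topic.
    synchronized VersionedOwner read(String topic) {
        return data.get(topic);
    }

    // Writes only if the caller's version matches the stored one; returns the new version.
    synchronized long write(String topic, String owner, long version) {
        VersionedOwner current = data.get(topic);
        if (version == NEW) {
            if (current != null) {
                throw new BadVersionException("entry already exists for " + topic);
            }
            data.put(topic, new VersionedOwner(owner, 0L));
            return 0L;
        }
        if (current == null) {
            throw new NoTopicOwnerInfoException(topic);
        }
        if (current.version != version) {
            throw new BadVersionException("stale version " + version);
        }
        long next = current.version + 1;
        data.put(topic, new VersionedOwner(owner, next));
        return next;
    }

    // Deletes only if the caller's version matches the stored one.
    synchronized void delete(String topic, long version) {
        VersionedOwner current = data.get(topic);
        if (current == null) {
            throw new NoTopicOwnerInfoException(topic);
        }
        if (current.version != version) {
            throw new BadVersionException("stale version " + version);
        }
        data.remove(topic);
    }
}
```

<p>A caller holding a stale version sees its write or delete rejected, which is what lets two hubs race for the same topic without overwriting each other.</p>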
<h2>Topic Persistence Info Management</h2>
<p>Similar to <i>TopicOwnershipManager</i>, an implementation of <i>TopicPersistenceManager</i> is required to implement the <span class="caps">READ</span>/WRITE/DELETE interfaces below:</p>
<pre><code>
public void readTopicPersistenceInfo(ByteString topic,
Callback&lt;Versioned&lt;LedgerRanges&gt;&gt; callback, Object ctx);
public void writeTopicPersistenceInfo(ByteString topic, LedgerRanges ranges, Version version,
Callback&lt;Version&gt; callback, Object ctx);
public void deleteTopicPersistenceInfo(ByteString topic, Version version,
Callback&lt;Void&gt; callback, Object ctx);
</code></pre>
<ul>
<li><code>readTopicPersistenceInfo</code>: Read the persistence info from the underlying key/value store. The implementation is responsible for deserializing the metadata into a <i>LedgerRanges</i> object that includes the ledgers used to store messages. Its current <i>version</i> also needs to be returned for future updates. If no persistence info is found for a topic, a null value is returned.</li>
</ul>
<ul>
<li><code>writeTopicPersistenceInfo</code>: Write the persistence info into the underlying key/value store with the given <i>version</i>. If the current <i>version</i> in the underlying key/value store doesn't equal the provided <i>version</i>, the write should be rejected with <i>BadVersionException</i>. The new <i>version</i> should be returned on a successful write. <i>NoTopicPersistenceInfoException</i> is returned if no persistence info is found for a topic.</li>
</ul>
<ul>
<li><code>deleteTopicPersistenceInfo</code>: Delete the persistence info from the key/value store with the given <i>version</i>. The persistence info should be removed if the current <i>version</i> in the key/value store equals the provided <i>version</i>. Otherwise, the deletion should be rejected with <i>BadVersionException</i>. <i>NoTopicPersistenceInfoException</i> is returned if no persistence info is found for a topic.</li>
</ul>
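<p>One way a caller might use the version returned by a read is a read-modify-write loop, sketched below for appending a ledger to a topic's persistence info. Everything here is hypothetical: <code>VersionedRanges</code>, <code>compareAndSet</code> and <code>LedgerAppender</code> are illustrative names, ledger ranges are reduced to a list of ledger ids, and the cell starts out empty instead of absent. Whether to retry on a version conflict or to give up (for example because another hub has taken over the topic) is a policy decision this sketch does not address.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical versioned cell standing in for one topic's persistence-info entry.
class VersionedRanges {
    private List<Long> ledgerIds = Collections.emptyList();
    private long version = 0L;

    // Immutable view of the value together with the version it was read at.
    static final class Snapshot {
        final List<Long> ledgerIds;
        final long version;
        Snapshot(List<Long> ledgerIds, long version) {
            this.ledgerIds = ledgerIds;
            this.version = version;
        }
    }

    synchronized Snapshot read() {
        return new Snapshot(ledgerIds, version);
    }

    // CAS: install newIds only if expectedVersion is still the current version.
    synchronized boolean compareAndSet(long expectedVersion, List<Long> newIds) {
        if (version != expectedVersion) {
            return false;
        }
        ledgerIds = Collections.unmodifiableList(new ArrayList<>(newIds));
        version++;
        return true;
    }
}

class LedgerAppender {
    // Read, modify, and write back; on a version conflict, re-read and try again.
    // Returns the version produced by the successful write.
    static long appendLedger(VersionedRanges cell, long ledgerId) {
        while (true) {
            VersionedRanges.Snapshot snap = cell.read();
            List<Long> updated = new ArrayList<>(snap.ledgerIds);
            updated.add(ledgerId);
            if (cell.compareAndSet(snap.version, updated)) {
                return snap.version + 1;
            }
        }
    }
}
```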
<h2>Subscription Data Management</h2>
<p><i>SubscriptionDataManager</i> has <span class="caps">READ</span>/CREATE/WRITE/DELETE interfaces similar to those of the other managers. In addition, the implementation needs to provide a <i><span class="caps">READ SUBSCRIPTIONS</span></i> interface, which fetches all the subscriptions for a given topic.</p>
<pre><code>
public void createSubscriptionData(ByteString topic, ByteString subscriberId, SubscriptionData data,
Callback&lt;Version&gt; callback, Object ctx);
public boolean isPartialUpdateSupported();
public void updateSubscriptionData(ByteString topic, ByteString subscriberId, SubscriptionData dataToUpdate,
Version version, Callback&lt;Version&gt; callback, Object ctx);
public void replaceSubscriptionData(ByteString topic, ByteString subscriberId, SubscriptionData dataToReplace,
Version version, Callback&lt;Version&gt; callback, Object ctx);
public void deleteSubscriptionData(ByteString topic, ByteString subscriberId, Version version,
Callback&lt;Void&gt; callback, Object ctx);
public void readSubscriptionData(ByteString topic, ByteString subscriberId,
Callback&lt;Versioned&lt;SubscriptionData&gt;&gt; callback, Object ctx);
public void readSubscriptions(ByteString topic, Callback&lt;Map&lt;ByteString, Versioned&lt;SubscriptionData&gt;&gt;&gt; cb,
Object ctx);
</code></pre>
<h3>Create/Update Subscriptions</h3>
<p>The metadata for a subscription has two parts: preferences and subscription state. <i>SubscriptionPreferences</i> tracks all the preferences for a subscriber (e.g. an application could store its customized preferences for message filtering), while <i>SubscriptionState</i> is used internally to track the message consumption state for a given subscriber. These two kinds of metadata are quite different: <i>SubscriptionPreferences</i> is not updated frequently, while <i>SubscriptionState</i> is updated frequently as messages are consumed. If the underlying key/value store supports independent field updates for a given key (subscription), <i>SubscriptionPreferences</i> and <i>SubscriptionState</i> can be stored as two different fields for a given subscription. In this case <i>isPartialUpdateSupported</i> should return true. Otherwise, <i>isPartialUpdateSupported</i> should return false and the implementation should serialize/deserialize <i>SubscriptionData</i> as an opaque blob.</p>
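<p>The difference between a partial update and a full replace can be illustrated with a store that keeps the two parts as separate fields. This is a minimal sketch with assumed names (<code>SubData</code>, <code>FieldStoreSketch</code>); preferences are reduced to a string, state to a consumed sequence id, and version checking is left out to keep the focus on field handling.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-field record; either field may be null in an update request.
class SubData {
    final String preferences; // rarely changes
    final Long consumedSeqId; // changes as messages are consumed
    SubData(String preferences, Long consumedSeqId) {
        this.preferences = preferences;
        this.consumedSeqId = consumedSeqId;
    }
}

// Hypothetical store that can update fields independently.
class FieldStoreSketch {
    private final Map<String, SubData> rows = new HashMap<>();

    boolean isPartialUpdateSupported() {
        return true;
    }

    // Replace: the stored record becomes exactly what the caller passed.
    void replace(String key, SubData data) {
        rows.put(key, data);
    }

    // Update: only the fields present in the request are overwritten.
    void update(String key, SubData delta) {
        SubData old = rows.get(key);
        if (old == null) {
            throw new IllegalStateException("no subscription for " + key);
        }
        String prefs = delta.preferences != null ? delta.preferences : old.preferences;
        Long seq = delta.consumedSeqId != null ? delta.consumedSeqId : old.consumedSeqId;
        rows.put(key, new SubData(prefs, seq));
    }

    // Returns the stored record, or null if the subscription does not exist.
    SubData read(String key) {
        return rows.get(key);
    }
}
```

<p>With a store like this, advancing the consumption state does not require rewriting the preferences. A store that treats the value as an opaque blob would have to write both parts on every change.</p>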
<ul>
<li><code>createSubscriptionData</code>: Create a subscription entry for a given topic. The initial <i>version</i> is returned on successful creation. <i>SubscriptionStateExistsException</i> is returned if the subscription entry already exists.</li>
</ul>
<ul>
<li><code>updateSubscriptionData/replaceSubscriptionData</code>: Update/replace the subscription data in the underlying key/value store with the given <i>version</i>. If the current <i>version</i> in the underlying key/value store doesn't equal the provided <i>version</i>, the update should be rejected with <i>BadVersionException</i>. The new <i>version</i> should be returned on a successful write. <i>NoSubscriptionStateException</i> is returned if no subscription entry is found for a subscription (topic, subscriber).</li>
</ul>
<h3>Read Subscriptions</h3>
<ul>
<li><code>readSubscriptionData</code>: Read the subscription data from the underlying key/value store. The implementation should take the responsibility of deserializing the metadata into a <i>SubscriptionData</i> object including its preferences and subscription state. Also, its current <i>version</i> needs to be returned for future updates. If there is no subscription data found for a subscription, a null value is returned.</li>
</ul>
<ul>
<li><code>readSubscriptions</code>: Read all the subscription data from the key/value store for a given topic. The implementation is responsible for organizing all subscriptions of a topic so that they can be accessed efficiently. An empty map is returned if no subscriptions are found for the given topic.</li>
</ul>
<h3>Delete Subscription</h3>
<ul>
<li><code>deleteSubscriptionData</code>: Delete the subscription data from the key/value store with the given <i>version</i> for a specific subscription (topic, subscriber). The subscription info should be removed if the current <i>version</i> in the key/value store equals the provided <i>version</i>. Otherwise, the deletion should be rejected with <i>BadVersionException</i>. <i>NoSubscriptionStateException</i> is returned if no subscription data is found for a subscription (topic, subscriber).</li>
</ul>
<h1>How to choose a key/value store for Hedwig</h1>
<p>Given the interfaces above, a key/value store needs to meet several requirements before it can be picked for Hedwig:</p>
<ul>
<li><code>CAS</code>: The ability to apply an update only when a specific condition holds, e.g. a matching version (ZooKeeper) or matching content (HBase).</li>
<li><code>Optimized for Writes</code>: The metadata access pattern for Hedwig is an initial read followed by continuous updates.</li>
<li><code>Optimized for retrieving all subscriptions for a topic</code>: Either hierarchical structures that maintain such relationships (ZooKeeper), or ordered key/value storage that clusters the subscriptions for a topic together, would provide efficient subscription data management.</li>
</ul>
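<p>The ordered-storage option in the last requirement can be sketched with a sorted map. The key layout is an assumption made for illustration: each key is the topic, a NUL separator, and the subscriber id, so all subscriptions for one topic are adjacent and can be fetched with a single range scan. <code>OrderedSubscriptionIndex</code> is a hypothetical name, values are plain strings, and the sketch assumes that neither topics nor subscriber ids contain the separator character.</p>

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical ordered store: keys are topic + NUL + subscriberId, so one topic's
// subscriptions sit next to each other in sort order.
class OrderedSubscriptionIndex {
    private static final char SEP = '\u0000';
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String topic, String subscriberId, String data) {
        rows.put(topic + SEP + subscriberId, data);
    }

    // Range scan over [topic+SEP, topic+(SEP+1)), which covers exactly the keys
    // that start with topic+SEP. Returns subscriberId -> data, empty if none.
    Map<String, String> readSubscriptions(String topic) {
        String from = topic + SEP;
        String to = topic + (char) (SEP + 1);
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> e : rows.subMap(from, to).entrySet()) {
            result.put(e.getKey().substring(from.length()), e.getValue());
        }
        return result;
    }
}
```

<p>The separator keeps a topic such as <code>news</code> from picking up the subscriptions of <code>newsletter</code>, which a naive prefix match on the topic name alone would include.</p>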
<p><i>ZooKeeper</i> is the default implementation for Hedwig metadata management. It holds data in memory and provides a filesystem-like namespace, meeting the above requirements. <i>ZooKeeper</i> is suitable for most Hedwig use cases. However, if your application needs to manage millions of topics/subscriptions, a more scalable solution would be <i>HBase</i>, which also meets the above requirements.</p>
</div>
<footer class="footer">
<div class="container">
<p class="text-muted">Copyright &copy; 2014 The Apache Software Foundation, Licensed under the Apache License, Version 2.0.<br/>
Apache BookKeeper, BookKeeper, Apache, Apache ZooKeeper, ZooKeeper, the Apache feather logo, and the Apache BookKeeper project logo are trademarks of The Apache Software Foundation.</p>
</div>
</footer>
<script src="//code.jquery.com/jquery.js"></script>
<script src="/archives/js/bootstrap.min.js"></script>
</body>
</html>