blob: 6875498584e5b171582129a7995d2276c3ef7b61 [file] [log] [blame]
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Apache DistributedLog (incubating)</title>
<meta name="description" content="Apache DistributedLog is an high performance replicated log.
">
<link rel="stylesheet" href="/docs/0.4.0-incubating/styles/site.css">
<link rel="stylesheet" href="/docs/0.4.0-incubating/css/theme.css">
<!-- JQuery -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.0/jquery.min.js"></script>
<script src="/docs/0.4.0-incubating/js/bootstrap.min.js"></script>
<link rel="canonical" href="http://bookkeeper.apache.org/distributedlog/docs/0.4.0-incubating/admin_guide/bookkeeper.html" data-proofer-ignore>
<link rel="alternate" type="application/rss+xml" title="Apache DistributedLog (incubating)" href="http://bookkeeper.apache.org/distributedlog/docs/0.4.0-incubating/feed.xml">
<!-- Font Awesome -->
<script src="//cdnjs.cloudflare.com/ajax/libs/anchor-js/3.2.0/anchor.min.js"></script>
<!-- Google Analytics -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-83870961-1', 'auto');
ga('send', 'pageview');
</script>
<!-- End Google Analytics -->
<link rel="shortcut icon" type="image/x-icon" href="/images/favicon.ico">
</head>
<body role="document">
<nav class="navbar navbar-default navbar-fixed-top">
<div class="container">
<div class="navbar-header">
<a href="/" class="navbar-brand" >
<img alt="Brand" style="height: 28px" src="/docs/0.4.0-incubating/images/distributedlog_logo_navbar.png">
</a>
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<!-- Overview -->
<li><a href="/docs/0.4.0-incubating/">V0.4.0</a></li>
<!-- Concepts -->
<li><a href="/docs/0.4.0-incubating/basics/introduction">Concepts</a></li>
<!-- Quick Start -->
<li>
<a href="/docs/0.4.0-incubating/start" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Start<span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<li>
<a href="/docs/0.4.0-incubating/start/building.html">
Build DistributedLog from Source
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/start/download.html">
Download Releases
</a>
</li>
<li role="separator" class="divider"></li>
<li class="dropdown-header"><strong>Quickstart</strong></li>
<li>
<a href="/docs/0.4.0-incubating/start/quickstart.html">
Setup & Run Example
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/tutorials/basic-1.html">
API - Write Records (via core library)
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/tutorials/basic-2.html">
API - Write Records (via write proxy)
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/tutorials/basic-5.html">
API - Read Records
</a>
</li>
<li role="separator" class="divider"></li>
<li class="dropdown-header"><strong>Deployment</strong></li>
<li>
<a href="/docs/0.4.0-incubating/deployment/cluster.html">
Cluster Setup
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/deployment/global-cluster.html">
Global Cluster Setup
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/deployment/docker.html">
Docker
</a>
</li>
</ul>
</li>
<!-- API -->
<li>
<a href="/docs/0.4.0-incubating/start" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">API<span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<li><a href="/docs/0.4.0-incubating/api/java">Java</a></li>
</ul>
</li>
<!-- User Guide -->
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">User Guide<span class="caret"></span></a>
<ul class="dropdown-menu">
<li>
<a href="/docs/0.4.0-incubating/basics/introduction.html">
Introduction
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/user_guide/considerations/main.html">
Considerations
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/user_guide/architecture/main.html">
Architecture
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/user_guide/api/main.html">
API
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/user_guide/configuration/main.html">
Configuration
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/user_guide/design/main.html">
Detail Design
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/user_guide/globalreplicatedlog/main.html">
Global Replicated Log
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/user_guide/implementation/main.html">
Implementation
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/user_guide/references/main.html">
References
</a>
</li>
</ul>
</li>
<!-- Admin Guide -->
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Admin Guide<span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="/docs/0.4.0-incubating/deployment/cluster">Cluster Setup</a></li>
<li>
<a href="/docs/0.4.0-incubating/admin_guide/operations.html">
Operations
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/admin_guide/loadtest.html">
Load Test
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/admin_guide/performance.html">
Performance Tuning
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/admin_guide/hardware.html">
Hardware
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/admin_guide/monitoring.html">
Monitoring
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/admin_guide/zookeeper.html">
ZooKeeper
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/admin_guide/bookkeeper.html">
BookKeeper
</a>
</li>
</ul>
</li>
<!-- Tutorials -->
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Tutorials<span class="caret"></span></a>
<ul class="dropdown-menu">
<li class="dropdown-header"><strong>Basic</strong></li>
<li><a href="/docs/0.4.0-incubating/tutorials/basic-1">Write Records (via Core Library)</a></li>
<li><a href="/docs/0.4.0-incubating/tutorials/basic-2">Write Records (via Write Proxy)</a></li>
<li><a href="/docs/0.4.0-incubating/tutorials/basic-3">Write Records to multiple streams</a></li>
<li><a href="/docs/0.4.0-incubating/tutorials/basic-4">Atomic Write Records</a></li>
<li><a href="/docs/0.4.0-incubating/tutorials/basic-5">Tailing Read Records</a></li>
<li><a href="/docs/0.4.0-incubating/tutorials/basic-6">Rewind Read Records</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header"><strong>Messaging</strong></li>
<li>
<a href="/docs/0.4.0-incubating/tutorials/messaging-1.html">
Write records to partitioned streams
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/tutorials/messaging-2.html">
Write records to multiple streams (load balancer)
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/tutorials/messaging-3.html">
At-least-once Processing
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/tutorials/messaging-4.html">
Exact-Once Processing
</a>
</li>
<li>
<a href="/docs/0.4.0-incubating/tutorials/messaging-5.html">
Implement a kafka-like pub/sub system
</a>
</li>
<li role="separator" class="divider"></li>
<li class="dropdown-header"><strong>Replicated State Machines</strong></li>
<li>
<a href="/docs/0.4.0-incubating/tutorials/replicatedstatemachines.html">
Build replicated state machines
</a>
</li>
<li role="separator" class="divider"></li>
<li class="dropdown-header"><strong>Analytics</strong></li>
<li><a href="/docs/0.4.0-incubating/tutorials/analytics-mapreduce">Process log streams using MapReduce</a></li>
</ul>
</li>
</ul>
</div><!--/.nav-collapse -->
</div>
</nav>
<link rel="stylesheet" href="">
<div class="container" role="main">
<div class="row">
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<div class="row">
<!-- Sub Navigation -->
<div class="col-sm-3">
<ul id="sub-nav">
<li><a href="/docs/0.4.0-incubating/admin_guide/main.html" class="">Admin Guide</a>
<ul>
<li>
<a href="/docs/0.4.0-incubating/deployment/cluster.html" class="">
Cluster Setup
</a>
<ul>
</ul>
</li>
<li>
<a href="/docs/0.4.0-incubating/deployment/global-cluster.html" class="">
Global Cluster Setup
</a>
<ul>
</ul>
</li>
<li>
<a href="/docs/0.4.0-incubating/admin_guide/operations.html" class="">
Operations
</a>
<ul>
</ul>
</li>
<li>
<a href="/docs/0.4.0-incubating/admin_guide/loadtest.html" class="">
Load Test
</a>
<ul>
</ul>
</li>
<li>
<a href="/docs/0.4.0-incubating/admin_guide/performance.html" class="">
Performance Tuning
</a>
<ul>
</ul>
</li>
<li>
<a href="/docs/0.4.0-incubating/admin_guide/hardware.html" class="">
Hardware
</a>
<ul>
</ul>
</li>
<li>
<a href="/docs/0.4.0-incubating/admin_guide/monitoring.html" class="">
Monitoring
</a>
<ul>
</ul>
</li>
<li>
<a href="/docs/0.4.0-incubating/admin_guide/zookeeper.html" class="">
ZooKeeper
</a>
<ul>
</ul>
</li>
<li>
<a href="/docs/0.4.0-incubating/admin_guide/bookkeeper.html" class="active">
BookKeeper
</a>
<ul>
</ul>
</li>
</ul>
</li>
</ul>
</div>
<!-- Main -->
<div class="col-sm-9">
<!-- Top anchor -->
<a href="#top"></a>
<!-- Breadcrumbs above the main heading -->
<ol class="breadcrumb">
<li><a href="/docs/0.4.0-incubating/admin_guide/main.html">Admin Guide</a></li>
<li class="active">BookKeeper</li>
</ol>
<div class="text">
<!-- Content -->
<div class="contents topic" id="bookkeeper">
<p class="topic-title first">BookKeeper</p>
<ul class="simple">
<li><a class="reference internal" href="#id1" id="id3">BookKeeper</a><ul>
<li><a class="reference internal" href="#run-from-bookkeeper-source" id="id4">Run from bookkeeper source</a></li>
<li><a class="reference internal" href="#run-from-distributedlog-source" id="id5">Run from distributedlog source</a><ul>
<li><a class="reference internal" href="#build" id="id6">Build</a></li>
<li><a class="reference internal" href="#configuration" id="id7">Configuration</a><ul>
<li><a class="reference internal" href="#port" id="id8">Port</a></li>
<li><a class="reference internal" href="#disks" id="id9">Disks</a></li>
<li><a class="reference internal" href="#zookeeper" id="id10">ZooKeeper</a></li>
<li><a class="reference internal" href="#stats-provider" id="id11">Stats Provider</a></li>
<li><a class="reference internal" href="#index-settings" id="id12">Index Settings</a></li>
<li><a class="reference internal" href="#journal-settings" id="id13">Journal Settings</a></li>
<li><a class="reference internal" href="#thread-settings" id="id14">Thread Settings</a></li>
</ul>
</li>
<li><a class="reference internal" href="#run" id="id15">Run</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
<div class="section" id="id1">
<h2><a class="toc-backref" href="#id3">BookKeeper</a></h2>
<p>For reliable BookKeeper service, you should deploy BookKeeper in a cluster.</p>
<div class="section" id="run-from-bookkeeper-source">
<h3><a class="toc-backref" href="#id4">Run from bookkeeper source</a></h3>
<p>The version of BookKeeper that DistributedLog depends on is not the official opensource version.
It is twitter's production version <cite>4.3.4-TWTTR</cite>, which is available in <cite>https://github.com/twitter/bookkeeper</cite>.
We are working actively with BookKeeper community to merge all twitter's changes back to the community.</p>
<p>The major changes in Twitter's bookkeeper includes:</p>
<ul class="simple">
<li><a class="reference external" href="https://issues.apache.org/jira/browse/BOOKKEEPER-670">BOOKKEEPER-670</a>: Long poll reads and LastAddConfirmed piggyback. It is to reduce the tailing read latency.</li>
<li><a class="reference external" href="https://issues.apache.org/jira/browse/BOOKKEEPER-759">BOOKKEEPER-759</a>: Delay ensemble change if it doesn't break ack quorum constraint. It is to reduce the write latency on bookie failures.</li>
<li><a class="reference external" href="https://issues.apache.org/jira/browse/BOOKKEEPER-757">BOOKKEEPER-757</a>: Ledger recovery improvements, to reduce the latency on ledger recovery.</li>
<li>Misc improvements on bookie recovery and bookie storage.</li>
</ul>
<p>To build bookkeeper, run:</p>
<ol class="arabic simple">
<li>First checkout the bookkeeper source code from twitter's branch.</li>
</ol>
<figure class="code"><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
</pre></td><td class="code"><pre><code class="bash"><span class="line"><span></span>$ git clone https://github.com/twitter/bookkeeper.git bookkeeper
</span></code></pre></td></tr></table></div></figure><ol class="arabic simple" start="2">
<li>Build the bookkeeper package:</li>
</ol>
<figure class="code"><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
<span class="line-number">2</span>
</pre></td><td class="code"><pre><code class="bash"><span class="line"><span></span>$ <span class="nb">cd</span> bookkeeper
</span><span class="line">$ mvn clean package assembly:single -DskipTests
</span></code></pre></td></tr></table></div></figure><p>However, since <cite>bookkeeper-server</cite> is one of the dependency of <cite>distributedlog-service</cite>.
You could simply run bookkeeper using same set of scripts provided in <cite>distributedlog-service</cite>.
In the following sections, we will describe how to run bookkeeper using the scripts provided in
<cite>distributedlog-service</cite>.</p>
</div>
<div class="section" id="run-from-distributedlog-source">
<h3><a class="toc-backref" href="#id5">Run from distributedlog source</a></h3>
<div class="section" id="build">
<h4><a class="toc-backref" href="#id6">Build</a></h4>
<p>First of all, build DistributedLog:</p>
<figure class="code"><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
</pre></td><td class="code"><pre><code class="bash"><span class="line"><span></span>$ mvn clean install -DskipTests
</span></code></pre></td></tr></table></div></figure></div>
<div class="section" id="configuration">
<h4><a class="toc-backref" href="#id7">Configuration</a></h4>
<p>The configuration file <cite>bookie.conf</cite> under <cite>distributedlog-service/conf</cite> is a template of production
configuration to run a bookie node. Most of the configuration settings are good for production usage.
You might need to configure following settings according to your environment and hardware platform.</p>
<div class="section" id="port">
<h5><a class="toc-backref" href="#id8">Port</a></h5>
<p>By default, the service port is <cite>3181</cite>, where the bookie server listens on. You can change the port
to whatever port you like by modifying the following setting.</p>
<pre class="literal-block">
bookiePort=3181
</pre>
</div>
<div class="section" id="disks">
<h5><a class="toc-backref" href="#id9">Disks</a></h5>
<p>You need to configure following settings according to the disk layout of your hardware. It is recommended
to put <cite>journalDirectory</cite> under a separated disk from others for performance. It is okay to set
<cite>indexDirectories</cite> to be same as <cite>ledgerDirectories</cite>. However, it is recommended to put <cite>indexDirectories</cite>
to a SSD driver for better performance.</p>
<pre class="literal-block">
# Directory Bookkeeper outputs its write ahead log
journalDirectory=/tmp/data/bk/journal
# Directory Bookkeeper outputs ledger snapshots
ledgerDirectories=/tmp/data/bk/ledgers
# Directory in which index files will be stored.
indexDirectories=/tmp/data/bk/ledgers
</pre>
<p>To better understand how bookie nodes work, please check <a class="reference external" href="http://bookkeeper.apache.org/">bookkeeper</a> website for more details.</p>
</div>
<div class="section" id="zookeeper">
<h5><a class="toc-backref" href="#id10">ZooKeeper</a></h5>
<p>You need to configure following settings to point the bookie to the zookeeper server that it is using.
You need to make sure <cite>zkLedgersRootPath</cite> exists before starting the bookies.</p>
<pre class="literal-block">
# Root zookeeper path to store ledger metadata
# This parameter is used by zookeeper-based ledger manager as a root znode to
# store all ledgers.
zkLedgersRootPath=/messaging/bookkeeper/ledgers
# A list of one of more servers on which zookeeper is running.
zkServers=localhost:2181
</pre>
</div>
<div class="section" id="stats-provider">
<h5><a class="toc-backref" href="#id11">Stats Provider</a></h5>
<p>Bookies use <cite>StatsProvider</cite> to expose its metrics. The <cite>StatsProvider</cite> is a pluggable library to
adopt to various stats collecting systems. Please check <a class="reference external" href="./monitoring">monitoring</a> for more details.</p>
<pre class="literal-block">
# stats provide - use `codahale` metrics library
statsProviderClass=org.apache.bookkeeper.stats.CodahaleMetricsServletProvider
### Following settings are stats provider related settings
# Exporting codahale stats in http port `9001`
codahaleStatsHttpPort=9001
</pre>
</div>
<div class="section" id="index-settings">
<h5><a class="toc-backref" href="#id12">Index Settings</a></h5>
<ul class="simple">
<li><cite>pageSize</cite>: size of a index page in ledger cache, in bytes. If there are large number
of ledgers and each ledger has fewer entries, smaller index page would improve memory usage.</li>
<li><cite>pageLimit</cite>: The maximum number of index pages in ledger cache. If nummber of index pages
reaches the limitation, bookie server starts to swap some ledgers from memory to disk.
Increase this value when swap becomes more frequent. But make sure <cite>pageLimit*pageSize</cite>
should not be more than JVM max memory limitation.</li>
</ul>
</div>
<div class="section" id="journal-settings">
<h5><a class="toc-backref" href="#id13">Journal Settings</a></h5>
<ul class="simple">
<li><cite>journalMaxGroupWaitMSec</cite>: The maximum wait time for group commit. It is valid only when
<cite>journalFlushWhenQueueEmpty</cite> is false.</li>
<li><cite>journalFlushWhenQueueEmpty</cite>: Flag indicates whether to flush/sync journal. If it is <cite>true</cite>,
bookie server will sync journal when there is no other writes in the journal queue.</li>
<li><cite>journalBufferedWritesThreshold</cite>: The maximum buffered writes for group commit, in bytes.
It is valid only when <cite>journalFlushWhenQueueEmpty</cite> is false.</li>
<li><cite>journalBufferedEntriesThreshold</cite>: The maximum buffered writes for group commit, in entries.
It is valid only when <cite>journalFlushWhenQueueEmpty</cite> is false.</li>
</ul>
<p>Setting <cite>journalFlushWhenQueueEmpty</cite> to <cite>true</cite> will produce low latency when the traffic is low.
However, the latency varies a lost when the traffic is increased. So it is recommended to set
<cite>journalMaxGroupWaitMSec</cite>, <cite>journalBufferedEntriesThreshold</cite> and <cite>journalBufferedWritesThreshold</cite>
to reduce the number of fsyncs made to journal disk, to achieve sustained low latency.</p>
</div>
<div class="section" id="thread-settings">
<h5><a class="toc-backref" href="#id14">Thread Settings</a></h5>
<p>It is recommended to configure following settings to align with the cpu cores of the hardware.</p>
<pre class="literal-block">
numAddWorkerThreads=4
numJournalCallbackThreads=4
numReadWorkerThreads=4
numLongPollWorkerThreads=4
</pre>
</div>
</div>
<div class="section" id="run">
<h4><a class="toc-backref" href="#id15">Run</a></h4>
<p>As <cite>bookkeeper-server</cite> is shipped as part of <cite>distributedlog-service</cite>, you could use the <cite>dlog-daemon.sh</cite>
script to start <cite>bookie</cite> as daemon thread.</p>
<p>Start the bookie:</p>
<figure class="code"><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
</pre></td><td class="code"><pre><code class="bash"><span class="line"><span></span>$ ./distributedlog-service/bin/dlog-daemon.sh start bookie --conf /path/to/bookie/conf
</span></code></pre></td></tr></table></div></figure><p>Stop the bookie:</p>
<figure class="code"><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
</pre></td><td class="code"><pre><code class="bash"><span class="line"><span></span>$ ./distributedlog-service/bin/dlog-daemon.sh stop bookie
</span></code></pre></td></tr></table></div></figure><p>Please check <a class="reference external" href="http://bookkeeper.apache.org/">bookkeeper</a> website for more details.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<hr>
<div class="row">
<div class="col-xs-12">
<footer>
<p class="text-center">&copy; Copyright 2016
<a href="http://www.apache.org">The Apache Software Foundation.</a> All Rights Reserved.
</p>
<p class="text-center">
<a href="/docs/0.4.0-incubating/feed.xml">RSS Feed</a>
</p>
</footer>
</div>
</div>
<!-- container div end -->
</div>
<script>
(function () {
'use strict';
anchors.options.placement = 'right';
anchors.add();
})();
</script>
</body>
</html>