blob: 8b6677e686a1c1f5ef5bfb0d91afbfd3bea761fa [file] [log] [blame]
<!DOCTYPE html>
<html lang="en">
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data" />
<meta name="author" content="Cloudera" />
<title>Apache Kudu - Master fault tolerance in Kudu 1.0</title>
<!-- Bootstrap core CSS -->
<link rel="stylesheet" href=""
<!-- Custom styles for this template -->
<link href="/css/kudu.css" rel="stylesheet"/>
<link href="/css/asciidoc.css" rel="stylesheet"/>
<link rel="shortcut icon" href="/img/logo-favicon.ico" />
<link rel="stylesheet" href="" />
<link rel="alternate" type="application/atom+xml"
title="RSS Feed for Apache Kudu blog"
href="/feed.xml" />
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src=""></script>
<script src=""></script>
<div class="kudu-site container-fluid">
<!-- Static navbar -->
<nav class="navbar navbar-default">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<a class="logo" href="/"><img
srcset="// 1x, // 2x"
alt="Apache Kudu"/></a>
<div id="navbar" class="collapse navbar-collapse">
<ul class="nav navbar-nav navbar-right">
<li >
<a href="/">Home</a>
<li >
<a href="/overview.html">Overview</a>
<li >
<a href="/docs/">Documentation</a>
<li >
<a href="/releases/">Releases</a>
<li class="active">
<a href="/blog/">Blog</a>
<!-- NOTE: this dropdown menu does not appear on Mobile, so don't add anything here
that doesn't also appear elsewhere on the site. -->
<li class="dropdown">
<a href="/community.html" role="button" aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a>
<ul class="dropdown-menu">
<li class="dropdown-header">GET IN TOUCH</li>
<li><a class="icon email" href="/community.html">Mailing Lists</a></li>
<li><a class="icon slack" href="">Slack Channel</a></li>
<li role="separator" class="divider"></li>
<li><a href="/community.html#meetups-user-groups-and-conference-presentations">Events and Meetups</a></li>
<li><a href="/committers.html">Project Committers</a></li>
<!--<li><a href="/roadmap.html">Roadmap</a></li>-->
<li><a href="/community.html#contributions">How to Contribute</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">DEVELOPER RESOURCES</li>
<li><a class="icon github" href="">GitHub</a></li>
<li><a class="icon gerrit" href="">Gerrit Code Review</a></li>
<li><a class="icon jira" href="">JIRA Issue Tracker</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">SOCIAL MEDIA</li>
<li><a class="icon twitter" href="">Twitter</a></li>
<li><a href="">Reddit</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">APACHE SOFTWARE FOUNDATION</li>
<li><a href="" target="_blank">Security</a></li>
<li><a href="" target="_blank">Sponsorship</a></li>
<li><a href="" target="_blank">Thanks</a></li>
<li><a href="" target="_blank">License</a></li>
<li >
<a href="/faq.html">FAQ</a>
</ul><!-- /.nav -->
</div><!-- /#navbar -->
</div><!-- /.container-fluid -->
<div class="row header">
<div class="col-lg-12">
<h2><a href="/blog">Apache Kudu Blog</a></h2>
<div class="row-fluid">
<div class="col-lg-9">
<h1 class="entry-title">Master fault tolerance in Kudu 1.0</h1>
<p class="meta">Posted 24 Jun 2016 by Adar Dembo</p>
<div class="entry-content">
<p>This blog post describes how the 1.0 release of Apache Kudu (incubating) will
support fault tolerance for the Kudu master, finally eliminating Kudu’s last
single point of failure.</p>
<p>As those of you who follow this blog know by now, replication is a signature
feature in Kudu. Replication is used to provide fault tolerance for all loaded
data. By implementing the Raft consensus protocol, Kudu guarantees that a tablet
replicated <strong>2N+1</strong> times can tolerate up to <strong>N</strong> failures.</p>
<p>What you may not know is that Kudu replicates its metadata, too. That is, the
Kudu master stores all table and tablet metadata in a single “master” tablet.
As a regular Kudu tablet itself, this master tablet may be replicated with
Raft. As such, the Kudu master is a special kind of tablet server whose primary
job is to host a single replica of the master tablet.</p>
<p>When we launched Kudu’s first beta, support for replicated masters had been
implemented but was too fragile to be anything but experimental. One of our
goals for Kudu’s 1.0 release is to improve replicated master support so that it
can be safely enabled in production clusters.</p>
<h1 id="how-master-replication-works">How master replication works</h1>
<p>To use replicated masters, a Kudu operator must deploy some number of Kudu
masters, providing the hostname and port number of each master in the group via
the <code class="language-plaintext highlighter-rouge">--master_address</code> command line option. For example, each master in a
three-node deployment should be started with
<code class="language-plaintext highlighter-rouge">--master_address=&lt;host1:port1&gt;,&lt;host2:port2&gt;&lt;host3:port3&gt;</code>. In Raft parlance,
this group of masters is known as a <em>Raft configuration</em>.</p>
<p>At startup, a Raft configuration of masters will hold a leader election and
elect one master as the leader. The leader master is responsible for servicing
both tablet server heartbeats as well as client requests. The remaining masters
are followers: they participate in Raft consensus and replicate writes sent by
the leader, but are otherwise idle. Any client requests they receive are
rejected. Likewise, all tablet server heartbeats they receive are ignored. If
the leader master ever dies or steps down, the remaining replicas hold an
election to determine the new leader.</p>
<p>All persistent master metadata is stored in the single replicated “master”
tablet. Every row in this tablet represents either a table or a tablet. Table
records include unique table identifiers, the table’s schema, and other bits of
information. Tablet records include a unique identifier, the tablet’s Raft
configuration, and other information.</p>
<p>What master metadata is replicated?</p>
<li>Table and tablet existence, via <strong>CreateTable()</strong> and <strong>DeleteTable()</strong>.
Every new tablet record also includes an initial Raft configuration.</li>
<li>Schema changes, via <strong>AlterTable()</strong> and tablet server heartbeats.</li>
<li>Tablet server Raft configuration changes, via tablet server heartbeats.
These include both the list of Raft peers (may have changed due to
under-replication) as well as the current leader (may have changed due to
an election).</li>
<p>Scanning the master tablet to service every heartbeat or client request would be
slow, so the leader master caches all master metadata in memory. The caches are
only updated after a metadata change is successfully replicated; in this way
they are always consistent with the on-disk tablet. When a new leader master is
elected, it scans the entire master tablet and uses the metadata to rebuild its
in-memory caches.</p>
<h1 id="communication-with-replicated-masters">Communication with replicated masters</h1>
<p>All tablet servers start up with location information for the entire master Raft
configuration and will periodically heartbeat to every master. Similarly,
clients are also configured with the locations of all masters. Unlike tablet
servers, they always communicate with the leader master as follower masters will
reject client requests. To do this, clients must determine which master is the
leader before sending the first request as well as whenever any request fails
with a <code class="language-plaintext highlighter-rouge">NOT_THE_LEADER</code> error.</p>
<h1 id="remaining-work-for-kudu-10">Remaining work for Kudu 1.0</h1>
<p><a href="">KUDU-422</a> tracks the remaining
master replication work. The guts of this feature have been implemented as far
back as early 2015; the remaining work has been focused on fixing bugs that
manifest only under specific conditions. For example, we’ve observed failures in
DDL operations (e.g. <strong>CreateTable()</strong>) that only materialize upon the
completion of a master leader election. These failures highlight some of the
gaps in our testing regimen: we need a robust stress test that repeatedly
performs such operations while holding master leader elections.</p>
<p>That said, there is one remaining work item of larger scope: there’s no
mechanism with which to perform a Raft configuration change for replicated
masters. Such a mechanism would have multiple uses:</p>
<li>Migrating from a single-node master deployment to a fully replicated
three-node (or five-node) deployment.</li>
<li>Replacing a failed master with a new one.</li>
<p>This is being tracked by
<a href="">KUDU-1474</a>, and there’s been
<a href="">some discussion</a> around a design, but
nothing has been implemented yet. Stay tuned!</p>
<div class="col-lg-3 recent-posts">
<h3>Recent posts</h3>
<li> <a href="/2020/07/30/building-near-real-time-big-data-lake.html">Building Near Real-time Big Data Lake</a> </li>
<li> <a href="/2020/05/18/apache-kudu-1-12-0-release.html">Apache Kudu 1.12.0 released</a> </li>
<li> <a href="/2019/11/20/apache-kudu-1-11-1-release.html">Apache Kudu 1.11.1 released</a> </li>
<li> <a href="/2019/11/20/apache-kudu-1-10-1-release.html">Apache Kudu 1.10.1 released</a> </li>
<li> <a href="/2019/07/09/apache-kudu-1-10-0-release.html">Apache Kudu 1.10.0 Released</a> </li>
<li> <a href="/2019/04/30/location-awareness.html">Location Awareness in Kudu</a> </li>
<li> <a href="/2019/04/22/fine-grained-authorization-with-apache-kudu-and-impala.html">Fine-Grained Authorization with Apache Kudu and Impala</a> </li>
<li> <a href="/2019/03/19/testing-apache-kudu-applications-on-the-jvm.html">Testing Apache Kudu Applications on the JVM</a> </li>
<li> <a href="/2019/03/15/apache-kudu-1-9-0-release.html">Apache Kudu 1.9.0 Released</a> </li>
<li> <a href="/2019/03/05/transparent-hierarchical-storage-management-with-apache-kudu-and-impala.html">Transparent Hierarchical Storage Management with Apache Kudu and Impala</a> </li>
<li> <a href="/2018/12/11/call-for-posts.html">Call for Posts</a> </li>
<li> <a href="/2018/10/26/apache-kudu-1-8-0-released.html">Apache Kudu 1.8.0 Released</a> </li>
<li> <a href="/2018/09/26/index-skip-scan-optimization-in-kudu.html">Index Skip Scan Optimization in Kudu</a> </li>
<li> <a href="/2018/09/11/simplified-pipelines-with-kudu.html">Simplified Data Pipelines with Kudu</a> </li>
<li> <a href="/2018/08/06/getting-started-with-kudu-an-oreilly-title.html">Getting Started with Kudu - an O'Reilly Title</a> </li>
<footer class="footer">
<div class="row">
<div class="col-md-9">
<p class="small">
Copyright &copy; 2019 The Apache Software Foundation.
<p class="small">
Apache Kudu, Kudu, Apache, the Apache feather logo, and the Apache Kudu
project logo are either registered trademarks or trademarks of The
Apache Software Foundation in the United States and other countries.
<div class="col-md-3">
<a class="pull-right" href="">
<img src=""/>
<script src=""></script>
// Try to detect touch-screen devices. Note: Many laptops have touch screens.
$(document).ready(function() {
if ("ontouchstart" in document.documentElement) {
} else {
<script src=""
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
ga('create', 'UA-68448017-1', 'auto');
ga('send', 'pageview');
<script src=""></script>
anchors.options = {
placement: 'right',
visible: 'touch',