| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8" /> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1" /> |
| <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> |
| <meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu (incubating) completes Hadoop's storage layer to enable fast analytics on fast data" /> |
| <meta name="author" content="Cloudera" /> |
| <title>Apache Kudu (incubating) - Master fault tolerance in Kudu 1.0</title> |
| <!-- Bootstrap core CSS --> |
| <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css" |
| integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7" |
| crossorigin="anonymous"> |
| |
| <!-- Custom styles for this template --> |
| <link href="/css/kudu.css" rel="stylesheet"/> |
| <link href="/css/asciidoc.css" rel="stylesheet"/> |
| <link rel="shortcut icon" href="/img/logo-favicon.ico" /> |
| <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.1/css/font-awesome.min.css" /> |
| |
| |
| <link rel="alternate" type="application/atom+xml" |
| title="RSS Feed for Apache Kudu blog" |
| href="/feed.xml" /> |
| |
| |
| <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> |
| <!--[if lt IE 9]> |
| <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> |
| <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> |
| <![endif]--> |
| </head> |
| <body> |
| <div class="kudu-site container-fluid"> |
| <!-- Static navbar --> |
| <nav class="navbar navbar-default"> |
| <div class="container-fluid"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| |
| <a class="logo" href="/"><img src="/img/logo_small.png" width="80" /></a> |
| |
| </div> |
| <div id="navbar" class="collapse navbar-collapse"> |
| <ul class="nav navbar-nav navbar-right"> |
| <li > |
| <a href="/">Home</a> |
| </li> |
| <li > |
| <a href="/overview.html">Overview</a> |
| </li> |
| <li > |
| <a href="/docs/">Documentation</a> |
| </li> |
| <li > |
| <a href="/releases/">Download</a> |
| </li> |
| <li class="active"> |
| <a href="/blog/">Blog</a> |
| </li> |
| <!-- NOTE: this dropdown menu does not appear on Mobile, so don't add anything here |
| that doesn't also appear elsewhere on the site. --> |
| <li class="dropdown"> |
| <a href="/community.html" role="button" aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a> |
| <ul class="dropdown-menu"> |
| <li class="dropdown-header">GET IN TOUCH</li> |
| <li><a class="icon email" href="/community.html">Mailing Lists</a></li> |
| <li><a class="icon slack" href="https://getkudu-slack.herokuapp.com/">Slack Channel</a></li> |
| <li role="separator" class="divider"></li> |
| <li><a href="/community.html#meetups-user-groups-and-conference-presentations">Events and Meetups</a></li> |
| <!--<li><a href="/committers.html">Project Committers</a></li>--> |
| <!--<li><a href="/roadmap.html">Roadmap</a></li>--> |
| <li><a href="/community.html#contributions">How to Contribute</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="dropdown-header">DEVELOPER RESOURCES</li> |
| <li><a class="icon github" href="https://github.com/apache/incubator-kudu">GitHub</a></li> |
| <li><a class="icon gerrit" href="http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu">Gerrit Code Review</a></li> |
| <li><a class="icon jira" href="https://issues.apache.org/jira/browse/KUDU">JIRA Issue Tracker</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="dropdown-header">SOCIAL MEDIA</li> |
| <li><a class="icon twitter" href="https://twitter.com/ApacheKudu">Twitter</a></li> |
| </ul> |
| </li> |
| <li > |
| <a href="/faq.html">FAQ</a> |
| </li> |
| </ul><!-- /.nav --> |
| </div><!-- /#navbar --> |
| </div><!-- /.container-fluid --> |
| </nav> |
| |
| <div class="row header"> |
| <div class="col-lg-12"> |
| <h2><a href="/blog">Apache Kudu (incubating) Blog</a></h2> |
| </div> |
| </div> |
| |
| <div class="row-fluid"> |
| <div class="col-lg-9"> |
| <article> |
| <header> |
| <h1 class="entry-title">Master fault tolerance in Kudu 1.0</h1> |
| <p class="meta">Posted 24 Jun 2016 by Adar Dembo</p> |
| </header> |
| <div class="entry-content"> |
| <p>This blog post describes how the 1.0 release of Apache Kudu (incubating) will |
| support fault tolerance for the Kudu master, finally eliminating Kudu’s last |
| single point of failure.</p> |
| |
| <!--more--> |
| |
| <p>As those of you who follow this blog know by now, replication is a signature |
| feature in Kudu. Replication is used to provide fault tolerance for all loaded |
| data. By implementing the Raft consensus protocol, Kudu guarantees that a tablet |
| replicated <strong>2N+1</strong> times can tolerate up to <strong>N</strong> failures.</p> |
| |
| <p>What you may not know is that Kudu replicates its metadata, too. That is, the |
| Kudu master stores all table and tablet metadata in a single “master” tablet. |
| As a regular Kudu tablet itself, this master tablet may be replicated with |
| Raft. As such, the Kudu master is a special kind of tablet server whose primary |
| job is to host a single replica of the master tablet.</p> |
| |
| <p>When we launched Kudu’s first beta, support for replicated masters had been |
| implemented but was too fragile to be anything but experimental. One of our |
| goals for Kudu’s 1.0 release is to improve replicated master support so that it |
| can be safely enabled in production clusters.</p> |
| |
| <h1 id="how-master-replication-works">How master replication works</h1> |
| |
| <p>To use replicated masters, a Kudu operator must deploy some number of Kudu |
| masters, providing the hostname and port number of each master in the group via |
| the <code>--master_address</code> command line option. For example, each master in a |
| three-node deployment should be started with |
| <code>--master_address=<host1:port1>,<host2:port2><host3:port3></code>. In Raft parlance, |
| this group of masters is known as a <em>Raft configuration</em>.</p> |
| |
| <p>At startup, a Raft configuration of masters will hold a leader election and |
| elect one master as the leader. The leader master is responsible for servicing |
| both tablet server heartbeats as well as client requests. The remaining masters |
| are followers: they participate in Raft consensus and replicate writes sent by |
| the leader, but are otherwise idle. Any client requests they receive are |
| rejected. Likewise, all tablet server heartbeats they receive are ignored. If |
| the leader master ever dies or steps down, the remaining replicas hold an |
| election to determine the new leader.</p> |
| |
| <p>All persistent master metadata is stored in the single replicated “master” |
| tablet. Every row in this tablet represents either a table or a tablet. Table |
| records include unique table identifiers, the table’s schema, and other bits of |
| information. Tablet records include a unique identifier, the tablet’s Raft |
| configuration, and other information.</p> |
| |
| <p>What master metadata is replicated?</p> |
| |
| <ol> |
| <li>Table and tablet existence, via <strong>CreateTable()</strong> and <strong>DeleteTable()</strong>. |
| Every new tablet record also includes an initial Raft configuration.</li> |
| <li>Schema changes, via <strong>AlterTable()</strong> and tablet server heartbeats.</li> |
| <li>Tablet server Raft configuration changes, via tablet server heartbeats. |
| These include both the list of Raft peers (may have changed due to |
| under-replication) as well as the current leader (may have changed due to |
| an election).</li> |
| </ol> |
| |
| <p>Scanning the master tablet to service every heartbeat or client request would be |
| slow, so the leader master caches all master metadata in memory. The caches are |
| only updated after a metadata change is successfully replicated; in this way |
| they are always consistent with the on-disk tablet. When a new leader master is |
| elected, it scans the entire master tablet and uses the metadata to rebuild its |
| in-memory caches.</p> |
| |
| <h1 id="communication-with-replicated-masters">Communication with replicated masters</h1> |
| |
| <p>All tablet servers start up with location information for the entire master Raft |
| configuration and will periodically heartbeat to every master. Similarly, |
| clients are also configured with the locations of all masters. Unlike tablet |
| servers, they always communicate with the leader master as follower masters will |
| reject client requests. To do this, clients must determine which master is the |
| leader before sending the first request as well as whenever any request fails |
| with a <code>NOT_THE_LEADER</code> error.</p> |
| |
| <h1 id="remaining-work-for-kudu-10">Remaining work for Kudu 1.0</h1> |
| |
| <p><a href="https://issues.apache.org/jira/browse/KUDU-422">KUDU-422</a> tracks the remaining |
| master replication work. The guts of this feature have been implemented as far |
| back as early 2015; the remaining work has been focused on fixing bugs that |
| manifest only under specific conditions. For example, we’ve observed failures in |
| DDL operations (e.g. <strong>CreateTable()</strong>) that only materialize upon the |
| completion of a master leader election. These failures highlight some of the |
| gaps in our testing regimen: we need a robust stress test that repeatedly |
| performs such operations while holding master leader elections.</p> |
| |
| <p>That said, there is one remaining work item of larger scope: there’s no |
| mechanism with which to perform a Raft configuration change for replicated |
| masters. Such a mechanism would have multiple uses:</p> |
| |
| <ol> |
| <li>Migrating from a single-node master deployment to a fully replicated |
| three-node (or five-node) deployment.</li> |
| <li>Replacing a failed master with a new one.</li> |
| </ol> |
| |
| <p>This is being tracked by |
| <a href="https://issues.apache.org/jira/browse/KUDU-1474">KUDU-1474</a>, and there’s been |
| <a href="http://gerrit.cloudera.org:8080/3393">some discussion</a> around a design, but |
| nothing has been implemented yet. Stay tuned!</p> |
| |
| </div> |
| </article> |
| |
| |
| </div> |
| <div class="col-lg-3 recent-posts"> |
| <h3>Recent posts</h3> |
| <ul> |
| |
| <li> <a href="/2016/07/18/weekly-update.html">Apache Kudu (incubating) Weekly Update July 18, 2016</a> </li> |
| |
| <li> <a href="/2016/07/11/weekly-update.html">Apache Kudu (incubating) Weekly Update July 11, 2016</a> </li> |
| |
| <li> <a href="/2016/07/01/apache-kudu-0-9-1-released.html">Apache Kudu (incubating) 0.9.1 released</a> </li> |
| |
| <li> <a href="/2016/06/27/weekly-update.html">Apache Kudu (incubating) Weekly Update June 27, 2016</a> </li> |
| |
| <li> <a href="/2016/06/24/multi-master-1-0-0.html">Master fault tolerance in Kudu 1.0</a> </li> |
| |
| <li> <a href="/2016/06/21/weekly-update.html">Apache Kudu (incubating) Weekly Update June 21, 2016</a> </li> |
| |
| <li> <a href="/2016/06/17/raft-consensus-single-node.html">Using Raft Consensus on a Single Node</a> </li> |
| |
| <li> <a href="/2016/06/13/weekly-update.html">Apache Kudu (incubating) Weekly Update June 13, 2016</a> </li> |
| |
| <li> <a href="/2016/06/10/apache-kudu-0-9-0-released.html">Apache Kudu (incubating) 0.9.0 released</a> </li> |
| |
| <li> <a href="/2016/06/06/weekly-update.html">Apache Kudu (incubating) Weekly Update June 6, 2016</a> </li> |
| |
| <li> <a href="/2016/06/02/no-default-partitioning.html">Default Partitioning Changes Coming in Kudu 0.9</a> </li> |
| |
| <li> <a href="/2016/06/01/weekly-update.html">Apache Kudu (incubating) Weekly Update June 1, 2016</a> </li> |
| |
| <li> <a href="/2016/05/23/weekly-update.html">Apache Kudu (incubating) Weekly Update May 23, 2016</a> </li> |
| |
| <li> <a href="/2016/05/16/weekly-update.html">Apache Kudu (incubating) Weekly Update May 16, 2016</a> </li> |
| |
| <li> <a href="/2016/05/09/weekly-update.html">Apache Kudu (incubating) Weekly Update May 9, 2016</a> </li> |
| |
| </ul> |
| </div> |
| </div> |
| |
| <footer class="footer"> |
| <p class="pull-left"> |
| <a href="http://incubator.apache.org"><img src="/img/apache-incubator.png" width="225" height="53" align="right"/></a> |
| </p> |
| <p class="small"> |
| Apache Kudu (incubating) is an effort undergoing incubation at the Apache Software |
| Foundation (ASF), sponsored by the Apache Incubator PMC. Incubation is |
| required of all newly accepted projects until a further review |
| indicates that the infrastructure, communications, and decision making |
| process have stabilized in a manner consistent with other successful |
| ASF projects. While incubation status is not necessarily a reflection |
| of the completeness or stability of the code, it does indicate that the |
| project has yet to be fully endorsed by the ASF. |
| |
| Copyright © 2016 The Apache Software Foundation. |
| </p> |
| </footer> |
| </div> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script> |
| <script> |
| // Try to detect touch-screen devices. Note: Many laptops have touch screens. |
| $(document).ready(function() { |
| if ("ontouchstart" in document.documentElement) { |
| $(document.documentElement).addClass("touch"); |
| } else { |
| $(document.documentElement).addClass("no-touch"); |
| } |
| }); |
| </script> |
| <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js" |
| integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS" |
| crossorigin="anonymous"></script> |
| <script> |
| (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ |
| (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), |
| m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) |
| })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); |
| |
| ga('create', 'UA-68448017-1', 'auto'); |
| ga('send', 'pageview'); |
| </script> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/anchor-js/3.1.0/anchor.js"></script> |
| <script> |
| anchors.options = { |
| placement: 'right', |
| visible: 'touch', |
| }; |
| anchors.add(); |
| </script> |
| </body> |
| </html> |
| |