| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8" /> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1" /> |
| <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> |
| <meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data" /> |
| <meta name="author" content="Cloudera" /> |
| <title>Apache Kudu - Location Awareness in Kudu</title> |
| <!-- Bootstrap core CSS --> |
| <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css" |
| integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7" |
| crossorigin="anonymous"> |
| |
| <!-- Custom styles for this template --> |
| <link href="/css/kudu.css" rel="stylesheet"/> |
| <link href="/css/asciidoc.css" rel="stylesheet"/> |
| <link rel="shortcut icon" href="/img/logo-favicon.ico" /> |
| <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.1/css/font-awesome.min.css" /> |
| |
| |
| <link rel="alternate" type="application/atom+xml" |
| title="RSS Feed for Apache Kudu blog" |
| href="/feed.xml" /> |
| |
| |
| <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> |
| <!--[if lt IE 9]> |
| <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> |
| <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> |
| <![endif]--> |
| </head> |
| <body> |
| <div class="kudu-site container-fluid"> |
| <!-- Static navbar --> |
| <nav class="navbar navbar-default"> |
| <div class="container-fluid"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| |
| <a class="logo" href="/"><img |
| src="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png" |
| srcset="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png 1x, //d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_160px.png 2x" |
| alt="Apache Kudu"/></a> |
| |
| </div> |
| <div id="navbar" class="collapse navbar-collapse"> |
| <ul class="nav navbar-nav navbar-right"> |
| <li > |
| <a href="/">Home</a> |
| </li> |
| <li > |
| <a href="/overview.html">Overview</a> |
| </li> |
| <li > |
| <a href="/docs/">Documentation</a> |
| </li> |
| <li > |
| <a href="/releases/">Releases</a> |
| </li> |
| <li class="active"> |
| <a href="/blog/">Blog</a> |
| </li> |
| <!-- NOTE: this dropdown menu does not appear on Mobile, so don't add anything here |
| that doesn't also appear elsewhere on the site. --> |
| <li class="dropdown"> |
| <a href="/community.html" role="button" aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a> |
| <ul class="dropdown-menu"> |
| <li class="dropdown-header">GET IN TOUCH</li> |
| <li><a class="icon email" href="/community.html">Mailing Lists</a></li> |
| <li><a class="icon slack" href="https://join.slack.com/t/getkudu/shared_invite/zt-244b4zvki-hB1q9IbAk6CqHNMZHvUALA">Slack Channel</a></li> |
| <li role="separator" class="divider"></li> |
| <li><a href="/community.html#meetups-user-groups-and-conference-presentations">Events and Meetups</a></li> |
| <li><a href="/committers.html">Project Committers</a></li> |
| <li><a href="/ecosystem.html">Ecosystem</a></li> |
| <!--<li><a href="/roadmap.html">Roadmap</a></li>--> |
| <li><a href="/community.html#contributions">How to Contribute</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="dropdown-header">DEVELOPER RESOURCES</li> |
| <li><a class="icon github" href="https://github.com/apache/incubator-kudu">GitHub</a></li> |
| <li><a class="icon gerrit" href="http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu">Gerrit Code Review</a></li> |
| <li><a class="icon jira" href="https://issues.apache.org/jira/browse/KUDU">JIRA Issue Tracker</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="dropdown-header">SOCIAL MEDIA</li> |
| <li><a class="icon twitter" href="https://twitter.com/ApacheKudu">Twitter</a></li> |
| <li><a href="https://www.reddit.com/r/kudu/">Reddit</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="dropdown-header">APACHE SOFTWARE FOUNDATION</li> |
| <li><a href="https://www.apache.org/security/" target="_blank">Security</a></li> |
| <li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a></li> |
| <li><a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li> |
| <li><a href="https://www.apache.org/licenses/" target="_blank">License</a></li> |
| </ul> |
| </li> |
| <li > |
| <a href="/faq.html">FAQ</a> |
| </li> |
| </ul><!-- /.nav --> |
| </div><!-- /#navbar --> |
| </div><!-- /.container-fluid --> |
| </nav> |
| |
| <div class="row header"> |
| <div class="col-lg-12"> |
| <h2><a href="/blog">Apache Kudu Blog</a></h2> |
| </div> |
| </div> |
| |
| <div class="row-fluid"> |
| <div class="col-lg-9"> |
| <article> |
| <header> |
| <h1 class="entry-title">Location Awareness in Kudu</h1> |
| <p class="meta">Posted 30 Apr 2019 by Alexey Serbin</p> |
| </header> |
| <div class="entry-content"> |
| <p>This post is about location awareness in Kudu. It gives an overview |
| of the following:</p> |
| <ul> |
| <li>principles of the design</li> |
| <li>restrictions of the current implementation</li> |
| <li>potential future enhancements and extensions</li> |
| </ul> |
| |
| <!--more--> |
| |
| <h1 id="introduction">Introduction</h1> |
| <p>Kudu supports location awareness starting with the 1.9.0 release. The |
| initial implementation of location awareness in Kudu is built to satisfy the |
| following requirement:</p> |
| |
| <ul> |
| <li>In a Kudu cluster consisting of multiple servers spread over several racks, |
| place the replicas of a tablet in such a way that the tablet stays available |
| even if all the servers in a single rack become unavailable.</li> |
| </ul> |
| |
| <p>A rack failure can occur when a hardware component shared among servers in the |
| rack, such as a network switch or power supply, fails. More generally, |
| replace ‘rack’ with any other aggregation of nodes (e.g., chassis, site, |
| cloud availability zone, etc.) where some or all nodes in an aggregate become |
| unavailable in case of a failure. This even applies to a datacenter if the |
| network latency between datacenters is low. This is why we call the feature |
| <em>location awareness</em> and not <em>rack awareness</em>.</p> |
| |
| <h1 id="locations-in-kudu">Locations in Kudu</h1> |
| <p>In Kudu, a location is defined by a string that begins with a slash (<code class="language-plaintext highlighter-rouge">/</code>) and |
| consists of slash-separated tokens each of which contains only characters from |
| the set <code class="language-plaintext highlighter-rouge">[a-zA-Z0-9_-.]</code>. The components of the location string hierarchy |
| should correspond to the physical or cloud-defined hierarchy of the deployed |
| cluster, e.g. <code class="language-plaintext highlighter-rouge">/data-center-0/rack-09</code> or <code class="language-plaintext highlighter-rouge">/region-0/availability-zone-01</code>.</p> |
| |
| <p>The design choice of using hierarchical paths for location strings is |
| partially influenced by HDFS. The intention was to make it possible using |
| the same locations as for existing HDFS nodes, because it’s common to deploy |
| Kudu alongside HDFS. In addition, the hierarchical structure of location |
| strings allows for interpretation of those in terms of common ancestry and |
| relative proximity. As of now, Kudu does not exploit the hierarchical |
| structure of the location except for the client’s logic to find the closest |
| tablet server. However, we plan to leverage the hierarchical structure |
| in future releases.</p> |
| |
| <h1 id="defining-and-assigning-locations">Defining and assigning locations</h1> |
| <p>Kudu masters assign locations to tablet servers and clients.</p> |
| |
| <p>Every Kudu master runs the location assignment procedure to assign a location |
| to a tablet server when it registers. To determine the location for a tablet |
| server, the master invokes an executable that takes the IP address or hostname |
| of the tablet server and outputs the corresponding location string for the |
| specified IP address or hostname. If the executable exits with non-zero exit |
| status, that’s interpreted as an error and masters add corresponding error |
| message about that into their logs. In case of tablet server registrations |
| such outcome is deemed as a registration failure and the corresponding tablet |
| server is not added into the master’s registry. The latter renders the tablet |
| server unusable to Kudu clients since non-registered tablet servers are not |
| discoverable to Kudu clients via <code class="language-plaintext highlighter-rouge">GetTableLocations</code> RPC.</p> |
| |
| <p>The master associates the produced location string with the registered tablet |
| server and keeps it until the tablet server re-registers, which only occurs |
| if the master or tablet server restarts. Masters use the assigned location |
| information internally to make replica placement decisions, trying to place |
| replicas evenly across locations and to keep tablets available in case all |
| tablet servers in a single location fail (see |
| <a href="https://s.apache.org/location-awareness-design">the design document</a> |
| for details). In addition, masters provide connected clients with |
| the information on the client’s assigned location, so the clients can make |
| informed decisions when they attempt to read from the closest tablet server. |
| Kudu tablet servers themselves are location agnostic, at least for now, |
| so the assigned location is not reported back to a registered tablet server.</p> |
| |
| <h1 id="the-location-aware-placement-policy-for-tablet-replicas-in-kudu">The location-aware placement policy for tablet replicas in Kudu</h1> |
| <p>While placing replicas of tablets in location-aware cluster, Kudu uses a best |
| effort approach to adhere to the following principle:</p> |
| <ul> |
| <li>Spread replicas across locations so that the failure of tablet servers |
| in one location does not make tablets unavailable.</li> |
| </ul> |
| |
| <p>That’s referred to as the <em>replica placement policy</em> or just <em>placement policy</em>. |
| In Kudu, both the initial placement of tablet replicas and the automatic |
| re-replication are governed by that policy. As of now, that’s the only |
| replica placement policy available in Kudu. The placement policy isn’t |
| customizable and doesn’t have any configurable parameters.</p> |
| |
| <h1 id="automatic-re-replication-and-placement-policy">Automatic re-replication and placement policy</h1> |
| <p>By design, keeping the target replication factor for tablets has higher |
| priority than conforming to the replica placement policy. In other words, |
| when bringing up tablet replicas to replace failed ones, Kudu uses a best-effort |
| approach with regard to conforming to the constraints of the placement policy. |
| Essentially, that means that if there isn’t a way to place a replica to conform |
| with the placement policy, the system places the replica anyway. The resulting |
| violation of the placement policy can be addressed later on when unreachable |
| tablet servers become available again or the misconfiguration is addressed. |
| As of now, to fix the resulting placement policy violations it’s necessary |
| to run the CLI rebalancer tool manually (see below for details), |
| but in future releases that might be done <a href="https://issues.apache.org/jira/browse/KUDU-2780">automatically in background</a>.</p> |
| |
| <h1 id="an-example-of-location-aware-rebalancing">An example of location-aware rebalancing</h1> |
| <p>This section illustrates what happens during each phase of the location-aware |
| rebalancing process.</p> |
| |
| <p>In the diagrams below, the larger outer boxes denote locations, and the |
| smaller inner ones denote tablet servers. As for the real-world objects behind |
| locations in this example, one might think of server racks with a shared power |
| supply or a shared network switch. It’s assumed that no more than one tablet |
| server is run at each node (i.e. machine) in a rack.</p> |
| |
| <p>The first phase of the rebalancing process is about detecting violations and |
| reinstating the placement policy in the cluster. In the diagram below, there |
| are three locations defined: <code class="language-plaintext highlighter-rouge">/L0</code>, <code class="language-plaintext highlighter-rouge">/L1</code>, <code class="language-plaintext highlighter-rouge">/L2</code>. Each location has two tablet |
| servers. Table <code class="language-plaintext highlighter-rouge">A</code> has the replication factor of three (RF=3) and consists of |
| four tablets: <code class="language-plaintext highlighter-rouge">A0</code>, <code class="language-plaintext highlighter-rouge">A1</code>, <code class="language-plaintext highlighter-rouge">A2</code>, <code class="language-plaintext highlighter-rouge">A3</code>. Table <code class="language-plaintext highlighter-rouge">B</code> has replication factor of five |
| (RF=5) and consists of three tablets: <code class="language-plaintext highlighter-rouge">B0</code>, <code class="language-plaintext highlighter-rouge">B1</code>, <code class="language-plaintext highlighter-rouge">B2</code>.</p> |
| |
| <p>The distribution of the replicas for tablet <code class="language-plaintext highlighter-rouge">A0</code> violates the placement policy. |
| Why? Because replicas <code class="language-plaintext highlighter-rouge">A0.0</code> and <code class="language-plaintext highlighter-rouge">A0.1</code> constitute the majority of replicas |
| (two out of three) and reside in the same location <code class="language-plaintext highlighter-rouge">/L0</code>.</p> |
| |
| <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> /L0 /L1 /L2 |
| +-------------------+ +-------------------+ +-------------------+ |
| | TS0 TS1 | | TS2 TS3 | | TS4 TS5 | |
| | +------+ +------+ | | +------+ +------+ | | +------+ +------+ | |
| | | A0.0 | | A0.1 | | | | A0.2 | | | | | | | | | | |
| | | | | A1.0 | | | | A1.1 | | | | | | A1.2 | | | | |
| | | | | A2.0 | | | | A2.1 | | | | | | A2.2 | | | | |
| | | | | A3.0 | | | | A3.1 | | | | | | A3.2 | | | | |
| | | B0.0 | | B0.1 | | | | B0.2 | | B0.3 | | | | B0.4 | | | | |
| | | B1.0 | | B1.1 | | | | B1.2 | | B1.3 | | | | B1.4 | | | | |
| | | B2.0 | | B2.1 | | | | B2.2 | | B2.3 | | | | B2.4 | | | | |
| | +------+ +------+ | | +------+ +------+ | | +------+ +------+ | |
| +-------------------+ +-------------------+ +-------------------+ |
| </code></pre></div></div> |
| |
| <p>The location-aware rebalancer should initiate movement either of <code class="language-plaintext highlighter-rouge">T0.0</code> or |
| <code class="language-plaintext highlighter-rouge">T0.1</code> from <code class="language-plaintext highlighter-rouge">/L0</code> to other location, so the resulting replica distribution would |
| <em>not</em> contain the majority of replicas in any single location. In addition to |
| that, the rebalancer tool tries to evenly spread the load across all locations |
| and tablet servers within each location. The latter narrows down the list |
| of the candidate replicas to move: <code class="language-plaintext highlighter-rouge">A0.1</code> is the best candidate to move from |
| location <code class="language-plaintext highlighter-rouge">/L0</code>, so location <code class="language-plaintext highlighter-rouge">/L0</code> would not contain the majority of replicas |
| for tablet <code class="language-plaintext highlighter-rouge">A0</code>. The same principle dictates the target location and the target |
| tablet server to receive <code class="language-plaintext highlighter-rouge">A0.1</code>: that should be tablet server <code class="language-plaintext highlighter-rouge">TS5</code> in the |
| location <code class="language-plaintext highlighter-rouge">/L2</code>. The result distribution of the tablet replicas after the move |
| is represented in the diagram below.</p> |
| |
| <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> /L0 /L1 /L2 |
| +-------------------+ +-------------------+ +-------------------+ |
| | TS0 TS1 | | TS2 TS3 | | TS4 TS5 | |
| | +------+ +------+ | | +------+ +------+ | | +------+ +------+ | |
| | | A0.0 | | | | | | A0.2 | | | | | | | | A0.1 | | |
| | | | | A1.0 | | | | A1.1 | | | | | | A1.2 | | | | |
| | | | | A2.0 | | | | A2.1 | | | | | | A2.2 | | | | |
| | | | | A3.0 | | | | A3.1 | | | | | | A3.2 | | | | |
| | | B0.0 | | B0.1 | | | | B0.2 | | B0.3 | | | | B0.4 | | | | |
| | | B1.0 | | B1.1 | | | | B1.2 | | B1.3 | | | | B1.4 | | | | |
| | | B2.0 | | B2.1 | | | | B2.2 | | B2.3 | | | | B2.4 | | | | |
| | +------+ +------+ | | +------+ +------+ | | +------+ +------+ | |
| +-------------------+ +-------------------+ +-------------------+ |
| </code></pre></div></div> |
| |
| <p>The second phase of the location-aware rebalancing is about moving tablet |
| replicas across locations to make the locations’ load more balanced. For the |
| number <code class="language-plaintext highlighter-rouge">S</code> of tablet servers in a location and the total number <code class="language-plaintext highlighter-rouge">R</code> of replicas |
| in the location, the <em>load of the location</em> is defined as <code class="language-plaintext highlighter-rouge">R/S</code>.</p> |
| |
| <p>At this stage all violations of the placement policy are already rectified. The |
| rebalancer tool doesn’t attempt to make any moves which would violate the |
| placement policy.</p> |
| |
| <p>The load of the locations in the diagram above:</p> |
| <ul> |
| <li><code class="language-plaintext highlighter-rouge">/L0</code>: 1/5</li> |
| <li><code class="language-plaintext highlighter-rouge">/L1</code>: 1/5</li> |
| <li><code class="language-plaintext highlighter-rouge">/L2</code>: 2/7</li> |
| </ul> |
| |
| <p>A possible distribution of the tablet replicas after the second phase is |
| represented below. The result load of the locations:</p> |
| <ul> |
| <li><code class="language-plaintext highlighter-rouge">/L0</code>: 2/9</li> |
| <li><code class="language-plaintext highlighter-rouge">/L1</code>: 2/9</li> |
| <li><code class="language-plaintext highlighter-rouge">/L2</code>: 2/9</li> |
| </ul> |
| |
| <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> /L0 /L1 /L2 |
| +-------------------+ +-------------------+ +-------------------+ |
| | TS0 TS1 | | TS2 TS3 | | TS4 TS5 | |
| | +------+ +------+ | | +------+ +------+ | | +------+ +------+ | |
| | | A0.0 | | | | | | A0.2 | | | | | | | | A0.1 | | |
| | | | | A1.0 | | | | A1.1 | | | | | | A1.2 | | | | |
| | | | | A2.0 | | | | A2.1 | | | | | | A2.2 | | | | |
| | | | | A3.0 | | | | A3.1 | | | | | | A3.2 | | | | |
| | | B0.0 | | | | | | B0.2 | | B0.3 | | | | B0.4 | | B0.1 | | |
| | | B1.0 | | B1.1 | | | | | | B1.3 | | | | B1.4 | | B2.2 | | |
| | | B2.0 | | B2.1 | | | | B2.2 | | B2.3 | | | | B2.4 | | | | |
| | +------+ +------+ | | +------+ +------+ | | +------+ +------+ | |
| +-------------------+ +-------------------+ +-------------------+ |
| </code></pre></div></div> |
| |
| <p>The third phase of the location-aware rebalancing is about moving tablet |
| replicas within each location to make the distribution of replicas even, |
| both per-table and per-server.</p> |
| |
| <p>See below for a possible replicas’ distribution in the example scenario |
| after the third phase of the location-aware rebalancing successfully completes.</p> |
| |
| <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> /L0 /L1 /L2 |
| +-------------------+ +-------------------+ +-------------------+ |
| | TS0 TS1 | | TS2 TS3 | | TS4 TS5 | |
| | +------+ +------+ | | +------+ +------+ | | +------+ +------+ | |
| | | A0.0 | | | | | | | | A0.2 | | | | | | A0.1 | | |
| | | | | A1.0 | | | | A1.1 | | | | | | A1.2 | | | | |
| | | | | A2.0 | | | | A2.1 | | | | | | A2.2 | | | | |
| | | | | A3.0 | | | | A3.1 | | | | | | A3.2 | | | | |
| | | B0.0 | | | | | | B0.2 | | B0.3 | | | | B0.4 | | B0.1 | | |
| | | B1.0 | | B1.1 | | | | | | B1.3 | | | | B1.4 | | B1.2 | | |
| | | B2.0 | | B2.1 | | | | B2.2 | | B2.3 | | | | | | B2.4 | | |
| | +------+ +------+ | | +------+ +------+ | | +------+ +------+ | |
| +-------------------+ +-------------------+ +-------------------+ |
| </code></pre></div></div> |
| |
| <h1 id="how-to-make-a-kudu-cluster-location-aware">How to make a Kudu cluster location-aware</h1> |
| <p>To make a Kudu cluster location-aware, it’s necessary to set the |
| <code class="language-plaintext highlighter-rouge">--location_mapping_cmd</code> flag for Kudu master(s) and make the corresponding |
| executable (binary or a script) available at the nodes where Kudu masters run. |
| In case of multiple masters, it’s important to make sure that the location |
| mappings stay the same regardless of the node where the location assignment |
| command is running.</p> |
| |
| <p>It’s recommended to have at least three locations defined in a Kudu |
| cluster so that no location contains a majority of tablet replicas. |
| With two locations or less it’s not possible to spread replicas |
| of tablets with replication factor of three and higher such that no location |
| contains a majority of replicas.</p> |
| |
| <p>For example, running a Kudu cluster in a single datacenter <code class="language-plaintext highlighter-rouge">dc0</code>, assign |
| location <code class="language-plaintext highlighter-rouge">/dc0/rack0</code> to tablet servers running at machines in the rack <code class="language-plaintext highlighter-rouge">rack0</code>, |
| <code class="language-plaintext highlighter-rouge">/dc0/rack1</code> to tablet servers running at machines in the rack <code class="language-plaintext highlighter-rouge">rack1</code>, |
| and <code class="language-plaintext highlighter-rouge">/dc0/rack2</code> to tablet servers running at machines in the rack <code class="language-plaintext highlighter-rouge">rack2</code>. |
| In a similar way, when running in cloud, assign location <code class="language-plaintext highlighter-rouge">/regionA/az0</code> |
| to tablet servers running in availability zone <code class="language-plaintext highlighter-rouge">az0</code> of region <code class="language-plaintext highlighter-rouge">regionA</code>, |
| and <code class="language-plaintext highlighter-rouge">/regionA/az1</code> to tablet servers running in zone <code class="language-plaintext highlighter-rouge">az1</code> of the same region.</p> |
| |
| <h1 id="an-example-of-location-assignment-script-for-kudu">An example of location assignment script for Kudu</h1> |
| <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/sh |
| # |
| # It's assumed a Kudu cluster consists of nodes with IPv4 addresses in the |
| # private 192.168.100.0/32 subnet. The nodes are hosted in racks, where |
| # each rack can contain at most 32 nodes. This results in 8 locations, |
| # one location per rack. |
| # |
| # This example script maps IP addresses into locations assuming that RPC |
| # endpoints of tablet servers are specified via IPv4 addresses. If tablet |
| # servers' RPC endpoints are specified using DNS hostnames (and that's how |
| # it's done by default), the script should consume DNS hostname instead of |
| # an IP address as an input parameter. Check the `--rpc_bind_addresses` and |
| # `--rpc_advertised_addresses` command line flags of kudu-tserver for details. |
| # |
| # DISCLAIMER: |
| # This is an example Bourne shell script for Kudu location assignment. Please |
| # note it's just a toy script created with illustrative-only purpose. |
| # The error handling and the input validation are minimalistic. Also, the |
| # network topology choice, supportability and capacity planning aspects of |
| # this script might be sub-optimal if applied as-is for real-world use cases. |
| |
| set -e |
| |
| if [ $# -ne 1 ]; then |
| echo "usage: $0 <ip_address>" |
| exit 1 |
| fi |
| |
| ip_address=$1 |
| shift |
| |
| suffix=${ip_address##192.168.100.} |
| if [ -z "${suffix##*.*}" ]; then |
| # An IP address from a non-controlled subnet: maps into the 'other' location. |
| echo "/other" |
| exit 0 |
| fi |
| |
| # The mapping of the IP addresses |
| if [ -z "$suffix" -o $suffix -lt 0 -o $suffix -gt 255 ]; then |
| echo "ERROR: '$ip_address' is not a valid IPv4 address" |
| exit 2 |
| fi |
| |
| if [ $suffix -eq 0 -o $suffix -eq 255 ]; then |
| echo "ERROR: '$ip_address' is a 0xffffff00 IPv4 subnet address" |
| exit 3 |
| fi |
| |
| if [ $suffix -lt 32 ]; then |
| echo "/dc0/rack00" |
| elif [ $suffix -ge 32 -a $suffix -lt 64 ]; then |
| echo "/dc0/rack01" |
| elif [ $suffix -ge 64 -a $suffix -lt 96 ]; then |
| echo "/dc0/rack02" |
| elif [ $suffix -ge 96 -a $suffix -lt 128 ]; then |
| echo "/dc0/rack03" |
| elif [ $suffix -ge 128 -a $suffix -lt 160 ]; then |
| echo "/dc0/rack04" |
| elif [ $suffix -ge 160 -a $suffix -lt 192 ]; then |
| echo "/dc0/rack05" |
| elif [ $suffix -ge 192 -a $suffix -lt 224 ]; then |
| echo "/dc0/rack06" |
| else |
| echo "/dc0/rack07" |
| fi |
| </code></pre></div></div> |
| |
| <h1 id="reinstating-the-placement-policy-in-a-location-aware-kudu-cluster">Reinstating the placement policy in a location-aware Kudu cluster</h1> |
| <p>As explained earlier, even if the initial placement of tablet replicas conforms |
| to the placement policy, the cluster might get to a point where there are not |
| enough tablet servers to place a new or a replacement replica. Ideally, such |
| situations should be handled automatically: once there are enough tablet servers |
| in the cluster or the misconfiguration is fixed, the placement policy should |
| be reinstated. Currently, it’s possible to reinstate the placement policy using |
| the <code class="language-plaintext highlighter-rouge">kudu</code> CLI tool:</p> |
| |
| <p><code class="language-plaintext highlighter-rouge">sudo -u kudu kudu cluster rebalance <master_rpc_endpoints></code></p> |
| |
| <p>In the first phase, the location-aware rebalancing process tries to |
| reestablish the placement policy. If that’s not possible, the tool |
| terminates. Use the <code class="language-plaintext highlighter-rouge">--disable_policy_fixer</code> flag to skip this phase and |
| continue to the cross-location rebalancing phase.</p> |
| |
| <p>The second phase is cross-location rebalancing, i.e. moving tablet replicas |
| between different locations in attempt to spread tablet replicas among |
| locations evenly, equalizing the loads of locations throughout the cluster. |
| If the benefits of spreading the load among locations do not justify the cost |
| of the cross-location replica movement, the tool can be instructed to skip the |
| second phase of the location-aware rebalancing. Use the |
| <code class="language-plaintext highlighter-rouge">--disable_cross_location_rebalancing</code> command line flag for that.</p> |
| |
| <p>The third phase is intra-location rebalancing, i.e. balancing the distribution |
| of tablet replicas within each location as if each location is a cluster on its |
| own. Use the <code class="language-plaintext highlighter-rouge">--disable_intra_location_rebalancing</code> flag to skip this phase.</p> |
| |
| <h1 id="future-work">Future work</h1> |
| <p>Having a CLI tool to reinstate placement policy is nice, but it would be great |
| to run the location-aware rebalancing in background, automatically reinstating |
| the placement policy and making tablet replica distribution even |
| across a Kudu cluster.</p> |
| |
| <p>In addition to that, there is a idea to make it possible to have |
| multiple customizable placement policies in the system. As of now, there is |
| a request to implement so-called ‘table pinning’, i.e. make it possible |
| to specify placement policy where replicas of tablets of particular tables |
| are placed only at nodes within the specified locations. The table pinning |
| request is tracked by KUDU-2604 in Apache JIRA, see |
| <a href="https://issues.apache.org/jira/browse/KUDU-2604">KUDU-2604</a>.</p> |
| |
| <h1 id="references">References</h1> |
| <p>[1] Location awareness in Kudu: <a href="https://github.com/apache/kudu/blob/master/docs/design-docs/location-awareness.md">design document</a></p> |
| |
| <p>[2] A proposal for Kudu tablet server labeling: <a href="https://issues.apache.org/jira/browse/KUDU-2604">KUDU-2604</a></p> |
| |
| <p>[3] Further improvement: <a href="https://issues.apache.org/jira/browse/KUDU-2780">automatic cluster rebalancing</a>.</p> |
| |
| </div> |
| </article> |
| |
| |
| </div> |
| <div class="col-lg-3 recent-posts"> |
| <h3>Recent posts</h3> |
| <ul> |
| |
| <li> <a href="/2024/03/07/introducing-auto-incrementing-column.html">Introducing Auto-incrementing Column in Kudu</a> </li> |
| |
| <li> <a href="/2023/09/07/apache-kudu-1-17-0-released.html">Apache Kudu 1.17.0 Released</a> </li> |
| |
| <li> <a href="/2022/06/17/apache-kudu-1-16-0-released.html">Apache Kudu 1.16.0 Released</a> </li> |
| |
| <li> <a href="/2021/06/22/apache-kudu-1-15-0-released.html">Apache Kudu 1.15.0 Released</a> </li> |
| |
| <li> <a href="/2021/01/28/apache-kudu-1-14-0-release.html">Apache Kudu 1.14.0 Released</a> </li> |
| |
| <li> <a href="/2021/01/15/bloom-filter-predicate.html">Optimized joins & filtering with Bloom filter predicate in Kudu</a> </li> |
| |
| <li> <a href="/2020/09/21/apache-kudu-1-13-0-release.html">Apache Kudu 1.13.0 released</a> </li> |
| |
| <li> <a href="/2020/08/11/fine-grained-authz-ranger.html">Fine-Grained Authorization with Apache Kudu and Apache Ranger</a> </li> |
| |
| <li> <a href="/2020/07/30/building-near-real-time-big-data-lake.html">Building Near Real-time Big Data Lake</a> </li> |
| |
| <li> <a href="/2020/05/18/apache-kudu-1-12-0-release.html">Apache Kudu 1.12.0 released</a> </li> |
| |
| <li> <a href="/2019/11/20/apache-kudu-1-11-1-release.html">Apache Kudu 1.11.1 released</a> </li> |
| |
| <li> <a href="/2019/11/20/apache-kudu-1-10-1-release.html">Apache Kudu 1.10.1 released</a> </li> |
| |
| <li> <a href="/2019/07/09/apache-kudu-1-10-0-release.html">Apache Kudu 1.10.0 Released</a> </li> |
| |
| <li> <a href="/2019/04/30/location-awareness.html">Location Awareness in Kudu</a> </li> |
| |
| <li> <a href="/2019/04/22/fine-grained-authorization-with-apache-kudu-and-impala.html">Fine-Grained Authorization with Apache Kudu and Impala</a> </li> |
| |
| </ul> |
| </div> |
| </div> |
| |
| <footer class="footer"> |
| <div class="row"> |
| <div class="col-md-9"> |
| <p class="small"> |
| Copyright © 2023 The Apache Software Foundation. |
| </p> |
| <p class="small"> |
| Apache Kudu, Kudu, Apache, the Apache feather logo, and the Apache Kudu |
| project logo are either registered trademarks or trademarks of The |
| Apache Software Foundation in the United States and other countries. |
| </p> |
| </div> |
| <div class="col-md-3"> |
| <a class="pull-right" href="https://www.apache.org/events/current-event.html"> |
| <img src="https://www.apache.org/events/current-event-234x60.png"/> |
| </a> |
| </div> |
| </div> |
| </footer> |
| </div> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script> |
| <script> |
| // Try to detect touch-screen devices. Note: Many laptops have touch screens. |
| $(document).ready(function() { |
| if ("ontouchstart" in document.documentElement) { |
| $(document.documentElement).addClass("touch"); |
| } else { |
| $(document.documentElement).addClass("no-touch"); |
| } |
| }); |
| </script> |
| <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js" |
| integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS" |
| crossorigin="anonymous"></script> |
| <script> |
| (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ |
| (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), |
| m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) |
| })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); |
| |
| ga('create', 'UA-68448017-1', 'auto'); |
| ga('send', 'pageview'); |
| </script> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/anchor-js/3.1.0/anchor.js"></script> |
| <script> |
| anchors.options = { |
| placement: 'right', |
| visible: 'touch', |
| }; |
| anchors.add(); |
| </script> |
| </body> |
| </html> |
| |