blob: c1c8c7b3e0af9eec24113246d2caed744f93c29c [file] [log] [blame]
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data" />
<meta name="author" content="Cloudera" />
<title>Apache Kudu - Default Partitioning Changes Coming in Kudu 0.9</title>
<!-- Bootstrap core CSS -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css"
integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7"
crossorigin="anonymous">
<!-- Custom styles for this template -->
<link href="/css/kudu.css" rel="stylesheet"/>
<link href="/css/asciidoc.css" rel="stylesheet"/>
<link rel="shortcut icon" href="/img/logo-favicon.ico" />
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.1/css/font-awesome.min.css" />
<link rel="alternate" type="application/atom+xml"
title="RSS Feed for Apache Kudu blog"
href="/feed.xml" />
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<div class="kudu-site container-fluid">
<!-- Static navbar -->
<nav class="navbar navbar-default">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="logo" href="/"><img
src="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png"
srcset="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png 1x, //d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_160px.png 2x"
alt="Apache Kudu"/></a>
</div>
<div id="navbar" class="collapse navbar-collapse">
<ul class="nav navbar-nav navbar-right">
<li >
<a href="/">Home</a>
</li>
<li >
<a href="/overview.html">Overview</a>
</li>
<li >
<a href="/docs/">Documentation</a>
</li>
<li >
<a href="/releases/">Download</a>
</li>
<li class="active">
<a href="/blog/">Blog</a>
</li>
<!-- NOTE: this dropdown menu does not appear on Mobile, so don't add anything here
that doesn't also appear elsewhere on the site. -->
<li class="dropdown">
<a href="/community.html" role="button" aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a>
<ul class="dropdown-menu">
<li class="dropdown-header">GET IN TOUCH</li>
<li><a class="icon email" href="/community.html">Mailing Lists</a></li>
<li><a class="icon slack" href="https://getkudu-slack.herokuapp.com/">Slack Channel</a></li>
<li role="separator" class="divider"></li>
<li><a href="/community.html#meetups-user-groups-and-conference-presentations">Events and Meetups</a></li>
<li><a href="/committers.html">Project Committers</a></li>
<!--<li><a href="/roadmap.html">Roadmap</a></li>-->
<li><a href="/community.html#contributions">How to Contribute</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">DEVELOPER RESOURCES</li>
<li><a class="icon github" href="https://github.com/apache/incubator-kudu">GitHub</a></li>
<li><a class="icon gerrit" href="http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu">Gerrit Code Review</a></li>
<li><a class="icon jira" href="https://issues.apache.org/jira/browse/KUDU">JIRA Issue Tracker</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">SOCIAL MEDIA</li>
<li><a class="icon twitter" href="https://twitter.com/ApacheKudu">Twitter</a></li>
</ul>
</li>
<li >
<a href="/faq.html">FAQ</a>
</li>
</ul><!-- /.nav -->
</div><!-- /#navbar -->
</div><!-- /.container-fluid -->
</nav>
<div class="row header">
<div class="col-lg-12">
<h2><a href="/blog">Apache Kudu Blog</a></h2>
</div>
</div>
<div class="row-fluid">
<div class="col-lg-9">
<article>
<header>
<h1 class="entry-title">Default Partitioning Changes Coming in Kudu 0.9</h1>
<p class="meta">Posted 02 Jun 2016 by Dan Burkert</p>
</header>
<div class="entry-content">
<p>The upcoming Apache Kudu (incubating) 0.9 release is changing the default
partitioning configuration for new tables. This post will introduce the change,
explain the motivations, and show examples of how code can be updated to work
with the new release.</p>
<!--more-->
<p>The most common source of frustration with new Kudu users is the default
partitioning behavior when creating new tables. If partitioning is not
specified, the Kudu client prior to 0.9 creates tables with a <em>single tablet</em>.
Single tablet tables are a Kudu anti-pattern, since they are unable to get the
scalability benefit of distributing data across the cluster, and instead keep
all data on a single machine.</p>
<p>Unfortunately, automatically choosing a better default partitioning
configuration for new tables is not simple. In most cases, hash partitioning on
the primary key is a better default, but this approach can have its own
drawbacks. In particular, it is not clear how many buckets should be used for
the new table.</p>
<p>Since there is no bullet-proof default and changing the partitioning
configuration after table creation is impossible, <a href="https://lists.apache.org/thread.html/ca8972620839109334493424a1022fc08c77c315d9d623f5caaa815f@1463699013@%3Cuser.kudu.apache.org%3E">we
decided</a>
to remove the default altogether. Removing the default is a backwards
incompatible change, so it must be done before the 1.0 release. If we later find
a better way to create a default partitioning configuration, it should be
possible to adopt it in a backwards compatible way. The result of removing the
default is that new tables created with the 0.9 client must specify a
partitioning configuration, or table creation will fail. You can still create a
table with a single tablet, but it must be configured explicitly. These changes
only affect new table creation; existing tables, including tables created with
default partitioning before the 0.9 release, will continue to work.</p>
<p>In most cases updating existing code to explicitly set a partitioning
configuration should be simple. The examples below add hash partitioning, but
you can also specify range partitioning or a combination of range and hash
partitioning. See the <a href="http://kudu.apache.org/docs/schema_design.html#data-distribution">schema design
guide</a> for more
advanced configurations.</p>
<h1 id="c-client">C++ Client</h1>
<p>With the C++ client, creating a new table with hash partitions is as simple as
calling <code class="highlighter-rouge">KuduTableCreator:add_hash_partitions</code> with the columns to hash and the
number of buckets to use:</p>
<div class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="n">unique_ptr</span><span class="o">&lt;</span><span class="n">KuduTableCreator</span><span class="o">&gt;</span> <span class="n">table_creator</span><span class="p">(</span><span class="n">my_client</span><span class="o">-&gt;</span><span class="n">NewTableCreator</span><span class="p">());</span>
<span class="n">Status</span> <span class="n">create_status</span> <span class="o">=</span> <span class="n">table_creator</span><span class="o">-&gt;</span><span class="n">table_name</span><span class="p">(</span><span class="s">&quot;my-table&quot;</span><span class="p">)</span>
<span class="p">.</span><span class="n">schema</span><span class="p">(</span><span class="n">my_schema</span><span class="p">)</span>
<span class="p">.</span><span class="n">add_hash_partitions</span><span class="p">({</span> <span class="s">&quot;key_column_a&quot;</span><span class="p">,</span> <span class="s">&quot;key_column_b&quot;</span> <span class="p">},</span> <span class="mi">16</span><span class="p">)</span>
<span class="p">.</span><span class="n">Create</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">create_status</span><span class="p">.</span><span class="n">ok</span><span class="p">()</span> <span class="p">{</span> <span class="cm">/* handle error */</span> <span class="p">}</span></code></pre></div>
<h1 id="java-client">Java Client</h1>
<p>And similarly, in Java:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">List</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">hashColumns</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o">&lt;&gt;();</span>
<span class="n">hashColumns</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="s">&quot;key_column_a&quot;</span><span class="o">);</span>
<span class="n">hashColumn</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="s">&quot;key_column_b&quot;</span><span class="o">);</span>
<span class="n">CreateTableOptions</span> <span class="n">options</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">CreateTableOptions</span><span class="o">().</span><span class="na">addHashPartitions</span><span class="o">(</span><span class="n">hashColumns</span><span class="o">,</span> <span class="mi">16</span><span class="o">);</span>
<span class="n">myClient</span><span class="o">.</span><span class="na">createTable</span><span class="o">(</span><span class="s">&quot;my-table&quot;</span><span class="o">,</span> <span class="n">my_schema</span><span class="o">,</span> <span class="n">options</span><span class="o">);</span></code></pre></div>
<p>In the examples above, if the hash partition configuration is omitted the create
table operation will fail with the error <code class="highlighter-rouge">Table partitioning must be specified
using setRangePartitionColumns or addHashPartitions</code>. In the Java client this
manifests as a thrown <code class="highlighter-rouge">IllegalArgumentException</code>, while in the C++ client it is
returned as a <code class="highlighter-rouge">Status::InvalidArgument</code>.</p>
<h1 id="impala">Impala</h1>
<p>When creating Kudu tables with Impala, the formerly optional <code class="highlighter-rouge">DISTRIBUTE BY</code>
clause is now required:</p>
<div class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">my_table</span> <span class="p">(</span><span class="n">key_column_a</span> <span class="n">STRING</span><span class="p">,</span> <span class="n">key_column_b</span> <span class="n">STRING</span><span class="p">,</span> <span class="n">other_column</span> <span class="n">STRING</span><span class="p">)</span>
<span class="n">DISTRIBUTE</span> <span class="k">BY</span> <span class="n">HASH</span> <span class="p">(</span><span class="n">key_column_a</span><span class="p">,</span> <span class="n">key_column_b</span><span class="p">)</span> <span class="k">INTO</span> <span class="mi">16</span> <span class="n">BUCKETS</span>
<span class="n">TBLPROPERTIES</span><span class="p">(</span>
<span class="s1">&#39;storage_handler&#39;</span> <span class="o">=</span> <span class="s1">&#39;com.cloudera.kudu.hive.KuduStorageHandler&#39;</span><span class="p">,</span>
<span class="s1">&#39;kudu.table_name&#39;</span> <span class="o">=</span> <span class="s1">&#39;my_table&#39;</span><span class="p">,</span>
<span class="s1">&#39;kudu.master_addresses&#39;</span> <span class="o">=</span> <span class="s1">&#39;kudu-master.example.com:7051&#39;</span><span class="p">,</span>
<span class="s1">&#39;kudu.key_columns&#39;</span> <span class="o">=</span> <span class="s1">&#39;key_column_a,key_column_b&#39;</span>
<span class="p">);</span></code></pre></div>
</div>
</article>
</div>
<div class="col-lg-3 recent-posts">
<h3>Recent posts</h3>
<ul>
<li> <a href="/2016/08/31/intro-flume-kudu-sink.html">An Introduction to the Flume Kudu Sink</a> </li>
<li> <a href="/2016/08/23/new-range-partitioning-features.html">New Range Partitioning Features in Kudu 0.10</a> </li>
<li> <a href="/2016/08/23/apache-kudu-0-10-0-released.html">Apache Kudu 0.10.0 released</a> </li>
<li> <a href="/2016/08/16/weekly-update.html">Apache Kudu Weekly Update August 16th, 2016</a> </li>
<li> <a href="/2016/08/08/weekly-update.html">Apache Kudu Weekly Update August 8th, 2016</a> </li>
<li> <a href="/2016/07/26/weekly-update.html">Apache Kudu Weekly Update July 26, 2016</a> </li>
<li> <a href="/2016/07/25/asf-graduation.html">The Apache Software Foundation Announces Apache&reg; Kudu&trade; as a Top-Level Project</a> </li>
<li> <a href="/2016/07/18/weekly-update.html">Apache Kudu (incubating) Weekly Update July 18, 2016</a> </li>
<li> <a href="/2016/07/11/weekly-update.html">Apache Kudu (incubating) Weekly Update July 11, 2016</a> </li>
<li> <a href="/2016/07/01/apache-kudu-0-9-1-released.html">Apache Kudu (incubating) 0.9.1 released</a> </li>
<li> <a href="/2016/06/27/weekly-update.html">Apache Kudu (incubating) Weekly Update June 27, 2016</a> </li>
<li> <a href="/2016/06/24/multi-master-1-0-0.html">Master fault tolerance in Kudu 1.0</a> </li>
<li> <a href="/2016/06/21/weekly-update.html">Apache Kudu (incubating) Weekly Update June 21, 2016</a> </li>
<li> <a href="/2016/06/17/raft-consensus-single-node.html">Using Raft Consensus on a Single Node</a> </li>
<li> <a href="/2016/06/13/weekly-update.html">Apache Kudu (incubating) Weekly Update June 13, 2016</a> </li>
</ul>
</div>
</div>
<footer class="footer">
<p class="small">
Copyright &copy; 2016 The Apache Software Foundation.
</p>
</footer>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script>
// Try to detect touch-screen devices. Note: Many laptops have touch screens.
$(document).ready(function() {
if ("ontouchstart" in document.documentElement) {
$(document.documentElement).addClass("touch");
} else {
$(document.documentElement).addClass("no-touch");
}
});
</script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"
integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS"
crossorigin="anonymous"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-68448017-1', 'auto');
ga('send', 'pageview');
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/anchor-js/3.1.0/anchor.js"></script>
<script>
anchors.options = {
placement: 'right',
visible: 'touch',
};
anchors.add();
</script>
</body>
</html>