blob: f2cb8d20ab772efba164d2bcefe8865acf4d036b [file] [log] [blame]
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data" />
<meta name="author" content="Cloudera" />
<title>Apache Kudu - Default Partitioning Changes Coming in Kudu 0.9</title>
<!-- Bootstrap core CSS -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css"
integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7"
crossorigin="anonymous">
<!-- Custom styles for this template -->
<link href="/css/kudu.css" rel="stylesheet"/>
<link href="/css/asciidoc.css" rel="stylesheet"/>
<link rel="shortcut icon" href="/img/logo-favicon.ico" />
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.1/css/font-awesome.min.css" />
<link rel="alternate" type="application/atom+xml"
title="RSS Feed for Apache Kudu blog"
href="/feed.xml" />
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<div class="kudu-site container-fluid">
<!-- Static navbar -->
<nav class="navbar navbar-default">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="logo" href="/"><img
src="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png"
srcset="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png 1x, //d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_160px.png 2x"
alt="Apache Kudu"/></a>
</div>
<div id="navbar" class="collapse navbar-collapse">
<ul class="nav navbar-nav navbar-right">
<li >
<a href="/">Home</a>
</li>
<li >
<a href="/overview.html">Overview</a>
</li>
<li >
<a href="/docs/">Documentation</a>
</li>
<li >
<a href="/releases/">Download</a>
</li>
<li class="active">
<a href="/blog/">Blog</a>
</li>
<!-- NOTE: this dropdown menu does not appear on Mobile, so don't add anything here
that doesn't also appear elsewhere on the site. -->
<li class="dropdown">
<a href="/community.html" role="button" aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a>
<ul class="dropdown-menu">
<li class="dropdown-header">GET IN TOUCH</li>
<li><a class="icon email" href="/community.html">Mailing Lists</a></li>
<li><a class="icon slack" href="https://getkudu-slack.herokuapp.com/">Slack Channel</a></li>
<li role="separator" class="divider"></li>
<li><a href="/community.html#meetups-user-groups-and-conference-presentations">Events and Meetups</a></li>
<li><a href="/committers.html">Project Committers</a></li>
<!--<li><a href="/roadmap.html">Roadmap</a></li>-->
<li><a href="/community.html#contributions">How to Contribute</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">DEVELOPER RESOURCES</li>
<li><a class="icon github" href="https://github.com/apache/incubator-kudu">GitHub</a></li>
<li><a class="icon gerrit" href="http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu">Gerrit Code Review</a></li>
<li><a class="icon jira" href="https://issues.apache.org/jira/browse/KUDU">JIRA Issue Tracker</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">SOCIAL MEDIA</li>
<li><a class="icon twitter" href="https://twitter.com/ApacheKudu">Twitter</a></li>
<li><a href="https://www.reddit.com/r/kudu/">Reddit</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">APACHE SOFTWARE FOUNDATION</li>
<li><a href="https://www.apache.org/security/" target="_blank">Security</a></li>
<li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a></li>
<li><a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
<li><a href="https://www.apache.org/licenses/" target="_blank">License</a></li>
</ul>
</li>
<li >
<a href="/faq.html">FAQ</a>
</li>
</ul><!-- /.nav -->
</div><!-- /#navbar -->
</div><!-- /.container-fluid -->
</nav>
<div class="row header">
<div class="col-lg-12">
<h2><a href="/blog">Apache Kudu Blog</a></h2>
</div>
</div>
<div class="row-fluid">
<div class="col-lg-9">
<article>
<header>
<h1 class="entry-title">Default Partitioning Changes Coming in Kudu 0.9</h1>
<p class="meta">Posted 02 Jun 2016 by Dan Burkert</p>
</header>
<div class="entry-content">
<p>The upcoming Apache Kudu (incubating) 0.9 release is changing the default
partitioning configuration for new tables. This post will introduce the change,
explain the motivations, and show examples of how code can be updated to work
with the new release.</p>
<!--more-->
<p>The most common source of frustration with new Kudu users is the default
partitioning behavior when creating new tables. If partitioning is not
specified, the Kudu client prior to 0.9 creates tables with a <em>single tablet</em>.
Single tablet tables are a Kudu anti-pattern, since they are unable to get the
scalability benefit of distributing data across the cluster, and instead keep
all data on a single machine.</p>
<p>Unfortunately, automatically choosing a better default partitioning
configuration for new tables is not simple. In most cases, hash partitioning on
the primary key is a better default, but this approach can have its own
drawbacks. In particular, it is not clear how many buckets should be used for
the new table.</p>
<p>Since there is no bullet-proof default and changing the partitioning
configuration after table creation is impossible, <a href="https://lists.apache.org/thread.html/ca8972620839109334493424a1022fc08c77c315d9d623f5caaa815f@1463699013@%3Cuser.kudu.apache.org%3E">we
decided</a>
to remove the default altogether. Removing the default is a backwards
incompatible change, so it must be done before the 1.0 release. If we later find
a better way to create a default partitioning configuration, it should be
possible to adopt it in a backwards compatible way. The result of removing the
default is that new tables created with the 0.9 client must specify a
partitioning configuration, or table creation will fail. You can still create a
table with a single tablet, but it must be configured explicitly. These changes
only affect new table creation; existing tables, including tables created with
default partitioning before the 0.9 release, will continue to work.</p>
<p>In most cases updating existing code to explicitly set a partitioning
configuration should be simple. The examples below add hash partitioning, but
you can also specify range partitioning or a combination of range and hash
partitioning. See the <a href="http://kudu.apache.org/docs/schema_design.html#data-distribution">schema design
guide</a> for more
advanced configurations.</p>
<h1 id="c-client">C++ Client</h1>
<p>With the C++ client, creating a new table with hash partitions is as simple as
calling <code>KuduTableCreator:add_hash_partitions</code> with the columns to hash and the
number of buckets to use:</p>
<div class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="n">unique_ptr</span><span class="o">&lt;</span><span class="n">KuduTableCreator</span><span class="o">&gt;</span> <span class="n">table_creator</span><span class="p">(</span><span class="n">my_client</span><span class="o">-&gt;</span><span class="n">NewTableCreator</span><span class="p">());</span>
<span class="n">Status</span> <span class="n">create_status</span> <span class="o">=</span> <span class="n">table_creator</span><span class="o">-&gt;</span><span class="n">table_name</span><span class="p">(</span><span class="s">&quot;my-table&quot;</span><span class="p">)</span>
<span class="p">.</span><span class="n">schema</span><span class="p">(</span><span class="n">my_schema</span><span class="p">)</span>
<span class="p">.</span><span class="n">add_hash_partitions</span><span class="p">({</span> <span class="s">&quot;key_column_a&quot;</span><span class="p">,</span> <span class="s">&quot;key_column_b&quot;</span> <span class="p">},</span> <span class="mi">16</span><span class="p">)</span>
<span class="p">.</span><span class="n">Create</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">create_status</span><span class="p">.</span><span class="n">ok</span><span class="p">()</span> <span class="p">{</span> <span class="cm">/* handle error */</span> <span class="p">}</span></code></pre></div>
<h1 id="java-client">Java Client</h1>
<p>And similarly, in Java:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">List</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">hashColumns</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o">&lt;&gt;();</span>
<span class="n">hashColumns</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="s">&quot;key_column_a&quot;</span><span class="o">);</span>
<span class="n">hashColumn</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="s">&quot;key_column_b&quot;</span><span class="o">);</span>
<span class="n">CreateTableOptions</span> <span class="n">options</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">CreateTableOptions</span><span class="o">().</span><span class="na">addHashPartitions</span><span class="o">(</span><span class="n">hashColumns</span><span class="o">,</span> <span class="mi">16</span><span class="o">);</span>
<span class="n">myClient</span><span class="o">.</span><span class="na">createTable</span><span class="o">(</span><span class="s">&quot;my-table&quot;</span><span class="o">,</span> <span class="n">my_schema</span><span class="o">,</span> <span class="n">options</span><span class="o">);</span></code></pre></div>
<p>In the examples above, if the hash partition configuration is omitted the create
table operation will fail with the error <code>Table partitioning must be specified
using setRangePartitionColumns or addHashPartitions</code>. In the Java client this
manifests as a thrown <code>IllegalArgumentException</code>, while in the C++ client it is
returned as a <code>Status::InvalidArgument</code>.</p>
<h1 id="impala">Impala</h1>
<p>When creating Kudu tables with Impala, the formerly optional <code>DISTRIBUTE BY</code>
clause is now required:</p>
<div class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">my_table</span> <span class="p">(</span><span class="n">key_column_a</span> <span class="n">STRING</span><span class="p">,</span> <span class="n">key_column_b</span> <span class="n">STRING</span><span class="p">,</span> <span class="n">other_column</span> <span class="n">STRING</span><span class="p">)</span>
<span class="n">DISTRIBUTE</span> <span class="k">BY</span> <span class="n">HASH</span> <span class="p">(</span><span class="n">key_column_a</span><span class="p">,</span> <span class="n">key_column_b</span><span class="p">)</span> <span class="k">INTO</span> <span class="mi">16</span> <span class="n">BUCKETS</span>
<span class="n">TBLPROPERTIES</span><span class="p">(</span>
<span class="s1">&#39;storage_handler&#39;</span> <span class="o">=</span> <span class="s1">&#39;com.cloudera.kudu.hive.KuduStorageHandler&#39;</span><span class="p">,</span>
<span class="s1">&#39;kudu.table_name&#39;</span> <span class="o">=</span> <span class="s1">&#39;my_table&#39;</span><span class="p">,</span>
<span class="s1">&#39;kudu.master_addresses&#39;</span> <span class="o">=</span> <span class="s1">&#39;kudu-master.example.com:7051&#39;</span><span class="p">,</span>
<span class="s1">&#39;kudu.key_columns&#39;</span> <span class="o">=</span> <span class="s1">&#39;key_column_a,key_column_b&#39;</span>
<span class="p">);</span></code></pre></div>
</div>
</article>
</div>
<div class="col-lg-3 recent-posts">
<h3>Recent posts</h3>
<ul>
<li> <a href="/2019/11/20/apache-kudu-1-11-1-release.html">Apache Kudu 1.11.1 released</a> </li>
<li> <a href="/2019/11/20/apache-kudu-1-10-1-release.html">Apache Kudu 1.10.1 released</a> </li>
<li> <a href="/2019/07/09/apache-kudu-1-10-0-release.html">Apache Kudu 1.10.0 Released</a> </li>
<li> <a href="/2019/04/30/location-awareness.html">Location Awareness in Kudu</a> </li>
<li> <a href="/2019/04/22/fine-grained-authorization-with-apache-kudu-and-impala.html">Fine-Grained Authorization with Apache Kudu and Impala</a> </li>
<li> <a href="/2019/03/19/testing-apache-kudu-applications-on-the-jvm.html">Testing Apache Kudu Applications on the JVM</a> </li>
<li> <a href="/2019/03/15/apache-kudu-1-9-0-release.html">Apache Kudu 1.9.0 Released</a> </li>
<li> <a href="/2019/03/05/transparent-hierarchical-storage-management-with-apache-kudu-and-impala.html">Transparent Hierarchical Storage Management with Apache Kudu and Impala</a> </li>
<li> <a href="/2018/12/11/call-for-posts.html">Call for Posts</a> </li>
<li> <a href="/2018/10/26/apache-kudu-1-8-0-released.html">Apache Kudu 1.8.0 Released</a> </li>
<li> <a href="/2018/09/26/index-skip-scan-optimization-in-kudu.html">Index Skip Scan Optimization in Kudu</a> </li>
<li> <a href="/2018/09/11/simplified-pipelines-with-kudu.html">Simplified Data Pipelines with Kudu</a> </li>
<li> <a href="/2018/08/06/getting-started-with-kudu-an-oreilly-title.html">Getting Started with Kudu - an O'Reilly Title</a> </li>
<li> <a href="/2018/07/10/instrumentation-in-kudu.html">Instrumentation in Apache Kudu</a> </li>
<li> <a href="/2018/03/23/apache-kudu-1-7-0-released.html">Apache Kudu 1.7.0 released</a> </li>
</ul>
</div>
</div>
<footer class="footer">
<div class="row">
<div class="col-md-9">
<p class="small">
Copyright &copy; 2019 The Apache Software Foundation.
</p>
<p class="small">
Apache Kudu, Kudu, Apache, the Apache feather logo, and the Apache Kudu
project logo are either registered trademarks or trademarks of The
Apache Software Foundation in the United States and other countries.
</p>
</div>
<div class="col-md-3">
<a class="pull-right" href="https://www.apache.org/events/current-event.html">
<img src="https://www.apache.org/events/current-event-234x60.png"/>
</a>
</div>
</div>
</footer>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script>
// Try to detect touch-screen devices. Note: Many laptops have touch screens.
$(document).ready(function() {
if ("ontouchstart" in document.documentElement) {
$(document.documentElement).addClass("touch");
} else {
$(document.documentElement).addClass("no-touch");
}
});
</script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"
integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS"
crossorigin="anonymous"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-68448017-1', 'auto');
ga('send', 'pageview');
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/anchor-js/3.1.0/anchor.js"></script>
<script>
anchors.options = {
placement: 'right',
visible: 'touch',
};
anchors.add();
</script>
</body>
</html>