| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8" /> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1" /> |
| <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> |
| <meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data" /> |
| <meta name="author" content="Cloudera" /> |
| <title>Apache Kudu - Default Partitioning Changes Coming in Kudu 0.9</title> |
| <!-- Bootstrap core CSS --> |
| <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css" |
| integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7" |
| crossorigin="anonymous"> |
| |
| <!-- Custom styles for this template --> |
| <link href="/css/kudu.css" rel="stylesheet"/> |
| <link href="/css/asciidoc.css" rel="stylesheet"/> |
| <link rel="shortcut icon" href="/img/logo-favicon.ico" /> |
| <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.1/css/font-awesome.min.css" /> |
| |
| |
| <link rel="alternate" type="application/atom+xml" |
| title="RSS Feed for Apache Kudu blog" |
| href="/feed.xml" /> |
| |
| |
| <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> |
| <!--[if lt IE 9]> |
| <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> |
| <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> |
| <![endif]--> |
| </head> |
| <body> |
| <div class="kudu-site container-fluid"> |
| <!-- Static navbar --> |
| <nav class="navbar navbar-default"> |
| <div class="container-fluid"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| |
| <a class="logo" href="/"><img |
| src="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png" |
| srcset="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png 1x, //d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_160px.png 2x" |
| alt="Apache Kudu"/></a> |
| |
| </div> |
| <div id="navbar" class="collapse navbar-collapse"> |
| <ul class="nav navbar-nav navbar-right"> |
| <li > |
| <a href="/">Home</a> |
| </li> |
| <li > |
| <a href="/overview.html">Overview</a> |
| </li> |
| <li > |
| <a href="/docs/">Documentation</a> |
| </li> |
| <li > |
| <a href="/releases/">Download</a> |
| </li> |
| <li class="active"> |
| <a href="/blog/">Blog</a> |
| </li> |
| <!-- NOTE: this dropdown menu does not appear on Mobile, so don't add anything here |
| that doesn't also appear elsewhere on the site. --> |
| <li class="dropdown"> |
| <a href="/community.html" role="button" aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a> |
| <ul class="dropdown-menu"> |
| <li class="dropdown-header">GET IN TOUCH</li> |
| <li><a class="icon email" href="/community.html">Mailing Lists</a></li> |
| <li><a class="icon slack" href="https://getkudu-slack.herokuapp.com/">Slack Channel</a></li> |
| <li role="separator" class="divider"></li> |
| <li><a href="/community.html#meetups-user-groups-and-conference-presentations">Events and Meetups</a></li> |
| <li><a href="/committers.html">Project Committers</a></li> |
| <!--<li><a href="/roadmap.html">Roadmap</a></li>--> |
| <li><a href="/community.html#contributions">How to Contribute</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="dropdown-header">DEVELOPER RESOURCES</li> |
| <li><a class="icon github" href="https://github.com/apache/incubator-kudu">GitHub</a></li> |
| <li><a class="icon gerrit" href="http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu">Gerrit Code Review</a></li> |
| <li><a class="icon jira" href="https://issues.apache.org/jira/browse/KUDU">JIRA Issue Tracker</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="dropdown-header">SOCIAL MEDIA</li> |
| <li><a class="icon twitter" href="https://twitter.com/ApacheKudu">Twitter</a></li> |
| </ul> |
| </li> |
| <li > |
| <a href="/faq.html">FAQ</a> |
| </li> |
| </ul><!-- /.nav --> |
| </div><!-- /#navbar --> |
| </div><!-- /.container-fluid --> |
| </nav> |
| |
| <div class="row header"> |
| <div class="col-lg-12"> |
| <h2><a href="/blog">Apache Kudu Blog</a></h2> |
| </div> |
| </div> |
| |
| <div class="row-fluid"> |
| <div class="col-lg-9"> |
| <article> |
| <header> |
| <h1 class="entry-title">Default Partitioning Changes Coming in Kudu 0.9</h1> |
| <p class="meta">Posted 02 Jun 2016 by Dan Burkert</p> |
| </header> |
| <div class="entry-content"> |
| <p>The upcoming Apache Kudu (incubating) 0.9 release is changing the default |
| partitioning configuration for new tables. This post will introduce the change, |
| explain the motivations, and show examples of how code can be updated to work |
| with the new release.</p> |
| |
| <!--more--> |
| |
| <p>The most common source of frustration for new Kudu users is the default |
| partitioning behavior when creating new tables. If partitioning is not |
| specified, the Kudu client prior to 0.9 creates tables with a <em>single tablet</em>. |
| Single tablet tables are a Kudu anti-pattern, since they are unable to get the |
| scalability benefit of distributing data across the cluster, and instead keep |
| all data on a single machine.</p> |
| |
| <p>Unfortunately, automatically choosing a better default partitioning |
| configuration for new tables is not simple. In most cases, hash partitioning on |
| the primary key is a better default, but this approach can have its own |
| drawbacks. In particular, it is not clear how many buckets should be used for |
| the new table.</p> |
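| |
| <p>To make the bucket-count question concrete, here is a minimal, self-contained |
| sketch of how hashing key columns spreads rows over a fixed number of buckets. |
| It is illustrative only: <code>HashBucketSketch</code>, <code>bucketFor</code>, and |
| <code>distribution</code> are hypothetical names, the generated keys are made up, and |
| <code>Objects.hash</code> is a stand-in for Kudu's internal hash function, which this |
| post does not describe.</p> |

```java
import java.util.Objects;

class HashBucketSketch {
    // Hypothetical helper: maps a two-column key to one of numBuckets buckets.
    // Objects.hash is an assumption used for illustration, not Kudu's hash.
    static int bucketFor(String keyColumnA, String keyColumnB, int numBuckets) {
        // floorMod keeps the result in [0, numBuckets) even for negative hashes.
        return Math.floorMod(Objects.hash(keyColumnA, keyColumnB), numBuckets);
    }

    // Counts how many synthetic rows land in each bucket.
    static int[] distribution(int numRows, int numBuckets) {
        int[] counts = new int[numBuckets];
        for (int i = 0; i < numRows; i++) {
            // Made-up key values; real tables would use their own key columns.
            counts[bucketFor("host-" + i, "metric-" + (i % 7), numBuckets)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = distribution(10_000, 16);
        for (int b = 0; b < counts.length; b++) {
            System.out.println("bucket " + b + ": " + counts[b] + " rows");
        }
    }
}
```

| <p>Calling <code>distribution</code> with different bucket counts shows the trade-off: |
| the same rows are divided among more or fewer buckets, and the sketch has no way |
| to know which count suits a given cluster or workload.</p> |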
| |
| <p>Since there is no bullet-proof default and changing the partitioning |
| configuration after table creation is impossible, <a href="https://lists.apache.org/thread.html/ca8972620839109334493424a1022fc08c77c315d9d623f5caaa815f@1463699013@%3Cuser.kudu.apache.org%3E">we |
| decided</a> |
| to remove the default altogether. Removing the default is a backwards |
| incompatible change, so it must be done before the 1.0 release. If we later find |
| a better way to create a default partitioning configuration, it should be |
| possible to adopt it in a backwards compatible way. The result of removing the |
| default is that new tables created with the 0.9 client must specify a |
| partitioning configuration, or table creation will fail. You can still create a |
| table with a single tablet, but it must be configured explicitly. These changes |
| only affect new table creation; existing tables, including tables created with |
| default partitioning before the 0.9 release, will continue to work.</p> |
| |
| <p>In most cases, updating existing code to explicitly set a partitioning |
| configuration should be simple. The examples below add hash partitioning, but |
| you can also specify range partitioning or a combination of range and hash |
| partitioning. See the <a href="http://kudu.apache.org/docs/schema_design.html#data-distribution">schema design |
| guide</a> for more |
| advanced configurations.</p> |
| |
| <h1 id="c-client">C++ Client</h1> |
| |
| <p>With the C++ client, creating a new table with hash partitions is as simple as |
| calling <code>KuduTableCreator::add_hash_partitions</code> with the columns to hash and the |
| number of buckets to use:</p> |
| |
| <div class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="n">unique_ptr</span><span class="o">&lt;</span><span class="n">KuduTableCreator</span><span class="o">&gt;</span> <span class="n">table_creator</span><span class="p">(</span><span class="n">my_client</span><span class="o">-&gt;</span><span class="n">NewTableCreator</span><span class="p">());</span> |
| <span class="n">Status</span> <span class="n">create_status</span> <span class="o">=</span> <span class="n">table_creator</span><span class="o">-&gt;</span><span class="n">table_name</span><span class="p">(</span><span class="s">"my-table"</span><span class="p">)</span> |
| <span class="p">.</span><span class="n">schema</span><span class="p">(</span><span class="n">my_schema</span><span class="p">)</span> |
| <span class="p">.</span><span class="n">add_hash_partitions</span><span class="p">({</span> <span class="s">"key_column_a"</span><span class="p">,</span> <span class="s">"key_column_b"</span> <span class="p">},</span> <span class="mi">16</span><span class="p">)</span> |
| <span class="p">.</span><span class="n">Create</span><span class="p">();</span> |
| <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">create_status</span><span class="p">.</span><span class="n">ok</span><span class="p">())</span> <span class="p">{</span> <span class="cm">/* handle error */</span> <span class="p">}</span></code></pre></div> |
| |
| <h1 id="java-client">Java Client</h1> |
| |
| <p>And similarly, in Java:</p> |
| |
| <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">List</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">hashColumns</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o">&lt;&gt;();</span> |
| <span class="n">hashColumns</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="s">"key_column_a"</span><span class="o">);</span> |
| <span class="n">hashColumns</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="s">"key_column_b"</span><span class="o">);</span> |
| <span class="n">CreateTableOptions</span> <span class="n">options</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">CreateTableOptions</span><span class="o">().</span><span class="na">addHashPartitions</span><span class="o">(</span><span class="n">hashColumns</span><span class="o">,</span> <span class="mi">16</span><span class="o">);</span> |
| <span class="n">myClient</span><span class="o">.</span><span class="na">createTable</span><span class="o">(</span><span class="s">"my-table"</span><span class="o">,</span> <span class="n">my_schema</span><span class="o">,</span> <span class="n">options</span><span class="o">);</span></code></pre></div> |
| |
| <p>In the examples above, if the hash partition configuration is omitted, the create |
| table operation will fail with the error <code>Table partitioning must be specified |
| using setRangePartitionColumns or addHashPartitions</code>. In the Java client this |
| manifests as a thrown <code>IllegalArgumentException</code>, while in the C++ client it is |
| returned as a <code>Status::InvalidArgument</code>.</p> |
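| |
| <p>On the Java side, calling code might guard against this failure roughly as |
| sketched below. This is a self-contained illustration, not the Kudu client: |
| <code>PartitioningCheckSketch</code>, <code>OptionsSketch</code>, and <code>tryCreate</code> are |
| hypothetical stand-ins that only mimic the behavior described above, reusing the |
| error message and exception type quoted in this post.</p> |

```java
import java.util.Arrays;
import java.util.List;

class PartitioningCheckSketch {
    // Error text quoted from the post.
    static final String MISSING_PARTITIONING =
        "Table partitioning must be specified using "
        + "setRangePartitionColumns or addHashPartitions";

    // Hypothetical stand-in for table creation options; NOT the Kudu class.
    static class OptionsSketch {
        private boolean partitioningSpecified = false;

        // Records that the caller chose a partitioning scheme explicitly.
        OptionsSketch addHashPartitions(List<String> columns, int buckets) {
            partitioningSpecified = true;
            return this;
        }
    }

    // Hypothetical stand-in for table creation: rejects options that
    // carry no explicit partitioning, as the 0.9 client is described to do.
    static void createTable(String name, OptionsSketch options) {
        if (!options.partitioningSpecified) {
            throw new IllegalArgumentException(MISSING_PARTITIONING);
        }
    }

    // Caller-side handling: returns null on success, else the error message.
    static String tryCreate(String name, OptionsSketch options) {
        try {
            createTable(name, options);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // No partitioning specified: the error message is reported.
        System.out.println(tryCreate("my-table", new OptionsSketch()));
        // Explicit hash partitioning: creation is accepted by the stand-in.
        List<String> hashColumns = Arrays.asList("key_column_a", "key_column_b");
        System.out.println(tryCreate("my-table",
            new OptionsSketch().addHashPartitions(hashColumns, 16)));
    }
}
```

| <p>With the real client, the same <code>try</code>/<code>catch</code> shape would wrap the |
| <code>createTable</code> call shown in the Java example above.</p> |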
| |
| <h1 id="impala">Impala</h1> |
| |
| <p>When creating Kudu tables with Impala, the formerly optional <code>DISTRIBUTE BY</code> |
| clause is now required:</p> |
| |
| <div class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">my_table</span> <span class="p">(</span><span class="n">key_column_a</span> <span class="n">STRING</span><span class="p">,</span> <span class="n">key_column_b</span> <span class="n">STRING</span><span class="p">,</span> <span class="n">other_column</span> <span class="n">STRING</span><span class="p">)</span> |
| <span class="n">DISTRIBUTE</span> <span class="k">BY</span> <span class="n">HASH</span> <span class="p">(</span><span class="n">key_column_a</span><span class="p">,</span> <span class="n">key_column_b</span><span class="p">)</span> <span class="k">INTO</span> <span class="mi">16</span> <span class="n">BUCKETS</span> |
| <span class="n">TBLPROPERTIES</span><span class="p">(</span> |
| <span class="s1">'storage_handler'</span> <span class="o">=</span> <span class="s1">'com.cloudera.kudu.hive.KuduStorageHandler'</span><span class="p">,</span> |
| <span class="s1">'kudu.table_name'</span> <span class="o">=</span> <span class="s1">'my_table'</span><span class="p">,</span> |
| <span class="s1">'kudu.master_addresses'</span> <span class="o">=</span> <span class="s1">'kudu-master.example.com:7051'</span><span class="p">,</span> |
| <span class="s1">'kudu.key_columns'</span> <span class="o">=</span> <span class="s1">'key_column_a,key_column_b'</span> |
| <span class="p">);</span></code></pre></div> |
| |
| |
| </div> |
| </article> |
| |
| |
| </div> |
| <div class="col-lg-3 recent-posts"> |
| <h3>Recent posts</h3> |
| <ul> |
| |
| <li> <a href="/2016/07/25/asf-graduation.html">The Apache Software Foundation Announces Apache® Kudu™ as a Top-Level Project</a> </li> |
| |
| <li> <a href="/2016/07/18/weekly-update.html">Apache Kudu (incubating) Weekly Update July 18, 2016</a> </li> |
| |
| <li> <a href="/2016/07/11/weekly-update.html">Apache Kudu (incubating) Weekly Update July 11, 2016</a> </li> |
| |
| <li> <a href="/2016/07/01/apache-kudu-0-9-1-released.html">Apache Kudu (incubating) 0.9.1 released</a> </li> |
| |
| <li> <a href="/2016/06/27/weekly-update.html">Apache Kudu (incubating) Weekly Update June 27, 2016</a> </li> |
| |
| <li> <a href="/2016/06/24/multi-master-1-0-0.html">Master fault tolerance in Kudu 1.0</a> </li> |
| |
| <li> <a href="/2016/06/21/weekly-update.html">Apache Kudu (incubating) Weekly Update June 21, 2016</a> </li> |
| |
| <li> <a href="/2016/06/17/raft-consensus-single-node.html">Using Raft Consensus on a Single Node</a> </li> |
| |
| <li> <a href="/2016/06/13/weekly-update.html">Apache Kudu (incubating) Weekly Update June 13, 2016</a> </li> |
| |
| <li> <a href="/2016/06/10/apache-kudu-0-9-0-released.html">Apache Kudu (incubating) 0.9.0 released</a> </li> |
| |
| <li> <a href="/2016/06/06/weekly-update.html">Apache Kudu (incubating) Weekly Update June 6, 2016</a> </li> |
| |
| <li> <a href="/2016/06/02/no-default-partitioning.html">Default Partitioning Changes Coming in Kudu 0.9</a> </li> |
| |
| <li> <a href="/2016/06/01/weekly-update.html">Apache Kudu (incubating) Weekly Update June 1, 2016</a> </li> |
| |
| <li> <a href="/2016/05/23/weekly-update.html">Apache Kudu (incubating) Weekly Update May 23, 2016</a> </li> |
| |
| <li> <a href="/2016/05/16/weekly-update.html">Apache Kudu (incubating) Weekly Update May 16, 2016</a> </li> |
| |
| </ul> |
| </div> |
| </div> |
| |
| <footer class="footer"> |
| <p class="small"> |
| Copyright © 2016 The Apache Software Foundation. |
| </p> |
| </footer> |
| </div> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script> |
| <script> |
| // Try to detect touch-screen devices. Note: Many laptops have touch screens. |
| $(document).ready(function() { |
| if ("ontouchstart" in document.documentElement) { |
| $(document.documentElement).addClass("touch"); |
| } else { |
| $(document.documentElement).addClass("no-touch"); |
| } |
| }); |
| </script> |
| <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js" |
| integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS" |
| crossorigin="anonymous"></script> |
| <script> |
| (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ |
| (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), |
| m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) |
| })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); |
| |
| ga('create', 'UA-68448017-1', 'auto'); |
| ga('send', 'pageview'); |
| </script> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/anchor-js/3.1.0/anchor.js"></script> |
| <script> |
| anchors.options = { |
| placement: 'right', |
| visible: 'touch', |
| }; |
| anchors.add(); |
| </script> |
| </body> |
| </html> |
| |