blob: 241b6819ecb416d331b625310d486722c375db87 [file] [log] [blame]
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data" />
<meta name="author" content="Cloudera" />
<title>Apache Kudu - Predicate Improvements in Kudu 0.8</title>
<!-- Bootstrap core CSS -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css"
integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7"
crossorigin="anonymous">
<!-- Custom styles for this template -->
<link href="/css/kudu.css" rel="stylesheet"/>
<link href="/css/asciidoc.css" rel="stylesheet"/>
<link rel="shortcut icon" href="/img/logo-favicon.ico" />
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.1/css/font-awesome.min.css" />
<link rel="alternate" type="application/atom+xml"
title="RSS Feed for Apache Kudu blog"
href="/feed.xml" />
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<div class="kudu-site container-fluid">
<!-- Static navbar -->
<nav class="navbar navbar-default">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="logo" href="/"><img
src="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png"
srcset="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png 1x, //d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_160px.png 2x"
alt="Apache Kudu"/></a>
</div>
<div id="navbar" class="collapse navbar-collapse">
<ul class="nav navbar-nav navbar-right">
<li >
<a href="/">Home</a>
</li>
<li >
<a href="/overview.html">Overview</a>
</li>
<li >
<a href="/docs/">Documentation</a>
</li>
<li >
<a href="/releases/">Download</a>
</li>
<li class="active">
<a href="/blog/">Blog</a>
</li>
<!-- NOTE: this dropdown menu does not appear on Mobile, so don't add anything here
that doesn't also appear elsewhere on the site. -->
<li class="dropdown">
<a href="/community.html" role="button" aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a>
<ul class="dropdown-menu">
<li class="dropdown-header">GET IN TOUCH</li>
<li><a class="icon email" href="/community.html">Mailing Lists</a></li>
<li><a class="icon slack" href="https://getkudu-slack.herokuapp.com/">Slack Channel</a></li>
<li role="separator" class="divider"></li>
<li><a href="/community.html#meetups-user-groups-and-conference-presentations">Events and Meetups</a></li>
<li><a href="/committers.html">Project Committers</a></li>
<!--<li><a href="/roadmap.html">Roadmap</a></li>-->
<li><a href="/community.html#contributions">How to Contribute</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">DEVELOPER RESOURCES</li>
<li><a class="icon github" href="https://github.com/apache/incubator-kudu">GitHub</a></li>
<li><a class="icon gerrit" href="http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu">Gerrit Code Review</a></li>
<li><a class="icon jira" href="https://issues.apache.org/jira/browse/KUDU">JIRA Issue Tracker</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">SOCIAL MEDIA</li>
<li><a class="icon twitter" href="https://twitter.com/ApacheKudu">Twitter</a></li>
<li><a href="https://www.reddit.com/r/kudu/">Reddit</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">APACHE SOFTWARE FOUNDATION</li>
<li><a href="https://www.apache.org/security/" target="_blank">Security</a></li>
<li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a></li>
<li><a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
<li><a href="https://www.apache.org/licenses/" target="_blank">License</a></li>
</ul>
</li>
<li >
<a href="/faq.html">FAQ</a>
</li>
</ul><!-- /.nav -->
</div><!-- /#navbar -->
</div><!-- /.container-fluid -->
</nav>
<div class="row header">
<div class="col-lg-12">
<h2><a href="/blog">Apache Kudu Blog</a></h2>
</div>
</div>
<div class="row-fluid">
<div class="col-lg-9">
<article>
<header>
<h1 class="entry-title">Predicate Improvements in Kudu 0.8</h1>
<p class="meta">Posted 19 Apr 2016 by Dan Burkert</p>
</header>
<div class="entry-content">
<p>The recently released Kudu version 0.8 ships with a host of new improvements to
scan predicates. Performance and usability have been improved, especially for
tables taking advantage of <a href="http://kudu.apache.org/docs/schema_design.html#data-distribution">advanced partitioning
options</a>.</p>
<!--more-->
<h2 id="scan-optimizations-in-the-server-and-c-client">Scan Optimizations in the Server and C++ Client</h2>
<p>The server and C++ client have gotten more sophisticated in how they handle and
optimize scan constraints. Constraints specified in the predicates and lower
and upper bound primary keys are better unified, resulting in more predicates
being pushed into primary key bounds, which can turn full table scans with
predicates into much more efficient bounded scans.</p>
<p>Additionally, the server and C++ client now recognize more opportunities to
prune entire tablets during scans. For example, for the following schema and
query Kudu will now be able to skip scanning 15 out of the 16 tablets in the
table:</p>
<div class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="c1">-- create a table with 16 tablets</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">users</span> <span class="p">(</span><span class="n">id</span> <span class="n">INT64</span><span class="p">,</span> <span class="n">name</span> <span class="n">STRING</span><span class="p">,</span> <span class="n">address</span> <span class="n">STRING</span><span class="p">)</span>
<span class="n">DISTRIBUTE</span> <span class="k">BY</span> <span class="n">HASH</span> <span class="p">(</span><span class="n">id</span><span class="p">)</span> <span class="k">INTO</span> <span class="mi">16</span> <span class="n">BUCKETS</span><span class="p">;</span>
<span class="c1">-- scan over a single tablet</span>
<span class="k">SELECT</span> <span class="n">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">address</span> <span class="k">FROM</span> <span class="n">users</span>
<span class="k">WHERE</span> <span class="n">id</span> <span class="o">=</span> <span class="mi">876932</span><span class="p">;</span></code></pre></div>
<p>For a deeper look at the newly implemented scan and partition pruning
optimizations, see the associated <a href="https://github.com/apache/incubator-kudu/blob/master/docs/design-docs/scan-optimization-partition-pruning.md">design
document</a>.
These optimizations will eventually be incorporated into the Java client as
well, but until that time they are still used on the server side for scans
initiated by Java clients. If you would like to help with this effort, let us
know on the <a href="https://issues.apache.org/jira/browse/KUDU-1065">JIRA issue</a>.</p>
<h2 id="redesigned-predicate-api-in-the-java-client">Redesigned Predicate API in the Java Client</h2>
<p>The Java client has a new way to express scan predicates: the
<a href="http://kudu.apache.org/apidocs/org/kududb/client/KuduPredicate.html"><code>KuduPredicate</code></a>.
The API matches the corresponding C++ API more closely, and adds support for
specifying exclusive, as well as inclusive, range predicates. The existing
<a href="http://kudu.apache.org/apidocs/org/kududb/client/ColumnRangePredicate.html"><code>ColumnRangePredicate</code></a>
API has been deprecated, and will be removed soon. Example of transitioning from
the old to new API:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">ColumnSchema</span> <span class="n">myIntColumnSchema</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">KuduScanner</span><span class="o">.</span><span class="na">KuduScannerBuilder</span> <span class="n">scannerBuilder</span> <span class="o">=</span> <span class="o">...;</span>
<span class="c1">// Old predicate API</span>
<span class="n">ColumnRangePredicate</span> <span class="n">predicate</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">ColumnRangePredicate</span><span class="o">(</span><span class="n">myIntColumnSchema</span><span class="o">);</span>
<span class="n">predicate</span><span class="o">.</span><span class="na">setLowerBound</span><span class="o">(</span><span class="mi">20</span><span class="o">);</span>
<span class="n">scannerBuilder</span><span class="o">.</span><span class="na">addColumnRangePredicate</span><span class="o">(</span><span class="n">predicate</span><span class="o">);</span>
<span class="c1">// New predicate API</span>
<span class="n">scannerBuilder</span><span class="o">.</span><span class="na">newPredicate</span><span class="o">(</span>
<span class="n">KuduPredicate</span><span class="o">.</span><span class="na">newComparisonPredicate</span><span class="o">(</span><span class="n">myIntColumnSchema</span><span class="o">,</span> <span class="n">ComparisonOp</span><span class="o">.</span><span class="na">GREATER_EQUAL</span><span class="o">,</span> <span class="mi">20</span><span class="o">));</span></code></pre></div>
<h2 id="under-the-covers-changes">Under the Covers Changes</h2>
<p>The scan optimizations in the server and C++ client, and the new <code>KuduPredicate</code>
API in the Java client are made possible by an overhaul of how predicates are
handled internally. A new protobuf message type,
<a href="https://github.com/apache/incubator-kudu/blob/master/src/kudu/common/common.proto#L273"><code>ColumnPredicatePB</code></a>
has been introduced, and will allow more column predicate types to be introduced
in the future. If you are interested in contributing to Kudu but don’t know
where to start, consider adding a new predicate type; for example the <code>IS NULL</code>,
<code>IS NOT NULL</code>, <code>IN</code>, and <code>LIKE</code> predicates types are currently not implemented.</p>
</div>
</article>
</div>
<div class="col-lg-3 recent-posts">
<h3>Recent posts</h3>
<ul>
<li> <a href="/2019/11/20/apache-kudu-1-11-1-release.html">Apache Kudu 1.11.1 released</a> </li>
<li> <a href="/2019/11/20/apache-kudu-1-10-1-release.html">Apache Kudu 1.10.1 released</a> </li>
<li> <a href="/2019/07/09/apache-kudu-1-10-0-release.html">Apache Kudu 1.10.0 Released</a> </li>
<li> <a href="/2019/04/30/location-awareness.html">Location Awareness in Kudu</a> </li>
<li> <a href="/2019/04/22/fine-grained-authorization-with-apache-kudu-and-impala.html">Fine-Grained Authorization with Apache Kudu and Impala</a> </li>
<li> <a href="/2019/03/19/testing-apache-kudu-applications-on-the-jvm.html">Testing Apache Kudu Applications on the JVM</a> </li>
<li> <a href="/2019/03/15/apache-kudu-1-9-0-release.html">Apache Kudu 1.9.0 Released</a> </li>
<li> <a href="/2019/03/05/transparent-hierarchical-storage-management-with-apache-kudu-and-impala.html">Transparent Hierarchical Storage Management with Apache Kudu and Impala</a> </li>
<li> <a href="/2018/12/11/call-for-posts.html">Call for Posts</a> </li>
<li> <a href="/2018/10/26/apache-kudu-1-8-0-released.html">Apache Kudu 1.8.0 Released</a> </li>
<li> <a href="/2018/09/26/index-skip-scan-optimization-in-kudu.html">Index Skip Scan Optimization in Kudu</a> </li>
<li> <a href="/2018/09/11/simplified-pipelines-with-kudu.html">Simplified Data Pipelines with Kudu</a> </li>
<li> <a href="/2018/08/06/getting-started-with-kudu-an-oreilly-title.html">Getting Started with Kudu - an O'Reilly Title</a> </li>
<li> <a href="/2018/07/10/instrumentation-in-kudu.html">Instrumentation in Apache Kudu</a> </li>
<li> <a href="/2018/03/23/apache-kudu-1-7-0-released.html">Apache Kudu 1.7.0 released</a> </li>
</ul>
</div>
</div>
<footer class="footer">
<div class="row">
<div class="col-md-9">
<p class="small">
Copyright &copy; 2019 The Apache Software Foundation.
</p>
<p class="small">
Apache Kudu, Kudu, Apache, the Apache feather logo, and the Apache Kudu
project logo are either registered trademarks or trademarks of The
Apache Software Foundation in the United States and other countries.
</p>
</div>
<div class="col-md-3">
<a class="pull-right" href="https://www.apache.org/events/current-event.html">
<img src="https://www.apache.org/events/current-event-234x60.png"/>
</a>
</div>
</div>
</footer>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script>
// Try to detect touch-screen devices. Note: Many laptops have touch screens.
$(document).ready(function() {
if ("ontouchstart" in document.documentElement) {
$(document.documentElement).addClass("touch");
} else {
$(document.documentElement).addClass("no-touch");
}
});
</script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"
integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS"
crossorigin="anonymous"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-68448017-1', 'auto');
ga('send', 'pageview');
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/anchor-js/3.1.0/anchor.js"></script>
<script>
anchors.options = {
placement: 'right',
visible: 'touch',
};
anchors.add();
</script>
</body>
</html>