<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
    <meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu (incubating) completes Hadoop's storage layer to enable fast analytics on fast data" />
    <meta name="author" content="Cloudera" />
    <title>Apache Kudu (incubating) - Predicate Improvements in Kudu 0.8</title>
    <!-- Bootstrap core CSS -->
    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css"
          integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7"
          crossorigin="anonymous">

    <!-- Custom styles for this template -->
    <link href="/css/justified-nav.css" rel="stylesheet" />

    <link href="/css/kudu.css" rel="stylesheet"/>
    <link href="/css/asciidoc.css" rel="stylesheet"/>
    <link rel="shortcut icon" href="/img/logo-favicon.ico" />
    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.1/css/font-awesome.min.css" />

    
    <link rel="alternate" type="application/atom+xml"
      title="RSS Feed for Apache Kudu blog"
      href="/feed.xml" />
    

    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
    <!--[if lt IE 9]>
        <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
        <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
        <![endif]-->
  </head>
  <body>
    <!-- Fork me on GitHub -->
    <a class="fork-me-on-github" href="https://github.com/apache/incubator-kudu"><img src="//aral.github.io/fork-me-on-github-retina-ribbons/right-cerulean@2x.png" alt="Fork me on GitHub" /></a>

    <div class="kudu-site container-fluid">
      <!-- Static navbar -->
        <nav class="container-fluid navbar-default">
          <div class="navbar-header">
            <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
              <span class="sr-only">Toggle navigation</span>
              <span class="icon-bar"></span>
              <span class="icon-bar"></span>
              <span class="icon-bar"></span>
            </button>
            
            <a class="logo" href="/"><img src="/img/logo_small.png" width="80" /></a>
            
          </div>
          <div id="navbar" class="navbar-collapse collapse navbar-right">
            <ul class="nav navbar-nav">
              <li >
                <a href="/">Home</a>
              </li>
              <li >
                <a href="/overview.html">Overview</a>
              </li>
              <li >
                <a href="/docs/">Documentation</a>
              </li>
              <li >
                <a href="/releases/">Download</a>
              </li>
              <li class="active">
                <a href="/blog/">Blog</a>
              </li>
              <li >
                <a href="/community.html">Community</a>
              </li>
              <li >
                <a href="/faq.html">FAQ</a>
              </li>
            </ul>
          </div><!--/.nav-collapse -->
        </nav>

<div class="row header">
  <div class="col-lg-12">
    <h2><a href="/blog">Apache Kudu (incubating) Blog</a></h2>
  </div>
</div>

<div class="row-fluid">
  <div class="col-lg-9">
    <article>
  <header>
    <h1 class="entry-title">Predicate Improvements in Kudu 0.8</h1>
    <p class="meta">Posted 19 Apr 2016 by Dan Burkert</p>
  </header>
  <div class="entry-content">
    <p>The recently released Kudu version 0.8 ships with a host of new improvements to
scan predicates. Performance and usability have been improved, especially for
tables taking advantage of <a href="http://kudu.apache.org/docs/schema_design.html#data-distribution">advanced partitioning
options</a>.</p>

<!--more-->

<h2 id="scan-optimizations-in-the-server-and-c-client">Scan Optimizations in the Server and C++ Client</h2>

<p>The server and C++ client have gotten more sophisticated in how they handle and
optimize scan constraints. Constraints specified in the predicates and lower
and upper bound primary keys are better unified, resulting in more predicates
being pushed into primary key bounds, which can turn full table scans with
predicates into much more efficient bounded scans.</p>

<p>Additionally, the server and C++ client now recognize more opportunities to
prune entire tablets during scans. For example, for the following schema and
query Kudu will now be able to skip scanning 15 out of the 16 tablets in the
table:</p>

<div class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="c1">-- create a table with 16 tablets</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">users</span> <span class="p">(</span><span class="n">id</span> <span class="n">INT64</span><span class="p">,</span> <span class="n">name</span> <span class="n">STRING</span><span class="p">,</span> <span class="n">address</span> <span class="n">STRING</span><span class="p">)</span>
<span class="n">DISTRIBUTE</span> <span class="k">BY</span> <span class="n">HASH</span> <span class="p">(</span><span class="n">id</span><span class="p">)</span> <span class="k">INTO</span> <span class="mi">16</span> <span class="n">BUCKETS</span><span class="p">;</span>

<span class="c1">-- scan over a single tablet</span>
<span class="k">SELECT</span> <span class="n">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">address</span> <span class="k">FROM</span> <span class="n">users</span>
<span class="k">WHERE</span> <span class="n">id</span> <span class="o">=</span> <span class="mi">876932</span><span class="p">;</span></code></pre></div>

<p>For a deeper look at the newly implemented scan and partition pruning
optimizations, see the associated <a href="https://github.com/apache/incubator-kudu/blob/master/docs/design-docs/scan-optimization-partition-pruning.md">design
document</a>.
These optimizations will eventually be incorporated into the Java client as
well, but until that time they are still used on the server side for scans
initiated by Java clients. If you would like to help with this effort, let us
know on the <a href="https://issues.apache.org/jira/browse/KUDU-1065">JIRA issue</a>.</p>

<h2 id="redesigned-predicate-api-in-the-java-client">Redesigned Predicate API in the Java Client</h2>

<p>The Java client has a new way to express scan predicates: the
<a href="http://kudu.apache.org/apidocs/org/kududb/client/KuduPredicate.html"><code>KuduPredicate</code></a>.
The API matches the corresponding C++ API more closely, and adds support for
specifying exclusive, as well as inclusive, range predicates. The existing
<a href="http://kudu.apache.org/apidocs/org/kududb/client/ColumnRangePredicate.html"><code>ColumnRangePredicate</code></a>
API has been deprecated, and will be removed soon. Example of transitioning from
the old to new API:</p>

<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">ColumnSchema</span> <span class="n">myIntColumnSchema</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">KuduScanner</span><span class="o">.</span><span class="na">KuduScannerBuilder</span> <span class="n">scannerBuilder</span> <span class="o">=</span> <span class="o">...;</span>

<span class="c1">// Old predicate API</span>
<span class="n">ColumnRangePredicate</span> <span class="n">predicate</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">ColumnRangePredicate</span><span class="o">(</span><span class="n">myIntColumnSchema</span><span class="o">);</span>
<span class="n">predicate</span><span class="o">.</span><span class="na">setLowerBound</span><span class="o">(</span><span class="mi">20</span><span class="o">);</span>
<span class="n">scannerBuilder</span><span class="o">.</span><span class="na">addColumnRangePredicate</span><span class="o">(</span><span class="n">predicate</span><span class="o">);</span>

<span class="c1">// New predicate API</span>
<span class="n">scannerBuilder</span><span class="o">.</span><span class="na">newPredicate</span><span class="o">(</span>
    <span class="n">KuduPredicate</span><span class="o">.</span><span class="na">newComparisonPredicate</span><span class="o">(</span><span class="n">myIntColumnSchema</span><span class="o">,</span> <span class="n">ComparisonOp</span><span class="o">.</span><span class="na">GREATER_EQUAL</span><span class="o">,</span> <span class="mi">20</span><span class="o">));</span></code></pre></div>

<h2 id="under-the-covers-changes">Under the Covers Changes</h2>

<p>The scan optimizations in the server and C++ client, and the new <code>KuduPredicate</code>
API in the Java client are made possible by an overhaul of how predicates are
handled internally. A new protobuf message type,
<a href="https://github.com/apache/incubator-kudu/blob/master/src/kudu/common/common.proto#L273"><code>ColumnPredicatePB</code></a>
has been introduced, and will allow more column predicate types to be introduced
in the future. If you are interested in contributing to Kudu but don’t know
where to start, consider adding a new predicate type; for example the <code>IS NULL</code>,
<code>IS NOT NULL</code>, <code>IN</code>, and <code>LIKE</code> predicates types are currently not implemented.</p>

  </div>
</article>


  </div>
  <div class="col-lg-3 recent-posts">
    <h3>Recent posts</h3>
    <ul>
    
      <li> <a href="/2016/07/18/weekly-update.html">Apache Kudu (incubating) Weekly Update July 18, 2016</a> </li>
    
      <li> <a href="/2016/07/11/weekly-update.html">Apache Kudu (incubating) Weekly Update July 11, 2016</a> </li>
    
      <li> <a href="/2016/07/01/apache-kudu-0-9-1-released.html">Apache Kudu (incubating) 0.9.1 released</a> </li>
    
      <li> <a href="/2016/06/27/weekly-update.html">Apache Kudu (incubating) Weekly Update June 27, 2016</a> </li>
    
      <li> <a href="/2016/06/24/multi-master-1-0-0.html">Master fault tolerance in Kudu 1.0</a> </li>
    
      <li> <a href="/2016/06/21/weekly-update.html">Apache Kudu (incubating) Weekly Update June 21, 2016</a> </li>
    
      <li> <a href="/2016/06/17/raft-consensus-single-node.html">Using Raft Consensus on a Single Node</a> </li>
    
      <li> <a href="/2016/06/13/weekly-update.html">Apache Kudu (incubating) Weekly Update June 13, 2016</a> </li>
    
      <li> <a href="/2016/06/10/apache-kudu-0-9-0-released.html">Apache Kudu (incubating) 0.9.0 released</a> </li>
    
      <li> <a href="/2016/06/06/weekly-update.html">Apache Kudu (incubating) Weekly Update June 6, 2016</a> </li>
    
      <li> <a href="/2016/06/02/no-default-partitioning.html">Default Partitioning Changes Coming in Kudu 0.9</a> </li>
    
      <li> <a href="/2016/06/01/weekly-update.html">Apache Kudu (incubating) Weekly Update June 1, 2016</a> </li>
    
      <li> <a href="/2016/05/23/weekly-update.html">Apache Kudu (incubating) Weekly Update May 23, 2016</a> </li>
    
      <li> <a href="/2016/05/16/weekly-update.html">Apache Kudu (incubating) Weekly Update May 16, 2016</a> </li>
    
      <li> <a href="/2016/05/09/weekly-update.html">Apache Kudu (incubating) Weekly Update May 9, 2016</a> </li>
    
    </ul>
  </div>
</div>

      <footer class="footer">
        <p class="pull-left">
        <a href="http://incubator.apache.org"><img src="/img/apache-incubator.png" width="225" height="53" align="right"/></a>
        </p>
        <p class="small">
        Apache Kudu (incubating) is an effort undergoing incubation at the Apache Software
        Foundation (ASF), sponsored by the Apache Incubator PMC. Incubation is
        required of all newly accepted projects until a further review
        indicates that the infrastructure, communications, and decision making
        process have stabilized in a manner consistent with other successful
        ASF projects. While incubation status is not necessarily a reflection
        of the completeness or stability of the code, it does indicate that the
        project has yet to be fully endorsed by the ASF.

        Copyright &copy; 2016 The Apache Software Foundation. 
        </p>
      </footer>
    </div>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
    <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"
            integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS"
            crossorigin="anonymous"></script>
    <script>
      (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
      m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
      })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

      ga('create', 'UA-68448017-1', 'auto');
      ga('send', 'pageview');
    </script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/anchor-js/3.1.0/anchor.js"></script>
    <script>
      anchors.options = {
        placement: 'right',
        visible: 'touch',
      };
      anchors.add();
    </script>
  </body>
</html>

