2016/09/16/predicate-pushdown.html - kudu-site - Git at Google

 <!DOCTYPE html>
 <html lang="en">
   <head>
     <meta charset="utf-8" />
     <meta http-equiv="X-UA-Compatible" content="IE=edge" />
     <meta name="viewport" content="width=device-width, initial-scale=1" />
     <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
     <meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data" />
     <meta name="author" content="Cloudera" />
     <title>Apache Kudu - Pushing Down Predicate Evaluation in Apache Kudu</title>
     <!-- Bootstrap core CSS -->
     <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css"
           integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7"
           crossorigin="anonymous">

     <!-- Custom styles for this template -->
     <link href="/css/kudu.css" rel="stylesheet"/>
     <link href="/css/asciidoc.css" rel="stylesheet"/>
     <link rel="shortcut icon" href="/img/logo-favicon.ico" />
     <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.1/css/font-awesome.min.css" />


     <link rel="alternate" type="application/atom+xml"
       title="RSS Feed for Apache Kudu blog"
       href="/feed.xml" />


     <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
     <!--[if lt IE 9]>
         <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
         <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
         <![endif]-->
   </head>
   <body>
     <div class="kudu-site container-fluid">
       <!-- Static navbar -->
         <nav class="navbar navbar-default">
           <div class="container-fluid">
             <div class="navbar-header">
               <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
                 <span class="sr-only">Toggle navigation</span>
                 <span class="icon-bar"></span>
                 <span class="icon-bar"></span>
                 <span class="icon-bar"></span>
               </button>

               <a class="logo" href="/"><img
                 src="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png"
                 srcset="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png 1x, //d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_160px.png 2x"
                 alt="Apache Kudu"/></a>

             </div>
             <div id="navbar" class="collapse navbar-collapse">
               <ul class="nav navbar-nav navbar-right">
                 <li >
                   <a href="/">Home</a>
                 </li>
                 <li >
                   <a href="/overview.html">Overview</a>
                 </li>
                 <li >
                   <a href="/docs/">Documentation</a>
                 </li>
                 <li >
                   <a href="/releases/">Releases</a>
                 </li>
                 <li class="active">
                   <a href="/blog/">Blog</a>
                 </li>
                 <!-- NOTE: this dropdown menu does not appear on Mobile, so don't add anything here
                      that doesn't also appear elsewhere on the site. -->
                 <li class="dropdown">
                   <a href="/community.html" role="button" aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a>
                   <ul class="dropdown-menu">
                     <li class="dropdown-header">GET IN TOUCH</li>
                     <li><a class="icon email" href="/community.html">Mailing Lists</a></li>
                     <li><a class="icon slack" href="https://getkudu-slack.herokuapp.com/">Slack Channel</a></li>
                     <li role="separator" class="divider"></li>
                     <li><a href="/community.html#meetups-user-groups-and-conference-presentations">Events and Meetups</a></li>
                     <li><a href="/committers.html">Project Committers</a></li>
                     <!--<li><a href="/roadmap.html">Roadmap</a></li>-->
                     <li><a href="/community.html#contributions">How to Contribute</a></li>
                     <li role="separator" class="divider"></li>
                     <li class="dropdown-header">DEVELOPER RESOURCES</li>
                     <li><a class="icon github" href="https://github.com/apache/incubator-kudu">GitHub</a></li>
                     <li><a class="icon gerrit" href="http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu">Gerrit Code Review</a></li>
                     <li><a class="icon jira" href="https://issues.apache.org/jira/browse/KUDU">JIRA Issue Tracker</a></li>
                     <li role="separator" class="divider"></li>
                     <li class="dropdown-header">SOCIAL MEDIA</li>
                     <li><a class="icon twitter" href="https://twitter.com/ApacheKudu">Twitter</a></li>
                     <li><a href="https://www.reddit.com/r/kudu/">Reddit</a></li>
                     <li role="separator" class="divider"></li>
                     <li class="dropdown-header">APACHE SOFTWARE FOUNDATION</li>
                     <li><a href="https://www.apache.org/security/" target="_blank">Security</a></li>
                     <li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a></li>
                     <li><a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
                     <li><a href="https://www.apache.org/licenses/" target="_blank">License</a></li>
                   </ul>
                 </li>
                 <li >
                   <a href="/faq.html">FAQ</a>
                 </li>
               </ul><!-- /.nav -->
             </div><!-- /#navbar -->
           </div><!-- /.container-fluid -->
         </nav>

 <div class="row header">
   <div class="col-lg-12">
     <h2><a href="/blog">Apache Kudu Blog</a></h2>
   </div>
 </div>

 <div class="row-fluid">
   <div class="col-lg-9">
     <article>
   <header>
     <h1 class="entry-title">Pushing Down Predicate Evaluation in Apache Kudu</h1>
     <p class="meta">Posted 16 Sep 2016 by Andrew Wong</p>
   </header>
   <div class="entry-content">
     <p>I had the pleasure of interning with the Apache Kudu team at Cloudera this
 summer. This project was my summer contribution to Kudu: a restructuring of the
 scan path to speed up queries.</p>

 <!--more-->

 <h2 id="introduction">Introduction</h2>

 <p>In Kudu, <em>predicate pushdown</em> refers to the way in which predicates are
 handled. When a scan is requested, its predicates are passed through the
 different layers of Kudu’s storage hierarchy, allowing for pruning and other
 optimizations to happen at each level before reaching the underlying data.</p>

 <p>While predicates are pushed down, predicate evaluation itself occurs at a fairly
 high level, precluding the evaluation process from certain data-specific
 optimizations. These optimizations can make tablet scans an order of magnitude
 faster, if not more.</p>

 <h2 id="a-day-in-the-life-of-a-query">A Day in the Life of a Query</h2>

 <p>Because Kudu is a columnar storage engine, its scan path has a number of
 optimizations to avoid extraneous reads, copies, and computation. When a query
 is sent to a tablet server, the server prunes tablets based on the
 primary key, directing the request to only the tablets that contain the key
 range of interest. Once at a tablet, only the columns relevant to the query are
 scanned. Further pruning is done over the primary key, and if the query is
 predicated on non-key columns, the entire column is scanned. The columns in a
 tablet are stored as <em>cfiles</em>, which are split into encoded <em>blocks</em>. Once the
 relevant cfiles are determined, the data are materialized by the block
 decoders, i.e. their underlying data are decoded and copied into a buffer,
 which is passed back to the tablet layer. The tablet can then evaluate the
 predicate on the batch of data and mark which rows should be returned to the
 client.</p>

 <p>One of the encoding types I worked very closely with is <em>dictionary encoding</em>,
 an encoding type for strings that performs particularly well for cfiles that
 have repeating values. Rather than storing every row’s string, each unique
 string is assigned a numeric codeword, and the rows are stored numerically on
 disk. When materializing a dictionary block, all of the numeric data are scanned
 and all of the corresponding strings are copied and buffered for evaluation.
 When the vocabulary of a dictionary-encoded cfile gets too large, the blocks
 begin switching to <em>plain encoding mode</em> to act like <em>plain-encoded</em> blocks.</p>

 <p>In a plain-encoded block, strings are stored contiguously and the character
 offsets to the start of each string are stored as a list of integers. When
 materializing, all of the strings are copied to a buffer for evaluation.</p>

 <p>Therein lies room for improvement: this predicate evaluation path is the same
 for all data types and encoding types. Within the tablet, the correct cfiles
 are determined, the cfiles’ decoders are opened, all of the data are copied to
 a buffer, and the predicates are evaluated on this buffered data via
 type-specific comparators. This path is extremely flexible, but because it was
 designed to be encoding-independent, there is room for improvement.</p>

 <h2 id="trimming-the-fat">Trimming the Fat</h2>

 <p>The first step is to allow the decoders access to the predicate. In doing so,
 each encoding type can specialize its evaluation. Additionally, this puts the
 decoder in a position where it can determine whether a given row satisfies the
 query, which in turn, allows the decoders to determine what data gets copied
 instead of eagerly copying all of its data to get evaluated.</p>

 <p>Take the case of dictionary-encoded strings as an example. With the existing
 scan path, not only are all of the strings in a column copied into a buffer, but
 string comparisons are done on every row. By taking advantage of the fact that
 the data can be represented as integers, the cost of determining the query
 results can be greatly reduced. The string comparisons can be swapped out with
 evaluation based on the codewords, in which case the room for improvement boils
 down to how to most quickly determine whether or not a given codeword
 corresponds to a string that satisfies the predicate. Dictionary columns will
 now use a bitset to store the codewords that match the predicates.  It will then
 scan through the integer-valued data and checks the bitset to determine whether
 it should copy the corresponding string over.</p>

 <p>This is great in the best case scenario where a cfile’s vocabulary is small,
 but when the vocabulary gets too large and the dictionary blocks switch to plain
 encoding mode, performance is hampered. In this mode, the blocks don’t utilize
 any dictionary metadata and end up wasting the codeword bitset. That isn’t to
 say all is lost: the decoders can still evaluate a predicate via string
 comparison, and the fact that evaluation can still occur at the decoder-level
 means the eager buffering can still be avoided.</p>

 <p>Dictionary encoding is a perfect storm in that the decoders can completely
 evaluate the predicates. This is not the case for most other encoding types,
 but having decoders support evaluation leaves the door open for other encoding
 types to extend this idea.</p>

 <h2 id="performance">Performance</h2>
 <p>Depending on the dataset and query, predicate pushdown can lead to significant
 improvements. Tablet scans were timed with datasets consisting of repeated
 string patterns of tunable length and tunable cardinality.</p>

 <p><img src="/img/predicate-pushdown/pushdown-10.png" alt="png" class="img-responsive" />
 <img src="/img/predicate-pushdown/pushdown-10M.png" alt="png" class="img-responsive" /></p>

 <p>The above plots show the time taken to completely scan a single tablet, recorded
 using a dataset of ten million rows of strings with length ten. Predicates were
 designed to select values out of bounds (Empty), select a single value (Equal,
 i.e. for cardinality <em>k</em>, this would select 1/<em>k</em> of the dataset), select half
 of the full range (Half), and select the full range of values (All).</p>

 <p>With the original evaluation implementation, the tablet must copy and scan
 through the tablet to determine whether any values match. This means that even
 when the result set is small, the full column is still copied. This is avoided
 by pushing down predicates, which only copies as needed, and can be seen in the
 above queries: those with near-empty result sets (Empty and Equal) have shorter
 scan times than those with larger result sets (Half and All).</p>

 <p>Note that for dictionary encoding, given a low cardinality, Kudu can completely
 rely on the dictionary codewords to evaluate, making the query significantly
 faster. At higher cardinalities, the dictionaries completely fill up and the
 blocks fall back on plain encoding. The slower, albeit still improved,
 performance on the dataset containing 10M unique values reflects this.</p>

 <p><img src="/img/predicate-pushdown/pushdown-tpch.png" alt="png" class="img-responsive" /></p>

 <p>Similar predicates were run with the TPC-H dataset, querying on the shipdate
 column. The full path of a query includes not only the tablet scanning itself,
 but also RPCs and batched data transfer to the caller as the scan progresses.
 As such, the times plotted above refer to the average end-to-end time required
 to scan and return a batch of rows. Regardless of this additional overhead,
 significant improvements on the scan path still yield substantial improvements
 to the query performance as a whole.</p>

 <h2 id="conclusion">Conclusion</h2>

 <p>Pushing down predicate evaluation in Kudu yielded substantial improvements to
 the scan path. For dictionary encoding, pushdown can be particularly powerful,
 and other encoding types are either unaffected or also improved. This change has
 been pushed to the main branch of Kudu, and relevant commits can be found
 <a href="https://github.com/cloudera/kudu/commit/c0f37278cb09a7781d9073279ea54b08db6e2010">here</a>
 and
 <a href="https://github.com/cloudera/kudu/commit/ec80fdb37be44d380046a823b5e6d8e2241ec3da">here</a>.</p>

 <p>This summer has been a phenomenal learning experience for me, in terms of the
 tools, the workflow, the datasets, the thought-processes that go into building
 something at Kudu’s scale. I am extremely thankful for all of the mentoring and
 support I received, and that I got to be a part of Kudu’s journey from
 incubating to a Top Level Apache project. I can’t express enough how grateful I
 am for the amount of support I got from the Kudu team, from the intern
 coordinators, and from the Cloudera community as a whole.</p>

   </div>
 </article>


   </div>
   <div class="col-lg-3 recent-posts">
     <h3>Recent posts</h3>
     <ul>

       <li> <a href="/2020/07/30/building-near-real-time-big-data-lake.html">Building Near Real-time Big Data Lake</a> </li>

       <li> <a href="/2020/05/18/apache-kudu-1-12-0-release.html">Apache Kudu 1.12.0 released</a> </li>

       <li> <a href="/2019/11/20/apache-kudu-1-11-1-release.html">Apache Kudu 1.11.1 released</a> </li>

       <li> <a href="/2019/11/20/apache-kudu-1-10-1-release.html">Apache Kudu 1.10.1 released</a> </li>

       <li> <a href="/2019/07/09/apache-kudu-1-10-0-release.html">Apache Kudu 1.10.0 Released</a> </li>

       <li> <a href="/2019/04/30/location-awareness.html">Location Awareness in Kudu</a> </li>

       <li> <a href="/2019/04/22/fine-grained-authorization-with-apache-kudu-and-impala.html">Fine-Grained Authorization with Apache Kudu and Impala</a> </li>

       <li> <a href="/2019/03/19/testing-apache-kudu-applications-on-the-jvm.html">Testing Apache Kudu Applications on the JVM</a> </li>

       <li> <a href="/2019/03/15/apache-kudu-1-9-0-release.html">Apache Kudu 1.9.0 Released</a> </li>

       <li> <a href="/2019/03/05/transparent-hierarchical-storage-management-with-apache-kudu-and-impala.html">Transparent Hierarchical Storage Management with Apache Kudu and Impala</a> </li>

       <li> <a href="/2018/12/11/call-for-posts.html">Call for Posts</a> </li>

       <li> <a href="/2018/10/26/apache-kudu-1-8-0-released.html">Apache Kudu 1.8.0 Released</a> </li>

       <li> <a href="/2018/09/26/index-skip-scan-optimization-in-kudu.html">Index Skip Scan Optimization in Kudu</a> </li>

       <li> <a href="/2018/09/11/simplified-pipelines-with-kudu.html">Simplified Data Pipelines with Kudu</a> </li>

       <li> <a href="/2018/08/06/getting-started-with-kudu-an-oreilly-title.html">Getting Started with Kudu - an O'Reilly Title</a> </li>

     </ul>
   </div>
 </div>

       <footer class="footer">
         <div class="row">
           <div class="col-md-9">
             <p class="small">
             Copyright &copy; 2019 The Apache Software Foundation.
             </p>
             <p class="small">
             Apache Kudu, Kudu, Apache, the Apache feather logo, and the Apache Kudu
             project logo are either registered trademarks or trademarks of The
             Apache Software Foundation in the United States and other countries.
             </p>
           </div>
           <div class="col-md-3">
             <a class="pull-right" href="https://www.apache.org/events/current-event.html">
                 <img src="https://www.apache.org/events/current-event-234x60.png"/>
             </a>
           </div>
         </div>
       </footer>
     </div>
     <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
     <script>
       // Try to detect touch-screen devices. Note: Many laptops have touch screens.
       $(document).ready(function() {
         if ("ontouchstart" in document.documentElement) {
           $(document.documentElement).addClass("touch");
         } else {
           $(document.documentElement).addClass("no-touch");
         }
       });
     </script>
     <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"
             integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS"
             crossorigin="anonymous"></script>
     <script>
       (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
       (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
       m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
       })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

       ga('create', 'UA-68448017-1', 'auto');
       ga('send', 'pageview');
     </script>
     <script src="https://cdnjs.cloudflare.com/ajax/libs/anchor-js/3.1.0/anchor.js"></script>
     <script>
       anchors.options = {
         placement: 'right',
         visible: 'touch',
       };
       anchors.add();
     </script>
   </body>
 </html>
	<!DOCTYPE html>
	<html lang="en">
	<head>
	<meta charset="utf-8" />
	<meta http-equiv="X-UA-Compatible" content="IE=edge" />
	<meta name="viewport" content="width=device-width, initial-scale=1" />
	<!-- The above 3 meta tags must come first in the head; any other head content must come after these tags -->
	<meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data" />
	<meta name="author" content="Cloudera" />
	<title>Apache Kudu - Pushing Down Predicate Evaluation in Apache Kudu</title>
	<!-- Bootstrap core CSS -->
	<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css"
	integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7"
	crossorigin="anonymous">

	<!-- Custom styles for this template -->
	<link href="/css/kudu.css" rel="stylesheet"/>
	<link href="/css/asciidoc.css" rel="stylesheet"/>
	<link rel="shortcut icon" href="/img/logo-favicon.ico" />
	<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.1/css/font-awesome.min.css" />


	<link rel="alternate" type="application/atom+xml"
	title="RSS Feed for Apache Kudu blog"
	href="/feed.xml" />


	<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
	<!--[if lt IE 9]>
	<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
	<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
	<![endif]-->
	</head>
	<body>
	<div class="kudu-site container-fluid">
	<!-- Static navbar -->
	<nav class="navbar navbar-default">
	<div class="container-fluid">
	<div class="navbar-header">
	<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
	<span class="sr-only">Toggle navigation</span>
	<span class="icon-bar"></span>
	<span class="icon-bar"></span>
	<span class="icon-bar"></span>
	</button>

	<a class="logo" href="/"><img
	src="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png"
	srcset="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png 1x, //d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_160px.png 2x"
	alt="Apache Kudu"/></a>

	</div>
	<div id="navbar" class="collapse navbar-collapse">
	<ul class="nav navbar-nav navbar-right">
	<li >
	<a href="/">Home</a>
	</li>
	<li >
	<a href="/overview.html">Overview</a>
	</li>
	<li >
	<a href="/docs/">Documentation</a>
	</li>
	<li >
	<a href="/releases/">Releases</a>
	</li>
	<li class="active">
	<a href="/blog/">Blog</a>
	</li>
	<!-- NOTE: this dropdown menu does not appear on Mobile, so don't add anything here
	that doesn't also appear elsewhere on the site. -->
	<li class="dropdown">
	<a href="/community.html" role="button" aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a>
	<ul class="dropdown-menu">
	<li class="dropdown-header">GET IN TOUCH</li>
	<li><a class="icon email" href="/community.html">Mailing Lists</a></li>
	<li><a class="icon slack" href="https://getkudu-slack.herokuapp.com/">Slack Channel</a></li>
	<li role="separator" class="divider"></li>
	<li><a href="/community.html#meetups-user-groups-and-conference-presentations">Events and Meetups</a></li>
	<li><a href="/committers.html">Project Committers</a></li>
	<!--<li><a href="/roadmap.html">Roadmap</a></li>-->
	<li><a href="/community.html#contributions">How to Contribute</a></li>
	<li role="separator" class="divider"></li>
	<li class="dropdown-header">DEVELOPER RESOURCES</li>
	<li><a class="icon github" href="https://github.com/apache/incubator-kudu">GitHub</a></li>
	<li><a class="icon gerrit" href="http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu">Gerrit Code Review</a></li>
	<li><a class="icon jira" href="https://issues.apache.org/jira/browse/KUDU">JIRA Issue Tracker</a></li>
	<li role="separator" class="divider"></li>
	<li class="dropdown-header">SOCIAL MEDIA</li>
	<li><a class="icon twitter" href="https://twitter.com/ApacheKudu">Twitter</a></li>
	<li><a href="https://www.reddit.com/r/kudu/">Reddit</a></li>
	<li role="separator" class="divider"></li>
	<li class="dropdown-header">APACHE SOFTWARE FOUNDATION</li>
	<li><a href="https://www.apache.org/security/" target="_blank">Security</a></li>
	<li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a></li>
	<li><a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
	<li><a href="https://www.apache.org/licenses/" target="_blank">License</a></li>
	</ul>
	</li>
	<li >
	<a href="/faq.html">FAQ</a>
	</li>
	</ul><!-- /.nav -->
	</div><!-- /#navbar -->
	</div><!-- /.container-fluid -->
	</nav>

	<div class="row header">
	<div class="col-lg-12">
	<h2><a href="/blog">Apache Kudu Blog</a></h2>
	</div>
	</div>

	<div class="row-fluid">
	<div class="col-lg-9">
	<article>
	<header>
	<h1 class="entry-title">Pushing Down Predicate Evaluation in Apache Kudu</h1>
	<p class="meta">Posted 16 Sep 2016 by Andrew Wong</p>
	</header>
	<div class="entry-content">
	<p>I had the pleasure of interning with the Apache Kudu team at Cloudera this
	summer. This project was my summer contribution to Kudu: a restructuring of the
	scan path to speed up queries.</p>

	<!--more-->

	<h2 id="introduction">Introduction</h2>

	<p>In Kudu, <em>predicate pushdown</em> refers to the way in which predicates are
	handled. When a scan is requested, its predicates are passed through the
	different layers of Kudu’s storage hierarchy, allowing for pruning and other
	optimizations to happen at each level before reaching the underlying data.</p>

	<p>While predicates are pushed down, predicate evaluation itself occurs at a fairly
	high level, precluding the evaluation process from certain data-specific
	optimizations. These optimizations can make tablet scans an order of magnitude
	faster, if not more.</p>

	<h2 id="a-day-in-the-life-of-a-query">A Day in the Life of a Query</h2>

	<p>Because Kudu is a columnar storage engine, its scan path has a number of
	optimizations to avoid extraneous reads, copies, and computation. When a query
	is sent to a tablet server, the server prunes tablets based on the
	primary key, directing the request to only the tablets that contain the key
	range of interest. Once at a tablet, only the columns relevant to the query are
	scanned. Further pruning is done over the primary key, and if the query is
	predicated on non-key columns, the entire column is scanned. The columns in a
	tablet are stored as <em>cfiles</em>, which are split into encoded <em>blocks</em>. Once the
	relevant cfiles are determined, the data are materialized by the block
	decoders, i.e. their underlying data are decoded and copied into a buffer,
	which is passed back to the tablet layer. The tablet can then evaluate the
	predicate on the batch of data and mark which rows should be returned to the
	client.</p>

	<p>One of the encoding types I worked very closely with is <em>dictionary encoding</em>,
	an encoding type for strings that performs particularly well for cfiles that
	have repeating values. Rather than storing every row’s string, each unique
	string is assigned a numeric codeword, and the rows are stored numerically on
	disk. When materializing a dictionary block, all of the numeric data are scanned
	and all of the corresponding strings are copied and buffered for evaluation.
	When the vocabulary of a dictionary-encoded cfile gets too large, the blocks
	begin switching to <em>plain encoding mode</em> to act like <em>plain-encoded</em> blocks.</p>

	<p>In a plain-encoded block, strings are stored contiguously and the character
	offsets to the start of each string are stored as a list of integers. When
	materializing, all of the strings are copied to a buffer for evaluation.</p>

	<p>Therein lies room for improvement: this predicate evaluation path is the same
	for all data types and encoding types. Within the tablet, the correct cfiles
	are determined, the cfiles’ decoders are opened, all of the data are copied to
	a buffer, and the predicates are evaluated on this buffered data via
	type-specific comparators. This path is extremely flexible, but because it was
	designed to be encoding-independent, there is room for improvement.</p>

	<h2 id="trimming-the-fat">Trimming the Fat</h2>

	<p>The first step is to allow the decoders access to the predicate. In doing so,
	each encoding type can specialize its evaluation. Additionally, this puts the
	decoder in a position where it can determine whether a given row satisfies the
	query, which in turn, allows the decoders to determine what data gets copied
	instead of eagerly copying all of its data to get evaluated.</p>

	<p>Take the case of dictionary-encoded strings as an example. With the existing
	scan path, not only are all of the strings in a column copied into a buffer, but
	string comparisons are done on every row. By taking advantage of the fact that
	the data can be represented as integers, the cost of determining the query
	results can be greatly reduced. The string comparisons can be swapped out with
	evaluation based on the codewords, in which case the room for improvement boils
	down to how to most quickly determine whether or not a given codeword
	corresponds to a string that satisfies the predicate. Dictionary columns will
	now use a bitset to store the codewords that match the predicates. It will then
	scan through the integer-valued data and checks the bitset to determine whether
	it should copy the corresponding string over.</p>

	<p>This is great in the best case scenario where a cfile’s vocabulary is small,
	but when the vocabulary gets too large and the dictionary blocks switch to plain
	encoding mode, performance is hampered. In this mode, the blocks don’t utilize
	any dictionary metadata and end up wasting the codeword bitset. That isn’t to
	say all is lost: the decoders can still evaluate a predicate via string
	comparison, and the fact that evaluation can still occur at the decoder-level
	means the eager buffering can still be avoided.</p>

	<p>Dictionary encoding is a perfect storm in that the decoders can completely
	evaluate the predicates. This is not the case for most other encoding types,
	but having decoders support evaluation leaves the door open for other encoding
	types to extend this idea.</p>

	<h2 id="performance">Performance</h2>
	<p>Depending on the dataset and query, predicate pushdown can lead to significant
	improvements. Tablet scans were timed with datasets consisting of repeated
	string patterns of tunable length and tunable cardinality.</p>

	<p><img src="/img/predicate-pushdown/pushdown-10.png" alt="png" class="img-responsive" />
	<img src="/img/predicate-pushdown/pushdown-10M.png" alt="png" class="img-responsive" /></p>

	<p>The above plots show the time taken to completely scan a single tablet, recorded
	using a dataset of ten million rows of strings with length ten. Predicates were
	designed to select values out of bounds (Empty), select a single value (Equal,
	i.e. for cardinality <em>k</em>, this would select 1/<em>k</em> of the dataset), select half
	of the full range (Half), and select the full range of values (All).</p>

	<p>With the original evaluation implementation, the tablet must copy and scan
	through the tablet to determine whether any values match. This means that even
	when the result set is small, the full column is still copied. This is avoided
	by pushing down predicates, which only copies as needed, and can be seen in the
	above queries: those with near-empty result sets (Empty and Equal) have shorter
	scan times than those with larger result sets (Half and All).</p>

	<p>Note that for dictionary encoding, given a low cardinality, Kudu can completely
	rely on the dictionary codewords to evaluate, making the query significantly
	faster. At higher cardinalities, the dictionaries completely fill up and the
	blocks fall back on plain encoding. The slower, albeit still improved,
	performance on the dataset containing 10M unique values reflects this.</p>

	<p><img src="/img/predicate-pushdown/pushdown-tpch.png" alt="png" class="img-responsive" /></p>

	<p>Similar predicates were run with the TPC-H dataset, querying on the shipdate
	column. The full path of a query includes not only the tablet scanning itself,
	but also RPCs and batched data transfer to the caller as the scan progresses.
	As such, the times plotted above refer to the average end-to-end time required
	to scan and return a batch of rows. Regardless of this additional overhead,
	significant improvements on the scan path still yield substantial improvements
	to the query performance as a whole.</p>

	<h2 id="conclusion">Conclusion</h2>

	<p>Pushing down predicate evaluation in Kudu yielded substantial improvements to
	the scan path. For dictionary encoding, pushdown can be particularly powerful,
	and other encoding types are either unaffected or also improved. This change has
	been pushed to the main branch of Kudu, and relevant commits can be found
	<a href="https://github.com/cloudera/kudu/commit/c0f37278cb09a7781d9073279ea54b08db6e2010">here</a>
	and
	<a href="https://github.com/cloudera/kudu/commit/ec80fdb37be44d380046a823b5e6d8e2241ec3da">here</a>.</p>

	<p>This summer has been a phenomenal learning experience for me, in terms of the
	tools, the workflow, the datasets, the thought-processes that go into building
	something at Kudu’s scale. I am extremely thankful for all of the mentoring and
	support I received, and that I got to be a part of Kudu’s journey from
	incubating to a Top Level Apache project. I can’t express enough how grateful I
	am for the amount of support I got from the Kudu team, from the intern
	coordinators, and from the Cloudera community as a whole.</p>

	</div>
	</article>


	</div>
	<div class="col-lg-3 recent-posts">
	<h3>Recent posts</h3>
	<ul>

	<li> <a href="/2020/07/30/building-near-real-time-big-data-lake.html">Building Near Real-time Big Data Lake</a> </li>

	<li> <a href="/2020/05/18/apache-kudu-1-12-0-release.html">Apache Kudu 1.12.0 released</a> </li>

	<li> <a href="/2019/11/20/apache-kudu-1-11-1-release.html">Apache Kudu 1.11.1 released</a> </li>

	<li> <a href="/2019/11/20/apache-kudu-1-10-1-release.html">Apache Kudu 1.10.1 released</a> </li>

	<li> <a href="/2019/07/09/apache-kudu-1-10-0-release.html">Apache Kudu 1.10.0 Released</a> </li>

	<li> <a href="/2019/04/30/location-awareness.html">Location Awareness in Kudu</a> </li>

	<li> <a href="/2019/04/22/fine-grained-authorization-with-apache-kudu-and-impala.html">Fine-Grained Authorization with Apache Kudu and Impala</a> </li>

	<li> <a href="/2019/03/19/testing-apache-kudu-applications-on-the-jvm.html">Testing Apache Kudu Applications on the JVM</a> </li>

	<li> <a href="/2019/03/15/apache-kudu-1-9-0-release.html">Apache Kudu 1.9.0 Released</a> </li>

	<li> <a href="/2019/03/05/transparent-hierarchical-storage-management-with-apache-kudu-and-impala.html">Transparent Hierarchical Storage Management with Apache Kudu and Impala</a> </li>

	<li> <a href="/2018/12/11/call-for-posts.html">Call for Posts</a> </li>

	<li> <a href="/2018/10/26/apache-kudu-1-8-0-released.html">Apache Kudu 1.8.0 Released</a> </li>

	<li> <a href="/2018/09/26/index-skip-scan-optimization-in-kudu.html">Index Skip Scan Optimization in Kudu</a> </li>

	<li> <a href="/2018/09/11/simplified-pipelines-with-kudu.html">Simplified Data Pipelines with Kudu</a> </li>

	<li> <a href="/2018/08/06/getting-started-with-kudu-an-oreilly-title.html">Getting Started with Kudu - an O'Reilly Title</a> </li>

	</ul>
	</div>
	</div>

	<footer class="footer">
	<div class="row">
	<div class="col-md-9">
	<p class="small">
	Copyright © 2019 The Apache Software Foundation.
	</p>
	<p class="small">
	Apache Kudu, Kudu, Apache, the Apache feather logo, and the Apache Kudu
	project logo are either registered trademarks or trademarks of The
	Apache Software Foundation in the United States and other countries.
	</p>
	</div>
	<div class="col-md-3">
	<a class="pull-right" href="https://www.apache.org/events/current-event.html">
	<img src="https://www.apache.org/events/current-event-234x60.png"/>
	</a>
	</div>
	</div>
	</footer>
	</div>
	<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
	<script>
	// Try to detect touch-screen devices. Note: Many laptops have touch screens.
	$(document).ready(function() {
	if ("ontouchstart" in document.documentElement) {
	$(document.documentElement).addClass("touch");
	} else {
	$(document.documentElement).addClass("no-touch");
	}
	});
	</script>
	<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"
	integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS"
	crossorigin="anonymous"></script>
	<script>
	(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]\|\|function(){
	(i[r].q=i[r].q\|\|[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
	m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
	})(window,document,'script','//www.google-analytics.com/analytics.js','ga');

	ga('create', 'UA-68448017-1', 'auto');
	ga('send', 'pageview');
	</script>
	<script src="https://cdnjs.cloudflare.com/ajax/libs/anchor-js/3.1.0/anchor.js"></script>
	<script>
	anchors.options = {
	placement: 'right',
	visible: 'touch',
	};
	anchors.add();
	</script>
	</body>
	</html>