| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8" /> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1" /> |
| <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> |
| <meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data" /> |
| <meta name="author" content="Cloudera" /> |
| <title>Apache Kudu - Pushing Down Predicate Evaluation in Apache Kudu</title> |
| <!-- Bootstrap core CSS --> |
| <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css" |
| integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7" |
| crossorigin="anonymous"> |
| |
| <!-- Custom styles for this template --> |
| <link href="/css/kudu.css" rel="stylesheet"/> |
| <link href="/css/asciidoc.css" rel="stylesheet"/> |
| <link rel="shortcut icon" href="/img/logo-favicon.ico" /> |
| <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.1/css/font-awesome.min.css" /> |
| |
| |
| <link rel="alternate" type="application/atom+xml" |
| title="RSS Feed for Apache Kudu blog" |
| href="/feed.xml" /> |
| |
| |
| <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> |
| <!--[if lt IE 9]> |
| <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> |
| <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> |
| <![endif]--> |
| </head> |
| <body> |
| <div class="kudu-site container-fluid"> |
| <!-- Static navbar --> |
| <nav class="navbar navbar-default"> |
| <div class="container-fluid"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| |
| <a class="logo" href="/"><img |
| src="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png" |
| srcset="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png 1x, //d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_160px.png 2x" |
| alt="Apache Kudu"/></a> |
| |
| </div> |
| <div id="navbar" class="collapse navbar-collapse"> |
| <ul class="nav navbar-nav navbar-right"> |
| <li > |
| <a href="/">Home</a> |
| </li> |
| <li > |
| <a href="/overview.html">Overview</a> |
| </li> |
| <li > |
| <a href="/docs/">Documentation</a> |
| </li> |
| <li > |
| <a href="/releases/">Releases</a> |
| </li> |
| <li class="active"> |
| <a href="/blog/">Blog</a> |
| </li> |
| <!-- NOTE: this dropdown menu does not appear on Mobile, so don't add anything here |
| that doesn't also appear elsewhere on the site. --> |
| <li class="dropdown"> |
| <a href="/community.html" role="button" aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a> |
| <ul class="dropdown-menu"> |
| <li class="dropdown-header">GET IN TOUCH</li> |
| <li><a class="icon email" href="/community.html">Mailing Lists</a></li> |
| <li><a class="icon slack" href="https://getkudu-slack.herokuapp.com/">Slack Channel</a></li> |
| <li role="separator" class="divider"></li> |
| <li><a href="/community.html#meetups-user-groups-and-conference-presentations">Events and Meetups</a></li> |
| <li><a href="/committers.html">Project Committers</a></li> |
| <li><a href="/ecosystem.html">Ecosystem</a></li> |
| <!--<li><a href="/roadmap.html">Roadmap</a></li>--> |
| <li><a href="/community.html#contributions">How to Contribute</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="dropdown-header">DEVELOPER RESOURCES</li> |
| <li><a class="icon github" href="https://github.com/apache/incubator-kudu">GitHub</a></li> |
| <li><a class="icon gerrit" href="http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu">Gerrit Code Review</a></li> |
| <li><a class="icon jira" href="https://issues.apache.org/jira/browse/KUDU">JIRA Issue Tracker</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="dropdown-header">SOCIAL MEDIA</li> |
| <li><a class="icon twitter" href="https://twitter.com/ApacheKudu">Twitter</a></li> |
| <li><a href="https://www.reddit.com/r/kudu/">Reddit</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="dropdown-header">APACHE SOFTWARE FOUNDATION</li> |
| <li><a href="https://www.apache.org/security/" target="_blank">Security</a></li> |
| <li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a></li> |
| <li><a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li> |
| <li><a href="https://www.apache.org/licenses/" target="_blank">License</a></li> |
| </ul> |
| </li> |
| <li > |
| <a href="/faq.html">FAQ</a> |
| </li> |
| </ul><!-- /.nav --> |
| </div><!-- /#navbar --> |
| </div><!-- /.container-fluid --> |
| </nav> |
| |
| <div class="row header"> |
| <div class="col-lg-12"> |
| <h2><a href="/blog">Apache Kudu Blog</a></h2> |
| </div> |
| </div> |
| |
| <div class="row-fluid"> |
| <div class="col-lg-9"> |
| <article> |
| <header> |
| <h1 class="entry-title">Pushing Down Predicate Evaluation in Apache Kudu</h1> |
| <p class="meta">Posted 16 Sep 2016 by Andrew Wong</p> |
| </header> |
| <div class="entry-content"> |
| <p>I had the pleasure of interning with the Apache Kudu team at Cloudera this |
| summer. This project was my summer contribution to Kudu: a restructuring of the |
| scan path to speed up queries.</p> |
| |
| <!--more--> |
| |
| <h2 id="introduction">Introduction</h2> |
| |
| <p>In Kudu, <em>predicate pushdown</em> refers to the way in which predicates are |
| handled. When a scan is requested, its predicates are passed through the |
| different layers of Kudu’s storage hierarchy, allowing for pruning and other |
| optimizations to happen at each level before reaching the underlying data.</p> |
| |
| <p>While predicates are pushed down, predicate evaluation itself occurs at a fairly |
| high level, precluding the evaluation process from certain data-specific |
| optimizations. These optimizations can make tablet scans an order of magnitude |
| faster, if not more.</p> |
| |
| <h2 id="a-day-in-the-life-of-a-query">A Day in the Life of a Query</h2> |
| |
| <p>Because Kudu is a columnar storage engine, its scan path has a number of |
| optimizations to avoid extraneous reads, copies, and computation. When a query |
| is sent to a tablet server, the server prunes tablets based on the |
| primary key, directing the request to only the tablets that contain the key |
| range of interest. Once at a tablet, only the columns relevant to the query are |
| scanned. Further pruning is done over the primary key, and if the query is |
| predicated on non-key columns, the entire column is scanned. The columns in a |
| tablet are stored as <em>cfiles</em>, which are split into encoded <em>blocks</em>. Once the |
| relevant cfiles are determined, the data are materialized by the block |
| decoders, i.e. their underlying data are decoded and copied into a buffer, |
| which is passed back to the tablet layer. The tablet can then evaluate the |
| predicate on the batch of data and mark which rows should be returned to the |
| client.</p> |
| |
| <p>One of the encoding types I worked very closely with is <em>dictionary encoding</em>, |
| an encoding type for strings that performs particularly well for cfiles that |
| have repeating values. Rather than storing every row’s string, each unique |
| string is assigned a numeric codeword, and the rows are stored numerically on |
| disk. When materializing a dictionary block, all of the numeric data are scanned |
| and all of the corresponding strings are copied and buffered for evaluation. |
| When the vocabulary of a dictionary-encoded cfile gets too large, the blocks |
| begin switching to <em>plain encoding mode</em> to act like <em>plain-encoded</em> blocks.</p> |
| |
| <p>In a plain-encoded block, strings are stored contiguously and the character |
| offsets to the start of each string are stored as a list of integers. When |
| materializing, all of the strings are copied to a buffer for evaluation.</p> |
| |
| <p>Therein lies room for improvement: this predicate evaluation path is the same |
| for all data types and encoding types. Within the tablet, the correct cfiles |
| are determined, the cfiles’ decoders are opened, all of the data are copied to |
| a buffer, and the predicates are evaluated on this buffered data via |
| type-specific comparators. This path is extremely flexible, but because it was |
| designed to be encoding-independent, there is room for improvement.</p> |
| |
| <h2 id="trimming-the-fat">Trimming the Fat</h2> |
| |
| <p>The first step is to allow the decoders access to the predicate. In doing so, |
| each encoding type can specialize its evaluation. Additionally, this puts the |
| decoder in a position where it can determine whether a given row satisfies the |
| query, which in turn, allows the decoders to determine what data gets copied |
| instead of eagerly copying all of its data to get evaluated.</p> |
| |
| <p>Take the case of dictionary-encoded strings as an example. With the existing |
| scan path, not only are all of the strings in a column copied into a buffer, but |
| string comparisons are done on every row. By taking advantage of the fact that |
| the data can be represented as integers, the cost of determining the query |
| results can be greatly reduced. The string comparisons can be swapped out with |
| evaluation based on the codewords, in which case the room for improvement boils |
| down to how to most quickly determine whether or not a given codeword |
| corresponds to a string that satisfies the predicate. Dictionary columns will |
| now use a bitset to store the codewords that match the predicates. It will then |
| scan through the integer-valued data and checks the bitset to determine whether |
| it should copy the corresponding string over.</p> |
| |
| <p>This is great in the best case scenario where a cfile’s vocabulary is small, |
| but when the vocabulary gets too large and the dictionary blocks switch to plain |
| encoding mode, performance is hampered. In this mode, the blocks don’t utilize |
| any dictionary metadata and end up wasting the codeword bitset. That isn’t to |
| say all is lost: the decoders can still evaluate a predicate via string |
| comparison, and the fact that evaluation can still occur at the decoder-level |
| means the eager buffering can still be avoided.</p> |
| |
| <p>Dictionary encoding is a perfect storm in that the decoders can completely |
| evaluate the predicates. This is not the case for most other encoding types, |
| but having decoders support evaluation leaves the door open for other encoding |
| types to extend this idea.</p> |
| |
| <h2 id="performance">Performance</h2> |
| <p>Depending on the dataset and query, predicate pushdown can lead to significant |
| improvements. Tablet scans were timed with datasets consisting of repeated |
| string patterns of tunable length and tunable cardinality.</p> |
| |
| <p><img src="/img/predicate-pushdown/pushdown-10.png" alt="png" class="img-responsive" /> |
| <img src="/img/predicate-pushdown/pushdown-10M.png" alt="png" class="img-responsive" /></p> |
| |
| <p>The above plots show the time taken to completely scan a single tablet, recorded |
| using a dataset of ten million rows of strings with length ten. Predicates were |
| designed to select values out of bounds (Empty), select a single value (Equal, |
| i.e. for cardinality <em>k</em>, this would select 1/<em>k</em> of the dataset), select half |
| of the full range (Half), and select the full range of values (All).</p> |
| |
| <p>With the original evaluation implementation, the tablet must copy and scan |
| through the tablet to determine whether any values match. This means that even |
| when the result set is small, the full column is still copied. This is avoided |
| by pushing down predicates, which only copies as needed, and can be seen in the |
| above queries: those with near-empty result sets (Empty and Equal) have shorter |
| scan times than those with larger result sets (Half and All).</p> |
| |
| <p>Note that for dictionary encoding, given a low cardinality, Kudu can completely |
| rely on the dictionary codewords to evaluate, making the query significantly |
| faster. At higher cardinalities, the dictionaries completely fill up and the |
| blocks fall back on plain encoding. The slower, albeit still improved, |
| performance on the dataset containing 10M unique values reflects this.</p> |
| |
| <p><img src="/img/predicate-pushdown/pushdown-tpch.png" alt="png" class="img-responsive" /></p> |
| |
| <p>Similar predicates were run with the TPC-H dataset, querying on the shipdate |
| column. The full path of a query includes not only the tablet scanning itself, |
| but also RPCs and batched data transfer to the caller as the scan progresses. |
| As such, the times plotted above refer to the average end-to-end time required |
| to scan and return a batch of rows. Regardless of this additional overhead, |
| significant improvements on the scan path still yield substantial improvements |
| to the query performance as a whole.</p> |
| |
| <h2 id="conclusion">Conclusion</h2> |
| |
| <p>Pushing down predicate evaluation in Kudu yielded substantial improvements to |
| the scan path. For dictionary encoding, pushdown can be particularly powerful, |
| and other encoding types are either unaffected or also improved. This change has |
| been pushed to the main branch of Kudu, and relevant commits can be found |
| <a href="https://github.com/cloudera/kudu/commit/c0f37278cb09a7781d9073279ea54b08db6e2010">here</a> |
| and |
| <a href="https://github.com/cloudera/kudu/commit/ec80fdb37be44d380046a823b5e6d8e2241ec3da">here</a>.</p> |
| |
| <p>This summer has been a phenomenal learning experience for me, in terms of the |
| tools, the workflow, the datasets, the thought-processes that go into building |
| something at Kudu’s scale. I am extremely thankful for all of the mentoring and |
| support I received, and that I got to be a part of Kudu’s journey from |
| incubating to a Top Level Apache project. I can’t express enough how grateful I |
| am for the amount of support I got from the Kudu team, from the intern |
| coordinators, and from the Cloudera community as a whole.</p> |
| |
| </div> |
| </article> |
| |
| |
| </div> |
| <div class="col-lg-3 recent-posts"> |
| <h3>Recent posts</h3> |
| <ul> |
| |
| <li> <a href="/2021/01/15/bloom-filter-predicate.html">Optimized joins & filtering with Bloom filter predicate in Kudu</a> </li> |
| |
| <li> <a href="/2020/09/21/apache-kudu-1-13-0-release.html">Apache Kudu 1.13.0 released</a> </li> |
| |
| <li> <a href="/2020/08/11/fine-grained-authz-ranger.html">Fine-Grained Authorization with Apache Kudu and Apache Ranger</a> </li> |
| |
| <li> <a href="/2020/07/30/building-near-real-time-big-data-lake.html">Building Near Real-time Big Data Lake</a> </li> |
| |
| <li> <a href="/2020/05/18/apache-kudu-1-12-0-release.html">Apache Kudu 1.12.0 released</a> </li> |
| |
| <li> <a href="/2019/11/20/apache-kudu-1-11-1-release.html">Apache Kudu 1.11.1 released</a> </li> |
| |
| <li> <a href="/2019/11/20/apache-kudu-1-10-1-release.html">Apache Kudu 1.10.1 released</a> </li> |
| |
| <li> <a href="/2019/07/09/apache-kudu-1-10-0-release.html">Apache Kudu 1.10.0 Released</a> </li> |
| |
| <li> <a href="/2019/04/30/location-awareness.html">Location Awareness in Kudu</a> </li> |
| |
| <li> <a href="/2019/04/22/fine-grained-authorization-with-apache-kudu-and-impala.html">Fine-Grained Authorization with Apache Kudu and Impala</a> </li> |
| |
| <li> <a href="/2019/03/19/testing-apache-kudu-applications-on-the-jvm.html">Testing Apache Kudu Applications on the JVM</a> </li> |
| |
| <li> <a href="/2019/03/15/apache-kudu-1-9-0-release.html">Apache Kudu 1.9.0 Released</a> </li> |
| |
| <li> <a href="/2019/03/05/transparent-hierarchical-storage-management-with-apache-kudu-and-impala.html">Transparent Hierarchical Storage Management with Apache Kudu and Impala</a> </li> |
| |
| <li> <a href="/2018/12/11/call-for-posts.html">Call for Posts</a> </li> |
| |
| <li> <a href="/2018/10/26/apache-kudu-1-8-0-released.html">Apache Kudu 1.8.0 Released</a> </li> |
| |
| </ul> |
| </div> |
| </div> |
| |
| <footer class="footer"> |
| <div class="row"> |
| <div class="col-md-9"> |
| <p class="small"> |
| Copyright © 2020 The Apache Software Foundation. |
| </p> |
| <p class="small"> |
| Apache Kudu, Kudu, Apache, the Apache feather logo, and the Apache Kudu |
| project logo are either registered trademarks or trademarks of The |
| Apache Software Foundation in the United States and other countries. |
| </p> |
| </div> |
| <div class="col-md-3"> |
| <a class="pull-right" href="https://www.apache.org/events/current-event.html"> |
| <img src="https://www.apache.org/events/current-event-234x60.png"/> |
| </a> |
| </div> |
| </div> |
| </footer> |
| </div> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script> |
| <script> |
| // Try to detect touch-screen devices. Note: Many laptops have touch screens. |
| $(document).ready(function() { |
| if ("ontouchstart" in document.documentElement) { |
| $(document.documentElement).addClass("touch"); |
| } else { |
| $(document.documentElement).addClass("no-touch"); |
| } |
| }); |
| </script> |
| <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js" |
| integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS" |
| crossorigin="anonymous"></script> |
| <script> |
| (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ |
| (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), |
| m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) |
| })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); |
| |
| ga('create', 'UA-68448017-1', 'auto'); |
| ga('send', 'pageview'); |
| </script> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/anchor-js/3.1.0/anchor.js"></script> |
| <script> |
| anchors.options = { |
| placement: 'right', |
| visible: 'touch', |
| }; |
| anchors.add(); |
| </script> |
| </body> |
| </html> |
| |