| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="UTF-8" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <meta name="description" content="Apache Druid"> |
| <meta name="keywords" content="druid,kafka,database,analytics,streaming,real-time,real time,apache,open source"> |
| <meta name="author" content="Apache Software Foundation"> |
| |
| <title>Druid | How We Scaled HyperLogLog: Three Real-World Optimizations</title> |
| |
| <link rel="alternate" type="application/atom+xml" href="/feed"> |
| <link rel="shortcut icon" href="/img/favicon.png"> |
| |
| <link rel="stylesheet" href="/assets/css/font-awesome-5.css"> |
| |
| <link href='//fonts.googleapis.com/css?family=Open+Sans+Condensed:300,700,300italic|Open+Sans:300italic,400italic,600italic,400,300,600,700' rel='stylesheet' type='text/css'> |
| |
| <link rel="stylesheet" href="/css/bootstrap-pure.css?v=1.1"> |
| <link rel="stylesheet" href="/css/base.css?v=1.1"> |
| <link rel="stylesheet" href="/css/header.css?v=1.1"> |
| <link rel="stylesheet" href="/css/footer.css?v=1.1"> |
| <link rel="stylesheet" href="/css/syntax.css?v=1.1"> |
| <link rel="stylesheet" href="/css/docs.css?v=1.1"> |
| |
| <script> |
| (function() { |
| var cx = '000162378814775985090:molvbm0vggm'; |
| var gcse = document.createElement('script'); |
| gcse.type = 'text/javascript'; |
| gcse.async = true; |
| gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') + |
| '//cse.google.com/cse.js?cx=' + cx; |
| var s = document.getElementsByTagName('script')[0]; |
| s.parentNode.insertBefore(gcse, s); |
| })(); |
| </script> |
| |
| |
| </head> |
| <body> |
| <!-- Start page_header include --> |
| <script src="//ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script> |
| |
| <div class="top-navigator"> |
| <div class="container"> |
| <div class="left-cont"> |
| <a class="logo" href="/"><span class="druid-logo"></span></a> |
| </div> |
| <div class="right-cont"> |
| <ul class="links"> |
| <li class=""><a href="/technology">Technology</a></li> |
| <li class=""><a href="/use-cases">Use Cases</a></li> |
| <li class=""><a href="/druid-powered">Powered By</a></li> |
| <li class=""><a href="/docs/latest/design/">Docs</a></li> |
| <li class=""><a href="/community/">Community</a></li> |
| <li class="header-dropdown"> |
| <a>Apache</a> |
| <div class="header-dropdown-menu"> |
| <a href="https://www.apache.org/" target="_blank">Foundation</a> |
| <a href="https://www.apache.org/events/current-event" target="_blank">Events</a> |
| <a href="https://www.apache.org/licenses/" target="_blank">License</a> |
| <a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a> |
| <a href="https://www.apache.org/security/" target="_blank">Security</a> |
| <a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a> |
| </div> |
| </li> |
| <li class=" button-link"><a href="/downloads.html">Download</a></li> |
| </ul> |
| </div> |
| </div> |
| <div class="action-button menu-icon"> |
| <span class="fa fa-bars"></span> MENU |
| </div> |
| <div class="action-button menu-icon-close"> |
| <span class="fa fa-times"></span> MENU |
| </div> |
| </div> |
| |
| <script type="text/javascript"> |
| var $menu = $('.right-cont'); |
| var $menuIcon = $('.menu-icon'); |
| var $menuIconClose = $('.menu-icon-close'); |
| |
| function showMenu() { |
| $menu.fadeIn(100); |
| $menuIcon.fadeOut(100); |
| $menuIconClose.fadeIn(100); |
| } |
| |
| $menuIcon.click(showMenu); |
| |
| function hideMenu() { |
| $menu.fadeOut(100); |
| $menuIconClose.fadeOut(100); |
| $menuIcon.fadeIn(100); |
| } |
| |
| $menuIconClose.click(hideMenu); |
| |
| $(window).resize(function() { |
| if ($(window).width() >= 840) { |
| $menu.fadeIn(100); |
| $menuIcon.fadeOut(100); |
| $menuIconClose.fadeOut(100); |
| } |
| else { |
| $menu.fadeOut(100); |
| $menuIcon.fadeIn(100); |
| $menuIconClose.fadeOut(100); |
| } |
| }); |
| </script> |
| |
| <!-- Stop page_header include --> |
| |
| |
| <link rel="stylesheet" href="/css/blogs.css"> |
| |
| <div class="blog druid-header"> |
| <div class="row"> |
| <div class="col-md-8 col-md-offset-2"> |
| <div class="title-image-wrap"> |
| |
| <div class="title-spacer"></div> |
| <img class="title-image" src="http://metamarkets.com/wp-content/uploads/2014/02/sequoia-600x400.jpg" alt="How We Scaled HyperLogLog: Three Real-World Optimizations"/> |
| |
| </div> |
| </div> |
| </div> |
| </div> |
| |
| <div class="container blog"> |
| <div class="row"> |
| <div class="col-md-8 col-md-offset-2"> |
| <div class="blog-entry"> |
| <h1>How We Scaled HyperLogLog: Three Real-World Optimizations</h1> |
| <p class="text-muted">by <span class="author text-uppercase">NELSON RAY AND FANGJIN YANG</span> · February 18, 2014</p> |
| |
| <p>At Metamarkets, we specialize in converting mountains of programmatic ad data |
| into real-time, explorable views. Because these datasets are so large and |
| complex, we’re always looking for ways to maximize the speed and efficiency of |
| how we deliver them to our clients. In this post, we’re going to continue our |
| discussion of some of the techniques we use to calculate critical metrics such |
| as unique users and device IDs with maximum performance and accuracy.</p> |
| |
| <p>Approximation algorithms are rapidly gaining traction as the preferred way to |
| determine the unique number of elements in high cardinality sets. In the space |
| of cardinality estimation algorithms, HyperLogLog has quickly emerged as the |
| de-facto standard. Widely discussed by <a href="http://research.google.com/pubs/pub40671.html">technology companies</a> and |
| <a href="http://highscalability.com/blog/2012/4/5/big-data-counting-how-to-count-a-billion-distinct-objects-us.html">popular blogs</a>, HyperLogLog trades |
| accuracy in data and query results for massive reductions in data storage and |
| vastly improved <a href="http://strataconf.com/stratany2013/public/schedule/detail/30045">system performance</a>.</p> |
| |
| <p>In our <a href="http://metamarkets.com/2012/fast-cheap-and-98-right-cardinality-estimation-for-big-data">previous</a> investigation of HyperLogLog, we briefly |
| discussed our motivations for using approximate algorithms and how we leveraged |
| HyperLogLog in <a href="http://druid.io/">Druid</a>, Metamarkets’ open source, distributed data |
| store. Since implementing and deploying HyperLogLog last year, we’ve made |
| several optimizations to further improve performance and reduce storage cost. |
| This blog post will share some of those optimizations. This blog post assumes |
| that you are already familiar with how HyperLogLog works. If you are not |
| familiar with the algorithm, there are plenty of resources <a href="http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf">online</a>.</p> |
| |
| <h2 id="compacting-registers">Compacting Registers</h2> |
| |
| <p>In our initial implementation of HLL, we allocated 8 bits of memory for each |
| register. Recall that each value stored in a register indicates the position of |
| the first ‘1’ bit of a hashed input. Given that 2^255 ~== 10^76, a single 8 bit |
| register could approximate (not well, though) a cardinality close to the number |
| of atoms in the entire <a href="http://www.universetoday.com/36302/atoms-in-the-universe/">observable universe</a>. Martin |
| Traverso, et. al. of <a href="https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920">Facebook’s Presto</a> , realized that this was a bit |
| wasteful and proposed an optimization, exploiting the fact that the registers |
| increment in near lockstep.</p> |
| |
| <p>Given that each register is initially initialized with value 0, with 0 uniques, |
| there is no change in any of the registers. Let’s say we have 8 registers. Then |
| with 8 * 2^10 uniques, each register will have values ~ 10. Of course, there |
| will be some variance, which can be calculated exactly if one were so inclined, |
| given that the distribution in each register is an independent maximum of |
| <a href="http://en.wikipedia.org/wiki/Negative_binomial_distribution">Negative Binomial</a> (1, .5) draws.</p> |
| |
| <p>With 4 bit registers, each register can only approximate up to 2^15 = 32,768 |
| uniques. In fact, the reality is worse because the higher numbers cannot be |
| represented and are lost, impacting accuracy. Even with 2,048 registers, we |
| can’t do much better than ~60M, which is one or two orders of magnitude lower |
| than what we need.</p> |
| |
| <p>Since the register values tend to increase together, the FB folks decided to |
| introduce an offset counter and only store positive differences from it in the |
| registers. That is, if we have register values of 8, 7, and 9, this corresponds |
| to having an offset of 7 and using register difference values of 1, 0, and 2. |
| Given the smallish spread that we expect to see, we typically won’t observe a |
| difference of more than 15 among register values. So we feel comfortable using |
| 2,048 4 bit registers with an 8 bit offset, for 1025 bytes of storage < 2048 |
| bytes (no offset and 8 bit registers).</p> |
| |
| <p>In fact, others have commented on the concentrated distribution of the register |
| values as well. In her <a href="http://algo.inria.fr/durand/Articles/these.ps">thesis</a>, Marianne Durand suggested using |
| a variable bit prefix encoding. Researchers at <a href="http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/40671.pdf">Google</a> have had |
| success with difference encodings and variable length encodings.</p> |
| |
| <h3 id="problem">Problem</h3> |
| |
| <p>This optimization has served us well, with no appreciable loss in accuracy when |
| streaming many uniques into a single HLL object, because the offset increments |
| when all the registers get hit. Similarly, we can combine many HLL objects of |
| moderate size together and watch the offsets increase. However, a curious |
| phenomenon occurs when we try to combine many “small” HLL objects together.</p> |
| |
| <p>Suppose each HLL object stores a single unique value. Then its offset will be |
| 0, one register will have a value between 1 and 15, and the remaining registers |
| will be 0. No matter how many of these we combine together, our aggregate HLL |
| object will never be able to exceed a value of 15 in each register with a 0 |
| offset, which is equivalent to an offset of 15 with 0’s in each register. Using |
| 2,048 registers, this means we won’t be able to produce estimates greater than |
| ~ .7 * 2048^2 * 1 / (2048 / 2^15) ~ 47M. (<a href="http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf"><em>Flajolet, et al. 2007</em></a>)</p> |
| |
| <p>Not good, because this means our estimates are capped at 10^7 instead of 10^80, |
| irrespective of the number of true uniques. And this isn’t just some |
| pathological edge case. Its untimely appearance in production a while ago was |
| no fun trying to fix.</p> |
| |
| <h3 id="floating-max">Floating Max</h3> |
| |
| <p>The root problem in the above scenario is that the high values (> 15) are |
| being clipped, with no hope of making it into a “small” HLL object, since the |
| offset is 0. Although they are rare, many cumulative misses can have a |
| noticeably large effect. Our solution involves storing one additional pair, a |
| “floating max” bucket with higher resolution. Previously, a value of 20 in |
| bucket 94 would be clipped to 15. Now, we store (20, 94) as the floating max, |
| requiring at most an additional 2 bytes, bringing our total up to 1027 bytes. |
| With enough small HLL objects so that each position is covered by a floating |
| max, the combined HLL object can exceed the previous limit of 15 in each |
| position. It also turns out that just one floating max is sufficient to largely |
| fix the problem.</p> |
| |
| <p>Let’s take a look at one measure of the accuracy of our approximations. We |
| simulate 1,000 runs of streaming 1B uniques into an HLL object and look at the |
| proportion of cases in which we observed clipping with the offset approximation |
| (black) and the addition of the floating max (red). So for 1e9 uniques, the max |
| reduced clipping from 95%+ to ~15%. That is, in 85% of cases, the much smaller |
| HLL objects with the floating max agreed with HLL versus less than 5% without |
| the floating max.</p> |
| |
| <p><img src="http://metamarkets.com/wp-content/uploads/2014/02/FJblogpost-600x560.png" alt="Clipping on Cardinality" title="Clipping on Cardinality"></p> |
| |
| <p>For the cost of only 2 bytes, the floating max register allowed us to union |
| millions of HLL objects with minimal measurable loss in accuracy.</p> |
| |
| <h2 id="sparse-and-dense-storage">Sparse and Dense Storage</h2> |
| |
| <p>We first discussed the concept of representing HLL buckets in either a sparse |
| or dense format in our <a href="http://metamarkets.com/2012/fast-cheap-and-98-right-cardinality-estimation-for-big-data">first blog post</a>. Since that time, |
| Google has also written a <a href="http://research.google.com/pubs/pub40671.html">great paper</a> on the matter. Data undergoes |
| a <a href="/blog/2011/05/20/druid-part-deux.html">summarization process</a> when it is ingested in Druid. It is |
| unnecessarily expensive to store raw event data and instead, Druid rolls |
| ingested data up to some time granularity.</p> |
| |
| <p><img src="https://lh6.googleusercontent.com/O2YefUQdRdmCTXzh6xdxthD0VJY0Vq96DTXkhhPVAL_JXaJ1JuAWfFaxZDSmf9NDZgrmHS61RMFLqivacqsOw7evy1Ff73KNb1MdjoLchpCwc-YE8d9eCLiAAA" alt=""></p> |
| |
| <p>In practice, we see tremendous reductions in data volume by summarizing our |
| <a href="http://strataconf.com/stratany2013/public/schedule/detail/30045">data</a>. For a given summarized row, we can maintain HLL objects |
| where each object represents the estimated number of unique elements for a |
| column of that row.</p> |
| |
| <p>When the summarization granularity is sufficiently small, only a limited number |
| of unique elements may be seen for a dimension. In this case, a given HLL |
| object may have registers that contain no values. The HLL registers are thus |
| ‘sparsely’ populated.</p> |
| |
| <p>Our normal storage representation of HLL stores 2 register values per byte. In |
| the sparse representation, we instead store the explicit indexes of buckets |
| that have valid values in them as (index, value) pairs. When the sparse |
| representation exceeds the size of the normal or ‘dense’ representation (1027 |
| bytes), we can switch to using only the dense representation. Our actual |
| implementation uses a heuristic to determine when this switch occurs, but the |
| idea is the same. In practice, many dimensions in real world data sets are of |
| low cardinality, and this optimization can greatly reduce storage versus only |
| storing the dense representation.</p> |
| |
| <h2 id="faster-lookups">Faster Lookups</h2> |
| |
| <p>One of the simpler optimizations that we implemented for faster cardinality |
| calculations was to use lookups for register values. Instead of computing the |
| actual register value by summing the register offset with the stored register |
| value, we instead perform a lookup into a precalculated map. Similarly, to |
| determine the number of zeros in a register value, we created a secondary |
| lookup table. Given the number of registers we have, the cost of storing these |
| lookup tables is near trivial. This problem is often known as the <a href="http://en.wikipedia.org/wiki/Hamming_weight">Hamming |
| Weight problem</a>.</p> |
| |
| <h2 id="lessons">Lessons</h2> |
| |
| <p>Many of our optimizations came out of necessity, both to provide the |
| interactive query latencies that Druid users have come to expect, and to keep |
| our storage costs reasonable. If you have any further improvements to our |
| optimizations, please share them with us! We strongly believe that as data sets |
| get increasingly larger, estimation algorithms are key to keeping query times |
| acceptable. The approximate algorithm space remains relatively new, but it is |
| something we can build together.</p> |
| |
| <p>For more information on Druid, please visit <a href="http://druid.io/">druid.io</a> and follow |
| <a href="https://twitter.com/druidio">@druidio</a>. We’d also like to thank Eric Tschetter and Xavier Léauté |
| for their contributions to this work. Featured image courtesy of <a href="http://donasdays.blogspot.com/2012/10/are-you-sprinter-or-long-distance-runner.html">Donna L |
| Martin</a>.</p> |
| |
| </div> |
| </div> |
| </div> |
| </div> |
| |
| |
| <!-- Start page_footer include --> |
| <footer class="druid-footer"> |
| <div class="container"> |
| <div class="text-center"> |
| <p> |
| <a href="/technology">Technology</a> ·  |
| <a href="/use-cases">Use Cases</a> ·  |
| <a href="/druid-powered">Powered by Druid</a> ·  |
| <a href="/docs/latest/">Docs</a> ·  |
| <a href="/community/">Community</a> ·  |
| <a href="/downloads.html">Download</a> ·  |
| <a href="/faq">FAQ</a> |
| </p> |
| </div> |
| <div class="text-center"> |
| <a title="Join the user group" href="https://groups.google.com/forum/#!forum/druid-user" target="_blank"><span class="fa fa-comments"></span></a> ·  |
| <a title="Follow Druid" href="https://twitter.com/druidio" target="_blank"><span class="fab fa-twitter"></span></a> ·  |
| <a title="GitHub" href="https://github.com/apache/druid" target="_blank"><span class="fab fa-github"></span></a> |
| </div> |
| <div class="text-center license"> |
| Copyright © 2020 <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br> |
| Except where otherwise noted, licensed under <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a>.<br> |
| Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries. |
| </div> |
| </div> |
| </footer> |
| |
| <script async src="https://www.googletagmanager.com/gtag/js?id=UA-131010415-1"></script> |
| <script> |
| window.dataLayer = window.dataLayer || []; |
| function gtag(){dataLayer.push(arguments);} |
| gtag('js', new Date()); |
| gtag('config', 'UA-131010415-1'); |
| </script> |
| <script> |
| function trackDownload(type, url) { |
| ga('send', 'event', 'download', type, url); |
| } |
| </script> |
| <script src="//code.jquery.com/jquery.min.js"></script> |
| <script src="//maxcdn.bootstrapcdn.com/bootstrap/3.2.0/js/bootstrap.min.js"></script> |
| <script src="/assets/js/druid.js"></script> |
| <!-- stop page_footer include --> |
| |
| |
| </body> |
| </html> |