<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="Apache Druid">
<meta name="keywords" content="druid,kafka,database,analytics,streaming,real-time,real time,apache,open source">
<meta name="author" content="Apache Software Foundation">
<title>Druid | How We Scaled HyperLogLog: Three Real-World Optimizations</title>
<link rel="alternate" type="application/atom+xml" href="/feed">
<link rel="shortcut icon" href="/img/favicon.png">
<link rel="stylesheet" href="/assets/css/font-awesome-5.css">
<link href='//fonts.googleapis.com/css?family=Open+Sans+Condensed:300,700,300italic|Open+Sans:300italic,400italic,600italic,400,300,600,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="/css/bootstrap-pure.css?v=1.1">
<link rel="stylesheet" href="/css/base.css?v=1.1">
<link rel="stylesheet" href="/css/header.css?v=1.1">
<link rel="stylesheet" href="/css/footer.css?v=1.1">
<link rel="stylesheet" href="/css/syntax.css?v=1.1">
<link rel="stylesheet" href="/css/docs.css?v=1.1">
</head>
<body>
<link rel="stylesheet" href="/css/blogs.css">
<div class="blog druid-header">
<div class="row">
<div class="col-md-8 col-md-offset-2">
<div class="title-image-wrap">
<div class="title-spacer"></div>
<img class="title-image" src="http://metamarkets.com/wp-content/uploads/2014/02/sequoia-600x400.jpg" alt="How We Scaled HyperLogLog: Three Real-World Optimizations"/>
</div>
</div>
</div>
</div>
<div class="container blog">
<div class="row">
<div class="col-md-8 col-md-offset-2">
<div class="blog-entry">
<h1>How We Scaled HyperLogLog: Three Real-World Optimizations</h1>
<p class="text-muted">by <span class="author text-uppercase">NELSON RAY AND FANGJIN YANG</span> · February 18, 2014</p>
<p>At Metamarkets, we specialize in converting mountains of programmatic ad data
into real-time, explorable views. Because these datasets are so large and
complex, we’re always looking for ways to maximize the speed and efficiency of
how we deliver them to our clients.  In this post, we’re going to continue our
discussion of some of the techniques we use to calculate critical metrics such
as unique users and device IDs with maximum performance and accuracy.</p>
<p>Approximation algorithms are rapidly gaining traction as the preferred way to
determine the unique number of elements in high cardinality sets. In the space
of cardinality estimation algorithms, HyperLogLog has quickly emerged as the
de facto standard. Widely discussed by <a href="http://research.google.com/pubs/pub40671.html">technology companies</a> and
<a href="http://highscalability.com/blog/2012/4/5/big-data-counting-how-to-count-a-billion-distinct-objects-us.html">popular blogs</a>, HyperLogLog trades
accuracy in data and query results for massive reductions in data storage and
vastly improved <a href="http://strataconf.com/stratany2013/public/schedule/detail/30045">system performance</a>.</p>
<p>In our <a href="http://metamarkets.com/2012/fast-cheap-and-98-right-cardinality-estimation-for-big-data">previous</a> investigation of HyperLogLog, we briefly
discussed our motivations for using approximate algorithms and how we leveraged
HyperLogLog in <a href="http://druid.io/">Druid</a>, Metamarkets’ open source, distributed data
store. Since implementing and deploying HyperLogLog last year, we’ve made
several optimizations to further improve performance and reduce storage cost.
This post shares some of those optimizations. It assumes you are already
familiar with how HyperLogLog works; if you are not, there are plenty of
resources <a href="http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf">online</a>.</p>
<h2 id="compacting-registers">Compacting Registers</h2>
<p>In our initial implementation of HLL, we allocated 8 bits of memory for each
register. Recall that each value stored in a register indicates the position of
the first ‘1’ bit of a hashed input. Given that 2^255 ≈ 10^76, a single 8-bit
register could approximate (not well, though) a cardinality close to the number
of atoms in the entire <a href="http://www.universetoday.com/36302/atoms-in-the-universe/">observable universe</a>. Martin
Traverso, et al., of <a href="https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920">Facebook’s Presto</a>, realized that this was a bit
wasteful and proposed an optimization, exploiting the fact that the registers
increment in near lockstep.</p>
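<p>As a refresher, here is a minimal sketch of a register update, assuming a
64-bit hash and 2,048 8-bit registers (the names are ours for illustration,
not Druid’s actual implementation):</p>
<pre><code>// Minimal sketch of an HLL register update (illustrative, not Druid's code).
public class SimpleHll
{
  static final int INDEX_BITS = 11;                  // 2^11 = 2,048 registers
  static final int NUM_REGISTERS = 1 &lt;&lt; INDEX_BITS;

  final byte[] registers = new byte[NUM_REGISTERS];  // 8 bits per register

  public void add(long hash)
  {
    // The low bits of the hash pick a register...
    int bucket = (int) (hash &amp; (NUM_REGISTERS - 1));
    // ...and the register keeps the max position of the first '1' bit
    // seen in the remaining bits (1-based).
    long rest = hash &gt;&gt;&gt; INDEX_BITS;
    int position = Long.numberOfTrailingZeros(rest) + 1;
    if (position &gt; registers[bucket]) {
      registers[bucket] = (byte) position;
    }
  }
}
</code></pre>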
<p>Each register is initialized to 0, so with 0 uniques there is no change in
any of the registers. Say we have 8 registers. Then with 8 * 2^10 uniques, each
register will hold a value of roughly 10. Of course, there will be some
variance, which can be calculated exactly if one were so inclined, since the
value in each register is the maximum of independent
<a href="http://en.wikipedia.org/wiki/Negative_binomial_distribution">Negative Binomial</a>(1, 0.5) draws.</p>
<p>With 4-bit registers, each register can only approximate up to 2^15 = 32,768
uniques. In fact, the reality is worse because the higher numbers cannot be
represented and are lost, impacting accuracy. Even with 2,048 registers, we
can’t do much better than ~60M, which is one or two orders of magnitude lower
than what we need.</p>
<p>Since the register values tend to increase together, the Facebook team
decided to introduce an offset counter and only store positive differences
from it in the registers. That is, if we have register values of 8, 7, and 9,
this corresponds to having an offset of 7 and register difference values of 1,
0, and 2. Given the smallish spread that we expect to see, we typically won’t
observe a difference of more than 15 among register values. So we feel
comfortable using 2,048 4-bit registers with an 8-bit offset, for 1,025 bytes
of storage instead of the 2,048 bytes needed with no offset and 8-bit
registers.</p>
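<p>Here is a minimal sketch of that layout, assuming 2,048 4-bit registers
packed two per byte plus a single 8-bit offset (the structure is illustrative;
Presto’s and Druid’s actual code differ in the details):</p>
<pre><code>// Offset-compacted registers: 1,024 bytes of packed 4-bit deltas plus a
// 1-byte shared offset = 1,025 bytes total (illustrative sketch).
public class CompactRegisters
{
  static final int NUM_REGISTERS = 2048;

  byte offset = 0;                                    // shared baseline
  final byte[] packed = new byte[NUM_REGISTERS / 2];  // two registers per byte

  int get(int bucket)
  {
    int shift = (bucket % 2) * 4;
    return offset + ((packed[bucket / 2] &gt;&gt; shift) &amp; 0xF);
  }

  void set(int bucket, int value)
  {
    // Store only the positive difference from the offset, clipped to 4 bits.
    int delta = Math.max(0, Math.min(15, value - offset));
    int shift = (bucket % 2) * 4;
    int i = bucket / 2;
    packed[i] = (byte) ((packed[i] &amp; ~(0xF &lt;&lt; shift)) | (delta &lt;&lt; shift));
  }

  // Once every register's delta is positive, the offset can be bumped and
  // all deltas shifted down, reclaiming headroom (omitted for brevity).
}
</code></pre>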
<p>In fact, others have commented on the concentrated distribution of the register
values as well. In her <a href="http://algo.inria.fr/durand/Articles/these.ps">thesis</a>, Marianne Durand suggested using
a variable bit prefix encoding. Researchers at <a href="http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/40671.pdf">Google</a> have had
success with difference encodings and variable length encodings.</p>
<h3 id="problem">Problem</h3>
<p>This optimization has served us well, with no appreciable loss in accuracy when
streaming many uniques into a single HLL object, because the offset increments
when all the registers get hit. Similarly, we can combine many HLL objects of
moderate size together and watch the offsets increase. However, a curious
phenomenon occurs when we try to combine many “small” HLL objects together.</p>
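<p>To see why, consider what a union looks like. Continuing the
CompactRegisters sketch above, merging two objects takes the register-wise max
of the true (offset + delta) values:</p>
<pre><code>// Union of two HLL objects (continuing the illustrative sketch above):
// take the register-wise max of the true values. Note that set() still
// clips each result to this object's offset + 15.
void union(CompactRegisters other)
{
  for (int bucket = 0; bucket &lt; NUM_REGISTERS; bucket++) {
    set(bucket, Math.max(get(bucket), other.get(bucket)));
  }
}
</code></pre>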
<p>Suppose each HLL object stores a single unique value. Then its offset will be
0, one register will have a value between 1 and 15, and the remaining registers
will be 0. No matter how many of these we combine together, our aggregate HLL
object will never be able to exceed a value of 15 in each register with a 0
offset, which is equivalent to an offset of 15 with 0’s in each register. Using
2,048 registers, this means we won’t be able to produce estimates greater than
~0.7 * 2048^2 / (2048 * 2^-15) ≈ 47M (<a href="http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf"><em>Flajolet, et al. 2007</em></a>).</p>
<p>Not good, because this means our estimates are capped at 10^7 instead of 10^80,
irrespective of the number of true uniques. And this isn’t just some
pathological edge case: its untimely appearance in production a while ago was
no fun to fix.</p>
<h3 id="floating-max">Floating Max</h3>
<p>The root problem in the above scenario is that the high values (&gt; 15) are
being clipped, with no hope of making it into a “small” HLL object, since the
offset is 0. Although they are rare, many cumulative misses can have a
noticeably large effect. Our solution involves storing one additional pair, a
“floating max” bucket with higher resolution. Previously, a value of 20 in
bucket 94 would be clipped to 15. Now, we store (20, 94) as the floating max,
requiring at most an additional 2 bytes, bringing our total up to 1027 bytes.
With enough small HLL objects so that each position is covered by a floating
max, the combined HLL object can exceed the previous limit of 15 in each
position. It also turns out that just one floating max is sufficient to largely
fix the problem.</p>
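<p>Continuing the sketch above, the floating max is just one extra
(bucket, value) pair kept at full resolution:</p>
<pre><code>// Floating max (illustrative sketch): one (bucket, value) pair stored at
// full 8-bit resolution, so a value that would be clipped is not lost.
int maxBucket = -1;  // register index of the floating max; -1 if unset
int maxValue = 0;    // unclipped register value

void setWithFloatingMax(int bucket, int value)
{
  if (value - offset &gt; 15) {
    // This value would be clipped; remember it exactly in the floating max.
    if (value &gt; maxValue) {
      maxBucket = bucket;
      maxValue = value;
    }
  }
  set(bucket, value);  // the packed register is still clipped as before
}
</code></pre>
<p>During a union, each object’s floating max pair is fed in like any other
(bucket, value) pair, so values above the clip can survive aggregation.</p>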
<p>Let’s take a look at one measure of the accuracy of our approximations. We
simulated 1,000 runs of streaming 1B uniques into an HLL object and looked at
the proportion of cases in which we observed clipping, with the offset
approximation alone (black) and with the addition of the floating max (red).
At 1e9 uniques, the floating max reduced clipping from over 95% of runs to
roughly 15%. In other words, with the floating max, the much smaller HLL
objects agreed with full 8-bit HLL in 85% of cases, versus fewer than 5% of
cases without it.</p>
<p><img src="http://metamarkets.com/wp-content/uploads/2014/02/FJblogpost-600x560.png" alt="Clipping on Cardinality" title="Clipping on Cardinality"></p>
<p>For the cost of only 2 bytes, the floating max register allowed us to union
millions of HLL objects with minimal measurable loss in accuracy.</p>
<h2 id="sparse-and-dense-storage">Sparse and Dense Storage</h2>
<p>We first discussed the concept of representing HLL buckets in either a sparse
or dense format in our <a href="http://metamarkets.com/2012/fast-cheap-and-98-right-cardinality-estimation-for-big-data">first blog post</a>. Since that time,
Google has also written a <a href="http://research.google.com/pubs/pub40671.html">great paper</a> on the matter. Data undergoes
a <a href="/blog/2011/05/20/druid-part-deux.html">summarization process</a> when it is ingested in Druid. It is
unnecessarily expensive to store raw event data and instead, Druid rolls
ingested data up to some time granularity.</p>
<p><img src="https://lh6.googleusercontent.com/O2YefUQdRdmCTXzh6xdxthD0VJY0Vq96DTXkhhPVAL_JXaJ1JuAWfFaxZDSmf9NDZgrmHS61RMFLqivacqsOw7evy1Ff73KNb1MdjoLchpCwc-YE8d9eCLiAAA" alt=""></p>
<p>In practice, we see tremendous reductions in data volume by summarizing our
<a href="http://strataconf.com/stratany2013/public/schedule/detail/30045">data</a>. For a given summarized row, we can maintain HLL objects
where each object represents the estimated number of unique elements for a
column of that row.</p>
<p>When the summarization granularity is sufficiently small, only a limited number
of unique elements may be seen for a dimension. In this case, a given HLL
object may have registers that contain no values. The HLL registers are thus
‘sparsely’ populated.</p>
<p>Our normal storage representation of HLL stores 2 register values per byte. In
the sparse representation, we instead store the explicit indexes of buckets
that have valid values in them as (index, value) pairs. When the sparse
representation exceeds the size of the normal or ‘dense’ representation (1027
bytes), we can switch to using only the dense representation. Our actual
implementation uses a heuristic to determine when this switch occurs, but the
idea is the same. In practice, many dimensions in real world data sets are of
low cardinality, and this optimization can greatly reduce storage versus only
storing the dense representation.</p>
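<p>A sketch of the switching logic, assuming 3 bytes per sparse (index, value)
pair (the exact encoding and threshold are illustrative):</p>
<pre><code>import java.util.TreeMap;

// Sparse representation (illustrative sketch): explicit (index, value)
// pairs, upgraded to the 1,027-byte dense form once the pairs would
// occupy at least as much space.
public class SparseHll
{
  static final int DENSE_SIZE_BYTES = 1027;
  static final int BYTES_PER_PAIR = 3;  // assumed: 2-byte index + 1-byte value

  final TreeMap&lt;Integer, Integer&gt; pairs = new TreeMap&lt;&gt;();

  void set(int bucket, int value)
  {
    pairs.merge(bucket, value, Math::max);  // keep the max per register
  }

  boolean shouldSwitchToDense()
  {
    return pairs.size() * BYTES_PER_PAIR &gt;= DENSE_SIZE_BYTES;
  }
}
</code></pre>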
<h2 id="faster-lookups">Faster Lookups</h2>
<p>One of the simpler optimizations that we implemented for faster cardinality
calculations was to use lookups for register values. Instead of computing the
actual register value by summing the register offset with the stored register
value, we instead perform a lookup into a precalculated map. Similarly, to
determine the number of zeros in a register value, we created a secondary
lookup table. Given the number of registers we have, the cost of storing these
lookup tables is near trivial. This problem is often known as the <a href="http://en.wikipedia.org/wiki/Hamming_weight">Hamming
Weight problem</a>.</p>
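<p>For instance, the harmonic sum at the heart of the HLL estimate can be
driven entirely by a precomputed table (a sketch under the 4-bit register
layout above; the names are ours):</p>
<pre><code>// Precomputed 2^-i for every possible (offset + register) sum, so the
// estimate avoids per-register arithmetic (illustrative sketch).
static final double[] INV_POW_2 = new double[256 + 16];
static {
  for (int i = 0; i &lt; INV_POW_2.length; i++) {
    INV_POW_2[i] = Math.pow(2.0, -i);
  }
}

static double harmonicSum(byte[] packed, int offset)
{
  double sum = 0.0;
  for (byte b : packed) {
    sum += INV_POW_2[offset + (b &amp; 0xF)];         // low nibble
    sum += INV_POW_2[offset + ((b &gt;&gt; 4) &amp; 0xF)];  // high nibble
  }
  return sum;
}
</code></pre>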
<h2 id="lessons">Lessons</h2>
<p>Many of our optimizations came out of necessity, both to provide the
interactive query latencies that Druid users have come to expect, and to keep
our storage costs reasonable. If you have any further improvements to our
optimizations, please share them with us! We strongly believe that as data sets
grow ever larger, estimation algorithms are key to keeping query times
acceptable. The approximate algorithm space remains relatively new, but it is
something we can build together.</p>
<p>For more information on Druid, please visit <a href="http://druid.io/">druid.io</a> and follow
<a href="https://twitter.com/druidio">@druidio</a>. We’d also like to thank Eric Tschetter and Xavier Léauté
for their contributions to this work. Featured image courtesy of <a href="http://donasdays.blogspot.com/2012/10/are-you-sprinter-or-long-distance-runner.html">Donna L
Martin</a>.</p>
</div>
</div>
</div>
</div>
</body>
</html>