<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="Apache Druid">
<meta name="keywords" content="druid,kafka,database,analytics,streaming,real-time,real time,apache,open source">
<meta name="author" content="Apache Software Foundation">
<title>Druid | Introducing Druid: Real-Time Analytics at a Billion Rows Per Second</title>
<link rel="alternate" type="application/atom+xml" href="/feed">
<link rel="shortcut icon" href="/img/favicon.png">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.7.2/css/all.css" integrity="sha384-fnmOCqbTlWIlj8LyTjo7mOUStjsKC4pOpQbqyi7RrhN7udi9RwhKkMHpvLbHG9Sr" crossorigin="anonymous">
<link href='//fonts.googleapis.com/css?family=Open+Sans+Condensed:300,700,300italic|Open+Sans:300italic,400italic,600italic,400,300,600,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="/css/bootstrap-pure.css?v=1.1">
<link rel="stylesheet" href="/css/base.css?v=1.1">
<link rel="stylesheet" href="/css/header.css?v=1.1">
<link rel="stylesheet" href="/css/footer.css?v=1.1">
<link rel="stylesheet" href="/css/syntax.css?v=1.1">
<link rel="stylesheet" href="/css/docs.css?v=1.1">
<script>
(function() {
var cx = '000162378814775985090:molvbm0vggm';
var gcse = document.createElement('script');
gcse.type = 'text/javascript';
gcse.async = true;
gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') +
'//cse.google.com/cse.js?cx=' + cx;
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(gcse, s);
})();
</script>
</head>
<body>
<!-- Start page_header include -->
<script src="//ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script>
<div class="top-navigator">
<div class="container">
<div class="left-cont">
<a class="logo" href="/"><span class="druid-logo"></span></a>
</div>
<div class="right-cont">
<ul class="links">
<li class=""><a href="/technology">Technology</a></li>
<li class=""><a href="/use-cases">Use Cases</a></li>
<li class=""><a href="/druid-powered">Powered By</a></li>
<li class=""><a href="/docs/latest/design/">Docs</a></li>
<li class=""><a href="/community/">Community</a></li>
<li class="header-dropdown">
<a>Apache</a>
<div class="header-dropdown-menu">
<a href="https://www.apache.org/" target="_blank">Foundation</a>
<a href="https://www.apache.org/events/current-event" target="_blank">Events</a>
<a href="https://www.apache.org/licenses/" target="_blank">License</a>
<a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a>
<a href="https://www.apache.org/security/" target="_blank">Security</a>
<a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a>
</div>
</li>
<li class=" button-link"><a href="/downloads.html">Download</a></li>
</ul>
</div>
</div>
<div class="action-button menu-icon">
<span class="fa fa-bars"></span> MENU
</div>
<div class="action-button menu-icon-close">
<span class="fa fa-times"></span> MENU
</div>
</div>
<script type="text/javascript">
var $menu = $('.right-cont');
var $menuIcon = $('.menu-icon');
var $menuIconClose = $('.menu-icon-close');
function showMenu() {
$menu.fadeIn(100);
$menuIcon.fadeOut(100);
$menuIconClose.fadeIn(100);
}
$menuIcon.click(showMenu);
function hideMenu() {
$menu.fadeOut(100);
$menuIconClose.fadeOut(100);
$menuIcon.fadeIn(100);
}
$menuIconClose.click(hideMenu);
$(window).resize(function() {
if ($(window).width() >= 840) {
$menu.fadeIn(100);
$menuIcon.fadeOut(100);
$menuIconClose.fadeOut(100);
}
else {
$menu.fadeOut(100);
$menuIcon.fadeIn(100);
$menuIconClose.fadeOut(100);
}
});
</script>
<!-- Stop page_header include -->
<link rel="stylesheet" href="/css/blogs.css">
<div class="blog druid-header">
<div class="row">
<div class="col-md-8 col-md-offset-2">
<div class="title-image-wrap">
<div class="title-spacer"></div>
<img class="title-image" src="http://metamarkets.com/wp-content/uploads/2011/04/fastcar-sized-470x288.jpg" alt="Introducing Druid: Real-Time Analytics at a Billion Rows Per Second"/>
</div>
</div>
</div>
</div>
<div class="container blog">
<div class="row">
<div class="col-md-8 col-md-offset-2">
<div class="blog-entry">
<h1>Introducing Druid: Real-Time Analytics at a Billion Rows Per Second</h1>
<p class="text-muted">by <span class="author text-uppercase">Eric Tschetter</span> · April 30, 2011</p>
<p>Here at Metamarkets we have developed a web-based analytics console that
supports drill-downs and roll-ups of high dimensional data sets – comprising
billions of events – in real-time. This is the first of two blog posts
introducing Druid, the data store that powers our console. Over the last twelve
months, we tried and failed to achieve scale and speed with relational databases
(Greenplum, InfoBright, MySQL) and NoSQL offerings (HBase). So instead we did
something crazy: we rolled our own database. Druid is the distributed, in-memory
OLAP data store that resulted.</p>
<p><strong>The Challenge: Fast Roll-Ups Over Big Data</strong></p>
<p>To frame our discussion, let’s begin with an illustration of what our raw impression event logs look
like, containing many dimensions and two metrics (click and price).</p>
<div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>timestamp publisher advertiser gender country dimensions click price
2011-01-01T01:01:35Z bieberfever.com google.com Male USA 0 0.65
2011-01-01T01:03:53Z bieberfever.com google.com Male USA 0 0.62
2011-01-01T01:04:51Z bieberfever.com google.com Male USA 1 0.45
...
2011-01-01T01:00:00Z ultratrimfast.com google.com Female UK 0 0.87
2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 0 0.99
2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 1 1.53
...
</code></pre></div>
<p>We call this our <em>alpha</em> data set. We perform a first-level aggregation operation over a selected set of
dimensions, equivalent to (in pseudocode):</p>
<div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>GROUP BY timestamp, publisher, advertiser, gender, country
:: impressions = COUNT(1), clicks = SUM(click), revenue = SUM(price)
</code></pre></div>
<p>to yield a compacted version:</p>
<div class="highlight"><pre><code class="language-text" data-lang="text"><span></span> timestamp publisher advertiser gender country impressions clicks revenue
2011-01-01T01:00:00Z ultratrimfast.com google.com Male USA 1800 25 15.70
2011-01-01T01:00:00Z bieberfever.com google.com Male USA 2912 42 29.18
2011-01-01T02:00:00Z ultratrimfast.com google.com Male UK 1953 17 17.31
2011-01-01T02:00:00Z bieberfever.com google.com Male UK 3194 170 34.01
</code></pre></div>
<p>This is our <em>beta</em> data set, restricted to five selected dimensions and compacted. In the limit, as we group
by all available dimensions, the size of this aggregated beta converges to the original <em>alpha</em>. In practice,
it is dramatically smaller (often by a factor of 100). Our <em>beta</em> data comprises three distinct parts:</p>
<blockquote>
<p><strong>Timestamp column</strong>: We treat timestamp separately because all of our queries
center around the time axis. Timestamps are faceted by varying granularities
(hourly, in the example above).</p>
<p><strong>Dimension columns</strong>: Here we have four dimensions of publisher, advertiser,
gender, and country. They each represent an axis of the data that we’ve chosen
to slice across.</p>
<p><strong>Metric columns</strong>: These are impressions, clicks and revenue. These represent
values, usually numeric, which are derived from an aggregation operation – such
as count, sum, and mean (we also run variance and higher moment calculations).
For example, in the first row, the revenue metric of 15.70 is the sum of 1800
event-level prices.</p>
</blockquote>
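<p>To make the roll-up concrete, here is a minimal Python sketch of the same
first-level aggregation. It mirrors the pseudocode above; the sample events and
the hourly truncation are purely illustrative, not Druid code.</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"><span></span>from collections import defaultdict

# Illustrative alpha events: (timestamp, publisher, advertiser, gender, country, click, price)
events = [
    ("2011-01-01T01:01:35Z", "bieberfever.com", "google.com", "Male", "USA", 0, 0.65),
    ("2011-01-01T01:03:53Z", "bieberfever.com", "google.com", "Male", "USA", 0, 0.62),
    ("2011-01-01T01:04:51Z", "bieberfever.com", "google.com", "Male", "USA", 1, 0.45),
]

def hour_bucket(ts):
    # Facet the timestamp at hourly granularity: ...T01:04:51Z becomes ...T01:00:00Z
    return ts[:13] + ":00:00Z"

# GROUP BY hour, publisher, advertiser, gender, country
# :: impressions = COUNT(1), clicks = SUM(click), revenue = SUM(price)
rollup = defaultdict(lambda: [0, 0, 0.0])
for ts, pub, adv, gender, country, click, price in events:
    agg = rollup[(hour_bucket(ts), pub, adv, gender, country)]
    agg[0] += 1        # impressions
    agg[1] += click    # clicks
    agg[2] += price    # revenue

for key, (impressions, clicks, revenue) in sorted(rollup.items()):
    print(*key, impressions, clicks, round(revenue, 2))
</code></pre></div>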
<p>Our goal is to rapidly compute drill-downs and roll-ups over this data set. We
want to answer questions like “How many impressions from males were on
bieberfever.com?” and “What is the average cost to advertise to women at
ultratrimfast.com?” But we have a hard requirement to meet: we want queries
over any arbitrary combination of dimensions at sub-second latencies.</p>
<p>Performance of such a system is dependent on the size of our beta set, and
there are two ways that this becomes large: (i) when we include additional
dimensions, and (ii) when we include a dimension whose cardinality is large.
Using our example, for every hour’s worth of data we calculate the maximum
number of rows as:</p>
<div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>number_of_publishers * number_of_advertisers * number_of_genders * number of countries
</code></pre></div>
<p>If we have 10 publishers, 50 advertisers, 2 genders, and 120 countries, that
would yield a maximum of 120,000 rows. If there had been 1,000,000 possible
publishers, it would become a maximum of 12 billion rows. And if we add 10 more
dimensions of cardinality 10 to the original example, the maximum becomes 1.2
quadrillion (1.2 x 10^15) rows.</p>
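<p>The arithmetic is easy to sanity-check; a quick sketch using the same assumed
cardinalities:</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"><span></span>from math import prod

# Worst-case rows per hour = product of the dimension cardinalities.
base = [10, 50, 2, 120]                # publishers, advertisers, genders, countries
print(prod(base))                      # 120000
print(prod([1_000_000, 50, 2, 120]))   # 12000000000 -- 12 billion with 1M publishers
print(prod(base + [10] * 10))          # 1200000000000000 -- 1.2 x 10^15 with 10 extra 10-value dimensions
</code></pre></div>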
<p>Luckily for us, these data sets are generally sparse, as dimension values are
not conditionally independent (few Kazakhstanis visit bieberfever.com, for
example). Thus the combinatorial explosion is far less than the theoretical
worst-case. Nonetheless, as a rule, more dimensions and more cardinality
dramatically inflate the size of the data set.</p>
<p><strong>Failed Solution I: Dynamic Roll-Ups with a RDBMS</strong></p>
<p>Our stated goals of aggregation and drill-down are well suited to a classical
relational architecture. So about a year ago, we fired up an RDBMS instance
(actually, the Greenplum Community Edition, running on an m1.large EC2 box),
and began loading our data into it. It worked and we were able to build the
initial version of our console on this system. However, we had two problems:</p>
<ol>
<li><p>We stored the data in a star schema, which meant that there was operational
overhead maintaining dimension and fact tables.</p></li>
<li><p>Whenever we needed to do a full table scan, for things like global counts,
the queries were slow: naive benchmarks showed that scanning 33 million rows
took 3 seconds.</p></li>
</ol>
<p>We initially just decided to eat the operational overhead of (1) because that’s
how these systems work and we benefited from having the database to do our
storage and computation. But, (2) was painful. We started materializing all
dimensional roll-ups of a certain depth, and began routing queries to these
pre-aggregated tables. We also implemented a caching layer in front of our
queries.</p>
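<p>The routing layer can be sketched as a simple superset check: a query can be
served by any materialized roll-up whose grouping dimensions cover the query's
dimensions. Everything below (the registry, the table names) is hypothetical,
just to show the idea:</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"><span></span># Hypothetical registry of materialized roll-ups, keyed by the set of
# dimensions each one GROUPs BY.
PREAGG_TABLES = {
    frozenset({"publisher", "gender"}): "agg_pub_gender",
    frozenset({"publisher", "advertiser", "country"}): "agg_pub_adv_country",
}

def route(query_dims):
    # A roll-up can serve the query if it groups by a superset of the
    # query's dimensions; prefer the smallest (most aggregated) one.
    candidates = [(len(dims), table)
                  for dims, table in PREAGG_TABLES.items()
                  if dims.issuperset(query_dims)]
    if candidates:
        return min(candidates)[1]
    return "fact_table_full_scan"   # no covering roll-up: back to the slow path

print(route({"publisher"}))          # agg_pub_gender
print(route({"gender", "country"}))  # fact_table_full_scan
</code></pre></div>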
<p>This approach generally worked and is, I believe, a fairly common strategy in
the space. Except, when things weren’t in the cache and a query couldn’t be
mapped to a pre-aggregated table, we were back to full scans and slow
performance. We tried indexing our way out of it, but given that we are
allowing arbitrary combinations of dimensions, we couldn’t really take
advantage of composite indexes. Additionally, index merge strategies are not
always implemented, or are implemented only for bitmap indexes, depending on
the flavor of RDBMS.</p>
<p>We also benchmarked plain Postgres, MySQL, and InfoBright, but did not observe
dramatically better performance. Seeing no path ahead for our relational
database, we turned to one of those new-fangled, massively scalable NoSQL
solutions.</p>
<p><strong>Failed Solution II: Pre-compute the World in NoSQL</strong></p>
<p>We used a data storage schema very similar to Twitter’s
<a href="http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011">Rainbird</a>.</p>
<p>In short, we took all of our data and pre-computed aggregates for every
combination of dimensions. At query time we need only locate the specific
pre-computed aggregate and return it: an O(1) key-value lookup. This made
things fast and worked wonderfully when we had a six-dimension beta data set.
But when we added five more dimensions – giving us 11 dimensions total – the
time to pre-compute all aggregates became unmanageably large (we never
waited out the more than 24 hours it required to finish).</p>
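<p>The pre-computation step is essentially a powerset expansion, which is where
the exponential cost comes from. A minimal sketch of the combinatorics (the
dimension counts are from our data sets; the depth limit anticipates the
mitigation described next):</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"><span></span>from math import comb

def aggregate_families(n_dims, max_depth=None):
    # Every subset of dimensions (up to max_depth, if given) gets its own
    # family of pre-computed aggregates.
    depth = n_dims if max_depth is None else max_depth
    return sum(comb(n_dims, k) for k in range(depth + 1))

print(aggregate_families(6))      # 64   -- 2^6, fine
print(aggregate_families(11))     # 2048 -- 2^11, already painful
print(aggregate_families(14, 5))  # 3473 -- depth-limited to 5 dimensions
</code></pre></div>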
<p>So we decided to limit the depth that we aggregated to. By pre-computing only
aggregates of five dimensions or fewer, we were able to limit some of the
exponential expansion of the data. The data became manageable again, meaning it
only took about 4 hours on 15 machines to compute the expansion of 500k beta
rows into the full multi-billion-entry output data set.</p>
<p>Then we added three more dimensions, bringing us up to 14. This turned into 9
hours on 25 machines. We realized that we were <a href="http://knowyourmeme.com/memes/youre-doing-it-wrong">doing it
wrong</a>.</p>
<p>Lesson learned: massively scalable counter systems like Rainbird are intended
for high-cardinality data sets with pre-defined hierarchical drill-downs. But
they break down when supporting arbitrary drill-downs across all dimensions.</p>
<p><strong>Introducing Druid: A Distributed, In-Memory OLAP Store</strong></p>
<p>Stepping back from our two failures, let’s examine why these systems failed to
scale for our needs:</p>
<ol>
<li>Relational Database Architectures
<ul>
<li>Full table scans were slow, regardless of the storage engine used</li>
<li>Maintaining proper dimension tables, indexes and aggregate tables was painful</li>
<li>Parallelization of queries was not always supported, or was non-trivial to achieve</li>
</ul></li>
<li>Massive NoSQL With Pre-Computation
<ul>
<li>Supporting high dimensional OLAP requires pre-computing an exponentially large amount of data</li>
</ul></li>
</ol>
<p>Looking at the problems with these solutions, the first, RDBMS-style
architecture has the simpler issue to tackle: namely, how to scan tables fast.
When we were looking at our 500k-row data set, someone remarked, “Dude, I can
store that in memory.” That was the answer.</p>
<p>Keeping everything in memory provides fast scans, but it introduces a new
problem: machine memory is limited. The corollary was to distribute the data
over multiple machines. Our requirements were thus:</p>
<ul>
<li>Ability to load up, store, and query data sets in memory</li>
<li>Parallelized architecture that allows us to add more machines in order to relieve memory pressure</li>
</ul>
<p>And then we threw in a couple more that seemed like good ideas:</p>
<ul>
<li>Parallelized queries to speed up full scan processing</li>
<li>No dimension tables to manage</li>
</ul>
<p>These are the requirements we used to implement Druid. The system makes a
number of simplifying assumptions that fit our use case (namely that all
analytics are time-based) and integrates access to real-time and historical
data for a configurable amount of time into the past.</p>
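<p>To illustrate the scatter-gather idea behind those requirements, here is a
toy sketch: partition rows across in-memory workers, scan, filter, and
aggregate each partition in parallel, then merge the partial results.
Multiprocessing stands in for real cluster nodes; none of this is Druid's
actual implementation.</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"><span></span>from multiprocessing import Pool

def scan_partition(rows):
    # Each "node" scans, filters, and aggregates only its own in-memory partition.
    impressions = clicks = 0
    revenue = 0.0
    for gender, click, price in rows:
        if gender == "Male":        # example filter: impressions from males
            impressions += 1
            clicks += click
            revenue += price
    return impressions, clicks, revenue

if __name__ == "__main__":
    rows = [("Male", 1, 0.50), ("Female", 0, 0.30)] * 500_000   # 1M toy rows
    partitions = [rows[i::4] for i in range(4)]                 # 4 "nodes"
    with Pool(4) as pool:
        partials = pool.map(scan_partition, partitions)
    # Merge the per-node partial aggregates, as a broker would.
    totals = [sum(col) for col in zip(*partials)]
    print(dict(zip(("impressions", "clicks", "revenue"), totals)))
</code></pre></div>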
<p>The <a href="http://metamarketsgroup.com/blog/druid-part-deux-three-principles-for-fast-distributed-olap/">next
installment</a>
will go into the architecture of Druid, how queries work and how the system can
scale out to handle query hotspots and high cardinality data sets. For now, we
leave you with a benchmark:</p>
<ul>
<li>Our 40-instance (m2.2xlarge) cluster can scan, filter, and aggregate 1 billion rows in 950 milliseconds.</li>
</ul>
<p><a href="http://metamarkets.com/2011/druid-part-deux-three-principles-for-fast-distributed-olap/">CONTINUE TO PART II…</a></p>
</div>
</div>
</div>
</div>
<!-- Start page_footer include -->
<footer class="druid-footer">
<div class="container">
<div class="text-center">
<p>
<a href="/technology">Technology</a>&ensp;·&ensp;
<a href="/use-cases">Use Cases</a>&ensp;·&ensp;
<a href="/druid-powered">Powered by Druid</a>&ensp;·&ensp;
<a href="/docs/latest">Docs</a>&ensp;·&ensp;
<a href="/community/">Community</a>&ensp;·&ensp;
<a href="/downloads.html">Download</a>&ensp;·&ensp;
<a href="/faq">FAQ</a>
</p>
</div>
<div class="text-center">
<a title="Join the user group" href="https://groups.google.com/forum/#!forum/druid-user" target="_blank"><span class="fa fa-comments"></span></a>&ensp;·&ensp;
<a title="Follow Druid" href="https://twitter.com/druidio" target="_blank"><span class="fab fa-twitter"></span></a>&ensp;·&ensp;
<a title="Download via Apache" href="https://www.apache.org/dyn/closer.cgi?path=/incubator/druid/0.16.1-incubating/apache-druid-0.16.1-incubating-bin.tar.gz" target="_blank"><span class="fas fa-feather"></span></a>&ensp;·&ensp;
<a title="GitHub" href="https://github.com/apache/incubator-druid" target="_blank"><span class="fab fa-github"></span></a>
</div>
<div class="text-center license">
Copyright © 2019 <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br>
Except where otherwise noted, licensed under <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a>.<br>
Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
</div>
</div>
</footer>
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-131010415-1"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-131010415-1');
</script>
<script>
function trackDownload(type, url) {
ga('send', 'event', 'download', type, url);
}
</script>
<script src="//code.jquery.com/jquery.min.js"></script>
<script src="//maxcdn.bootstrapcdn.com/bootstrap/3.2.0/js/bootstrap.min.js"></script>
<script src="/assets/js/druid.js"></script>
<!-- stop page_footer include -->
</body>
</html>