| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="UTF-8" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <meta name="description" content="Apache Druid"> |
| <meta name="keywords" content="druid,kafka,database,analytics,streaming,real-time,real time,apache,open source"> |
| <meta name="author" content="Apache Software Foundation"> |
| |
| <title>Druid | Introducing Druid: Real-Time Analytics at a Billion Rows Per Second</title> |
| |
| <link rel="alternate" type="application/atom+xml" href="/feed"> |
| <link rel="shortcut icon" href="/img/favicon.png"> |
| |
| <link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.7.2/css/all.css" integrity="sha384-fnmOCqbTlWIlj8LyTjo7mOUStjsKC4pOpQbqyi7RrhN7udi9RwhKkMHpvLbHG9Sr" crossorigin="anonymous"> |
| |
| <link href='//fonts.googleapis.com/css?family=Open+Sans+Condensed:300,700,300italic|Open+Sans:300italic,400italic,600italic,400,300,600,700' rel='stylesheet' type='text/css'> |
| |
| <link rel="stylesheet" href="/css/bootstrap-pure.css?v=1.1"> |
| <link rel="stylesheet" href="/css/base.css?v=1.1"> |
| <link rel="stylesheet" href="/css/header.css?v=1.1"> |
| <link rel="stylesheet" href="/css/footer.css?v=1.1"> |
| <link rel="stylesheet" href="/css/syntax.css?v=1.1"> |
| <link rel="stylesheet" href="/css/docs.css?v=1.1"> |
| |
| <script> |
| (function() { |
| var cx = '000162378814775985090:molvbm0vggm'; |
| var gcse = document.createElement('script'); |
| gcse.type = 'text/javascript'; |
| gcse.async = true; |
| gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') + |
| '//cse.google.com/cse.js?cx=' + cx; |
| var s = document.getElementsByTagName('script')[0]; |
| s.parentNode.insertBefore(gcse, s); |
| })(); |
| </script> |
| |
| |
| </head> |
| <body> |
| <!-- Start page_header include --> |
| <script src="//ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script> |
| |
| <div class="top-navigator"> |
| <div class="container"> |
| <div class="left-cont"> |
| <a class="logo" href="/"><span class="druid-logo"></span></a> |
| </div> |
| <div class="right-cont"> |
| <ul class="links"> |
| <li class=""><a href="/technology">Technology</a></li> |
| <li class=""><a href="/use-cases">Use Cases</a></li> |
| <li class=""><a href="/druid-powered">Powered By</a></li> |
| <li class=""><a href="/docs/latest/design/">Docs</a></li> |
| <li class=""><a href="/community/">Community</a></li> |
| <li class="header-dropdown"> |
| <a>Apache</a> |
| <div class="header-dropdown-menu"> |
| <a href="https://www.apache.org/" target="_blank">Foundation</a> |
| <a href="https://www.apache.org/events/current-event" target="_blank">Events</a> |
| <a href="https://www.apache.org/licenses/" target="_blank">License</a> |
| <a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a> |
| <a href="https://www.apache.org/security/" target="_blank">Security</a> |
| <a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a> |
| </div> |
| </li> |
| <li class=" button-link"><a href="/downloads.html">Download</a></li> |
| </ul> |
| </div> |
| </div> |
| <div class="action-button menu-icon"> |
| <span class="fa fa-bars"></span> MENU |
| </div> |
| <div class="action-button menu-icon-close"> |
| <span class="fa fa-times"></span> MENU |
| </div> |
| </div> |
| |
| <script type="text/javascript"> |
| var $menu = $('.right-cont'); |
| var $menuIcon = $('.menu-icon'); |
| var $menuIconClose = $('.menu-icon-close'); |
| |
| function showMenu() { |
| $menu.fadeIn(100); |
| $menuIcon.fadeOut(100); |
| $menuIconClose.fadeIn(100); |
| } |
| |
| $menuIcon.click(showMenu); |
| |
| function hideMenu() { |
| $menu.fadeOut(100); |
| $menuIconClose.fadeOut(100); |
| $menuIcon.fadeIn(100); |
| } |
| |
| $menuIconClose.click(hideMenu); |
| |
| $(window).resize(function() { |
| if ($(window).width() >= 840) { |
| $menu.fadeIn(100); |
| $menuIcon.fadeOut(100); |
| $menuIconClose.fadeOut(100); |
| } |
| else { |
| $menu.fadeOut(100); |
| $menuIcon.fadeIn(100); |
| $menuIconClose.fadeOut(100); |
| } |
| }); |
| </script> |
| |
| <!-- Stop page_header include --> |
| |
| |
| <link rel="stylesheet" href="/css/blogs.css"> |
| |
| <div class="blog druid-header"> |
| <div class="row"> |
| <div class="col-md-8 col-md-offset-2"> |
| <div class="title-image-wrap"> |
| |
| <div class="title-spacer"></div> |
| <img class="title-image" src="http://metamarkets.com/wp-content/uploads/2011/04/fastcar-sized-470x288.jpg" alt="Introducing Druid: Real-Time Analytics at a Billion Rows Per Second"/> |
| |
| </div> |
| </div> |
| </div> |
| </div> |
| |
| <div class="container blog"> |
| <div class="row"> |
| <div class="col-md-8 col-md-offset-2"> |
| <div class="blog-entry"> |
| <h1>Introducing Druid: Real-Time Analytics at a Billion Rows Per Second</h1> |
| <p class="text-muted">by <span class="author text-uppercase">Eric Tschetter</span> · April 30, 2011</p> |
| |
| <p>Here at Metamarkets we have developed a web-based analytics console that |
| supports drill-downs and roll-ups of high dimensional data sets – comprising |
| billions of events – in real-time. This is the first of two blog posts |
| introducing Druid, the data store that powers our console. Over the last twelve |
| months, we tried and failed to achieve scale and speed with relational databases |
| (Greenplum, InfoBright, MySQL) and NoSQL offerings (HBase). So instead we did |
| something crazy: we rolled our own database. Druid is the distributed, in-memory |
| OLAP data store that resulted.</p> |
| |
| <p><strong>The Challenge: Fast Roll-Ups Over Big Data</strong></p> |
| |
| <p>To frame our discussion, let’s begin with an illustration of what our raw impression event logs look |
| like, containing many dimensions and two metrics (click and price).</p> |
| <div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>timestamp publisher advertiser gender country dimensions click price |
| 2011-01-01T01:01:35Z bieberfever.com google.com Male USA 0 0.65 |
| 2011-01-01T01:03:63Z bieberfever.com google.com Male USA 0 0.62 |
| 2011-01-01T01:04:51Z bieberfever.com google.com Male USA 1 0.45 |
| ... |
| 2011-01-01T01:00:00Z ultratrimfast.com google.com Female UK 0 0.87 |
| 2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 0 0.99 |
| 2011-01-01T02:00:00Z ultratrimfast.com google.com Female UK 1 1.53 |
| ... |
| </code></pre></div> |
| <p>We call this our <em>alpha</em> data set. We perform a first-level aggregation operation over a selected set of |
| dimensions, equivalent to (in pseudocode):</p> |
| <div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>GROUP BY timestamp, publisher, advertiser, gender, country |
| :: impressions = COUNT(1), clicks = SUM(click), revenue = SUM(price) |
| </code></pre></div> |
| <p>to yield a compacted version:</p> |
| <div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>timestamp             publisher          advertiser  gender  country  impressions  clicks  revenue |
| 2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Male    USA      1800         25      15.70 |
| 2011-01-01T01:00:00Z  bieberfever.com    google.com  Male    USA      2912         42      29.18 |
| 2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Male    UK       1953         17      17.31 |
| 2011-01-01T02:00:00Z  bieberfever.com    google.com  Male    UK       3194         170     34.01 |
| </code></pre></div> |
| <p>This is our <em>beta</em> data set: the raw events grouped by hourly timestamp and four selected dimensions, then compacted. In the limit, as we group |
| by all available dimensions, the size of this aggregated <em>beta</em> converges to the original <em>alpha</em>. In practice, |
| it is dramatically smaller (often by a factor of 100). Our <em>beta</em> data comprises three distinct parts:</p> |
| |
| <blockquote> |
| <p><strong>Timestamp column</strong>: We treat timestamp separately because all of our queries |
| center around the time axis. Timestamps are faceted by varying granularities |
| (hourly, in the example above).</p> |
| |
| <p><strong>Dimension columns</strong>: Here we have four dimensions: publisher, advertiser, |
| gender, and country. Each represents an axis of the data that we’ve chosen |
| to slice across.</p> |
| |
| <p><strong>Metric columns</strong>: These are impressions, clicks, and revenue. They represent |
| values, usually numeric, that are derived from an aggregation operation – such |
| as count, sum, and mean (we also run variance and higher moment calculations). |
| For example, in the first row, the revenue metric of 15.70 is the sum of 1800 |
| event-level prices.</p> |
| </blockquote> |
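| <p>As a rough illustration of the roll-up from <em>alpha</em> to <em>beta</em>, here is a minimal Python sketch. It is not Druid code: the <code>ALPHA</code> sample rows, the <code>roll_up</code> and <code>truncate_to_hour</code> helpers, and the hourly granularity are assumptions made for this example only.</p> |

```python
from collections import defaultdict

# Hypothetical raw "alpha" events, modeled on the sample log lines above.
ALPHA = [
    {"timestamp": "2011-01-01T01:01:35Z", "publisher": "bieberfever.com",
     "advertiser": "google.com", "gender": "Male", "country": "USA",
     "click": 0, "price": 0.65},
    {"timestamp": "2011-01-01T01:04:51Z", "publisher": "bieberfever.com",
     "advertiser": "google.com", "gender": "Male", "country": "USA",
     "click": 1, "price": 0.45},
    {"timestamp": "2011-01-01T01:00:00Z", "publisher": "ultratrimfast.com",
     "advertiser": "google.com", "gender": "Female", "country": "UK",
     "click": 0, "price": 0.87},
    {"timestamp": "2011-01-01T02:00:00Z", "publisher": "ultratrimfast.com",
     "advertiser": "google.com", "gender": "Female", "country": "UK",
     "click": 0, "price": 0.99},
    {"timestamp": "2011-01-01T02:00:00Z", "publisher": "ultratrimfast.com",
     "advertiser": "google.com", "gender": "Female", "country": "UK",
     "click": 1, "price": 1.53},
]

DIMENSIONS = ("publisher", "advertiser", "gender", "country")


def truncate_to_hour(timestamp):
    """Keep the date and hour of an ISO-8601 UTC string, zeroing minutes and seconds."""
    return timestamp[:13] + ":00:00Z"


def roll_up(events):
    """Group events by (hour, dimensions) and aggregate the three metrics."""
    groups = defaultdict(lambda: {"impressions": 0, "clicks": 0, "revenue": 0.0})
    for event in events:
        key = (truncate_to_hour(event["timestamp"]),) + tuple(
            event[d] for d in DIMENSIONS
        )
        row = groups[key]
        row["impressions"] += 1           # COUNT(1)
        row["clicks"] += event["click"]   # SUM(click)
        row["revenue"] += event["price"]  # SUM(price)
    return dict(groups)


if __name__ == "__main__":
    for key, metrics in sorted(roll_up(ALPHA).items()):
        print(key, metrics)
```

| <p>The five sample events collapse into three <em>beta</em> rows, one per distinct combination of hour and dimension values.</p> |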
| |
| <p>Our goal is to rapidly compute drill-downs and roll-ups over this data set. We |
| want to answer questions like “How many impressions from males were on |
| bieberfever.com?” and “What is the average cost to advertise to women at |
| ultratrimfast.com?” But we have a hard requirement to meet: we want queries |
| over any arbitrary combination of dimensions at sub-second latencies.</p> |
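| <p>To make the shape of such queries concrete, here is a small Python sketch that filters and aggregates the four sample <em>beta</em> rows shown earlier. The <code>BETA</code> list and the <code>totals</code> helper are hypothetical names for this illustration. Because the sample contains only male rows, the average-cost query is shown without a gender filter.</p> |

```python
# The four sample beta rows from the compacted table above.
BETA = [
    {"timestamp": "2011-01-01T01:00:00Z", "publisher": "ultratrimfast.com",
     "advertiser": "google.com", "gender": "Male", "country": "USA",
     "impressions": 1800, "clicks": 25, "revenue": 15.70},
    {"timestamp": "2011-01-01T01:00:00Z", "publisher": "bieberfever.com",
     "advertiser": "google.com", "gender": "Male", "country": "USA",
     "impressions": 2912, "clicks": 42, "revenue": 29.18},
    {"timestamp": "2011-01-01T02:00:00Z", "publisher": "ultratrimfast.com",
     "advertiser": "google.com", "gender": "Male", "country": "UK",
     "impressions": 1953, "clicks": 17, "revenue": 17.31},
    {"timestamp": "2011-01-01T02:00:00Z", "publisher": "bieberfever.com",
     "advertiser": "google.com", "gender": "Male", "country": "UK",
     "impressions": 3194, "clicks": 170, "revenue": 34.01},
]

METRICS = ("impressions", "clicks", "revenue")


def totals(rows, **filters):
    """Sum each metric over the rows whose dimensions match every filter."""
    matching = [
        row for row in rows
        if all(row[dim] == value for dim, value in filters.items())
    ]
    return {metric: sum(row[metric] for row in matching) for metric in METRICS}


if __name__ == "__main__":
    # "How many impressions from males were on bieberfever.com?"
    males = totals(BETA, gender="Male", publisher="bieberfever.com")
    print(males["impressions"])  # 6106

    # Average cost per impression on ultratrimfast.com.
    site = totals(BETA, publisher="ultratrimfast.com")
    print(site["revenue"] / site["impressions"])
```

| <p>Any combination of dimension filters maps onto the same scan, which is why the cost of a full scan dominates the latency requirement.</p> |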
| |
| <p>Performance of such a system depends on the size of our beta set, and |
| there are two ways that this becomes large: (i) when we include additional |
| dimensions, and (ii) when we include a dimension whose cardinality is large. |
| Using our example, for every hour’s worth of data we calculate the maximum |
| number of rows as:</p> |
| <div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>number_of_publishers * number_of_advertisers * number_of_genders * number_of_countries |
| </code></pre></div> |
| <p>If we have 10 publishers, 50 advertisers, 2 genders, and 120 countries, that |
| would yield a maximum of 120,000 rows. If there had been 1,000,000 possible |
| publishers instead, it would become a maximum of 12 billion rows. If we keep the |
| original 10 publishers but add 10 more dimensions of cardinality 10 each, the |
| maximum becomes 1.2 quadrillion (1.2 x 10^15) rows.</p> |
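| <p>The arithmetic can be checked with a few lines of Python. The <code>max_rows</code> helper and the <code>dim_0</code> … <code>dim_9</code> names are placeholders invented for this sketch. The extra dimensions are added to the original 10-publisher case, which is the reading under which the figure comes to 1.2 quadrillion.</p> |

```python
from math import prod


def max_rows(cardinalities):
    """Worst-case rows per hour: the product of all dimension cardinalities."""
    return prod(cardinalities.values())


base = {"publisher": 10, "advertiser": 50, "gender": 2, "country": 120}
many_publishers = {**base, "publisher": 1_000_000}
ten_more_dims = {**base, **{f"dim_{i}": 10 for i in range(10)}}

if __name__ == "__main__":
    print(max_rows(base))             # 120000
    print(max_rows(many_publishers))  # 12000000000
    print(max_rows(ten_more_dims))    # 1200000000000000
```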
| |
| <p>Luckily for us, these data sets are generally sparse, as dimension values are |
| not conditionally independent (few Kazakhstanis visit bieberfever.com, for |
| example). Thus the combinatorial explosion is far smaller than the theoretical |
| worst case. Nonetheless, as a rule, more dimensions and more cardinality |
| dramatically inflate the size of the data set.</p> |
| |
| <p><strong>Failed Solution I: Dynamic Roll-Ups with an RDBMS</strong></p> |
| |
| <p>Our stated goals of aggregation and drill-down are well suited to a classical |
| relational architecture. So about a year ago, we fired up an RDBMS instance |
| (actually, the Greenplum Community Edition, running on an m1.large EC2 box), |
| and began loading our data into it. It worked, and we were able to build the |
| initial version of our console on this system. However, we had two problems:</p> |
| |
| <ol> |
| <li><p>We stored the data in a star schema, which meant that there was operational |
| overhead maintaining dimension and fact tables.</p></li> |
| <li><p>Whenever we needed to do a full table scan, for things like global counts, |
| the queries ran slowly. For example, naive benchmarks showed that scanning 33 |
| million rows took 3 seconds.</p></li> |
| </ol> |
| |
| <p>We initially just decided to eat the operational overhead of (1) because that’s |
| how these systems work and we benefited from having the database do our |
| storage and computation. But (2) was painful. We started materializing all |
| dimensional roll-ups of a certain depth, and began routing queries to these |
| pre-aggregated tables. We also implemented a caching layer in front of our |
| queries.</p> |
| |
| <p>This approach generally worked and is, I believe, a fairly common strategy in |
| the space. But when results weren’t in the cache and a query couldn’t be |
| mapped to a pre-aggregated table, we were back to full scans and slow |
| performance. We tried indexing our way out of it, but given that we are |
| allowing arbitrary combinations of dimensions, we couldn’t really take |
| advantage of composite indexes. Additionally, depending on the flavor of |
| RDBMS, index merge strategies are either not implemented at all or |
| implemented only for bitmap indexes.</p> |
| |
| <p>We also benchmarked plain Postgres, MySQL, and InfoBright, but did not observe |
| dramatically better performance. Seeing no path ahead for our relational |
| database, we turned to one of those new-fangled, massively scalable NoSQL |
| solutions.</p> |
| |
| <p><strong>Failed Solution II: Pre-compute the World in NoSQL</strong></p> |
| |
| <p>We used a data storage schema very similar to Twitter’s |
| <a href="http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011">Rainbird</a>.</p> |
| |
| <p>In short, we took all of our data and pre-computed aggregates for every |
| combination of dimensions. At query time we need only locate the specific |
| pre-computed aggregate and return it: an O(1) key-value lookup. This made |
| things fast and worked wonderfully when we had a six-dimension beta data set. |
| But when we added five more dimensions – giving us 11 dimensions total – the |
| time to pre-compute all aggregates became unmanageably large (it needed more |
| than 24 hours, and we never waited long enough to see it finish).</p> |
| |
| <p>So we decided to limit the depth that we aggregated to. By only pre-computing |
| aggregates of five dimensions or fewer, we were able to limit some of the |
| exponential expansion of the data. The data became manageable again, meaning it |
| only took about 4 hours on 15 machines to compute the expansion of a 500k-row |
| beta data set into the full multi-billion entry output data set.</p> |
| |
| <p>Then we added three more dimensions, bringing us up to 14. This turned into 9 |
| hours on 25 machines. We realized that we were <a href="http://knowyourmeme.com/memes/youre-doing-it-wrong">doing it |
| wrong</a>.</p> |
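| <p>A back-of-the-envelope sketch shows why this blows up. The hypothetical <code>groupings</code> helper below counts only how many distinct subsets of dimensions must be pre-aggregated (including the empty subset, i.e. the grand total). Each subset then expands into one entry per observed combination of values, so the real output is far larger than these counts.</p> |

```python
from math import comb


def groupings(num_dimensions, max_depth=None):
    """Count dimension subsets of size 0..max_depth (all sizes if max_depth is None)."""
    if max_depth is None:
        max_depth = num_dimensions
    return sum(comb(num_dimensions, k) for k in range(max_depth + 1))


if __name__ == "__main__":
    print(groupings(6))                # 64
    print(groupings(11))               # 2048
    print(groupings(11, max_depth=5))  # 1024
    print(groupings(14, max_depth=5))  # 3473
```

| <p>Capping the depth at five halves the number of groupings for 11 dimensions, but going to 14 dimensions more than triples it again, which is consistent with the jump in compute time described above.</p> |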
| |
| <p>Lesson learned: massively scalable counter systems like Rainbird are intended |
| for high cardinality data sets with pre-defined hierarchical drill-downs. But |
| they break down when supporting arbitrary drill-downs across all dimensions.</p> |
| |
| <p><strong>Introducing Druid: A Distributed, In-Memory OLAP Store</strong></p> |
| |
| <p>Stepping back from our two failures, let’s examine why these systems failed to |
| scale for our needs:</p> |
| |
| <ol> |
| <li>Relational Database Architectures |
| |
| <ul> |
| <li>Full table scans were slow, regardless of the storage engine used</li> |
| <li>Maintaining proper dimension tables, indexes and aggregate tables was painful</li> |
| <li>Parallelization of queries was either unsupported or non-trivial</li> |
| </ul></li> |
| <li>Massive NoSQL With Pre-Computation |
| |
| <ul> |
| <li>Supporting high dimensional OLAP requires pre-computing an exponentially large amount of data</li> |
| </ul></li> |
| </ol> |
| |
| <p>Comparing the problems with these solutions, the first, |
| RDBMS-style architecture has the simpler issue to tackle: namely, how to scan |
| tables fast. When we were looking at our 500k row data set, someone remarked, |
| “Dude, I can store that in memory.” That was the answer.</p> |
| |
| <p>Keeping everything in memory provides fast scans, but it does introduce a new |
| problem: machine memory is limited. The corollary was to distribute the data |
| over multiple machines. Thus, our requirements were:</p> |
| |
| <ul> |
| <li>Ability to load up, store, and query data sets in memory</li> |
| <li>Parallelized architecture that allows us to add more machines in order to relieve memory pressure</li> |
| </ul> |
| |
| <p>And then we threw in a couple more that seemed like good ideas:</p> |
| |
| <ul> |
| <li>Parallelized queries to speed up full scan processing</li> |
| <li>No dimensional tables to manage</li> |
| </ul> |
| |
| <p>These are the requirements we used to implement Druid. The system makes a |
| number of simplifying assumptions that fit our use case (namely that all |
| analytics are time-based) and integrates access to real-time and historical |
| data for a configurable amount of time into the past.</p> |
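| <p>The idea behind these requirements (hold partitions of the data in memory, scan them in parallel, then merge the partial results) can be sketched in a few lines of Python. This is a toy illustration, not Druid’s implementation: <code>scan_partition</code> and <code>parallel_scan</code> are hypothetical names, and threads in one process stand in for separate machines, so it demonstrates the structure rather than any real speed-up.</p> |

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(rows, predicate):
    """Scan one in-memory partition and return partial (impressions, revenue) sums."""
    impressions = 0
    revenue = 0.0
    for row in rows:
        if predicate(row):
            impressions += row["impressions"]
            revenue += row["revenue"]
    return impressions, revenue


def parallel_scan(partitions, predicate):
    """Scan every partition concurrently, then merge the partial aggregates."""
    workers = max(1, len(partitions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(
            pool.map(lambda part: scan_partition(part, predicate), partitions)
        )
    total_impressions = sum(p[0] for p in partials)
    total_revenue = sum(p[1] for p in partials)
    return total_impressions, total_revenue


if __name__ == "__main__":
    # Two partitions, as if the beta rows were spread over two machines.
    partitions = [
        [{"publisher": "ultratrimfast.com", "impressions": 1800, "revenue": 15.70},
         {"publisher": "bieberfever.com", "impressions": 2912, "revenue": 29.18}],
        [{"publisher": "ultratrimfast.com", "impressions": 1953, "revenue": 17.31},
         {"publisher": "bieberfever.com", "impressions": 3194, "revenue": 34.01}],
    ]
    print(parallel_scan(partitions, lambda r: r["publisher"] == "bieberfever.com"))
```

| <p>Because sums and counts merge by simple addition, adding partitions (and machines to hold them) relieves memory pressure without changing the query logic.</p> |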
| |
| <p>The <a href="http://metamarketsgroup.com/blog/druid-part-deux-three-principles-for-fast-distributed-olap/">next |
| installment</a> |
| will go into the architecture of Druid, how queries work and how the system can |
| scale out to handle query hotspots and high cardinality data sets. For now, we |
| leave you with a benchmark:</p> |
| |
| <ul> |
| <li>Our 40-instance (m2.2xlarge) cluster can scan, filter, and aggregate 1 billion rows in 950 milliseconds.</li> |
| </ul> |
| |
| <p><a href="http://metamarkets.com/2011/druid-part-deux-three-principles-for-fast-distributed-olap/">CONTINUE TO PART II…</a></p> |
| |
| </div> |
| </div> |
| </div> |
| </div> |
| |
| |
| <!-- Start page_footer include --> |
| <footer class="druid-footer"> |
| <div class="container"> |
| <div class="text-center"> |
| <p> |
| <a href="/technology">Technology</a> ·  |
| <a href="/use-cases">Use Cases</a> ·  |
| <a href="/druid-powered">Powered by Druid</a> ·  |
| <a href="/docs/latest">Docs</a> ·  |
| <a href="/community/">Community</a> ·  |
| <a href="/downloads.html">Download</a> ·  |
| <a href="/faq">FAQ</a> |
| </p> |
| </div> |
| <div class="text-center"> |
| <a title="Join the user group" href="https://groups.google.com/forum/#!forum/druid-user" target="_blank"><span class="fa fa-comments"></span></a> ·  |
| <a title="Follow Druid" href="https://twitter.com/druidio" target="_blank"><span class="fab fa-twitter"></span></a> ·  |
| <a title="Download via Apache" href="https://www.apache.org/dyn/closer.cgi?path=/incubator/druid/0.16.1-incubating/apache-druid-0.16.1-incubating-bin.tar.gz" target="_blank"><span class="fas fa-feather"></span></a> ·  |
| <a title="GitHub" href="https://github.com/apache/incubator-druid" target="_blank"><span class="fab fa-github"></span></a> |
| </div> |
| <div class="text-center license"> |
| Copyright © 2019 <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br> |
| Except where otherwise noted, licensed under <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a>.<br> |
| Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries. |
| </div> |
| </div> |
| </footer> |
| |
| <script async src="https://www.googletagmanager.com/gtag/js?id=UA-131010415-1"></script> |
| <script> |
| window.dataLayer = window.dataLayer || []; |
| function gtag(){dataLayer.push(arguments);} |
| gtag('js', new Date()); |
| gtag('config', 'UA-131010415-1'); |
| </script> |
| <script> |
| function trackDownload(type, url) { |
| ga('send', 'event', 'download', type, url); |
| } |
| </script> |
| <script src="//code.jquery.com/jquery.min.js"></script> |
| <script src="//maxcdn.bootstrapcdn.com/bootstrap/3.2.0/js/bootstrap.min.js"></script> |
| <script src="/assets/js/druid.js"></script> |
| <!-- stop page_footer include --> |
| |
| |
| </body> |
| </html> |