| <!doctype html> |
| <html lang="en" dir="ltr" class="docs-wrapper docs-doc-page docs-version-current plugin-docs plugin-id-default docs-doc-id-comparisons/druid-vs-sql-on-hadoop" data-has-hydrated="false"> |
| <head> |
| <meta charset="UTF-8"> |
| <meta name="generator" content="Docusaurus v2.4.3"> |
| <title data-rh="true">Apache Druid vs SQL-on-Hadoop | Apache® Druid</title><meta data-rh="true" name="viewport" content="width=device-width,initial-scale=1"><meta data-rh="true" name="twitter:card" content="summary_large_image"><meta data-rh="true" property="og:image" content="https://druid.apache.org/img/druid_nav.png"><meta data-rh="true" name="twitter:image" content="https://druid.apache.org/img/druid_nav.png"><meta data-rh="true" property="og:url" content="https://druid.apache.org/docs/latest/comparisons/druid-vs-sql-on-hadoop"><meta data-rh="true" name="docusaurus_locale" content="en"><meta data-rh="true" name="docsearch:language" content="en"><meta data-rh="true" name="docusaurus_version" content="current"><meta data-rh="true" name="docusaurus_tag" content="docs-default-current"><meta data-rh="true" name="docsearch:version" content="current"><meta data-rh="true" name="docsearch:docusaurus_tag" content="docs-default-current"><meta data-rh="true" property="og:title" content="Apache Druid vs SQL-on-Hadoop | Apache® Druid"><meta data-rh="true" name="description" content="<!--"><meta data-rh="true" property="og:description" content="<!--"><link data-rh="true" rel="icon" href="/img/favicon.png"><link data-rh="true" rel="canonical" href="https://druid.apache.org/docs/latest/comparisons/druid-vs-sql-on-hadoop"><link data-rh="true" rel="alternate" href="https://druid.apache.org/docs/latest/comparisons/druid-vs-sql-on-hadoop" hreflang="en"><link data-rh="true" rel="alternate" href="https://druid.apache.org/docs/latest/comparisons/druid-vs-sql-on-hadoop" hreflang="x-default"><link rel="preconnect" href="https://www.google-analytics.com"> |
| <link rel="preconnect" href="https://www.googletagmanager.com"> |
| <script async src="https://www.googletagmanager.com/gtag/js?id=UA-131010415-1"></script> |
| <script>function gtag(){dataLayer.push(arguments)}window.dataLayer=window.dataLayer||[],gtag("js",new Date),gtag("config","UA-131010415-1",{})</script> |
| |
| |
| |
| |
| |
| <link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.7.2/css/all.css"> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.4/clipboard.min.js"></script><link rel="stylesheet" href="/assets/css/styles.60a7f877.css"> |
| <link rel="preload" href="/assets/js/runtime~main.3dd217d1.js" as="script"> |
| <link rel="preload" href="/assets/js/main.8b320f33.js" as="script"> |
| </head> |
| <body class="navigation-with-keyboard"> |
| <script>!function(){function t(t){document.documentElement.setAttribute("data-theme",t)}var e=function(){var t=null;try{t=new URLSearchParams(window.location.search).get("docusaurus-theme")}catch(t){}return t}()||function(){var t=null;try{t=localStorage.getItem("theme")}catch(t){}return t}();t(null!==e?e:"light")}()</script><div id="__docusaurus"> |
| <div role="region" aria-label="Skip to main content"><a class="skipToContent_fXgn" href="#__docusaurus_skipToContent_fallback">Skip to main content</a></div><nav aria-label="Main" class="navbar navbar--fixed-top navbar--dark"><div class="navbar__inner"><div class="navbar__items"><button aria-label="Toggle navigation bar" aria-expanded="false" class="navbar__toggle clean-btn" type="button"><svg width="30" height="30" viewBox="0 0 30 30" aria-hidden="true"><path stroke="currentColor" stroke-linecap="round" stroke-miterlimit="10" stroke-width="2" d="M4 7h22M4 15h22M4 23h22"></path></svg></button><a class="navbar__brand" href="/"><div class="navbar__logo"><img src="/img/druid_nav.png" alt="Apache® Druid" class="themedImage_ToTc themedImage--light_HNdA"><img src="/img/druid_nav.png" alt="Apache® Druid" class="themedImage_ToTc themedImage--dark_i4oU"></div></a></div><div class="navbar__items navbar__items--right"><a class="navbar__item navbar__link" href="/technology">Technology</a><a class="navbar__item navbar__link" href="/use-cases">Use Cases</a><a class="navbar__item navbar__link" href="/druid-powered">Powered By</a><a class="navbar__item navbar__link" href="/docs/latest/design/">Docs</a><a class="navbar__item navbar__link" href="/community/">Community</a><div class="navbar__item dropdown dropdown--hoverable dropdown--right"><a href="#" aria-haspopup="true" aria-expanded="false" role="button" class="navbar__link">Apache®</a><ul class="dropdown__menu"><li><a href="https://www.apache.org/" target="_blank" rel="noopener noreferrer" class="dropdown__link">Foundation<svg width="12" height="12" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li><a href="https://apachecon.com/?ref=druid.apache.org" target="_blank" rel="noopener noreferrer" class="dropdown__link">Events<svg width="12" 
height="12" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li><a href="https://www.apache.org/licenses/" target="_blank" rel="noopener noreferrer" class="dropdown__link">License<svg width="12" height="12" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li><a href="https://www.apache.org/foundation/thanks.html" target="_blank" rel="noopener noreferrer" class="dropdown__link">Thanks<svg width="12" height="12" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li><a href="https://www.apache.org/security/" target="_blank" rel="noopener noreferrer" class="dropdown__link">Security<svg width="12" height="12" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank" rel="noopener noreferrer" class="dropdown__link">Sponsorship<svg width="12" height="12" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li></ul></div><a class="navbar__item navbar__link" href="/downloads/">Download</a><div class="searchBox_ZlJk"><div class="navbar__search"><span aria-label="expand searchbar" role="button" class="search-icon" 
tabindex="0"></span><input type="search" id="search_input_react" placeholder="Loading..." aria-label="Search" class="navbar__search-input search-bar" disabled=""></div></div></div></div><div role="presentation" class="navbar-sidebar__backdrop"></div></nav><div id="__docusaurus_skipToContent_fallback" class="main-wrapper mainWrapper_z2l0 docsWrapper_BCFX"><button aria-label="Scroll back to top" class="clean-btn theme-back-to-top-button backToTopButton_sjWU" type="button"></button><div class="docPage__5DB"><main class="docMainContainer_gTbr docMainContainerEnhanced_Uz_u"><div class="container padding-top--md padding-bottom--lg"><div class="row"><div class="col docItemCol_VOVn"><div class="docItemContainer_Djhp"><article><div class="tocCollapsible_ETCw theme-doc-toc-mobile tocMobile_ITEo"><button type="button" class="clean-btn tocCollapsibleButton_TO0P">On this page</button></div><div class="theme-doc-markdown markdown"><header><h1>Apache Druid vs SQL-on-Hadoop</h1></header><p>SQL-on-Hadoop engines provide an |
| execution engine for various data formats and data stores, and |
| many can be made to push computations down to Druid while providing a SQL interface to Druid.</p><p>For a direct comparison between the technologies, and for deciding when to use only one or the other, the choice comes down to your |
| product requirements and what the systems were designed to do.</p><p>Druid was designed to</p><ol><li>be an always-on service</li><li>ingest data in real time</li><li>handle slice-and-dice style ad hoc queries</li></ol><p>SQL-on-Hadoop engines generally sidestep MapReduce, instead querying data directly from HDFS or, in some cases, other storage systems. |
| Some of these engines (including Impala and Presto) can be co-located with HDFS data nodes and coordinate with them to achieve data locality for queries. |
| What does this mean? We can talk about it in terms of three general areas:</p><ol><li>Queries</li><li>Data Ingestion</li><li>Query Flexibility</li></ol><h3 class="anchor anchorWithStickyNavbar_LWe7" id="queries">Queries<a href="#queries" class="hash-link" aria-label="Direct link to Queries" title="Direct link to Queries"></a></h3><p>Druid segments store data in a custom column format. Segments are scanned directly as part of queries, and each Druid server |
| calculates a set of results that are eventually merged at the Broker level. This means the only data transferred between servers |
| is queries and results, and all computation is done internally within the Druid servers.</p><p>Most SQL-on-Hadoop engines are responsible for query planning and execution for underlying storage layers and storage formats. |
| They are long-running processes that stay up even when no query is running (eliminating the JVM startup costs of Hadoop MapReduce). |
| Some SQL-on-Hadoop engines (such as Impala and Presto) have daemon processes that can run where the data is stored, virtually eliminating network transfer costs. There is still |
| some latency overhead (e.g. serialization/deserialization time) associated with pulling data from the underlying storage layer into the computation layer. We do not know exactly |
| how large a performance impact this has.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="data-ingestion">Data Ingestion<a href="#data-ingestion" class="hash-link" aria-label="Direct link to Data Ingestion" title="Direct link to Data Ingestion"></a></h3><p>Druid is built to allow for real-time ingestion of data. You can ingest data and query it immediately upon ingestion; |
| the delay before an event is reflected in query results is dominated by how long it takes to deliver the event to Druid.</p><p>SQL-on-Hadoop engines, being based on data in HDFS or some other backing store, are limited in their data ingestion rates by the |
| rate at which that backing store can make data available. Generally, the backing store is the biggest bottleneck for |
| how quickly data can become available.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="query-flexibility">Query Flexibility<a href="#query-flexibility" class="hash-link" aria-label="Direct link to Query Flexibility" title="Direct link to Query Flexibility"></a></h3><p>Druid's query language is fairly low-level and maps to how Druid operates internally. Although Druid can be combined with a high-level query |
| planner to support most SQL queries and analytic SQL queries (minus joins among large tables), |
| base Druid is less flexible than SQL-on-Hadoop solutions for generic processing.</p><p>SQL-on-Hadoop engines support SQL-style queries with full joins.</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="druid-vs-parquet">Druid vs Parquet<a href="#druid-vs-parquet" class="hash-link" aria-label="Direct link to Druid vs Parquet" title="Direct link to Druid vs Parquet"></a></h2><p>Parquet is a column storage format that is designed to work with SQL-on-Hadoop engines. Parquet doesn't have a query execution engine, and instead |
| relies on external systems to pull data out of it.</p><p>Druid's storage format is highly optimized for linear scans. Although Druid has support for nested data, Parquet's storage format is much |
| more hierarchical, and is designed more for binary chunking. In theory, this should lead to faster scans in Druid.</p></div></article><nav class="pagination-nav docusaurus-mt-lg" aria-label="Docs pages"></nav></div></div><div class="col col--3"><div class="tableOfContents_bqdL thin-scrollbar theme-doc-toc-desktop"><ul class="table-of-contents table-of-contents__left-border"><li><a href="#queries" class="table-of-contents__link toc-highlight">Queries</a></li><li><a href="#data-ingestion" class="table-of-contents__link toc-highlight">Data Ingestion</a></li><li><a href="#query-flexibility" class="table-of-contents__link toc-highlight">Query Flexibility</a></li><li><a href="#druid-vs-parquet" class="table-of-contents__link toc-highlight">Druid vs Parquet</a></li></ul></div></div></div></div></main></div></div><footer class="footer"><div class="container container-fluid"><div class="footer__bottom text--center"><div class="margin-bottom--sm"><img src="/img/favicon.png" class="themedImage_ToTc themedImage--light_HNdA footer__logo"><img src="/img/favicon.png" class="themedImage_ToTc themedImage--dark_i4oU footer__logo"></div><div class="footer__copyright">Copyright © 2023 Apache Software Foundation. Except where otherwise noted, licensed under CC BY-SA 4.0. Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</div></div></div></footer></div> |
| <script src="/assets/js/runtime~main.3dd217d1.js"></script> |
| <script src="/assets/js/main.8b320f33.js"></script> |
| </body> |
| </html> |