<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name=viewport content="width=device-width, initial-scale=1">
<title>Frequently Asked Questions - Apache Drill</title>
<link href="https://maxcdn.bootstrapcdn.com/font-awesome/4.3.0/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
<link href="/css/site.css" rel="stylesheet" type="text/css"/>
<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon"/>
<link rel="icon" href="/favicon.ico" type="image/x-icon"/>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.1/jquery.min.js" language="javascript" type="text/javascript"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery-easing/1.3/jquery.easing.min.js" language="javascript" type="text/javascript"></script>
<script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script>
<script language="javascript" type="text/javascript" src="/js/script.js"></script>
<script language="javascript" type="text/javascript" src="/js/drill.js"></script>
</head>
<body onResize="resized();">
<div class="page-wrap">
<div class="bui"></div>
<div id="menu" class="mw">
<ul>
<li class='toc-categories'>
<a class="expand-toc-icon" href="javascript:void(0);"><i class="fa fa-bars"></i></a>
</li>
<li class="logo"><a href="/"></a></li>
<li class='expand-menu'>
<a href="javascript:void(0);"><span class='menu-text'>Menu</span><span class='expand-icon'><i class="fa fa-bars"></i></span></a>
</li>
<li class="clear-float"></li>
<li class="nav">
<a>Language</a>
<ul>
<li>
<a style="font-weight: bold;" href="/faq/" >en</a>
</li>
<li>
<a href="/zh/faq/" >zh</a>
</li>
</ul>
</li>
<li class="apache-link">
<a href="/apacheASF/">Apache</a>
</li>
<li class="poweredby">
<a href="/poweredBy">Powered By</a>
</li>
<li class="documentation-menu">
<a href="/docs/">Documentation</a>
<ul>
<li><a href="/docs/getting-started/">Getting Started</a></li>
<li><a href="/docs/architecture/">Architecture</a></li>
<li><a href="/docs/tutorials/">Tutorials</a></li>
<li><a href="/docs/drill-on-yarn/">Drill-on-YARN</a></li>
<li><a href="/docs/install-drill/">Install Drill</a></li>
<li><a href="/docs/configure-drill/">Configure Drill</a></li>
<li><a href="/docs/connect-a-data-source/">Connect a Data Source</a></li>
<li><a href="/docs/odbc-jdbc-interfaces/">ODBC/JDBC Interfaces</a></li>
<li><a href="/docs/query-data/">Query Data</a></li>
<li><a href="/docs/performance-tuning/">Performance Tuning</a></li>
<li><a href="/docs/log-and-debug/">Log and Debug</a></li>
<li><a href="/docs/sql-reference/">SQL Reference</a></li>
<li><a href="/docs/data-sources-and-file-formats/">Data Sources and File Formats</a></li>
<li><a href="/docs/develop-custom-functions/">Develop Custom Functions</a></li>
<li><a href="/docs/troubleshooting/">Troubleshooting</a></li>
<li><a href="/docs/developer-information/">Developer Information</a></li>
<li><a href="/docs/release-notes/">Release Notes</a></li>
<li><a href="/docs/sample-datasets/">Sample Datasets</a></li>
<li><a href="/docs/project-bylaws/">Project Bylaws</a></li>
<li><a href="/docs/ecosystem/">Ecosystem</a></li>
</ul>
</li>
<li class='nav'>
<a href="/community-resources/">Community</a>
<ul>
<li><a href="/team/">Team</a></li>
<li><a href="/mailinglists/">Mailing Lists</a></li>
<li><a href="/community-resources/">Community Resources</a></li>
</ul>
</li>
<li class='nav'><a href="/faq/">FAQ</a></li>
<li class='nav'><a href="/blog/">Blog</a></li>
<li class="social-menu-item"><a href="https://twitter.com/apachedrill" title="apachedrill on twitter" target="_blank"><img src="/images/twitter_32_26_white.png" alt="twitter logo" align="center"></a> </li>
<li class="social-menu-item"><a href="https://join.slack.com/t/apache-drill/shared_invite/enQtNTQ4MjM1MDA3MzQ2LTJlYmUxMTRkMmUwYmQ2NTllYmFmMjU4MDk0NjYwZjBmYjg0MDZmOTE2ZDg0ZjBlYmI3Yjc4Y2I2NTQyNGVlZTc" title="Apache Drill Slack channels"
target="_blank"><img src="/images/slack-logo.svg" alt="Slack logo" align="center"></a> </li>
<li class='search-bar'>
<form id="drill-search-form">
<input type="text" placeholder="Search Apache Drill" id="drill-search-term" />
<button type="submit">
<i class="fa fa-search"></i>
</button>
</form>
</li>
<li class="d">
<a href="/download/">
<i class="fa fa-cloud-download"></i> Download
</a>
</li>
</ul>
</div>
<div class="int_title">
<h1>Frequently Asked Questions</h1>
</div>
<div class="int_text" align="left"><h2 id="overview">Overview</h2>
<h3 id="why-drill">Why Drill?</h3>
<p>The 40-year monopoly of the RDBMS is over. With the exponential growth of data in recent years, and the shift towards rapid application development, new data is increasingly being stored in non-relational datastores including Hadoop, NoSQL and cloud storage. Apache Drill enables analysts, business users, data scientists and developers to explore and analyze this data without sacrificing the flexibility and agility offered by these datastores. Drill processes the data in-situ without requiring users to define schemas or transform data.</p>
<h3 id="what-are-some-of-drills-key-features">What are some of Drill’s key features?</h3>
<p>Drill is an innovative distributed SQL engine designed to enable data exploration and analytics on non-relational datastores. Users can query the data using standard SQL and BI tools without having to create and manage schemas. Some of the key features are:</p>
<ul>
<li>Schema-free JSON document model similar to MongoDB and Elasticsearch</li>
<li>Industry-standard APIs: ANSI SQL, ODBC/JDBC, RESTful APIs</li>
<li>Extremely user- and developer-friendly</li>
<li>Pluggable architecture enables connectivity to multiple datastores</li>
</ul>
<h3 id="how-does-drill-achieve-performance">How does Drill achieve performance?</h3>
<p>Drill is built from the ground up to achieve high throughput and low latency. The following capabilities help accomplish that:</p>
<ul>
<li><strong>Distributed query optimization and execution</strong>: Drill is designed to scale from a single node (your laptop) to large clusters with thousands of servers.</li>
<li><strong>Columnar execution</strong>: Drill is the world’s only columnar execution engine that supports complex data and schema-free data. It uses a shredded, in-memory, columnar data representation.</li>
<li><strong>Runtime compilation and code generation</strong>: Drill is the world’s only query engine that compiles and re-compiles queries at runtime. This allows Drill to achieve high performance without knowing the structure of the data in advance. Drill leverages multiple compilers as well as ASM-based bytecode rewriting to optimize the code.</li>
<li><strong>Vectorization</strong>: Drill takes advantage of the latest SIMD instructions available in modern processors.</li>
<li><strong>Optimistic/pipelined execution</strong>: Drill is able to stream data in memory between operators. Drill avoids using disk unless it is needed to complete the query.</li>
</ul>
<h3 id="what-datastores-does-drill-support">What datastores does Drill support?</h3>
<p>Drill is primarily focused on non-relational datastores, including Hadoop, NoSQL and cloud storage. The following datastores are currently supported:</p>
<ul>
<li><strong>Hadoop</strong>: All Hadoop distributions (HDFS API 2.3+), including Apache Hadoop, MapR, CDH and Amazon EMR</li>
<li><strong>NoSQL</strong>: MongoDB, HBase</li>
<li><strong>Cloud storage</strong>: Amazon S3, Google Cloud Storage, Azure Blob Storage, Swift</li>
</ul>
<p>A new datastore can be added by developing a storage plugin. Drill’s unique schema-free JSON data model enables it to query non-relational datastores in-situ (many of these systems store complex or schema-free data).</p>
<h3 id="what-clients-are-supported">What clients are supported?</h3>
<ul>
<li><strong>BI tools</strong> via the ODBC and JDBC drivers (eg, Tableau, Excel, MicroStrategy, Spotfire, QlikView, Business Objects)</li>
<li><strong>Custom applications</strong> via the REST API</li>
<li><strong>Java and C applications</strong> via the dedicated Java and C libraries</li>
</ul>
<h2 id="comparisons">Comparisons</h2>
<h3 id="is--drill-a-sql-on-hadoop-engine">Is Drill a ‘SQL-on-Hadoop’ engine?</h3>
<p>Drill supports a variety of non-relational datastores in addition to Hadoop. Drill takes a different approach compared to traditional SQL-on-Hadoop technologies like Hive and Impala. For example, users can directly query self-describing data (eg, JSON, Parquet) without having to create and manage schemas.</p>
<p>The following table provides a more detailed comparison between Drill and traditional SQL-on-Hadoop technologies:</p>
<table>
<thead>
<tr>
<th> </th>
<th>Drill</th>
<th>SQL-on-Hadoop (Hive, Impala, etc.)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Use case</td>
<td>Self-service, in-situ, SQL-based analytics</td>
<td>Data warehouse offload</td>
</tr>
<tr>
<td>Data sources</td>
<td>Hadoop, NoSQL, cloud storage (including multiple instances)</td>
<td>A single Hadoop cluster</td>
</tr>
<tr>
<td>Data model</td>
<td>Schema-free JSON (like MongoDB)</td>
<td>Relational</td>
</tr>
<tr>
<td>User experience</td>
<td>Point-and-query</td>
<td>Ingest data → define schemas → query</td>
</tr>
<tr>
<td>Deployment model</td>
<td>Standalone service or co-located with Hadoop or NoSQL</td>
<td>Co-located with Hadoop</td>
</tr>
<tr>
<td>Data management</td>
<td>Self-service</td>
<td>IT-driven</td>
</tr>
<tr>
<td>SQL</td>
<td>ANSI SQL</td>
<td>SQL-like</td>
</tr>
<tr>
<td>1.0 availability</td>
<td>Q2 2015</td>
<td>Q2 2013 or earlier</td>
</tr>
</tbody>
</table>
<h3 id="is-spark-sql-similar-to-drill">Is Spark SQL similar to Drill?</h3>
<p>No. Spark SQL is primarily designed to enable developers to incorporate SQL statements in Spark programs. Drill does not depend on Spark, and is targeted at business users, analysts, data scientists and developers.</p>
<h3 id="does-drill-replace-hive">Does Drill replace Hive?</h3>
<p>Hive is a batch processing framework most suitable for long-running jobs. For data exploration and BI, Drill provides a much better experience than Hive.</p>
<p>In addition, Drill is not limited to Hadoop. For example, it can query NoSQL databases (eg, MongoDB, HBase) and cloud storage (eg, Amazon S3, Google Cloud Storage, Azure Blob Storage, Swift).</p>
<h2 id="metadata">Metadata</h2>
<h3 id="how-does-drill-support-queries-on-self-describing-data">How does Drill support queries on self-describing data?</h3>
<p>Drill’s flexible JSON data model and on-the-fly schema discovery enable it to query self-describing data.</p>
<ul>
<li><strong>JSON data model</strong>: Traditional query engines have a relational data model, which is limited to flat records with a fixed structure. Drill is built from the ground up to support modern complex/semi-structured data commonly seen in non-relational datastores such as Hadoop, NoSQL and cloud storage. Drill’s internal in-memory data representation is hierarchical and columnar, allowing it to perform efficient SQL processing on complex data without flattening into rows.</li>
<li><strong>On-the-fly schema discovery (or late binding)</strong>: Traditional query engines (eg, relational databases, Hive, Impala, Spark SQL) need to know the structure of the data before query execution. Drill, on the other hand, features a fundamentally different architecture, which enables execution to begin without knowing the structure of the data. The query is automatically compiled and re-compiled during the execution phase, based on the actual data flowing through the system. As a result, Drill can handle data with evolving schema or even no schema at all (eg, JSON files, MongoDB collections, HBase tables).</li>
</ul>
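<p>As a rough illustration of late binding, a query can reference fields of a JSON file directly, with no table definition created beforehand. The <code class="language-plaintext highlighter-rouge">dfs</code> plugin name, the file path and the field names below are hypothetical:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- No CREATE TABLE or schema registration precedes this query;
-- Drill discovers the fields while it reads the file.
-- The table alias (t) is needed to reference the nested field address.city.
SELECT t.user_id, t.address.city AS city
FROM dfs.`/data/users.json` t
WHERE t.address.city = 'Berlin';
</code></pre></div></div>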
<h3 id="but-i-already-have-schemas-defined-in-hive-metastore-can-i-use-that-with-drill">I already have schemas defined in Hive Metastore. Can I use them with Drill?</h3>
<p>Absolutely. Drill has a storage plugin for Hive tables, so you can simply point Drill to the Hive Metastore and start performing low-latency queries on Hive tables. In fact, a single Drill cluster can query data from multiple Hive Metastores, and even perform joins across these datasets.</p>
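<p>A join across two Hive Metastores might look like the sketch below. It assumes two Hive storage plugins registered under the hypothetical names <code class="language-plaintext highlighter-rouge">hive1</code> and <code class="language-plaintext highlighter-rouge">hive2</code>; the database, table and column names are also made up for illustration:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- hive1 and hive2 each point at a different Hive Metastore (assumed setup).
SELECT o.order_id, c.customer_name
FROM hive1.sales.orders o
JOIN hive2.crm.customers c
  ON o.customer_id = c.customer_id;
</code></pre></div></div>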
<h3 id="is-drill-anti-schema-or-anti-dba">Is Drill “anti-schema” or “anti-DBA”?</h3>
<p>Not at all. Drill actually takes advantage of schemas when available. For example, Drill leverages the schema information in Hive when querying Hive tables. However, when querying schema-free datastores like MongoDB, or raw files on S3 or Hadoop, schemas are not available, and Drill is still able to query that data.</p>
<p>Centralized schemas work well if the data structure is static, and the value of data is well understood and ready to be operationalized for regular reporting purposes. However, during data exploration, discovery and interactive analysis, requiring rigid modeling poses significant challenges. For example:</p>
<ul>
<li>Complex data (eg, JSON) is hard to map to relational tables</li>
<li>Centralized schemas are hard to keep in sync when the data structure is changing rapidly</li>
<li>Non-repetitive/ad-hoc queries and data exploration needs may not justify modeling costs</li>
</ul>
<p>Drill is all about flexibility. The flexible schema management capabilities in Drill allow users to explore raw data and then create models/structure with <code class="language-plaintext highlighter-rouge">CREATE TABLE</code> or <code class="language-plaintext highlighter-rouge">CREATE VIEW</code> statements, or with Hive Metastore.</p>
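<p>For example, after exploring raw log files, a user could capture the useful subset as a view and then materialize it as a table. This is only a sketch: it assumes a writable workspace named <code class="language-plaintext highlighter-rouge">dfs.tmp</code>, and the path, view, table and column names are hypothetical:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Give the raw files a stable, named structure without copying any data.
CREATE VIEW dfs.tmp.server_errors AS
SELECT t.event_time, t.request_url, t.http_code
FROM dfs.`/my/log/files/` t
WHERE t.http_code &gt;= 500;

-- Materialize the view's result as a new table (CREATE TABLE ... AS SELECT).
CREATE TABLE dfs.tmp.server_errors_snapshot AS
SELECT * FROM dfs.tmp.server_errors;
</code></pre></div></div>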
<h3 id="what-does-a-drill-query-look-like">What does a Drill query look like?</h3>
<p>Drill uses a decentralized metadata model and relies on its storage plugins to provide metadata. There is a storage plugin associated with each data source that is supported by Drill.</p>
<p>The name of the table in a query tells Drill where to get the data:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">dfs1</span><span class="p">.</span><span class="n">root</span><span class="p">.</span><span class="nv">`/my/log/files/`</span><span class="p">;</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">dfs2</span><span class="p">.</span><span class="n">root</span><span class="p">.</span><span class="nv">`/home/john/log.json`</span><span class="p">;</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">mongodb1</span><span class="p">.</span><span class="n">website</span><span class="p">.</span><span class="n">users</span><span class="p">;</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">hive1</span><span class="p">.</span><span class="n">logs</span><span class="p">.</span><span class="n">frontend</span><span class="p">;</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">hbase1</span><span class="p">.</span><span class="n">events</span><span class="p">.</span><span class="n">clicks</span><span class="p">;</span>
</code></pre></div></div>
<h3 id="what-sql-functionality-does-drill-support">What SQL functionality does Drill support?</h3>
<p>Drill supports standard SQL (aka ANSI SQL). In addition, it features several extensions that help with complex data, such as the <code class="language-plaintext highlighter-rouge">KVGEN</code> and <code class="language-plaintext highlighter-rouge">FLATTEN</code> functions. For more details, refer to the <a href="/docs/sql-reference/">SQL Reference</a>.</p>
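<p>A minimal sketch of the two extensions, using hypothetical JSON files and field names: <code class="language-plaintext highlighter-rouge">FLATTEN</code> produces one output row per element of an array, and <code class="language-plaintext highlighter-rouge">KVGEN</code> turns a map with arbitrary keys into an array of key/value pairs.</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- One row per element of the (assumed) array field "items".
SELECT t.order_id, FLATTEN(t.items) AS item
FROM dfs.`/data/orders.json` t;

-- "clicks_by_page" is assumed to be a map whose keys vary per record.
-- KVGEN converts it to an array of key/value pairs; FLATTEN then
-- yields one row per pair.
SELECT FLATTEN(KVGEN(t.clicks_by_page)) AS page_clicks
FROM dfs.`/data/sessions.json` t;
</code></pre></div></div>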
<h3 id="do-i-need-to-load-data-into-drill-to-start-querying-it">Do I need to load data into Drill to start querying it?</h3>
<p>No. Drill can query data ‘in-situ’.</p>
<h2 id="getting-started">Getting Started</h2>
<h3 id="what-is-the-best-way-to-get-started-with-drill">What is the best way to get started with Drill?</h3>
<p>The best way to get started is to try it out. It only takes a few minutes and all you need is a laptop (Mac, Windows or Linux). We’ve compiled <a href="/docs/tutorials-introduction/">several tutorials</a> to help you get started.</p>
<h3 id="how-can-i-ask-questions-and-provide-feedback">How can I ask questions and provide feedback?</h3>
<p>Please post your questions and feedback to <a href="mailto:user@drill.apache.org">user@drill.apache.org</a>. We are happy to help!</p>
<h3 id="how-can-i-contribute-to-drill">How can I contribute to Drill?</h3>
<p>The documentation has information on <a href="/docs/contribute-to-drill/">how to contribute</a>.</p>
</div>
</div>
<p class="push"></p>
<div id="footer" class="mw">
<div class="wrapper">
Copyright © 2012-2025 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br>
Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/>
</div>
</div>
<script type="text/javascript" src="https://s7.addthis.com/js/300/addthis_widget.js#pubid=ra-548b2caa33765e8d" async="async"></script>
</body>
</html>