<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="Apache Druid">
<meta name="keywords" content="druid,kafka,database,analytics,streaming,real-time,real time,apache,open source">
<meta name="author" content="Apache Software Foundation">
<title>Druid | Technology</title>
<link rel="canonical" href="https://druid.apache.org/technology" />
<link rel="alternate" type="application/atom+xml" href="/feed">
<link rel="shortcut icon" href="/img/favicon.png">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.7.2/css/all.css" integrity="sha384-fnmOCqbTlWIlj8LyTjo7mOUStjsKC4pOpQbqyi7RrhN7udi9RwhKkMHpvLbHG9Sr" crossorigin="anonymous">
<link href='//fonts.googleapis.com/css?family=Open+Sans+Condensed:300,700,300italic|Open+Sans:300italic,400italic,600italic,400,300,600,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="/css/bootstrap-pure.css?v=1.1">
<link rel="stylesheet" href="/css/base.css?v=1.1">
<link rel="stylesheet" href="/css/header.css?v=1.1">
<link rel="stylesheet" href="/css/footer.css?v=1.1">
<link rel="stylesheet" href="/css/syntax.css?v=1.1">
<link rel="stylesheet" href="/css/docs.css?v=1.1">
<script>
(function() {
var cx = '000162378814775985090:molvbm0vggm';
var gcse = document.createElement('script');
gcse.type = 'text/javascript';
gcse.async = true;
gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') +
'//cse.google.com/cse.js?cx=' + cx;
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(gcse, s);
})();
</script>
</head>
<body>
<!-- Start page_header include -->
<script src="//ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script>
<div class="top-navigator">
<div class="container">
<div class="left-cont">
<a class="logo" href="/"><span class="druid-logo"></span></a>
</div>
<div class="right-cont">
<ul class="links">
<li class=" active"><a href="/technology">Technology</a></li>
<li class=""><a href="/use-cases">Use Cases</a></li>
<li class=""><a href="/druid-powered">Powered By</a></li>
<li class=""><a href="/docs/latest/design/">Docs</a></li>
<li class=""><a href="/community/">Community</a></li>
<li class="header-dropdown">
<a>Apache</a>
<div class="header-dropdown-menu">
<a href="https://www.apache.org/" target="_blank">Foundation</a>
<a href="https://www.apache.org/events/current-event" target="_blank">Events</a>
<a href="https://www.apache.org/licenses/" target="_blank">License</a>
<a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a>
<a href="https://www.apache.org/security/" target="_blank">Security</a>
<a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a>
</div>
</li>
<li class=" button-link"><a href="/downloads.html">Download</a></li>
</ul>
</div>
</div>
<div class="action-button menu-icon">
<span class="fa fa-bars"></span> MENU
</div>
<div class="action-button menu-icon-close">
<span class="fa fa-times"></span> MENU
</div>
</div>
<script type="text/javascript">
var $menu = $('.right-cont');
var $menuIcon = $('.menu-icon');
var $menuIconClose = $('.menu-icon-close');
function showMenu() {
$menu.fadeIn(100);
$menuIcon.fadeOut(100);
$menuIconClose.fadeIn(100);
}
$menuIcon.click(showMenu);
function hideMenu() {
$menu.fadeOut(100);
$menuIconClose.fadeOut(100);
$menuIcon.fadeIn(100);
}
$menuIconClose.click(hideMenu);
$(window).resize(function() {
if ($(window).width() >= 840) {
$menu.fadeIn(100);
$menuIcon.fadeOut(100);
$menuIconClose.fadeOut(100);
}
else {
$menu.fadeOut(100);
$menuIcon.fadeIn(100);
$menuIconClose.fadeOut(100);
}
});
</script>
<!-- Stop page_header include -->
<div class="druid-header">
<div class="container">
<h1>Technology</h1>
<h4></h4>
</div>
</div>
<div class="container">
<div class="row">
<div class="col-md-10 col-md-offset-1">
<p>Apache Druid is an open source distributed data store.
Druid’s core design combines ideas from <a href="https://en.wikipedia.org/wiki/Data_warehouse">data warehouses</a>, <a href="https://en.wikipedia.org/wiki/Time_series_database">timeseries databases</a>, and <a href="https://en.wikipedia.org/wiki/Full-text_search">search systems</a> to create a high-performance real-time analytics database for a broad range of <a href="/use-cases">use cases</a>. Druid merges key characteristics of each of these three systems into its ingestion layer, storage format, querying layer, and core architecture.</p>
<div class="image-large">
<img src="img/diagram-2.png" style="max-width: 360px">
</div>
<p>Key features of Druid include:</p>
<div class="features">
<div class="feature">
<span class="fa fa-columns fa"></span>
<h5>Column-oriented storage</h5>
<p>
Druid stores and compresses each column individually and reads only the columns a particular query needs, which supports fast scans, rankings, and groupBys.
</p>
</div>
<div class="feature">
<span class="fa fa-search fa"></span>
<h5>Native search indexes</h5>
<p>
Druid creates inverted indexes on string values for fast searching and filtering.
</p>
</div>
<div class="feature">
<span class="fa fa-tint fa"></span>
<h5>Streaming and batch ingest</h5>
<p>
Out-of-the-box connectors for Apache Kafka, HDFS, AWS S3, stream processors, and more.
</p>
</div>
<div class="feature">
<span class="fa fa-stream fa"></span>
<h5>Flexible schemas</h5>
<p>
Druid gracefully handles evolving schemas and <a href="/docs/latest/ingestion/data-formats.html#flattenspec">nested data</a>.
</p>
</div>
<div class="feature">
<span class="fa fa-clock fa"></span>
<h5>Time-optimized partitioning</h5>
<p>
Druid intelligently partitions data by time, so time-based queries are significantly faster than in traditional databases.
</p>
</div>
<div class="feature">
<span class="fa fa-align-left fa"></span>
<h5>SQL support</h5>
<p>
In addition to its native <a href="/docs/latest/querying/querying">JSON-based language</a>, Druid speaks <a href="/docs/latest/querying/sql">SQL</a> over either HTTP or JDBC.
</p>
</div>
<div class="feature">
<span class="fa fa-expand fa"></span>
<h5>Horizontal scalability</h5>
<p>
Druid has been <a href="druid-powered">used in production</a> to ingest millions of events per second, retain years of data, and provide sub-second queries.
</p>
</div>
<div class="feature">
<span class="fa fa-balance-scale fa"></span>
<h5>Easy operation</h5>
<p>
Scale up or down simply by adding or removing servers; Druid rebalances automatically. Its fault-tolerant architecture routes around server failures.
</p>
</div>
</div>
<h2 id="integration">Integration</h2>
<p>Druid is complementary to many open source data technologies in the <a href="https://www.apache.org/">Apache Software Foundation</a> including <a href="https://kafka.apache.org/">Apache Kafka</a>, <a href="https://hadoop.apache.org/">Apache Hadoop</a>, <a href="https://flink.apache.org/">Apache Flink</a>, and more.</p>
<p>Druid typically sits between a storage or processing layer and the end user, and acts as a query layer to serve analytic workloads.</p>
<div class="image-large">
<img src="img/diagram-3.png" style="max-width: 580px;">
</div>
<h2 id="ingestion">Ingestion</h2>
<p>Druid supports both streaming and batch ingestion.
Druid connects to a source of raw data, typically a message bus such as Apache Kafka (for streaming data loads), or a distributed filesystem such as HDFS (for batch data loads).</p>
<p>Druid converts raw data stored in a source into a more read-optimized format (called a Druid “segment”) in a process called “indexing”.</p>
<div class="image-large">
<img src="img/diagram-4.png" style="max-width: 580px;">
</div>
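<p>A streaming load is configured with a supervisor spec. The sketch below builds a minimal Kafka supervisor spec and the HTTP request that would submit it to Druid's supervisor endpoint (<code>/druid/indexer/v1/supervisor</code>), but it does not send the request. It assumes a Router at <code>localhost:8888</code> (the quickstart default) that proxies the Overlord API. The datasource, topic, broker address, and dimension names are hypothetical, and most optional spec fields are omitted.</p>

```python
import json
import urllib.request

# Assumption: a Router at the quickstart default address, proxying the Overlord API.
ROUTER_URL = "http://localhost:8888"


def kafka_supervisor_spec(datasource, topic, brokers, dimensions):
    """Build a minimal Kafka supervisor spec (optional fields left at their defaults)."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Assumes each event carries an ISO-8601 "timestamp" field (hypothetical).
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": dimensions},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "hour",  # segments are time-chunked by hour
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": brokers},
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


def supervisor_request(spec, base_url=ROUTER_URL):
    """Wrap a spec in a POST request; urllib.request.urlopen(req) would send it."""
    return urllib.request.Request(
        base_url + "/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical datasource, topic, broker address, and dimensions.
spec = kafka_supervisor_spec("clickstream", "clicks", "localhost:9092", ["page", "country"])
req = supervisor_request(spec)
```

<p>Batch ingestion uses a different task type, but its spec carries the same kind of <code>dataSchema</code> section.</p>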
<p>For more information, please visit <a href="/docs/latest/ingestion/index.html">our docs page</a>.</p>
<h2 id="storage">Storage</h2>
<p>Like many analytic data stores, Druid stores data in columns.
Depending on the type of column (string, number, etc.), Druid applies different compression and encoding methods.
Druid also builds different types of indexes based on the column type.</p>
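<p>This toy Python sketch shows the column layout in miniature. It is not Druid's actual segment format. It pivots a few hypothetical rows into columns and dictionary-encodes a string column, mapping each distinct value to an integer id in a sorted dictionary. Compression is left out. The point is that a query touching two columns never reads the others.</p>

```python
# Hypothetical raw events.
rows = [
    {"timestamp": "2020-01-01T00:00:00", "country": "US", "page": "home", "clicks": 3},
    {"timestamp": "2020-01-01T00:01:00", "country": "FR", "page": "home", "clicks": 1},
    {"timestamp": "2020-01-01T00:02:00", "country": "US", "page": "cart", "clicks": 2},
]


def to_columns(rows):
    """Pivot row records into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def dictionary_encode(values):
    """Replace each string with its position in a sorted dictionary of distinct values."""
    dictionary = sorted(set(values))
    ids = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [ids[v] for v in values]


columns = to_columns(rows)
country_dict, country_ids = dictionary_encode(columns["country"])


def total_clicks(country):
    """Sum clicks for one country, reading only the "country" and "clicks" columns."""
    if country not in country_dict:
        return 0
    target = country_dict.index(country)
    return sum(c for cid, c in zip(country_ids, columns["clicks"]) if cid == target)


print(country_dict, country_ids)  # ['FR', 'US'] [1, 0, 1]
print(total_clicks("US"))  # 5
```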
<p>Similar to search systems, Druid builds inverted indexes for string columns for fast search and filter.
Similar to timeseries databases, Druid intelligently partitions data by time to enable fast time-oriented queries.</p>
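<p>Both ideas can be sketched together in a few lines of Python. The toy model below is a simplification under stated assumptions, not Druid's implementation. It groups hypothetical rows into hourly time chunks and builds an inverted index per chunk, mapping each string value to a bitmap of row offsets. The bitmaps are plain Python integers here, where Druid uses compressed bitmaps. A filter skips chunks outside the query interval and ORs the bitmaps inside it. Interval bounds are assumed to be hour-aligned.</p>

```python
from collections import defaultdict
from datetime import datetime


def hour_of(ts):
    """Truncate an ISO-8601 timestamp to the start of its hour (its time chunk)."""
    return datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)


def build_chunks(rows, column):
    """Group rows by hour and index `column` with one bitmap per distinct value."""
    chunks = defaultdict(lambda: {"rows": [], "index": defaultdict(int)})
    for row in rows:
        chunk = chunks[hour_of(row["timestamp"])]
        offset = len(chunk["rows"])
        chunk["rows"].append(row)
        chunk["index"][row[column]] |= 1 << offset  # set bit `offset` for this value
    return chunks


def filter_rows(chunks, start, end, values):
    """Return rows in [start, end) whose indexed value is in `values`."""
    hits = []
    for hour, chunk in chunks.items():
        if not (start <= hour < end):
            continue  # time pruning: the whole chunk is skipped unread
        bitmap = 0
        for value in values:
            bitmap |= chunk["index"].get(value, 0)  # OR of bitmaps = "value IN (...)"
        offset = 0
        while bitmap:
            if bitmap & 1:
                hits.append(chunk["rows"][offset])
            bitmap >>= 1
            offset += 1
    return hits


rows = [
    {"timestamp": "2020-01-01T00:05:00", "country": "US"},
    {"timestamp": "2020-01-01T00:40:00", "country": "FR"},
    {"timestamp": "2020-01-01T01:10:00", "country": "US"},
    {"timestamp": "2020-01-01T02:30:00", "country": "US"},
]
chunks = build_chunks(rows, "country")
hits = filter_rows(chunks, datetime(2020, 1, 1, 0), datetime(2020, 1, 1, 2), {"US"})
print([r["timestamp"] for r in hits])  # ['2020-01-01T00:05:00', '2020-01-01T01:10:00']
```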
<p>Unlike many traditional systems, Druid can optionally pre-aggregate data as it is ingested.
This pre-aggregation step is known as <a href="/docs/latest/tutorials/tutorial-rollup.html">rollup</a>, and can lead to dramatic storage savings.</p>
<div class="image-large">
<img src="img/diagram-5.png" style="max-width: 800px;">
</div>
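<p>Rollup can be pictured as a group-by that runs at ingestion time. The sketch below is a simplified model with hypothetical column names, roughly analogous to ingesting with an hourly <code>queryGranularity</code>. It truncates each timestamp to the hour, groups rows by timestamp plus dimensions, and keeps only a row count and a summed metric per group.</p>

```python
from collections import defaultdict
from datetime import datetime


def rollup(rows, dimensions, metric):
    """Pre-aggregate rows: truncate time to the hour, group by dimensions, sum `metric`."""
    table = defaultdict(lambda: {"count": 0, metric: 0})
    for row in rows:
        hour = datetime.fromisoformat(row["timestamp"]).replace(minute=0, second=0, microsecond=0)
        key = (hour,) + tuple(row[d] for d in dimensions)
        table[key]["count"] += 1  # how many raw rows were folded into this group
        table[key][metric] += row[metric]
    return dict(table)


# Hypothetical raw events: four rows in, two rows out.
raw = [
    {"timestamp": "2020-01-01T01:02:00", "page": "home", "country": "US", "bytes": 100},
    {"timestamp": "2020-01-01T01:15:00", "page": "home", "country": "US", "bytes": 250},
    {"timestamp": "2020-01-01T01:59:00", "page": "home", "country": "US", "bytes": 50},
    {"timestamp": "2020-01-01T01:30:00", "page": "cart", "country": "FR", "bytes": 75},
]
rolled = rollup(raw, ["page", "country"], "bytes")
for key, aggs in rolled.items():
    print(key, aggs)
```

<p>The savings grow with the number of raw events that share the same truncated timestamp and dimension values. The trade-off is that individual events within a group can no longer be queried separately.</p>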
<p>For more information, please visit <a href="/docs/latest/design/segments.html">our docs page</a>.</p>
<h2 id="querying">Querying</h2>
<p>Druid supports querying data through <a href="/docs/latest/querying/querying">JSON-over-HTTP</a> and <a href="/docs/latest/querying/sql">SQL</a>.
In addition to standard SQL operators, Druid provides operators backed by its suite of approximate algorithms for fast counting, ranking, and quantile computation.</p>
<div class="image-large">
<img src="img/diagram-6.png" style="max-width: 580px;">
</div>
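<p>The Python below sketches the two query paths. It builds, but does not send, a Druid SQL request for the <code>/druid/v2/sql</code> endpoint and a native JSON timeseries query for <code>/druid/v2/</code>. The Router address is the quickstart default and is an assumption. The <code>clickstream</code> datasource and its <code>country</code> and <code>user_id</code> columns are hypothetical. <code>APPROX_COUNT_DISTINCT</code> is one of Druid SQL's approximate aggregation functions.</p>

```python
import json
import urllib.request

# Assumption: a Druid Router at the quickstart default address.
ROUTER_URL = "http://localhost:8888"


def post_json(path, body, base_url=ROUTER_URL):
    """Build a JSON POST request; urllib.request.urlopen(req) would send it."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# SQL: top countries by approximate distinct users over the last day.
sql = """
SELECT country, APPROX_COUNT_DISTINCT(user_id) AS unique_users
FROM clickstream
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY country
ORDER BY unique_users DESC
LIMIT 10
"""
sql_req = post_json("/druid/v2/sql", {"query": sql, "resultFormat": "object"})

# Native JSON: hourly row counts for one day, expressed as a timeseries query.
native_query = {
    "queryType": "timeseries",
    "dataSource": "clickstream",
    "granularity": "hour",
    "intervals": ["2020-01-01/2020-01-02"],
    "aggregations": [{"type": "count", "name": "rows"}],
}
native_req = post_json("/druid/v2/", native_query)
```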
<p>For more information, please visit <a href="/docs/latest/querying/querying.html">our docs page</a>.</p>
<h2 id="architecture">Architecture</h2>
<p>Druid has a microservice-based architecture that can be thought of as a disassembled database.
Each core service in Druid (ingestion, querying, and coordination) can be deployed separately or together on commodity hardware.</p>
<p>Druid explicitly names every main service so that operators can fine-tune each service for their use case and workload.
For example, an operator can dedicate more resources to Druid’s ingestion service and fewer to its query service if the workload requires it.</p>
<p>Druid services can independently fail without impacting the operations of other services.</p>
<div class="image-large">
<img src="img/diagram-7.png" style="max-width: 800px;">
</div>
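<p>A small, hypothetical model can illustrate this deployment flexibility. The service names below are Druid's real process types, grouped the way the Druid docs group them into Master, Query, and Data servers. The plan format and the <code>missing_services</code> check are illustrative only and are not a Druid configuration format. The same services can run on one host or be spread across many hosts, for example with an extra MiddleManager-only host to give ingestion more resources.</p>

```python
# Druid's main services, grouped into the server types used in the Druid docs.
SERVER_TYPES = {
    "master": {"coordinator", "overlord"},
    "query": {"broker", "router"},
    "data": {"historical", "middleManager"},
}

# Assumption: a cluster that both ingests and serves queries needs at least one of
# each of these services; the Router is optional.
REQUIRED = {"coordinator", "overlord", "broker", "historical", "middleManager"}


def missing_services(plan):
    """Return the required services that no host in `plan` runs."""
    deployed = set().union(*plan.values())
    return REQUIRED - deployed


# Jointly deployed: every service on a single host.
single_host = {"host-1": set().union(*SERVER_TYPES.values())}

# Separately deployed: an extra MiddleManager-only host gives ingestion more resources.
cluster = {
    "master-1": SERVER_TYPES["master"],
    "query-1": SERVER_TYPES["query"],
    "data-1": SERVER_TYPES["data"],
    "ingest-1": {"middleManager"},
}

print(missing_services(single_host))  # set()
print(missing_services(cluster))  # set()
# A query-only host is missing everything except the Broker (and optional Router):
print(sorted(missing_services({"host-1": {"broker", "router"}})))
# ['coordinator', 'historical', 'middleManager', 'overlord']
```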
<p>For more information, please visit <a href="/docs/latest/design/index.html">our docs page</a>.</p>
<h2 id="operations">Operations</h2>
<p>Druid is designed to power applications that need to be up 24 hours a day, 7 days a week.
To that end, Druid has several features that ensure uptime and prevent data loss.</p>
<div class="features">
<div class="feature">
<span class="fa fa-clone fa"></span>
<h5>Data replication</h5>
<p>
All data in Druid is replicated a configurable number of times, so a single server failure has no impact on queries.
</p>
</div>
<div class="feature">
<span class="fa fa-th-large fa"></span>
<h5>Independent services</h5>
<p>
Druid explicitly names all of its main services, and each service can be fine-tuned for its use case.
Services can fail independently without impacting other services.
For example, if the ingestion service fails, no new data is loaded into the system, but existing data remains queryable.
</p>
</div>
<div class="feature">
<span class="fa fa-cloud-download-alt fa"></span>
<h5>Automatic data backup</h5>
<p>
Druid automatically backs up all indexed data to a filesystem such as HDFS (Druid calls this “deep storage”).
You can lose your entire Druid cluster and quickly restore it from this backed-up data.
</p>
</div>
<div class="feature">
<span class="fa fa-sync-alt fa"></span>
<h5>Rolling updates</h5>
<p>
You can update a Druid cluster with no downtime and no impact to end users through rolling updates.
All Druid releases are backwards compatible with the previous version.
</p>
</div>
</div>
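<p>Replication is controlled per datasource through Coordinator load rules. The sketch below builds a <code>loadForever</code> rule that keeps two replicas of every segment in the default tier. It also builds the request that would post the rule to the Coordinator's rules endpoint. The Coordinator address (its default port, 8081) and the <code>clickstream</code> datasource are assumptions, and <code>tolerable_failures</code> is a hypothetical helper for the arithmetic.</p>

```python
import json
import urllib.request

# Assumption: a Coordinator listening on its default port.
COORDINATOR_URL = "http://localhost:8081"


def replication_rules(replicas, tier="_default_tier"):
    """Keep every segment loaded, with `replicas` copies on Historicals in `tier`."""
    return [{"type": "loadForever", "tieredReplicants": {tier: replicas}}]


def rules_request(datasource, rules, base_url=COORDINATOR_URL):
    """Build the POST request; urllib.request.urlopen(req) would apply the rules."""
    return urllib.request.Request(
        f"{base_url}/druid/coordinator/v1/rules/{datasource}",
        data=json.dumps(rules).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tolerable_failures(replicas):
    """Historicals that can be lost at once, in the worst case, with no segment offline.

    Assumes each replica of a segment sits on a different Historical.
    """
    return replicas - 1


req = rules_request("clickstream", replication_rules(2))
print(tolerable_failures(2))  # 1
```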
<p>For more information, please visit <a href="/docs/latest/operations/basic-cluster-tuning.html">our docs page</a>.</p>
</div>
</div>
</div>
<!-- Start page_footer include -->
<footer class="druid-footer">
<div class="container">
<div class="text-center">
<p>
<a href="/technology">Technology</a>&ensp;·&ensp;
<a href="/use-cases">Use Cases</a>&ensp;·&ensp;
<a href="/druid-powered">Powered by Druid</a>&ensp;·&ensp;
<a href="/docs/latest/">Docs</a>&ensp;·&ensp;
<a href="/community/">Community</a>&ensp;·&ensp;
<a href="/downloads.html">Download</a>&ensp;·&ensp;
<a href="/faq">FAQ</a>
</p>
</div>
<div class="text-center">
<a title="Join the user group" href="https://groups.google.com/forum/#!forum/druid-user" target="_blank"><span class="fa fa-comments"></span></a>&ensp;·&ensp;
<a title="Follow Druid" href="https://twitter.com/druidio" target="_blank"><span class="fab fa-twitter"></span></a>&ensp;·&ensp;
<a title="GitHub" href="https://github.com/apache/druid" target="_blank"><span class="fab fa-github"></span></a>
</div>
<div class="text-center license">
Copyright © 2020 <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br>
Except where otherwise noted, licensed under <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a>.<br>
Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
</div>
</div>
</footer>
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-131010415-1"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-131010415-1');
</script>
<script>
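function trackDownload(type, url) {
// Only gtag.js is loaded above, which does not define ga(); send the same event through gtag.
gtag('event', type, { event_category: 'download', event_label: url });
}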
</script>
<script src="//code.jquery.com/jquery.min.js"></script>
<script src="//maxcdn.bootstrapcdn.com/bootstrap/3.2.0/js/bootstrap.min.js"></script>
<script src="/assets/js/druid.js"></script>
<!-- stop page_footer include -->
</body>
</html>