blob: 2bc68bb0ce0894a13a27b8f7614a758f172982f6 [file] [log] [blame]
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="Apache Druid">
<meta name="keywords" content="druid,kafka,database,analytics,streaming,real-time,real time,apache,open source">
<meta name="author" content="Apache Software Foundation">
<title>Druid | Performance FAQ</title>
<link rel="alternate" type="application/atom+xml" href="/feed">
<link rel="shortcut icon" href="/img/favicon.png">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.7.2/css/all.css" integrity="sha384-fnmOCqbTlWIlj8LyTjo7mOUStjsKC4pOpQbqyi7RrhN7udi9RwhKkMHpvLbHG9Sr" crossorigin="anonymous">
<link href='//fonts.googleapis.com/css?family=Open+Sans+Condensed:300,700,300italic|Open+Sans:300italic,400italic,600italic,400,300,600,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="/css/bootstrap-pure.css?v=1.1">
<link rel="stylesheet" href="/css/base.css?v=1.1">
<link rel="stylesheet" href="/css/header.css?v=1.1">
<link rel="stylesheet" href="/css/footer.css?v=1.1">
<link rel="stylesheet" href="/css/syntax.css?v=1.1">
<link rel="stylesheet" href="/css/docs.css?v=1.1">
<script>
(function() {
var cx = '000162378814775985090:molvbm0vggm';
var gcse = document.createElement('script');
gcse.type = 'text/javascript';
gcse.async = true;
gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') +
'//cse.google.com/cse.js?cx=' + cx;
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(gcse, s);
})();
</script>
</head>
<body>
<!-- Start page_header include -->
<script src="//ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script>
<div class="top-navigator">
<div class="container">
<div class="left-cont">
<a class="logo" href="/"><span class="druid-logo"></span></a>
</div>
<div class="right-cont">
<ul class="links">
<li class=""><a href="/technology">Technology</a></li>
<li class=""><a href="/use-cases">Use Cases</a></li>
<li class=""><a href="/druid-powered">Powered By</a></li>
<li class=""><a href="/docs/latest/design/">Docs</a></li>
<li class=""><a href="/community/">Community</a></li>
<li class="header-dropdown">
<a>Apache</a>
<div class="header-dropdown-menu">
<a href="https://www.apache.org/" target="_blank">Foundation</a>
<a href="https://www.apache.org/events/current-event" target="_blank">Events</a>
<a href="https://www.apache.org/licenses/" target="_blank">License</a>
<a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a>
<a href="https://www.apache.org/security/" target="_blank">Security</a>
<a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a>
</div>
</li>
<li class=" button-link"><a href="/downloads.html">Download</a></li>
</ul>
</div>
</div>
<div class="action-button menu-icon">
<span class="fa fa-bars"></span> MENU
</div>
<div class="action-button menu-icon-close">
<span class="fa fa-times"></span> MENU
</div>
</div>
<script type="text/javascript">
var $menu = $('.right-cont');
var $menuIcon = $('.menu-icon');
var $menuIconClose = $('.menu-icon-close');
function showMenu() {
$menu.fadeIn(100);
$menuIcon.fadeOut(100);
$menuIconClose.fadeIn(100);
}
$menuIcon.click(showMenu);
function hideMenu() {
$menu.fadeOut(100);
$menuIconClose.fadeOut(100);
$menuIcon.fadeIn(100);
}
$menuIconClose.click(hideMenu);
$(window).resize(function() {
if ($(window).width() >= 840) {
$menu.fadeIn(100);
$menuIcon.fadeOut(100);
$menuIconClose.fadeOut(100);
}
else {
$menu.fadeOut(100);
$menuIcon.fadeIn(100);
$menuIconClose.fadeOut(100);
}
});
</script>
<!-- Stop page_header include -->
<div class="container doc-container">
<p> Looking for the <a href="/docs/0.19.0/">latest stable documentation</a>?</p>
<div class="row">
<div class="col-md-9 doc-content">
<p>
<a class="btn btn-default btn-xs visible-xs-inline-block visible-sm-inline-block" href="#toc">Table of Contents</a>
</p>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
<h1 id="performance-faq">Performance FAQ</h1>
<h2 id="i-cant-match-your-benchmarked-results">I can&#39;t match your benchmarked results</h2>
<p>Improper configuration is by far the largest problem we see people trying to deploy Apache Druid (incubating). The example configurations listed in the tutorials are designed for a small volume of data where all processes are on a single machine. The configs are extremely poor for actual production use.</p>
<h2 id="what-should-i-set-my-jvm-heap">What should I set my JVM heap?</h2>
<p>The size of the JVM heap really depends on the type of Druid process you are running. Below are a few considerations.</p>
<p><a href="../design/broker.html">Broker processes</a> uses the JVM heap mainly to merge results from Historicals and real-times. Brokers also use off-heap memory and processing threads for groupBy queries. We recommend 20G-30G of heap here.</p>
<p><a href="../design/historical.html">Historical processes</a> use off-heap memory to store intermediate results, and by default, all segments are memory mapped before they can be queried. Typically, the more memory is available on a Historical process, the more segments can be served without the possibility of data being paged on to disk. On Historicals, the JVM heap is used for <a href="../querying/groupbyquery.html">GroupBy queries</a>, some data structures used for intermediate computation, and general processing. One way to calculate how much space there is for segments is: memory_for_segments = total_memory - heap - direct_memory - jvm_overhead. Note that total_memory here refers to the memory available to the cgroup (if running on Linux), which for default cases is going to be all the system memory.</p>
<p>We recommend 250mb * (processing.numThreads) for the heap.</p>
<p><a href="../design/coordinator.html">Coordinator processes</a> do not require off-heap memory and the heap is used for loading information about all segments to determine what segments need to be loaded, dropped, moved, or replicated.</p>
<h2 id="how-much-direct-memory-does-druid-use">How much direct memory does Druid use?</h2>
<p>Any Druid process that handles queries (Brokers, Peons, and Historicals) uses two kinds of direct memory buffers with configurable size: processing buffers and merge buffers.</p>
<p>Each processing thread is allocated one processing buffer. Additionally, there is a shared pool of merge buffers (only used for GroupBy V2 queries currently).</p>
<p>Other sources of direct memory usage include:
- When a column is loaded for reading, a 64KB direct buffer is allocated for decompression.
- When a set of segments are merged during ingestion, a direct buffer is allocated for every String typed column, for every segment in the set to be merged. The size of these buffers are equal to the cardinality of the String column within its segment, times 4 bytes (the buffers store integers). For example, if two segments are being merged, the first segment having a single String column with cardinality 1000, and the second segment having a String column with cardinality 500, the merge step would allocate (1000 + 500) * 4 = 6000 bytes of direct memory. These buffers are used for merging the value dictionaries of the String column across segments. These &quot;dictionary merging buffers&quot; are independent of the &quot;merge buffers&quot; configured by <code>druid.processing.numMergeBuffers</code>.</p>
<p>A useful formula for estimating direct memory usage follows:</p>
<p><code>druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1)</code></p>
<p>The <code>+1</code> is a fuzzy parameter meant to account for the decompression and dictionary merging buffers and may need to be adjusted based on the characteristics of the data being ingested/queried.
Operators can ensure at least this amount of direct memory is available by providing <code>-XX:MaxDirectMemorySize=&lt;VALUE&gt;</code> at the command line.</p>
<h2 id="what-is-the-intermediate-computation-buffer">What is the intermediate computation buffer?</h2>
<p>The intermediate computation buffer specifies a buffer size for the storage of intermediate results. The computation engine in both the Historical and Realtime processes will use a scratch buffer of this size to do all of their intermediate computations off-heap. Larger values allow for more aggregations in a single pass over the data while smaller values can require more passes depending on the query that is being executed. The default size is 1073741824 bytes (1GB).</p>
<h2 id="what-is-server-maxsize">What is server maxSize?</h2>
<p>Server maxSize sets the maximum cumulative segment size (in bytes) that a process can hold. Changing this parameter will affect performance by controlling the memory/disk ratio on a process. Setting this parameter to a value greater than the total memory capacity on a process and may cause disk paging to occur. This paging time introduces a query latency delay.</p>
<h2 id="my-logs-are-really-chatty-can-i-set-them-to-asynchronously-write">My logs are really chatty, can I set them to asynchronously write?</h2>
<p>Yes, using a <code>log4j2.xml</code> similar to the following causes some of the more chatty classes to write asynchronously:</p>
<div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot; ?&gt;
&lt;Configuration status=&quot;WARN&quot;&gt;
&lt;Appenders&gt;
&lt;Console name=&quot;Console&quot; target=&quot;SYSTEM_OUT&quot;&gt;
&lt;PatternLayout pattern=&quot;%d{ISO8601} %p [%t] %c - %m%n&quot;/&gt;
&lt;/Console&gt;
&lt;/Appenders&gt;
&lt;Loggers&gt;
&lt;AsyncLogger name=&quot;org.apache.druid.curator.inventory.CuratorInventoryManager&quot; level=&quot;debug&quot; additivity=&quot;false&quot;&gt;
&lt;AppenderRef ref=&quot;Console&quot;/&gt;
&lt;/AsyncLogger&gt;
&lt;AsyncLogger name=&quot;org.apache.druid.client.BatchServerInventoryView&quot; level=&quot;debug&quot; additivity=&quot;false&quot;&gt;
&lt;AppenderRef ref=&quot;Console&quot;/&gt;
&lt;/AsyncLogger&gt;
&lt;!-- Make extra sure nobody adds logs in a bad way that can hurt performance --&gt;
&lt;AsyncLogger name=&quot;org.apache.druid.client.ServerInventoryView&quot; level=&quot;debug&quot; additivity=&quot;false&quot;&gt;
&lt;AppenderRef ref=&quot;Console&quot;/&gt;
&lt;/AsyncLogger&gt;
&lt;AsyncLogger name =&quot;org.apache.druid.java.util.http.client.pool.ChannelResourceFactory&quot; level=&quot;info&quot; additivity=&quot;false&quot;&gt;
&lt;AppenderRef ref=&quot;Console&quot;/&gt;
&lt;/AsyncLogger&gt;
&lt;Root level=&quot;info&quot;&gt;
&lt;AppenderRef ref=&quot;Console&quot;/&gt;
&lt;/Root&gt;
&lt;/Loggers&gt;
&lt;/Configuration&gt;
</code></pre></div>
</div>
<div class="col-md-3">
<div class="searchbox">
<gcse:searchbox-only></gcse:searchbox-only>
</div>
<div id="toc" class="nav toc hidden-print">
</div>
</div>
</div>
</div>
<!-- Start page_footer include -->
<footer class="druid-footer">
<div class="container">
<div class="text-center">
<p>
<a href="/technology">Technology</a>&ensp;·&ensp;
<a href="/use-cases">Use Cases</a>&ensp;·&ensp;
<a href="/druid-powered">Powered by Druid</a>&ensp;·&ensp;
<a href="/docs/latest/">Docs</a>&ensp;·&ensp;
<a href="/community/">Community</a>&ensp;·&ensp;
<a href="/downloads.html">Download</a>&ensp;·&ensp;
<a href="/faq">FAQ</a>
</p>
</div>
<div class="text-center">
<a title="Join the user group" href="https://groups.google.com/forum/#!forum/druid-user" target="_blank"><span class="fa fa-comments"></span></a>&ensp;·&ensp;
<a title="Follow Druid" href="https://twitter.com/druidio" target="_blank"><span class="fab fa-twitter"></span></a>&ensp;·&ensp;
<a title="GitHub" href="https://github.com/apache/druid" target="_blank"><span class="fab fa-github"></span></a>
</div>
<div class="text-center license">
Copyright © 2020 <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br>
Except where otherwise noted, licensed under <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a>.<br>
Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
</div>
</div>
</footer>
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-131010415-1"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-131010415-1');
</script>
<script>
function trackDownload(type, url) {
ga('send', 'event', 'download', type, url);
}
</script>
<script src="//code.jquery.com/jquery.min.js"></script>
<script src="//maxcdn.bootstrapcdn.com/bootstrap/3.2.0/js/bootstrap.min.js"></script>
<script src="/assets/js/druid.js"></script>
<!-- stop page_footer include -->
<script>
$(function() {
$(".toc").load("/docs/0.14.2-incubating/toc.html");
// There is no way to tell when .gsc-input will be async loaded into the page so just try to set a placeholder until it works
var tries = 0;
var timer = setInterval(function() {
tries++;
if (tries > 300) clearInterval(timer);
var searchInput = $('input.gsc-input');
if (searchInput.length) {
searchInput.attr('placeholder', 'Search');
clearInterval(timer);
}
}, 100);
});
</script>
</body>
</html>