| --- |
| title: Apache Kudu Scaling Guide |
| layout: default |
| active_nav: docs |
| last_updated: 'Last updated 2024-10-28 17:38:51 -0700' |
| --- |
| <!-- |
| |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| |
| <div class="container"> |
| <div class="row"> |
| <div class="col-md-9"> |
| |
| <h1>Apache Kudu Scaling Guide</h1> |
| <div id="preamble"> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>This document describes in detail how Kudu scales with respect to various system resources, |
| including memory, file descriptors, and threads. See the |
| <a href="known_issues.html#_scale">scaling limits</a> for the maximum recommended parameters of a Kudu |
| cluster. They can be used to estimate roughly the number of servers required for a given quantity |
| of data.</p> |
| </div> |
| <div class="admonitionblock warning"> |
| <table> |
| <tr> |
| <td class="icon"> |
| <i class="fa icon-warning" title="Warning"></i> |
| </td> |
| <td class="content"> |
| The recommendations and conclusions here are only approximations. Appropriate numbers |
| depend on use case. There is no substitute for measurement and monitoring of resources used during a |
| representative workload. |
| </td> |
| </tr> |
| </table> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_terms"><a class="link" href="#_terms">Terms</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>We will use the following terms:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><strong>hot replica</strong>: A tablet replica that is continuously receiving writes. For example, in a time |
| series use case, tablet replicas for the most recent range partition on a time column would be |
| continuously receiving the latest data, and would be hot replicas.</p> |
| </li> |
| <li> |
| <p><strong>cold replica</strong>: A tablet replica that is not hot, i.e. a replica that is not frequently receiving |
| writes, for example, once every few minutes. A cold replica may be read from. For example, in a time |
| series use case, tablet replicas for previous range partitions on a time column would not receive |
| writes at all, or only occasionally receive late updates or additions, but may be constantly read.</p> |
| </li> |
| <li> |
| <p><strong>data on disk</strong>: The total amount of data stored on a tablet server across all disks, |
| post-replication, post-compression, and post-encoding.</p> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_example_workload"><a class="link" href="#_example_workload">Example Workload</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>The sections below perform sample calculations using the following parameters:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>200 hot replicas per tablet server</p> |
| </li> |
| <li> |
| <p>1600 cold replicas per tablet server</p> |
| </li> |
| <li> |
| <p>8TB of data on disk per tablet server (about 4.5GB/replica)</p> |
| </li> |
| <li> |
| <p>512MB block cache</p> |
| </li> |
| <li> |
| <p>40 cores per server</p> |
| </li> |
| <li> |
| <p>limit of 32000 file descriptors per server</p> |
| </li> |
| <li> |
| <p>a read workload with 1 frequently-scanned table with 40 columns</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>This workload resembles a time series use case, where the hot replicas correspond to the most recent |
| range partition on time.</p> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="memory"><a class="link" href="#memory">Memory</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>The flag <code>--memory_limit_hard_bytes</code> determines the maximum amount of memory that a Kudu tablet |
| server may use. The amount of memory used by a tablet server scales with data size, write workload, |
| and read concurrency. The following table provides numbers that can be used to compute a rough |
| estimate of memory usage.</p> |
| </div> |
| <table class="tableblock frame-all grid-all stretch"> |
| <caption class="title">Table 1. Tablet Server Memory Usage</caption> |
| <colgroup> |
| <col style="width: 33.3333%;"> |
| <col style="width: 33.3333%;"> |
| <col style="width: 33.3334%;"> |
| </colgroup> |
| <thead> |
| <tr> |
| <th class="tableblock halign-left valign-top">Type</th> |
| <th class="tableblock halign-left valign-top">Multiplier</th> |
| <th class="tableblock halign-left valign-top">Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Memory required per TB of data on disk</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">1.5GB per 1TB data on disk</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Amount of memory per unit of data on disk required for |
| basic operation of the tablet server.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Hot Replicas' MemRowSets and DeltaMemStores</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">minimum 128MB per hot replica</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Minimum amount of data |
| to flush per MemRowSet flush. For most use cases, updates should be rare compared to inserts, so the |
| DeltaMemStores should be very small.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Scans</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">256KB per column per core for read-heavy tables</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Amount of memory used by scanners, and which |
| will be constantly needed for tables which are constantly read.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Block Cache</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Fixed by <code>--block_cache_capacity_mb</code> (default 512MB)</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Amount of memory reserved for use by the |
| block cache.</p></td> |
| </tr> |
| </tbody> |
| </table> |
| <div class="paragraph"> |
| <p>Using this information for the example load gives the following breakdown of memory usage:</p> |
| </div> |
| <table class="tableblock frame-all grid-all stretch"> |
| <caption class="title">Table 2. Example Tablet Server Memory Usage</caption> |
| <colgroup> |
| <col style="width: 50%;"> |
| <col style="width: 50%;"> |
| </colgroup> |
| <thead> |
| <tr> |
| <th class="tableblock halign-left valign-top">Type</th> |
| <th class="tableblock halign-left valign-top">Amount</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">8TB data on disk</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">8TB * 1.5GB / 1TB = 12GB</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">200 hot replicas</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">200 * 128MB = 25.6GB</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">1 40-column, frequently-scanned table</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">40 * 40 * 256KB = 409.6MB</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Block Cache</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>--block_cache_capacity_mb=512</code> = 512MB</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Expected memory usage</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">38.5GB</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Recommended hard limit</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">52GB</p></td> |
| </tr> |
| </tbody> |
| </table> |
| <div class="paragraph"> |
| <p>Using this as a rough estimate of Kudu’s memory usage, select a memory limit so that the expected |
| memory usage of Kudu is around 50-75% of the hard limit.</p> |
| </div> |
| <div class="sect2"> |
| <h3 id="_verifying_if_a_memory_limit_is_sufficient"><a class="link" href="#_verifying_if_a_memory_limit_is_sufficient">Verifying if a Memory Limit is sufficient</a></h3> |
| <div class="paragraph"> |
| <p>After configuring an appropriate memory limit with <code>--memory_limit_hard_bytes</code>, run a workload and |
| monitor the Kudu tablet server process’s RAM usage. The memory usage should stay around 50-75% of |
| the hard limit, with occasional spikes above 75% but below 100%. If the tablet server runs above 75% |
| consistently, the memory limit should be increased.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Additionally, it’s also useful to monitor the logs for memory rejections, which look like:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre>Service unavailable: Soft memory limit exceeded (at 96.35% of capacity)</pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>and watch the memory rejections metrics:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><code>leader_memory_pressure_rejections</code></p> |
| </li> |
| <li> |
| <p><code>follower_memory_pressure_rejections</code></p> |
| </li> |
| <li> |
| <p><code>transaction_memory_pressure_rejections</code></p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>Occasional rejections due to memory pressure are fine and act as backpressure to clients. Clients |
| will transparently retry operations. However, no operations should time out.</p> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="file_descriptors"><a class="link" href="#file_descriptors">File Descriptors</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>Processes are allotted a maximum number of open file descriptors (also referred to as fds). If a |
| tablet server attempts to open too many fds, it will typically crash with a message saying something |
| like "too many open files". The following table summarizes the sources of file descriptor usage in a |
| Kudu tablet server process:</p> |
| </div> |
| <table class="tableblock frame-all grid-all stretch"> |
| <caption class="title">Table 3. Tablet Server File Descriptor Usage</caption> |
| <colgroup> |
| <col style="width: 33.3333%;"> |
| <col style="width: 33.3333%;"> |
| <col style="width: 33.3334%;"> |
| </colgroup> |
| <thead> |
| <tr> |
| <th class="tableblock halign-left valign-top">Type</th> |
| <th class="tableblock halign-left valign-top">Multiplier</th> |
| <th class="tableblock halign-left valign-top">Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">File cache</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Fixed by <code>--block_manager_max_open_files</code> (default 40% of process maximum)</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Maximum allowed open fds reserved for use by |
| the file cache.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Hot replicas</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">2 per WAL segment, 1 per WAL index</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Number of fds used by hot replicas. See below |
| for more explanation.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Cold replicas</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">3 per cold replica</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Number of fds used per cold replica: 2 for the single WAL |
| segment and 1 for the single WAL index.</p></td> |
| </tr> |
| </tbody> |
| </table> |
| <div class="paragraph"> |
| <p>Every replica has at least one WAL segment and at least one WAL index, and should have the same |
| number of segments and indices; however, the number of segments and indices can be greater for a |
| replica if one of its peer replicas is falling behind. WAL segment and index fds are closed as WALs |
| are garbage collected.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Using this information for the example load gives the following breakdown of file descriptor usage, |
| under the assumption that some replicas are lagging and using 10 WAL segments:</p> |
| </div> |
| <table class="tableblock frame-all grid-all stretch"> |
| <caption class="title">Table 4. Example Tablet Server File Descriptor Usage</caption> |
| <colgroup> |
| <col style="width: 50%;"> |
| <col style="width: 50%;"> |
| </colgroup> |
| <thead> |
| <tr> |
| <th class="tableblock halign-left valign-top">Type</th> |
| <th class="tableblock halign-left valign-top">Amount</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">file cache</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">40% * 32000 fds = 12800 fds</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">1600 cold replicas</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">1600 cold replicas * 3 fds / cold replica = 4800 fds</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">200 hot replicas</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">(2 / segment * 10 segments/hot replica * 200 hot replicas) + (1 / index * 10 indices / hot replica * 200 hot replicas) = 6000 fds</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Total</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">23600 fds</p></td> |
| </tr> |
| </tbody> |
| </table> |
| <div class="paragraph"> |
| <p>So for this example, the tablet server process has about 32000 - 23600 = 8400 fds to spare.</p> |
| </div> |
| <div class="paragraph"> |
| <p>There is typically no downside to configuring a higher file descriptor limit if approaching the |
| currently configured limit.</p> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="threads"><a class="link" href="#threads">Threads</a></h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>Processes are allotted a maximum number of threads by the operating system, and this limit is |
| typically difficult or impossible to change. Therefore, this section is more informational than |
| advisory.</p> |
| </div> |
| <div class="paragraph"> |
| <p>If a Kudu tablet server’s thread count exceeds the OS limit, it will crash, usually with a message |
| in the logs like "pthread_create failed: Resource temporarily unavailable". If the system thread |
| count limit is exceeded, other processes on the same node may also crash.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Threads and threadpools are used all over Kudu for various purposes, but the number of threads found |
| in nearly all of these does not scale with load or data/tablet size; instead, the number of threads |
| is either a hardcoded constant, a constant defined by a configuration parameter, or based on a |
| static dimension (such as the number of CPU cores).</p> |
| </div> |
| <div class="paragraph"> |
| <p>The only exception to this is the WAL append thread, one of which exists for every "hot" replica. |
| Note that all replicas may be considered hot at startup, so tablet servers' thread usage will |
| generally peak when started and settle down thereafter.</p> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="col-md-3"> |
| |
| <div id="toc" data-spy="affix" data-offset-top="70"> |
| <ul> |
| |
| <li> |
| |
| <a href="index.html">Introducing Kudu</a> |
| </li> |
| <li> |
| |
| <a href="release_notes.html">Kudu Release Notes</a> |
| </li> |
| <li> |
| |
| <a href="quickstart.html">Quickstart Guide</a> |
| </li> |
| <li> |
| |
| <a href="installation.html">Installation Guide</a> |
| </li> |
| <li> |
| |
| <a href="configuration.html">Configuring Kudu</a> |
| </li> |
| <li> |
| |
| <a href="hive_metastore.html">Using the Hive Metastore with Kudu</a> |
| </li> |
| <li> |
| |
| <a href="kudu_impala_integration.html">Using Impala with Kudu</a> |
| </li> |
| <li> |
| |
| <a href="administration.html">Administering Kudu</a> |
| </li> |
| <li> |
| |
| <a href="troubleshooting.html">Troubleshooting Kudu</a> |
| </li> |
| <li> |
| |
| <a href="developing.html">Developing Applications with Kudu</a> |
| </li> |
| <li> |
| |
| <a href="schema_design.html">Kudu Schema Design</a> |
| </li> |
| <li> |
| <span class="active-toc">Kudu Scaling Guide</span> |
| <ul class="sectlevel1"> |
| <li><a href="#_terms">Terms</a></li> |
| <li><a href="#_example_workload">Example Workload</a></li> |
| <li><a href="#memory">Memory</a> |
| <ul class="sectlevel2"> |
| <li><a href="#_verifying_if_a_memory_limit_is_sufficient">Verifying if a Memory Limit is sufficient</a></li> |
| </ul> |
| </li> |
| <li><a href="#file_descriptors">File Descriptors</a></li> |
| <li><a href="#threads">Threads</a></li> |
| </ul> |
| </li> |
| <li> |
| |
| <a href="security.html">Kudu Security</a> |
| </li> |
| <li> |
| |
| <a href="transaction_semantics.html">Kudu Transaction Semantics</a> |
| </li> |
| <li> |
| |
| <a href="background_tasks.html">Background Maintenance Tasks</a> |
| </li> |
| <li> |
| |
| <a href="configuration_reference.html">Kudu Configuration Reference</a> |
| </li> |
| <li> |
| |
| <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> |
| </li> |
| <li> |
| |
| <a href="metrics_reference.html">Kudu Metrics Reference</a> |
| </li> |
| <li> |
| |
| <a href="known_issues.html">Known Issues and Limitations</a> |
| </li> |
| <li> |
| |
| <a href="contributing.html">Contributing to Kudu</a> |
| </li> |
| <li> |
| |
| <a href="export_control.html">Export Control Notice</a> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| </div> |