blob: 6d568aca208c2f9eb1c62cbe266725457cce24cd [file] [log] [blame]
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data" />
<meta name="author" content="Cloudera" />
<title>Apache Kudu - Apache Kudu Background Maintenance Tasks</title>
<!-- Bootstrap core CSS -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css"
integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7"
crossorigin="anonymous">
<!-- Custom styles for this template -->
<link href="/css/kudu.css" rel="stylesheet"/>
<link href="/css/asciidoc.css" rel="stylesheet"/>
<link rel="shortcut icon" href="/img/logo-favicon.ico" />
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.1/css/font-awesome.min.css" />
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<div class="kudu-site container-fluid">
<!-- Static navbar -->
<nav class="navbar navbar-default">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="logo" href="/"><img
src="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png"
srcset="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png 1x, //d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_160px.png 2x"
alt="Apache Kudu"/></a>
</div>
<div id="navbar" class="collapse navbar-collapse">
<ul class="nav navbar-nav navbar-right">
<li >
<a href="/">Home</a>
</li>
<li >
<a href="/overview.html">Overview</a>
</li>
<li class="active">
<a href="/docs/">Documentation</a>
</li>
<li >
<a href="/releases/">Releases</a>
</li>
<li >
<a href="/blog/">Blog</a>
</li>
<!-- NOTE: this dropdown menu does not appear on Mobile, so don't add anything here
that doesn't also appear elsewhere on the site. -->
<li class="dropdown">
<a href="/community.html" role="button" aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a>
<ul class="dropdown-menu">
<li class="dropdown-header">GET IN TOUCH</li>
<li><a class="icon email" href="/community.html">Mailing Lists</a></li>
<li><a class="icon slack" href="https://getkudu-slack.herokuapp.com/">Slack Channel</a></li>
<li role="separator" class="divider"></li>
<li><a href="/community.html#meetups-user-groups-and-conference-presentations">Events and Meetups</a></li>
<li><a href="/committers.html">Project Committers</a></li>
<li><a href="/ecosystem.html">Ecosystem</a></li>
<!--<li><a href="/roadmap.html">Roadmap</a></li>-->
<li><a href="/community.html#contributions">How to Contribute</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">DEVELOPER RESOURCES</li>
<li><a class="icon github" href="https://github.com/apache/incubator-kudu">GitHub</a></li>
<li><a class="icon gerrit" href="http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu">Gerrit Code Review</a></li>
<li><a class="icon jira" href="https://issues.apache.org/jira/browse/KUDU">JIRA Issue Tracker</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">SOCIAL MEDIA</li>
<li><a class="icon twitter" href="https://twitter.com/ApacheKudu">Twitter</a></li>
<li><a href="https://www.reddit.com/r/kudu/">Reddit</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">APACHE SOFTWARE FOUNDATION</li>
<li><a href="https://www.apache.org/security/" target="_blank">Security</a></li>
<li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a></li>
<li><a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
<li><a href="https://www.apache.org/licenses/" target="_blank">License</a></li>
</ul>
</li>
<li >
<a href="/faq.html">FAQ</a>
</li>
</ul><!-- /.nav -->
</div><!-- /#navbar -->
</div><!-- /.container-fluid -->
</nav>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div class="container">
<div class="row">
<div class="col-md-9">
<h1>Apache Kudu Background Maintenance Tasks</h1>
<div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p>Kudu relies on running background tasks for many important automatic
maintenance activities. These tasks include flushing data from memory to disk,
compacting data to improve performance, freeing up disk space, and more.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_maintenance_manager"><a class="link" href="#_maintenance_manager">Maintenance manager</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>The maintenance manager schedules and runs background tasks. At any given point
in time, the maintenance manager is prioritizing the next task based on the
improvement needed at that moment, such as relieving memory pressure, improving
read performance, or freeing up disk space. The number of worker threads
dedicated to running background tasks can be controlled by setting
<code>--maintenance_manager_num_threads</code>.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_flushing_data_to_disk"><a class="link" href="#_flushing_data_to_disk">Flushing data to disk</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>Flushing data from memory to disk relieves memory pressure and can improve read
performance by switching from a write-optimized, row-oriented in-memory format
in the <code>MemRowSet</code> to a read-optimized, column-oriented format on disk.
Background tasks that flush data include <code>FlushMRSOp</code> and
<code>FlushDeltaMemStoresOp</code>.</p>
</div>
<div class="paragraph">
<p>The metrics associated with these ops have the prefix <code>flush_mrs</code> and
<code>flush_dms</code>, respectively.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_compacting_on_disk_data"><a class="link" href="#_compacting_on_disk_data">Compacting on-disk data</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>Kudu constantly performs several types of compaction tasks in order to maintain
consistent read and write performance over time. A merging compaction, which combines
multiple <code>DiskRowSets</code> together into a single <code>DiskRowSet</code>, is run by
<code>CompactRowSetsOp</code>. There are two types of delta store compaction operations
that may be run as well: <code>MinorDeltaCompactionOp</code> and <code>MajorDeltaCompactionOp</code>.</p>
</div>
<div class="paragraph">
<p>For more information on what these different types of compaction operations do,
please see the
<a href="https://github.com/apache/kudu/blob/master/docs/design-docs/tablet.md">Kudu Tablet
design document</a>.</p>
</div>
<div class="paragraph">
<p>The metrics associated with these tasks have the prefix <code>compact_rs</code>,
<code>delta_minor_compact_rs</code>, and <code>delta_major_compact_rs</code>, respectively.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_write_ahead_log_gc"><a class="link" href="#_write_ahead_log_gc">Write-ahead log GC</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>Kudu maintains a write-ahead log (WAL) per tablet that is split into discrete
fixed-size segments. A tablet periodically rolls the WAL to a new log segment
when the active segment reaches a configured size (controlled by
<code>--log_segment_size_mb</code>). In order to save disk space and decrease startup
time, a background task called <code>LogGCOp</code> attempts to garbage-collect (GC) old
WAL segments by deleting them from disk once it is determined that they are no
longer needed by the local node for durability.</p>
</div>
<div class="paragraph">
<p>The metrics associated with this background task have the prefix <code>log_gc</code>.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_tablet_history_gc_and_the_ancient_history_mark"><a class="link" href="#_tablet_history_gc_and_the_ancient_history_mark">Tablet history GC and the ancient history mark</a></h2>
<div class="sectionbody">
<div class="paragraph">
<p>Because Kudu uses a multiversion concurrency control (MVCC) mechanism to
ensure that snapshot scans can proceeed isolated from new changes to a table,
periodically old historical data should be garbage-collected (removed) to free
up disk space. While Kudu never removes rows or data that are visible in the
latest version of the data, Kudu does remove records of old changes that are no
longer visible.</p>
</div>
<div class="paragraph">
<p>The point in time in the past beyond which historical MVCC data becomes
inaccessible and is free to be deleted is called the <em>ancient history mark</em>
(AHM). The AHM can be configured by setting <code>--tablet_history_max_age_sec</code>.</p>
</div>
<div class="paragraph">
<p>There are two background tasks that GC historical MVCC data older than the AHM:
the one that runs the merging compaction, called <code>CompactRowSetsOp</code> (see
above), and a separate background task that deletes old undo delta blocks,
called <code>UndoDeltaBlockGCOp</code>. Running <code>UndoDeltaBlockGCOp</code> reduces disk space
usage in all workloads, but particularly in those with a higher volume of
updates or upserts.</p>
</div>
<div class="paragraph">
<p>The metrics associated with this background task have the prefix
<code>undo_delta_block</code>.</p>
</div>
</div>
</div>
</div>
<div class="col-md-3">
<div id="toc" data-spy="affix" data-offset-top="70">
<ul>
<li>
<a href="index.html">Introducing Kudu</a>
</li>
<li>
<a href="release_notes.html">Kudu Release Notes</a>
</li>
<li>
<a href="quickstart.html">Getting Started with Kudu</a>
</li>
<li>
<a href="installation.html">Installation Guide</a>
</li>
<li>
<a href="configuration.html">Configuring Kudu</a>
</li>
<li>
<a href="kudu_impala_integration.html">Using Impala with Kudu</a>
</li>
<li>
<a href="administration.html">Administering Kudu</a>
</li>
<li>
<a href="troubleshooting.html">Troubleshooting Kudu</a>
</li>
<li>
<a href="developing.html">Developing Applications with Kudu</a>
</li>
<li>
<a href="schema_design.html">Kudu Schema Design</a>
</li>
<li>
<a href="transaction_semantics.html">Kudu Transaction Semantics</a>
</li>
<li>
<span class="active-toc">Background Maintenance Tasks</span>
<ul class="sectlevel1">
<li><a href="#_maintenance_manager">Maintenance manager</a></li>
<li><a href="#_flushing_data_to_disk">Flushing data to disk</a></li>
<li><a href="#_compacting_on_disk_data">Compacting on-disk data</a></li>
<li><a href="#_write_ahead_log_gc">Write-ahead log GC</a></li>
<li><a href="#_tablet_history_gc_and_the_ancient_history_mark">Tablet history GC and the ancient history mark</a></li>
</ul>
</li>
<li>
<a href="configuration_reference.html">Kudu Configuration Reference</a>
</li>
<li>
<a href="known_issues.html">Known Issues and Limitations</a>
</li>
<li>
<a href="contributing.html">Contributing to Kudu</a>
</li>
<li>
<a href="export_control.html">Export Control Notice</a>
</li>
</ul>
</div>
</div>
</div>
</div>
<footer class="footer">
<div class="row">
<div class="col-md-9">
<p class="small">
Copyright &copy; 2020 The Apache Software Foundation. Last updated 2017-04-12 15:28:02 PDT
</p>
<p class="small">
Apache Kudu, Kudu, Apache, the Apache feather logo, and the Apache Kudu
project logo are either registered trademarks or trademarks of The
Apache Software Foundation in the United States and other countries.
</p>
</div>
<div class="col-md-3">
<a class="pull-right" href="https://www.apache.org/events/current-event.html">
<img src="https://www.apache.org/events/current-event-234x60.png"/>
</a>
</div>
</div>
</footer>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script>
// Try to detect touch-screen devices. Note: Many laptops have touch screens.
$(document).ready(function() {
if ("ontouchstart" in document.documentElement) {
$(document.documentElement).addClass("touch");
} else {
$(document.documentElement).addClass("no-touch");
}
});
</script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"
integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS"
crossorigin="anonymous"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-68448017-1', 'auto');
ga('send', 'pageview');
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/anchor-js/3.1.0/anchor.js"></script>
<script>
anchors.options = {
placement: 'right',
visible: 'touch',
};
anchors.add();
</script>
</body>
</html>