// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
= Apache Kudu Background Maintenance Tasks
:author: Kudu Team
:imagesdir: ./images
:icons: font
:toc: left
:toclevels: 3
:doctype: book
:backend: html5
Kudu relies on running background tasks for many important automatic
maintenance activities. These tasks include flushing data from memory to disk,
compacting data to improve performance, freeing up disk space, and more.
== Maintenance manager
The maintenance manager schedules and runs background tasks. At any given
time, it prioritizes the next task based on the improvement most needed at
that moment, such as relieving memory pressure, improving
read performance, or freeing up disk space. The number of worker threads
dedicated to running background tasks can be controlled by setting
== Flushing data to disk
Flushing data from memory to disk relieves memory pressure and can improve read
performance by switching from a write-optimized, row-oriented in-memory format
in the `MemRowSet` to a read-optimized, column-oriented format on disk.
Background tasks that flush data include `FlushMRSOp` and
The metrics associated with these ops have the prefixes `flush_mrs` and
`flush_dms`, respectively.
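As a rough illustration of how the prefixes above can be used, the following
sketch filters a metrics dump by name prefix. It assumes the metrics have
already been exported as a JSON list of entities, each with a `metrics` list of
objects carrying a `name` field. The function name `ops_metrics`, the sample
entity, and the metric names after each prefix are hypothetical and exist only
for the example.

```python
import json


def ops_metrics(metrics_json, prefixes=("flush_mrs", "flush_dms")):
    """Collect metrics whose names start with one of the given prefixes.

    Assumes the document is a JSON list of entities, each carrying a
    "metrics" list of objects with a "name" field.
    """
    matches = []
    for entity in json.loads(metrics_json):
        for metric in entity.get("metrics", []):
            if metric.get("name", "").startswith(tuple(prefixes)):
                matches.append((entity.get("id"), metric))
    return matches


# Illustrative sample only: the names after each prefix are made up.
SAMPLE = json.dumps([
    {"type": "tablet", "id": "tablet-a", "metrics": [
        {"name": "flush_mrs_example", "value": 3},
        {"name": "flush_dms_example", "value": 1},
        {"name": "unrelated_example", "value": 9},
    ]},
])

for entity_id, metric in ops_metrics(SAMPLE):
    print(entity_id, metric["name"], metric["value"])
```

Passing a different `prefixes` tuple selects the metrics of other background
tasks in the same way.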
== Compacting on-disk data
Kudu constantly performs several types of compaction tasks in order to maintain
consistent read and write performance over time. A merging compaction, which combines
multiple `DiskRowSets` together into a single `DiskRowSet`, is run by
`CompactRowSetsOp`. There are two types of delta store compaction operations
that may be run as well: `MinorDeltaCompactionOp` and `MajorDeltaCompactionOp`.
For more information on what these different types of compaction operations do,
please see the
link:[Kudu Tablet
design document].
The metrics associated with these tasks have the prefixes `compact_rs`,
`delta_minor_compact_rs`, and `delta_major_compact_rs`, respectively.
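To make the idea of a merging compaction more concrete, here is a toy model,
not Kudu's implementation. Each row set is represented as a plain list of
`(key, value)` tuples already sorted by key, and the sketch assumes that no key
appears in more than one input. Columnar encoding, delta stores, and on-disk
layout are ignored entirely, and `merge_rowsets` is a hypothetical name.

```python
import heapq


def merge_rowsets(rowsets):
    """Merge several key-sorted row lists into one key-sorted list."""
    return list(heapq.merge(*rowsets, key=lambda row: row[0]))


# Three small, individually sorted inputs with disjoint keys.
rowsets = [
    [(1, "a"), (4, "d")],
    [(2, "b"), (5, "e")],
    [(3, "c")],
]

merged = merge_rowsets(rowsets)
print(merged)
```

The merge is streaming: `heapq.merge` only looks at the head of each input, so
the inputs never need to be concatenated and re-sorted.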
== Write-ahead log GC
Kudu maintains a write-ahead log (WAL) per tablet that is split into discrete
fixed-size segments. A tablet periodically rolls the WAL to a new log segment
when the active segment reaches a configured size (controlled by
`--log_segment_size_mb`). In order to save disk space and decrease startup
time, a background task called `LogGCOp` attempts to garbage-collect (GC) old
WAL segments by deleting them from disk once it is determined that they are no
longer needed by the local node for durability.
The metrics associated with this background task have the prefix `log_gc`.
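The selection of segments to delete can be pictured with a simplified model.
The sketch below is an assumption-laden illustration rather than the logic
`LogGCOp` actually uses: it supposes each segment records the highest
operation index it contains and that a single hypothetical value,
`min_needed_index`, marks the earliest operation the local node still needs for
durability. The `Segment` class and `gc_candidates` function are invented for
the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seqno: int          # position of the segment in the log
    max_op_index: int   # highest operation index stored in the segment


def gc_candidates(segments, min_needed_index):
    """Return closed segments whose entries all precede min_needed_index.

    The last segment is treated as the active one and is never returned.
    """
    closed = sorted(segments, key=lambda s: s.seqno)[:-1]
    candidates = []
    for seg in closed:
        if seg.max_op_index >= min_needed_index:
            break  # keep this segment and everything after it
        candidates.append(seg)
    return candidates


segments = [Segment(1, 100), Segment(2, 200), Segment(3, 300), Segment(4, 350)]
for seg in gc_candidates(segments, min_needed_index=250):
    print("can delete segment", seg.seqno)
```

Only a prefix of the log is ever selected, so the remaining segments always
form a contiguous tail ending in the active segment.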
== Tablet history GC and the ancient history mark
Because Kudu uses a multiversion concurrency control (MVCC) mechanism to
ensure that snapshot scans can proceed isolated from new changes to a table,
old historical data should periodically be garbage-collected (removed) to free
up disk space. While Kudu never removes rows or data that are visible in the
latest version of the data, Kudu does remove records of old changes that are no
longer visible.
The point in time in the past beyond which historical MVCC data becomes
inaccessible and is free to be deleted is called the _ancient history mark_
(AHM). The AHM can be configured by setting `--tablet_history_max_age_sec`.
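The relationship between the configured maximum age and the AHM can be sketched
with simple arithmetic. This example assumes the AHM is the current time minus
the configured maximum history age and uses plain integer seconds in place of
Kudu's internal timestamps. The function names and the numeric values are
illustrative only.

```python
def ancient_history_mark(now_sec, history_max_age_sec):
    """Point in time before which historical changes may be discarded."""
    return now_sec - history_max_age_sec


def is_gc_eligible(change_time_sec, now_sec, history_max_age_sec):
    """True if a historical change is older than the ancient history mark."""
    return change_time_sec < ancient_history_mark(now_sec, history_max_age_sec)


# Illustrative values: "now" is an arbitrary clock reading, and the maximum
# age is seven days expressed in seconds.
NOW = 1_000_000
MAX_AGE = 7 * 24 * 60 * 60

print(ancient_history_mark(NOW, MAX_AGE))
print(is_gc_eligible(300_000, NOW, MAX_AGE))
print(is_gc_eligible(400_000, NOW, MAX_AGE))
```

In this model, a larger maximum age moves the AHM further into the past, which
keeps more history readable at the cost of more retained data.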
There are two background tasks that GC historical MVCC data older than the AHM:
the one that runs the merging compaction, called `CompactRowSetsOp` (see
above), and a separate background task that deletes old undo delta blocks,
called `UndoDeltaBlockGCOp`. Running `UndoDeltaBlockGCOp` reduces disk space
usage in all workloads, but particularly in those with a higher volume of
updates or upserts.
The metrics associated with this background task have the prefix