---
title: "Compaction"
weight: 7
type: docs
aliases:
- /primary-key-table/compaction.html
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Compaction
When more and more records are written into the LSM tree, the number of sorted runs increases. Because querying an
LSM tree requires combining all sorted runs, too many sorted runs will result in poor query performance, or even
out-of-memory errors.

To limit the number of sorted runs, several sorted runs have to be merged into one big sorted run from time to time.
This procedure is called compaction.

However, compaction is a resource-intensive procedure that consumes a certain amount of CPU time and disk IO, so overly
frequent compaction may in turn result in slower writes. It is a trade-off between query and write performance. Paimon
currently adopts a compaction strategy similar to RocksDB's [universal compaction](https://github.com/facebook/rocksdb/wiki/Universal-Compaction).

Compaction serves the following purposes:

1. Reducing the number of Level 0 files to avoid poor query performance.
2. Producing changelog via the [changelog-producer]({{< ref "primary-key-table/changelog-producer" >}}).
3. Producing deletion vectors for [MOW mode]({{< ref "primary-key-table/table-mode#merge-on-write" >}}).
4. Snapshot expiration, tag expiration, and partition expiration.

Limitation:

- Only one job can perform compaction on the same partition at a time; otherwise conflicts will occur, and one side will fail with an exception.

Writing performance is almost always affected by compaction, so tuning it is crucial.

## Asynchronous Compaction

Compaction is inherently asynchronous, but if you want it to be completely asynchronous and never block writing,
expecting a mode with maximum write throughput, compaction can be done slowly, without hurry.
You can use the following strategies for your table:

```shell
num-sorted-run.stop-trigger = 2147483647
sort-spill-threshold = 10
lookup-wait = false
```
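
For example, assuming an existing table named `my_table` (a placeholder) in a Paimon catalog, these options can be
applied with Flink SQL:

```sql
-- A minimal sketch: make compaction fully asynchronous for maximum
-- write throughput. 'my_table' is a placeholder table name.
ALTER TABLE my_table SET (
    'num-sorted-run.stop-trigger' = '2147483647',
    'sort-spill-threshold' = '10',
    'lookup-wait' = 'false'
);
```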

This configuration will generate more files during peak write periods and gradually merge them for optimal read
performance during low write periods.

## Dedicated compaction job

In general, if you expect multiple jobs to write to the same table, you need to separate compaction from writing. You
can use a [dedicated compaction job]({{< ref "maintenance/dedicated-compaction#dedicated-compaction-job" >}}).

## Record-Level Expire

In compaction, you can configure a record-level expiration time to expire records. You should configure:

1. `'record-level.expire-time'`: the retention time for records.
2. `'record-level.time-field'`: the time field used for record-level expiration.

Expiration happens during compaction, and there is no strong guarantee that records will expire in time.
You can trigger a full compaction manually to expire records that were not expired in time.
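
A minimal sketch of enabling record-level expiration (the table name, retention time, and time field are all
illustrative placeholders):

```sql
-- Records whose 'event_time' is older than 7 days become eligible
-- for expiration during compaction. Names and values are illustrative.
ALTER TABLE my_table SET (
    'record-level.expire-time' = '7 d',
    'record-level.time-field' = 'event_time'
);
```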

## Full Compaction

Paimon compaction uses [universal compaction](https://github.com/facebook/rocksdb/wiki/Universal-Compaction).
By default, when there is too much incremental data, full compaction will be performed automatically. You usually don't
have to worry about it.

Paimon also provides configurations that allow regular execution of full compaction:

1. `'compaction.optimization-interval'`: how often to perform an optimizing full compaction. This configuration is used
   to ensure the query timeliness of the read-optimized system table.
2. `'full-compaction.delta-commits'`: full compaction will be triggered after this many delta commits. Its disadvantage
   is that compaction can only be performed synchronously, which will affect writing efficiency.
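
For example, to trigger a full compaction after every 10 delta commits (a sketch; the table name and the value are
illustrative):

```sql
-- Trigger a synchronous full compaction every 10 delta commits.
-- '10' is an illustrative value; tune it against your write latency needs.
ALTER TABLE my_table SET ('full-compaction.delta-commits' = '10');
```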

## Lookup Compaction

When a primary key table is configured with the `lookup` [changelog producer]({{< ref "primary-key-table/changelog-producer" >}})
or the `first-row` [merge-engine]({{< ref "primary-key-table/merge-engine" >}}),
or has `deletion vectors` enabled for [MOW mode]({{< ref "primary-key-table/table-mode#merge-on-write" >}}), Paimon will
use a radical compaction strategy that forces level 0 files to be compacted to higher levels on every compaction trigger.

Paimon also provides configurations to tune the frequency of this compaction:

1. `'lookup-compact'`: the compact mode used for lookup compaction. Possible values: `radical` uses the
   `ForceUpLevel0Compaction` strategy to radically compact new files; `gentle` uses the `UniversalCompaction` strategy
   to gently compact new files.
2. `'lookup-compact.max-interval'`: the maximum interval before a forced L0 lookup compaction is triggered in `gentle`
   mode. This option is only valid when the `lookup-compact` mode is `gentle`.

By configuring `'lookup-compact'` as `gentle`, new files in L0 will not be compacted immediately. This may greatly
reduce the overall resource usage, at the expense of worse data freshness in certain cases.
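
A sketch of switching to the gentler mode (the table name and interval value are illustrative):

```sql
-- Delay L0 compaction to save resources, but force a lookup compaction
-- at least every 15 minutes. The interval is an illustrative value.
ALTER TABLE my_table SET (
    'lookup-compact' = 'gentle',
    'lookup-compact.max-interval' = '15 m'
);
```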

## Compaction Options

### Number of Sorted Runs to Pause Writing

When the number of sorted runs is small, Paimon writers perform compaction asynchronously in separate threads, so
records can be continuously written into the table. However, to avoid unbounded growth in the number of sorted runs,
writers pause writing when the number of sorted runs hits the threshold. The following table property determines
the threshold.

<table class="table table-bordered">
    <thead>
    <tr>
      <th class="text-left" style="width: 20%">Option</th>
      <th class="text-left" style="width: 5%">Required</th>
      <th class="text-left" style="width: 5%">Default</th>
      <th class="text-left" style="width: 10%">Type</th>
      <th class="text-left" style="width: 60%">Description</th>
    </tr>
    </thead>
    <tbody>
    <tr>
      <td><h5>num-sorted-run.stop-trigger</h5></td>
      <td>No</td>
      <td style="word-wrap: break-word;">(none)</td>
      <td>Integer</td>
      <td>The number of sorted runs that triggers the stopping of writes; the default value is 'num-sorted-run.compaction-trigger' + 3.</td>
    </tr>
    </tbody>
</table>

Write stalls will become less frequent as `num-sorted-run.stop-trigger` becomes larger, thus improving writing
performance. However, if this value becomes too large, more memory and CPU time will be needed when querying the
table. If you are concerned about OOM problems, please configure the following option.
Its value depends on your memory size.

<table class="table table-bordered">
    <thead>
    <tr>
      <th class="text-left" style="width: 20%">Option</th>
      <th class="text-left" style="width: 5%">Required</th>
      <th class="text-left" style="width: 5%">Default</th>
      <th class="text-left" style="width: 10%">Type</th>
      <th class="text-left" style="width: 60%">Description</th>
    </tr>
    </thead>
    <tbody>
    <tr>
      <td><h5>sort-spill-threshold</h5></td>
      <td>No</td>
      <td style="word-wrap: break-word;">(none)</td>
      <td>Integer</td>
      <td>If the maximum number of sort readers exceeds this value, a spill will be attempted. This prevents too many readers from consuming too much memory and causing OOM.</td>
    </tr>
    </tbody>
</table>
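
For example, a larger stop trigger can be combined with a spill threshold to keep query memory bounded (a sketch; the
values are illustrative):

```sql
-- Tolerate more sorted runs before stalling writes, while capping
-- the number of in-memory sort readers by spilling to disk.
ALTER TABLE my_table SET (
    'num-sorted-run.stop-trigger' = '10',
    'sort-spill-threshold' = '10'
);
```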

### Number of Sorted Runs to Trigger Compaction

Paimon uses an [LSM tree]({{< ref "primary-key-table/overview#lsm-trees" >}}), which supports a large number of updates. The LSM tree organizes files into several [sorted runs]({{< ref "primary-key-table/overview#sorted-runs" >}}). When querying records from an LSM tree, all sorted runs must be combined to produce a complete view of all records.

Too many sorted runs will result in poor query performance. To keep the number of sorted runs in a reasonable range, Paimon writers will automatically perform [compactions]({{< ref "primary-key-table/compaction" >}}). The following table property determines the minimum number of sorted runs to trigger a compaction.

<table class="table table-bordered">
    <thead>
    <tr>
      <th class="text-left" style="width: 20%">Option</th>
      <th class="text-left" style="width: 5%">Required</th>
      <th class="text-left" style="width: 5%">Default</th>
      <th class="text-left" style="width: 10%">Type</th>
      <th class="text-left" style="width: 60%">Description</th>
    </tr>
    </thead>
    <tbody>
    <tr>
      <td><h5>num-sorted-run.compaction-trigger</h5></td>
      <td>No</td>
      <td style="word-wrap: break-word;">5</td>
      <td>Integer</td>
      <td>The number of sorted runs that triggers compaction. This includes level 0 files (one sorted run per file) and high-level runs (one sorted run per level).</td>
    </tr>
    </tbody>
</table>

Compaction will become less frequent as `num-sorted-run.compaction-trigger` becomes larger, thus improving writing performance. However, if this value becomes too large, more memory and CPU time will be needed when querying the table. This is a trade-off between writing and query performance.
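
For example, to make compaction less frequent and favor write performance (a sketch; the value is illustrative):

```sql
-- Allow up to 10 sorted runs before compaction kicks in,
-- trading some query performance for write performance.
ALTER TABLE my_table SET ('num-sorted-run.compaction-trigger' = '10');
```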