| <!DOCTYPE html> |
| <html> |
| <head> |
| <meta charset="utf-8"> |
| <meta name="viewport" content="width=device-width, initial-scale=1"> |
| <title>Pegasus | Pegasus 的 last_flushed_decree</title> |
| <link rel="stylesheet" href="/assets/css/app.css"> |
| <link rel="shortcut icon" href="/assets/images/favicon.ico"> |
| <link rel="stylesheet" href="/assets/css/utilities.min.css"> |
| <link rel="stylesheet" href="/assets/css/docsearch.v3.css"> |
| <script src="/assets/js/jquery.min.js"></script> |
| <script src="/assets/js/all.min.js"></script> |
| <script src="/assets/js/docsearch.v3.js"></script> |
| <!-- Begin Jekyll SEO tag v2.8.0 --> |
| <title>Pegasus 的 last_flushed_decree | Pegasus</title> |
| <meta name="generator" content="Jekyll v4.3.3" /> |
| <meta property="og:title" content="Pegasus 的 last_flushed_decree" /> |
| <meta name="author" content="吴涛" /> |
| <meta property="og:locale" content="en_US" /> |
| <meta name="description" content="本文主要为大家梳理 last_flushed_decree 的原理。" /> |
| <meta property="og:description" content="本文主要为大家梳理 last_flushed_decree 的原理。" /> |
| <meta property="og:site_name" content="Pegasus" /> |
| <meta property="og:type" content="article" /> |
| <meta property="article:published_time" content="2018-03-07T00:00:00+00:00" /> |
| <meta name="twitter:card" content="summary" /> |
| <meta property="twitter:title" content="Pegasus 的 last_flushed_decree" /> |
| <script type="application/ld+json"> |
| {"@context":"https://schema.org","@type":"BlogPosting","author":{"@type":"Person","name":"吴涛"},"dateModified":"2018-03-07T00:00:00+00:00","datePublished":"2018-03-07T00:00:00+00:00","description":"本文主要为大家梳理 last_flushed_decree 的原理。","headline":"Pegasus 的 last_flushed_decree","mainEntityOfPage":{"@type":"WebPage","@id":"/2018/03/07/last_flushed_decree.html"},"url":"/2018/03/07/last_flushed_decree.html"}</script> |
| <!-- End Jekyll SEO tag --> |
| </head> |
| |
| <body> |
| |
| |
| |
| |
| <nav class="navbar is-info"> |
| <div class="container"> |
| <!--container will be unwrapped when it's in docs--> |
| <div class="navbar-brand"> |
| <a href="/" class="navbar-item "> |
| <!-- Pegasus Icon --> |
| <img src="/assets/images/pegasus.svg"> |
| </a> |
| <div class="navbar-item"> |
| <a href="/docs" class="button is-primary is-outlined is-inverted"> |
| <span class="icon"><i class="fas fa-book"></i></span> |
| <span>Docs</span> |
| </a> |
| </div> |
| <div class="navbar-item is-hidden-desktop"> |
| |
| |
| <!--A simple language switch button that only supports zh and en.--> |
| <!--IF its language is zh, then switches to en.--> |
| |
| <a class="button is-primary is-outlined is-inverted" href="/zh/2018/03/07/last_flushed_decree.html"><strong>中</strong></a> |
| |
| </div> |
| <a role="button" class="navbar-burger burger" aria-label="menu" aria-expanded="false" data-target="navMenu"> |
| <!-- Appears in mobile mode only --> |
| <span aria-hidden="true"></span> |
| <span aria-hidden="true"></span> |
| <span aria-hidden="true"></span> |
| </a> |
| </div> |
| <div class="navbar-menu" id="navMenu"> |
| <div class="navbar-end"> |
| |
| <!--dropdown--> |
| <div class="navbar-item has-dropdown is-hoverable "> |
| <a href="" |
| class="navbar-link "> |
| |
| <span class="icon" style="margin-right: .25em"> |
| <i class="fas fa-users"></i> |
| </span> |
| |
| <span> |
| ASF |
| </span> |
| </a> |
| <div class="navbar-dropdown"> |
| |
| <a href="https://www.apache.org/" |
| class="navbar-item "> |
| Foundation |
| </a> |
| |
| <a href="https://www.apache.org/licenses/" |
| class="navbar-item "> |
| License |
| </a> |
| |
| <a href="https://www.apache.org/events/current-event.html" |
| class="navbar-item "> |
| Events |
| </a> |
| |
| <a href="https://www.apache.org/foundation/sponsorship.html" |
| class="navbar-item "> |
| Sponsorship |
| </a> |
| |
| <a href="https://www.apache.org/security/" |
| class="navbar-item "> |
| Security |
| </a> |
| |
| <a href="https://privacy.apache.org/policies/privacy-policy-public.html" |
| class="navbar-item "> |
| Privacy |
| </a> |
| |
| <a href="https://www.apache.org/foundation/thanks.html" |
| class="navbar-item "> |
| Thanks |
| </a> |
| |
| </div> |
| </div> |
| |
| |
| <!--dropdown--> |
| <div class="navbar-item has-dropdown is-hoverable "> |
| <a href="/community" |
| class="navbar-link "> |
| |
| <span class="icon" style="margin-right: .25em"> |
| <i class="fas fa-user-plus"></i> |
| </span> |
| |
| <span> |
| Community |
| </span> |
| </a> |
| <div class="navbar-dropdown"> |
| |
| <a href="/community/#contact-us" |
| class="navbar-item "> |
| Contact Us |
| </a> |
| |
| <a href="/community/#contribution" |
| class="navbar-item "> |
| Contribution |
| </a> |
| |
| <a href="https://cwiki.apache.org/confluence/display/PEGASUS/Coding+guides" |
| class="navbar-item "> |
| Coding Guides |
| </a> |
| |
| <a href="https://github.com/apache/incubator-pegasus/issues?q=is%3Aissue+is%3Aopen+label%3Atype%2Fbug" |
| class="navbar-item "> |
| Bug Tracking |
| </a> |
| |
| <a href="https://cwiki.apache.org/confluence/display/INCUBATOR/PegasusProposal" |
| class="navbar-item "> |
| Apache Proposal |
| </a> |
| |
| </div> |
| </div> |
| |
| |
| |
| <a href="/blogs" |
| class="navbar-item "> |
| |
| <span class="icon" style="margin-right: .25em"> |
| <i class="fas fa-rss"></i> |
| </span> |
| |
| <span>Blog</span> |
| </a> |
| |
| |
| |
| <a href="/docs/downloads" |
| class="navbar-item "> |
| |
| <span class="icon" style="margin-right: .25em"> |
| <i class="fas fa-fire"></i> |
| </span> |
| |
| <span>Releases</span> |
| </a> |
| |
| |
| </div> |
| <div class="navbar-item is-hidden-mobile"> |
| |
| |
| <!--A simple language switch button that only supports zh and en.--> |
| <!--IF its language is zh, then switches to en.--> |
| |
| <a class="button is-primary is-outlined is-inverted" href="/zh/2018/03/07/last_flushed_decree.html"><strong>中</strong></a> |
| |
| </div> |
| </div> |
| </div> |
| </nav> |
| |
| <section class="section"> |
| <div class="container"> |
| <div class="columns is-multiline"> |
| <div class="column is-one-fourth"> |
| |
| </div> |
| <div class="column is-half"> |
| |
| <div class="content"> |
| |
| <ul class="blog-post-meta"> |
| <li class="blog-post-meta-item"> |
| <span class="icon" style="margin-right: .25em"> |
| <i class="far fa-calendar-check" aria-hidden="true"></i> |
| </span> |
| March 7, 2018 |
| </li> |
| <li class="blog-post-meta-item"> |
| <span class="icon" style="margin-right: .25em"> |
| <i class="fas fa-edit" aria-hidden="true"></i> |
| </span> |
| 吴涛 |
| </li> |
| </ul> |
| |
| <p>本文主要为大家梳理 <code class="language-plaintext highlighter-rouge">last_flushed_decree</code> 的原理。</p> |
| |
| <hr /> |
| |
| <p>一般的强一致性存储分为 <strong>replicated log</strong> 和 <strong>db storage</strong> 两层。replicated log 用于日志的复制,通过一致性协议(如 PacificA)进行组间复制同步,日志同步完成后,数据方可写入 db storage。通常来讲,在数据写入 db storage 之后,与其相对应的那一条日志即可被删除。因为 db storage 具备持久性,既然 db storage 中已经存有一份数据,在日志中就不需要再留一份。为了避免日志占用空间过大,我们需要定期删除日志,这一过程被称为 <strong>log compaction</strong>。</p> |
| |
| <p>这个简单的过程在 pegasus 中,问题稍微复杂了一些。</p> |
| |
| <p>首先 pegasus 在使用 rocksdb 时,关闭了其 write-ahead-log,这样写操作就只会直接落到不具备持久性的 memtable。显然,当数据尚未从 memtable 落至 sstable 时,日志是不可随便清理的。因此,pegasus 在 rocksdb 内部维护了一个 <code class="language-plaintext highlighter-rouge">last_flushed_decree</code>,当数据从 memtable 写落至 sstable 时,它就会更新,表示从〔0, last_flushed_decree〕之间的日志都可以被清除。</p> |
| |
| <p>故事到了这里还要再加一层复杂性:有一些日志只是心跳(<code class="language-plaintext highlighter-rouge">WRITE_EMPTY</code>),它们不含有任何数据。我们<strong>把心跳写入日志中</strong>,可以避免某个表 |
| 长时间无数据写,日志无法被清理的情况,同时也可以起到坏节点检测的作用。许多一致性协议(如 Raft)都会将心跳写入日志,这里不做赘述。</p> |
| |
| <p><strong>但心跳是否需要写入 rocksdb 呢?</strong></p> |
| |
| <p>这里讲一下架构,每个 pegasus 的 replica server 上都有许多分片,每个分片拥有一个 rocksdb 实例,而每个 rocksdb 维护一个 <code class="language-plaintext highlighter-rouge">last_flushed_decree</code>。所有的实例都会写入同一个日志,这被称为 shared log。每个实例自己会单独写一个 WAL,被称为 private log。复杂点在 <strong>shared log</strong>。</p> |
| |
| <pre><code class="language-txt"><r:1 d:1> 表示 replica id 为 1 的实例所写入的 decree = 1 的日志 |
| |
| 0 1 2 3 4 5 |
| <r:1 d:1> <r:2 d:1> <r:2 d:2> <r:2 d:3> <r:2 d:4> <r:2 d:5> |
| </code></pre> |
| |
| <p>可以看到,r1 写入 1 条日志后,r2 不断地写入 5 条日志。假设 r2 的 <code class="language-plaintext highlighter-rouge">last_flushed_decree = 5</code>,那么当前 shared_log 应当将 [0, 5] 的日志全部删掉,即删掉从 <code class="language-plaintext highlighter-rouge"><r:1 d:1></code> 到 <code class="language-plaintext highlighter-rouge"><r:2 d:5></code>。</p> |
| |
| <p>这时候问题来了:如果 <code class="language-plaintext highlighter-rouge"><r:1 d:1></code> 是一个心跳请求,且不写 rocksdb 的话,那就意味着 r1 的 last_flushed_decree = 0,也就意味着 <code class="language-plaintext highlighter-rouge"><r:1 d:1></code> 不可被删。这就给我们带来了困扰,因为日志只能 “前缀删除”,即只能删除 [0, 5],不能删除 [1, 5]。</p> |
| |
| <p>如果 r1 长时间没有数据写入,而 r2 长时间有较大吞吐,那么 shared log 可能会因为 r1 而无法清理,造成磁盘空间不足的情况。 |
| 这个问题是 shared log 的一个弊端。因此我们在设计上选择将每次心跳都写入 <code class="language-plaintext highlighter-rouge">rocksdb</code>,这样就能及时更新 <code class="language-plaintext highlighter-rouge">last_flushed_decree</code>, |
| shared log 也可以及时被删除。 |
| 如何将一个没有任何数据的心跳 “写入” rocksdb 呢?实际上我们也仅仅只是写入一个 <code class="language-plaintext highlighter-rouge">key=""</code>,<code class="language-plaintext highlighter-rouge">value=""</code> 的记录,这对系统几乎没有开销。</p> |
| |
| <p>但如果我们没有 shared log 呢?假设我们仅使用 private log 作为唯一的 WAL 存储,那么 rocksdb 虽然仍需维护 <code class="language-plaintext highlighter-rouge">last_flushed_decree</code>, |
| 但并不需要处理心跳,这一定程度上可以减少写路径的复杂度。</p> |
| |
| </div> |
| |
| <div class="tags"> |
| |
| </div> |
| |
| </div> |
| <div class="column is-one-fourth is-hidden-mobile" style="padding-left: 3rem"> |
| |
| <p class="menu-label"> |
| <span class="icon"> |
| <i class="fa fa-bars" aria-hidden="true"></i> |
| </span> |
| Table of contents |
| </p> |
| |
| |
| |
| </div> |
| </div> |
| </div> |
| </section> |
| <footer class="footer"> |
| <div class="container"> |
| <div class="content is-small has-text-centered"> |
| <div style="margin-bottom: 20px;"> |
| <a href="http://incubator.apache.org"> |
| <img src="/assets/images/egg-logo.png" |
| width="15%" |
| alt="Apache Incubator"/> |
| </a> |
| </div> |
| Copyright © 2023 <a href="http://www.apache.org">The Apache Software Foundation</a>. |
| Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version |
| 2.0</a>. |
| <br><br> |
| |
| Apache Pegasus is an effort undergoing incubation at The Apache Software Foundation (ASF), |
| sponsored by the Apache Incubator. Incubation is required of all newly accepted projects |
| until a further review indicates that the infrastructure, communications, and decision making process |
| have stabilized in a manner consistent with other successful ASF projects. While incubation status is |
| not necessarily a reflection of the completeness or stability of the code, it does indicate that the |
| project has yet to be fully endorsed by the ASF. |
| |
| <br><br> |
| Apache Pegasus, Pegasus, Apache, the Apache feather logo, and the Apache Pegasus project logo are either |
| registered trademarks or trademarks of The Apache Software Foundation in the United States and other |
| countries. |
| </div> |
| </div> |
| </footer> |
| <script src="/assets/js/app.js" type="text/javascript"></script> |
| </body> |
| </html> |