| <!DOCTYPE html> |
| <html> |
| <head> |
| <meta charset="utf-8"> |
| <meta name="viewport" content="width=device-width, initial-scale=1"> |
| <title>Pegasus | Table Migration</title> |
| <link rel="stylesheet" href="/zh/assets/css/app.css"> |
| <link rel="shortcut icon" href="/zh/assets/images/favicon.ico"> |
| <link rel="stylesheet" href="/zh/assets/css/utilities.min.css"> |
| <link rel="stylesheet" href="/zh/assets/css/docsearch.v3.css"> |
| <script src="/assets/js/jquery.min.js"></script> |
| <script src="/assets/js/all.min.js"></script> |
| <script src="/assets/js/docsearch.v3.js"></script> |
| <!-- Begin Jekyll SEO tag v2.8.0 --> |
| <title>Table Migration | Pegasus</title> |
| <meta name="generator" content="Jekyll v4.4.1" /> |
| <meta property="og:title" content="Table Migration" /> |
| <meta property="og:locale" content="en_US" /> |
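| <meta name="description" content="Table migration means moving all data of one table from a Pegasus cluster to another Pegasus cluster." /> |
| <meta property="og:description" content="Table migration means moving all data of one table from a Pegasus cluster to another Pegasus cluster." /> |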
| <meta property="og:site_name" content="Pegasus" /> |
| <meta property="og:type" content="article" /> |
| <meta property="article:published_time" content="2025-04-18T04:14:19+00:00" /> |
| <meta name="twitter:card" content="summary" /> |
| <meta property="twitter:title" content="Table Migration" /> |
| <script type="application/ld+json"> |
| {"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2025-04-18T04:14:19+00:00","datePublished":"2025-04-18T04:14:19+00:00","description":"Table migration means moving all data of one table from a Pegasus cluster to another Pegasus cluster.","headline":"Table Migration","mainEntityOfPage":{"@type":"WebPage","@id":"/administration/table-migration"},"url":"/administration/table-migration"}</script> |
| <!-- End Jekyll SEO tag --> |
| |
| </head> |
| |
| |
| <body> |
| <div class="dashboard is-full-height"> |
| <!-- left panel --> |
| <div class="dashboard-panel is-medium is-hidden-mobile pl-0"> |
| <div class="dashboard-panel-header has-text-centered"> |
| <a href="/zh/"> |
| <img src="/assets/images/pegasus-logo-inv.png" style="width: 80%;"> |
| </a> |
| |
| </div> |
| <div class="dashboard-panel-main is-scrollable pl-6"> |
| |
| |
| <aside class="menu"> |
| |
| <p class="menu-label">Pegasus Documentation</p> |
| <ul class="menu-list"> |
| |
| <li> |
| <a href="/zh/docs/downloads" |
| class=""> |
| Downloads |
| </a> |
| </li> |
| |
| </ul> |
| |
| <p class="menu-label">Build</p> |
| <ul class="menu-list"> |
| |
| <li> |
| <a href="/zh/docs/build/compile-by-docker" |
| class=""> |
| Build with Docker (recommended) |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/docs/build/compile-from-source" |
| class=""> |
| Build from Source |
| </a> |
| </li> |
| |
| </ul> |
| |
| <p class="menu-label">Client Libraries</p> |
| <ul class="menu-list"> |
| |
| <li> |
| <a href="/zh/clients/java-client" |
| class=""> |
| Java Client |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/clients/cpp-client" |
| class=""> |
| C++ Client |
| </a> |
| </li> |
| |
| <li> |
| <a href="https://github.com/apache/incubator-pegasus/tree/master/go-client" |
| class=""> |
| Golang Client |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/clients/python-client" |
| class=""> |
| Python Client |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/clients/node-client" |
| class=""> |
| NodeJS Client |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/clients/scala-client" |
| class=""> |
| Scala Client |
| </a> |
| </li> |
| |
| </ul> |
| |
| <p class="menu-label">Ecosystem Tools</p> |
| <ul class="menu-list"> |
| |
| <li> |
| <a href="/zh/docs/tools/shell" |
| class=""> |
| Pegasus Shell Tool |
| </a> |
| </li> |
| |
| <li> |
| <a href="https://github.com/pegasus-kv/admin-cli" |
| class=""> |
| Cluster Administration CLI |
| </a> |
| </li> |
| |
| <li> |
| <a href="https://github.com/pegasus-kv/pegic" |
| class=""> |
| Data Access CLI |
| </a> |
| </li> |
| |
| </ul> |
| |
| <p class="menu-label">User Interfaces</p> |
| <ul class="menu-list"> |
| |
| <li> |
| <a href="/zh/api/ttl" |
| class=""> |
| TTL |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/api/single-atomic" |
| class=""> |
| Single-Row Atomic Operations |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/api/redis" |
| class=""> |
| Redis Adapter |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/api/geo" |
| class=""> |
| GEO Support |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/api/http" |
| class=""> |
| HTTP API |
| </a> |
| </li> |
| |
| </ul> |
| |
| <p class="menu-label">Administration</p> |
| <ul class="menu-list"> |
| |
| <li> |
| <a href="/zh/administration/deployment" |
| class=""> |
| Cluster Deployment |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/config" |
| class=""> |
| Configuration |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/rebalance" |
| class=""> |
| Rebalance |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/monitoring" |
| class=""> |
| Monitoring |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/rolling-update" |
| class=""> |
| Cluster Restart and Upgrade |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/scale-in-out" |
| class=""> |
| Cluster Scale-out and Scale-in |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/resource-management" |
| class=""> |
| Resource Management |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/cold-backup" |
| class=""> |
| Cold Backup |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/meta-recovery" |
| class=""> |
| Metadata Recovery |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/replica-recovery" |
| class=""> |
| Replica Data Recovery |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/zk-migration" |
| class=""> |
| Zookeeper Migration |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/table-migration" |
| class="is-active"> |
| Table Migration |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/table-soft-delete" |
| class=""> |
| Table Soft Delete |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/table-env" |
| class=""> |
| Table Environment Variables |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/remote-commands" |
| class=""> |
| Remote Commands |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/partition-split" |
| class=""> |
| Partition-Split |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/duplication" |
| class=""> |
| Duplication |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/compression" |
| class=""> |
| Data Compression |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/throttling" |
| class=""> |
| Throttling |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/experiences" |
| class=""> |
| Operational Experience |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/manual-compact" |
| class=""> |
| Manual Compact |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/usage-scenario" |
| class=""> |
| Usage Scenario |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/bad-disk" |
| class=""> |
| Bad Disk Repair |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/whitelist" |
| class=""> |
| Replica Server Whitelist |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/backup-request" |
| class=""> |
| Backup Request |
| </a> |
| </li> |
| |
| <li> |
| <a href="/zh/administration/hotspot-detection" |
| class=""> |
| Hotspot Detection |
| </a> |
| </li> |
| |
| </ul> |
| |
| </aside> |
| </div> |
| </div> |
| |
| <!-- main section --> |
| <div class="dashboard-main is-scrollable"> |
| <nav class="navbar is-hidden-desktop"> |
| <div class="navbar-brand"> |
| <a href="/zh/" class="navbar-item"> |
| <!-- Pegasus Icon --> |
| <img src="/assets/images/pegasus-square.png"> |
| </a> |
| <div class="navbar-item"> |
| |
| |
| <!--A simple language switch button that only supports zh and en.--> |
| <!--IF its language is zh, then switches to en.--> |
| |
| <!--If you don't want a url to be relativized, you can add a space explicitly into the href to |
| prevents a url from being relativized by polyglot.--> |
| <a class="button is-light is-outlined is-inverted" href=" /administration/table-migration"><strong>En</strong></a> |
| |
| </div> |
| <a role="button" class="navbar-burger burger" aria-label="menu" aria-expanded="false" data-target="navMenu"> |
| <!-- Appears in mobile mode only --> |
| <span aria-hidden="true"></span> |
| <span aria-hidden="true"></span> |
| <span aria-hidden="true"></span> |
| </a> |
| </div> |
| <div class="navbar-menu" id="navMenu"> |
| <div class="navbar-end"> |
| |
| <!--dropdown--> |
| <div class="navbar-item has-dropdown is-hoverable"> |
| <a href="" |
| class="navbar-link "> |
| <span> |
| Pegasus Documentation |
| </span> |
| </a> |
| <div class="navbar-dropdown"> |
| |
| <a href="/zh/docs/downloads" |
| class="navbar-item "> |
| Downloads |
| </a> |
| |
| </div> |
| </div> |
| |
| <!--dropdown--> |
| <div class="navbar-item has-dropdown is-hoverable"> |
| <a href="" |
| class="navbar-link "> |
| <span> |
| Build |
| </span> |
| </a> |
| <div class="navbar-dropdown"> |
| |
| <a href="/zh/docs/build/compile-by-docker" |
| class="navbar-item "> |
| Build with Docker (recommended) |
| </a> |
| |
| <a href="/zh/docs/build/compile-from-source" |
| class="navbar-item "> |
| Build from Source |
| </a> |
| |
| </div> |
| </div> |
| |
| <!--dropdown--> |
| <div class="navbar-item has-dropdown is-hoverable"> |
| <a href="" |
| class="navbar-link "> |
| <span> |
| Client Libraries |
| </span> |
| </a> |
| <div class="navbar-dropdown"> |
| |
| <a href="/zh/clients/java-client" |
| class="navbar-item "> |
| Java Client |
| </a> |
| |
| <a href="/zh/clients/cpp-client" |
| class="navbar-item "> |
| C++ Client |
| </a> |
| |
| <a href="https://github.com/apache/incubator-pegasus/tree/master/go-client" |
| class="navbar-item "> |
| Golang Client |
| </a> |
| |
| <a href="/zh/clients/python-client" |
| class="navbar-item "> |
| Python Client |
| </a> |
| |
| <a href="/zh/clients/node-client" |
| class="navbar-item "> |
| NodeJS Client |
| </a> |
| |
| <a href="/zh/clients/scala-client" |
| class="navbar-item "> |
| Scala Client |
| </a> |
| |
| </div> |
| </div> |
| |
| <!--dropdown--> |
| <div class="navbar-item has-dropdown is-hoverable"> |
| <a href="" |
| class="navbar-link "> |
| <span> |
| Ecosystem Tools |
| </span> |
| </a> |
| <div class="navbar-dropdown"> |
| |
| <a href="/zh/docs/tools/shell" |
| class="navbar-item "> |
| Pegasus Shell Tool |
| </a> |
| |
| <a href="https://github.com/pegasus-kv/admin-cli" |
| class="navbar-item "> |
| Cluster Administration CLI |
| </a> |
| |
| <a href="https://github.com/pegasus-kv/pegic" |
| class="navbar-item "> |
| Data Access CLI |
| </a> |
| |
| </div> |
| </div> |
| |
| <!--dropdown--> |
| <div class="navbar-item has-dropdown is-hoverable"> |
| <a href="" |
| class="navbar-link "> |
| <span> |
| User Interfaces |
| </span> |
| </a> |
| <div class="navbar-dropdown"> |
| |
| <a href="/zh/api/ttl" |
| class="navbar-item "> |
| TTL |
| </a> |
| |
| <a href="/zh/api/single-atomic" |
| class="navbar-item "> |
| Single-Row Atomic Operations |
| </a> |
| |
| <a href="/zh/api/redis" |
| class="navbar-item "> |
| Redis Adapter |
| </a> |
| |
| <a href="/zh/api/geo" |
| class="navbar-item "> |
| GEO Support |
| </a> |
| |
| <a href="/zh/api/http" |
| class="navbar-item "> |
| HTTP API |
| </a> |
| |
| </div> |
| </div> |
| |
| <!--dropdown--> |
| <div class="navbar-item has-dropdown is-hoverable"> |
| <a href="" |
| class="navbar-link "> |
| <span> |
| Administration |
| </span> |
| </a> |
| <div class="navbar-dropdown"> |
| |
| <a href="/zh/administration/deployment" |
| class="navbar-item "> |
| Cluster Deployment |
| </a> |
| |
| <a href="/zh/administration/config" |
| class="navbar-item "> |
| Configuration |
| </a> |
| |
| <a href="/zh/administration/rebalance" |
| class="navbar-item "> |
| Rebalance |
| </a> |
| |
| <a href="/zh/administration/monitoring" |
| class="navbar-item "> |
| Monitoring |
| </a> |
| |
| <a href="/zh/administration/rolling-update" |
| class="navbar-item "> |
| Cluster Restart and Upgrade |
| </a> |
| |
| <a href="/zh/administration/scale-in-out" |
| class="navbar-item "> |
| Cluster Scale-out and Scale-in |
| </a> |
| |
| <a href="/zh/administration/resource-management" |
| class="navbar-item "> |
| Resource Management |
| </a> |
| |
| <a href="/zh/administration/cold-backup" |
| class="navbar-item "> |
| Cold Backup |
| </a> |
| |
| <a href="/zh/administration/meta-recovery" |
| class="navbar-item "> |
| Metadata Recovery |
| </a> |
| |
| <a href="/zh/administration/replica-recovery" |
| class="navbar-item "> |
| Replica Data Recovery |
| </a> |
| |
| <a href="/zh/administration/zk-migration" |
| class="navbar-item "> |
| Zookeeper Migration |
| </a> |
| |
| <a href="/zh/administration/table-migration" |
| class="navbar-item is-active"> |
| Table Migration |
| </a> |
| |
| <a href="/zh/administration/table-soft-delete" |
| class="navbar-item "> |
| Table Soft Delete |
| </a> |
| |
| <a href="/zh/administration/table-env" |
| class="navbar-item "> |
| Table Environment Variables |
| </a> |
| |
| <a href="/zh/administration/remote-commands" |
| class="navbar-item "> |
| Remote Commands |
| </a> |
| |
| <a href="/zh/administration/partition-split" |
| class="navbar-item "> |
| Partition-Split |
| </a> |
| |
| <a href="/zh/administration/duplication" |
| class="navbar-item "> |
| Duplication |
| </a> |
| |
| <a href="/zh/administration/compression" |
| class="navbar-item "> |
| Data Compression |
| </a> |
| |
| <a href="/zh/administration/throttling" |
| class="navbar-item "> |
| Throttling |
| </a> |
| |
| <a href="/zh/administration/experiences" |
| class="navbar-item "> |
| Operational Experience |
| </a> |
| |
| <a href="/zh/administration/manual-compact" |
| class="navbar-item "> |
| Manual Compact |
| </a> |
| |
| <a href="/zh/administration/usage-scenario" |
| class="navbar-item "> |
| Usage Scenario |
| </a> |
| |
| <a href="/zh/administration/bad-disk" |
| class="navbar-item "> |
| Bad Disk Repair |
| </a> |
| |
| <a href="/zh/administration/whitelist" |
| class="navbar-item "> |
| Replica Server Whitelist |
| </a> |
| |
| <a href="/zh/administration/backup-request" |
| class="navbar-item "> |
| Backup Request |
| </a> |
| |
| <a href="/zh/administration/hotspot-detection" |
| class="navbar-item "> |
| Hotspot Detection |
| </a> |
| |
| </div> |
| </div> |
| |
| </div> |
| </div> |
| </nav> |
| |
| <nav class="navbar is-hidden-mobile"> |
| <div class="navbar-start w-full"> |
| <div class="navbar-item pl-0 w-full"> |
| <!--TODO(wutao): Given the limitation of docsearch that couldn't handle multiple input, |
| I make searchbox only shown in desktop. Fix this issue when docsearch.js v3 released. |
| Related issue: https://github.com/algolia/docsearch/issues/230--> |
| <div id="docsearch"></div> |
| </div> |
| </div> |
| <div class="navbar-end"> |
| <div class="navbar-item"> |
| |
| |
| <!--A simple language switch button that only supports zh and en.--> |
| <!--IF its language is zh, then switches to en.--> |
| |
| <!--If you don't want a url to be relativized, you can add a space explicitly into the href to |
| prevents a url from being relativized by polyglot.--> |
| <a class="button is-light is-outlined is-inverted" href=" /administration/table-migration"><strong>En</strong></a> |
| |
| </div> |
| </div> |
| </nav> |
| |
| <section class="hero is-info lg:mr-3"> |
| <div class="hero-body"> |
| |
| <p class="title is-size-2 is-centered">Table Migration</p> |
| </div> |
| </section> |
| <section class="section" style="padding-top: 2rem;"> |
| <div class="content"> |
| <p>Table migration means moving all data of one table from a Pegasus cluster to another Pegasus cluster.</p> |
| |
| <p>Four table migration methods are currently available:</p> |
| |
| <ol> |
| <li>The Shell tool's copy_data command;</li> |
| <li>Cold backup and restore;</li> |
| <li>Application dual-write combined with Bulkload;</li> |
| <li>Hot backup migration.</li> |
| </ol> |
| |
| <p>The following sections describe how each method works and how to carry it out.</p> |
| |
| <h1 id="shell工具copy_data命令迁移">Migration with the Shell tool's copy_data command</h1> |
| |
| <h2 id="原理">How it works</h2> |
| |
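| <p>The Shell tool's <a href="/zh/overview/shell#copy_data">copy_data command</a> works by having a client read the source table record by record and write each record into the new table. Specifically, it reads data one record at a time from the table in the source cluster through the scan interface, then writes each record into the table in the target cluster through the set interface. If a record being set already exists in the target table, it is overwritten.</p> |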
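| <p>The read-then-write loop described above can be sketched as follows. This is a minimal in-memory model, not the Shell tool's implementation: the tables are plain Python dicts keyed by an assumed (hash_key, sort_key) pair, and <code>scan</code>, <code>set_record</code>, and <code>copy_data</code> are hypothetical helper names standing in for the scan and set interfaces.</p> |

```python
from typing import Dict, Iterator, Tuple

Key = Tuple[str, str]      # (hash_key, sort_key); an assumed key shape
Table = Dict[Key, bytes]   # in-memory stand-in for a Pegasus table


def scan(table: Table) -> Iterator[Tuple[Key, bytes]]:
    """Yield records one at a time, standing in for the scan interface."""
    for key in sorted(table):
        yield key, table[key]


def set_record(table: Table, key: Key, value: bytes) -> None:
    """Write one record, standing in for the set interface.

    A record that already exists under the same key is overwritten.
    """
    table[key] = value


def copy_data(source: Table, target: Table) -> int:
    """Copy every record from source to target and return the number copied."""
    copied = 0
    for key, value in scan(source):
        set_record(target, key, value)
        copied += 1
    return copied


# TableA on the source cluster and a pre-created TableB on the target cluster.
table_a: Table = {("user1", "name"): b"alice", ("user2", "name"): b"bob"}
table_b: Table = {("user1", "name"): b"stale", ("user3", "name"): b"carol"}

copied = copy_data(table_a, table_b)
```

| <p>In this model the stale value under <code>("user1", "name")</code> in the target is replaced by the source value, while <code>("user3", "name")</code>, which exists only in the target, is left untouched.</p> |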
| |
| <h2 id="具体操作方式">Procedure</h2> |
| |
| <p>The copy_data command:</p> |
| <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> copy_data <-c|--target_cluster_name str> <-a|--target_app_name str> |
| [-p|--partition num] [-b|--max_batch_count num] [-t|--timeout_ms num] |
| [-h|--hash_key_filter_type anywhere|prefix|postfix] |
| [-x|--hash_key_filter_pattern str] |
| [-s|--sort_key_filter_type anywhere|prefix|postfix|exact] |
| [-y|--sort_key_filter_pattern str] |
| [-v|--value_filter_type anywhere|prefix|postfix|exact] |
| [-z|--value_filter_pattern str] [-m|--max_multi_set_concurrency] |
| [-o|--scan_option_batch_size] [-e|--no_ttl] [-n|--no_overwrite] |
| [-i|--no_value] [-g|--geo_data] [-u|--use_multi_set] |
| </code></pre></div></div> |
| |
| <p>Assume the source cluster is ClusterA, the target cluster is ClusterB, and the table to migrate is TableA. The migration steps are:</p> |
| <ul> |
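| <li>Create the table on the target cluster. The copy_data command does not create the table on the target cluster automatically, so you must create it yourself first. The new table may have a different name and a different partition count from the source table. Assume the new table created on the target cluster is named TableB.</li> |
| <li>Add the target cluster's configuration to the Shell tool's configuration file. Because copy_data takes the target cluster through the <code class="language-plaintext highlighter-rouge">-c</code> parameter, the target cluster's MetaServer address list must be configured. In the directory from which you run the Shell, edit the configuration file <a href="https://github.com/apache/incubator-pegasus/blob/master/src/shell/config.ini">src/shell/config.ini</a> and append the following lines at the end of the file (replace ClusterB with your own cluster name): |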
| <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[pegasus.clusters] |
| ClusterB = {MetaServer address list of ClusterB} |
| </code></pre></div> </div> |
| </li> |
| <li>Run the following commands in the Shell: |
| <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>>>> use TableA |
| >>> copy_data -c ClusterB -a TableB -t 10000 |
| </code></pre></div> </div> |
| </li> |
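| <li>If all the steps above succeed, the copy starts and prints its progress every second. Typically, the copy speed should exceed 100,000 records per second. If the copy aborts partway because of a problem (for example write throttling or a write stall), investigate the cause and then run the command again.</li> |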
| </ul> |
| |
| <h1 id="冷备份迁移">Cold backup migration</h1> |
| |
| <h2 id="原理-1">How it works</h2> |
| |
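| <p>Cold backup migration uses Pegasus's <a href="/zh/administration/cold-backup">cold backup feature</a>: the data is first backed up to HDFS or another storage medium, and then restored into a new table through restore or bulkload.</p> |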
| |
| <p>Advantages of cold backup migration:</p> |
| <ul> |
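| <li>Faster: a cold backup copies files, which is much faster than copy_data's record-by-record copy.</li> |
| <li>More fault-tolerant: the cold backup feature has extensive fault-tolerance logic that shields it from problems such as network jitter. With copy_data, a failure midway means starting over from the beginning.</li> |
| <li>Friendlier to repeated migration: to copy one table to several destinations, back it up once and then restore it multiple times.</li> |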
| </ul> |
| |
| <h2 id="具体操作方式-1">Procedure</h2> |
| |
| <p><strong>A cold backup consists of roughly two phases:</strong></p> |
| |
| <ol> |
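| <li>All primary replicas of the table create checkpoints in preparation for uploading to HDFS. During this phase, the larger the partitions of the table being backed up, the more disk IO is consumed, which causes brief read/write latency spikes.</li> |
| <li>After the checkpoints are created, the HDFS interface is called to upload them. This phase consumes considerable network resources; without rate limiting, it can easily saturate the network bandwidth.</li> |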
| </ol> |
| |
| <p>Best practices for cold backup:</p> |
| |
| <ul> |
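| <li>Before starting the cold backup, set a rate limit with the Pegasus Shell tool so that the backup does not consume too much network bandwidth.</li> |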
| </ul> |
| |
| <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Setting for version 2.3.x and earlier</span> |
| remote_command <span class="nt">-t</span> replica-server nfs.max_send_rate_megabytes 50 |
| <span class="c"># Setting for version 2.4.x and later</span> |
| remote_command <span class="nt">-t</span> replica-server nfs.max_send_rate_megabytes_per_disk 50 |
| </code></pre></div></div> |
| |
| <ul> |
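| <li>Start the cold backup through admin-cli and wait for it to finish. The arguments are, in order, the table ID, the HDFS region, and the HDFS storage path.</li> |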
| </ul> |
| |
| <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>backup 3 hdfs_zjy /user/pegasus/backup |
| </code></pre></div></div> |
| |
| <p>The HDFS region argument is matched against the following section of config.ini to connect to HDFS:</p> |
| |
| <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[block_service.hdfs_zjy] |
| type = hdfs_service |
| args = hdfs://zjyprc-hadoop / |
| </code></pre></div></div> |
| |
| <ul> |
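| <li>Watch the monitoring until disk IO gradually drops, which indicates that the cold backup has entered the second phase. From then on, keep watching network bandwidth usage and relax the rate limit as appropriate to speed up the backup; a rule of thumb is to raise it by 50 each time.</li> |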
| </ul> |
| |
| <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Setting for version 2.3.x and earlier</span> |
| remote_command <span class="nt">-t</span> replica-server nfs.max_send_rate_megabytes 100 |
| <span class="c"># Setting for version 2.4.x and later</span> |
| remote_command <span class="nt">-t</span> replica-server nfs.max_send_rate_megabytes_per_disk 100 |
| </code></pre></div></div> |
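| <p>The stepwise relaxation of the limit can be scripted. The sketch below only builds the Shell command strings shown above; <code>throttle_command</code> and <code>ramp_commands</code> are hypothetical helpers, and the start value and ceiling are example numbers (the step of 50 follows the rule of thumb above). Whether a given ceiling is safe depends on your own bandwidth monitoring.</p> |

```python
from typing import List


def throttle_command(minor_version: int, rate_mb: int) -> str:
    """Build the remote_command line for a 2.x release, given its minor version."""
    # 2.3.x and earlier use nfs.max_send_rate_megabytes;
    # 2.4.x and later use the _per_disk variant.
    if minor_version >= 4:
        name = "nfs.max_send_rate_megabytes_per_disk"
    else:
        name = "nfs.max_send_rate_megabytes"
    return f"remote_command -t replica-server {name} {rate_mb}"


def ramp_commands(minor_version: int, start: int, ceiling: int,
                  step: int = 50) -> List[str]:
    """List the commands to issue one at a time while bandwidth usage stays acceptable."""
    return [throttle_command(minor_version, rate)
            for rate in range(start, ceiling + 1, step)]


# Example: a 2.4.x cluster, starting at 50 and stopping at an assumed ceiling of 150.
commands = ramp_commands(minor_version=4, start=50, ceiling=150)
```

| <p>Each command in <code>commands</code> is meant to be issued only after the monitoring shows that the previous limit left spare bandwidth.</p> |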
| |
| <ul> |
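| <li>If a ReplicaServer node restarts, the cold backup fails, and because cancelling a cold backup is not currently supported, you can only wait for it to end. Watch the metric <code class="language-plaintext highlighter-rouge">cold.backup.max.upload.file.size</code>; once it drops to zero, the failed backup has ended. Then delete the cold backup directory on HDFS and start the cold backup again.</li> |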
| </ul> |
| |
| <p><strong>There are two ways to restore cold backup data into a new table:</strong></p> |
| |
| <ol> |
| <li>Use the restore command described in <a href="/zh/administration/cold-backup">Cold Backup</a> to restore the data into a new table.</li> |
| </ol> |
| |
| <p>Run restore as follows:</p> |
| |
| <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>restore -c ClusterA -a single -i 4 -t 1742888751127 -b hdfs_zjy -r /user/pegasus/backup |
| </code></pre></div></div> |
| |
| <p>Note the following when running this command:</p> |
| |
| <ul> |
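| <li>The restore command creates the table automatically, so it does not support changing the table's partition count.</li> |
| <li>The restore command requires the source table TableA to exist; otherwise it cannot run. If the source table no longer exists, the data can only be loaded into the new table through Bulkload.</li> |
| <li>Apply rate limiting to avoid saturating the network bandwidth; the method is the same as for cold backup.</li> |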
| </ul> |
| |
| <ol start="2"> |
| <li>Use the Bulkload feature described in <a href="/zh/2020/02/18/bulk-load-design.html">Bulkload</a> to load the data into a new table.</li> |
| </ol> |
| |
| <p>Bulkload can load cold backup data into a new table. Best practices:</p> |
| |
| <ul> |
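| <li>Because Bulkload requires data in a specific format, use the offline split operation provided by Pegasus-spark to convert the cold backup data into the required format. How to use Pegasus-spark is not covered here.</li> |
| <li>Use the Bulkload operation provided by Pegasus-spark to load the processed data into Pegasus. |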
| <ul> |
| <li>The Pegasus shell can also start a Bulkload. Assuming the data produced by the offline split is in the <code class="language-plaintext highlighter-rouge">/user/pegasus/split</code> directory, proceed as follows:</li> |
| </ul> |
| </li> |
| </ul> |
| |
| <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>>>> use TableB |
| >>> set_app_envs rocksdb.usage_scenario bulk_load |
| >>> start_bulk_load -a TableB -c ClusterB -p hdfs_zjy -r /user/pegasus/split |
| </code></pre></div></div> |
| |
| <h1 id="业务双写配合bulkload">Application dual-write combined with Bulkload</h1> |
| |
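| <p>Both copy_data migration and cold backup migration can migrate only existing data; if the application keeps producing incremental data, it must stop writing during the migration. Versions <code class="language-plaintext highlighter-rouge">v2.3.x</code> and later support a <strong>migration scheme that does not require the application to stop writing</strong>: application dual-write combined with Bulkload.</p> |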
| |
| <h2 id="原理-2">How it works</h2> |
| |
| <ol> |
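| <li>The application writes to both the source table and the target table, which keeps incremental data in sync.</li> |
| <li>The service side migrates existing data in three steps (cold backup, offline Split, and Bulkload with IngestBehind), which keeps existing data in sync.</li> |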
| </ol> |
| |
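| <p>RocksDB supports an IngestBehind feature. Inside RocksDB, each sst file carries a global seqno that indicates how new the file is, and these numbers increase monotonically. When an external sst file is ingested, RocksDB assigns it a global seqno; with IngestBehind, the ingested sst file is assigned global seqno 0. The existing data is therefore placed at the bottom of the RocksDB engine, which guarantees the correct read order between incremental data and existing data.</p> |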
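| <p>The effect of seqno 0 on read order can be illustrated with a toy model. This is not RocksDB code: <code>ToyStore</code> and its methods are hypothetical, and the model keeps only the rule stated above, namely that a read returns the version with the largest seqno.</p> |

```python
from typing import Dict, List, Optional, Tuple


class ToyStore:
    """Toy key-value store in which the version with the largest seqno wins on read."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[int, str]]] = {}
        self._next_seqno = 1  # seqno 0 is reserved for ingest-behind data

    def put(self, key: str, value: str) -> None:
        """Normal write: takes the next, strictly increasing seqno."""
        self._versions.setdefault(key, []).append((self._next_seqno, value))
        self._next_seqno += 1

    def ingest_behind(self, data: Dict[str, str]) -> None:
        """Bulk-load existing data with seqno 0, below every normal write."""
        for key, value in data.items():
            self._versions.setdefault(key, []).append((0, value))

    def get(self, key: str) -> Optional[str]:
        """Return the value with the largest seqno, or None if the key is absent."""
        versions = self._versions.get(key)
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]


store = ToyStore()
store.put("k1", "new-from-dual-write")  # incremental data arrives first
store.ingest_behind({"k1": "old-from-backup", "k2": "old-only"})  # backfill later

latest_k1 = store.get("k1")
backfilled_k2 = store.get("k2")
```

| <p>Although the backup data is loaded after the dual-written value, the read of <code>k1</code> still returns the dual-written value, while <code>k2</code>, which was never rewritten, is served from the backfilled data.</p> |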
| |
| <h2 id="具体操作方式-2">Procedure</h2> |
| |
| <ul> |
| <li>When creating the target table, you must specify <code class="language-plaintext highlighter-rouge">rocksdb.allow_ingest_behind=true</code>; without this parameter the IngestBehind feature cannot be used!</li> |
| </ul> |
| |
| <pre><code class="language-SQL">create TableB -p 64 -e rocksdb.allow_ingest_behind=true |
| </code></pre> |
| |
| <ul> |
| <li>Ask the application team to write to both the source table and the target table. |
| <ul> |
| <li>Note that writes to both tables need a retry-on-failure mechanism.</li> |
| </ul> |
| </li> |
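| <li>Once the application's dual-write change is complete, the service side prepares the data required by Bulkload through cold backup and offline Split.</li> |
| <li>Start the Bulkload through the Pegasus shell. Unlike an ordinary Bulkload, the <code class="language-plaintext highlighter-rouge">--ingest_behind</code> parameter must be specified.</li> |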
| </ul> |
| |
| <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>>>> use TableB |
| >>> set_app_envs rocksdb.usage_scenario bulk_load |
| >>> start_bulk_load -a TableB -c ClusterB -p hdfs_zjy -r /user/pegasus/split --ingest_behind |
| </code></pre></div></div> |
| |
| <ul> |
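| <li>If Bulkload consumes too much network bandwidth, it can still be rate limited with <code class="language-plaintext highlighter-rouge">max_send_rate_megabytes</code> as described above.</li> |
| <li>This method does not require the source and target tables to have the same partition count, so the target table's partition count can be chosen freely.</li> |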
| </ul> |
| |
| <h1 id="热备迁移">Hot backup migration</h1> |
| |
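| <p>Versions <code class="language-plaintext highlighter-rouge">v2.4.x</code> and later support hot backup, which is described in detail in <a href="/zh/administration/duplication">Duplication</a> and is not repeated here. Hot backup enables a migration that is transparent to the application, and the procedure is simple.</p> |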
| |
| <h2 id="具体操作方式-3">Procedure</h2> |
| |
| <ul> |
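| <li>Hot backup migration requires every application Client to access Pegasus through the MetaProxy component. No client may connect directly to MetaServer IP addresses! For help detecting clients that connect directly by IP, consult the Pegasus community.</li> |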
| <li> |
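| <p>After all application clients have switched to MetaProxy, start establishing hot backup to the target cluster. How to establish hot backup is omitted here.</p> |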
| </li> |
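| <li>After hot backup is established, modify the corresponding information in the Zookeeper that MetaProxy depends on, changing it to the MetaServer IP addresses of the target cluster.</li> |
| <li>Block reads and writes on the source table TableA to trigger the application Clients to fetch the topology from MetaProxy again.</li> |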
| </ul> |
| |
| <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>>>> use TableA |
| >>> set_app_envs replica.deny_client_request reconfig*all |
| </code></pre></div></div> |
| |
| <ul> |
| <li>The migration has succeeded when the following metrics look as expected: |
| <ul> |
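| <li>QPS on the source table drops to zero;</li> |
| <li>QPS on the target table rises and recovers to a level comparable to the source table's previous traffic;</li> |
| <li><code class="language-plaintext highlighter-rouge">dup.disabled_non_idempotent_write_count</code> is 0 in the hot backup monitoring of both the source and target clusters;</li> |
| <li>The read and write failure counts <code class="language-plaintext highlighter-rouge">recent.read.fail.count recent.write.fail.count</code> are 0 in the monitoring of both the source and target clusters.</li> |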
| </ul> |
| </li> |
| <li>Note: the C++ Client and Python Client do not yet support connecting through MetaProxy.</li> |
| </ul> |
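| <p>The success criteria above can be collected into a single check. The sketch assumes the metric values have already been fetched from your monitoring system into plain dicts; <code>migration_succeeded</code>, the <code>qps</code> key, and the 10% tolerance used for "comparable traffic" are assumptions made for illustration, while the other metric names are the ones listed above.</p> |

```python
from typing import Dict

# Counters that must be zero on both the source and the target cluster.
ZERO_METRICS = (
    "dup.disabled_non_idempotent_write_count",
    "recent.read.fail.count",
    "recent.write.fail.count",
)


def migration_succeeded(source: Dict[str, float],
                        target: Dict[str, float],
                        baseline_qps: float,
                        tolerance: float = 0.1) -> bool:
    """Return True when every check from the list above holds."""
    # Traffic has left the source table.
    if source["qps"] != 0:
        return False
    # Traffic on the target table is back near the pre-migration level.
    if abs(target["qps"] - baseline_qps) > tolerance * baseline_qps:
        return False
    # Duplication and failure counters are zero on both clusters.
    return all(cluster[name] == 0
               for cluster in (source, target)
               for name in ZERO_METRICS)


source_metrics = {"qps": 0,
                  "dup.disabled_non_idempotent_write_count": 0,
                  "recent.read.fail.count": 0,
                  "recent.write.fail.count": 0}
target_metrics = {"qps": 980,
                  "dup.disabled_non_idempotent_write_count": 0,
                  "recent.read.fail.count": 0,
                  "recent.write.fail.count": 0}

ok = migration_succeeded(source_metrics, target_metrics, baseline_qps=1000)
```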
| |
| </div> |
| </section> |
| <footer class="footer"> |
| <div class="container"> |
| <div class="content is-small has-text-centered"> |
| <div style="margin-bottom: 20px;"> |
| <a href="http://incubator.apache.org"> |
| <img src="/assets/images/egg-logo.png" |
| width="15%" |
| alt="Apache Incubator"/> |
| </a> |
| </div> |
| Copyright © 2023 <a href="http://www.apache.org">The Apache Software Foundation</a>. |
| Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version |
| 2.0</a>. |
| <br><br> |
| |
| Apache Pegasus is an effort undergoing incubation at The Apache Software Foundation (ASF), |
| sponsored by the Apache Incubator. Incubation is required of all newly accepted projects |
| until a further review indicates that the infrastructure, communications, and decision making process |
| have stabilized in a manner consistent with other successful ASF projects. While incubation status is |
| not necessarily a reflection of the completeness or stability of the code, it does indicate that the |
| project has yet to be fully endorsed by the ASF. |
| |
| <br><br> |
| Apache Pegasus, Pegasus, Apache, the Apache feather logo, and the Apache Pegasus project logo are either |
| registered trademarks or trademarks of The Apache Software Foundation in the United States and other |
| countries. |
| </div> |
| </div> |
| </footer> |
| </div> |
| |
| <!-- right panel --> |
| <div class="dashboard-panel is-small is-scrollable is-hidden-mobile"> |
| <p class="menu-label"> |
| <span class="icon"> |
| <i class="fa fa-bars" aria-hidden="true"></i> |
| </span> |
| On this page |
| </p> |
| <ul class="menu-list"> |
| <li><a href="#shell工具copy_data命令迁移">Migration with the Shell tool's copy_data command</a> |
| <ul> |
| <li><a href="#原理">How it works</a></li> |
| <li><a href="#具体操作方式">Procedure</a></li> |
| </ul> |
| </li> |
| <li><a href="#冷备份迁移">Cold backup migration</a> |
| <ul> |
| <li><a href="#原理-1">How it works</a></li> |
| <li><a href="#具体操作方式-1">Procedure</a></li> |
| </ul> |
| </li> |
| <li><a href="#业务双写配合bulkload">Application dual-write combined with Bulkload</a> |
| <ul> |
| <li><a href="#原理-2">How it works</a></li> |
| <li><a href="#具体操作方式-2">Procedure</a></li> |
| </ul> |
| </li> |
| <li><a href="#热备迁移">Hot backup migration</a> |
| <ul> |
| <li><a href="#具体操作方式-3">Procedure</a></li> |
| </ul> |
| </li> |
| </ul> |
| |
| </div> |
| </div> |
| |
| <script src="/assets/js/app.js" type="text/javascript"></script> |
| <script> |
| docsearch({ |
| container: '#docsearch', |
| appId: 'QRN30RBW0S', |
| indexName: 'pegasus-apache', |
| apiKey: 'd3a3252fa344359766707a106c4ed88f', |
| debug: true |
| }); |
| </script> |
| |
| </body> |
| |
| </html> |