blob: 1e210c8de2039b5ab92ce0667c246d8899ae52af [file] [log] [blame]
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="Apache Ozone Documentation">
<title>Documentation for Apache Ozone</title>
<link href="../css/bootstrap.min.css" rel="stylesheet">
<link href="../css/ozonedoc.css" rel="stylesheet">
<link href="../swagger-resources/swagger-ui.css" rel="stylesheet">
<script>
var _paq = window._paq = window._paq || [];
_paq.push(['disableCookies']);
_paq.push(['trackPageView']);
_paq.push(['enableLinkTracking']);
(function() {
var u="//analytics.apache.org/";
_paq.push(['setTrackerUrl', u+'matomo.php']);
_paq.push(['setSiteId', '34']);
var d=document, g=d.createElement('script'),
s=d.getElementsByTagName('script')[0];
g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
})();
</script>
</head>
<body>
<nav class="navbar navbar-inverse navbar-fixed-top">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#sidebar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="../index.html" class="navbar-left ozone-logo">
<img src="../ozone-logo-small.png"/>
</a>
<a class="navbar-brand hidden-xs" href="../index.html">
Apache Ozone/HDDS Documentation
</a>
<a class="navbar-brand visible-xs-inline" href="#">Apache Ozone</a>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav navbar-right">
<li><a href="https://github.com/apache/ozone">Source</a></li>
<li><a href="https://ozone.apache.org">Apache Ozone</a></li>
<li><a href="https://apache.org">ASF</a></li>
</ul>
</div>
</div>
</nav>
<div class="wrapper">
<div class="container-fluid">
<div class="row">
<div class="col-sm-2 col-md-2 sidebar" id="sidebar">
<ul class="nav nav-sidebar">
<li class="">
<a href="../index.html">
<span>Overview</span>
</a>
</li>
<li class="">
<a href="../start.html">
<span>Getting Started</span>
</a>
</li>
<li class="">
<a href="../concept.html">
<span>Architecture</span>
</a>
<ul class="nav">
<li class="">
<a href="../concept/overview.html">Overview</a>
</li>
<li class="">
<a href="../concept/ozonemanager.html">Ozone Manager</a>
</li>
<li class="">
<a href="../concept/storagecontainermanager.html">Storage Container Manager</a>
</li>
<li class="">
<a href="../concept/containers.html">Containers</a>
</li>
<li class="">
<a href="../concept/datanodes.html">Datanodes</a>
</li>
<li class="">
<a href="../concept/recon.html">Recon</a>
</li>
</ul>
</li>
<li class="">
<a href="../feature.html">
<span>Features</span>
</a>
<ul class="nav">
<li class="">
<a href="../feature/decommission.html">Decommissioning</a>
</li>
<li class="">
<a href="../feature/om-ha.html">OM High Availability</a>
</li>
<li class="">
<a href="../feature/erasurecoding.html">Ozone Erasure Coding</a>
</li>
<li class="">
<a href="../feature/snapshot.html">Ozone Snapshot</a>
</li>
<li class="active">
<a href="../feature/scm-ha.html">SCM High Availability</a>
</li>
<li class="">
<a href="../feature/streaming-write-pipeline.html">Streaming Write Pipeline</a>
</li>
<li class="">
<a href="../feature/dn-merge-rocksdb.html">Merge Container RocksDB in DN</a>
</li>
<li class="">
<a href="../feature/prefixfso.html">Prefix based File System Optimization</a>
</li>
<li class="">
<a href="../feature/topology.html">Topology awareness</a>
</li>
<li class="">
<a href="../feature/quota.html">Quota in Ozone</a>
</li>
<li class="">
<a href="../feature/recon.html">Recon Server</a>
</li>
<li class="">
<a href="../feature/observability.html">Observability</a>
</li>
<li class="">
<a href="../feature/nonrolling-upgrade.html">Non-Rolling Upgrades and Downgrades</a>
</li>
<li class="">
<a href="../feature/s3-multi-tenancy.html">
<span>S3 Multi-Tenancy</span>
</a>
<ul class="nav">
<li class="">
<a href="../feature/s3-multi-tenancy-setup.html">Setup</a>
</li>
<li class="">
<a href="../feature/s3-tenant-commands.html">Tenant commands</a>
</li>
<li class="">
<a href="../feature/s3-multi-tenancy-access-control.html">Access Control</a>
</li>
</ul>
</li>
<li class="">
<a href="../feature/reconfigurability.html">Reconfigurability</a>
</li>
</ul>
</li>
<li class="">
<a href="../interface.html">
<span>Client Interfaces</span>
</a>
<ul class="nav">
<li class="">
<a href="../interface/ofs.html">Ofs (Hadoop compatible)</a>
</li>
<li class="">
<a href="../interface/o3fs.html">O3fs (Hadoop compatible)</a>
</li>
<li class="">
<a href="../interface/s3.html">S3 Protocol</a>
</li>
<li class="">
<a href="../interface/cli.html">Command Line Interface</a>
</li>
<li class="">
<a href="../interface/reconapi.html">Recon API</a>
</li>
<li class="">
<a href="../interface/javaapi.html">Java API</a>
</li>
<li class="">
<a href="../interface/csi.html">CSI Protocol</a>
</li>
<li class="">
<a href="../interface/httpfs.html">HttpFS Gateway</a>
</li>
</ul>
</li>
<li class="">
<a href="../security.html">
<span>Security</span>
</a>
<ul class="nav">
<li class="">
<a href="../security/secureozone.html">Securing Ozone</a>
</li>
<li class="">
<a href="../security/securingtde.html">Transparent Data Encryption</a>
</li>
<li class="">
<a href="../security/gdpr.html">GDPR in Ozone</a>
</li>
<li class="">
<a href="../security/securingdatanodes.html">Securing Datanodes</a>
</li>
<li class="">
<a href="../security/securingozonehttp.html">Securing HTTP</a>
</li>
<li class="">
<a href="../security/securings3.html">Securing S3</a>
</li>
<li class="">
<a href="../security/securityacls.html">Ozone ACLs</a>
</li>
<li class="">
<a href="../security/securitywithranger.html">Apache Ranger</a>
</li>
</ul>
</li>
<li class="">
<a href="../tools.html">
<span>Tools</span>
</a>
</li>
<li class="">
<a href="../recipe.html">
<span>Recipes</span>
</a>
</li>
<li><a href="../design.html"><span><b>Design docs</b></span></a></li>
<li class="visible-xs"><a href="#">References</a>
<ul class="nav">
<li><a href="https://github.com/apache/ozone"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> Source</a></li>
<li><a href="https://ozone.apache.org"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> Apache Ozone</a></li>
<li><a href="https://apache.org"><span class="glyphicon glyphicon-new-window" aria-hidden="true"></span> ASF</a></li>
</ul></li>
</ul>
</div>
<div class="col-sm-10 col-sm-offset-2 col-md-10 col-md-offset-2 main-content">
<div class="col-md-9">
<nav aria-label="breadcrumb">
<ol class="breadcrumb">
<li class="breadcrumb-item"><a href="../index.html">Home</a></li>
<li class="breadcrumb-item" aria-current="page"><a href="../feature.html">Features</a></li>
<li class="breadcrumb-item active" aria-current="page">SCM High Availability</li>
</ol>
</nav>
<div class="pull-right">
<a href="../zh/feature/scm-ha.html"><span class="label label-success">中文</span></a>
</div>
<div class="col-md-9">
<h1>SCM High Availability</h1>
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>Ozone has two metadata-manager nodes (<em>Ozone Manager</em> for key space management and <em>Storage Container Management</em> for block space management) and multiple storage nodes (Datanode). Data is replicated between Datanodes with the help of RAFT consensus algorithm.</p>
<p>To avoid any single point of failure the metadata-manager nodes also should have a HA setup.</p>
<p>Both Ozone Manager and Storage Container Manager supports HA. In this mode the internal state is replicated via RAFT (with Apache Ratis)</p>
<p>This document explains the HA setup of Storage Container Manager (SCM), please check <a href="../feature/om-ha.html">this page</a> for HA setup of Ozone Manager (OM). While they can be setup for HA independently, a reliable, full HA setup requires enabling HA for both services.</p>
<h2 id="configuration">Configuration</h2>
<p>HA mode of Storage Container Manager can be enabled with the following settings in <code>ozone-site.xml</code>:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-XML" data-lang="XML"><span style="color:#f92672">&lt;property&gt;</span>
<span style="color:#f92672">&lt;name&gt;</span>ozone.scm.ratis.enable<span style="color:#f92672">&lt;/name&gt;</span>
<span style="color:#f92672">&lt;value&gt;</span>true<span style="color:#f92672">&lt;/value&gt;</span>
<span style="color:#f92672">&lt;/property&gt;</span>
</code></pre></div><p>One Ozone configuration (<code>ozone-site.xml</code>) can support multiple SCM HA node set, multiple Ozone clusters. To select between the available SCM nodes a logical name is required for each of the clusters which can be resolved to the IP addresses (and domain names) of the Storage Container Managers.</p>
<p>This logical name is called <code>serviceId</code> and can be configured in the <code>ozone-site.xml</code></p>
<p>Most of the time you need to set only the values of your current cluster:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-XML" data-lang="XML"><span style="color:#f92672">&lt;property&gt;</span>
<span style="color:#f92672">&lt;name&gt;</span>ozone.scm.service.ids<span style="color:#f92672">&lt;/name&gt;</span>
<span style="color:#f92672">&lt;value&gt;</span>cluster1<span style="color:#f92672">&lt;/value&gt;</span>
<span style="color:#f92672">&lt;/property&gt;</span>
</code></pre></div><p>For each of the defined <code>serviceId</code> a logical configuration name should be defined for each of the servers</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-XML" data-lang="XML"><span style="color:#f92672">&lt;property&gt;</span>
<span style="color:#f92672">&lt;name&gt;</span>ozone.scm.nodes.cluster1<span style="color:#f92672">&lt;/name&gt;</span>
<span style="color:#f92672">&lt;value&gt;</span>scm1,scm2,scm3<span style="color:#f92672">&lt;/value&gt;</span>
<span style="color:#f92672">&lt;/property&gt;</span>
</code></pre></div><p>The defined prefixes can be used to define the address of each of the SCM services:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-XML" data-lang="XML"><span style="color:#f92672">&lt;property&gt;</span>
<span style="color:#f92672">&lt;name&gt;</span>ozone.scm.address.cluster1.scm1<span style="color:#f92672">&lt;/name&gt;</span>
<span style="color:#f92672">&lt;value&gt;</span>host1<span style="color:#f92672">&lt;/value&gt;</span>
<span style="color:#f92672">&lt;/property&gt;</span>
<span style="color:#f92672">&lt;property&gt;</span>
<span style="color:#f92672">&lt;name&gt;</span>ozone.scm.address.cluster1.scm2<span style="color:#f92672">&lt;/name&gt;</span>
<span style="color:#f92672">&lt;value&gt;</span>host2<span style="color:#f92672">&lt;/value&gt;</span>
<span style="color:#f92672">&lt;/property&gt;</span>
<span style="color:#f92672">&lt;property&gt;</span>
<span style="color:#f92672">&lt;name&gt;</span>ozone.scm.address.cluster1.scm3<span style="color:#f92672">&lt;/name&gt;</span>
<span style="color:#f92672">&lt;value&gt;</span>host3<span style="color:#f92672">&lt;/value&gt;</span>
<span style="color:#f92672">&lt;/property&gt;</span>
</code></pre></div><p>For reliable HA support choose 3 independent nodes to form a quorum.</p>
<h2 id="bootstrap">Bootstrap</h2>
<p>The initialization of the <strong>first</strong> SCM-HA node is the same as a non-HA SCM:</p>
<pre><code>ozone scm --init
</code></pre><p>Second and third nodes should be <em>bootstrapped</em> instead of init. These clusters will join to the configured RAFT quorum. The id of the current server is identified by DNS name or can be set explicitly by <code>ozone.scm.node.id</code>. Most of the time you don&rsquo;t need to set it as DNS based id detection can work well.</p>
<pre><code>ozone scm --bootstrap
</code></pre><p>Note: both commands perform one-time initialization. SCM still needs to be started by running <code>ozone scm --daemon start</code>.</p>
<h2 id="auto-bootstrap">Auto-bootstrap</h2>
<p>In some environments (e.g. Kubernetes) we need to have a common, unified way to initialize SCM HA quorum. As a reminder, the standard initialization flow is the following:</p>
<ol>
<li>On the first, &ldquo;primordial&rdquo; node: <code>ozone scm --init</code></li>
<li>On second/third nodes: <code>ozone scm --bootstrap</code></li>
</ol>
<p>This can be improved: primordial SCM can be configured by setting <code>ozone.scm.primordial.node.id</code> in the config to one of the nodes.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-XML" data-lang="XML"><span style="color:#f92672">&lt;property&gt;</span>
<span style="color:#f92672">&lt;name&gt;</span>ozone.scm.primordial.node.id<span style="color:#f92672">&lt;/name&gt;</span>
<span style="color:#f92672">&lt;value&gt;</span>scm1<span style="color:#f92672">&lt;/value&gt;</span>
<span style="color:#f92672">&lt;/property&gt;</span>
</code></pre></div><p>With this configuration both <code>scm --init</code> and <code>scm --bootstrap</code> can be safely executed on <strong>all</strong> SCM nodes. Each node will only perform the action applicable to it based on the <code>ozone.scm.primordial.node.id</code> and its own node ID.</p>
<p>Note: SCM still needs to be started after the init/bootstrap process.</p>
<pre><code>ozone scm --init
ozone scm --bootstrap
ozone scm --daemon start
</code></pre><p>For Docker/Kubernetes, use <code>ozone scm</code> to start it in the foreground.</p>
<h2 id="scm-ha-security">SCM HA Security</h2>
<p>
<img src="scm-secure-ha.png" alt='SCM Secure HA' class="img-responsive" /></p>
<p>In a secure SCM HA cluster on the SCM where we perform init, we call this SCM as a primordial SCM.
Primordial SCM starts root-CA with self-signed certificates and is used to issue a signed certificate
to itself and other bootstrapped SCM’s. Only primordial SCM can issue signed certificates for other SCM’s.
So, primordial SCM has a special role in the SCM HA cluster, as it is the only one that can issue certificates to SCM’s.</p>
<p>The primordial SCM takes a root-CA role, which signs all SCM instances with a sub-CA certificate.
The sub-CA certificates are used by SCM to sign certificates for OM/Datanodes.</p>
<p>When bootstrapping a SCM, it gets a signed certificate from the primary SCM and starts sub-CA.</p>
<p>Sub-CA on the SCM’s are used to issue signed certificates for OM/DN in the cluster. Only the leader SCM issues a certificate to OM/DN.</p>
<h3 id="how-to-enable-security">How to enable security:</h3>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-XML" data-lang="XML"><span style="color:#f92672">&lt;property&gt;</span>
<span style="color:#f92672">&lt;config&gt;</span>ozone.security.enable<span style="color:#f92672">&lt;/config&gt;</span>
<span style="color:#f92672">&lt;value&gt;</span>true<span style="color:#f92672">&lt;/value&gt;</span>
<span style="color:#f92672">&lt;/property&gt;</span>
<span style="color:#f92672">&lt;property&gt;</span>
<span style="color:#f92672">&lt;config&gt;</span>hdds.grpc.tls.enabled<span style="color:#f92672">&lt;/config&gt;</span>
<span style="color:#f92672">&lt;value&gt;</span>true<span style="color:#f92672">&lt;/value&gt;</span>
<span style="color:#f92672">&lt;/property&gt;</span>
</code></pre></div><p>Above configs are needed in addition to normal SCM HA configuration.</p>
<h3 id="primordial-scm">Primordial SCM:</h3>
<p>Primordial SCM is determined from the config ozone.scm.primordial.node.id.
The value for this can be node id or hostname of the SCM. If the config is
not defined, the node where init is run is considered as the primordial SCM.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">bin/ozone scm --init</code></pre></div>
<p>This will set up a public,private key pair and self-signed certificate for root CA
and also generate public, private key pair and CSR to get a signed certificate for sub-CA from root CA.</p>
<h3 id="bootstrap-scm">Bootstrap SCM:</h3>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">bin/ozone scm --bootstrap</code></pre></div>
<p>This will set up a public, private key pair for sub CA and generate CSR to get a
signed certificate for sub-CA from root CA.</p>
<p><strong>Note</strong>: Make sure to run <strong>&ndash;init</strong> only on one of the SCM host if
primordial SCM is not defined. Bring up other SCM&rsquo;s using <strong>&ndash;bootstrap</strong>.</p>
<h3 id="current-scm-ha-security-limitation">Current SCM HA Security limitation:</h3>
<ol>
<li>When primordial SCM is down, new SCM’s cannot be bootstrapped and join the
quorum.</li>
<li>Secure cluster upgrade to ratis-enable secure cluster is not supported.</li>
</ol>
<h2 id="implementation-details">Implementation details</h2>
<p>SCM HA uses Apache Ratis to replicate state between the members of the SCM HA quorum. Each node maintains the block management metadata in local RocksDB.</p>
<p>This replication process is a simpler version of OM HA replication process as it doesn&rsquo;t use any double buffer (as the overall db thourghput of SCM requests are lower)</p>
<p>Datanodes are sending all the reports (Container reports, Pipeline reports&hellip;) to <em>all</em> the Datanodes parallel. Only the leader node can assign/create new containers, and only the leader node sends command back to the Datanodes.</p>
<h2 id="verify-scm-ha-setup">Verify SCM HA setup</h2>
<p>After starting an SCM-HA it can be validated if the SCM nodes are forming one single quorum instead of 3 individual SCM nodes.</p>
<p>First, check if all the SCM nodes store the same ClusterId metadata:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">cat /data/metadata/scm/current/VERSION
</code></pre></div><p>ClusterId is included in the VERSION file and should be the same in all the SCM nodes:</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash"><span style="color:#75715e">#Tue Mar 16 10:19:33 UTC 2021</span>
cTime<span style="color:#f92672">=</span><span style="color:#ae81ff">1615889973116</span>
clusterID<span style="color:#f92672">=</span>CID-130fb246-1717-4313-9b62-9ddfe1bcb2e7
nodeType<span style="color:#f92672">=</span>SCM
scmUuid<span style="color:#f92672">=</span>e6877ce5-56cd-4f0b-ad60-4c8ef9000882
layoutVersion<span style="color:#f92672">=</span><span style="color:#ae81ff">0</span>
</code></pre></div><p>You can also create data and double check with <code>ozone debug</code> tool if all the container metadata is replicated.</p>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-shell" data-lang="shell">bin/ozone freon randomkeys --numOfVolumes<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span> --numOfBuckets<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span> --numOfKeys<span style="color:#f92672">=</span><span style="color:#ae81ff">10000</span> --keySize<span style="color:#f92672">=</span><span style="color:#ae81ff">524288</span> --replicationType<span style="color:#f92672">=</span>RATIS --numOfThreads<span style="color:#f92672">=</span><span style="color:#ae81ff">8</span> --factor<span style="color:#f92672">=</span>THREE --bufferSize<span style="color:#f92672">=</span><span style="color:#ae81ff">1048576</span>
<span style="color:#75715e"># use debug ldb to check scm.db on all the machines</span>
bin/ozone debug ldb --db<span style="color:#f92672">=</span>/tmp/metadata/scm.db ls
bin/ozone debug ldb --db<span style="color:#f92672">=</span>/tmp/metadata/scm.db scan --column-family<span style="color:#f92672">=</span>containers
</code></pre></div><h2 id="migrating-from-existing-scm">Migrating from existing SCM</h2>
<p>SCM HA can be turned on on any Ozone cluster. First enable Ratis (<code>ozone.scm.ratis.enable</code>) and configure only one node for the Ratis ring (<code>ozone.scm.nodes.serviceId</code> should have one element).</p>
<p>Start the cluster and test if it works well.</p>
<p>If everything is fine, you can extend the cluster configuration with multiple nodes, restart SCM node, and initialize the additional nodes with <code>scm --bootstrap</code> command.</p>
<a class="btn btn-success btn-lg" href="../feature/streaming-write-pipeline.html">Next >></a>
</div>
</div>
</div>
</div>
</div>
<div class="push"></div>
</div>
<footer class="footer">
<div class="container">
<span class="small text-muted">
Version: 1.5.0-SNAPSHOT, Last Modified: February 27, 2024 <a class="hide-child link primary-color" href="https://github.com/apache/ozone/commit/7939faf7d6c904bf1e4ad32baa5d6d0c1de19003">7939faf</a>
</span>
</div>
</footer>
<script src="../js/jquery-3.5.1.min.js"></script>
<script src="../js/ozonedoc.js"></script>
<script src="../js/bootstrap.min.js"></script>
</body>
</html>