blob: 52295dcc1d9ebba51861c02206e74e186c8acad2 [file] [log] [blame]
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!--
| Generated by Apache Maven Doxia at 2024-05-17
| Rendered using Apache Maven Stylus Skin 1.5
-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Apache Hadoop 3.5.0-SNAPSHOT &#x2013; ResourceManager High Availability</title>
<style type="text/css" media="all">
@import url("./css/maven-base.css");
@import url("./css/maven-theme.css");
@import url("./css/site.css");
</style>
<link rel="stylesheet" href="./css/print.css" type="text/css" media="print" />
<meta name="Date-Revision-yyyymmdd" content="20240517" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body class="composite">
<div id="banner">
<a href="http://hadoop.apache.org/" id="bannerLeft">
<img src="http://hadoop.apache.org/images/hadoop-logo.jpg" alt="" />
</a>
<a href="http://www.apache.org/" id="bannerRight">
<img src="http://www.apache.org/images/asf_logo_wide.png" alt="" />
</a>
<div class="clear">
<hr/>
</div>
</div>
<div id="breadcrumbs">
<div class="xright"> <a href="http://wiki.apache.org/hadoop" class="externalLink">Wiki</a>
|
<a href="https://gitbox.apache.org/repos/asf/hadoop.git" class="externalLink">git</a>
|
<a href="http://hadoop.apache.org/" class="externalLink">Apache Hadoop</a>
&nbsp;| Last Published: 2024-05-17
&nbsp;| Version: 3.5.0-SNAPSHOT
</div>
<div class="clear">
<hr/>
</div>
</div>
<div id="leftColumn">
<div id="navcolumn">
<h5>General</h5>
<ul>
<li class="none">
<a href="../../index.html">Overview</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/SingleCluster.html">Single Node Setup</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/ClusterSetup.html">Cluster Setup</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/CommandsManual.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/FileSystemShell.html">FileSystem Shell</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Compatibility.html">Compatibility Specification</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/DownstreamDev.html">Downstream Developer's Guide</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/AdminCompatibilityGuide.html">Admin Compatibility Guide</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/InterfaceClassification.html">Interface Classification</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/filesystem/index.html">FileSystem Specification</a>
</li>
</ul>
<h5>Common</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/CLIMiniCluster.html">CLI Mini Cluster</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/FairCallQueue.html">Fair Call Queue</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/NativeLibraries.html">Native Libraries</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Superusers.html">Proxy User</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/RackAwareness.html">Rack Awareness</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/SecureMode.html">Secure Mode</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/ServiceLevelAuth.html">Service Level Authorization</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/HttpAuthentication.html">HTTP Authentication</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/CredentialProviderAPI.html">Credential Provider API</a>
</li>
<li class="none">
<a href="../../hadoop-kms/index.html">Hadoop KMS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Tracing.html">Tracing</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/UnixShellGuide.html">Unix Shell Guide</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/registry/index.html">Registry</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/AsyncProfilerServlet.html">Async Profiler</a>
</li>
</ul>
<h5>HDFS</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsDesign.html">Architecture</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html">User Guide</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html">NameNode HA With QJM</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html">NameNode HA With NFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html">Observer NameNode</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/Federation.html">Federation</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ViewFs.html">ViewFs</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ViewFsOverloadScheme.html">ViewFsOverloadScheme</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html">Snapshots</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsEditsViewer.html">Edits Viewer</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html">Image Viewer</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html">Permissions and HDFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsQuotaAdminGuide.html">Quotas and HDFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/LibHdfs.html">libhdfs (C API)</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/WebHDFS.html">WebHDFS (REST API)</a>
</li>
<li class="none">
<a href="../../hadoop-hdfs-httpfs/index.html">HttpFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html">Short Circuit Local Reads</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html">Centralized Cache Management</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html">NFS Gateway</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html">Rolling Upgrade</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ExtendedAttributes.html">Extended Attributes</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html">Transparent Encryption</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html">Multihoming</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html">Storage Policies</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/MemoryStorage.html">Memory Storage Support</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/SLGUserGuide.html">Synthetic Load Generator</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html">Erasure Coding</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html">Disk Balancer</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsUpgradeDomain.html">Upgrade Domain</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsDataNodeAdminGuide.html">DataNode Admin</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs-rbf/HDFSRouterFederation.html">Router Federation</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsProvidedStorage.html">Provided Storage</a>
</li>
</ul>
<h5>MapReduce</h5>
<ul>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html">Tutorial</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html">Compatibility with 1.x</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html">Encrypted Shuffle</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html">Pluggable Shuffle/Sort</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistributedCacheDeploy.html">Distributed Cache Deploy</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/SharedCacheSupport.html">Support for YARN Shared Cache</a>
</li>
</ul>
<h5>MapReduce REST APIs</h5>
<ul>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredAppMasterRest.html">MR Application Master</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-hs/HistoryServerRest.html">MR History Server</a>
</li>
</ul>
<h5>YARN</h5>
<ul>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/YARN.html">Architecture</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/YarnCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html">Capacity Scheduler</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/FairScheduler.html">Fair Scheduler</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html">ResourceManager Restart</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html">ResourceManager HA</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ResourceModel.html">Resource Model</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeLabel.html">Node Labels</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeAttributes.html">Node Attributes</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html">Web Application Proxy</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/TimelineServer.html">Timeline Server</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html">Timeline Service V.2</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html">Writing YARN Applications</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html">YARN Application Security</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeManager.html">NodeManager</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/DockerContainers.html">Running Applications in Docker Containers</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/RuncContainers.html">Running Applications in runC Containers</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html">Using CGroups</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/SecureContainer.html">Secure Containers</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ReservationSystem.html">Reservation System</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/GracefulDecommission.html">Graceful Decommission</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/OpportunisticContainers.html">Opportunistic Containers</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/Federation.html">YARN Federation</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/SharedCache.html">Shared Cache</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/UsingGpus.html">Using GPU</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/UsingFPGA.html">Using FPGA</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html">Placement Constraints</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/YarnUI2.html">YARN UI2</a>
</li>
</ul>
<h5>YARN REST APIs</h5>
<ul>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html">Introduction</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html">Resource Manager</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeManagerRest.html">Node Manager</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Timeline_Server_REST_API_v1">Timeline Server</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html#Timeline_Service_v.2_REST_API">Timeline Service V.2</a>
</li>
</ul>
<h5>YARN Service</h5>
<ul>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html">Overview</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/QuickStart.html">QuickStart</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/Concepts.html">Concepts</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/YarnServiceAPI.html">Yarn Service API</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/ServiceDiscovery.html">Service Discovery</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/SystemServices.html">System Services</a>
</li>
</ul>
<h5>Hadoop Compatible File Systems</h5>
<ul>
<li class="none">
<a href="../../hadoop-aliyun/tools/hadoop-aliyun/index.html">Aliyun OSS</a>
</li>
<li class="none">
<a href="../../hadoop-aws/tools/hadoop-aws/index.html">Amazon S3</a>
</li>
<li class="none">
<a href="../../hadoop-azure/index.html">Azure Blob Storage</a>
</li>
<li class="none">
<a href="../../hadoop-azure-datalake/index.html">Azure Data Lake Storage</a>
</li>
<li class="none">
<a href="../../hadoop-cos/cloud-storage/index.html">Tencent COS</a>
</li>
<li class="none">
<a href="../../hadoop-huaweicloud/cloud-storage/index.html">Huaweicloud OBS</a>
</li>
</ul>
<h5>Auth</h5>
<ul>
<li class="none">
<a href="../../hadoop-auth/index.html">Overview</a>
</li>
<li class="none">
<a href="../../hadoop-auth/Examples.html">Examples</a>
</li>
<li class="none">
<a href="../../hadoop-auth/Configuration.html">Configuration</a>
</li>
<li class="none">
<a href="../../hadoop-auth/BuildingIt.html">Building</a>
</li>
</ul>
<h5>Tools</h5>
<ul>
<li class="none">
<a href="../../hadoop-streaming/HadoopStreaming.html">Hadoop Streaming</a>
</li>
<li class="none">
<a href="../../hadoop-archives/HadoopArchives.html">Hadoop Archives</a>
</li>
<li class="none">
<a href="../../hadoop-archive-logs/HadoopArchiveLogs.html">Hadoop Archive Logs</a>
</li>
<li class="none">
<a href="../../hadoop-distcp/DistCp.html">DistCp</a>
</li>
<li class="none">
<a href="../../hadoop-federation-balance/HDFSFederationBalance.html">HDFS Federation Balance</a>
</li>
<li class="none">
<a href="../../hadoop-gridmix/GridMix.html">GridMix</a>
</li>
<li class="none">
<a href="../../hadoop-rumen/Rumen.html">Rumen</a>
</li>
<li class="none">
<a href="../../hadoop-resourceestimator/ResourceEstimator.html">Resource Estimator Service</a>
</li>
<li class="none">
<a href="../../hadoop-sls/SchedulerLoadSimulator.html">Scheduler Load Simulator</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Benchmarking.html">Hadoop Benchmarking</a>
</li>
<li class="none">
<a href="../../hadoop-dynamometer/Dynamometer.html">Dynamometer</a>
</li>
</ul>
<h5>Reference</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/release/">Changelog and Release Notes</a>
</li>
<li class="none">
<a href="../../api/index.html">Java API docs</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/UnixShellAPI.html">Unix Shell API</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Metrics.html">Metrics</a>
</li>
</ul>
<h5>Configuration</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/core-default.xml">core-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/hdfs-default.xml">hdfs-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs-rbf/hdfs-rbf-default.xml">hdfs-rbf-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml">mapred-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-common/yarn-default.xml">yarn-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-kms/kms-default.html">kms-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-hdfs-httpfs/httpfs-default.html">httpfs-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/DeprecatedProperties.html">Deprecated Properties</a>
</li>
</ul>
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
<img alt="Built by Maven" src="./images/logos/maven-feather.png"/>
</a>
</div>
</div>
<div id="bodyColumn">
<div id="contentBox">
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<h1>ResourceManager High Availability</h1>
<ul>
<li><a href="#Introduction">Introduction</a></li>
<li><a href="#Architecture">Architecture</a>
<ul>
<li><a href="#RM_Failover">RM Failover</a></li>
<li><a href="#Recovering_previous_active-RM.E2.80.99s_state">Recovering previous active-RM&#x2019;s state</a></li></ul></li>
<li><a href="#Deployment">Deployment</a>
<ul>
<li><a href="#Configurations">Configurations</a></li>
<li><a href="#Admin_commands">Admin commands</a></li>
<li><a href="#ResourceManager_Web_UI_services">ResourceManager Web UI services</a></li>
<li><a href="#Web_Services">Web Services</a></li>
<li><a href="#Load_Balancer_Setup">Load Balancer Setup</a></li></ul></li></ul>
<section>
<h2><a name="Introduction"></a>Introduction</h2>
<p>This guide provides an overview of High Availability of YARN&#x2019;s ResourceManager, and details how to configure and use this feature. The ResourceManager (RM) is responsible for tracking the resources in a cluster, and scheduling applications (e.g., MapReduce jobs). Prior to Hadoop 2.4, the ResourceManager is the single point of failure in a YARN cluster. The High Availability feature adds redundancy in the form of an Active/Standby ResourceManager pair to remove this otherwise single point of failure.</p></section><section>
<h2><a name="Architecture"></a>Architecture</h2>
<p><img src="images/rm-ha-overview.png" alt="Overview of ResourceManager High Availability" /></p><section>
<h3><a name="RM_Failover"></a>RM Failover</h3>
<p>ResourceManager HA is realized through an Active/Standby architecture - at any point of time, one of the RMs is Active, and one or more RMs are in Standby mode waiting to take over should anything happen to the Active. The trigger to transition-to-active comes from either the admin (through CLI) or through the integrated failover-controller when automatic-failover is enabled.</p><section>
<h4><a name="Manual_transitions_and_failover"></a>Manual transitions and failover</h4>
<p>When automatic failover is not enabled, admins have to manually transition one of the RMs to Active. To failover from one RM to the other, they are expected to first transition the Active-RM to Standby and transition a Standby-RM to Active. All this can be done using the &#x201c;<code>yarn rmadmin</code>&#x201d; CLI.</p></section><section>
<h4><a name="Automatic_failover"></a>Automatic failover</h4>
<p>The RMs have an option to embed the Zookeeper-based ActiveStandbyElector to decide which RM should be the Active. When the Active goes down or becomes unresponsive, another RM is automatically elected to be the Active which then takes over. Note that, there is no need to run a separate ZKFC daemon as is the case for HDFS because ActiveStandbyElector embedded in RMs acts as a failure detector and a leader elector instead of a separate ZKFC deamon.</p></section><section>
<h4><a name="Client.2C_ApplicationMaster_and_NodeManager_on_RM_failover"></a>Client, ApplicationMaster and NodeManager on RM failover</h4>
<p>When there are multiple RMs, the configuration (yarn-site.xml) used by clients and nodes is expected to list all the RMs. Clients, ApplicationMasters (AMs) and NodeManagers (NMs) try connecting to the RMs in a round-robin fashion until they hit the Active RM. If the Active goes down, they resume the round-robin polling until they hit the &#x201c;new&#x201d; Active. This default retry logic is implemented as <code>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</code>. You can override the logic by implementing <code>org.apache.hadoop.yarn.client.RMFailoverProxyProvider</code> and setting the value of <code>yarn.client.failover-proxy-provider</code> to the class name. When running in non-ha mode, set the value of <code>yarn.client.failover-no-ha-proxy-provider</code> instead</p></section></section><section>
<h3><a name="Recovering_previous_active-RM.E2.80.99s_state"></a>Recovering previous active-RM&#x2019;s state</h3>
<p>With the <a href="./ResourceManagerRestart.html">ResourceManager Restart</a> enabled, the RM being promoted to an active state loads the RM internal state and continues to operate from where the previous active left off as much as possible depending on the RM restart feature. A new attempt is spawned for each managed application previously submitted to the RM. Applications can checkpoint periodically to avoid losing any work. The state-store must be visible from the both of Active/Standby RMs. Currently, there are two RMStateStore implementations for persistence - FileSystemRMStateStore and ZKRMStateStore. The <code>ZKRMStateStore</code> implicitly allows write access to a single RM at any point in time, and hence is the recommended store to use in an HA cluster. When using the ZKRMStateStore, there is no need for a separate fencing mechanism to address a potential split-brain situation where multiple RMs can potentially assume the Active role. When using the ZKRMStateStore, it is advisable to NOT set the &#x201c;<code>zookeeper.DigestAuthenticationProvider.superDigest</code>&#x201d; property on the Zookeeper cluster to ensure that the zookeeper admin does not have access to YARN application/user credential information.</p></section></section><section>
<h2><a name="Deployment"></a>Deployment</h2><section>
<h3><a name="Configurations"></a>Configurations</h3>
<p>Most of the failover functionality is tunable using various configuration properties. Following is a list of required/important ones. yarn-default.xml carries a full-list of knobs. See <a href="../hadoop-yarn-common/yarn-default.xml">yarn-default.xml</a> for more information including default values. See the document for <a href="./ResourceManagerRestart.html">ResourceManager Restart</a> also for instructions on setting up the state-store.</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th align="left"> Configuration Properties </th>
<th align="left"> Description </th></tr>
</thead><tbody>
<tr class="b">
<td align="left"> <code>hadoop.zk.address</code> </td>
<td align="left"> Address of the ZK-quorum. Used both for the state-store and embedded leader-election. </td></tr>
<tr class="a">
<td align="left"> <code>yarn.resourcemanager.ha.enabled</code> </td>
<td align="left"> Enable RM HA. </td></tr>
<tr class="b">
<td align="left"> <code>yarn.resourcemanager.ha.rm-ids</code> </td>
<td align="left"> List of logical IDs for the RMs. e.g., &#x201c;rm1,rm2&#x201d;. </td></tr>
<tr class="a">
<td align="left"> <code>yarn.resourcemanager.hostname.</code><i>rm-id</i> </td>
<td align="left"> For each <i>rm-id</i>, specify the hostname the RM corresponds to. Alternately, one could set each of the RM&#x2019;s service addresses. </td></tr>
<tr class="b">
<td align="left"> <code>yarn.resourcemanager.address.</code><i>rm-id</i> </td>
<td align="left"> For each <i>rm-id</i>, specify host:port for clients to submit jobs. If set, overrides the hostname set in <code>yarn.resourcemanager.hostname.</code><i>rm-id</i>. </td></tr>
<tr class="a">
<td align="left"> <code>yarn.resourcemanager.scheduler.address.</code><i>rm-id</i> </td>
<td align="left"> For each <i>rm-id</i>, specify scheduler host:port for ApplicationMasters to obtain resources. If set, overrides the hostname set in <code>yarn.resourcemanager.hostname.</code><i>rm-id</i>. </td></tr>
<tr class="b">
<td align="left"> <code>yarn.resourcemanager.resource-tracker.address.</code><i>rm-id</i> </td>
<td align="left"> For each <i>rm-id</i>, specify host:port for NodeManagers to connect. If set, overrides the hostname set in <code>yarn.resourcemanager.hostname.</code><i>rm-id</i>. </td></tr>
<tr class="a">
<td align="left"> <code>yarn.resourcemanager.admin.address.</code><i>rm-id</i> </td>
<td align="left"> For each <i>rm-id</i>, specify host:port for administrative commands. If set, overrides the hostname set in <code>yarn.resourcemanager.hostname.</code><i>rm-id</i>. </td></tr>
<tr class="b">
<td align="left"> <code>yarn.resourcemanager.webapp.address.</code><i>rm-id</i> </td>
<td align="left"> For each <i>rm-id</i>, specify host:port of the RM web application corresponds to. You do not need this if you set <code>yarn.http.policy</code> to <code>HTTPS_ONLY</code>. If set, overrides the hostname set in <code>yarn.resourcemanager.hostname.</code><i>rm-id</i>. </td></tr>
<tr class="a">
<td align="left"> <code>yarn.resourcemanager.webapp.https.address.</code><i>rm-id</i> </td>
<td align="left"> For each <i>rm-id</i>, specify host:port of the RM https web application corresponds to. You do not need this if you set <code>yarn.http.policy</code> to <code>HTTP_ONLY</code>. If set, overrides the hostname set in <code>yarn.resourcemanager.hostname.</code><i>rm-id</i>. </td></tr>
<tr class="b">
<td align="left"> <code>yarn.resourcemanager.ha.id</code> </td>
<td align="left"> Identifies the RM in the ensemble. This is optional; however, if set, admins have to ensure that all the RMs have their own IDs in the config. </td></tr>
<tr class="a">
<td align="left"> <code>yarn.resourcemanager.ha.automatic-failover.enabled</code> </td>
<td align="left"> Enable automatic failover; By default, it is enabled only when HA is enabled. </td></tr>
<tr class="b">
<td align="left"> <code>yarn.resourcemanager.ha.automatic-failover.embedded</code> </td>
<td align="left"> Use embedded leader-elector to pick the Active RM, when automatic failover is enabled. By default, it is enabled only when HA is enabled. </td></tr>
<tr class="a">
<td align="left"> <code>yarn.resourcemanager.cluster-id</code> </td>
<td align="left"> Identifies the cluster. Used by the elector to ensure an RM doesn&#x2019;t take over as Active for another cluster. </td></tr>
<tr class="b">
<td align="left"> <code>yarn.client.failover-proxy-provider</code> </td>
<td align="left"> The class to be used by Clients, AMs and NMs to failover to the Active RM. </td></tr>
<tr class="a">
<td align="left"> <code>yarn.client.failover-no-ha-proxy-provider</code> </td>
<td align="left"> The class to be used by Clients, AMs and NMs to failover to the Active RM, when not running in HA mode </td></tr>
<tr class="b">
<td align="left"> <code>yarn.client.failover-max-attempts</code> </td>
<td align="left"> The max number of times FailoverProxyProvider should attempt failover. </td></tr>
<tr class="a">
<td align="left"> <code>yarn.client.failover-sleep-base-ms</code> </td>
<td align="left"> The sleep base (in milliseconds) to be used for calculating the exponential delay between failovers. </td></tr>
<tr class="b">
<td align="left"> <code>yarn.client.failover-sleep-max-ms</code> </td>
<td align="left"> The maximum sleep time (in milliseconds) between failovers. </td></tr>
<tr class="a">
<td align="left"> <code>yarn.client.failover-retries</code> </td>
<td align="left"> The number of retries per attempt to connect to a ResourceManager. </td></tr>
<tr class="b">
<td align="left"> <code>yarn.client.failover-retries-on-socket-timeouts</code> </td>
<td align="left"> The number of retries per attempt to connect to a ResourceManager on socket timeouts. </td></tr>
</tbody>
</table><section>
<h4><a name="Sample_configurations"></a>Sample configurations</h4>
<p>Here is the sample of minimal setup for RM failover.</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;yarn.resourcemanager.ha.enabled&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.resourcemanager.cluster-id&lt;/name&gt;
&lt;value&gt;cluster1&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.resourcemanager.ha.rm-ids&lt;/name&gt;
&lt;value&gt;rm1,rm2&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.resourcemanager.hostname.rm1&lt;/name&gt;
&lt;value&gt;master1&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.resourcemanager.hostname.rm2&lt;/name&gt;
&lt;value&gt;master2&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.resourcemanager.webapp.address.rm1&lt;/name&gt;
&lt;value&gt;master1:8088&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;yarn.resourcemanager.webapp.address.rm2&lt;/name&gt;
&lt;value&gt;master2:8088&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;hadoop.zk.address&lt;/name&gt;
&lt;value&gt;zk1:2181,zk2:2181,zk3:2181&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
</section></section><section>
<h3><a name="Admin_commands"></a>Admin commands</h3>
<p><code>yarn rmadmin</code> has a few HA-specific command options to check the health/state of an RM, and transition to Active/Standby. Commands for HA take service id of RM set by <code>yarn.resourcemanager.ha.rm-ids</code> as argument.</p>
<div class="source">
<div class="source">
<pre> $ yarn rmadmin -getServiceState rm1
active
$ yarn rmadmin -getServiceState rm2
standby
</pre></div></div>
<p>If automatic failover is enabled, you can not use manual transition command. Though you can override this by &#x2013;forcemanual flag, you need caution.</p>
<div class="source">
<div class="source">
<pre> $ yarn rmadmin -transitionToStandby rm1
Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@1d8299fd
Refusing to manually manage HA state, since it may cause
a split-brain scenario or other incorrect state.
If you are very sure you know what you are doing, please
specify the forcemanual flag.
</pre></div></div>
<p>See <a href="./YarnCommands.html">YarnCommands</a> for more details.</p></section><section>
<h3><a name="ResourceManager_Web_UI_services"></a>ResourceManager Web UI services</h3>
<p>Assuming a standby RM is up and running, the Standby automatically redirects all web requests to the Active, except for the &#x201c;About&#x201d; page.</p></section><section>
<h3><a name="Web_Services"></a>Web Services</h3>
<p>Assuming a standby RM is up and running, RM web-services described at <a href="./ResourceManagerRest.html">ResourceManager REST APIs</a> when invoked on a standby RM are automatically redirected to the Active RM.</p></section><section>
<h3><a name="Load_Balancer_Setup"></a>Load Balancer Setup</h3>
<p>If you are running a set of ResourceManagers behind a Load Balancer (e.g. <a class="externalLink" href="https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-custom-probe-overview">Azure</a> or <a class="externalLink" href="https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-healthchecks.html">AWS</a> ) and would like the Load Balancer to point to the active RM, you can use the /isActive HTTP endpoint as a health probe. <a class="externalLink" href="http://RM_HOSTNAME/isActive">http://RM_HOSTNAME/isActive</a> will return a 200 status code response if the RM is in Active HA State, 405 otherwise.</p></section></section>
</div>
</div>
<div class="clear">
<hr/>
</div>
<div id="footer">
<div class="xright">
&#169; 2008-2024
Apache Software Foundation
- <a href="http://maven.apache.org/privacy-policy.html">Privacy Policy</a>.
Apache Maven, Maven, Apache, the Apache feather logo, and the Apache Maven project logos are trademarks of The Apache Software Foundation.
</div>
<div class="clear">
<hr/>
</div>
</div>
</body>
</html>