blob: 0517cde36f1db4c2cc1d2fea4ad504ed9391bfff [file] [log] [blame]
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!--
| Generated by Apache Maven Doxia at 2021-06-15
| Rendered using Apache Maven Stylus Skin 1.5
-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Apache Hadoop 3.3.1 &#x2013; Hadoop: Encrypted Shuffle</title>
<style type="text/css" media="all">
@import url("./css/maven-base.css");
@import url("./css/maven-theme.css");
@import url("./css/site.css");
</style>
<link rel="stylesheet" href="./css/print.css" type="text/css" media="print" />
<meta name="Date-Revision-yyyymmdd" content="20210615" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body class="composite">
<div id="banner">
<a href="http://hadoop.apache.org/" id="bannerLeft">
<img src="http://hadoop.apache.org/images/hadoop-logo.jpg" alt="" />
</a>
<a href="http://www.apache.org/" id="bannerRight">
<img src="http://www.apache.org/images/asf_logo_wide.png" alt="" />
</a>
<div class="clear">
<hr/>
</div>
</div>
<div id="breadcrumbs">
<div class="xleft">
<a href="http://www.apache.org/" class="externalLink">Apache</a>
&gt;
<a href="http://hadoop.apache.org/" class="externalLink">Hadoop</a>
&gt;
<a href="../index.html">Apache Hadoop MapReduce Client</a>
&gt;
<a href="index.html">Apache Hadoop 3.3.1</a>
&gt;
Hadoop: Encrypted Shuffle
</div>
<div class="xright"> <a href="http://wiki.apache.org/hadoop" class="externalLink">Wiki</a>
|
<a href="https://gitbox.apache.org/repos/asf/hadoop.git" class="externalLink">git</a>
|
<a href="http://hadoop.apache.org/" class="externalLink">Apache Hadoop</a>
&nbsp;| Last Published: 2021-06-15
&nbsp;| Version: 3.3.1
</div>
<div class="clear">
<hr/>
</div>
</div>
<div id="leftColumn">
<div id="navcolumn">
<h5>General</h5>
<ul>
<li class="none">
<a href="../../index.html">Overview</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/SingleCluster.html">Single Node Setup</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/ClusterSetup.html">Cluster Setup</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/CommandsManual.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/FileSystemShell.html">FileSystem Shell</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Compatibility.html">Compatibility Specification</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/DownstreamDev.html">Downstream Developer's Guide</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/AdminCompatibilityGuide.html">Admin Compatibility Guide</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/InterfaceClassification.html">Interface Classification</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/filesystem/index.html">FileSystem Specification</a>
</li>
</ul>
<h5>Common</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/CLIMiniCluster.html">CLI Mini Cluster</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/FairCallQueue.html">Fair Call Queue</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/NativeLibraries.html">Native Libraries</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Superusers.html">Proxy User</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/RackAwareness.html">Rack Awareness</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/SecureMode.html">Secure Mode</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/ServiceLevelAuth.html">Service Level Authorization</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/HttpAuthentication.html">HTTP Authentication</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/CredentialProviderAPI.html">Credential Provider API</a>
</li>
<li class="none">
<a href="../../hadoop-kms/index.html">Hadoop KMS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Tracing.html">Tracing</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/UnixShellGuide.html">Unix Shell Guide</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/registry/index.html">Registry</a>
</li>
</ul>
<h5>HDFS</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsDesign.html">Architecture</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html">User Guide</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html">NameNode HA With QJM</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html">NameNode HA With NFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html">Observer NameNode</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/Federation.html">Federation</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ViewFs.html">ViewFs</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ViewFsOverloadScheme.html">ViewFsOverloadScheme</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html">Snapshots</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsEditsViewer.html">Edits Viewer</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html">Image Viewer</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html">Permissions and HDFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsQuotaAdminGuide.html">Quotas and HDFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/LibHdfs.html">libhdfs (C API)</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/WebHDFS.html">WebHDFS (REST API)</a>
</li>
<li class="none">
<a href="../../hadoop-hdfs-httpfs/index.html">HttpFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html">Short Circuit Local Reads</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html">Centralized Cache Management</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html">NFS Gateway</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html">Rolling Upgrade</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ExtendedAttributes.html">Extended Attributes</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html">Transparent Encryption</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html">Multihoming</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html">Storage Policies</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/MemoryStorage.html">Memory Storage Support</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/SLGUserGuide.html">Synthetic Load Generator</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html">Erasure Coding</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html">Disk Balancer</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsUpgradeDomain.html">Upgrade Domain</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsDataNodeAdminGuide.html">DataNode Admin</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs-rbf/HDFSRouterFederation.html">Router Federation</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsProvidedStorage.html">Provided Storage</a>
</li>
</ul>
<h5>MapReduce</h5>
<ul>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html">Tutorial</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html">Compatibility with 1.x</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html">Encrypted Shuffle</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html">Pluggable Shuffle/Sort</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistributedCacheDeploy.html">Distributed Cache Deploy</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/SharedCacheSupport.html">Support for YARN Shared Cache</a>
</li>
</ul>
<h5>MapReduce REST APIs</h5>
<ul>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredAppMasterRest.html">MR Application Master</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-hs/HistoryServerRest.html">MR History Server</a>
</li>
</ul>
<h5>YARN</h5>
<ul>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/YARN.html">Architecture</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/YarnCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html">Capacity Scheduler</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/FairScheduler.html">Fair Scheduler</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html">ResourceManager Restart</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html">ResourceManager HA</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ResourceModel.html">Resource Model</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeLabel.html">Node Labels</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeAttributes.html">Node Attributes</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html">Web Application Proxy</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/TimelineServer.html">Timeline Server</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html">Timeline Service V.2</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html">Writing YARN Applications</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html">YARN Application Security</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeManager.html">NodeManager</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/DockerContainers.html">Running Applications in Docker Containers</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/RuncContainers.html">Running Applications in runC Containers</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html">Using CGroups</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/SecureContainer.html">Secure Containers</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ReservationSystem.html">Reservation System</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/GracefulDecommission.html">Graceful Decommission</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/OpportunisticContainers.html">Opportunistic Containers</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/Federation.html">YARN Federation</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/SharedCache.html">Shared Cache</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/UsingGpus.html">Using GPU</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/UsingFPGA.html">Using FPGA</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html">Placement Constraints</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/YarnUI2.html">YARN UI2</a>
</li>
</ul>
<h5>YARN REST APIs</h5>
<ul>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html">Introduction</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html">Resource Manager</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeManagerRest.html">Node Manager</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Timeline_Server_REST_API_v1">Timeline Server</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html#Timeline_Service_v.2_REST_API">Timeline Service V.2</a>
</li>
</ul>
<h5>YARN Service</h5>
<ul>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html">Overview</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/QuickStart.html">QuickStart</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/Concepts.html">Concepts</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/YarnServiceAPI.html">Yarn Service API</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/ServiceDiscovery.html">Service Discovery</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/SystemServices.html">System Services</a>
</li>
</ul>
<h5>Hadoop Compatible File Systems</h5>
<ul>
<li class="none">
<a href="../../hadoop-aliyun/tools/hadoop-aliyun/index.html">Aliyun OSS</a>
</li>
<li class="none">
<a href="../../hadoop-aws/tools/hadoop-aws/index.html">Amazon S3</a>
</li>
<li class="none">
<a href="../../hadoop-azure/index.html">Azure Blob Storage</a>
</li>
<li class="none">
<a href="../../hadoop-azure-datalake/index.html">Azure Data Lake Storage</a>
</li>
<li class="none">
<a href="../../hadoop-openstack/index.html">OpenStack Swift</a>
</li>
<li class="none">
<a href="../../hadoop-cos/cloud-storage/index.html">Tencent COS</a>
</li>
</ul>
<h5>Auth</h5>
<ul>
<li class="none">
<a href="../../hadoop-auth/index.html">Overview</a>
</li>
<li class="none">
<a href="../../hadoop-auth/Examples.html">Examples</a>
</li>
<li class="none">
<a href="../../hadoop-auth/Configuration.html">Configuration</a>
</li>
<li class="none">
<a href="../../hadoop-auth/BuildingIt.html">Building</a>
</li>
</ul>
<h5>Tools</h5>
<ul>
<li class="none">
<a href="../../hadoop-streaming/HadoopStreaming.html">Hadoop Streaming</a>
</li>
<li class="none">
<a href="../../hadoop-archives/HadoopArchives.html">Hadoop Archives</a>
</li>
<li class="none">
<a href="../../hadoop-archive-logs/HadoopArchiveLogs.html">Hadoop Archive Logs</a>
</li>
<li class="none">
<a href="../../hadoop-distcp/DistCp.html">DistCp</a>
</li>
<li class="none">
<a href="../../hadoop-gridmix/GridMix.html">GridMix</a>
</li>
<li class="none">
<a href="../../hadoop-rumen/Rumen.html">Rumen</a>
</li>
<li class="none">
<a href="../../hadoop-resourceestimator/ResourceEstimator.html">Resource Estimator Service</a>
</li>
<li class="none">
<a href="../../hadoop-sls/SchedulerLoadSimulator.html">Scheduler Load Simulator</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Benchmarking.html">Hadoop Benchmarking</a>
</li>
<li class="none">
<a href="../../hadoop-dynamometer/Dynamometer.html">Dynamometer</a>
</li>
</ul>
<h5>Reference</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/release/">Changelog and Release Notes</a>
</li>
<li class="none">
<a href="../../api/index.html">Java API docs</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/UnixShellAPI.html">Unix Shell API</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Metrics.html">Metrics</a>
</li>
</ul>
<h5>Configuration</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/core-default.xml">core-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/hdfs-default.xml">hdfs-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs-rbf/hdfs-rbf-default.xml">hdfs-rbf-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml">mapred-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-common/yarn-default.xml">yarn-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-kms/kms-default.html">kms-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-hdfs-httpfs/httpfs-default.html">httpfs-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/DeprecatedProperties.html">Deprecated Properties</a>
</li>
</ul>
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
<img alt="Built by Maven" src="./images/logos/maven-feather.png"/>
</a>
</div>
</div>
<div id="bodyColumn">
<div id="contentBox">
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<h1>Hadoop: Encrypted Shuffle</h1>
<ul>
<li><a href="#Introduction">Introduction</a></li>
<li><a href="#Configuration">Configuration</a>
<ul>
<li><a href="#core-site.xml_Properties">core-site.xml Properties</a></li>
<li><a href="#mapred-site.xml_Properties">mapred-site.xml Properties</a></li></ul></li>
<li><a href="#Keystore_and_Truststore_Settings">Keystore and Truststore Settings</a>
<ul>
<li><a href="#ssl-server.xml_.28Shuffle_server.29_Configuration:">ssl-server.xml (Shuffle server) Configuration:</a></li>
<li><a href="#ssl-client.xml_.28Reducer.2FFetcher.29_Configuration:">ssl-client.xml (Reducer/Fetcher) Configuration:</a></li></ul></li>
<li><a href="#Activating_Encrypted_Shuffle">Activating Encrypted Shuffle</a></li>
<li><a href="#Client_Certificates">Client Certificates</a></li>
<li><a href="#Reloading_Truststores">Reloading Truststores</a></li>
<li><a href="#Debugging">Debugging</a></li>
<li><a href="#Encrypted_Intermediate_Data_Spill_files">Encrypted Intermediate Data Spill files</a></li></ul>
<div class="section">
<h2><a name="Introduction"></a>Introduction</h2>
<p>The Encrypted Shuffle capability allows encryption of the MapReduce shuffle using HTTPS and with optional client authentication (also known as bi-directional HTTPS, or HTTPS with client certificates). It comprises:</p>
<ul>
<li>
<p>A Hadoop configuration setting for toggling the shuffle between HTTP and HTTPS.</p>
</li>
<li>
<p>A Hadoop configuration settings for specifying the keystore and truststore properties (location, type, passwords) used by the shuffle service and the reducers tasks fetching shuffle data.</p>
</li>
<li>
<p>A way to re-load truststores across the cluster (when a node is added or removed).</p>
</li>
</ul></div>
<div class="section">
<h2><a name="Configuration"></a>Configuration</h2>
<div class="section">
<h3><a name="core-site.xml_Properties"></a><b>core-site.xml</b> Properties</h3>
<p>To enable encrypted shuffle, set the following properties in core-site.xml of all nodes in the cluster:</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th align="left"> <b>Property</b> </th>
<th align="left"> <b>Default Value</b> </th>
<th align="left"> <b>Explanation</b> </th></tr>
</thead><tbody>
<tr class="b">
<td align="left"> <tt>hadoop.ssl.require.client.cert</tt> </td>
<td align="left"> <tt>false</tt> </td>
<td align="left"> Whether client certificates are required </td></tr>
<tr class="a">
<td align="left"> <tt>hadoop.ssl.hostname.verifier</tt> </td>
<td align="left"> <tt>DEFAULT</tt> </td>
<td align="left"> The hostname verifier to provide for HttpsURLConnections. Valid values are: <b>DEFAULT</b>, <b>STRICT</b>, <b>STRICT_IE6</b>, <b>DEFAULT_AND_LOCALHOST</b> and <b>ALLOW_ALL</b> </td></tr>
<tr class="b">
<td align="left"> <tt>hadoop.ssl.keystores.factory.class</tt> </td>
<td align="left"> <tt>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</tt> </td>
<td align="left"> The KeyStoresFactory implementation to use </td></tr>
<tr class="a">
<td align="left"> <tt>hadoop.ssl.server.conf</tt> </td>
<td align="left"> <tt>ssl-server.xml</tt> </td>
<td align="left"> Resource file from which ssl server keystore information will be extracted. This file is looked up in the classpath, typically it should be in Hadoop conf/ directory </td></tr>
<tr class="b">
<td align="left"> <tt>hadoop.ssl.client.conf</tt> </td>
<td align="left"> <tt>ssl-client.xml</tt> </td>
<td align="left"> Resource file from which ssl server keystore information will be extracted. This file is looked up in the classpath, typically it should be in Hadoop conf/ directory </td></tr>
<tr class="a">
<td align="left"> <tt>hadoop.ssl.enabled.protocols</tt> </td>
<td align="left"> <tt>TLSv1.2</tt> </td>
<td align="left"> The supported SSL protocols. The parameter will only be used from DatanodeHttpServer. </td></tr>
</tbody>
</table>
<p><b>IMPORTANT:</b> Currently requiring client certificates should be set to false. Refer the <a href="#Client_Certificates">Client Certificates</a> section for details.</p>
<p><b>IMPORTANT:</b> All these properties should be marked as final in the cluster configuration files.</p>
<div class="section">
<h4><a name="Example:"></a>Example:</h4>
<div>
<div>
<pre class="source"> &lt;property&gt;
&lt;name&gt;hadoop.ssl.require.client.cert&lt;/name&gt;
&lt;value&gt;false&lt;/value&gt;
&lt;final&gt;true&lt;/final&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;hadoop.ssl.hostname.verifier&lt;/name&gt;
&lt;value&gt;DEFAULT&lt;/value&gt;
&lt;final&gt;true&lt;/final&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;hadoop.ssl.keystores.factory.class&lt;/name&gt;
&lt;value&gt;org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory&lt;/value&gt;
&lt;final&gt;true&lt;/final&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;hadoop.ssl.server.conf&lt;/name&gt;
&lt;value&gt;ssl-server.xml&lt;/value&gt;
&lt;final&gt;true&lt;/final&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;hadoop.ssl.client.conf&lt;/name&gt;
&lt;value&gt;ssl-client.xml&lt;/value&gt;
&lt;final&gt;true&lt;/final&gt;
&lt;/property&gt;
</pre></div></div>
</div></div>
<div class="section">
<h3><a name="mapred-site.xml_Properties"></a><tt>mapred-site.xml</tt> Properties</h3>
<p>To enable encrypted shuffle, set the following property in mapred-site.xml of all nodes in the cluster:</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th align="left"> <b>Property</b> </th>
<th align="left"> <b>Default Value</b> </th>
<th align="left"> <b>Explanation</b> </th></tr>
</thead><tbody>
<tr class="b">
<td align="left"> <tt>mapreduce.shuffle.ssl.enabled</tt> </td>
<td align="left"> <tt>false</tt> </td>
<td align="left"> Whether encrypted shuffle is enabled </td></tr>
</tbody>
</table>
<p><b>IMPORTANT:</b> This property should be marked as final in the cluster configuration files.</p>
<div class="section">
<h4><a name="Example:"></a>Example:</h4>
<div>
<div>
<pre class="source"> &lt;property&gt;
&lt;name&gt;mapreduce.shuffle.ssl.enabled&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;final&gt;true&lt;/final&gt;
&lt;/property&gt;
</pre></div></div>
<p>The Linux container executor should be set to prevent job tasks from reading the server keystore information and gaining access to the shuffle server certificates.</p>
<p>Refer to Hadoop Kerberos configuration for details on how to do this.</p></div></div></div>
<div class="section">
<h2><a name="Keystore_and_Truststore_Settings"></a>Keystore and Truststore Settings</h2>
<p>Currently <tt>FileBasedKeyStoresFactory</tt> is the only <tt>KeyStoresFactory</tt> implementation. The <tt>FileBasedKeyStoresFactory</tt> implementation uses the following properties, in the <b>ssl-server.xml</b> and <b>ssl-client.xml</b> files, to configure the keystores and truststores.</p>
<div class="section">
<h3><a name="ssl-server.xml_.28Shuffle_server.29_Configuration:"></a><tt>ssl-server.xml</tt> (Shuffle server) Configuration:</h3>
<p>The mapred user should own the <b>ssl-server.xml</b> file and have exclusive read access to it.</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th align="left"> <b>Property</b> </th>
<th align="left"> <b>Default Value</b> </th>
<th align="left"> <b>Explanation</b> </th></tr>
</thead><tbody>
<tr class="b">
<td align="left"> <tt>ssl.server.keystore.type</tt> </td>
<td align="left"> <tt>jks</tt> </td>
<td align="left"> Keystore file type </td></tr>
<tr class="a">
<td align="left"> <tt>ssl.server.keystore.location</tt> </td>
<td align="left"> NONE </td>
<td align="left"> Keystore file location. The mapred user should own this file and have exclusive read access to it. </td></tr>
<tr class="b">
<td align="left"> <tt>ssl.server.keystore.password</tt> </td>
<td align="left"> NONE </td>
<td align="left"> Keystore file password </td></tr>
<tr class="a">
<td align="left"> <tt>ssl.server.truststore.type</tt> </td>
<td align="left"> <tt>jks</tt> </td>
<td align="left"> Truststore file type </td></tr>
<tr class="b">
<td align="left"> <tt>ssl.server.truststore.location</tt> </td>
<td align="left"> NONE </td>
<td align="left"> Truststore file location. The mapred user should own this file and have exclusive read access to it. </td></tr>
<tr class="a">
<td align="left"> <tt>ssl.server.truststore.password</tt> </td>
<td align="left"> NONE </td>
<td align="left"> Truststore file password </td></tr>
<tr class="b">
<td align="left"> <tt>ssl.server.truststore.reload.interval</tt> </td>
<td align="left"> 10000 </td>
<td align="left"> Truststore reload interval, in milliseconds </td></tr>
</tbody>
</table>
<div class="section">
<h4><a name="Example:"></a>Example:</h4>
<div>
<div>
<pre class="source">&lt;configuration&gt;
&lt;!-- Server Certificate Store --&gt;
&lt;property&gt;
&lt;name&gt;ssl.server.keystore.type&lt;/name&gt;
&lt;value&gt;jks&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;ssl.server.keystore.location&lt;/name&gt;
&lt;value&gt;${user.home}/keystores/server-keystore.jks&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;ssl.server.keystore.password&lt;/name&gt;
&lt;value&gt;serverfoo&lt;/value&gt;
&lt;/property&gt;
&lt;!-- Server Trust Store --&gt;
&lt;property&gt;
&lt;name&gt;ssl.server.truststore.type&lt;/name&gt;
&lt;value&gt;jks&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;ssl.server.truststore.location&lt;/name&gt;
&lt;value&gt;${user.home}/keystores/truststore.jks&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;ssl.server.truststore.password&lt;/name&gt;
&lt;value&gt;clientserverbar&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;ssl.server.truststore.reload.interval&lt;/name&gt;
&lt;value&gt;10000&lt;/value&gt;
&lt;/property&gt;
&lt;/configuration&gt;
</pre></div></div>
</div></div>
<div class="section">
<h3><a name="ssl-client.xml_.28Reducer.2FFetcher.29_Configuration:"></a><tt>ssl-client.xml</tt> (Reducer/Fetcher) Configuration:</h3>
<p>The mapred user should own the <b>ssl-client.xml</b> file and it should have default permissions.</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th align="left"> <b>Property</b> </th>
<th align="left"> <b>Default Value</b> </th>
<th align="left"> <b>Explanation</b> </th></tr>
</thead><tbody>
<tr class="b">
<td align="left"> <tt>ssl.client.keystore.type</tt> </td>
<td align="left"> <tt>jks</tt> </td>
<td align="left"> Keystore file type </td></tr>
<tr class="a">
<td align="left"> <tt>ssl.client.keystore.location</tt> </td>
<td align="left"> NONE </td>
<td align="left"> Keystore file location. The mapred user should own this file and it should have default permissions. </td></tr>
<tr class="b">
<td align="left"> <tt>ssl.client.keystore.password</tt> </td>
<td align="left"> NONE </td>
<td align="left"> Keystore file password </td></tr>
<tr class="a">
<td align="left"> <tt>ssl.client.truststore.type</tt> </td>
<td align="left"> <tt>jks</tt> </td>
<td align="left"> Truststore file type </td></tr>
<tr class="b">
<td align="left"> <tt>ssl.client.truststore.location</tt> </td>
<td align="left"> NONE </td>
<td align="left"> Truststore file location. The mapred user should own this file and it should have default permissions. </td></tr>
<tr class="a">
<td align="left"> <tt>ssl.client.truststore.password</tt> </td>
<td align="left"> NONE </td>
<td align="left"> Truststore file password </td></tr>
<tr class="b">
<td align="left"> <tt>ssl.client.truststore.reload.interval</tt> </td>
<td align="left"> 10000 </td>
<td align="left"> Truststore reload interval, in milliseconds </td></tr>
</tbody>
</table>
<div class="section">
<h4><a name="Example:"></a>Example:</h4>
<div>
<div>
<pre class="source">&lt;configuration&gt;
&lt;!-- Client certificate Store --&gt;
&lt;property&gt;
&lt;name&gt;ssl.client.keystore.type&lt;/name&gt;
&lt;value&gt;jks&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;ssl.client.keystore.location&lt;/name&gt;
&lt;value&gt;${user.home}/keystores/client-keystore.jks&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;ssl.client.keystore.password&lt;/name&gt;
&lt;value&gt;clientfoo&lt;/value&gt;
&lt;/property&gt;
&lt;!-- Client Trust Store --&gt;
&lt;property&gt;
&lt;name&gt;ssl.client.truststore.type&lt;/name&gt;
&lt;value&gt;jks&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;ssl.client.truststore.location&lt;/name&gt;
&lt;value&gt;${user.home}/keystores/truststore.jks&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;ssl.client.truststore.password&lt;/name&gt;
&lt;value&gt;clientserverbar&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;ssl.client.truststore.reload.interval&lt;/name&gt;
&lt;value&gt;10000&lt;/value&gt;
&lt;/property&gt;
&lt;/configuration&gt;
</pre></div></div>
</div></div></div>
<div class="section">
<h2><a name="Activating_Encrypted_Shuffle"></a>Activating Encrypted Shuffle</h2>
<p>When you have made the above configuration changes, activate Encrypted Shuffle by re-starting all NodeManagers.</p>
<p><b>IMPORTANT:</b> Using encrypted shuffle will incur in a significant performance impact. Users should profile this and potentially reserve 1 or more cores for encrypted shuffle.</p></div>
<div class="section">
<h2><a name="Client_Certificates"></a>Client Certificates</h2>
<p>Using Client Certificates does not fully ensure that the client is a reducer task for the job. Currently, Client Certificates (their private key) keystore files must be readable by all users submitting jobs to the cluster. This means that a rogue job could read such those keystore files and use the client certificates in them to establish a secure connection with a Shuffle server. However, unless the rogue job has a proper JobToken, it won&#x2019;t be able to retrieve shuffle data from the Shuffle server. A job, using its own JobToken, can only retrieve shuffle data that belongs to itself.</p></div>
<div class="section">
<h2><a name="Reloading_Truststores"></a>Reloading Truststores</h2>
<p>By default the truststores will reload their configuration every 10 seconds. If a new truststore file is copied over the old one, it will be re-read, and its certificates will replace the old ones. This mechanism is useful for adding or removing nodes from the cluster, or for adding or removing trusted clients. In these cases, the client or NodeManager certificate is added to (or removed from) all the truststore files in the system, and the new configuration will be picked up without you having to restart the NodeManager daemons.</p></div>
<div class="section">
<h2><a name="Debugging"></a>Debugging</h2>
<p><b>NOTE:</b> Enable debugging only for troubleshooting, and then only for jobs running on small amounts of data. It is very verbose and slows down jobs by several orders of magnitude. (You might need to increase mapred.task.timeout to prevent jobs from failing because tasks run so slowly.)</p>
<p>To enable SSL debugging in the reducers, set <tt>-Djavax.net.debug=all</tt> in the <tt>mapreduce.reduce.child.java.opts</tt> property; for example:</p>
<div>
<div>
<pre class="source"> &lt;property&gt;
&lt;name&gt;mapred.reduce.child.java.opts&lt;/name&gt;
&lt;value&gt;-Xmx-200m -Djavax.net.debug=all&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>You can do this on a per-job basis, or by means of a cluster-wide setting in the <tt>mapred-site.xml</tt> file.</p>
<p>To set this property in NodeManager, set it in the <tt>yarn-env.sh</tt> file:</p>
<div>
<div>
<pre class="source"> YARN_NODEMANAGER_OPTS=&quot;-Djavax.net.debug=all&quot;
</pre></div></div>
</div>
<div class="section">
<h2><a name="Encrypted_Intermediate_Data_Spill_files"></a>Encrypted Intermediate Data Spill files</h2>
<p>This capability allows encryption of the intermediate files generated during the merge and shuffle phases. It can be enabled by setting the <tt>mapreduce.job.encrypted-intermediate-data</tt> job property to <tt>true</tt>.</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th align="left"> Name </th>
<th align="left"> Type </th>
<th align="left"> Description </th></tr>
</thead><tbody>
<tr class="b">
<td align="left"> mapreduce.job.encrypted-intermediate-data </td>
<td align="left"> boolean </td>
<td align="left"> Enable or disable encrypt intermediate mapreduce spill files.Default is false. </td></tr>
<tr class="a">
<td align="left"> mapreduce.job.encrypted-intermediate-data-key-size-bits </td>
<td align="left"> int </td>
<td align="left"> The key length used by keygenerator to encrypt data spilled to disk. </td></tr>
<tr class="b">
<td align="left"> mapreduce.job.encrypted-intermediate-data.buffer.kb </td>
<td align="left"> int </td>
<td align="left"> The buffer size in kb for stream written to disk after encryption. </td></tr>
</tbody>
</table>
<p><b>NOTE:</b> Currently, enabling encrypted intermediate data spills would restrict the number of attempts of the job to 1.</p></div>
</div>
</div>
<div class="clear">
<hr/>
</div>
<div id="footer">
<div class="xright">
&#169; 2008-2021
Apache Software Foundation
- <a href="http://maven.apache.org/privacy-policy.html">Privacy Policy</a>.
Apache Maven, Maven, Apache, the Apache feather logo, and the Apache Maven project logos are trademarks of The Apache Software Foundation.
</div>
<div class="clear">
<hr/>
</div>
</div>
</body>
</html>