blob: 8f4367d68f8128eb5980a7ccf2079aca94215148 [file] [log] [blame]
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!--
| Generated by Apache Maven Doxia at 2021-06-15
| Rendered using Apache Maven Stylus Skin 1.5
-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Apache Hadoop 3.3.1 &#x2013; Apache Hadoop Downstream Developers Guide</title>
<style type="text/css" media="all">
@import url("./css/maven-base.css");
@import url("./css/maven-theme.css");
@import url("./css/site.css");
</style>
<link rel="stylesheet" href="./css/print.css" type="text/css" media="print" />
<meta name="Date-Revision-yyyymmdd" content="20210615" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body class="composite">
<div id="banner">
<a href="http://hadoop.apache.org/" id="bannerLeft">
<img src="http://hadoop.apache.org/images/hadoop-logo.jpg" alt="" />
</a>
<a href="http://www.apache.org/" id="bannerRight">
<img src="http://www.apache.org/images/asf_logo_wide.png" alt="" />
</a>
<div class="clear">
<hr/>
</div>
</div>
<div id="breadcrumbs">
<div class="xleft">
<a href="http://www.apache.org/" class="externalLink">Apache</a>
&gt;
<a href="http://hadoop.apache.org/" class="externalLink">Hadoop</a>
&gt;
<a href="../index.html">Apache Hadoop Project Dist POM</a>
&gt;
<a href="index.html">Apache Hadoop 3.3.1</a>
&gt;
Apache Hadoop Downstream Developers Guide
</div>
<div class="xright"> <a href="http://wiki.apache.org/hadoop" class="externalLink">Wiki</a>
|
<a href="https://gitbox.apache.org/repos/asf/hadoop.git" class="externalLink">git</a>
|
<a href="http://hadoop.apache.org/" class="externalLink">Apache Hadoop</a>
&nbsp;| Last Published: 2021-06-15
&nbsp;| Version: 3.3.1
</div>
<div class="clear">
<hr/>
</div>
</div>
<div id="leftColumn">
<div id="navcolumn">
<h5>General</h5>
<ul>
<li class="none">
<a href="../../index.html">Overview</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/SingleCluster.html">Single Node Setup</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/ClusterSetup.html">Cluster Setup</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/CommandsManual.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/FileSystemShell.html">FileSystem Shell</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Compatibility.html">Compatibility Specification</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/DownstreamDev.html">Downstream Developer's Guide</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/AdminCompatibilityGuide.html">Admin Compatibility Guide</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/InterfaceClassification.html">Interface Classification</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/filesystem/index.html">FileSystem Specification</a>
</li>
</ul>
<h5>Common</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/CLIMiniCluster.html">CLI Mini Cluster</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/FairCallQueue.html">Fair Call Queue</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/NativeLibraries.html">Native Libraries</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Superusers.html">Proxy User</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/RackAwareness.html">Rack Awareness</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/SecureMode.html">Secure Mode</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/ServiceLevelAuth.html">Service Level Authorization</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/HttpAuthentication.html">HTTP Authentication</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/CredentialProviderAPI.html">Credential Provider API</a>
</li>
<li class="none">
<a href="../../hadoop-kms/index.html">Hadoop KMS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Tracing.html">Tracing</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/UnixShellGuide.html">Unix Shell Guide</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/registry/index.html">Registry</a>
</li>
</ul>
<h5>HDFS</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsDesign.html">Architecture</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html">User Guide</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html">NameNode HA With QJM</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html">NameNode HA With NFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html">Observer NameNode</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/Federation.html">Federation</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ViewFs.html">ViewFs</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ViewFsOverloadScheme.html">ViewFsOverloadScheme</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html">Snapshots</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsEditsViewer.html">Edits Viewer</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html">Image Viewer</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html">Permissions and HDFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsQuotaAdminGuide.html">Quotas and HDFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/LibHdfs.html">libhdfs (C API)</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/WebHDFS.html">WebHDFS (REST API)</a>
</li>
<li class="none">
<a href="../../hadoop-hdfs-httpfs/index.html">HttpFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html">Short Circuit Local Reads</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html">Centralized Cache Management</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html">NFS Gateway</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html">Rolling Upgrade</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ExtendedAttributes.html">Extended Attributes</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html">Transparent Encryption</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html">Multihoming</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html">Storage Policies</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/MemoryStorage.html">Memory Storage Support</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/SLGUserGuide.html">Synthetic Load Generator</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html">Erasure Coding</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html">Disk Balancer</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsUpgradeDomain.html">Upgrade Domain</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsDataNodeAdminGuide.html">DataNode Admin</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs-rbf/HDFSRouterFederation.html">Router Federation</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsProvidedStorage.html">Provided Storage</a>
</li>
</ul>
<h5>MapReduce</h5>
<ul>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html">Tutorial</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html">Compatibility with 1.x</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html">Encrypted Shuffle</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html">Pluggable Shuffle/Sort</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistributedCacheDeploy.html">Distributed Cache Deploy</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/SharedCacheSupport.html">Support for YARN Shared Cache</a>
</li>
</ul>
<h5>MapReduce REST APIs</h5>
<ul>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredAppMasterRest.html">MR Application Master</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-hs/HistoryServerRest.html">MR History Server</a>
</li>
</ul>
<h5>YARN</h5>
<ul>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/YARN.html">Architecture</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/YarnCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html">Capacity Scheduler</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/FairScheduler.html">Fair Scheduler</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html">ResourceManager Restart</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html">ResourceManager HA</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ResourceModel.html">Resource Model</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeLabel.html">Node Labels</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeAttributes.html">Node Attributes</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html">Web Application Proxy</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/TimelineServer.html">Timeline Server</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html">Timeline Service V.2</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html">Writing YARN Applications</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html">YARN Application Security</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeManager.html">NodeManager</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/DockerContainers.html">Running Applications in Docker Containers</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/RuncContainers.html">Running Applications in runC Containers</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html">Using CGroups</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/SecureContainer.html">Secure Containers</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ReservationSystem.html">Reservation System</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/GracefulDecommission.html">Graceful Decommission</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/OpportunisticContainers.html">Opportunistic Containers</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/Federation.html">YARN Federation</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/SharedCache.html">Shared Cache</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/UsingGpus.html">Using GPU</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/UsingFPGA.html">Using FPGA</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html">Placement Constraints</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/YarnUI2.html">YARN UI2</a>
</li>
</ul>
<h5>YARN REST APIs</h5>
<ul>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html">Introduction</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html">Resource Manager</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeManagerRest.html">Node Manager</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Timeline_Server_REST_API_v1">Timeline Server</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html#Timeline_Service_v.2_REST_API">Timeline Service V.2</a>
</li>
</ul>
<h5>YARN Service</h5>
<ul>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html">Overview</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/QuickStart.html">QuickStart</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/Concepts.html">Concepts</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/YarnServiceAPI.html">Yarn Service API</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/ServiceDiscovery.html">Service Discovery</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/SystemServices.html">System Services</a>
</li>
</ul>
<h5>Hadoop Compatible File Systems</h5>
<ul>
<li class="none">
<a href="../../hadoop-aliyun/tools/hadoop-aliyun/index.html">Aliyun OSS</a>
</li>
<li class="none">
<a href="../../hadoop-aws/tools/hadoop-aws/index.html">Amazon S3</a>
</li>
<li class="none">
<a href="../../hadoop-azure/index.html">Azure Blob Storage</a>
</li>
<li class="none">
<a href="../../hadoop-azure-datalake/index.html">Azure Data Lake Storage</a>
</li>
<li class="none">
<a href="../../hadoop-openstack/index.html">OpenStack Swift</a>
</li>
<li class="none">
<a href="../../hadoop-cos/cloud-storage/index.html">Tencent COS</a>
</li>
</ul>
<h5>Auth</h5>
<ul>
<li class="none">
<a href="../../hadoop-auth/index.html">Overview</a>
</li>
<li class="none">
<a href="../../hadoop-auth/Examples.html">Examples</a>
</li>
<li class="none">
<a href="../../hadoop-auth/Configuration.html">Configuration</a>
</li>
<li class="none">
<a href="../../hadoop-auth/BuildingIt.html">Building</a>
</li>
</ul>
<h5>Tools</h5>
<ul>
<li class="none">
<a href="../../hadoop-streaming/HadoopStreaming.html">Hadoop Streaming</a>
</li>
<li class="none">
<a href="../../hadoop-archives/HadoopArchives.html">Hadoop Archives</a>
</li>
<li class="none">
<a href="../../hadoop-archive-logs/HadoopArchiveLogs.html">Hadoop Archive Logs</a>
</li>
<li class="none">
<a href="../../hadoop-distcp/DistCp.html">DistCp</a>
</li>
<li class="none">
<a href="../../hadoop-gridmix/GridMix.html">GridMix</a>
</li>
<li class="none">
<a href="../../hadoop-rumen/Rumen.html">Rumen</a>
</li>
<li class="none">
<a href="../../hadoop-resourceestimator/ResourceEstimator.html">Resource Estimator Service</a>
</li>
<li class="none">
<a href="../../hadoop-sls/SchedulerLoadSimulator.html">Scheduler Load Simulator</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Benchmarking.html">Hadoop Benchmarking</a>
</li>
<li class="none">
<a href="../../hadoop-dynamometer/Dynamometer.html">Dynamometer</a>
</li>
</ul>
<h5>Reference</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/release/">Changelog and Release Notes</a>
</li>
<li class="none">
<a href="../../api/index.html">Java API docs</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/UnixShellAPI.html">Unix Shell API</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Metrics.html">Metrics</a>
</li>
</ul>
<h5>Configuration</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/core-default.xml">core-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/hdfs-default.xml">hdfs-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs-rbf/hdfs-rbf-default.xml">hdfs-rbf-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml">mapred-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-common/yarn-default.xml">yarn-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-kms/kms-default.html">kms-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-hdfs-httpfs/httpfs-default.html">httpfs-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/DeprecatedProperties.html">Deprecated Properties</a>
</li>
</ul>
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
<img alt="Built by Maven" src="./images/logos/maven-feather.png"/>
</a>
</div>
</div>
<div id="bodyColumn">
<div id="contentBox">
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<h1>Apache Hadoop Downstream Developer&#x2019;s Guide</h1>
<ul>
<li><a href="#Purpose">Purpose</a>
<ul>
<li><a href="#Target_Audience">Target Audience</a></li></ul></li>
<li><a href="#Hadoop_Releases">Hadoop Releases</a></li>
<li><a href="#Consuming_Hadoop_APIs">Consuming Hadoop APIs</a>
<ul>
<li><a href="#Privacy">Privacy</a></li>
<li><a href="#Stability">Stability</a>
<ul>
<li><a href="#Releases_and_Stability">Releases and Stability</a></li></ul></li>
<li><a href="#Deprecation">Deprecation</a></li>
<li><a href="#Semantic_Compatibility">Semantic Compatibility</a></li>
<li><a href="#Compatibility_Issues">Compatibility Issues</a></li></ul></li>
<li><a href="#Using_the_FileSystem_API">Using the FileSystem API</a></li>
<li><a href="#Consuming_Hadoop_REST_APIs">Consuming Hadoop REST APIs</a></li>
<li><a href="#Consuming_Hadoop_Output">Consuming Hadoop Output</a></li>
<li><a href="#Consuming_Hadoop_Data">Consuming Hadoop Data</a></li>
<li><a href="#Automating_Operations_with_the_Hadoop_CLI">Automating Operations with the Hadoop CLI</a></li>
<li><a href="#Consuming_the_Hadoop_Web_UI">Consuming the Hadoop Web UI</a></li>
<li><a href="#Working_with_Hadoop_configurations">Working with Hadoop configurations</a>
<ul>
<li><a href="#XML_Configuration_Files">XML Configuration Files</a></li>
<li><a href="#Logging_Configuration_Files">Logging Configuration Files</a></li>
<li><a href="#Other_Configuration_Files">Other Configuration Files</a></li></ul></li>
<li><a href="#Using_and_Consuming_Hadoop_Artifacts">Using and Consuming Hadoop Artifacts</a>
<ul>
<li><a href="#Source_and_Configuration_Files">Source and Configuration Files</a></li>
<li><a href="#Build_Artifacts">Build Artifacts</a></li>
<li><a href="#Environment_Variables">Environment Variables</a></li>
<li><a href="#Library_Dependencies">Library Dependencies</a></li>
<li><a href="#Hardware_and_OS_Dependencies">Hardware and OS Dependencies</a></li></ul></li>
<li><a href="#Questions">Questions</a></li></ul>
<div class="section">
<h2><a name="Purpose"></a>Purpose</h2>
<p>The point of this document is to provide downstream developers with a clear reference for what to expect when building applications against the Hadoop source base. This document is primarily a distillation of the <a href="./Compatibility.html">Hadoop Compatibility Guidelines</a> and hence focuses on what the compatibility guarantees are for the various Hadoop interfaces across releases.</p>
<div class="section">
<h3><a name="Target_Audience"></a>Target Audience</h3>
<p>The target audience for this document is any developer working on a project or application that builds or depends on Apache Hadoop, whether the dependency is on the source code itself, a build artifact, or interacting with a running system.</p></div></div>
<div class="section">
<h2><a name="Hadoop_Releases"></a>Hadoop Releases</h2>
<p>The Hadoop development community periodically produces new Hadoop releases to introduce new functionality and fix existing issues. Realeses fall into three categories:</p>
<ul>
<li>Major: a major release will typically include significant new functionality and generally represents the largest upgrade compatibility risk. A major release increments the first number of the release version, e.g. going from 2.8.2 to 3.0.0.</li>
<li>Minor: a minor release will typically include some new functionality as well as fixes for some notable issues. A minor release should not pose much upgrade risk in most cases. A minor release increments the middle number of release version, e.g. going from 2.8.2 to 2.9.0.</li>
<li>Maintenance: a maintenance release should not include any new functionality. The purpose of a maintenance release is to resolve a set of issues that are deemed by the developer community to be significant enough to be worth pushing a new release to address them. Maintenance releases should pose very little upgrade risk. A maintenance release increments the final number in the release version, e.g. going from 2.8.2 to 2.8.3.</li>
</ul></div>
<div class="section">
<h2><a name="Consuming_Hadoop_APIs"></a>Consuming Hadoop APIs</h2>
<p>When writing software that calls methods or uses classes that belong to Apache Hadoop, developers should adhere to the following guidelines. Failure to adhere to the guidelines may result in problems transitioning from one Hadoop release to another.</p>
<div class="section">
<h3><a name="Privacy"></a>Privacy</h3>
<p>Packages, classes, and methods may be annotated with an audience annotation. The three privacy levels are: <a href="./InterfaceClassification.html#Public">Public</a>, <a href="./InterfaceClassification.html#Limited-Private">Limited-Private</a>, and <a href="./InterfaceClassification.html#Private">Private</a>. Downstream developers should only use packages, classes, methods, and fields that are marked as <a href="./InterfaceClassification.html#Public">Public</a>. Packages, classes, and methods that are not marked as <a href="./InterfaceClassification.html#Public">Public</a> are considered internal to Hadoop and are intended only for consumption by other components of Hadoop.</p>
<p>If an element has an annotation that conflicts with it&#x2019;s containing element&#x2019;s annotation, then the most restrictive annotation takes precedence. For example, If a <a href="./InterfaceClassification.html#Private">Private</a> method is contained in a <a href="./InterfaceClassification.html#Public">Public</a> class, then the method should be treated as <a href="./InterfaceClassification.html#Private">Private</a>. If a <a href="./InterfaceClassification.html#Public">Public</a> method is contained in a <a href="./InterfaceClassification.html#Private">Private</a> class, the method should be treated as <a href="./InterfaceClassification.html#Private">Private</a>.</p>
<p>If a method has no privacy annotation, then it inherits its privacy from its class. If a class has no privacy, it inherits its privacy from its package. If a package has no privacy, it should be assumed to be <a href="./InterfaceClassification.html#Private">Private</a>.</p></div>
<div class="section">
<h3><a name="Stability"></a>Stability</h3>
<p>Packages, classes, and methods may be annotated with a stability annotation. There are three classes of stability: <a href="./InterfaceClassification.html#Stable">Stable</a>, <a href="./InterfaceClassification.html#Evolving">Evolving</a>, and <a href="./InterfaceClassification.html#Unstable">Unstable</a>. The stability annotations determine when <a href="./InterfaceClassification.html#Change-Compatibility">incompatible changes</a> are allowed to be made. <a href="./InterfaceClassification.html#Stable">Stable</a> means that no incompatible changes are allowed between major releases. <a href="./InterfaceClassification.html#Evolving">Evolving</a> means no incompatible changes are allowed between minor releases. <a href="./InterfaceClassification.html#Unstable">Unstable</a> means that incompatible changes are allowed at any time. As a downstream developer, it is best to avoid <a href="./InterfaceClassification.html#Unstable">Unstable</a> APIs and where possible to prefer <a href="./InterfaceClassification.html#Stable">Stable</a> ones.</p>
<p>If a method has no stability annotation, then it inherits its stability from its class. If a class has no stability, it inherits its stability from its package. If a package has no stability, it should be assumed to be <a href="./InterfaceClassification.html#Unstable">Unstable</a>.</p>
<div class="section">
<h4><a name="Releases_and_Stability"></a>Releases and Stability</h4>
<p>Per the above rules on API stability, new releases are allowed to change APIs as follows:</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th align="left"> Release Type </th>
<th align="left"> Stable API Changes </th>
<th align="left"> Evolving API Changes </th>
<th align="left"> Unstable API Changes </th></tr>
</thead><tbody>
<tr class="b">
<td align="left"> Major </td>
<td align="left"> Allowed </td>
<td align="left"> Allowed </td>
<td align="left"> Allowed </td></tr>
<tr class="a">
<td align="left"> Minor </td>
<td align="left"> Not Allowed </td>
<td align="left"> Allowed </td>
<td align="left"> Allowed </td></tr>
<tr class="b">
<td align="left"> Maintenance </td>
<td align="left"> Not Allowed </td>
<td align="left"> Not Allowed </td>
<td align="left"> Allowed </td></tr>
</tbody>
</table>
<p>Note that a major release is <i>allowed</i> to break compatibility of any API, even though the Hadoop developer community strives to maintain compatibility as much as possible, even across major releases. Note also that an <a href="./InterfaceClassification.html#Unstable">Unstable</a> API may change at any time without notice.</p></div></div>
<div class="section">
<h3><a name="Deprecation"></a>Deprecation</h3>
<p>Classes or methods that are annotated as @Deprecated are no longer safe to use. The deprecated element should continue to function, but may and likely will be removed in a subsequent release. The stability annotation will determine the earliest release when the deprecated element can be removed. A <a href="./InterfaceClassification.html#Stable">Stable</a> element cannot be removed until the next major release. An <a href="./InterfaceClassification.html#Evolving">Evolving</a> element cannot be removed until the next minor release. An <a href="./InterfaceClassification.html#Unstable">Unstable</a> element may be removed at any time and will typically not be marked as deprecated before it is removed. <a href="./InterfaceClassification.html#Stable">Stable</a> and <a href="./InterfaceClassification.html#Evolving">Evolving</a> elements must be marked as deprecated for a full major or minor release (respectively) before they can be removed. For example, if a <a href="./InterfaceClassification.html#Stable">Stable</a> is marked as deprecated in Hadoop 3.1, it cannot be removed until Hadoop 5.0.</p></div>
<div class="section">
<h3><a name="Semantic_Compatibility"></a>Semantic Compatibility</h3>
<p>The Apache Hadoop developer community strives to ensure that the behavior of APIs remains consistent across releases, though changes for correctness may result in changes in behavior. The API JavaDocs are considered the primary authority for the expected behavior of an API. In cases where the JavaDocs are insufficient or missing, the unit tests are considered the fallback authority for expected behavior. Where unit tests are not present, the intended behavior should be inferred from the naming. As much as possible downstream developers should avoid looking at the source code for the API itself to determine expected behavior as that approach can create dependencies on implementation details that are not expressly held as expected behavior by the Hadoop development community.</p>
<p>In cases where the JavaDocs are insufficient to infer expected behavior, downstream developers are strongly encouraged to file a Hadoop JIRA to request the JavaDocs be added or improved.</p>
<p>Be aware that fixes done for correctness reasons may cause changes to the expected behavior of an API, though such changes are expected to be accompanied by documentation that clarifies the new behavior.</p>
<p>The Apache Hadoop developer community tries to maintain binary compatibility for end user applications across releases. Ideally no updates to applications will be required when upgrading to a new Hadoop release, assuming the application does not use <a href="./InterfaceClassification.html#Private">Private</a>, <a href="./InterfaceClassification.html#Limited-Private">Limited-Private</a>, or <a href="./InterfaceClassification.html#Unstable">Unstable</a> APIs. MapReduce applications in particular are guaranteed binary compatibility across releases.</p></div>
<div class="section">
<h3><a name="Compatibility_Issues"></a>Compatibility Issues</h3>
<p>The <a href="./Compatibility.html">Hadoop Compatibility Specification</a> states the standards that the Hadoop development community is expected to uphold, but for various reasons, the source code may not live up to the ideals of the <a href="./Compatibility.html">compatibility specification</a>.</p>
<p>Two common issues that a downstream developer will encounter are:</p>
<ol style="list-style-type: decimal">
<li>APIs that are needed for application development aren&#x2019;t <a href="./InterfaceClassification.html#Public">Public</a></li>
<li>A <a href="./InterfaceClassification.html#Public">Public</a> API on which a downstream application depends is changed unexpectedly and incompatibly.</li>
</ol>
<p>In both of these cases, downstream developers are strongly encouraged to raise the issues with the Hadoop developer community either by sending an email to the appropriate <a class="externalLink" href="https://hadoop.apache.org/mailing_lists.html">developer mailing list</a> or <a class="externalLink" href="https://hadoop.apache.org/issue_tracking.html">filing a JIRA</a> or both. The developer community appreciates the feedback.</p>
<p>Downstream developers are encouraged to reach out to the Hadoop development community in any case when they encounter an issue while developing an application against Hadoop. Odds are good that if it&#x2019;s an issue for one developer, it&#x2019;s an issue that numerous developers have or will encounter.</p></div></div>
<div class="section">
<h2><a name="Using_the_FileSystem_API"></a>Using the FileSystem API</h2>
<p>In the specific case of working with streams in Hadoop, e.g. <tt>FSDataOutputStream</tt>, an application can programmatically query for the capabilities of the stream using the methods of the <a class="externalLink" href="http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/StreamCapabilities.html">StreamCapabilities</a> class. Dynamically adjusting to stream capabilities can make an applcation more robust in the face of changing implementations and environments.</p></div>
<div class="section">
<h2><a name="Consuming_Hadoop_REST_APIs"></a>Consuming Hadoop REST APIs</h2>
<p>The Hadoop REST APIs are a primary interface for a variety of downstream and internal applications and services. To support REST clients, the Hadoop REST APIs are versioned and will not change incompatibly within a version. Both the endpoint itself along with the list of supported parameters and the output from the endpoint are prohibited from changing incompatibly within a REST endpoint version. Note, however, that introducing new fields and other additive changes are considered compatible changes, so any consumer of the REST API should be flexible enough to ignore unknown fields.</p>
<p>The REST API version is a single number and has no relationship with the Hadoop version number. The version number is encoded in the endpoint URL prefixed with a &#x2018;v&#x2019;, for example &#x2018;v1&#x2019;. A new REST endpoint version may only be introduced with a minor or major release. A REST endpoint version may only be removed after being labeled as deprecated for a full major release.</p></div>
<div class="section">
<h2><a name="Consuming_Hadoop_Output"></a>Consuming Hadoop Output</h2>
<p>Hadoop produces a variety of outputs that could conceivably be consumed by application clients or downstream libraries. When consuming output from Hadoop, please consider the following:</p>
<ul>
<li>Hadoop log output is not expected to change with a maintenance release unless it resolves a correctness issue. While log output can be consumed by software directly, it is intended primarily for a human reader.</li>
<li>Hadoop produces audit logs for a variety of operations. The audit logs are intended to be machine readable, though the addition of new records and fields are considered to be compatible changes. Any consumer of the audit logs should allow for unexpected records and fields. The audit log format must not change incompatibly between major releases.</li>
<li>Metrics data produced by Hadoop is mostly intended for automated consumption. The metrics format may not change in an incompatible way between major releases, but new records and fields can be compatibly added at any time. Consumers of the metrics data should allow for unknown records and fields.</li>
</ul></div>
<div class="section">
<h2><a name="Consuming_Hadoop_Data"></a>Consuming Hadoop Data</h2>
<p>Binary file formats used by Hadoop to store data, such as sequence files, HAR files, etc, are guaranteed to remain compatible between minor releases. In addition, in cases where changes are made between major releases, both backward and forward compatibility must be maintained. Note that only the sequence file format is guaranteed not to change incompatibly, not the serialized classes that are contained therein.</p>
<p>In addition to the data produced by operations, Hadoop maintains its state information in a variety of data stores in various formats, such as the HDFS metadata store, the YARN resource manager state store, and the YARN federation state store. All Hadoop internal data stores are considered internal and <a href="./InterfaceClassification.html#Private">Private</a> to Hadoop. Downstream developers should not attempt to consume data from the Hadoop state store as the data and/or data format may change unpredictably.</p></div>
<div class="section">
<h2><a name="Automating_Operations_with_the_Hadoop_CLI"></a>Automating Operations with the Hadoop CLI</h2>
<p>The set of tools that make up the Hadoop command-line interface are intended both for consumption by end users and by downstream developers who are creating tools that execute the CLI tools and parse the output. For this reason the Hadoop CLI tools are treated like an interface and held stable between major releases. Between major releases, no CLI tool options will be removed or change semantically. The output from CLI tools will likewise remain the same within a major version number. Note that any change to CLI tool output is considered an incompatible change, so between major version, the CLI output will not change. Note that the CLI tool output is distinct from the log output produced by the CLI tools. Log output is not intended for automated consumption and may change at any time.</p></div>
<div class="section">
<h2><a name="Consuming_the_Hadoop_Web_UI"></a>Consuming the Hadoop Web UI</h2>
<p>The web UIs that are exposed by Hadoop are for human consumption only. Scraping the UIs for data is not a supported use case. No effort is made to ensure any kind of compatibility between the data displayed in any of the web UIs across releases.</p></div>
<div class="section">
<h2><a name="Working_with_Hadoop_configurations"></a>Working with Hadoop configurations</h2>
<p>Hadoop uses two primary forms of configuration files: XML configuration files and logging configuration files.</p>
<div class="section">
<h3><a name="XML_Configuration_Files"></a>XML Configuration Files</h3>
<p>The XML configuration files contain a set of properties as name-value pairs. The names and meanings of the properties are defined by Hadoop and are guaranteed to be stable across minor releases. A property can only be removed in a major release and only if it has been marked as deprecated for at least a full major release. Most properties have a default value that will be used if the property is not explicitly set in the XML configuration files. The default property values will not be changed during a maintenance release. For details about the properties supported by the various Hadoop components, see the component documentation.</p>
<p>Downstream developers and users can add their own properties into the XML configuration files for use by their tools and applications. While Hadoop makes no formal restrictions about defining new properties, a new property that conflicts with a property defined by Hadoop can lead to unexpected and undesirable results. Users are encouraged to avoid using custom configuration property names that conflict with the namespace of Hadoop-defined properties and thus should avoid using any prefixes used by Hadoop, e.g. hadoop, io, ipc, fs, net, file, ftp, kfs, ha, file, dfs, mapred, mapreduce, and yarn.</p></div>
<div class="section">
<h3><a name="Logging_Configuration_Files"></a>Logging Configuration Files</h3>
<p>The log output produced by Hadoop daemons and CLIs is governed by a set of configuration files. These files control the minimum level of log message that will be output by the various components of Hadoop, as well as where and how those messages are stored. Between minor releases no changes will be made to the log configuration that reduce, eliminate, or redirect the log messages.</p></div>
<div class="section">
<h3><a name="Other_Configuration_Files"></a>Other Configuration Files</h3>
<p>Hadoop makes use of a number of other types of configuration files in a variety of formats, such as the JSON resource profiles configuration or the XML fair scheduler configuration. No incompatible changes will be introduced to the configuration file formats within a minor release. Even between minor releases incompatible configuration file format changes will be avoided if possible.</p></div></div>
<div class="section">
<h2><a name="Using_and_Consuming_Hadoop_Artifacts"></a>Using and Consuming Hadoop Artifacts</h2>
<div class="section">
<h3><a name="Source_and_Configuration_Files"></a>Source and Configuration Files</h3>
<p>As a downstream developer or consumer of Hadoop, it&#x2019;s possible to access all elements of the Hadoop platform, including source code, configuration files, build artifacts, etc. While the open nature of the platform allows it, developers should not create dependencies on these internal details of Hadoop as they may change at any time. The Hadoop development community will attempt, however, to keep the existing structure stable within a major version.</p>
<p>The location and general structure of the Hadoop configuration files, job history information (as consumed by the job history server), and logs files generated by Hadoop will be maintained across maintenance releases.</p></div>
<div class="section">
<h3><a name="Build_Artifacts"></a>Build Artifacts</h3>
<p>The build artifacts produced by the Hadoop build process, e.g. JAR files, are subject to change at any time and should not be treated as reliable, except for the client artifacts. Client artifacts and their contents will remain compatible within a major release. It is the goal of the Hadoop development community to allow application code to continue to function unchanged across minor releases and, whenever possible, across major releases. The current list of client artifacts is as follows:</p>
<ul>
<li>hadoop-client</li>
<li>hadoop-client-api</li>
<li>hadoop-client-minicluster</li>
<li>hadoop-client-runtime</li>
<li>hadoop-hdfs-client</li>
<li>hadoop-hdfs-native-client</li>
<li>hadoop-mapreduce-client-app</li>
<li>hadoop-mapreduce-client-common</li>
<li>hadoop-mapreduce-client-core</li>
<li>hadoop-mapreduce-client-jobclient</li>
<li>hadoop-mapreduce-client-nativetask</li>
<li>hadoop-yarn-client</li>
</ul></div>
<div class="section">
<h3><a name="Environment_Variables"></a>Environment Variables</h3>
<p>Some Hadoop components receive information through environment variables. For example, the <tt>HADOOP_OPTS</tt> environment variable is interpreted by most Hadoop processes as a string of additional JVM arguments to be used when starting a new JVM. Between minor releases the way Hadoop interprets environment variables will not change in an incompatible way. In other words, the same value placed into the same variable should produce the same result for all Hadoop releases within the same major version.</p></div>
<div class="section">
<h3><a name="Library_Dependencies"></a>Library Dependencies</h3>
<p>Hadoop relies on a large number of third-party libraries for its operation. As much as possible the Hadoop developer community works to hide these dependencies from downstream developers. Some common libraries, such as Guava, could cause significant compatibility issues between Hadoop and downstream applications if those dependencies were exposed downstream. Nonetheless Hadoop does expose some of its dependencies, especially prior to Hadoop 3. No new dependency will be exposed by Hadoop via the client artifacts between major releases.</p>
<p>A common downstream anti-pattern is to use the output of <tt>hadoop classpath</tt> to set the downstream application&#x2019;s classpath or add all third-party JARs included with Hadoop to the downstream application&#x2019;s classpath. This practice creates a tight coupling between the downstream application and Hadoop&#x2019;s third-party dependencies, which leads to a fragile application that is hard to maintain as Hadoop&#x2019;s dependencies change. This practice is strongly discouraged.</p>
<p>Hadoop depends on the Java virtual machine for its operation, which can impact downstream applications. To minimize disruption, the minimum supported version of the JVM will not change between major releases of Hadoop. In the event that the current minimum supported JVM version becomes unsupported between major releases, the minimum supported JVM version may be changed in a minor release.</p>
<p>Hadoop also includes several native components, including compression, the container executor binary, and various native integrations. These native components introduce a set of native dependencies for Hadoop. The set of native dependencies can change in a minor release, but the Hadoop developer community will try to limit any dependency version changes to minor version changes as much as possible.</p></div>
<div class="section">
<h3><a name="Hardware_and_OS_Dependencies"></a>Hardware and OS Dependencies</h3>
<p>Hadoop is currently supported by the Hadoop developer community on Linux and Windows running on x86 and AMD processors. These OSes and processors are likely to remain supported for the foreseeable future. In the event that support plans change, the OS or processor to be dropped will be documented as deprecated for at least a full minor release, but ideally a full major release, before actually being dropped. Hadoop may function on other OSes and processor architectures, but the community may not be able to provide assistance in the event of issues.</p>
<p>There are no guarantees on how the minimum resources required by Hadoop daemons will change between releases, even maintenance releases. Nonetheless, the Hadoop developer community will try to avoid increasing the requirements within a minor release.</p>
<p>Any file systems supported Hadoop, such as through the FileSystem API, will in most cases continue to be supported throughout a major release. The only case where support for a file system can be dropped within a major version is if a clean migration path to an alternate client implementation is provided.</p></div></div>
<div class="section">
<h2><a name="Questions"></a>Questions</h2>
<p>For question about developing applications and projects against Apache Hadoop, please contact the developer mailing list for the relevant component(s):</p>
<ul>
<li><a class="externalLink" href="mailto:common-dev@hadoop.apache.org">common-dev</a></li>
<li><a class="externalLink" href="mailto:hdfs-dev@hadoop.apache.org">hdfs-dev</a></li>
<li><a class="externalLink" href="mailto:mapreduce-dev@hadoop.apache.org">mapreduce-dev</a></li>
<li><a class="externalLink" href="mailto:yarn-dev@hadoop.apache.org">yarn-dev</a></li>
</ul></div>
</div>
</div>
<div class="clear">
<hr/>
</div>
<div id="footer">
<div class="xright">
&#169; 2008-2021
Apache Software Foundation
- <a href="http://maven.apache.org/privacy-policy.html">Privacy Policy</a>.
Apache Maven, Maven, Apache, the Apache feather logo, and the Apache Maven project logos are trademarks of The Apache Software Foundation.
</div>
<div class="clear">
<hr/>
</div>
</div>
</body>
</html>