<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!--
| Generated by Apache Maven Doxia at 2021-06-15
| Rendered using Apache Maven Stylus Skin 1.5
-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Apache Hadoop Aliyun OSS support &#x2013; Hadoop-Aliyun module: Integration with Aliyun Web Services</title>
<style type="text/css" media="all">
@import url("../../css/maven-base.css");
@import url("../../css/maven-theme.css");
@import url("../../css/site.css");
</style>
<link rel="stylesheet" href="../../css/print.css" type="text/css" media="print" />
<meta name="Date-Revision-yyyymmdd" content="20210615" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body class="composite">
<div id="banner">
<a href="http://hadoop.apache.org/" id="bannerLeft">
<img src="http://hadoop.apache.org/images/hadoop-logo.jpg" alt="" />
</a>
<a href="http://www.apache.org/" id="bannerRight">
<img src="http://www.apache.org/images/asf_logo_wide.png" alt="" />
</a>
<div class="clear">
<hr/>
</div>
</div>
<div id="breadcrumbs">
<div class="xright"> <a href="http://wiki.apache.org/hadoop" class="externalLink">Wiki</a>
|
<a href="https://gitbox.apache.org/repos/asf/hadoop.git" class="externalLink">git</a>
&nbsp;| Last Published: 2021-06-15
&nbsp;| Version: 3.3.1
</div>
<div class="clear">
<hr/>
</div>
</div>
<div id="leftColumn">
<div id="navcolumn">
<h5>General</h5>
<ul>
<li class="none">
<a href="../../../index.html">Overview</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/SingleCluster.html">Single Node Setup</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/ClusterSetup.html">Cluster Setup</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/CommandsManual.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/FileSystemShell.html">FileSystem Shell</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/Compatibility.html">Compatibility Specification</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/DownstreamDev.html">Downstream Developer's Guide</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/AdminCompatibilityGuide.html">Admin Compatibility Guide</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/InterfaceClassification.html">Interface Classification</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/filesystem/index.html">FileSystem Specification</a>
</li>
</ul>
<h5>Common</h5>
<ul>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/CLIMiniCluster.html">CLI Mini Cluster</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/FairCallQueue.html">Fair Call Queue</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/NativeLibraries.html">Native Libraries</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/Superusers.html">Proxy User</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/RackAwareness.html">Rack Awareness</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/SecureMode.html">Secure Mode</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/ServiceLevelAuth.html">Service Level Authorization</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/HttpAuthentication.html">HTTP Authentication</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/CredentialProviderAPI.html">Credential Provider API</a>
</li>
<li class="none">
<a href="../../../hadoop-kms/index.html">Hadoop KMS</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/Tracing.html">Tracing</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/UnixShellGuide.html">Unix Shell Guide</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/registry/index.html">Registry</a>
</li>
</ul>
<h5>HDFS</h5>
<ul>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsDesign.html">Architecture</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html">User Guide</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HDFSCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html">NameNode HA With QJM</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html">NameNode HA With NFS</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html">Observer NameNode</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/Federation.html">Federation</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/ViewFs.html">ViewFs</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/ViewFsOverloadScheme.html">ViewFsOverloadScheme</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html">Snapshots</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsEditsViewer.html">Edits Viewer</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html">Image Viewer</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html">Permissions and HDFS</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsQuotaAdminGuide.html">Quotas and HDFS</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/LibHdfs.html">libhdfs (C API)</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/WebHDFS.html">WebHDFS (REST API)</a>
</li>
<li class="none">
<a href="../../../hadoop-hdfs-httpfs/index.html">HttpFS</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html">Short Circuit Local Reads</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html">Centralized Cache Management</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html">NFS Gateway</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html">Rolling Upgrade</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/ExtendedAttributes.html">Extended Attributes</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html">Transparent Encryption</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html">Multihoming</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html">Storage Policies</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/MemoryStorage.html">Memory Storage Support</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/SLGUserGuide.html">Synthetic Load Generator</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html">Erasure Coding</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html">Disk Balancer</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsUpgradeDomain.html">Upgrade Domain</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsDataNodeAdminGuide.html">DataNode Admin</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs-rbf/HDFSRouterFederation.html">Router Federation</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/HdfsProvidedStorage.html">Provided Storage</a>
</li>
</ul>
<h5>MapReduce</h5>
<ul>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html">Tutorial</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html">Compatibility with 1.x</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html">Encrypted Shuffle</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html">Pluggable Shuffle/Sort</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistributedCacheDeploy.html">Distributed Cache Deploy</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/SharedCacheSupport.html">Support for YARN Shared Cache</a>
</li>
</ul>
<h5>MapReduce REST APIs</h5>
<ul>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredAppMasterRest.html">MR Application Master</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-hs/HistoryServerRest.html">MR History Server</a>
</li>
</ul>
<h5>YARN</h5>
<ul>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/YARN.html">Architecture</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/YarnCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html">Capacity Scheduler</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/FairScheduler.html">Fair Scheduler</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html">ResourceManager Restart</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html">ResourceManager HA</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/ResourceModel.html">Resource Model</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/NodeLabel.html">Node Labels</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/NodeAttributes.html">Node Attributes</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html">Web Application Proxy</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/TimelineServer.html">Timeline Server</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html">Timeline Service V.2</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html">Writing YARN Applications</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html">YARN Application Security</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/NodeManager.html">NodeManager</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/DockerContainers.html">Running Applications in Docker Containers</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/RuncContainers.html">Running Applications in runC Containers</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html">Using CGroups</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/SecureContainer.html">Secure Containers</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/ReservationSystem.html">Reservation System</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/GracefulDecommission.html">Graceful Decommission</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/OpportunisticContainers.html">Opportunistic Containers</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/Federation.html">YARN Federation</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/SharedCache.html">Shared Cache</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/UsingGpus.html">Using GPU</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/UsingFPGA.html">Using FPGA</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html">Placement Constraints</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/YarnUI2.html">YARN UI2</a>
</li>
</ul>
<h5>YARN REST APIs</h5>
<ul>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html">Introduction</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html">Resource Manager</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/NodeManagerRest.html">Node Manager</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Timeline_Server_REST_API_v1">Timeline Server</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html#Timeline_Service_v.2_REST_API">Timeline Service V.2</a>
</li>
</ul>
<h5>YARN Service</h5>
<ul>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html">Overview</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/yarn-service/QuickStart.html">QuickStart</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/yarn-service/Concepts.html">Concepts</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/yarn-service/YarnServiceAPI.html">Yarn Service API</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/yarn-service/ServiceDiscovery.html">Service Discovery</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-site/yarn-service/SystemServices.html">System Services</a>
</li>
</ul>
<h5>Hadoop Compatible File Systems</h5>
<ul>
<li class="none">
<a href="../../../hadoop-aliyun/tools/hadoop-aliyun/index.html">Aliyun OSS</a>
</li>
<li class="none">
<a href="../../../hadoop-aws/tools/hadoop-aws/index.html">Amazon S3</a>
</li>
<li class="none">
<a href="../../../hadoop-azure/index.html">Azure Blob Storage</a>
</li>
<li class="none">
<a href="../../../hadoop-azure-datalake/index.html">Azure Data Lake Storage</a>
</li>
<li class="none">
<a href="../../../hadoop-openstack/index.html">OpenStack Swift</a>
</li>
<li class="none">
<a href="../../../hadoop-cos/cloud-storage/index.html">Tencent COS</a>
</li>
</ul>
<h5>Auth</h5>
<ul>
<li class="none">
<a href="../../../hadoop-auth/index.html">Overview</a>
</li>
<li class="none">
<a href="../../../hadoop-auth/Examples.html">Examples</a>
</li>
<li class="none">
<a href="../../../hadoop-auth/Configuration.html">Configuration</a>
</li>
<li class="none">
<a href="../../../hadoop-auth/BuildingIt.html">Building</a>
</li>
</ul>
<h5>Tools</h5>
<ul>
<li class="none">
<a href="../../../hadoop-streaming/HadoopStreaming.html">Hadoop Streaming</a>
</li>
<li class="none">
<a href="../../../hadoop-archives/HadoopArchives.html">Hadoop Archives</a>
</li>
<li class="none">
<a href="../../../hadoop-archive-logs/HadoopArchiveLogs.html">Hadoop Archive Logs</a>
</li>
<li class="none">
<a href="../../../hadoop-distcp/DistCp.html">DistCp</a>
</li>
<li class="none">
<a href="../../../hadoop-gridmix/GridMix.html">GridMix</a>
</li>
<li class="none">
<a href="../../../hadoop-rumen/Rumen.html">Rumen</a>
</li>
<li class="none">
<a href="../../../hadoop-resourceestimator/ResourceEstimator.html">Resource Estimator Service</a>
</li>
<li class="none">
<a href="../../../hadoop-sls/SchedulerLoadSimulator.html">Scheduler Load Simulator</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/Benchmarking.html">Hadoop Benchmarking</a>
</li>
<li class="none">
<a href="../../../hadoop-dynamometer/Dynamometer.html">Dynamometer</a>
</li>
</ul>
<h5>Reference</h5>
<ul>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/release/">Changelog and Release Notes</a>
</li>
<li class="none">
<a href="../../../api/index.html">Java API docs</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/UnixShellAPI.html">Unix Shell API</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/Metrics.html">Metrics</a>
</li>
</ul>
<h5>Configuration</h5>
<ul>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/core-default.xml">core-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs/hdfs-default.xml">hdfs-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-hdfs-rbf/hdfs-rbf-default.xml">hdfs-rbf-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml">mapred-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-yarn/hadoop-yarn-common/yarn-default.xml">yarn-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-kms/kms-default.html">kms-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-hdfs-httpfs/httpfs-default.html">httpfs-default.xml</a>
</li>
<li class="none">
<a href="../../../hadoop-project-dist/hadoop-common/DeprecatedProperties.html">Deprecated Properties</a>
</li>
</ul>
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
<img alt="Built by Maven" src="../../images/logos/maven-feather.png"/>
</a>
</div>
</div>
<div id="bodyColumn">
<div id="contentBox">
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<h1>Hadoop-Aliyun module: Integration with Aliyun Web Services</h1>
<ul>
<li><a href="#Overview">Overview</a>
<ul>
<li><a href="#Features">Features</a></li>
<li><a href="#Warning_.231:_Object_Stores_are_not_filesystems.">Warning #1: Object Stores are not filesystems.</a></li>
<li><a href="#Warning_.232:_Directory_last_access_time_is_not_tracked.2C">Warning #2: Directory last access time is not tracked</a></li>
<li><a href="#Warning_.233:_Your_Aliyun_credentials_are_valuable">Warning #3: Your Aliyun credentials are valuable</a></li>
<li><a href="#Warning_.234:_The_Aliyun_OSS_client_provided_by_Aliyun_E-MapReduce_are_different_from_this_implementation">Warning #4: The Aliyun OSS client provided by Aliyun E-MapReduce is different from this implementation</a></li></ul></li>
<li><a href="#OSS">OSS</a>
<ul>
<li><a href="#Authentication_properties">Authentication properties</a></li>
<li><a href="#Other_properties">Other properties</a></li></ul></li>
<li><a href="#Testing_the_hadoop-aliyun_Module">Testing the hadoop-aliyun Module</a>
<ul>
<li><a href="#core-site.xml">core-site.xml</a></li>
<li><a href="#auth-keys.xml">auth-keys.xml</a></li>
<li><a href="#Run_Hadoop_contract_tests">Run Hadoop contract tests</a></li></ul></li></ul>
<div class="section">
<h2><a name="Overview"></a>Overview</h2>
<p>The <tt>hadoop-aliyun</tt> module provides support for integration with <a class="externalLink" href="https://www.aliyun.com/product/oss">Aliyun Object Storage Service (Aliyun OSS)</a>. The generated JAR file, <tt>hadoop-aliyun.jar</tt>, also declares a transitive dependency on all external artifacts needed for this support, so downstream applications can use it easily.</p>
<p>To make it part of Apache Hadoop&#x2019;s default classpath, make sure that <tt>HADOOP_OPTIONAL_TOOLS</tt> in <tt>hadoop-env.sh</tt> includes <tt>hadoop-aliyun</tt> in its list.</p>
<div class="section">
<h3><a name="Features"></a>Features</h3>
<ul>
<li>Read and write data stored in Aliyun OSS.</li>
<li>Present a hierarchical file system view by implementing the standard Hadoop <a href="../api/org/apache/hadoop/fs/FileSystem.html"><tt>FileSystem</tt></a> interface.</li>
<li>Can act as a source or a sink of data in a MapReduce job.</li>
</ul></div>
<div class="section">
<h3><a name="Warning_.231:_Object_Stores_are_not_filesystems."></a>Warning #1: Object Stores are not filesystems.</h3>
<p>Aliyun OSS is an example of &#x201c;an object store&#x201d;. In order to achieve scalability and especially high availability, Aliyun OSS has relaxed some of the constraints which classic &#x201c;POSIX&#x201d; filesystems promise.</p>
<p>Specifically:</p>
<ol style="list-style-type: decimal">
<li>Atomic operations: <tt>delete()</tt> and <tt>rename()</tt> are implemented by recursive file-by-file operations. They take time at least proportional to the number of files, during which partial updates may be visible. Neither operation is guaranteed to be atomic: if one is interrupted, the filesystem is left in an intermediate state.</li>
<li>File owner and group are persisted, but the permissions model is not enforced. Authorization occurs at the level of the entire Aliyun account via <a class="externalLink" href="https://www.aliyun.com/product/ram">Aliyun Resource Access Management (Aliyun RAM)</a>.</li>
<li>Directory last access time is not tracked.</li>
<li>The append operation is not supported.</li>
</ol></div>
<div class="section">
<h3><a name="Warning_.232:_Directory_last_access_time_is_not_tracked.2C"></a>Warning #2: Directory last access time is not tracked</h3>
<p>Hadoop features that rely on the directory last access time can behave unexpectedly. For example, the AggregatedLogDeletionService of YARN will not remove the appropriate log files.</p></div>
<div class="section">
<h3><a name="Warning_.233:_Your_Aliyun_credentials_are_valuable"></a>Warning #3: Your Aliyun credentials are valuable</h3>
<p>Your Aliyun credentials not only pay for services, they also grant read and write access to the data. Anyone who obtains them can not only read your datasets but also delete them.</p>
<p>Do not inadvertently share these credentials through means such as:</p>
<ol style="list-style-type: decimal">
<li>Checking in to SCM any configuration files containing the secrets.</li>
<li>Logging them to a console, as they invariably end up being seen.</li>
<li>Defining filesystem URIs with the credentials in the URL, such as <tt>oss://accessKeyId:accessKeySecret@directory/file</tt>. They will end up in logs and error messages.</li>
<li>Including the secrets in bug reports.</li>
</ol>
<p>If you do any of these: change your credentials immediately!</p></div>
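<p>One way to keep the secrets out of plain-text configuration files is Hadoop&#x2019;s Credential Provider API. The fragment below is only a sketch: it assumes that the access key ID and secret have already been stored in a keystore under the aliases <tt>fs.oss.accessKeyId</tt> and <tt>fs.oss.accessKeySecret</tt>, and that this module resolves those properties through the credential provider mechanism. The keystore path is a hypothetical placeholder.</p>
<div>
<div>
<pre class="source">&lt;property&gt;
&lt;name&gt;hadoop.security.credential.provider.path&lt;/name&gt;
&lt;!-- Hypothetical keystore location; replace with your own. --&gt;
&lt;value&gt;jceks://file/path/to/oss.jceks&lt;/value&gt;
&lt;description&gt;Keystore holding the Aliyun OSS secrets (assumed setup).&lt;/description&gt;
&lt;/property&gt;
</pre></div></div>
<p>With a setup like this, the configuration files themselves carry no secrets and only the keystore needs to be protected.</p>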
<div class="section">
<h3><a name="Warning_.234:_The_Aliyun_OSS_client_provided_by_Aliyun_E-MapReduce_are_different_from_this_implementation"></a>Warning #4: The Aliyun OSS client provided by Aliyun E-MapReduce is different from this implementation</h3>
<p>Specifically: on Aliyun E-MapReduce, <tt>oss://</tt> is also supported but with a different implementation. If you are using Aliyun E-MapReduce, follow the instructions provided by Aliyun, and be aware that all issues related to Aliyun OSS integration in E-MapReduce can only be addressed by Aliyun themselves: please raise your issues with them.</p></div></div>
<div class="section">
<h2><a name="OSS"></a>OSS</h2>
<div class="section">
<h3><a name="Authentication_properties"></a>Authentication properties</h3>
<div>
<div>
<pre class="source">&lt;property&gt;
&lt;name&gt;fs.oss.accessKeyId&lt;/name&gt;
&lt;description&gt;Aliyun access key ID&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.accessKeySecret&lt;/name&gt;
&lt;description&gt;Aliyun access key secret&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.credentials.provider&lt;/name&gt;
&lt;description&gt;
Class name of a credentials provider that implements
com.aliyun.oss.common.auth.CredentialsProvider. Omit if using access/secret keys
or another authentication mechanism. The specified class must provide an
accessible constructor accepting java.net.URI and
org.apache.hadoop.conf.Configuration, or an accessible default constructor.
&lt;/description&gt;
&lt;/property&gt;
</pre></div></div>
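<p>As an illustration, a <tt>core-site.xml</tt> fragment that authenticates with an access key pair might look like the following sketch. The values are placeholders, not real credentials, and a file containing real values should never be checked in to SCM.</p>
<div>
<div>
<pre class="source">&lt;property&gt;
&lt;name&gt;fs.oss.accessKeyId&lt;/name&gt;
&lt;!-- Placeholder value; substitute your own access key ID. --&gt;
&lt;value&gt;YOUR_ACCESS_KEY_ID&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.accessKeySecret&lt;/name&gt;
&lt;!-- Placeholder value; substitute your own access key secret. --&gt;
&lt;value&gt;YOUR_ACCESS_KEY_SECRET&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>To use a custom credentials provider instead, omit these two properties and set <tt>fs.oss.credentials.provider</tt> to the class name of the provider, for example a hypothetical <tt>com.example.MyOssCredentialsProvider</tt>.</p>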
</div>
<div class="section">
<h3><a name="Other_properties"></a>Other properties</h3>
<div>
<div>
<pre class="source">&lt;property&gt;
&lt;name&gt;fs.AbstractFileSystem.oss.impl&lt;/name&gt;
&lt;value&gt;org.apache.hadoop.fs.aliyun.oss.OSS&lt;/value&gt;
&lt;description&gt;The implementation class of the OSS AbstractFileSystem.
If you want to use OSS as YARN&#x2019;s resource storage dir via the
fs.defaultFS configuration property in Hadoop&#x2019;s core-site.xml,
you should add this configuration to Hadoop's core-site.xml
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.endpoint&lt;/name&gt;
&lt;description&gt;Aliyun OSS endpoint to connect to. An up-to-date list is
provided in the Aliyun OSS Documentation.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.impl&lt;/name&gt;
&lt;value&gt;org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.proxy.host&lt;/name&gt;
&lt;description&gt;Hostname of the (optional) proxy server for Aliyun OSS connection&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.proxy.port&lt;/name&gt;
&lt;description&gt;Proxy server port&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.proxy.username&lt;/name&gt;
&lt;description&gt;Username for authenticating with proxy server&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.proxy.password&lt;/name&gt;
&lt;description&gt;Password for authenticating with proxy server.&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.proxy.domain&lt;/name&gt;
&lt;description&gt;Domain for authenticating with proxy server.&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.proxy.workstation&lt;/name&gt;
&lt;description&gt;Workstation for authenticating with proxy server.&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.attempts.maximum&lt;/name&gt;
&lt;value&gt;20&lt;/value&gt;
&lt;description&gt;How many times we should retry commands on transient errors.&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.connection.establish.timeout&lt;/name&gt;
&lt;value&gt;50000&lt;/value&gt;
&lt;description&gt;Connection setup timeout in milliseconds.&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.connection.timeout&lt;/name&gt;
&lt;value&gt;200000&lt;/value&gt;
&lt;description&gt;Socket connection timeout in milliseconds.&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.paging.maximum&lt;/name&gt;
&lt;value&gt;1000&lt;/value&gt;
&lt;description&gt;How many keys to request from Aliyun OSS at a time when listing a directory.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.multipart.upload.size&lt;/name&gt;
&lt;value&gt;10485760&lt;/value&gt;
&lt;description&gt;Size of each of multipart pieces in bytes.&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.upload.active.blocks&lt;/name&gt;
&lt;value&gt;4&lt;/value&gt;
&lt;description&gt;Number of active (concurrent) upload blocks when uploading a file.&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.multipart.download.threads&lt;/name&gt;
&lt;value&gt;10&lt;/value&gt;
&lt;description&gt;The maximum number of threads allowed in the pool for multipart download and upload.&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.multipart.download.ahead.part.max.number&lt;/name&gt;
&lt;value&gt;4&lt;/value&gt;
&lt;description&gt;The maximum number of read ahead parts when reading a file.&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.max.total.tasks&lt;/name&gt;
&lt;value&gt;128&lt;/value&gt;
&lt;description&gt;The maximum queue number for multipart download and upload.&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.max.copy.threads&lt;/name&gt;
&lt;value&gt;25&lt;/value&gt;
&lt;description&gt;The maximum number of threads allowed in the pool for copy operations.&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.max.copy.tasks.per.dir&lt;/name&gt;
&lt;value&gt;5&lt;/value&gt;
&lt;description&gt;The maximum number of concurrent tasks allowed when copying a directory.&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.multipart.upload.threshold&lt;/name&gt;
&lt;value&gt;20971520&lt;/value&gt;
&lt;description&gt;Minimum size in bytes before starting a multipart upload or copy.
Notice: This property is deprecated and will be removed in a future version.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.multipart.download.size&lt;/name&gt;
&lt;value&gt;102400&lt;/value&gt;
&lt;description&gt;Size in bytes of each request to Aliyun OSS.&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.buffer.dir&lt;/name&gt;
&lt;description&gt;Comma separated list of directories to buffer OSS data before uploading to Aliyun OSS&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.acl.default&lt;/name&gt;
&lt;value&gt;&lt;/value&gt;
&lt;description&gt;Set a canned ACL for the bucket. Value may be private, public-read, or public-read-write.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.server-side-encryption-algorithm&lt;/name&gt;
&lt;value&gt;&lt;/value&gt;
&lt;description&gt;Specify a server-side encryption algorithm for oss: file system.
Unset by default, and the only other currently allowable value is AES256.
&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.connection.maximum&lt;/name&gt;
&lt;value&gt;32&lt;/value&gt;
&lt;description&gt;Number of simultaneous connections to oss.&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.connection.secure.enabled&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;description&gt;Whether to connect to OSS over SSL; true by default.&lt;/description&gt;
&lt;/property&gt;
</pre></div></div>
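<p>As an illustration of how the properties above combine, the sketch below shows a small <tt>core-site.xml</tt> fragment that selects an endpoint, sets a local buffer directory, and routes traffic through a proxy. The endpoint and buffer directory reuse the values from the contract test example in this document; the proxy host and port are hypothetical and should be omitted if no proxy is used.</p>
<div>
<div>
<pre class="source">&lt;property&gt;
&lt;name&gt;fs.oss.endpoint&lt;/name&gt;
&lt;value&gt;oss-cn-hangzhou.aliyuncs.com&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.buffer.dir&lt;/name&gt;
&lt;value&gt;/tmp/oss&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.proxy.host&lt;/name&gt;
&lt;!-- Hypothetical proxy host, for illustration only. --&gt;
&lt;value&gt;proxy.example.com&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.proxy.port&lt;/name&gt;
&lt;!-- Hypothetical proxy port, for illustration only. --&gt;
&lt;value&gt;8080&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>Properties that are not listed keep the default values shown in the reference above.</p>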
</div></div>
<div class="section">
<h2><a name="Testing_the_hadoop-aliyun_Module"></a>Testing the hadoop-aliyun Module</h2>
<p>To test the <tt>oss://</tt> filesystem client, two files that pass authentication details to the test runner are needed:</p>
<ol style="list-style-type: decimal">
<li><tt>auth-keys.xml</tt></li>
<li><tt>core-site.xml</tt></li>
</ol>
<p>Those two configuration files must be put into <tt>hadoop-tools/hadoop-aliyun/src/test/resources</tt>.</p>
<div class="section">
<h3><a name="core-site.xml"></a><tt>core-site.xml</tt></h3>
<p>This file already exists and pulls in the configuration created in <tt>auth-keys.xml</tt>.</p>
<p>In most cases no modification is needed, unless a specific, non-default property needs to be set during testing.</p></div>
<div class="section">
<h3><a name="auth-keys.xml"></a><tt>auth-keys.xml</tt></h3>
<p>This file triggers the testing of the Aliyun OSS module. Without this file, <i>none of the tests in this module will be executed</i>.</p>
<p>It contains the access key ID/secret and proxy information needed to connect to Aliyun OSS. An OSS bucket URL must also be provided:</p>
<ol style="list-style-type: decimal">
<li><tt>test.fs.oss.name</tt> : the URL of the bucket for Aliyun OSS tests</li>
</ol>
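<p>A minimal <tt>auth-keys.xml</tt> could look like the following sketch. The bucket name and credential values are placeholders chosen for illustration, and the endpoint is the one used in the contract test example in this document; substitute the values for your own account and region.</p>
<div>
<div>
<pre class="source">&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;configuration&gt;
&lt;property&gt;
&lt;name&gt;test.fs.oss.name&lt;/name&gt;
&lt;!-- Hypothetical bucket used only for testing. --&gt;
&lt;value&gt;oss://your-test-bucket/&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.endpoint&lt;/name&gt;
&lt;value&gt;oss-cn-hangzhou.aliyuncs.com&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.accessKeyId&lt;/name&gt;
&lt;!-- Placeholder value. --&gt;
&lt;value&gt;YOUR_ACCESS_KEY_ID&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.accessKeySecret&lt;/name&gt;
&lt;!-- Placeholder value. --&gt;
&lt;value&gt;YOUR_ACCESS_KEY_SECRET&lt;/value&gt;
&lt;/property&gt;
&lt;/configuration&gt;
</pre></div></div>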
<p>The contents of the bucket will be cleaned during the testing process, so do not use the bucket for any purpose other than testing.</p></div>
<div class="section">
<h3><a name="Run_Hadoop_contract_tests"></a>Run Hadoop contract tests</h3>
<p>Create the file <tt>contract-test-options.xml</tt> under <tt>/test/resources</tt>. If the test path property <tt>fs.contract.test.fs.oss</tt> is not defined, the contract tests will be skipped. Credentials are also needed to run any of these tests; they can be copied from <tt>auth-keys.xml</tt> or pulled in through direct XInclude inclusion. Here is an example of <tt>contract-test-options.xml</tt>:</p>
<div>
<div>
<pre class="source">&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;?xml-stylesheet type=&quot;text/xsl&quot; href=&quot;configuration.xsl&quot;?&gt;
&lt;configuration&gt;
&lt;include xmlns=&quot;http://www.w3.org/2001/XInclude&quot;
href=&quot;auth-keys.xml&quot;/&gt;
&lt;property&gt;
&lt;name&gt;fs.contract.test.fs.oss&lt;/name&gt;
&lt;value&gt;oss://spark-tests&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.impl&lt;/name&gt;
&lt;value&gt;org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.endpoint&lt;/name&gt;
&lt;value&gt;oss-cn-hangzhou.aliyuncs.com&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.buffer.dir&lt;/name&gt;
&lt;value&gt;/tmp/oss&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.oss.multipart.download.size&lt;/name&gt;
&lt;value&gt;102400&lt;/value&gt;
&lt;/property&gt;
&lt;/configuration&gt;
</pre></div></div></div></div>
</div>
</div>
<div class="clear">
<hr/>
</div>
<div id="footer">
<div class="xright">
&#169; 2008-2021
Apache Software Foundation
- <a href="http://maven.apache.org/privacy-policy.html">Privacy Policy</a>.
Apache Maven, Maven, Apache, the Apache feather logo, and the Apache Maven project logos are trademarks of The Apache Software Foundation.
</div>
<div class="clear">
<hr/>
</div>
</div>
</body>
</html>