blob: d4749269f16bbe7b765cc95787a96303d960c768 [file] [log] [blame]
<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
<document>
<header>
<title> HFTP Guide</title>
</header>
<body>
<section>
<title> Introduction </title>
<p> HFTP is a Hadoop filesystem implementation that lets you read data from a remote Hadoop HDFS cluster.
The reads are done via HTTP, and data is sourced from DataNodes.
HFTP is a read-only filesystem, and will throw exceptions if you try to use it to write data or modify
the filesystem state.</p>
<p>HFTP is primarily useful if you have multiple HDFS clusters with different versions and you need to move data from one to another. HFTP is wire-compatible even between different versions of HDFS. For example, you can do things like:
<code>hadoop distcp -i hftp://sourceFS:50070/src hdfs://destFS:50070/dest</code>. Note that HFTP is read-only so the destination must be an HDFS filesystem. (Also, in this example, the <code>distcp</code> should be run using the configuraton of the new filesystem.)</p>
<p>An extension, HSFTP, uses HTTPS by default. This means that data will be encrypted in transit.</p>
</section>
<section>
<title>Implementation</title>
<p>The code for HFTP lives in the Java class <code>org.apache.hadoop.hdfs.HftpFileSystem</code>. Likewise,
HSFTP is implemented in <code>org.apache.hadoop.hdfs.HsftpFileSystem</code>.
</p>
</section>
<section>
<title> Configuration Options </title>
<table>
<tr>
<th>Name</th>
<th>Description</th>
</tr>
<tr>
<td>dfs.hftp.https.port</td>
<td>the HTTPS port on the remote cluster. If not set, HFTP will fall back on
<code>dfs.https.port</code>.</td>
</tr>
<tr>
<td>hdfs.service.host_<strong>ip:port</strong></td>
<td>Specifies the service name (for the security subsystem) associated with the HFTP filesystem
running at <strong>ip:port.</strong></td>
</tr>
</table>
</section>
</body>
</document>