~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.
---
HFTP Guide
---
---
${maven.build.timestamp}
HFTP Guide
\[ {{{./index.html}Go Back}} \]
%{toc|section=1|fromDepth=0}
* Introduction
HFTP is a Hadoop filesystem implementation that lets you read data from
a remote Hadoop HDFS cluster. The reads are done via HTTP, and data is
sourced from DataNodes. HFTP is a read-only filesystem, and will throw
exceptions if you try to use it to write data or modify the filesystem
state.
HFTP is primarily useful if you have multiple HDFS clusters with
different versions and you need to move data from one to another. HFTP
is wire-compatible even between different versions of HDFS. For
example, you can run: <<<hadoop distcp -i hftp://sourceFS:50070/src hdfs://destFS:50070/dest>>>.
Because HFTP is read-only, the destination must be an HDFS filesystem.
(Also, in this example, distcp should be run using the configuration of
the new (destination) filesystem.)
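As a minimal sketch of the read-only constraint described above, the following
Java snippet assembles the arguments of the distcp example and rejects a
destination whose scheme is <<<hftp>>> or <<<hsftp>>>. The class
<<<DistcpPlan>>> and its methods are hypothetical helpers written for this
illustration; they are not part of Hadoop, and they only check the URIs
without contacting any cluster.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not part of Hadoop) that checks a distcp source/destination pair. */
final class DistcpPlan {
    // Schemes that this guide describes as read-only.
    private static final Set<String> READ_ONLY_SCHEMES = Set.of("hftp", "hsftp");

    /** Returns true if the URI uses a scheme that cannot be written to. */
    static boolean isReadOnly(URI uri) {
        String scheme = uri.getScheme();
        return scheme != null && READ_ONLY_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    /** Builds the argument list for "hadoop distcp -i src dest", rejecting read-only destinations. */
    static List<String> buildCommand(String src, String dest) {
        if (isReadOnly(URI.create(dest))) {
            throw new IllegalArgumentException("Destination is read-only: " + dest);
        }
        return List.of("hadoop", "distcp", "-i", src, dest);
    }

    public static void main(String[] args) {
        // Same source and destination as the example in the text above.
        List<String> cmd = buildCommand("hftp://sourceFS:50070/src", "hdfs://destFS:50070/dest");
        System.out.println(String.join(" ", cmd));
    }
}
```

Checking the destination scheme before launching the copy surfaces the mistake
early, instead of waiting for HFTP to throw an exception on the first write.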
An extension, HSFTP, uses HTTPS by default. This means that data will
be encrypted in transit.
* Implementation
The code for HFTP lives in the Java class
<<<org.apache.hadoop.hdfs.HftpFileSystem>>>. Likewise, HSFTP is implemented
in <<<org.apache.hadoop.hdfs.HsftpFileSystem>>>.
* Configuration Options
*-----------------------:-----------------------------------+
| <<Name>> | <<Description>> |
*-----------------------:-----------------------------------+
| <<<dfs.hftp.https.port>>> | The HTTPS port on the remote cluster. If not set,
| | HFTP will fall back to <<<dfs.https.port>>>.
*-----------------------:-----------------------------------+
| <<<hdfs.service.host_ip:port>>> | Specifies the service name (for the security
| | subsystem) associated with the HFTP filesystem running at ip:port.
*-----------------------:-----------------------------------+
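The fallback order for the HTTPS port in the table above can be sketched as
follows. This is an illustration only: it uses <<<java.util.Properties>>> as a
stand-in for Hadoop's own configuration handling, the class
<<<HftpPortLookup>>> and its method are hypothetical, and the port numbers in
<<<main>>> are arbitrary example values rather than documented defaults.

```java
import java.util.OptionalInt;
import java.util.Properties;

/** Hypothetical sketch of the lookup order; Hadoop itself reads these keys through its own configuration classes. */
final class HftpPortLookup {
    static final String HFTP_HTTPS_PORT = "dfs.hftp.https.port";
    static final String HTTPS_PORT = "dfs.https.port";

    /** Returns dfs.hftp.https.port if set, otherwise dfs.https.port, otherwise empty. */
    static OptionalInt resolveHttpsPort(Properties conf) {
        String value = conf.getProperty(HFTP_HTTPS_PORT);
        if (value == null) {
            // Not set: fall back to the general HTTPS port key.
            value = conf.getProperty(HTTPS_PORT);
        }
        return value == null
                ? OptionalInt.empty()
                : OptionalInt.of(Integer.parseInt(value.trim()));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(HTTPS_PORT, "50470");      // example value only
        System.out.println(resolveHttpsPort(conf)); // resolved from dfs.https.port
        conf.setProperty(HFTP_HTTPS_PORT, "50471"); // example value only
        System.out.println(resolveHttpsPort(conf)); // the HFTP-specific key now takes precedence
    }
}
```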