| <?xml version="1.0"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd"> |
| |
| |
| <document> |
| |
| <header> |
| <title> HFTP Guide</title> |
| </header> |
| |
| <body> |
| <section> |
| <title> Introduction </title> |
| <p> HFTP is a Hadoop filesystem implementation that lets you read data from a remote Hadoop HDFS cluster. |
| The reads are done via HTTP, and data is sourced from DataNodes. |
| HFTP is a read-only filesystem, and will throw exceptions if you try to use it to write data or modify |
| the filesystem state.</p> |
| |
| <p>HFTP is primarily useful if you have multiple HDFS clusters with different versions and you need to move data from one to another. HFTP is wire-compatible even between different versions of HDFS. For example, you can do things like: |
| <code>hadoop distcp -i hftp://sourceFS:50070/src hdfs://destFS:50070/dest</code>. Note that HFTP is read-only so the destination must be an HDFS filesystem. (Also, in this example, the <code>distcp</code> should be run using the configuraton of the new filesystem.)</p> |
| |
| <p>An extension, HSFTP, uses HTTPS by default. This means that data will be encrypted in transit.</p> |
| </section> |
| |
| <section> |
| <title>Implementation</title> |
| <p>The code for HFTP lives in the Java class <code>org.apache.hadoop.hdfs.HftpFileSystem</code>. Likewise, |
| HSFTP is implemented in <code>org.apache.hadoop.hdfs.HsftpFileSystem</code>. |
| </p> |
| </section> |
| |
| <section> |
| <title> Configuration Options </title> |
| <table> |
| <tr> |
| <th>Name</th> |
| <th>Description</th> |
| </tr> |
| <tr> |
| <td>dfs.hftp.https.port</td> |
| <td>the HTTPS port on the remote cluster. If not set, HFTP will fall back on |
| <code>dfs.https.port</code>.</td> |
| </tr> |
| <tr> |
| <td>hdfs.service.host_<strong>ip:port</strong></td> |
| <td>Specifies the service name (for the security subsystem) associated with the HFTP filesystem |
| running at <strong>ip:port.</strong></td> |
| </tr> |
| </table> |
| </section> |
| </body> |
| </document> |