| <!--- |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. See accompanying LICENSE file. |
| --> |
| |
| Hadoop HDFS over HTTP - Server Setup |
| ==================================== |
| |
| This page explains how to quickly setup HttpFS with Pseudo authentication against a Hadoop cluster with Pseudo authentication. |
| |
| Install HttpFS |
| -------------- |
| |
| ~ $ tar xzf httpfs-${project.version}.tar.gz |
| |
| Configure HttpFS |
| ---------------- |
| |
| By default, HttpFS assumes that Hadoop configuration files (`core-site.xml & hdfs-site.xml`) are in the HttpFS configuration directory. |
| |
| If this is not the case, add to the `httpfs-site.xml` file the `httpfs.hadoop.config.dir` property set to the location of the Hadoop configuration directory. |
| |
| Configure Hadoop |
| ---------------- |
| |
| Edit Hadoop `core-site.xml` and defined the Unix user that will run the HttpFS server as a proxyuser. For example: |
| |
| ```xml |
| <property> |
| <name>hadoop.proxyuser.#HTTPFSUSER#.hosts</name> |
| <value>httpfs-host.foo.com</value> |
| </property> |
| <property> |
| <name>hadoop.proxyuser.#HTTPFSUSER#.groups</name> |
| <value>*</value> |
| </property> |
| ``` |
| |
| IMPORTANT: Replace `#HTTPFSUSER#` with the Unix user that will start the HttpFS server. |
| |
| Restart Hadoop |
| -------------- |
| |
| You need to restart Hadoop for the proxyuser configuration to become active. |
| |
| Start/Stop HttpFS |
| ----------------- |
| |
| To start/stop HttpFS use HttpFS's sbin/httpfs.sh script. For example: |
| |
| $ sbin/httpfs.sh start |
| |
| NOTE: Invoking the script without any parameters list all possible parameters (start, stop, run, etc.). The `httpfs.sh` script is a wrapper for Tomcat's `catalina.sh` script that sets the environment variables and Java System properties required to run HttpFS server. |
| |
| Test HttpFS is working |
| ---------------------- |
| |
| $ curl -sS 'http://<HTTPFSHOSTNAME>:14000/webhdfs/v1?op=gethomedirectory&user.name=hdfs' |
| {"Path":"\/user\/hdfs"} |
| |
| Embedded Tomcat Configuration |
| ----------------------------- |
| |
| To configure the embedded Tomcat go to the `tomcat/conf`. |
| |
| HttpFS preconfigures the HTTP and Admin ports in Tomcat's `server.xml` to 14000 and 14001, and it binds to all IP addresses on the host. |
| |
| Tomcat logs are also preconfigured to go to HttpFS's `logs/` directory. |
| |
| HttpFS default value for the maxHttpHeaderSize parameter in Tomcat's `server.xml` is set to 65536 by default. |
| |
| The following environment variables (which can be set in HttpFS's `etc/hadoop/httpfs-env.sh` script) can be used to alter those values: |
| * HTTPFS\_HTTP\_HOSTNAME |
| |
| * HTTPFS\_HTTP\_PORT |
| |
| * HTTPFS\_ADMIN\_PORT |
| |
| * HADOOP\_LOG\_DIR |
| |
| * HTTPFS\_MAX\_HTTP\_HEADER\_SIZE |
| |
| HttpFS Configuration |
| -------------------- |
| |
| HttpFS supports the following [configuration properties](./httpfs-default.html) in the HttpFS's `etc/hadoop/httpfs-site.xml` configuration file. |
| |
| HttpFS over HTTPS (SSL) |
| ----------------------- |
| |
| To configure HttpFS to work over SSL edit the [httpfs-env.sh](#httpfs-env.sh) script in the configuration directory setting the [HTTPFS\_SSL\_ENABLED](#HTTPFS_SSL_ENABLED) to [true](#true). |
| |
| In addition, the following 2 properties may be defined (shown with default values): |
| |
| * HTTPFS\_SSL\_KEYSTORE\_FILE=$HOME/.keystore |
| |
| * HTTPFS\_SSL\_KEYSTORE\_PASS=password |
| |
| In the HttpFS `tomcat/conf` directory, replace the `server.xml` file with the `ssl-server.xml` file. |
| |
| You need to create an SSL certificate for the HttpFS server. As the `httpfs` Unix user, using the Java `keytool` command to create the SSL certificate: |
| |
| $ keytool -genkey -alias tomcat -keyalg RSA |
| |
| You will be asked a series of questions in an interactive prompt. It will create the keystore file, which will be named **.keystore** and located in the `httpfs` user home directory. |
| |
| The password you enter for "keystore password" must match the value of the `HTTPFS_SSL_KEYSTORE_PASS` environment variable set in the `httpfs-env.sh` script in the configuration directory. |
| |
| The answer to "What is your first and last name?" (i.e. "CN") must be the hostname of the machine where the HttpFS Server will be running. |
| |
| Start HttpFS. It should work over HTTPS. |
| |
| Using the Hadoop `FileSystem` API or the Hadoop FS shell, use the `swebhdfs://` scheme. Make sure the JVM is picking up the truststore containing the public key of the SSL certificate if using a self-signed certificate. |
| |
| Set environment variable `HTTPFS_SSL_CLIENT_AUTH` to change client |
| authentication. The default is `false`. See `clientAuth` in |
| [Tomcat 6.0 SSL Support](https://tomcat.apache.org/tomcat-6.0-doc/config/http.html#SSL_Support). |
| |
| Set environment variable `HTTPFS_SSL_ENABLED_PROTOCOLS` to specify a list of |
| enabled SSL protocols. The default list includes `TLSv1`, `TLSv1.1`, |
| `TLSv1.2`, and `SSLv2Hello`. See `sslEnabledProtocols` in |
| [Tomcat 6.0 SSL Support](https://tomcat.apache.org/tomcat-6.0-doc/config/http.html#SSL_Support). |
| |
| In order to support some old SSL clients, the default encryption ciphers |
| include a few relatively weaker ciphers. Set environment variable |
| `HTTPFS_SSL_CIPHERS` to override. The value is a comma separated list of |
| ciphers in [Tomcat Wiki](https://wiki.apache.org/tomcat/Security/Ciphers). |