| <!--- |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. See accompanying LICENSE file. |
| --> |
| |
| Hadoop HDFS over HTTP - Server Setup |
| ==================================== |
| |
| This page explains how to quickly setup HttpFS with Pseudo authentication against a Hadoop cluster with Pseudo authentication. |
| |
| Install HttpFS |
| -------------- |
| |
| ~ $ tar xzf httpfs-${project.version}.tar.gz |
| |
| Configure HttpFS |
| ---------------- |
| |
| By default, HttpFS assumes that Hadoop configuration files (`core-site.xml & hdfs-site.xml`) are in the HttpFS configuration directory. |
| |
| If this is not the case, add to the `httpfs-site.xml` file the `httpfs.hadoop.config.dir` property set to the location of the Hadoop configuration directory. |
| |
| Configure Hadoop |
| ---------------- |
| |
| Edit Hadoop `core-site.xml` and defined the Unix user that will run the HttpFS server as a proxyuser. For example: |
| |
| ```xml |
| <property> |
| <name>hadoop.proxyuser.#HTTPFSUSER#.hosts</name> |
| <value>httpfs-host.foo.com</value> |
| </property> |
| <property> |
| <name>hadoop.proxyuser.#HTTPFSUSER#.groups</name> |
| <value>*</value> |
| </property> |
| ``` |
| |
| IMPORTANT: Replace `#HTTPFSUSER#` with the Unix user that will start the HttpFS server. |
| |
| Restart Hadoop |
| -------------- |
| |
| You need to restart Hadoop for the proxyuser configuration to become active. |
| |
| Start/Stop HttpFS |
| ----------------- |
| |
| To start/stop HttpFS, use `hdfs --daemon start|stop httpfs`. For example: |
| |
| hadoop-${project.version} $ hdfs --daemon start httpfs |
| |
| NOTE: The script `httpfs.sh` is deprecated. It is now just a wrapper of |
| `hdfs httpfs`. |
| |
| Test HttpFS is working |
| ---------------------- |
| |
| $ curl -sS 'http://<HTTPFSHOSTNAME>:14000/webhdfs/v1?op=gethomedirectory&user.name=hdfs' |
| {"Path":"\/user\/hdfs"} |
| |
| HttpFS Configuration |
| -------------------- |
| |
| HttpFS preconfigures the HTTP port to 14000. |
| |
| HttpFS supports the following [configuration properties](./httpfs-default.html) in the HttpFS's `etc/hadoop/httpfs-site.xml` configuration file. |
| |
| HttpFS over HTTPS (SSL) |
| ----------------------- |
| |
| Enable SSL in `etc/hadoop/httpfs-site.xml`: |
| |
| ```xml |
| <property> |
| <name>httpfs.ssl.enabled</name> |
| <value>true</value> |
| <description> |
| Whether SSL is enabled. Default is false, i.e. disabled. |
| </description> |
| </property> |
| ``` |
| |
| Configure `etc/hadoop/ssl-server.xml` with proper values, for example: |
| |
| ```xml |
| <property> |
| <name>ssl.server.keystore.location</name> |
| <value>${user.home}/.keystore</value> |
| <description>Keystore to be used. Must be specified. |
| </description> |
| </property> |
| |
| <property> |
| <name>ssl.server.keystore.password</name> |
| <value></value> |
| <description>Must be specified.</description> |
| </property> |
| |
| <property> |
| <name>ssl.server.keystore.keypassword</name> |
| <value></value> |
| <description>Must be specified.</description> |
| </property> |
| ``` |
| |
| The SSL passwords can be secured by a credential provider. See |
| [Credential Provider API](../../../hadoop-project-dist/hadoop-common/CredentialProviderAPI.html). |
| |
| You need to create an SSL certificate for the HttpFS server. As the `httpfs` Unix user, using the Java `keytool` command to create the SSL certificate: |
| |
| $ keytool -genkey -alias jetty -keyalg RSA |
| |
| You will be asked a series of questions in an interactive prompt. It will create the keystore file, which will be named **.keystore** and located in the `httpfs` user home directory. |
| |
| The password you enter for "keystore password" must match the value of the |
| property `ssl.server.keystore.password` set in the `ssl-server.xml` in the |
| configuration directory. |
| |
| The answer to "What is your first and last name?" (i.e. "CN") must be the hostname of the machine where the HttpFS Server will be running. |
| |
| Start HttpFS. It should work over HTTPS. |
| |
| Using the Hadoop `FileSystem` API or the Hadoop FS shell, use the `swebhdfs://` scheme. Make sure the JVM is picking up the truststore containing the public key of the SSL certificate if using a self-signed certificate. |
| |
| NOTE: Some old SSL clients may use weak ciphers that are not supported by the HttpFS server. It is recommended to upgrade the SSL client. |
| |
| Deprecated Environment Variables |
| -------------------------------- |
| |
| The following environment variables are deprecated. Set the corresponding |
| configuration properties instead. |
| |
| Environment Variable | Configuration Property | Configuration File |
| ----------------------------|------------------------------|-------------------- |
| HTTPFS_HTTP_HOSTNAME | httpfs.http.hostname | httpfs-site.xml |
| HTTPFS_HTTP_PORT | httpfs.http.port | httpfs-site.xml |
| HTTPFS_MAX_HTTP_HEADER_SIZE | hadoop.http.max.request.header.size and hadoop.http.max.response.header.size | httpfs-site.xml |
| HTTPFS_MAX_THREADS | hadoop.http.max.threads | httpfs-site.xml |
| HTTPFS_SSL_ENABLED | httpfs.ssl.enabled | httpfs-site.xml |
| HTTPFS_SSL_KEYSTORE_FILE | ssl.server.keystore.location | ssl-server.xml |
| HTTPFS_SSL_KEYSTORE_PASS | ssl.server.keystore.password | ssl-server.xml |
| |
| HTTP Default Services |
| --------------------- |
| |
| Name | Description |
| -------------------|------------------------------------ |
| /conf | Display configuration properties |
| /jmx | Java JMX management interface |
| /logLevel | Get or set log level per class |
| /logs | Display log files |
| /stacks | Display JVM stacks |
| /static/index.html | The static home page |
| |
| To control the access to servlet `/conf`, `/jmx`, `/logLevel`, `/logs`, |
| and `/stacks`, configure the following properties in `httpfs-site.xml`: |
| |
| ```xml |
| <property> |
| <name>hadoop.security.authorization</name> |
| <value>true</value> |
| <description>Is service-level authorization enabled?</description> |
| </property> |
| |
| <property> |
| <name>hadoop.security.instrumentation.requires.admin</name> |
| <value>true</value> |
| <description> |
| Indicates if administrator ACLs are required to access |
| instrumentation servlets (JMX, METRICS, CONF, STACKS). |
| </description> |
| </property> |
| |
| <property> |
| <name>httpfs.http.administrators</name> |
| <value></value> |
| <description>ACL for the admins, this configuration is used to control |
| who can access the default servlets for HttpFS server. The value |
| should be a comma separated list of users and groups. The user list |
| comes first and is separated by a space followed by the group list, |
| e.g. "user1,user2 group1,group2". Both users and groups are optional, |
| so "user1", " group1", "", "user1 group1", "user1,user2 group1,group2" |
| are all valid (note the leading space in " group1"). '*' grants access |
| to all users and groups, e.g. '*', '* ' and ' *' are all valid. |
| </description> |
| </property> |
| ``` |