| ~~ Licensed under the Apache License, Version 2.0 (the "License"); |
| ~~ you may not use this file except in compliance with the License. |
| ~~ You may obtain a copy of the License at |
| ~~ |
| ~~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~~ |
| ~~ Unless required by applicable law or agreed to in writing, software |
| ~~ distributed under the License is distributed on an "AS IS" BASIS, |
| ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| ~~ See the License for the specific language governing permissions and |
| ~~ limitations under the License. See accompanying LICENSE file. |
| |
| --- |
| Hadoop in Secure Mode |
| --- |
| --- |
| ${maven.build.timestamp} |
| |
| %{toc|section=0|fromDepth=0|toDepth=3} |
| |
| Hadoop in Secure Mode |
| |
| * Introduction |
| |
| This document describes how to configure authentication for Hadoop in |
| secure mode. |
| |
| By default, Hadoop runs in non-secure mode in which no actual |
| authentication is required. |
| By configuring Hadoop to run in secure mode, |
| each user and service must be authenticated by Kerberos |
| in order to use Hadoop services. |
| |
| Security features of Hadoop consist of |
| {{{Authentication}authentication}}, |
| {{{./ServiceLevelAuth.html}service level authorization}}, |
| {{{./HttpAuthentication.html}authentication for Web consoles}} |
| and {{{Data confidentiality}data confidentiality}}. |
| |
| |
| * Authentication |
| |
| ** End User Accounts |
| |
| When service level authentication is turned on, |
| end users using Hadoop in secure mode need to be authenticated by Kerberos. |
| The simplest way to do authentication is using the <<<kinit>>> command of Kerberos. |
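| |
| For example, a user might obtain a ticket-granting ticket like this |
| (<<<alice@REALM.TLD>>> is an illustrative principal): |
| |
| ---- |
| $ kinit alice@REALM.TLD |
| Password for alice@REALM.TLD: |
| ---- |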
| |
| ** User Accounts for Hadoop Daemons |
| |
| Ensure that HDFS and YARN daemons run as different Unix users, |
| e.g. <<<hdfs>>> and <<<yarn>>>. |
| Also, ensure that the MapReduce JobHistory server runs as a |
| different user, such as <<<mapred>>>. |
| |
| It's recommended to have them share a Unix group, e.g. <<<hadoop>>>. |
| See also "{{Mapping from user to group}}" for group management. |
| |
| *---------------+----------------------------------------------------------------------+ |
| || User:Group || Daemons | |
| *---------------+----------------------------------------------------------------------+ |
| | hdfs:hadoop | NameNode, Secondary NameNode, JournalNode, DataNode | |
| *---------------+----------------------------------------------------------------------+ |
| | yarn:hadoop | ResourceManager, NodeManager | |
| *---------------+----------------------------------------------------------------------+ |
| | mapred:hadoop | MapReduce JobHistory Server | |
| *---------------+----------------------------------------------------------------------+ |
| |
| ** Kerberos principals for Hadoop Daemons and Users |
| |
| To run Hadoop service daemons in secure mode, |
| Kerberos principals are required. |
| Each service reads authentication information saved in a keytab file with appropriate permissions. |
| |
| HTTP web consoles should be served by a principal different from the one used for RPC. |
| |
| The subsections below show examples of credentials for Hadoop services. |
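| |
| As an illustrative sketch (assuming an MIT Kerberos KDC; the principal and |
| keytab names below are the examples used throughout this document), service |
| principals and keytabs might be created with <<<kadmin.local>>> like this: |
| |
| ---- |
| $ kadmin.local |
| kadmin.local: addprinc -randkey nn/full.qualified.domain.name@REALM.TLD |
| kadmin.local: addprinc -randkey host/full.qualified.domain.name@REALM.TLD |
| kadmin.local: ktadd -k /etc/security/keytab/nn.service.keytab nn/full.qualified.domain.name@REALM.TLD |
| kadmin.local: ktadd -k /etc/security/keytab/nn.service.keytab host/full.qualified.domain.name@REALM.TLD |
| ---- |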
| |
| *** HDFS |
| |
| The NameNode keytab file, on the NameNode host, should look like the |
| following: |
| |
| ---- |
| $ klist -e -k -t /etc/security/keytab/nn.service.keytab |
| Keytab name: FILE:/etc/security/keytab/nn.service.keytab |
| KVNO Timestamp Principal |
| 4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) |
| 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) |
| ---- |
| |
| The Secondary NameNode keytab file, on that host, should look like the |
| following: |
| |
| ---- |
| $ klist -e -k -t /etc/security/keytab/sn.service.keytab |
| Keytab name: FILE:/etc/security/keytab/sn.service.keytab |
| KVNO Timestamp Principal |
| 4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) |
| 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) |
| ---- |
| |
| The DataNode keytab file, on each host, should look like the following: |
| |
| ---- |
| $ klist -e -k -t /etc/security/keytab/dn.service.keytab |
| Keytab name: FILE:/etc/security/keytab/dn.service.keytab |
| KVNO Timestamp Principal |
| 4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) |
| 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) |
| ---- |
| |
| *** YARN |
| |
| The ResourceManager keytab file, on the ResourceManager host, should look |
| like the following: |
| |
| ---- |
| $ klist -e -k -t /etc/security/keytab/rm.service.keytab |
| Keytab name: FILE:/etc/security/keytab/rm.service.keytab |
| KVNO Timestamp Principal |
| 4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) |
| 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) |
| ---- |
| |
| The NodeManager keytab file, on each host, should look like the following: |
| |
| ---- |
| $ klist -e -k -t /etc/security/keytab/nm.service.keytab |
| Keytab name: FILE:/etc/security/keytab/nm.service.keytab |
| KVNO Timestamp Principal |
| 4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) |
| 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) |
| ---- |
| |
| *** MapReduce JobHistory Server |
| |
| The MapReduce JobHistory Server keytab file, on that host, should look |
| like the following: |
| |
| ---- |
| $ klist -e -k -t /etc/security/keytab/jhs.service.keytab |
| Keytab name: FILE:/etc/security/keytab/jhs.service.keytab |
| KVNO Timestamp Principal |
| 4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) |
| 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) |
| 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) |
| ---- |
| |
| ** Mapping from Kerberos principal to OS user account |
| |
| Hadoop maps a Kerberos principal to an OS user account using |
| the rules specified by <<<hadoop.security.auth_to_local>>> |
| which works in the same way as the <<<auth_to_local>>> in |
| {{{http://web.mit.edu/Kerberos/krb5-latest/doc/admin/conf_files/krb5_conf.html}Kerberos configuration file (krb5.conf)}}. |
| In addition, Hadoop <<<auth_to_local>>> mapping supports the <</L>> flag that |
| lowercases the returned name. |
| |
| By default, it picks the first component of the principal name as the user name |
| if the realm matches the <<<default_realm>>> (usually defined in /etc/krb5.conf). |
| For example, <<<host/full.qualified.domain.name@REALM.TLD>>> is mapped to <<<host>>> |
| by the default rule. |
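| |
| For example, the following core-site.xml sketch maps the service principals |
| shown above to their Unix accounts (the rules are illustrative and must be |
| adapted to your realm and principal names): |
| |
| ---- |
| <property> |
| <name>hadoop.security.auth_to_local</name> |
| <value> |
| RULE:[2:$1@$0](nn@REALM.TLD)s/.*/hdfs/ |
| RULE:[2:$1@$0](dn@REALM.TLD)s/.*/hdfs/ |
| RULE:[2:$1@$0](rm@REALM.TLD)s/.*/yarn/ |
| RULE:[2:$1@$0](nm@REALM.TLD)s/.*/yarn/ |
| RULE:[2:$1@$0](jhs@REALM.TLD)s/.*/mapred/ |
| DEFAULT |
| </value> |
| </property> |
| ---- |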
| |
| ** Mapping from user to group |
| |
| Though files on HDFS are associated with an owner and a group, |
| Hadoop itself does not define groups. |
| Mapping from user to group is done by the OS or LDAP. |
| |
| You can change the way of mapping by |
| specifying the name of the mapping provider as the value of |
| <<<hadoop.security.group.mapping>>>. |
| See {{{../hadoop-hdfs/HdfsPermissionsGuide.html}HDFS Permissions Guide}} for details. |
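| |
| For example, a core-site.xml sketch selecting the shell-based provider |
| (one of the providers shipped with Hadoop; check the class name against your version): |
| |
| ---- |
| <property> |
| <name>hadoop.security.group.mapping</name> |
| <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value> |
| </property> |
| ---- |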
| |
| In practice, you need to manage an SSO environment using Kerberos with LDAP |
| for Hadoop in secure mode. |
| |
| ** Proxy user |
| |
| Some products, such as Apache Oozie, access Hadoop services |
| on behalf of end users and therefore need to be able to impersonate them. |
| You can configure a proxy user using the property |
| <<<hadoop.proxyuser.${superuser}.hosts>>> along with either or both of |
| <<<hadoop.proxyuser.${superuser}.groups>>> |
| and <<<hadoop.proxyuser.${superuser}.users>>>. |
| |
| For example, with the following in core-site.xml, |
| a user named <<<oozie>>> accessing from any host |
| can impersonate any user belonging to any group. |
| |
| ---- |
| <property> |
| <name>hadoop.proxyuser.oozie.hosts</name> |
| <value>*</value> |
| </property> |
| <property> |
| <name>hadoop.proxyuser.oozie.groups</name> |
| <value>*</value> |
| </property> |
| ---- |
| |
| With the following in core-site.xml, a user named <<<oozie>>> |
| accessing from any host can impersonate <<<user1>>> and <<<user2>>>. |
| |
| ---- |
| <property> |
| <name>hadoop.proxyuser.oozie.hosts</name> |
| <value>*</value> |
| </property> |
| <property> |
| <name>hadoop.proxyuser.oozie.users</name> |
| <value>user1,user2</value> |
| </property> |
| ---- |
| |
| The <<<hadoop.proxyuser.${superuser}.hosts>>> property accepts a list of IP addresses, |
| IP address ranges in CIDR format, and/or host names. |
| |
| For example, with the following in core-site.xml, |
| a user named <<<oozie>>> accessing from hosts in the |
| 10.222.0.0/16 range or from the host 10.113.221.221 |
| can impersonate any user belonging to any group. |
| |
| ---- |
| <property> |
| <name>hadoop.proxyuser.oozie.hosts</name> |
| <value>10.222.0.0/16,10.113.221.221</value> |
| </property> |
| <property> |
| <name>hadoop.proxyuser.oozie.groups</name> |
| <value>*</value> |
| </property> |
| ---- |
| |
| ** Secure DataNode |
| |
| Because the DataNode data transfer protocol |
| does not use the Hadoop RPC framework, |
| DataNodes must authenticate themselves by |
| using privileged ports, which are specified by |
| <<<dfs.datanode.address>>> and <<<dfs.datanode.http.address>>>. |
| This authentication is based on the assumption |
| that the attacker won't be able to get root privileges. |
| |
| When you execute the <<<hdfs datanode>>> command as root, |
| the server process binds the privileged ports first, |
| then drops privileges and runs as the user account specified by |
| <<<HADOOP_SECURE_DN_USER>>>. |
| This startup process uses jsvc installed in <<<JSVC_HOME>>>. |
| You must specify <<<HADOOP_SECURE_DN_USER>>> and <<<JSVC_HOME>>> |
| as environment variables on startup (in hadoop-env.sh). |
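| |
| A minimal hadoop-env.sh sketch (the user name and jsvc location are |
| illustrative and depend on your installation): |
| |
| ---- |
| # user the DataNode runs as after dropping root privileges |
| export HADOOP_SECURE_DN_USER=hdfs |
| # directory containing the jsvc binary (illustrative path) |
| export JSVC_HOME=/usr/local/bin |
| ---- |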
| |
| |
| * Data confidentiality |
| |
| ** Data Encryption on RPC |
| |
| Data transferred between Hadoop services and clients can be encrypted on the wire. |
| Setting <<<hadoop.rpc.protection>>> to <<<"privacy">>> in core-site.xml |
| activates data encryption. |
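| |
| For example, in core-site.xml: |
| |
| ---- |
| <property> |
| <name>hadoop.rpc.protection</name> |
| <value>privacy</value> |
| </property> |
| ---- |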
| |
| ** Data Encryption on Block Data Transfer |
| |
| You need to set <<<dfs.encrypt.data.transfer>>> to <<<"true">>> in hdfs-site.xml |
| in order to activate data encryption for the DataNode data transfer protocol. |
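| |
| For example, in hdfs-site.xml: |
| |
| ---- |
| <property> |
| <name>dfs.encrypt.data.transfer</name> |
| <value>true</value> |
| </property> |
| ---- |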
| |
| ** Data Encryption on HTTP |
| |
| Data transfer between the web consoles and clients is protected by using SSL (HTTPS). |
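| |
| For example, HTTPS-only access to the HDFS web consoles can be enforced |
| with <<<dfs.http.policy>>> in hdfs-site.xml (see the NameNode configuration below): |
| |
| ---- |
| <property> |
| <name>dfs.http.policy</name> |
| <value>HTTPS_ONLY</value> |
| </property> |
| ---- |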
| |
| |
| * Configuration |
| |
| ** Permissions for both HDFS and local filesystem paths |
| |
| The following table lists various paths on HDFS and local filesystems (on |
| all nodes) and recommended permissions: |
| |
| *-------------------+-------------------+------------------+------------------+ |
| || Filesystem || Path || User:Group || Permissions | |
| *-------------------+-------------------+------------------+------------------+ |
| | local | <<<dfs.namenode.name.dir>>> | hdfs:hadoop | drwx------ | |
| *-------------------+-------------------+------------------+------------------+ |
| | local | <<<dfs.datanode.data.dir>>> | hdfs:hadoop | drwx------ | |
| *-------------------+-------------------+------------------+------------------+ |
| | local | $HADOOP_LOG_DIR | hdfs:hadoop | drwxrwxr-x | |
| *-------------------+-------------------+------------------+------------------+ |
| | local | $YARN_LOG_DIR | yarn:hadoop | drwxrwxr-x | |
| *-------------------+-------------------+------------------+------------------+ |
| | local | <<<yarn.nodemanager.local-dirs>>> | yarn:hadoop | drwxr-xr-x | |
| *-------------------+-------------------+------------------+------------------+ |
| | local | <<<yarn.nodemanager.log-dirs>>> | yarn:hadoop | drwxr-xr-x | |
| *-------------------+-------------------+------------------+------------------+ |
| | local | container-executor | root:hadoop | --Sr-s--- | |
| *-------------------+-------------------+------------------+------------------+ |
| | local | <<<conf/container-executor.cfg>>> | root:hadoop | r-------- | |
| *-------------------+-------------------+------------------+------------------+ |
| | hdfs | / | hdfs:hadoop | drwxr-xr-x | |
| *-------------------+-------------------+------------------+------------------+ |
| | hdfs | /tmp | hdfs:hadoop | drwxrwxrwxt | |
| *-------------------+-------------------+------------------+------------------+ |
| | hdfs | /user | hdfs:hadoop | drwxr-xr-x | |
| *-------------------+-------------------+------------------+------------------+ |
| | hdfs | <<<yarn.nodemanager.remote-app-log-dir>>> | yarn:hadoop | drwxrwxrwxt | |
| *-------------------+-------------------+------------------+------------------+ |
| | hdfs | <<<mapreduce.jobhistory.intermediate-done-dir>>> | mapred:hadoop | | |
| | | | | drwxrwxrwxt | |
| *-------------------+-------------------+------------------+------------------+ |
| | hdfs | <<<mapreduce.jobhistory.done-dir>>> | mapred:hadoop | | |
| | | | | drwxr-x--- | |
| *-------------------+-------------------+------------------+------------------+ |
| |
| ** Common Configurations |
| |
| In order to turn on RPC authentication in Hadoop, |
| set the value of the <<<hadoop.security.authentication>>> property to |
| <<<"kerberos">>>, and set the security-related settings listed below appropriately. |
| |
| The following properties should be in the <<<core-site.xml>>> of all the |
| nodes in the cluster. |
| |
| *-------------------------+-------------------------+------------------------+ |
| || Parameter || Value || Notes | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<hadoop.security.authentication>>> | <kerberos> | | |
| | | | <<<simple>>> : No authentication. (default) \ |
| | | | <<<kerberos>>> : Enable authentication by Kerberos. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<hadoop.security.authorization>>> | <true> | | |
| | | | Enable {{{./ServiceLevelAuth.html}RPC service-level authorization}}. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<hadoop.rpc.protection>>> | <authentication> | |
| | | | <authentication> : authentication only (default) \ |
| | | | <integrity> : integrity check in addition to authentication \ |
| | | | <privacy> : data encryption in addition to integrity | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<hadoop.security.auth_to_local>>> | | | |
| | | <<<RULE:>>><exp1>\ |
| | | <<<RULE:>>><exp2>\ |
| | | <...>\ |
| | | DEFAULT | |
| | | | The value is a string containing newline characters. |
| | | | See |
| | | | {{{http://web.mit.edu/Kerberos/krb5-latest/doc/admin/conf_files/krb5_conf.html}Kerberos documentation}} |
| | | | for format for <exp>. |
| *-------------------------+-------------------------+------------------------+ |
| | <<<hadoop.proxyuser.>>><superuser><<<.hosts>>> | | | |
| | | | Comma-separated list of hosts from which <superuser> is allowed to impersonate users. | |
| | | | <<<*>>> means wildcard. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<hadoop.proxyuser.>>><superuser><<<.groups>>> | | | |
| | | | Comma-separated list of groups to which the users impersonated by <superuser> belong. | |
| | | | <<<*>>> means wildcard. | |
| *-------------------------+-------------------------+------------------------+ |
| Configuration for <<<conf/core-site.xml>>> |
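| |
| For example, a minimal core-site.xml sketch combining the settings above: |
| |
| ---- |
| <property> |
| <name>hadoop.security.authentication</name> |
| <value>kerberos</value> |
| </property> |
| <property> |
| <name>hadoop.security.authorization</name> |
| <value>true</value> |
| </property> |
| <property> |
| <name>hadoop.rpc.protection</name> |
| <value>authentication</value> |
| </property> |
| ---- |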
| |
| ** NameNode |
| |
| *-------------------------+-------------------------+------------------------+ |
| || Parameter || Value || Notes | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.block.access.token.enable>>> | <true> | | |
| | | | Enable HDFS block access tokens for secure operations. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.https.enable>>> | <true> | | |
| | | | This value is deprecated. Use dfs.http.policy | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.http.policy>>> | <HTTP_ONLY> or <HTTPS_ONLY> or <HTTP_AND_HTTPS> | | |
| | | | HTTPS_ONLY turns off http access. This option takes precedence over | |
| | | | the deprecated configuration dfs.https.enable and hadoop.ssl.enabled. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.namenode.https-address>>> | <nn_host_fqdn:50470> | | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.https.port>>> | <50470> | | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.namenode.keytab.file>>> | </etc/security/keytab/nn.service.keytab> | | |
| | | | Kerberos keytab file for the NameNode. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.namenode.kerberos.principal>>> | nn/_HOST@REALM.TLD | | |
| | | | Kerberos principal name for the NameNode. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.namenode.kerberos.https.principal>>> | host/_HOST@REALM.TLD | | |
| | | | HTTPS Kerberos principal name for the NameNode. | |
| *-------------------------+-------------------------+------------------------+ |
| Configuration for <<<conf/hdfs-site.xml>>> |
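| |
| For example, an hdfs-site.xml sketch for the NameNode (the keytab path |
| and realm are illustrative): |
| |
| ---- |
| <property> |
| <name>dfs.block.access.token.enable</name> |
| <value>true</value> |
| </property> |
| <property> |
| <name>dfs.namenode.keytab.file</name> |
| <value>/etc/security/keytab/nn.service.keytab</value> |
| </property> |
| <property> |
| <name>dfs.namenode.kerberos.principal</name> |
| <value>nn/_HOST@REALM.TLD</value> |
| </property> |
| ---- |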
| |
| ** Secondary NameNode |
| |
| *-------------------------+-------------------------+------------------------+ |
| || Parameter || Value || Notes | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.namenode.secondary.http-address>>> | <c_nn_host_fqdn:50090> | | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.namenode.secondary.https-port>>> | <50470> | | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.namenode.secondary.keytab.file>>> | | | |
| | | </etc/security/keytab/sn.service.keytab> | | |
| | | | Kerberos keytab file for the Secondary NameNode. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.namenode.secondary.kerberos.principal>>> | sn/_HOST@REALM.TLD | | |
| | | | Kerberos principal name for the Secondary NameNode. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.namenode.secondary.kerberos.https.principal>>> | | | |
| | | host/_HOST@REALM.TLD | | |
| | | | HTTPS Kerberos principal name for the Secondary NameNode. | |
| *-------------------------+-------------------------+------------------------+ |
| Configuration for <<<conf/hdfs-site.xml>>> |
| |
| ** DataNode |
| |
| *-------------------------+-------------------------+------------------------+ |
| || Parameter || Value || Notes | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.datanode.data.dir.perm>>> | 700 | | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.datanode.address>>> | <0.0.0.0:1004> | | |
| | | | Secure DataNode must use privileged port | |
| | | | in order to ensure that the server was started securely. | |
| | | | This means that the server must be started via jsvc. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.datanode.http.address>>> | <0.0.0.0:1006> | | |
| | | | Secure DataNode must use privileged port | |
| | | | in order to ensure that the server was started securely. | |
| | | | This means that the server must be started via jsvc. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.datanode.https.address>>> | <0.0.0.0:50470> | | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.datanode.keytab.file>>> | </etc/security/keytab/dn.service.keytab> | | |
| | | | Kerberos keytab file for the DataNode. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.datanode.kerberos.principal>>> | dn/_HOST@REALM.TLD | | |
| | | | Kerberos principal name for the DataNode. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.datanode.kerberos.https.principal>>> | | | |
| | | host/_HOST@REALM.TLD | | |
| | | | HTTPS Kerberos principal name for the DataNode. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.encrypt.data.transfer>>> | <false> | | |
| | | | set to <<<true>>> when using data encryption | |
| *-------------------------+-------------------------+------------------------+ |
| Configuration for <<<conf/hdfs-site.xml>>> |
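| |
| For example, an hdfs-site.xml sketch for secure DataNodes (the keytab path |
| and realm are illustrative): |
| |
| ---- |
| <property> |
| <name>dfs.datanode.address</name> |
| <value>0.0.0.0:1004</value> |
| </property> |
| <property> |
| <name>dfs.datanode.http.address</name> |
| <value>0.0.0.0:1006</value> |
| </property> |
| <property> |
| <name>dfs.datanode.keytab.file</name> |
| <value>/etc/security/keytab/dn.service.keytab</value> |
| </property> |
| <property> |
| <name>dfs.datanode.kerberos.principal</name> |
| <value>dn/_HOST@REALM.TLD</value> |
| </property> |
| ---- |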
| |
| |
| ** WebHDFS |
| |
| *-------------------------+-------------------------+------------------------+ |
| || Parameter || Value || Notes | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.webhdfs.enabled>>> | <true> | | |
| | | | Enable WebHDFS. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.web.authentication.kerberos.principal>>> | http/_HOST@REALM.TLD | | |
| | | | Kerberos principal name for WebHDFS. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<dfs.web.authentication.kerberos.keytab>>> | </etc/security/keytab/http.service.keytab> | | |
| | | | Kerberos keytab file for WebHDFS. | |
| *-------------------------+-------------------------+------------------------+ |
| Configuration for <<<conf/hdfs-site.xml>>> |
| |
| |
| ** ResourceManager |
| |
| *-------------------------+-------------------------+------------------------+ |
| || Parameter || Value || Notes | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<yarn.resourcemanager.keytab>>> | | | |
| | | </etc/security/keytab/rm.service.keytab> | | |
| | | | Kerberos keytab file for the ResourceManager. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<yarn.resourcemanager.principal>>> | rm/_HOST@REALM.TLD | | |
| | | | Kerberos principal name for the ResourceManager. | |
| *-------------------------+-------------------------+------------------------+ |
| Configuration for <<<conf/yarn-site.xml>>> |
| |
| ** NodeManager |
| |
| *-------------------------+-------------------------+------------------------+ |
| || Parameter || Value || Notes | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<yarn.nodemanager.keytab>>> | </etc/security/keytab/nm.service.keytab> | | |
| | | | Kerberos keytab file for the NodeManager. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<yarn.nodemanager.principal>>> | nm/_HOST@REALM.TLD | | |
| | | | Kerberos principal name for the NodeManager. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<yarn.nodemanager.container-executor.class>>> | | | |
| | | <<<org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor>>> | |
| | | | Use LinuxContainerExecutor. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<yarn.nodemanager.linux-container-executor.group>>> | <hadoop> | | |
| | | | Unix group of the NodeManager. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<yarn.nodemanager.linux-container-executor.path>>> | </path/to/bin/container-executor> | | |
| | | | The path to the executable of Linux container executor. | |
| *-------------------------+-------------------------+------------------------+ |
| Configuration for <<<conf/yarn-site.xml>>> |
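| |
| For example, a yarn-site.xml sketch for the NodeManager (the keytab path, |
| realm and group are illustrative): |
| |
| ---- |
| <property> |
| <name>yarn.nodemanager.keytab</name> |
| <value>/etc/security/keytab/nm.service.keytab</value> |
| </property> |
| <property> |
| <name>yarn.nodemanager.principal</name> |
| <value>nm/_HOST@REALM.TLD</value> |
| </property> |
| <property> |
| <name>yarn.nodemanager.container-executor.class</name> |
| <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value> |
| </property> |
| <property> |
| <name>yarn.nodemanager.linux-container-executor.group</name> |
| <value>hadoop</value> |
| </property> |
| ---- |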
| |
| ** Configuration for WebAppProxy |
| |
| The <<<WebAppProxy>>> provides a proxy between the web applications |
| exported by an application and an end user. If security is enabled |
| it will warn users before accessing a potentially unsafe web application. |
| Authentication and authorization using the proxy is handled just like |
| any other privileged web application. |
| |
| *-------------------------+-------------------------+------------------------+ |
| || Parameter || Value || Notes | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<yarn.web-proxy.address>>> | | | |
| | | <<<WebAppProxy>>> <host:port> for proxy to AM web apps. | | |
| | | | If this is the same as <<<yarn.resourcemanager.webapp.address>>> | |
| | | | or it is not defined, then the <<<ResourceManager>>> will run the proxy; | |
| | | | otherwise a standalone proxy server will need to be launched. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<yarn.web-proxy.keytab>>> | | | |
| | | </etc/security/keytab/web-app.service.keytab> | | |
| | | | Kerberos keytab file for the WebAppProxy. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<yarn.web-proxy.principal>>> | wap/_HOST@REALM.TLD | | |
| | | | Kerberos principal name for the WebAppProxy. | |
| *-------------------------+-------------------------+------------------------+ |
| Configuration for <<<conf/yarn-site.xml>>> |
| |
| ** LinuxContainerExecutor |
| |
| A <<<ContainerExecutor>>> is used by the YARN framework to define how any |
| <container> is launched and controlled. |
| |
| The following container executors are available in Hadoop YARN: |
| |
| *--------------------------------------+--------------------------------------+ |
| || ContainerExecutor || Description | |
| *--------------------------------------+--------------------------------------+ |
| | <<<DefaultContainerExecutor>>> | | |
| | | The default executor which YARN uses to manage container execution. | |
| | | The container process has the same Unix user as the NodeManager. | |
| *--------------------------------------+--------------------------------------+ |
| | <<<LinuxContainerExecutor>>> | | |
| | | Supported only on GNU/Linux, this executor runs the containers as either the | |
| | | YARN user who submitted the application (when full security is enabled) or | |
| | | as a dedicated user (defaults to nobody) when full security is not enabled. | |
| | | When full security is enabled, this executor requires all user accounts to be | |
| | | created on the cluster nodes where the containers are launched. It uses | |
| | | a <setuid> executable that is included in the Hadoop distribution. | |
| | | The NodeManager uses this executable to launch and kill containers. | |
| | | The setuid executable switches to the user who has submitted the | |
| | | application and launches or kills the containers. For maximum security, | |
| | | this executor sets up restricted permissions and user/group ownership of | |
| | | local files and directories used by the containers such as the shared | |
| | | objects, jars, intermediate files, log files etc. Particularly note that, | |
| | | because of this, except the application owner and NodeManager, no other | |
| | | user can access any of the local files/directories including those | |
| | | localized as part of the distributed cache. | |
| *--------------------------------------+--------------------------------------+ |
| |
| To build the LinuxContainerExecutor executable run: |
| |
| ---- |
| $ mvn package -Dcontainer-executor.conf.dir=/etc/hadoop/ |
| ---- |
| |
| The path passed in <<<-Dcontainer-executor.conf.dir>>> should be the |
| path on the cluster nodes where a configuration file for the setuid |
| executable should be located. The executable should be installed in |
| $HADOOP_YARN_HOME/bin. |
| |
| The executable must have specific permissions: 6050 or --Sr-s--- |
| permissions user-owned by <root> (super-user) and group-owned by a |
| special group (e.g. <<<hadoop>>>) of which the NodeManager Unix user is |
| the group member and no ordinary application user is. If any application |
| user belongs to this special group, security will be compromised. This |
| special group name should be specified for the configuration property |
| <<<yarn.nodemanager.linux-container-executor.group>>> in both |
| <<<conf/yarn-site.xml>>> and <<<conf/container-executor.cfg>>>. |
| |
| For example, let's say that the NodeManager is run as user <yarn> who is |
| part of the groups <users> and <hadoop>, either of them being the primary group. |
| Let also be that <users> has both <yarn> and another user |
| (application submitter) <alice> as its members, and <alice> does not |
| belong to <hadoop>. Going by the above description, the setuid/setgid |
| executable should be set 6050 or --Sr-s--- with user-owner as <root> and |
| group-owner as <hadoop>, which has <yarn> as its member (and not <users>, |
| which has <alice> also as its member besides <yarn>). |
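| |
| For example, the ownership and permissions might be set as follows |
| (the install path is illustrative): |
| |
| ---- |
| # container-executor must be setuid/setgid root:hadoop, mode 6050 (--Sr-s---) |
| $ sudo chown root:hadoop $HADOOP_YARN_HOME/bin/container-executor |
| $ sudo chmod 6050 $HADOOP_YARN_HOME/bin/container-executor |
| ---- |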
| |
| The LinuxContainerExecutor requires that the paths including and leading up to |
| the directories specified in <<<yarn.nodemanager.local-dirs>>> and |
| <<<yarn.nodemanager.log-dirs>>> be set to 755 permissions, as described |
| above in the table on permissions on directories. |
| |
| *** <<<conf/container-executor.cfg>>> |
| |
| The executable requires a configuration file called |
| <<<container-executor.cfg>>> to be present in the configuration |
| directory passed to the mvn target mentioned above. |
| |
| The configuration file must be owned by <root> (consistent with the |
| root:hadoop ownership shown in the permissions tables), group-owned by |
| anyone and should have the permissions 0400 or r--------. |
| |
| The executable requires the following configuration items to be present |
| in the <<<conf/container-executor.cfg>>> file. The items should be |
| specified as simple key=value pairs, one per line |
| (a sample file is sketched after the table below): |
| |
| *-------------------------+-------------------------+------------------------+ |
| || Parameter || Value || Notes | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<yarn.nodemanager.linux-container-executor.group>>> | <hadoop> | | |
| | | | Unix group of the NodeManager. The group owner of the | |
| | | |<container-executor> binary should be this group. Should be same as the | |
| | | | value with which the NodeManager is configured. This configuration is | |
| | | | required for validating the secure access of the <container-executor> | |
| | | | binary. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<banned.users>>> | hdfs,yarn,mapred,bin | Banned users. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<allowed.system.users>>> | foo,bar | Allowed system users. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<min.user.id>>> | 1000 | Prevent other super-users. | |
| *-------------------------+-------------------------+------------------------+ |
| Configuration for <<<conf/container-executor.cfg>>> |
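| |
| For example, a <<<conf/container-executor.cfg>>> sketch matching the values above: |
| |
| ---- |
| yarn.nodemanager.linux-container-executor.group=hadoop |
| banned.users=hdfs,yarn,mapred,bin |
| allowed.system.users=foo,bar |
| min.user.id=1000 |
| ---- |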
| |
| To recap, here are the local file-system permissions required for the |
| various paths related to the <<<LinuxContainerExecutor>>>: |
| |
| *-------------------+-------------------+------------------+------------------+ |
| || Filesystem || Path || User:Group || Permissions | |
| *-------------------+-------------------+------------------+------------------+ |
| | local | container-executor | root:hadoop | --Sr-s--- | |
| *-------------------+-------------------+------------------+------------------+ |
| | local | <<<conf/container-executor.cfg>>> | root:hadoop | r-------- | |
| *-------------------+-------------------+------------------+------------------+ |
| | local | <<<yarn.nodemanager.local-dirs>>> | yarn:hadoop | drwxr-xr-x | |
| *-------------------+-------------------+------------------+------------------+ |
| | local | <<<yarn.nodemanager.log-dirs>>> | yarn:hadoop | drwxr-xr-x | |
| *-------------------+-------------------+------------------+------------------+ |
| |
| ** MapReduce JobHistory Server |
| |
| *-------------------------+-------------------------+------------------------+ |
| || Parameter || Value || Notes | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<mapreduce.jobhistory.address>>> | | | |
| | | MapReduce JobHistory Server <host:port> | Default port is 10020. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<mapreduce.jobhistory.keytab>>> | | |
| | | </etc/security/keytab/jhs.service.keytab> | | |
| | | | Kerberos keytab file for the MapReduce JobHistory Server. | |
| *-------------------------+-------------------------+------------------------+ |
| | <<<mapreduce.jobhistory.principal>>> | jhs/_HOST@REALM.TLD | | |
| | | | Kerberos principal name for the MapReduce JobHistory Server. | |
| *-------------------------+-------------------------+------------------------+ |
| Configuration for <<<conf/mapred-site.xml>>> |
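| |
| For example, a mapred-site.xml sketch for the JobHistory Server (the host |
| name, keytab path and realm are illustrative): |
| |
| ---- |
| <property> |
| <name>mapreduce.jobhistory.address</name> |
| <value>jhs.example.com:10020</value> |
| </property> |
| <property> |
| <name>mapreduce.jobhistory.keytab</name> |
| <value>/etc/security/keytab/jhs.service.keytab</value> |
| </property> |
| <property> |
| <name>mapreduce.jobhistory.principal</name> |
| <value>jhs/_HOST@REALM.TLD</value> |
| </property> |
| ---- |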