Enable Hadoop

1. Build Hadoop

Apply the patch to the hadoop-2.7.2 source code

git apply hadoop-2.7.2.patch

Build Hadoop

mvn package -Pdist,native -Dtar -DskipTests -Dmaven.javadoc.skip=true -Dcontainer-executor.conf.dir=/etc/hadoop/conf

Redeploy Hadoop

2. Distribute and configure keytab files

Create the keytab files and deploy krb5.conf and has-client.conf

Distribute the keytab files to the corresponding nodes.

Set the permissions of the keytab files

3. Update Hadoop configuration files

Update core-site.xml

Add the following properties:
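The property list for this step is not reproduced here. As a minimal sketch, assuming a stock Hadoop 2.7 secure-mode setup, the two standard switches that turn on Kerberos authentication and service-level authorization look like this. Any properties specific to the applied patch are not shown.

```xml
<!-- Sketch only: standard Hadoop security switches, not the complete list for this setup -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```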


Update hdfs-site.xml

Add the following properties:

<!-- General HDFS security config -->

<!-- NameNode security config -->
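  <property>
    <name>dfs.namenode.delegation.token.max-lifetime</name>
    <!-- 604800000 ms (7 days) is the stock Hadoop default; adjust as needed -->
    <value>604800000</value>
    <description>The maximum lifetime in milliseconds for which a delegation token is valid.</description>
  </property>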

<!-- Secondary NameNode security config -->

<!-- DataNode security config -->

<!-- HTTPS config -->
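The section comments above name the groups of properties involved. A hedged sketch of typical entries for each group follows, using standard Hadoop 2.7 property names. The principal names, the EXAMPLE.COM realm, and the keytab path are placeholders chosen for illustration, not values from this guide.

```xml
<!-- General HDFS security config -->
<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>

<!-- NameNode security config (principal and keytab path are assumptions) -->
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/etc/hadoop/conf/hdfs.keytab</value>
</property>

<!-- Secondary NameNode security config -->
<property>
  <name>dfs.secondary.namenode.kerberos.principal</name>
  <value>hdfs/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.secondary.namenode.keytab.file</name>
  <value>/etc/hadoop/conf/hdfs.keytab</value>
</property>

<!-- DataNode security config -->
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>hdfs/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/etc/hadoop/conf/hdfs.keytab</value>
</property>
<property>
  <!-- One of authentication, integrity, privacy; the choice here is illustrative -->
  <name>dfs.data.transfer.protection</name>
  <value>integrity</value>
</property>

<!-- HTTPS config -->
<property>
  <name>dfs.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>
```

The `_HOST` token is replaced by Hadoop with each daemon's fully qualified host name, so the same file can be distributed to every node.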

Configuration for HDFS HA

For the general HA configuration, see HDFS High Availability

Add the following properties to hdfs-site.xml:
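In a Kerberized HA deployment that uses the Quorum Journal Manager, the JournalNodes also need their own credentials. A sketch using the standard Hadoop 2.7 property names is shown below. The principals, realm, and keytab path are illustrative assumptions.

```xml
<!-- JournalNode security config (values are placeholders) -->
<property>
  <name>dfs.journalnode.kerberos.principal</name>
  <value>hdfs/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.journalnode.keytab.file</name>
  <value>/etc/hadoop/conf/hdfs.keytab</value>
</property>
<property>
  <name>dfs.journalnode.kerberos.internal.spnego.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
```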


Update yarn-site.xml

Add the following properties:

<!-- ResourceManager security config -->

<!-- NodeManager security config -->

<!-- HTTPS config -->

<!-- Container executor config -->
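As a rough illustration of the four groups named above, the following sketch uses standard YARN property names from Hadoop 2.7. The principals, realm, keytab path, and group name are assumptions. The group must match the one configured in container-executor.cfg.

```xml
<!-- ResourceManager security config -->
<property>
  <name>yarn.resourcemanager.principal</name>
  <value>yarn/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>yarn.resourcemanager.keytab</name>
  <value>/etc/hadoop/conf/yarn.keytab</value>
</property>

<!-- NodeManager security config -->
<property>
  <name>yarn.nodemanager.principal</name>
  <value>yarn/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>yarn.nodemanager.keytab</name>
  <value>/etc/hadoop/conf/yarn.keytab</value>
</property>

<!-- HTTPS config -->
<property>
  <name>yarn.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>

<!-- Container executor config (group name is a placeholder) -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>hadoop</value>
</property>
```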

<!-- Timeline service config, if timeline service enabled -->





<!-- Proxy server config, if web proxy server enabled -->
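The timeline service and the web proxy server each take their own principal and keytab when enabled. The sketch below uses the standard Hadoop 2.7 property names. All values are placeholders, and either group can be left out if the corresponding service is not deployed.

```xml
<!-- Timeline service config (only if the timeline service is enabled) -->
<property>
  <name>yarn.timeline-service.principal</name>
  <value>yarn/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>yarn.timeline-service.keytab</name>
  <value>/etc/hadoop/conf/yarn.keytab</value>
</property>
<property>
  <name>yarn.timeline-service.http-authentication.type</name>
  <value>kerberos</value>
</property>
<property>
  <name>yarn.timeline-service.http-authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>yarn.timeline-service.http-authentication.kerberos.keytab</name>
  <value>/etc/hadoop/conf/http.keytab</value>
</property>

<!-- Proxy server config (only if the web proxy server is enabled) -->
<property>
  <name>yarn.web-proxy.principal</name>
  <value>yarn/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>yarn.web-proxy.keytab</name>
  <value>/etc/hadoop/conf/yarn.keytab</value>
</property>
```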


Update mapred-site.xml

Add the following properties:

<!-- MapReduce security config -->
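For MapReduce, the component that needs Kerberos credentials is the JobHistory server. A minimal sketch with the standard property names follows. The principal, realm, and keytab path are assumed for illustration.

```xml
<!-- JobHistory server security config (values are placeholders) -->
<property>
  <name>mapreduce.jobhistory.principal</name>
  <value>mapred/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>mapreduce.jobhistory.keytab</name>
  <value>/etc/hadoop/conf/mapred.keytab</value>
</property>
<property>
  <name>mapreduce.jobhistory.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>
```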

Create and configure ssl-server.xml

cp etc/hadoop/ssl-server.xml.example etc/hadoop/ssl-server.xml

Configure ssl-server.xml: see How to deploy https.
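For orientation, the entries that usually need editing in ssl-server.xml are the keystore and truststore locations and passwords. The sketch below uses the property names from the stock ssl-server.xml.example. The paths and passwords are placeholders and must be replaced with the values from your HTTPS deployment.

```xml
<!-- Sketch only: paths and passwords are placeholders -->
<property>
  <name>ssl.server.keystore.location</name>
  <value>/etc/hadoop/conf/keystore.jks</value>
</property>
<property>
  <name>ssl.server.keystore.password</name>
  <value>changeit</value>
</property>
<property>
  <name>ssl.server.keystore.keypassword</name>
  <value>changeit</value>
</property>
<property>
  <name>ssl.server.truststore.location</name>
  <value>/etc/hadoop/conf/truststore.jks</value>
</property>
<property>
  <name>ssl.server.truststore.password</name>
  <value>changeit</value>
</property>
```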

4. Configure container-executor

Create and configure container-executor.cfg

Example of container-executor.cfg:

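#configured value of yarn.nodemanager.linux-container-executor.group
yarn.nodemanager.linux-container-executor.group=<nodemanager_group>
#comma separated list of users who can not run applications
banned.users=<banned_users>
#Prevent other super-users
min.user.id=<min_uid>
#comma separated list of system users who CAN run applications
allowed.system.users=<allowed_system_users>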

Move the file into place and set its permissions:

mv container-executor.cfg /etc/hadoop/conf
# container-executor.cfg should be read-only
chmod 400 /etc/hadoop/conf/container-executor.cfg

Set the permissions of the container-executor binary:

chmod 6050 container-executor
# Test whether the configuration is correct
container-executor --checksetup

5. Set up cross-realm trust for DistCp

Set up cross-realm trust between realms

See How to setup cross-realm.

Update core-site.xml

Set the hadoop.security.auth_to_local parameter in both clusters by adding the following property:

<!-- Set up cross realm between A.HADOOP.COM and B.HADOOP.COM -->
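One way to write such a mapping is sketched below, assuming the goal is to strip the realm from one- and two-component principals of both realms so that, for example, hdfs/localhost@A.HADOOP.COM maps to the short name hdfs. The exact rules for a given deployment may differ.

```xml
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](.*@A\.HADOOP\.COM)s/@.*//
    RULE:[2:$1@$0](.*@A\.HADOOP\.COM)s/@.*//
    RULE:[1:$1@$0](.*@B\.HADOOP\.COM)s/@.*//
    RULE:[2:$1@$0](.*@B\.HADOOP\.COM)s/@.*//
    DEFAULT
  </value>
</property>
```

The `[1:...]` rules handle principals of the form user@REALM and the `[2:...]` rules handle service/host@REALM. Rules are evaluated in order and the first match wins, so `DEFAULT` should stay last.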

Test the mapping:

hadoop org.apache.hadoop.security.HadoopKerberosName hdfs/localhost@A.HADOOP.COM

Update hdfs-site.xml

Add the following properties on the client side:

<!-- Control allowed realms to authenticate with -->
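The standard property for this purpose is dfs.namenode.kerberos.principal.pattern. A permissive sketch that accepts NameNode principals from any realm is shown below. A narrower pattern can be substituted where only specific realms should be trusted.

```xml
<property>
  <name>dfs.namenode.kerberos.principal.pattern</name>
  <!-- "*" accepts any NameNode principal; tighten this as needed -->
  <value>*</value>
</property>
```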


Test that the trust is set up by running hdfs commands from A.HADOOP.COM against B.HADOOP.COM. Run the following command on a node of the A.HADOOP.COM cluster:

hdfs dfs -ls hdfs://<NameNode_FQDN_for_B.HADOOP.COM_Cluster>:8020/

Distcp between secure clusters

Run the distcp command:

hadoop distcp hdfs://<Cluster_A_URI> hdfs://<Cluster_B_URI>

Distcp between secure and insecure clusters

Add the following property to core-site.xml:
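The property in question is the same one passed with -D on the distcp command line in this section. Written as a core-site.xml entry on the secure cluster's client, it would look like this:

```xml
<property>
  <name>ipc.client.fallback-to-simple-auth-allowed</name>
  <value>true</value>
</property>
```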


Or pass the setting on the distcp command line:

hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://<Cluster_A_URI> hdfs://<Cluster_B_URI>