MAPREDUCE-1624. Documents the job credentials and associated details to do with delegation tokens (on the client side). Contributed by Jitendra Pandey and Devaraj Das.
git-svn-id: https://svn.apache.org/repos/asf/hadoop/mapreduce/trunk@981264 13f79535-47bb-0310-9956-ffa450edef68
diff --git a/CHANGES.txt b/CHANGES.txt
index d8c8cf3..2c8aa75 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -101,6 +101,10 @@
specify a credentials file. The tokens from there will be passed to the job.
(Jitendra Pandey and Owen O'Malley via ddas)
+ MAPREDUCE-1624. Documents the job credentials and associated details to do
+ with delegation tokens (on the client side).
+ (Jitendra Pandey and Devaraj Das via ddas)
+
OPTIMIZATIONS
MAPREDUCE-1354. Enhancements to JobTracker for better performance and
diff --git a/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml b/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
index 418c5a3..8d83877 100644
--- a/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
+++ b/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
@@ -1605,6 +1605,81 @@
<a href="http://hadoop.apache.org/common/docs/current/native_libraries.html#Loading+Native+Libraries+Through+DistributedCache">
Building Native Hadoop Libraries</a>.</p>
</section>
+ <section>
+ <title>Job Credentials</title>
+ <p>In a secure cluster, the user is authenticated via the Kerberos
+ kinit command. Because of scalability concerns, we don't push
+ the client's Kerberos tickets into MapReduce jobs. Instead, we
+ acquire delegation tokens from each HDFS NameNode that the job
+ will use and store them in the job as part of job submission.
+ The delegation tokens are automatically obtained
+ for the HDFS that holds the staging directories, where the
+ job files are written, and for any HDFS systems referenced by
+ FileInputFormats, FileOutputFormats, DistCp, and the
+ distributed cache.
+ Other applications must set the configuration
+ "mapreduce.job.hdfs-servers" to list all NameNodes that tasks might
+ need to talk to during job execution. This is a comma-separated
+ list of file system names, such as "hdfs://nn1/,hdfs://nn2/".
+ These tokens are passed to the JobTracker
+ as part of the job submission as <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/security/Credentials.html">Credentials</a>.</p>
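A minimal configuration sketch for the case above, assuming two hypothetical NameNodes nn1 and nn2 (taken from the example in the paragraph):

```xml
<!-- Sketch: declare extra NameNodes so delegation tokens are fetched for them.
     The host names nn1 and nn2 are placeholders. -->
<property>
  <name>mapreduce.job.hdfs-servers</name>
  <value>hdfs://nn1/,hdfs://nn2/</value>
</property>
```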
+
+ <p>Similar to HDFS delegation tokens, we also have MapReduce delegation tokens. The
+ MapReduce tokens are provided so that tasks can spawn jobs if they wish to. The tasks authenticate
+ to the JobTracker via the MapReduce delegation tokens. The delegation token can
+ be obtained via the API <a href="ext:api/org/apache/hadoop/mapred/jobclient/getdelegationtoken">
+ JobClient.getDelegationToken</a>. The obtained token must then be added to the
+ Credentials object in the JobConf used for job submission. The API
+ <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/security/Credentials.html#addToken(org.apache.hadoop.io.Text, org.apache.hadoop.security.token.Token)">Credentials.addToken</a>
+ can be used for this.</p>
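The token acquisition described above can be sketched as follows; this is not a complete program, and the renewer name and token alias are illustrative:

```java
// Sketch: obtain a MapReduce delegation token and attach it to the job
// before submission. "renewer" and "mr-token" are placeholder names.
JobConf job = new JobConf();
JobClient client = new JobClient(job);

Token<DelegationTokenIdentifier> mrToken =
    client.getDelegationToken(new Text("renewer"));

// Push the token onto the credentials carried in the JobConf.
job.getCredentials().addToken(new Text("mr-token"), mrToken);

JobClient.runJob(job);
```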
+
+ <p>The credentials are sent to the JobTracker as part of the job submission process.
+ The JobTracker persists the tokens and secrets in its filesystem (typically HDFS)
+ in a file within mapred.system.dir/JOBID. The TaskTracker localizes the file as part
+ of job localization. The framework sets the environment variable
+ HADOOP_TOKEN_FILE_LOCATION in each task to point to the
+ localized file. In order to launch jobs from tasks, or to do any HDFS operation,
+ tasks must set the configuration "mapreduce.job.credentials.binary" to point to
+ this token file.</p>
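A sketch of launching a job from within a task, under the assumption that the task's environment has been set up by the framework as described above:

```java
// Sketch: inside a running task, point a child job at the localized
// token file before submitting it.
JobConf childJob = new JobConf();

// HADOOP_TOKEN_FILE_LOCATION is set by the framework for every task.
String tokenFile = System.getenv("HADOOP_TOKEN_FILE_LOCATION");
childJob.set("mapreduce.job.credentials.binary", tokenFile);

JobClient.runJob(childJob);
```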
+
+ <p>The HDFS delegation tokens passed to the JobTracker during job submission
+ are cancelled by the JobTracker when the job completes. This is the default behavior
+ unless mapreduce.job.complete.cancel.delegation.tokens is set to false in the
+ JobConf. For jobs whose tasks in turn spawn jobs, this should be set to false.
+ Applications that share JobConf objects between multiple jobs on the JobClient side
+ should also set mapreduce.job.complete.cancel.delegation.tokens to false,
+ because the Credentials object within the JobConf is then shared:
+ all jobs end up sharing the same tokens, and hence the tokens should not be
+ cancelled when the jobs in the sequence finish.</p>
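For the shared-token scenarios above, the setting is a single configuration property; a minimal sketch:

```xml
<!-- Sketch: keep delegation tokens alive after job completion, for jobs
     that spawn further jobs or share a JobConf across submissions. -->
<property>
  <name>mapreduce.job.complete.cancel.delegation.tokens</name>
  <value>false</value>
</property>
```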
+
+ <p>Apart from the HDFS delegation tokens, arbitrary secrets can also be
+ passed during job submission for tasks to access other third-party services.
+ The APIs
+ <a href="ext:api/org/apache/hadoop/mapred/jobconf/getcredentials">
+ JobConf.getCredentials</a> or
+ <a href="ext:api/org/apache/hadoop/mapreduce/jobcontext/getcredentials">JobContext.getCredentials()</a>
+ should be used to get the credentials object, and then
+ <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/security/Credentials.html#addSecretKey(org.apache.hadoop.io.Text, byte[])">
+ Credentials.addSecretKey</a> should be used to add secrets.</p>
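A sketch of adding a secret at submission time; the alias and secret value are placeholders:

```java
// Sketch: pass an arbitrary secret to tasks via the job's credentials.
// "my.service.secret" is an illustrative alias, not a standard name.
JobConf job = new JobConf();
byte[] secret = "example-password".getBytes();  // hypothetical secret
job.getCredentials().addSecretKey(new Text("my.service.secret"), secret);
```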
+
+ <p>For applications written using the old MapReduce API, the Mapper/Reducer classes
+ need to implement <a href="ext:api/org/apache/hadoop/mapred/jobconfigurable">
+ JobConfigurable</a> in order to get access to the credentials in the tasks.
+ A reference to the JobConf passed into
+ <a href="ext:api/org/apache/hadoop/mapred/jobconfigurable/configure">
+ JobConfigurable.configure</a> should be stored. In the new MapReduce API,
+ the same can be done in the
+ <a href="ext:api/org/apache/hadoop/mapreduce/mapper/setup">Mapper.setup</a>
+ method.
+ The API <a href="ext:api/org/apache/hadoop/mapred/jobconf/getcredentials">
+ JobConf.getCredentials()</a> or the API
+ <a href="ext:api/org/apache/hadoop/mapreduce/jobcontext/getcredentials">JobContext.getCredentials()</a>
+ should be used to get the credentials reference (depending
+ on whether the new MapReduce API or the old MapReduce API is used).
+ Tasks can access the secrets using the APIs in <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/security/Credentials.html">Credentials</a>.</p>
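The old-API pattern described above can be sketched as follows; the class name and the secret alias are illustrative, and the alias must match whatever was added at submission time:

```java
// Sketch (old MapReduce API): store the JobConf passed to configure()
// and read a secret from its credentials.
public class SecretAwareMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private byte[] secret;

  @Override
  public void configure(JobConf job) {
    // "my.service.secret" is a placeholder alias.
    secret = job.getCredentials().getSecretKey(new Text("my.service.secret"));
  }

  @Override
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // ... use the secret to authenticate to a third-party service ...
  }
}
```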
+
+ </section>
</section>
<section>
diff --git a/src/docs/src/documentation/content/xdocs/site.xml b/src/docs/src/documentation/content/xdocs/site.xml
index c02456d..316f464 100644
--- a/src/docs/src/documentation/content/xdocs/site.xml
+++ b/src/docs/src/documentation/content/xdocs/site.xml
@@ -158,6 +158,14 @@
<compressioncodec href="CompressionCodec.html" />
</compress>
</io>
+ <mapreduce href="mapreduce/">
+ <mapper href="Mapper.html">
+ <setup href="#setup(org.apache.hadoop.mapreduce.Mapper.Context)" />
+ </mapper>
+ <jobcontext href="JobContext.html">
+ <getcredentials href="#getCredentials()" />
+ </jobcontext>
+ </mapreduce>
<mapred href="mapred/">
<clusterstatus href="ClusterStatus.html" />
<counters href="Counters.html" />
@@ -181,6 +189,7 @@
<jobclient href="JobClient.html">
<runjob href="#runJob(org.apache.hadoop.mapred.JobConf)" />
<submitjob href="#submitJob(org.apache.hadoop.mapred.JobConf)" />
+ <getdelegationtoken href="#getDelegationToken(org.apache.hadoop.io.Text)" />
</jobclient>
<jobconf href="JobConf.html">
<setnummaptasks href="#setNumMapTasks(int)" />
@@ -206,6 +215,7 @@
<setqueuename href="#setQueueName(java.lang.String)" />
<getjoblocaldir href="#getJobLocalDir()" />
<getjar href="#getJar()" />
+ <getcredentials href="#getCredentials()" />
</jobconf>
<jobconfigurable href="JobConfigurable.html">
<configure href="#configure(org.apache.hadoop.mapred.JobConf)" />