MAPREDUCE-1624. Documents the job credentials and associated details to do with delegation tokens (on the client side). Contributed by Jitendra Pandey and Devaraj Das.
git-svn-id: https://svn.apache.org/repos/asf/hadoop/mapreduce/trunk@981264 13f79535-47bb-0310-9956-ffa450edef68
diff --git a/CHANGES.txt b/CHANGES.txt
index d8c8cf3..2c8aa75 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -101,6 +101,10 @@
specify a credentials file. The tokens from there will be passed to the job.
(Jitendra Pandey and Owen O'Malley via ddas)
+ MAPREDUCE-1624. Documents the job credentials and associated details to do
+ with delegation tokens (on the client side).
+ (Jitendra Pandey and Devaraj Das via ddas)
+
OPTIMIZATIONS
MAPREDUCE-1354. Enhancements to JobTracker for better performance and
diff --git a/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml b/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
index 418c5a3..8d83877 100644
--- a/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
+++ b/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
@@ -1605,6 +1605,81 @@
<a href="http://hadoop.apache.org/common/docs/current/native_libraries.html#Loading+Native+Libraries+Through+DistributedCache">
Building Native Hadoop Libraries</a>.</p>
</section>
+ <section>
+ <title>Job Credentials</title>
+ <p>In a secure cluster, the user is authenticated via the Kerberos
+ kinit command. Because of scalability concerns, we don't push
+ the client's Kerberos tickets into MapReduce jobs. Instead, we
+ acquire delegation tokens from each HDFS NameNode that the job
+ will use and store them in the job as part of job submission.
+ The delegation tokens are automatically obtained
+ for the HDFS that holds the staging directories, where the
+ job files are written, and for any HDFS systems referenced by
+ FileInputFormats, FileOutputFormats, DistCp, and the
+ distributed cache.
+ Other applications must set the configuration
+ "mapreduce.job.hdfs-servers" to list all NameNodes that tasks might
+ need to talk to during job execution. This is a comma-separated
+ list of file system names, such as "hdfs://nn1/,hdfs://nn2/".
+ These tokens are passed to the JobTracker
+ as part of the job submission as <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/security/Credentials.html">Credentials</a>.</p>
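A minimal configuration sketch for the case above, assuming two hypothetical NameNodes nn1 and nn2 (taken from the example in the paragraph):

```xml
<!-- Sketch: declare extra NameNodes so delegation tokens are fetched for them.
     The host names nn1 and nn2 are placeholders. -->
<property>
  <name>mapreduce.job.hdfs-servers</name>
  <value>hdfs://nn1/,hdfs://nn2/</value>
</property>
```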
+
+ <p>Similar to HDFS delegation tokens, we also have MapReduce delegation tokens. The
+ MapReduce tokens are provided so that tasks can spawn jobs if they wish to. The tasks authenticate
+ to the JobTracker via the MapReduce delegation tokens. The delegation token can
+ be obtained via the API <a href="ext:api/org/apache/hadoop/mapred/jobclient/getdelegationtoken">
+ JobClient.getDelegationToken</a>. The obtained token must then be added to the
+ Credentials object in the JobConf used for job submission. The API
+ <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/security/Credentials.html#addToken(org.apache.hadoop.io.Text, org.apache.hadoop.security.token.Token)">Credentials.addToken</a>
+ can be used for this.</p>
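The token acquisition described above can be sketched as follows; this is not a complete program, and the renewer name and token alias are illustrative:

```java
// Sketch: obtain a MapReduce delegation token and attach it to the job
// before submission. "renewer" and "mr-token" are placeholder names.
JobConf job = new JobConf();
JobClient client = new JobClient(job);

Token<DelegationTokenIdentifier> mrToken =
    client.getDelegationToken(new Text("renewer"));

// Push the token onto the credentials carried in the JobConf.
job.getCredentials().addToken(new Text("mr-token"), mrToken);

JobClient.runJob(job);
```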
+
+ <p>The credentials are sent to the JobTracker as part of the job submission process.
+ The JobTracker persists the tokens and secrets in its filesystem (typically HDFS)
+ in a file within mapred.system.dir/JOBID. The TaskTracker localizes the file as part
+ of job localization. The framework sets the environment variable
+ HADOOP_TOKEN_FILE_LOCATION in each task to point to the
+ localized file. In order to launch jobs from tasks, or to do any HDFS operation,
+ tasks must set the configuration "mapreduce.job.credentials.binary" to point to
+ this token file.</p>
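A sketch of launching a job from within a task, under the assumption that the task's environment has been set up by the framework as described above:

```java
// Sketch: inside a running task, point a child job at the localized
// token file before submitting it.
JobConf childJob = new JobConf();

// HADOOP_TOKEN_FILE_LOCATION is set by the framework for every task.
String tokenFile = System.getenv("HADOOP_TOKEN_FILE_LOCATION");
childJob.set("mapreduce.job.credentials.binary", tokenFile);

JobClient.runJob(childJob);
```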
+
+ <p>The HDFS delegation tokens passed to the JobTracker during job submission
+ are cancelled by the JobTracker when the job completes. This is the default behavior
+ unless mapreduce.job.complete.cancel.delegation.tokens is set to false in the
+ JobConf. For jobs whose tasks in turn spawn jobs, this should be set to false.
+ Applications that share JobConf objects between multiple jobs on the JobClient side
+ should also set mapreduce.job.complete.cancel.delegation.tokens to false,
+ because the Credentials object within the JobConf is then shared:
+ all jobs end up sharing the same tokens, and hence the tokens should not be
+ cancelled when the jobs in the sequence finish.</p>
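For the shared-token scenarios above, the setting is a single configuration property; a minimal sketch:

```xml
<!-- Sketch: keep delegation tokens alive after job completion, for jobs
     that spawn further jobs or share a JobConf across submissions. -->
<property>
  <name>mapreduce.job.complete.cancel.delegation.tokens</name>
  <value>false</value>
</property>
```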
+
+ <p>Apart from the HDFS delegation tokens, arbitrary secrets can also be
+ passed during job submission for tasks to access other third-party services.
+ The APIs
+ <a href="ext:api/org/apache/hadoop/mapred/jobconf/getcredentials">
+ JobConf.getCredentials</a> or
+ <a href="ext:api/org/apache/hadoop/mapreduce/jobcontext/getcredentials">JobContext.getCredentials()</a>
+ should be used to get the credentials object, and then
+ <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/security/Credentials.html#addSecretKey(org.apache.hadoop.io.Text, byte[])">
+ Credentials.addSecretKey</a> should be used to add secrets.</p>
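A sketch of adding a secret at submission time; the alias and secret value are placeholders:

```java
// Sketch: pass an arbitrary secret to tasks via the job's credentials.
// "my.service.secret" is an illustrative alias, not a standard name.
JobConf job = new JobConf();
byte[] secret = "example-password".getBytes();  // hypothetical secret
job.getCredentials().addSecretKey(new Text("my.service.secret"), secret);
```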
+
+ <p>For applications written using the old MapReduce API, the Mapper/Reducer classes
+ need to implement <a href="ext:api/org/apache/hadoop/mapred/jobconfigurable">
+ JobConfigurable</a> in order to get access to the credentials in the tasks.
+ A reference to the JobConf passed into
+ <a href="ext:api/org/apache/hadoop/mapred/jobconfigurable/configure">
+ JobConfigurable.configure</a> should be stored. In the new MapReduce API,
+ the same can be done in the
+ <a href="ext:api/org/apache/hadoop/mapreduce/mapper/setup">Mapper.setup</a>
+ method.
+ The API <a href="ext:api/org/apache/hadoop/mapred/jobconf/getcredentials">
+ JobConf.getCredentials()</a> or the API
+ <a href="ext:api/org/apache/hadoop/mapreduce/jobcontext/getcredentials">JobContext.getCredentials()</a>
+ should be used to get the credentials reference (depending
+ on whether the new MapReduce API or the old MapReduce API is used).
+ Tasks can access the secrets using the APIs in <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/security/Credentials.html">Credentials</a>.</p>
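The old-API pattern described above can be sketched as follows; the class name and the secret alias are illustrative, and the alias must match whatever was added at submission time:

```java
// Sketch (old MapReduce API): store the JobConf passed to configure()
// and read a secret from its credentials.
public class SecretAwareMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private byte[] secret;

  @Override
  public void configure(JobConf job) {
    // "my.service.secret" is a placeholder alias.
    secret = job.getCredentials().getSecretKey(new Text("my.service.secret"));
  }

  @Override
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // ... use the secret to authenticate to a third-party service ...
  }
}
```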
+
+ </section>
</section>
<section>
diff --git a/src/docs/src/documentation/content/xdocs/site.xml b/src/docs/src/documentation/content/xdocs/site.xml
index c02456d..316f464 100644
--- a/src/docs/src/documentation/content/xdocs/site.xml
+++ b/src/docs/src/documentation/content/xdocs/site.xml
@@ -158,6 +158,14 @@
<compressioncodec href="CompressionCodec.html" />
</compress>
</io>
+ <mapreduce href="mapreduce/">
+ <mapper href="Mapper.html">
+ <setup href="#setup(org.apache.hadoop.mapreduce.Mapper.Context)" />
+ </mapper>
+ <jobcontext href="JobContext.html">
+ <getcredentials href="#getCredentials()" />
+ </jobcontext>
+ </mapreduce>
<mapred href="mapred/">
<clusterstatus href="ClusterStatus.html" />
<counters href="Counters.html" />
@@ -181,6 +189,7 @@
<jobclient href="JobClient.html">
<runjob href="#runJob(org.apache.hadoop.mapred.JobConf)" />
<submitjob href="#submitJob(org.apache.hadoop.mapred.JobConf)" />
+ <getdelegationtoken href="#getDelegationToken(org.apache.hadoop.io.Text)" />
</jobclient>
<jobconf href="JobConf.html">
<setnummaptasks href="#setNumMapTasks(int)" />
@@ -206,6 +215,7 @@
<setqueuename href="#setQueueName(java.lang.String)" />
<getjoblocaldir href="#getJobLocalDir()" />
<getjar href="#getJar()" />
+ <getcredentials href="#getCredentials()" />
</jobconf>
<jobconfigurable href="JobConfigurable.html">
<configure href="#configure(org.apache.hadoop.mapred.JobConf)" />