HADOOP-13360. Documentation for HADOOP_subcommand_OPTS

Signed-off-by: Allen Wittenauer <aw@apache.org>
diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/ClusterSetup.md b/hadoop-common-project/hadoop-common/src/site/markdown/ClusterSetup.md
index 0d551b1..f222769 100644
--- a/hadoop-common-project/hadoop-common/src/site/markdown/ClusterSetup.md
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/ClusterSetup.md
@@ -64,17 +64,17 @@
 
 | Daemon | Environment Variable |
 |:---- |:---- |
-| NameNode | HADOOP\_NAMENODE\_OPTS |
-| DataNode | HADOOP\_DATANODE\_OPTS |
-| Secondary NameNode | HADOOP\_SECONDARYNAMENODE\_OPTS |
+| NameNode | HDFS\_NAMENODE\_OPTS |
+| DataNode | HDFS\_DATANODE\_OPTS |
+| Secondary NameNode | HDFS\_SECONDARYNAMENODE\_OPTS |
 | ResourceManager | YARN\_RESOURCEMANAGER\_OPTS |
 | NodeManager | YARN\_NODEMANAGER\_OPTS |
 | WebAppProxy | YARN\_PROXYSERVER\_OPTS |
-| Map Reduce Job History Server | HADOOP\_JOB\_HISTORYSERVER\_OPTS |
+| Map Reduce Job History Server | MAPRED\_HISTORYSERVER\_OPTS |
 
-For example, To configure Namenode to use parallelGC, the following statement should be added in hadoop-env.sh :
+For example, to configure the NameNode to use parallelGC and a 4GB Java heap, the following statement should be added to hadoop-env.sh:
 
-      export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC"
+      export HDFS_NAMENODE_OPTS="-XX:+UseParallelGC -Xmx4g"
 
 See `etc/hadoop/hadoop-env.sh` for other examples.
 
@@ -91,13 +91,6 @@
       HADOOP_HOME=/path/to/hadoop
       export HADOOP_HOME
 
-| Daemon | Environment Variable |
-|:---- |:---- |
-| ResourceManager | YARN\_RESOURCEMANAGER\_HEAPSIZE |
-| NodeManager | YARN\_NODEMANAGER\_HEAPSIZE |
-| WebAppProxy | YARN\_PROXYSERVER\_HEAPSIZE |
-| Map Reduce Job History Server | HADOOP\_JOB\_HISTORYSERVER\_HEAPSIZE |
-
 ### Configuring the Hadoop Daemons
 
 This section deals with important parameters to be specified in the given configuration files:
diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/UnixShellGuide.md b/hadoop-common-project/hadoop-common/src/site/markdown/UnixShellGuide.md
index 940627d..b130f0f 100644
--- a/hadoop-common-project/hadoop-common/src/site/markdown/UnixShellGuide.md
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/UnixShellGuide.md
@@ -24,7 +24,7 @@
 
 ### `HADOOP_CLIENT_OPTS`
 
-This environment variable is used for almost all end-user operations.  It can be used to set any Java options as well as any Apache Hadoop options via a system property definition. For example:
+This environment variable is used for all end-user, non-daemon operations.  It can be used to set any Java options as well as any Apache Hadoop options via a system property definition. For example:
 
 ```bash
 HADOOP_CLIENT_OPTS="-Xmx1g -Dhadoop.socks.server=localhost:4000" hadoop fs -ls /tmp
@@ -32,6 +32,18 @@
 
 will increase the memory and send this command via a SOCKS proxy server.
 
+### `(command)_(subcommand)_OPTS`
+
+It is also possible to set options on a per-subcommand basis.  This allows one to create special options for particular cases.  The first part of the pattern is the command being used, but in all uppercase.  The second part is the subcommand being used, also in all uppercase.  This is then followed by the string `_OPTS`.
+
+For example, to configure `mapred distcp` to use a 2GB heap, one would use:
+
+```bash
+MAPRED_DISTCP_OPTS="-Xmx2g"
+```
+
+These options will appear *after* `HADOOP_CLIENT_OPTS` during execution and will generally take precedence.
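+
+For instance, if both variables set a maximum heap size, the subcommand-specific value appears later on the Java command line and thus wins (the paths here are placeholders):
+
+```bash
+# Both set -Xmx; the subcommand-specific 2GB value comes later and wins
+HADOOP_CLIENT_OPTS="-Xmx1g" MAPRED_DISTCP_OPTS="-Xmx2g" mapred distcp /src /dst
+```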
+
 ### `HADOOP_CLASSPATH`
 
   NOTE: Site-wide settings should be configured via a shellprofile entry and permanent user-wide settings should be configured via ${HOME}/.hadooprc using the `hadoop_add_classpath` function. See below for more information.
@@ -56,6 +68,8 @@
 #
 
 HADOOP_CLIENT_OPTS="-Xmx1g"
+MAPRED_DISTCP_OPTS="-Xmx2g"
+HADOOP_DISTCP_OPTS="-Xmx2g"
 ```
 
 The `.hadoop-env` file can also be used to extend functionality and teach Apache Hadoop new tricks.  For example, to run hadoop commands accessing the server referenced in the environment variable `${HADOOP_SERVER}`, the following in the `.hadoop-env` will do just that:
@@ -71,11 +85,23 @@
 
 ## Administrator Environment
 
-There are many environment variables that impact how the system operates.  By far, the most important are the series of `_OPTS` variables that control how daemons work.  These variables should contain all of the relevant settings for those daemons.
+In addition to the various XML files, administrators have two key capabilities for configuring Apache Hadoop when using the Unix Shell:
 
-More, detailed information is contained in `hadoop-env.sh` and the other env.sh files.
+  * Many environment variables that impact how the system operates.  This guide only highlights some of the key ones.  There is generally more information in the various `*-env.sh` files.
 
-Advanced administrators may wish to supplement or do some platform-specific fixes to the existing scripts.  In some systems, this means copying the errant script or creating a custom build with these changes.  Apache Hadoop provides the capabilities to do function overrides so that the existing code base may be changed in place without all of that work.  Replacing functions is covered later under the Shell API documentation.
+  * Function overrides that supplement or make platform-specific changes to the existing scripts.  Apache Hadoop provides this capability so that the existing code base may be changed in place without copying scripts or maintaining a custom build.  Replacing functions is covered later under the Shell API documentation; a brief sketch follows this list.
+
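+As a brief preview of the latter, replacing a function is a matter of redefining it (a sketch assuming the override hook file `hadoop-user-functions.sh` in `${HADOOP_CONF_DIR}`; see the Shell API documentation for the full rules):
+
+```bash
+# hadoop-user-functions.sh: replace the stock usage output
+function hadoop_usage
+{
+  echo "This installation's hadoop command is managed by the ops team."
+}
+```
+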
+### `(command)_(subcommand)_OPTS`
+
+By far the most important is the series of `_OPTS` variables that control how the daemons work.  These variables should contain all of the relevant settings for those daemons.
+
+Similar to the user commands above, all daemons will honor the `(command)_(subcommand)_OPTS` pattern.  It is generally recommended that these be set in `hadoop-env.sh` to guarantee that the system will know which settings it should use on restart.  Unlike user-facing subcommands, daemons will *NOT* honor `HADOOP_CLIENT_OPTS`.
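+
+For example, to give the NameNode its own GC and heap settings via `hadoop-env.sh` (the values are illustrative):
+
+```bash
+# Mirrors the ClusterSetup example: parallel GC plus a 4GB heap for the NameNode only
+export HDFS_NAMENODE_OPTS="-XX:+UseParallelGC -Xmx4g"
+```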
+
+In addition, daemons that run in an extra security mode also support `(command)_(subcommand)_SECURE_EXTRA_OPTS`.  These options are *supplemental* to the generic `*_OPTS` and will appear after them, therefore generally taking precedence.
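+
+As an illustration, a DataNode launched in secure mode via jsvc might combine the two (a sketch; `-jvm server` is the jsvc-oriented value commonly suggested in the stock `hadoop-env.sh` comments):
+
+```bash
+# Base daemon options (illustrative heap size)
+export HDFS_DATANODE_OPTS="-Xmx4g"
+# Appended after the above when the DataNode starts in secure mode
+export HDFS_DATANODE_SECURE_EXTRA_OPTS="-jvm server"
+```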
+
+### `(command)_(subcommand)_USER`
+
+Apache Hadoop provides a way to do a user check per subcommand.  While this method is easily circumvented and should not be considered a security feature, it does provide a mechanism by which to prevent accidents.  For example, setting `HDFS_NAMENODE_USER=hdfs` will make the `hdfs namenode` and `hdfs --daemon start namenode` commands verify that the user running them is the hdfs user by checking the `USER` environment variable.  This also works for non-daemons.  Setting `HADOOP_DISTCP_USER=jane` will verify that `USER` is set to `jane` before allowing the `hadoop distcp` command to execute.
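+
+For example, to enforce both of the checks described above, an administrator could add the following to `hadoop-env.sh` (the user names are illustrative):
+
+```bash
+# Only the hdfs user may launch the NameNode...
+export HDFS_NAMENODE_USER=hdfs
+# ...and only jane may run `hadoop distcp`
+export HADOOP_DISTCP_USER=jane
+```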
 
 ## Developer and Advanced Administrator Environment
 
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsNfsGateway.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsNfsGateway.md
index 6731189..4742637 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsNfsGateway.md
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsNfsGateway.md
@@ -183,7 +183,7 @@
         </property>
 
 *   JVM and log settings. You can export JVM settings (e.g., heap size and GC log) in
-    HADOOP\_NFS3\_OPTS. More NFS related settings can be found in hadoop-env.sh.
+    HDFS\_NFS3\_OPTS. More NFS related settings can be found in hadoop-env.sh.
     To get NFS debug trace, you can edit the log4j.property file
     to add the following. Note, debug trace, especially for ONCRPC, can be very verbose.