Final edits to readme.txt.
diff --git a/gateway-release/readme.txt b/gateway-release/readme.txt
index fbc9e12..113698e 100644
--- a/gateway-release/readme.txt
+++ b/gateway-release/readme.txt
@@ -29,16 +29,17 @@
------------------------------------------------------------------------------
Description
------------------------------------------------------------------------------
-The charter for the Gateway project is to simplify and normalize the deployment
-and implementation of secure Hadoop clusters as well as be a centralize access point
-for the service specific REST APIs exposed from within the cluster.
+The charter for the Gateway project is to simplify and normalize the
+deployment and implementation of secure Hadoop clusters as well as to be
+a centralized access point for the service-specific REST APIs exposed from
+within the cluster.
Milestone-1 of this project intends to demonstrate the ability to dynamically
provision reverse proxy capabilities with filter chains that meet the cluster
specific needs for authentication.
-BASIC authentication with identity being asserted to the rest of the cluster
-via Pseudo/Simple authentication will be demonstrated for security.
+For security, HTTP BASIC authentication will be demonstrated, with identity
+asserted to the rest of the cluster via Pseudo/Simple authentication.
For API aggregation, the Gateway will provide a central endpoint for HDFS and
Templeton APIs for each cluster.
@@ -56,94 +57,109 @@
Hadoop Cluster:
A local installation of a Hadoop Cluster is required at this time. Hadoop
EC2 cluster and/or Sandbox installations are currently difficult to access
- remotely via the Gateway. The EC2 and Sandbox limitation is caused by Hadoop
- services running with internal IP addresses. For the Gateway to work in these
- cases it will need to be deployed on the EC2 cluster or Sandbox, at this time.
+ remotely via the Gateway. The EC2 and Sandbox limitation is caused by
+ Hadoop services running with internal IP addresses. For the Gateway to
+ work in these cases, it currently needs to be deployed on the EC2 cluster
+ or Sandbox itself.
- The instructions that follow assume that the Gateway is *not* colocated with
- the Hadoop clusters themselves and (most importantly) that the IP addresses
- of the cluster services are accessible by the gateway where ever it happens to
- be running.
+ The instructions that follow assume that the Gateway is *not* collocated
+ with the Hadoop clusters themselves and (most importantly) that the IP
+ addresses of the cluster services are accessible by the gateway wherever
+ it happens to be running.
- The Hadoop cluster should be ensured to have WebHDFS and WebHCat (i.e. Templeton)
- deployed and configured.
+ Ensure that the Hadoop cluster has WebHDFS and WebHCat (i.e. Templeton)
+ configured and deployed.
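+
+ As a quick sanity check (assuming the default ports and that curl is
+ available), you can query each service directly; the host names below are
+ placeholders for your cluster:
+
+   curl -i 'http://{namenode-host}:50070/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs'
+   curl -i 'http://{templeton-host}:50111/templeton/v1/status'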
------------------------------------------------------------------------------
Known Issues
------------------------------------------------------------------------------
-Currently there is an issue with submitting Java MapReduce jobs via the WebHCat
+Currently there is an issue with submitting MapReduce jobs via the WebHCat
REST APIs. Therefore step 7 in the Example section currently fails.
-The Gateway cannot be be used against either an EC2 cluster or Hadoop Sandbox
-unless the gateway is deployed in the EC2 cluster or the on the Sandbox VM.
+The Gateway cannot be used against either EC2 clusters or Hadoop Sandbox
+VMs unless the gateway is deployed in the EC2 cluster or on the Sandbox VM.
Currently when any of the files in {GATEWAY_HOME}/deployments is changed, all
-deployed cluster topologies will be reloaded. Therefore you may see
-unexpected message of the form "Loading topology file:"
+deployed cluster topologies will be reloaded. Therefore, you may see
+unexpected messages of the form "Loading topology file:". These can safely be
+ignored.
If the cluster deployment descriptors in {GATEWAY_HOME}/deployments are
-incorrect the errors logged by the gateway are overly detailed and not
+incorrect, the errors logged by the gateway are overly detailed and not
diagnostic enough.
------------------------------------------------------------------------------
Installation and Deployment Instructions
------------------------------------------------------------------------------
-
1. Install
- Download and extract the gateway-0.1.0-SNAPSHOT.zip file into the installation directory that will contain your
- GATEWAY_HOME
+ Download and extract the gateway-0.1.0-SNAPSHOT.zip file into the
+ installation directory that will contain your GATEWAY_HOME
jar xf gateway-0.1.0-SNAPSHOT.zip
This will create a directory 'gateway' in your current directory.
2. Enter Gateway Home directory
cd gateway
- The fully qualified name of this directory will be referenced as {GATEWAY_HOME} throughout the remainder of this
- document.
+ The fully qualified name of this directory will be referenced as
+ {GATEWAY_HOME} throughout the remainder of this document.
3. Start the demo LDAP server (ApacheDS)
- a. First, understand that the LDAP server provided here is for demonstration purposes. You may configure the
- LDAP specifics within the topology descriptor for the cluster as described in step 5 below, in order to
- customize what LDAP instance to use. The assumption is that most users will leverage the demo LDAP server
- while evaluating this release and should therefore continue with the instructions here in step 3.
- b. Edit {GATEWAY_HOME}/conf/users.ldif if required and add your users and groups to the file.
- A number of normal Hadoop users (e.g. hdfs, mapred, hcat, hive) have already been included. Note that
- the passwords in this file are "fictitious" and have nothing to do with the actual accounts on the Hadoop
- cluster you are using. There is also a copy of this file in the templates directory that you can use to
- start over if necessary.
- c. Start the LDAP server - pointing it to the config dir where it will find the users.ldif file in the conf
- directory.
+ a. First, understand that the LDAP server provided here is for demonstration
+ purposes. You may configure the LDAP specifics within the topology
+ descriptor for the cluster as described in step 5 below, in order to
+ customize which LDAP instance to use. The assumption is that most users
+ will leverage the demo LDAP server while evaluating this release and
+ should therefore continue with the instructions here in step 3.
+ b. Edit {GATEWAY_HOME}/conf/users.ldif if required and add your users and
+ groups to the file. A number of normal Hadoop users
+ (e.g. hdfs, mapred, hcat, hive) have already been included. Note that
+ the passwords in this file are "fictitious" and have nothing to do with
+ the actual accounts on the Hadoop cluster you are using. There is also
+ a copy of this file in the templates directory that you can use to start
+ over if necessary.
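+
+ As an illustration, a user entry in users.ldif takes roughly the
+ following LDIF shape; the uid and password here are hypothetical, and new
+ entries should mirror the structure of those already in the file:
+
+   dn: uid=guest,ou=people,dc=hadoop,dc=apache,dc=org
+   objectclass: top
+   objectclass: person
+   objectclass: organizationalPerson
+   objectclass: inetOrgPerson
+   cn: Guest
+   sn: Guest
+   uid: guest
+   userPassword: guest-password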
+ c. Start the LDAP server, pointing it at the conf directory where it will
+ find the users.ldif file.
java -jar bin/gateway-test-ldap-0.1.0-SNAPSHOT.jar conf &
- There are a number of messages of the form "Created null." that can safely be ignored.
- Take note of the port on which it was started as this needs to match later configuration.
- This will create a directory named 'org.apache.hadoop.gateway.security.EmbeddedApacheDirectoryServer' that
+ There are a number of log messages of the form "Created null." that can
+ safely be ignored. Take note of the port on which it was started as this
+ needs to match later configuration. This will create a directory named
+ 'org.apache.hadoop.gateway.security.EmbeddedApacheDirectoryServer' that
can safely be ignored.
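+
+ If you wish to verify that the demo LDAP server is answering (and have
+ the OpenLDAP client tools installed), a search such as the following
+ should succeed; the bind DN and base DN here assume the structure of the
+ demo users.ldif and may need adjusting:
+
+   ldapsearch -h localhost -p 33389 \
+     -D 'uid=hdfs,ou=people,dc=hadoop,dc=apache,dc=org' -w hdfs-password \
+     -b 'dc=hadoop,dc=apache,dc=org' '(uid=hdfs)'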
4. Start the Gateway server
java -jar bin/gateway-server-0.1.0-SNAPSHOT.jar
- Take note of the port identified in the logging output as you will need this for accessing the gateway.
+ Take note of the port identified in the log output as you will need this
+ for accessing the Gateway.
5. Configure the Gateway with the topology of your Hadoop cluster
a. Edit the file {GATEWAY_HOME}/deployments/sample.xml
- b. Change the host and port in the urls of the <service> elements for NAMENODE and TEMPLETON service to match your
- cluster deployment.
- c. The default configuration contains the LDAP URL for a LDAP server. By default that file is configured to access
- the demo ApacheDS based LDAP server and its default configuration. By default, this server listens on port 33389.
- Optionally, you can change the LDAP URL for the LDAP server to be used for authentication. This is set via
- the main.ldapRealm.contextFactory.url property in the <gateway><provider><authentication> section.
- d. Save the file. The directory {GATEWAY_HOME}/deployments is monitored by the Gateway server and reacts to the
- discovery of a new or changed cluster topology descriptor by provisioning the endpoints and required filter
- chains to serve the needs of each cluster as described by the topology file. Note that the name of the file
- excluding the extension is also used as the path for that cluster in the URL. So for example the sample.xml
- file will result in Gateway URLs of the form
+ b. Change the host and port in the URLs of the <service> elements for the
+ NAMENODE and TEMPLETON services to match your Hadoop cluster deployment,
+ as illustrated below.
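+
+ For illustration, the <service> elements take roughly this shape; the
+ host names are placeholders, and the exact element names should match
+ those already present in sample.xml:
+
+   <service>
+     <role>NAMENODE</role>
+     <url>http://{namenode-host}:50070/webhdfs/v1</url>
+   </service>
+   <service>
+     <role>TEMPLETON</role>
+     <url>http://{templeton-host}:50111/templeton/v1</url>
+   </service>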
+ c. The default configuration contains the LDAP URL for an LDAP server. By
+ default, the file is configured to access the demo ApacheDS based LDAP
+ server, which listens on port 33389. Optionally, you can change the LDAP
+ URL to point at the LDAP server to be used for authentication. This is
+ set via the main.ldapRealm.contextFactory.url property in the
+ <gateway><provider><authentication> section.
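+
+ For example, pointing authentication at an LDAP server on another host
+ would use a property value of the form shown below; the host name is a
+ placeholder, and the exact element wrapping for the property follows
+ whatever is already present in sample.xml:
+
+   main.ldapRealm.contextFactory.url = ldap://{ldap-host}:33389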
+ d. Save the file. The directory {GATEWAY_HOME}/deployments is monitored
+ by the Gateway server and reacts to the discovery of a new or changed
+ cluster topology descriptor by provisioning the endpoints and required
+ filter chains to serve the needs of each cluster as described by the
+ topology file. Note that the name of the file excluding the extension
+ is also used as the path for that cluster in the URL. So, for example,
+ the sample.xml file will result in Gateway URLs of the form
http://{gateway-host}:{gateway-port}/gateway/sample/namenode/api/v1
6. Test the installation and configuration of your Gateway
- Invoke the LISTSATUS operation on HDFS represented by your configured NAMENODE by using your web browser or curl:
+ Invoke the LISTSTATUS operation on HDFS represented by your configured
+ NAMENODE by using your web browser or curl:
- curl --user hdfs:hdfs-password -i -L http://localhost:8888/gateway/sample/namenode/api/v1/tmp?op=LISTSTATUS
+ curl -u hdfs:hdfs-password -i 'http://localhost:8888/gateway/sample/namenode/api/v1/?op=LISTSTATUS'
- The results of the above command should result in something to along the lines of the output below. The exact
- information returned is subject to the content within HDFS in your Hadoop cluster.
+ The above command should produce output along the lines of the example
+ below. The exact information returned depends on the content within HDFS
+ in your Hadoop cluster.
HTTP/1.1 200 OK
Content-Type: application/json
@@ -157,7 +173,8 @@
{"accessTime":0,"blockSize":0,"group":"hdfs","length":0,"modificationTime":1350595857178,"owner":"hdfs","pathSuffix":"user","permission":"755","replication":0,"type":"DIRECTORY"}
]}}
- For additional information on HDFS and Templeton APIs, see the following URLs respectively:
+ For additional information on the WebHDFS and Templeton REST APIs, see
+ the following URLs respectively:
http://hadoop.apache.org/docs/r1.0.4/webhdfs.html
and
@@ -166,11 +183,13 @@
------------------------------------------------------------------------------
Mapping Gateway URLs to Hadoop cluster URLs
------------------------------------------------------------------------------
-The Gateway functions in much like a reverse proxy. As such it maintains a mapping of URLs that are exposed
-externally by the Gateway to URLs that are provided by the Hadoop cluster. Examples of mappings for the NameNode and
-Templeton are shown below. These mapping are generated from the combination of the Gateway configuration file
-(i.e. {GATEWAY_HOME}/gateway-site.xml) and the cluster topology descriptors
-(e.g. {GATEWAY_HOME}/deployments/<cluster-name>.xml}.
+The Gateway functions much like a reverse proxy. As such, it maintains a
+mapping of URLs that are exposed externally by the Gateway to URLs that are
+provided by the Hadoop cluster. Examples of mappings for the NameNode and
+Templeton are shown below. These mappings are generated from the combination
+of the Gateway configuration file (i.e. {GATEWAY_HOME}/gateway-site.xml)
+and the cluster topology descriptors
+(e.g. {GATEWAY_HOME}/deployments/<cluster-name>.xml).
HDFS (NameNode)
Gateway: http://<gateway-host>:<gateway-port>/<gateway-path>/<cluster-name>/namenode/api/v1
@@ -179,20 +198,26 @@
Gateway: http://<gateway-host>:<gateway-port>/<gateway-path>/<cluster-name>/templeton/api/v1
Cluster: http://<templeton-host>:50111/templeton/v1
-The values for <gateway-host>, <gateway-port>, <gateway-path> are provided via the Gateway configuration file
-(i.e. {GATEWAY_HOME}/gateway-site.xml).
-The value for <cluster-name> is derived from the name of the cluster topology descriptor
-(e.g. {GATEWAY_HOME}/deployments/{cluster-name>.xml).
-The value for <namenode-host> are provided via the cluster topology descriptor.
-Note: The ports 50070 and 50111 are the defaults for NameNode and Templeton respectively.
- Their values can also be provided via the cluster topology descriptor.
+The values for <gateway-host>, <gateway-port>, and <gateway-path> are
+provided via the Gateway configuration file
+(i.e. {GATEWAY_HOME}/gateway-site.xml).
+
+The value for <cluster-name> is derived from the name of the cluster topology
+descriptor (e.g. {GATEWAY_HOME}/deployments/<cluster-name>.xml).
+
+The values for <namenode-host> and <templeton-host> are provided via the
+cluster topology descriptor (e.g. {GATEWAY_HOME}/deployments/<cluster-name>.xml).
+
+Note: The ports 50070 and 50111 are the defaults for NameNode and Templeton
+ respectively. Their values can also be provided via the cluster topology
+ descriptor if your Hadoop cluster uses different ports.
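+
+For example, with <gateway-host>=localhost, <gateway-port>=8888,
+<gateway-path>=gateway and <cluster-name>=sample (the defaults used
+elsewhere in this document), the HDFS mapping becomes:
+
+  Gateway: http://localhost:8888/gateway/sample/namenode/api/v1
+  Cluster: http://{namenode-host}:50070/webhdfs/v1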
------------------------------------------------------------------------------
Enabling logging
------------------------------------------------------------------------------
-If necessary you can enable additional logging by editing the log4j.properties file in the conf directory.
-Changing the rootLogger value from ERROR to DEBUG will generate a large amount of debug logging. A number
-of useful, more fine loggers are also provided in the file.
+If necessary you can enable additional logging by editing the log4j.properties
+file in the conf directory. Changing the rootLogger value from ERROR to DEBUG
+will generate a large amount of debug logging. A number of useful, more
+fine-grained loggers are also provided in the file.
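+
+For example, the root logger line in {GATEWAY_HOME}/conf/log4j.properties
+would be changed to something like the following; the appender name here is
+illustrative, so keep whatever name your copy of the file already
+references:
+
+  log4j.rootLogger=DEBUG, app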
------------------------------------------------------------------------------
Filing bugs
@@ -200,15 +225,17 @@
File bugs at hortonworks.jira.com under Project "Hadoop Gateway Development"
Include the results of
java -jar bin/gateway-server-0.1.0-SNAPSHOT.jar -version
-in the Environment section. Also include the version of Hadoop that you are using there as well.
+in the Environment section. Also include the version of Hadoop being used.
------------------------------------------------------------------------------
Example
------------------------------------------------------------------------------
-The example below illustrates the sequence of curl commands that could be used to run a word count job. It utilizes
-the hadoop-examples.jar from a Hadoop install for running a simple word count job. Take care to follow the
-instructions below for steps 4/5 and 6/7 where the Location header returned by the call to the NameNode is copied for
-use with the call to the DataNode that follows it.
+The example below illustrates the sequence of curl commands that could be
+used to run a "word count" MapReduce job. It utilizes the
+hadoop-examples.jar from a Hadoop install. Take care to follow the
+instructions below for steps 4/5 and 6/7, where the Location header
+returned by the call to the NameNode is copied for use with the call to
+the DataNode that follows it.
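+
+This two-step pattern is standard WebHDFS behavior: the first PUT to the
+NameNode returns an HTTP 307 redirect whose Location header identifies a
+DataNode, and a second PUT sends the file content to that Location. A
+sketch with hypothetical paths (steps 4/5 below show the real commands):
+
+  curl -i -u mapred:mapred-password -X PUT \
+    'http://localhost:8888/gateway/sample/namenode/api/v1/tmp/test/file?op=CREATE'
+  # copy the Location header from the 307 response, then:
+  curl -i -u mapred:mapred-password -T file -X PUT '{value-of-location-header}'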
# 1. Create a test input directory /tmp/test/input
curl -i -u mapred:mapred-password -X PUT \