Final edits to readme.txt.
diff --git a/gateway-release/readme.txt b/gateway-release/readme.txt
index fbc9e12..113698e 100644
--- a/gateway-release/readme.txt
+++ b/gateway-release/readme.txt
@@ -29,16 +29,17 @@
------------------------------------------------------------------------------
Description
------------------------------------------------------------------------------
-The charter for the Gateway project is to simplify and normalize the deployment
-and implementation of secure Hadoop clusters as well as be a centralize access point
-for the service specific REST APIs exposed from within the cluster.
+The charter for the Gateway project is to simplify and normalize the
+deployment and implementation of secure Hadoop clusters as well as to be
+a centralized access point for the service-specific REST APIs exposed from
+within the cluster.
Milestone-1 of this project intends to demonstrate the ability to dynamically
provision reverse proxy capabilities with filter chains that meet the cluster
specific needs for authentication.
-BASIC authentication with identity being asserted to the rest of the cluster
-via Pseudo/Simple authentication will be demonstrated for security.
+For security, HTTP BASIC authentication will be demonstrated, with identity
+asserted to the rest of the cluster via Pseudo/Simple authentication.
For API aggregation, the Gateway will provide a central endpoint for HDFS and
Templeton APIs for each cluster.
@@ -56,94 +57,109 @@
Hadoop Cluster:
A local installation of a Hadoop Cluster is required at this time. Hadoop
EC2 cluster and/or Sandbox installations are currently difficult to access
- remotely via the Gateway. The EC2 and Sandbox limitation is caused by Hadoop
- services running with internal IP addresses. For the Gateway to work in these
- cases it will need to be deployed on the EC2 cluster or Sandbox, at this time.
+ remotely via the Gateway. The EC2 and Sandbox limitation is caused by
+ Hadoop services running with internal IP addresses. For the Gateway to
+ work in these cases, it currently needs to be deployed on the EC2 cluster
+ or Sandbox itself.
- The instructions that follow assume that the Gateway is *not* colocated with
- the Hadoop clusters themselves and (most importantly) that the IP addresses
- of the cluster services are accessible by the gateway where ever it happens to
- be running.
+ The instructions that follow assume that the Gateway is *not* collocated
+ with the Hadoop clusters themselves and (most importantly) that the IP
+ addresses of the cluster services are accessible by the gateway wherever
+ it happens to be running.
- The Hadoop cluster should be ensured to have WebHDFS and WebHCat (i.e. Templeton)
- deployed and configured.
+ Ensure that the Hadoop cluster has WebHDFS and WebHCat (i.e. Templeton)
+ configured and deployed.
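+
+ As a quick sanity check (assuming the default ports and that curl is
+ available), you can query each service directly; the host names below are
+ placeholders for your cluster:
+
+   curl -i 'http://{namenode-host}:50070/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs'
+   curl -i 'http://{templeton-host}:50111/templeton/v1/status'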
------------------------------------------------------------------------------
Known Issues
------------------------------------------------------------------------------
-Currently there is an issue with submitting Java MapReduce jobs via the WebHCat
+Currently there is an issue with submitting MapReduce jobs via the WebHCat
REST APIs. Therefore step 7 in the Example section currently fails.
-The Gateway cannot be be used against either an EC2 cluster or Hadoop Sandbox
-unless the gateway is deployed in the EC2 cluster or the on the Sandbox VM.
+The Gateway cannot be used against either EC2 clusters or Hadoop Sandbox
+VMs unless the gateway is deployed in the EC2 cluster or on the Sandbox VM.
Currently when any of the files in {GATEWAY_HOME}/deployments is changed, all
-deployed cluster topologies will be reloaded. Therefore you may see
-unexpected message of the form "Loading topology file:"
+deployed cluster topologies will be reloaded. Therefore, you may see
+unexpected messages of the form "Loading topology file:". These can safely be
+ignored.
If the cluster deployment descriptors in {GATEWAY_HOME}/deployments are
-incorrect the errors logged by the gateway are overly detailed and not
+incorrect, the errors logged by the gateway are overly detailed and not
diagnostic enough.
------------------------------------------------------------------------------
Installation and Deployment Instructions
------------------------------------------------------------------------------
-
1. Install
- Download and extract the gateway-0.1.0-SNAPSHOT.zip file into the installation directory that will contain your
- GATEWAY_HOME
+ Download and extract the gateway-0.1.0-SNAPSHOT.zip file into the
+ installation directory that will contain your GATEWAY_HOME
jar xf gateway-0.1.0-SNAPSHOT.zip
This will create a directory 'gateway' in your current directory.
2. Enter Gateway Home directory
cd gateway
- The fully qualified name of this directory will be referenced as {GATEWAY_HOME} throughout the remainder of this
- document.
+ The fully qualified name of this directory will be referenced as
+ {GATEWAY_HOME} throughout the remainder of this document.
3. Start the demo LDAP server (ApacheDS)
- a. First, understand that the LDAP server provided here is for demonstration purposes. You may configure the
- LDAP specifics within the topology descriptor for the cluster as described in step 5 below, in order to
- customize what LDAP instance to use. The assumption is that most users will leverage the demo LDAP server
- while evaluating this release and should therefore continue with the instructions here in step 3.
- b. Edit {GATEWAY_HOME}/conf/users.ldif if required and add your users and groups to the file.
- A number of normal Hadoop users (e.g. hdfs, mapred, hcat, hive) have already been included. Note that
- the passwords in this file are "fictitious" and have nothing to do with the actual accounts on the Hadoop
- cluster you are using. There is also a copy of this file in the templates directory that you can use to
- start over if necessary.
- c. Start the LDAP server - pointing it to the config dir where it will find the users.ldif file in the conf
- directory.
+ a. First, understand that the LDAP server provided here is for demonstration
+ purposes. You may configure the LDAP specifics within the topology
+ descriptor for the cluster as described in step 5 below, in order to
+ customize which LDAP instance to use. The assumption is that most users
+ will leverage the demo LDAP server while evaluating this release and
+ should therefore continue with the instructions here in step 3.
+ b. Edit {GATEWAY_HOME}/conf/users.ldif if required and add your users and
+ groups to the file. A number of normal Hadoop users
+ (e.g. hdfs, mapred, hcat, hive) have already been included. Note that
+ the passwords in this file are "fictitious" and have nothing to do with
+ the actual accounts on the Hadoop cluster you are using. There is also
+ a copy of this file in the templates directory that you can use to start
+ over if necessary.
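+
+ As an illustration, a user entry in users.ldif takes roughly the
+ following LDIF shape; the uid and password here are hypothetical, and new
+ entries should mirror the structure of those already in the file:
+
+   dn: uid=guest,ou=people,dc=hadoop,dc=apache,dc=org
+   objectclass: top
+   objectclass: person
+   objectclass: organizationalPerson
+   objectclass: inetOrgPerson
+   cn: Guest
+   sn: Guest
+   uid: guest
+   userPassword: guest-password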
+ c. Start the LDAP server, pointing it at the conf directory where it will
+ find the users.ldif file.
java -jar bin/gateway-test-ldap-0.1.0-SNAPSHOT.jar conf &
- There are a number of messages of the form "Created null." that can safely be ignored.
- Take note of the port on which it was started as this needs to match later configuration.
- This will create a directory named 'org.apache.hadoop.gateway.security.EmbeddedApacheDirectoryServer' that
+ There are a number of log messages of the form "Created null." that can
+ safely be ignored. Take note of the port on which it was started as this
+ needs to match later configuration. This will create a directory named
+ 'org.apache.hadoop.gateway.security.EmbeddedApacheDirectoryServer' that
can safely be ignored.
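+
+ If you wish to verify that the demo LDAP server is answering (and have
+ the OpenLDAP client tools installed), a search such as the following
+ should succeed; the bind DN and base DN here assume the structure of the
+ demo users.ldif and may need adjusting:
+
+   ldapsearch -h localhost -p 33389 \
+     -D 'uid=hdfs,ou=people,dc=hadoop,dc=apache,dc=org' -w hdfs-password \
+     -b 'dc=hadoop,dc=apache,dc=org' '(uid=hdfs)'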
4. Start the Gateway server
java -jar bin/gateway-server-0.1.0-SNAPSHOT.jar
- Take note of the port identified in the logging output as you will need this for accessing the gateway.
+ Take note of the port identified in the log output as you will need this
+ for accessing the Gateway.
5. Configure the Gateway with the topology of your Hadoop cluster
a. Edit the file {GATEWAY_HOME}/deployments/sample.xml
- b. Change the host and port in the urls of the <service> elements for NAMENODE and TEMPLETON service to match your
- cluster deployment.
- c. The default configuration contains the LDAP URL for a LDAP server. By default that file is configured to access
- the demo ApacheDS based LDAP server and its default configuration. By default, this server listens on port 33389.
- Optionally, you can change the LDAP URL for the LDAP server to be used for authentication. This is set via
- the main.ldapRealm.contextFactory.url property in the <gateway><provider><authentication> section.
- d. Save the file. The directory {GATEWAY_HOME}/deployments is monitored by the Gateway server and reacts to the
- discovery of a new or changed cluster topology descriptor by provisioning the endpoints and required filter
- chains to serve the needs of each cluster as described by the topology file. Note that the name of the file
- excluding the extension is also used as the path for that cluster in the URL. So for example the sample.xml
- file will result in Gateway URLs of the form
+ b. Change the host and port in the URLs of the <service> elements for the
+ NAMENODE and TEMPLETON services to match your Hadoop cluster deployment,
+ as illustrated below.
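+
+ For illustration, the <service> elements take roughly this shape; the
+ host names are placeholders, and the exact element names should match
+ those already present in sample.xml:
+
+   <service>
+     <role>NAMENODE</role>
+     <url>http://{namenode-host}:50070/webhdfs/v1</url>
+   </service>
+   <service>
+     <role>TEMPLETON</role>
+     <url>http://{templeton-host}:50111/templeton/v1</url>
+   </service>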
+ c. The default configuration contains the LDAP URL for an LDAP server. By
+ default, the file is configured to access the demo ApacheDS based LDAP
+ server, which listens on port 33389. Optionally, you can change the LDAP
+ URL to point at the LDAP server to be used for authentication. This is
+ set via the main.ldapRealm.contextFactory.url property in the
+ <gateway><provider><authentication> section.
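+
+ For example, pointing authentication at an LDAP server on another host
+ would use a property value of the form shown below; the host name is a
+ placeholder, and the exact element wrapping for the property follows
+ whatever is already present in sample.xml:
+
+   main.ldapRealm.contextFactory.url = ldap://{ldap-host}:33389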
+ d. Save the file. The directory {GATEWAY_HOME}/deployments is monitored
+ by the Gateway server and reacts to the discovery of a new or changed
+ cluster topology descriptor by provisioning the endpoints and required
+ filter chains to serve the needs of each cluster as described by the
+ topology file. Note that the name of the file excluding the extension
+ is also used as the path for that cluster in the URL. So, for example,
+ the sample.xml file will result in Gateway URLs of the form
http://{gateway-host}:{gateway-port}/gateway/sample/namenode/api/v1
6. Test the installation and configuration of your Gateway
- Invoke the LISTSATUS operation on HDFS represented by your configured NAMENODE by using your web browser or curl:
+ Invoke the LISTSTATUS operation on HDFS represented by your configured
+ NAMENODE by using your web browser or curl:
- curl --user hdfs:hdfs-password -i -L http://localhost:8888/gateway/sample/namenode/api/v1/tmp?op=LISTSTATUS
+ curl -u hdfs:hdfs-password -i 'http://localhost:8888/gateway/sample/namenode/api/v1/?op=LISTSTATUS'
- The results of the above command should result in something to along the lines of the output below. The exact
- information returned is subject to the content within HDFS in your Hadoop cluster.
+ The above command should produce output along the lines of the example
+ below. The exact information returned depends on the content within HDFS
+ in your Hadoop cluster.
HTTP/1.1 200 OK
Content-Type: application/json
@@ -157,7 +173,8 @@
{"accessTime":0,"blockSize":0,"group":"hdfs","length":0,"modificationTime":1350595857178,"owner":"hdfs","pathSuffix":"user","permission":"755","replication":0,"type":"DIRECTORY"}
]}}
- For additional information on HDFS and Templeton APIs, see the following URLs respectively:
+ For additional information on the WebHDFS and Templeton REST APIs, see
+ the following URLs respectively:
http://hadoop.apache.org/docs/r1.0.4/webhdfs.html
and
@@ -166,11 +183,13 @@
------------------------------------------------------------------------------
Mapping Gateway URLs to Hadoop cluster URLs
------------------------------------------------------------------------------
-The Gateway functions in much like a reverse proxy. As such it maintains a mapping of URLs that are exposed
-externally by the Gateway to URLs that are provided by the Hadoop cluster. Examples of mappings for the NameNode and
-Templeton are shown below. These mapping are generated from the combination of the Gateway configuration file
-(i.e. {GATEWAY_HOME}/gateway-site.xml) and the cluster topology descriptors
-(e.g. {GATEWAY_HOME}/deployments/<cluster-name>.xml}.
+The Gateway functions much like a reverse proxy. As such, it maintains a
+mapping of URLs that are exposed externally by the Gateway to URLs that are
+provided by the Hadoop cluster. Examples of mappings for the NameNode and
+Templeton are shown below. These mappings are generated from the combination
+of the Gateway configuration file (i.e. {GATEWAY_HOME}/gateway-site.xml)
+and the cluster topology descriptors
+(e.g. {GATEWAY_HOME}/deployments/<cluster-name>.xml).
HDFS (NameNode)
Gateway: http://<gateway-host>:<gateway-port>/<gateway-path>/<cluster-name>/namenode/api/v1
@@ -179,20 +198,26 @@
Gateway: http://<gateway-host>:<gateway-port>/<gateway-path>/<cluster-name>/templeton/api/v1
Cluster: http://<templeton-host>:50111/templeton/v1
-The values for <gateway-host>, <gateway-port>, <gateway-path> are provided via the Gateway configuration file
-(i.e. {GATEWAY_HOME}/gateway-site.xml).
-The value for <cluster-name> is derived from the name of the cluster topology descriptor
-(e.g. {GATEWAY_HOME}/deployments/{cluster-name>.xml).
-The value for <namenode-host> are provided via the cluster topology descriptor.
-Note: The ports 50070 and 50111 are the defaults for NameNode and Templeton respectively.
- Their values can also be provided via the cluster topology descriptor.
+The values for <gateway-host>, <gateway-port>, and <gateway-path> are
+provided via the Gateway configuration file
+(i.e. {GATEWAY_HOME}/gateway-site.xml).
+
+The value for <cluster-name> is derived from the name of the cluster topology
+descriptor (e.g. {GATEWAY_HOME}/deployments/<cluster-name>.xml).
+
+The values for <namenode-host> and <templeton-host> are provided via the
+cluster topology descriptor (e.g. {GATEWAY_HOME}/deployments/<cluster-name>.xml).
+
+Note: The ports 50070 and 50111 are the defaults for NameNode and Templeton
+ respectively. Their values can also be provided via the cluster topology
+ descriptor if your Hadoop cluster uses different ports.
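+
+For example, with <gateway-host>=localhost, <gateway-port>=8888,
+<gateway-path>=gateway and <cluster-name>=sample (the defaults used
+elsewhere in this document), the HDFS mapping becomes:
+
+  Gateway: http://localhost:8888/gateway/sample/namenode/api/v1
+  Cluster: http://{namenode-host}:50070/webhdfs/v1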
------------------------------------------------------------------------------
Enabling logging
------------------------------------------------------------------------------
-If necessary you can enable additional logging by editing the log4j.properties file in the conf directory.
-Changing the rootLogger value from ERROR to DEBUG will generate a large amount of debug logging. A number
-of useful, more fine loggers are also provided in the file.
+If necessary you can enable additional logging by editing the log4j.properties
+file in the conf directory. Changing the rootLogger value from ERROR to DEBUG
+will generate a large amount of debug logging. A number of useful, more
+fine-grained loggers are also provided in the file.
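+
+For example, the root logger line in {GATEWAY_HOME}/conf/log4j.properties
+would be changed to something like the following; the appender name here is
+illustrative, so keep whatever name your copy of the file already
+references:
+
+  log4j.rootLogger=DEBUG, app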
------------------------------------------------------------------------------
Filing bugs
@@ -200,15 +225,17 @@
File bugs at hortonworks.jira.com under Project "Hadoop Gateway Development"
Include the results of
java -jar bin/gateway-server-0.1.0-SNAPSHOT.jar -version
-in the Environment section. Also include the version of Hadoop that you are using there as well.
+in the Environment section. Also include the version of Hadoop being used.
------------------------------------------------------------------------------
Example
------------------------------------------------------------------------------
-The example below illustrates the sequence of curl commands that could be used to run a word count job. It utilizes
-the hadoop-examples.jar from a Hadoop install for running a simple word count job. Take care to follow the
-instructions below for steps 4/5 and 6/7 where the Location header returned by the call to the NameNode is copied for
-use with the call to the DataNode that follows it.
+The example below illustrates the sequence of curl commands that could be
+used to run a "word count" MapReduce job. It utilizes the
+hadoop-examples.jar from a Hadoop install. Take care to follow the
+instructions below for steps 4/5 and 6/7, where the Location header
+returned by the call to the NameNode is copied for use with the call to
+the DataNode that follows it.
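+
+This two-step pattern is standard WebHDFS behavior: the first PUT to the
+NameNode returns an HTTP 307 redirect whose Location header identifies a
+DataNode, and a second PUT sends the file content to that Location. A
+sketch with hypothetical paths (steps 4/5 below show the real commands):
+
+  curl -i -u mapred:mapred-password -X PUT \
+    'http://localhost:8888/gateway/sample/namenode/api/v1/tmp/test/file?op=CREATE'
+  # copy the Location header from the 307 response, then:
+  curl -i -u mapred:mapred-password -T file -X PUT '{value-of-location-header}'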
# 1. Create a test input directory /tmp/test/input
curl -i -u mapred:mapred-password -X PUT \