Here are the steps to get Apache Knox up and running against a Hadoop cluster:
Java 1.6 or later is required for the Knox Gateway runtime. Use the command below to check the version of Java installed on the system where Knox will be running.
java -version
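The output should report a Java version of 1.6 or later. A typical result, assuming an Oracle JDK 1.6 installation, looks something like the lines below; the exact version, build, and vendor strings will vary by system:

java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)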
Knox supports Hadoop 1.x and 2.x; these quick start instructions assume a Hadoop 2.x virtual machine based environment.
The quick start provides a link to download a Hadoop 2.0 based Hortonworks Sandbox virtual machine. Please note that Knox supports other Hadoop distributions and can be configured against a full-blown Hadoop cluster. Configuring Knox for Hadoop 1.x/2.x versions, for Hadoop deployed in EC2, or for a custom Hadoop cluster is documented in the advanced deployment guide.
Download one of the distributions below from the [Apache mirrors][mirror].
Apache Knox Gateway releases are available under the [Apache License, Version 2.0][asl]. See the NOTICE file contained in each release artifact for applicable copyright attribution notices.
While recommended, verification is an optional step. You can verify the integrity of any downloaded files using the PGP signatures. Please read Verifying Apache HTTP Server Releases for more information on why you should verify our releases.
The PGP signatures can be verified using PGP or GPG. First download the KEYS file as well as the .asc signature files for the relevant release packages. Make sure you get these files from the main distribution directory linked above, rather than from a mirror. Then verify the signatures using one of the methods below.
% pgpk -a KEYS
% pgpv knox-incubating-0.3.0.zip.asc
or
% pgp -ka KEYS
% pgp knox-incubating-0.3.0.zip.asc
or
% gpg --import KEYS
% gpg --verify knox-incubating-0.3.0.zip.asc
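If verification succeeds, gpg reports a good signature. The output will look roughly like the lines below, where the angle-bracket placeholders stand for the actual date, key ID, and name of the release manager who signed the artifact:

gpg: Signature made <date> using RSA key ID <key-id>
gpg: Good signature from "<release manager>"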
Start the Hadoop virtual machine.
The steps required to install the gateway will vary depending upon which distribution format (zip | rpm) was downloaded. In either case you will end up with a directory where the gateway is installed. This directory will be referred to as your {GATEWAY_HOME} throughout this document.
If you downloaded the Zip distribution you can simply extract the contents into a directory. The example below provides a command that can be executed to do this. Note that the {VERSION} portion of the command must be replaced with an actual Apache Knox Gateway version number, for example 0.3.0, and must match the version in the file downloaded.
jar xf knox-incubating-{VERSION}.zip
This will create a directory knox-incubating-{VERSION} in your current directory. The directory knox-incubating-{VERSION} will be considered your {GATEWAY_HOME}.
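For example, for the 0.3.0 release referenced above, the extraction and resulting {GATEWAY_HOME} would be:

jar xf knox-incubating-0.3.0.zip
cd knox-incubating-0.3.0

This leaves you in the knox-incubating-0.3.0 directory, which is your {GATEWAY_HOME}.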
If you downloaded the RPM distribution you can install it using normal RPM package tools. It is important to perform the installation as the user that will be running the gateway server, because several directories are created that are owned by this user. These commands will install Knox to /usr/lib/knox, following the pattern of other Hadoop components. This directory will be considered your {GATEWAY_HOME}.
sudo yum localinstall knox-incubating-{VERSION}.rpm
or
sudo rpm -ihv knox-incubating-{VERSION}.rpm
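To confirm the result you can query the RPM database and inspect the install directory. The package name knox in the query below is an assumption; adjust it if your RPM is named differently:

rpm -qa | grep knox
ls /usr/lib/knox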
Knox comes with an LDAP server for demonstration purposes.
cd {GATEWAY_HOME}
java -jar bin/ldap.jar conf &
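To confirm that the demonstration LDAP server started, check that it is listening on its port. The port 33389 used below is the demo server's customary default and is an assumption; consult the server's startup output if your configuration differs:

netstat -an | grep 33389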
The gateway can be started in one of two ways: using java -jar directly or using a shell script.
This is the simplest way to start the gateway. Starting this way will result in all logging being written directly to standard output.
cd {GATEWAY_HOME}
java -jar bin/gateway.jar
Upon start, Knox server will prompt you for the master secret (i.e. password). This secret is used to secure artifacts used by the gateway server for things like SSL and credential/password aliasing. This secret will have to be entered at startup unless you choose to persist it.
Run the setup command with root privileges.
cd {GATEWAY_HOME}
sudo bin/gateway.sh setup
The server will prompt you for the master secret (i.e. password).
The server can then be started without root privileges using this command.
cd {GATEWAY_HOME}
bin/gateway.sh start
When starting the gateway this way the process will be run in the background. The log output is written into the directory /var/log/knox. In addition a PID (process ID) is written into /var/run/knox.
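To watch the server's output or confirm that it is running, you can tail the log and check the recorded PID. The file names gateway.log and gateway.pid below are assumptions; list the /var/log/knox and /var/run/knox directories to see the actual names on your system:

tail -f /var/log/knox/gateway.log
cat /var/run/knox/gateway.pid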
In order to stop a gateway that was started with the script use this command.
cd {GATEWAY_HOME}
bin/gateway.sh stop
If for some reason the gateway is stopped other than by using the command above, you may need to clear the tracking PID.
cd {GATEWAY_HOME}
bin/gateway.sh clean
NOTE: This command will also clear any log output in /var/log/knox, so use it with caution.
This will return a directory listing of the root (i.e. /) directory of HDFS.
curl -i -k -u guest:guest-password -X GET \
 'https://localhost:8443/gateway/sandbox/webhdfs/v1/?op=LISTSTATUS'
The results of the above command should be something along the lines of the output below. The exact information returned is subject to the content within HDFS in your Hadoop cluster. Successfully executing this command at a minimum proves that the gateway is properly configured to provide access to WebHDFS. It does not necessarily prove that any of the other services are correctly configured to be accessible. To validate that, see the sections for the individual services in #[Service Details].
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 760
Server: Jetty(6.1.26)

{"FileStatuses":{"FileStatus":[
{"accessTime":0,"blockSize":0,"group":"hdfs","length":0,"modificationTime":1350595859762,"owner":"hdfs","pathSuffix":"apps","permission":"755","replication":0,"type":"DIRECTORY"},
{"accessTime":0,"blockSize":0,"group":"mapred","length":0,"modificationTime":1350595874024,"owner":"mapred","pathSuffix":"mapred","permission":"755","replication":0,"type":"DIRECTORY"},
{"accessTime":0,"blockSize":0,"group":"hdfs","length":0,"modificationTime":1350596040075,"owner":"hdfs","pathSuffix":"tmp","permission":"777","replication":0,"type":"DIRECTORY"},
{"accessTime":0,"blockSize":0,"group":"hdfs","length":0,"modificationTime":1350595857178,"owner":"hdfs","pathSuffix":"user","permission":"755","replication":0,"type":"DIRECTORY"}
]}}
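Having verified WebHDFS access at the root, the same pattern can be used against other paths. The sketch below, assuming the same sandbox topology and guest credentials, lists the /tmp directory seen in the output above:

curl -i -k -u guest:guest-password -X GET \
 'https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp?op=LISTSTATUS'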