Apache Knox Details

This section provides everything you need to know to get the Knox gateway up and running against a Hadoop cluster.

Hadoop

An existing Hadoop 1.x or 2.x cluster is required for Knox to sit in front of and protect. It is possible to use a Hadoop cluster deployed on EC2, but this requires additional configuration not covered here. It is also possible to use a limited set of services in a Hadoop cluster secured with Kerberos. This too requires additional configuration that is not described here. See #[Supported Services] for details on what is supported in this release.

The Hadoop cluster should have at least WebHDFS, WebHCat (i.e. Templeton) and Oozie configured, deployed and running. HBase/Stargate and Hive can also be accessed via the Knox Gateway given the proper versions and configuration.

The instructions that follow assume a few things:

  1. The gateway is not collocated with the Hadoop clusters themselves.
  2. The host names and IP addresses of the cluster services are accessible by the gateway wherever it happens to be running.
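
The second assumption can be checked from the machine that will run the gateway before going further. The following is a minimal sketch, not part of Knox itself: the function names `check_reachable` and `check_endpoints` are made up for this illustration, and it only confirms that a host name resolves and accepts a TCP connection, not that the service behind it is healthy.

```python
import socket


def check_reachable(host, port, timeout=3.0):
    """Return True if `host` resolves and accepts a TCP connection on `port`."""
    try:
        # create_connection resolves the host name and then connects.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name resolution failures, refused connections and timeouts.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each service name to True/False for a dict of name -> (host, port)."""
    return {
        name: check_reachable(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }
```

As a usage example, calling `check_endpoints({"WebHDFS": ("namenode.example.com", 50070), "WebHCat": ("webhcat.example.com", 50111), "Oozie": ("oozie.example.com", 11000)})` returns a dictionary of results per service. The host names here are placeholders and the ports are the usual defaults for those services, so both are assumptions to replace with the values for your cluster.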

All of the instructions and samples provided here are tailored and tested to work “out of the box” against a [Hortonworks Sandbox 2.x VM][sandbox].

Apache Knox Directory Layout

Knox can be installed by expanding the zip file or with rpm. With an rpm-based install, the following directories are created in addition to those described in this section.

/usr/lib/knox
/var/log/knox
/var/run/knox

The directory /usr/lib/knox is considered your {GATEWAY_HOME} and will adhere to the layout described below. The directory /var/log/knox will contain the output files from the server. The directory /var/run/knox will contain the process ID for a currently running gateway server.

Regardless of the installation method used, the layout and content of the {GATEWAY_HOME} will be identical. The table below provides a brief explanation of the important files and directories within {GATEWAY_HOME}.

| Directory | Purpose |
|---|---|
| conf/ | Contains configuration files that apply to the gateway globally (i.e. not cluster specific). |
| bin/ | Contains the executable shell scripts, batch files and JARs for clients and servers. |
| deployments/ | Contains topology descriptors used to configure the gateway for specific Hadoop clusters. |
| lib/ | Contains the JARs for all the components that make up the gateway. |
| dep/ | Contains the JARs for all of the components upon which the gateway depends. |
| ext/ | A directory where user supplied extension JARs can be placed to extend the gateway's functionality. |
| samples/ | Contains a number of samples that can be used to explore the functionality of the gateway. |
| templates/ | Contains default configuration files that can be copied and customized. |
| README | Provides basic information about the Apache Knox Gateway. |
| ISSUES | Describes significant known issues. |
| CHANGES | Enumerates the changes between releases. |
| LICENSE | Documents the license under which this software is provided. |
| NOTICE | Documents required attribution notices for included dependencies. |
| DISCLAIMER | Documents that this release is from a project undergoing incubation at Apache. |
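
After an install, the layout can be compared against the entries in the table above. The following is a minimal sketch, assuming nothing beyond the directory and file names listed there. The helper name `missing_entries` is hypothetical and is not a Knox tool.

```python
from pathlib import Path

# Directory and file names taken from the layout table above.
EXPECTED_DIRS = ["conf", "bin", "deployments", "lib", "dep", "ext",
                 "samples", "templates"]
EXPECTED_FILES = ["README", "ISSUES", "CHANGES", "LICENSE", "NOTICE",
                  "DISCLAIMER"]


def missing_entries(gateway_home):
    """Return the expected entries that are absent from `gateway_home`."""
    home = Path(gateway_home)
    # Directories are reported with a trailing slash, as in the table.
    missing = [d + "/" for d in EXPECTED_DIRS if not (home / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (home / f).is_file()]
    return missing
```

For an rpm install, `missing_entries("/usr/lib/knox")` would be the natural call, since that directory is the {GATEWAY_HOME}. An empty list means every entry from the table is present.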

Supported Services

This table enumerates the versions of various Hadoop services that have been tested to work with the Knox Gateway. When secured via Kerberos, only more recent versions of some Hadoop components can be accessed via the Knox Gateway.

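| Service | Version | Non-Secure | Secure |
|---|---|---|---|
| WebHDFS | 2.1.0 | ![y] | ![y] |
| WebHCat/Templeton | 0.11.0 | ![y] | ![n] |
| | 0.12.0 | ![y] | ![y] |
| Oozie | 4.0.0 | ![y] | ![y] |
| HBase/Stargate | 0.95.2 | ![y] | ![n] |
| Hive (via WebHCat) | 0.11.0 | ![y] | ![n] |
| | 0.12.0 | ![y] | ![y] |
| Hive (via JDBC) | 0.11.0 | ![n] | ![n] |
| | 0.12.0 | ![y] | ![n] |
| Hive (via ODBC) | 0.11.0 | ![n] | ![n] |
| | 0.12.0 | ![n] | ![n] |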
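
For scripted checks, the table above can be captured as a small lookup. This is an illustrative sketch only. It assumes that `![y]` means "tested to work" and `![n]` means "not supported", and that a row without a service name belongs to the service named in the row above it. The names `SUPPORT` and `is_tested` are made up for this example.

```python
# (service, version) -> (non_secure, secure), transcribed from the table above.
SUPPORT = {
    ("WebHDFS", "2.1.0"): (True, True),
    ("WebHCat/Templeton", "0.11.0"): (True, False),
    ("WebHCat/Templeton", "0.12.0"): (True, True),
    ("Oozie", "4.0.0"): (True, True),
    ("HBase/Stargate", "0.95.2"): (True, False),
    ("Hive (via WebHCat)", "0.11.0"): (True, False),
    ("Hive (via WebHCat)", "0.12.0"): (True, True),
    ("Hive (via JDBC)", "0.11.0"): (False, False),
    ("Hive (via JDBC)", "0.12.0"): (True, False),
    ("Hive (via ODBC)", "0.11.0"): (False, False),
    ("Hive (via ODBC)", "0.12.0"): (False, False),
}


def is_tested(service, version, secure=False):
    """Return True/False from the table, or None if the pair is not listed."""
    entry = SUPPORT.get((service, version))
    if entry is None:
        return None  # Not in the table, so nothing is known either way.
    non_secure, kerberos = entry
    return kerberos if secure else non_secure
```

Returning `None` for unlisted versions keeps "not tested" distinct from "tested and found not to work", which the table itself does not conflate.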

More Examples

These examples provide more detail about how to access various Apache Hadoop services via the Apache Knox Gateway.

  • #[WebHDFS Examples]
  • #[WebHCat Examples]
  • #[Oozie Examples]
  • #[HBase Examples]
  • #[Hive Examples]