Apache Knox Details

This section provides everything you need to know to get the Knox gateway up and running against a Hadoop cluster.

Hadoop

An existing Hadoop 2.x or 3.x cluster is required for Knox to sit in front of and protect. It is possible to use a Hadoop cluster deployed on EC2, but this requires additional configuration not covered here. It is also possible to protect access to the services of a Hadoop cluster that is secured with Kerberos; this too requires additional configuration, which is described in other sections of this guide. See the Supported Services section below for details on what is supported in this release.

The instructions that follow assume a few things:

  1. The gateway is not co-located with the Hadoop clusters themselves.
  2. The host names and IP addresses of the cluster services are reachable from the gateway, wherever it happens to be running.
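One way to check the second assumption before configuring Knox is to try opening a TCP connection from the gateway host to each cluster service. The sketch below does this with Python's standard `socket` module. The helper names `reachable` and `check_cluster` are hypothetical, and a successful connection only shows that the port is open, not that the service behind it is healthy.

```python
import socket


def reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections and timeouts.
        return False


def check_cluster(endpoints: dict, timeout: float = 3.0) -> dict:
    """Map each service name to whether its (host, port) endpoint is reachable."""
    return {name: reachable(host, port, timeout) for name, (host, port) in endpoints.items()}
```

For example, `check_cluster({"WebHDFS": ("namenode.example.com", 9870)})` would probe a hypothetical NameNode on the Hadoop 3.x default HTTP port (50070 on Hadoop 2.x); substitute your cluster's actual hosts and ports.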

All of the instructions and samples provided here are tailored and tested to work “out of the box” against a [Hortonworks Sandbox 2.x VM][sandbox].

Apache Knox Directory Layout

Knox can be installed by expanding the zip/archive file.
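The expansion step can also be scripted. The sketch below uses Python's standard `zipfile` module and assumes, as an unverified convention, that the distribution archive contains a single top-level directory (for example `knox-<version>/`) that becomes {GATEWAY_HOME}. The function name `install_knox` is hypothetical.

```python
import pathlib
import zipfile


def install_knox(archive, dest) -> pathlib.Path:
    """Extract a Knox distribution zip into `dest` and return the resulting GATEWAY_HOME."""
    dest = pathlib.Path(dest)
    with zipfile.ZipFile(archive) as zf:
        # Assumption: every entry lives under one top-level directory.
        top_level = {pathlib.PurePosixPath(name).parts[0] for name in zf.namelist()}
        if len(top_level) != 1:
            raise ValueError(f"expected one top-level directory, found {sorted(top_level)}")
        # extractall creates `dest` if needed and strips absolute or ".." path components.
        zf.extractall(dest)
    return dest / top_level.pop()
```

Note that `zipfile` does not restore Unix permission bits, so the scripts in bin/ may need to be made executable again (for example with `chmod +x`) after extracting this way.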

The table below briefly explains the important files and directories within {GATEWAY_HOME}.

| Directory/File | Purpose |
|---|---|
| conf/ | Contains configuration files that apply to the gateway globally (i.e., not cluster-specific). |
| data/ | Contains security and topology-specific artifacts that require read/write access at runtime. |
| conf/topologies/ | Contains topology files that represent Hadoop clusters, which the gateway uses to deploy cluster proxies. |
| data/security/ | Contains the persisted master secret and the keystores directory. |
| data/security/keystores/ | Contains the gateway identity keystore and the credential stores for the gateway and each deployed cluster topology. |
| data/services/ | Contains service behavior definitions for the currently supported services. |
| bin/ | Contains the executable shell scripts, batch files, and JARs for clients and servers. |
| data/deployments/ | Contains deployed cluster topologies used to protect access to specific Hadoop clusters. |
| lib/ | Contains the JARs for all the components that make up the gateway. |
| dep/ | Contains the JARs for all of the components upon which the gateway depends. |
| ext/ | A directory where user-supplied extension JARs can be placed to extend the gateway's functionality. |
| pids/ | Contains the process IDs for the running LDAP and gateway servers. |
| samples/ | Contains a number of samples that can be used to explore the functionality of the gateway. |
| templates/ | Contains default configuration files that can be copied and customized. |
| README | Provides basic information about the Apache Knox Gateway. |
| ISSUES | Describes significant known issues. |
| CHANGES | Enumerates the changes between releases. |
| LICENSE | Documents the license under which this software is provided. |
| NOTICE | Documents required attribution notices for included dependencies. |
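After extracting a release, a quick sanity check is to confirm that the directories described in the table exist under {GATEWAY_HOME}. The sketch below is illustrative only: `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, and the list deliberately leaves out pids/, data/deployments/ and data/security/ on the assumption that they may only appear once the gateway has been started. The exact layout can vary between Knox releases.

```python
from pathlib import Path

# Subset of the table's directories assumed to exist in a fresh extraction
# (relative to GATEWAY_HOME). Adjust for your Knox release.
EXPECTED_DIRS = (
    "bin", "conf", "conf/topologies", "data", "data/services",
    "dep", "ext", "lib", "samples", "templates",
)


def missing_dirs(gateway_home, expected=EXPECTED_DIRS) -> list:
    """Return the expected directories that are absent under gateway_home, in order."""
    home = Path(gateway_home)
    return [d for d in expected if not (home / d).is_dir()]
```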

Supported Services

This table enumerates the versions of various Hadoop services that have been tested to work with the Knox Gateway.

| Service | Version | Non-Secure | Secure | HA |
|---|---|---|---|---|
| WebHDFS | 2.4.0 | Yes | Yes | Yes |
| WebHCat/Templeton | 0.13.0 | Yes | Yes | Yes |
| Oozie | 4.0.0 | Yes | Yes | Yes |
| HBase | 0.98.0 | Yes | Yes | Yes |
| Hive (via WebHCat) | 0.13.0 | Yes | Yes | Yes |
| Hive (via JDBC/ODBC) | 0.13.0 | Yes | Yes | Yes |
| YARN ResourceManager | 2.5.0 | Yes | Yes | No |
| Kafka (via REST Proxy) | 0.10.0 | Yes | Yes | Yes |
| Storm | 0.9.3 | Yes | No | No |
| Solr | 5.5+ and 6+ | Yes | Yes | Yes |

More Examples

These examples provide more detail about how to access various Apache Hadoop services via the Apache Knox Gateway.
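As a minimal illustration of the URL shape used to reach a service through the gateway, the sketch below builds (but does not send) a WebHDFS `LISTSTATUS` request. It assumes a sandbox-style setup: the gateway listening on port 8443, a gateway path of `gateway`, a topology named `sandbox`, and the demo LDAP user `guest` with password `guest-password`. All of these are assumptions to adjust for your own deployment, and `knox_webhdfs_request` is a hypothetical helper.

```python
import base64
import urllib.parse
import urllib.request


def knox_webhdfs_request(
    host: str,
    hdfs_path: str,
    op: str = "LISTSTATUS",
    user: str = "guest",               # assumed demo LDAP user
    password: str = "guest-password",  # assumed demo LDAP password
    port: int = 8443,                  # assumed gateway port
    gateway_path: str = "gateway",     # assumed gateway path
    topology: str = "sandbox",         # assumed topology (cluster) name
) -> urllib.request.Request:
    """Build, but do not send, a WebHDFS request routed through the Knox gateway."""
    query = urllib.parse.urlencode({"op": op})
    # URL shape: https://{host}:{port}/{gateway_path}/{topology}/webhdfs/v1/{hdfs_path}?op=...
    url = (
        f"https://{host}:{port}/{gateway_path}/{topology}"
        f"/webhdfs/v1/{urllib.parse.quote(hdfs_path.lstrip('/'))}?{query}"
    )
    # In the assumed setup the gateway authenticates the caller via HTTP Basic credentials.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
```

Actually sending the request with `urllib.request.urlopen(req, context=ctx)` requires an `ssl.SSLContext` that trusts the gateway's certificate, which is typically self-signed in a default installation.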