The topology descriptor files provide the gateway with per-cluster configuration information. This includes configuration for both the providers within the gateway and the services within the Hadoop cluster. These files are located in {GATEWAY_HOME}/deployments
. The general outline of this document looks like this.
<topology> <gateway> <provider> </provider> </gateway> <service> </service> </topology>
There are typically multiple <provider>
and <service>
elements.
/topology : Defines the provider and configuration and service topology for a single Hadoop cluster.
/topology/gateway : Groups all of the provider elements
/topology/gateway/provider : Defines the configuration of a specific provider for the cluster.
/topology/service : Defines the location of a specific Hadoop service within the Hadoop cluster.
Provider configuration is used to customize the behavior of a particular gateway feature. The general outline of a provider element looks like this.
<provider> <role>authentication</role> <name>ShiroProvider</name> <enabled>true</enabled> <param> <name></name> <value></value> </param> </provider>
/topology/gateway/provider : Groups information for a specific provider.
/topology/gateway/provider/role : Defines the role of a particular provider. There are a number of pre-defined roles used by out-of-the-box provider plugins for the gateay. These roles are: authentication, identity-assertion, authentication, rewrite and hostmap
/topology/gateway/provider/name : Defines the name of the provider for which this configuration applies. There can be multiple provider implementations for a given role. Specifying the name is used identify which particular provider is being configured. Typically each topology descriptor should contain only one provider for each role but there are exceptions.
/topology/gateway/provider/enabled : Allows a particular provider to be enabled or disabled via true
or false
respectively. When a provider is disabled any filters associated with that provider are excluded from the processing chain.
/topology/gateway/provider/param : These elements are used to supply provider configuration. There can be zero or more of these per provider.
/topology/gateway/provider/param/name : The name of a parameter to pass to the provider.
/topology/gateway/provider/param/value : The value of a parameter to pass to the provider.
Service configuration is used to specify the location of services within the Hadoop cluster. The general outline of a service element looks like this.
<service> <role>WEBHDFS</role> <url>http://localhost:50070/webhdfs</url> </service>
/topology/service : Provider information about a particular service within the Hadoop cluster. Not all services are necessarily exposed as gateway endpoints.
/topology/service/role : Identifies the role of this service. Currently supported roles are: WEBHDFS, WEBHCAT, WEBHBASE, OOZIE, HIVE, NAMENODE, JOBTRACKER Additional service roles can be supported via plugins.
topology/service/url : The URL identifying the location of a particular service within the Hadoop cluster.
The purpose of the Hostmap provider is to handle situations where host are know by one name within the cluster and another name externally. This frequently occurs when virtual machines are used and in particular using cloud hosting services. Currently the Hostmap provider is configured as part of the topology file. The basic structure is shown below.
<topology> <gateway> ... <provider> <role>hostmap</role> <name>static</name> <enabled>true</enabled> <param><name>external-host-name</name><value>internal-host-name</value></param> </provider> ... </gateway> ... </topology>
This mapping is required because the Hadoop servies running within the cluster are unaware that they are being accessed from outside the cluster. Therefore URLs returned as part of REST API responses will typically contain internal host names. Since clients outside the cluster will be unable to resolve those host name they must be mapped to external host names.
Consider an EC2 example where two VMs have been allocated. Each VM has an external host name by which it can be accessed via the internet. However the EC2 VM is unaware of this external host name and instead is configured with the internal host name.
External HOSTNAMES: ec2-23-22-31-165.compute-1.amazonaws.com ec2-23-23-25-10.compute-1.amazonaws.com Internal HOSTNAMES: ip-10-118-99-172.ec2.internal ip-10-39-107-209.ec2.internal
The Hostmap configuration required to allow access external to the Hadoop cluster via the Apache Knox Gateway would be this.
<topology> <gateway> ... <provider> <role>hostmap</role> <name>static</name> <enabled>true</enabled> <param> <name>ec2-23-22-31-165.compute-1.amazonaws.com</name> <value>ip-10-118-99-172.ec2.internal</value> </param> <param> <name>ec2-23-23-25-10.compute-1.amazonaws.com</name> <value>ip-10-39-107-209.ec2.internal</value> </param> </provider> ... </gateway> ... </topology>
Hortonwork's Sandbox 2.x poses a different challenge for host name mapping. This version of the Sandbox uses port mapping to make the Sandbox VM appear as though it is accessible via localhost. However the Sandbox VM is internally configured to consider sandbox.hortonworks.com as the host name. So from the perspective of a client accessing Sandbox the external host name is localhost. The Hostmap configuration required to allow access to Sandbox from the host operating system is this.
<topology> <gateway> ... <provider> <role>hostmap</role> <name>static</name> <enabled>true</enabled> <param><name>localhost</name><value>sandbox,sandbox.hortonworks.com</value></param> </provider> ... </gateway> ... </topology>
Details about each provider configuration element is enumerated below.
topology/gateway/provider/role : The role for a Hostmap provider must always be hostmap
.
topology/gateway/provider/name : The Hostmap provider supplied out-of-the-box is selected via the name static
.
topology/gateway/provider/enabled : Host mapping can be enabled or disabled by providing true
or false
.
topology/gateway/provider/param : Host mapping is configured by providing parameters for each external to internal mapping.
topology/gateway/provider/param/name : The parameter names represent an external host names associated with the internal host names provided by the value element. This can be a comma separated list of host names that all represent the same physical host. When mapping from internal to external host name the first external host name in the list is used.
topology/gateway/provider/param/value : The parameter values represent the internal host names associated with the external host names provider by the name element. This can be a comma separated list of host names that all represent the same physical host. When mapping from external to internal host names the first internal host name in the list is used.
If necessary you can enable additional logging by editing the log4j.properties
file in the conf
directory. Changing the rootLogger value from ERROR
to DEBUG
will generate a large amount of debug logging. A number of useful, more fine loggers are also provided in the file.
TODO - Java VM options doc.
The master secret is required to start the server. This secret is used to access secured artifacts by the gateway instance. Keystore, trust stores and credential stores are all protected with the master secret.
You may persist the master secret by supplying the -persist-master switch at startup. This will result in a warning indicating that persisting the secret is less secure than providing it at startup. We do make some provisions in order to protect the persisted password.
It is encrypted with AES 128 bit encryption and where possible the file permissions are set to only be accessible by the user that the gateway is running as.
After persisting the secret, ensure that the file at config/security/master has the appropriate permissions set for your environment. This is probably the most important layer of defense for master secret. Do not assume that the encryption if sufficient protection.
A specific user should be created to run the gateway this will protect a persisted master file.
There are a number of artifacts that are used by the gateway in ensuring the security of wire level communications, access to protected resources and the encryption of sensitive data. These artifacts can be managed from outside of the gateway instances or generated and populated by the gateway instance itself.
The following is a description of how this is coordinated with both standalone (development, demo, etc) gateway instances and instances as part of a cluster of gateways in mind.
Upon start of the gateway server we:
conf/security/keystores/gateway.jks
. The identity store contains the certificate and private key used to represent the identity of the server for SSL connections and signature creation.conf/security/keystores/__gateway-credentials.jceks
. This credential store is used to store secrets/passwords that are used by the gateway. For instance, this is where the pass-phrase for accessing the gateway-identity certificate is kept.gateway-identity-passphrase
. This is coordinated with the population of the self-signed cert into the identity-store.Upon deployment of a Hadoop cluster topology within the gateway we:
conf/security/keystores/sandbox-credentials.jceks
. This topology specific credential store is used for storing secrets/passwords that are used for encrypting sensitive data with topology specific keys.By leveraging the algorithm described above we can provide a window of opportunity for management of these artifacts in a number of ways.
In order to provide your own certificate for use by the gateway, you will need to either import an existing key pair into a Java keystore or generate a self-signed cert using the Java keytool.
One way to accomplish this is to start with a PKCS12 store for your key pair and then convert it to a Java keystore or JKS.
openssl pkcs12 -export -in cert.pem -inkey key.pem > server.p12
The above example uses openssl to create a PKCS12 encoded store for your provided certificate private key.
keytool -importkeystore -srckeystore {server.p12} -destkeystore gateway.jks -srcstoretype pkcs12
This example converts the PKCS12 store into a Java keystore (JKS). It should prompt you for the keystore and key passwords for the destination keystore. You must use the master-secret for both.
While using this approach a couple of important things to be aware of:
NOTE: The password for the keystore as well as that of the imported key must be the master secret for the gateway instance.
keytool -genkey -keyalg RSA -alias gateway-identity -keystore gateway.jks \ -storepass {master-secret} -validity 360 -keysize 2048
Keytool will prompt you for a number of elements used that will comprise this distiniguished name (DN) within your certificate.
NOTE: When it prompts you for your First and Last name be sure to type in the hostname of the machine that your gateway instance will be running on. This is used by clients during hostname verification to ensure that the presented certificate matches the hostname that was used in the URL for the connection - so they need to match.
NOTE: When it prompts for the key password just press enter to ensure that it is the same as the keystore password. Which as was described earlier must match the master secret for the gateway instance.
Whenever you provide your own keystore with either a self-signed cert or a real certificate signed by a trusted authority, you will need to create an empty credential store. This is necessary for the current release in order for the system to utilize the same password for the keystore and the key.
The credential stores in Knox use the JCEKS keystore type as it allows for the storage of general secrets in addition to certificates.
keytool -genkey -alias {anything} -keystore __gateway-credentials.jceks \ -storepass {master-secret} -validity 360 -keysize 1024 -storetype JCEKS
Follow the prompts again for the DN for the cert of the credential store. This certificate isn't really used for anything at the moment but is required to create the credential store.
Once you have created these keystores you must move them into place for the gateway to discover them and use them to represent its identity for SSL connections. This is done by copying the keystores to the {GATEWAY_HOME}/conf/security/keystores
directory for your gateway install.
NOTE: the SSL certificate will need special consideration depending on the type of certificate. Wildcard certs may be able to be shared across all gateway instances in a cluster. When certs are dedicated to specific machines the gateway identity store will not be able to be blindly replicated as host name verification problems will ensue. Obviously, trust-stores will need to be taken into account as well.