<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
## Apache Knox Details ##
This section provides everything you need to know to get the Knox gateway up and running against a Hadoop cluster.
#### Hadoop ####
An existing Hadoop 2.x or 3.x cluster is required for Knox to sit in front of and protect.
It is possible to use a Hadoop cluster deployed on EC2, but this requires additional configuration not covered here.
It is also possible to protect access to the services of a Hadoop cluster that is secured with Kerberos.
This too requires additional configuration, which is described in other sections of this guide.
See #[Supported Services] for details on what is supported for this release.
The instructions that follow assume a few things:
1. The gateway is *not* collocated with the Hadoop clusters themselves.
2. The host names and IP addresses of the cluster services are accessible by the gateway wherever it happens to be running.
All of the instructions and samples provided here are tailored and tested to work "out of the box" against a [Hortonworks Sandbox 2.x VM][sandbox].
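
The second assumption above can be checked from the gateway host before Knox is started. The following is a minimal sketch and not part of Knox: it attempts a plain TCP connection to each cluster service endpoint and reports the ones that cannot be reached. The function names are illustrative, and any host names and ports you pass in are your own cluster's values; none are defined by this guide.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only proves that the name resolves and the port accepts connections
    from this machine; it says nothing about the service behind the port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def unreachable(endpoints, timeout=3.0):
    """Return the (host, port) pairs from endpoints that cannot be reached."""
    return [(h, p) for (h, p) in endpoints if not is_reachable(h, p, timeout)]
```

Run on the machine that will host the gateway, a call such as `unreachable([("namenode.example.com", 50070)])` (a hypothetical host and port) returns an empty list when every listed endpoint accepts connections.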
#### Apache Knox Directory Layout ####
Knox can be installed by expanding the zip/archive file.
The table below briefly explains the important files and directories within `{GATEWAY_HOME}`.
| Directory | Purpose |
| ------------------------ | ------- |
| conf/ | Contains configuration files that apply to the gateway globally (i.e. not cluster specific). |
| data/ | Contains security and topology specific artifacts that require read/write access at runtime. |
| conf/topologies/ | Contains topology files that represent Hadoop clusters which the gateway uses to deploy cluster proxies. |
| data/security/ | Contains the persisted master secret and the keystore directory. |
| data/security/keystores/ | Contains the gateway identity keystore and credential stores for the gateway and each deployed cluster topology. |
| data/services/ | Contains service behavior definitions for the services currently supported. |
| bin/ | Contains the executable shell scripts, batch files and JARs for clients and servers. |
| data/deployments/ | Contains deployed cluster topologies used to protect access to specific Hadoop clusters. |
| lib/ | Contains the JARs for all the components that make up the gateway. |
| dep/ | Contains the JARs for all of the components upon which the gateway depends. |
| ext/ | A directory where user supplied extension JARs can be placed to extend the gateway's functionality. |
| pids/ | Contains the process IDs for running LDAP and gateway servers. |
| samples/ | Contains a number of samples that can be used to explore the functionality of the gateway. |
| templates/ | Contains default configuration files that can be copied and customized. |
| README | Provides basic information about the Apache Knox Gateway. |
| ISSUES | Describes significant known issues. |
| CHANGES | Enumerates the changes between releases. |
| LICENSE | Documents the license under which this software is provided. |
| NOTICE | Documents required attribution notices for included dependencies. |
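
After expanding the archive, it can be useful to confirm that the layout matches the table above. The sketch below is a hypothetical helper, not something shipped with Knox: it takes the path you expanded the archive into and lists any of the directories from the table that are absent. It assumes that a missing directory is worth reporting; some directories (for example `pids/` or those under `data/`) may only appear after the gateway has been set up or started, so treat the result as a hint rather than an error.

```python
from pathlib import Path

# Directories listed in the table above, relative to {GATEWAY_HOME}.
EXPECTED_DIRS = [
    "conf",
    "conf/topologies",
    "data",
    "data/security",
    "data/security/keystores",
    "data/services",
    "data/deployments",
    "bin",
    "lib",
    "dep",
    "ext",
    "pids",
    "samples",
    "templates",
]


def missing_dirs(gateway_home):
    """Return the expected directories that do not exist under gateway_home."""
    home = Path(gateway_home)
    return [d for d in EXPECTED_DIRS if not (home / d).is_dir()]
```

Calling `missing_dirs("/path/to/knox")` (substitute your own `{GATEWAY_HOME}`) returns an empty list when every directory in the table is present.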
### Supported Services ###
This table enumerates the versions of various Hadoop services that have been tested to work with the Knox Gateway.
| Service | Version | Non-Secure | Secure | HA |
| -----------------------|-------------|-------------|--------|----|
| WebHDFS | 2.4.0 | ![y] | ![y] |![y]|
| WebHCat/Templeton | 0.13.0 | ![y] | ![y] |![y]|
| Oozie | 4.0.0 | ![y] | ![y] |![y]|
| HBase | 0.98.0 | ![y] | ![y] |![y]|
| Hive (via WebHCat) | 0.13.0 | ![y] | ![y] |![y]|
| Hive (via JDBC/ODBC) | 0.13.0 | ![y] | ![y] |![y]|
| Yarn ResourceManager | 2.5.0 | ![y] | ![y] |![n]|
| Kafka (via REST Proxy) | 0.10.0 | ![y] | ![y] |![y]|
| Storm | 0.9.3 | ![y] | ![n] |![n]|
| Solr | 5.5+ and 6+ | ![y] | ![y] |![y]|
### More Examples ###
These examples provide more detail about how to access various Apache Hadoop services via the Apache Knox Gateway.
* [WebHDFS Examples](service_webhdfs.md)
* [WebHCat Examples](service_webhcat.md)
* [Oozie Examples](service_oozie.md)
* [HBase Examples](service_hbase.md)
* [Hive Examples](service_hive.md)
* [Yarn Examples](service_yarn.md)
* [Storm Examples](service_storm.md)