<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="kerberos">
<title>Enabling Kerberos Authentication for Impala</title>
<prolog>
<metadata>
<data name="Category" value="Security"/>
<data name="Category" value="Kerberos"/>
<data name="Category" value="Authentication"/>
<data name="Category" value="Impala"/>
<data name="Category" value="Configuring"/>
<data name="Category" value="Starting and Stopping"/>
<data name="Category" value="Administrators"/>
</metadata>
</prolog>
<conbody>
<p>
Impala supports an enterprise-grade authentication system called Kerberos. Kerberos provides strong security benefits including
capabilities that render intercepted authentication packets unusable by an attacker. It virtually eliminates the threat of
impersonation by never sending a user's credentials in cleartext over the network. For more information on Kerberos, visit
the <xref href="https://web.mit.edu/kerberos/" scope="external" format="html">MIT Kerberos website</xref>.
</p>
<p>
The rest of this topic assumes you have a working <xref keyref="mit_install_kdc"/>
set up. To enable Kerberos, you first create a Kerberos principal for each host running
<cmdname>impalad</cmdname> or <cmdname>statestored</cmdname>.
</p>
<note conref="../shared/impala_common.xml#common/authentication_vs_authorization"/>
<p>
An alternative form of authentication you can use is LDAP, described in <xref href="impala_ldap.xml#ldap"/>.
</p>
<p outputclass="toc inpage"/>
</conbody>
<concept id="kerberos_prereqs">
<title>Requirements for Using Impala with Kerberos</title>
<prolog>
<metadata>
<data name="Category" value="Requirements"/>
<data name="Category" value="Planning"/>
</metadata>
</prolog>
<conbody>
<p conref="../shared/impala_common.xml#common/rhel5_kerberos"/>
<note type="important">
<p>
If you plan to use Impala in your cluster, you must configure your KDC to allow tickets to be renewed,
and you must configure <filepath>krb5.conf</filepath> to request renewable tickets. Typically, you can do
this by adding the <codeph>max_renewable_life</codeph> setting to your realm in
<filepath>kdc.conf</filepath>, and by adding the <codeph>renew_lifetime</codeph> parameter to the
<codeph>[libdefaults]</codeph> section of <filepath>krb5.conf</filepath>. For more information about
renewable tickets, see the
<xref href="http://web.mit.edu/Kerberos/krb5-1.8/" scope="external" format="html">Kerberos
documentation</xref>.
</p>
<p rev="1.2">
Currently, you cannot use the resource management feature on a cluster that has Kerberos
authentication enabled.
</p>
</note>
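<p>
As a minimal sketch of the settings named in the preceding note, the fragments below show one way the two
files might look. The realm name <codeph>TEST.EXAMPLE.COM</codeph> matches the examples elsewhere in this
topic. The 7-day and 24-hour lifetimes are illustrative assumptions, not recommended values; choose
lifetimes that fit your site policy.
</p>
<codeblock># kdc.conf (on the KDC host): upper bound on the renewable lifetime for the realm.
[realms]
    TEST.EXAMPLE.COM = {
        max_renewable_life = 7d
    }

# krb5.conf (on each Impala host): request renewable tickets by default.
[libdefaults]
    ticket_lifetime = 24h
    renew_lifetime = 7d</codeblock>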
<p>
Start all <cmdname>impalad</cmdname> and <cmdname>statestored</cmdname> daemons with the
<codeph>--principal</codeph> and <codeph>--keytab_file</codeph> flags, set to the Kerberos principal and
the full path of the <codeph>keytab</codeph> file that contains the credentials for that principal.
</p>
<p>
To enable Kerberos in the Impala shell, start the <cmdname>impala-shell</cmdname> command using the
<codeph>-k</codeph> flag.
</p>
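<p>
For instance, a session might look like the following sketch. The user principal
<codeph>alice@TEST.EXAMPLE.COM</codeph> is a hypothetical name, and the host name is borrowed from the
examples later in this topic. The <codeph>-i</codeph> option names the <cmdname>impalad</cmdname> host to
connect to.
</p>
<codeblock>$ kinit alice@TEST.EXAMPLE.COM
$ impala-shell -k -i impala_host.example.com</codeblock>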
<p>
To enable Impala to work with Kerberos security on your Hadoop cluster, make sure you perform the
installation and configuration steps in
<xref keyref="upstream_hadoop_authentication"/>.
Note that when Kerberos security is enabled in Impala, a web browser that
supports Kerberos HTTP SPNEGO is required to access the Impala web console (for example, Firefox, Internet
Explorer, or Chrome).
</p>
<p>
If the NameNode, Secondary NameNode, DataNode, JobTracker, TaskTrackers, ResourceManager, NodeManagers,
HttpFS, Oozie, Impala, or Impala statestore services are configured to use Kerberos HTTP SPNEGO
authentication, and two or more of these services are running on the same host, then all of those
services must use the same HTTP principal and keytab file for their HTTP endpoints.
</p>
</conbody>
</concept>
<concept id="kerberos_config">
<title>Configuring Impala to Support Kerberos Security</title>
<prolog>
<metadata>
<data name="Category" value="Configuring"/>
</metadata>
</prolog>
<conbody>
<p>
Enabling Kerberos authentication for Impala involves steps that can be summarized as follows:
</p>
<ul>
<li>
Creating service principals for Impala and the HTTP service. Principal names take the form:
<codeph><varname>serviceName</varname>/<varname>fully.qualified.domain.name</varname>@<varname>KERBEROS.REALM</varname></codeph>.
<p conref="../shared/impala_common.xml#common/user_kerberized"/>
</li>
<li>
Creating, merging, and distributing keytab files for these principals.
</li>
<li>
Editing <codeph>/etc/default/impala</codeph>
to accommodate Kerberos authentication.
</li>
</ul>
</conbody>
<concept id="kerberos_setup">
<title>Enabling Kerberos for Impala</title>
<conbody>
<!--
<p>
<b>To enable Kerberos for Impala:</b>
</p>
-->
<ol>
<li>
Create an Impala service principal, specifying the name of the OS user that the Impala daemons run
under, the fully qualified domain name of each node running <cmdname>impalad</cmdname>, and the realm
name. For example:
<codeblock>$ kadmin
kadmin: addprinc -requires_preauth -randkey impala/impala_host.example.com@TEST.EXAMPLE.COM</codeblock>
</li>
<li>
Create an HTTP service principal. For example:
<codeblock>kadmin: addprinc -randkey HTTP/impala_host.example.com@TEST.EXAMPLE.COM</codeblock>
<note>
The <codeph>HTTP</codeph> component of the service principal must be uppercase as shown in the
preceding example.
</note>
</li>
<li>
Create <codeph>keytab</codeph> files with both principals. For example:
<codeblock>kadmin: xst -k impala.keytab impala/impala_host.example.com
kadmin: xst -k http.keytab HTTP/impala_host.example.com
kadmin: quit</codeblock>
</li>
<li>
Use <codeph>ktutil</codeph> to read the contents of the two keytab files and then write those contents
to a new file. For example:
<codeblock>$ ktutil
ktutil: rkt impala.keytab
ktutil: rkt http.keytab
ktutil: wkt impala-http.keytab
ktutil: quit</codeblock>
</li>
<li>
(Optional) Test that credentials in the merged keytab file are valid, and that the <q>renew until</q>
date is in the future. For example:
<codeblock>$ klist -e -k -t impala-http.keytab</codeblock>
</li>
<li>
Copy the <filepath>impala-http.keytab</filepath> file to the Impala configuration directory. Make the
file read-only for its owner, and change the file owner to the <codeph>impala</codeph>
user. By default, the Impala user and group are both named <codeph>impala</codeph>. For example:
<codeblock>$ cp impala-http.keytab /etc/impala/conf
$ cd /etc/impala/conf
$ chmod 400 impala-http.keytab
$ chown impala:impala impala-http.keytab</codeblock>
</li>
<li>
Add Kerberos options to the Impala defaults file, <filepath>/etc/default/impala</filepath>. Add the
options for both the <cmdname>impalad</cmdname> and <cmdname>statestored</cmdname> daemons, using the
<codeph>IMPALA_SERVER_ARGS</codeph> and <codeph>IMPALA_STATE_STORE_ARGS</codeph> variables. For
example, you might add:
<!-- Found these in a discussion post somewhere but not applicable as Impala startup options.
-kerberos_ticket_life=36000
-maxrenewlife 7days
-->
<codeblock>-kerberos_reinit_interval=60
-principal=impala/impala_host.example.com@TEST.EXAMPLE.COM
-keytab_file=<varname>/path/to/impala.keytab</varname></codeblock>
<p>
For more information on changing the Impala defaults specified in
<filepath>/etc/default/impala</filepath>, see
<xref href="impala_config_options.xml#config_options">Modifying Impala Startup
Options</xref>.
</p>
</li>
</ol>
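<p>
The options from the last step might be combined in <filepath>/etc/default/impala</filepath> along the
following lines. This is a sketch, not a definitive layout: it assumes the defaults file is sourced as a
shell script, that it is placed after the existing definitions so the options already in each variable are
kept, and that the keytab is the merged file copied to <filepath>/etc/impala/conf</filepath> in the
earlier step.
</p>
<codeblock># Append the Kerberos options to whatever each variable already contains.
IMPALA_SERVER_ARGS="${IMPALA_SERVER_ARGS} \
    -kerberos_reinit_interval=60 \
    -principal=impala/impala_host.example.com@TEST.EXAMPLE.COM \
    -keytab_file=/etc/impala/conf/impala-http.keytab"

IMPALA_STATE_STORE_ARGS="${IMPALA_STATE_STORE_ARGS} \
    -kerberos_reinit_interval=60 \
    -principal=impala/impala_host.example.com@TEST.EXAMPLE.COM \
    -keytab_file=/etc/impala/conf/impala-http.keytab"</codeblock>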
<note>
Restart <cmdname>impalad</cmdname> and <cmdname>statestored</cmdname> for these configuration changes to
take effect.
</note>
</conbody>
</concept>
</concept>
<concept id="kerberos_proxy">
<title>Enabling Kerberos for Impala with a Proxy Server</title>
<conbody>
<p>
A common configuration for Impala with High Availability is to use a proxy server to submit requests to the
actual <cmdname>impalad</cmdname> daemons on different hosts in the cluster. This configuration avoids
connection problems in case of machine failure, because the proxy server can route new requests through one
of the remaining hosts in the cluster. This configuration also helps with load balancing, because the
additional overhead of being the <q>coordinator node</q> for each query is spread across multiple hosts.
</p>
<p>
Although you can set up a proxy server with or without Kerberos authentication, typically users set up a
secure Kerberized configuration. For information about setting up a proxy server for Impala, including
Kerberos-specific steps, see <xref href="impala_proxy.xml#proxy"/>.
</p>
</conbody>
</concept>
<concept id="spnego">
<title>Using a Web Browser to Access a URL Protected by Kerberos HTTP SPNEGO</title>
<conbody>
<p>
Your web browser must support Kerberos HTTP SPNEGO. Chrome, Firefox, and Internet Explorer all support it.
</p>
<p>
<b>To configure Firefox to access a URL protected by Kerberos HTTP SPNEGO:</b>
</p>
<ol>
<li>
Open the Firefox advanced configuration page by loading <codeph>about:config</codeph>.
</li>
<li>
Use the <b>Filter</b> text box to find <codeph>network.negotiate-auth.trusted-uris</codeph>.
</li>
<li>
Double-click the <codeph>network.negotiate-auth.trusted-uris</codeph> preference and enter the hostname
or the domain of the web server that is protected by Kerberos HTTP SPNEGO. Separate multiple domains and
hostnames with a comma.
</li>
<li>
Click <b>OK</b>.
</li>
</ol>
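<p>
The same preference can also be set without the configuration page. As a sketch, assuming you manage
preferences through a <filepath>user.js</filepath> file in the Firefox profile directory, a line like the
following sets the trusted URIs. The domain and host name shown are placeholder values.
</p>
<codeblock>// user.js in the Firefox profile directory
user_pref("network.negotiate-auth.trusted-uris", ".example.com,impala_host.example.com");</codeblock>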
</conbody>
</concept>
<concept id="kerberos_delegation">
<title>Enabling Impala Delegation for Kerberos Users</title>
<conbody>
<p>
See <xref href="impala_delegation.xml#delegation"/> for details about the delegation feature
that lets certain users submit queries using the credentials of other users.
</p>
</conbody>
</concept>
<concept id="ssl_jdbc_odbc">
<title>Using TLS/SSL with Business Intelligence Tools</title>
<conbody>
<p>
You can use Kerberos authentication, TLS/SSL encryption, or both to secure
connections from JDBC and ODBC applications to Impala.
See <xref href="impala_jdbc.xml#impala_jdbc"/> and <xref href="impala_odbc.xml#impala_odbc"/>
for details.
</p>
<p conref="../shared/impala_common.xml#common/hive_jdbc_ssl_kerberos_caveat"/>
</conbody>
</concept>
<concept id="whitelisting_internal_apis">
<title>Enabling Access to Internal Impala APIs for Kerberos Users</title>
<conbody>
<!-- Reusing (most of) the text from the New Features bullet here. Turn into a conref in both places. -->
<p rev="IMPALA-3095">
For applications that need direct access
to Impala APIs, without going through the HiveServer2 or Beeswax interfaces, you can
specify a list of Kerberos users who are allowed to call those APIs. By default, the
<codeph>impala</codeph> and <codeph>hdfs</codeph> users are the only ones authorized
for this kind of access.
Any users not explicitly authorized through the <codeph>internal_principals_whitelist</codeph>
configuration setting are blocked from accessing the APIs. This setting applies to all the
Impala-related daemons, although currently it is primarily used for HDFS to control the
behavior of the catalog server.
</p>
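<p>
For example, a startup option along the following lines would authorize one additional user. The flag name
comes from the setting described above; the comma-separated value format and the user name
<codeph>etl_service</codeph> are assumptions made for illustration.
</p>
<codeblock>-internal_principals_whitelist=hdfs,etl_service</codeblock>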
</conbody>
</concept>
<concept id="auth_to_local" rev="IMPALA-2660">
<title>Mapping Kerberos Principals to Short Names for Impala</title>
<conbody>
<p conref="../shared/impala_common.xml#common/auth_to_local_instructions"/>
</conbody>
</concept>
<concept rev="IMPALA-2294" id="kerberos_overhead_memory_usage">
<title>Kerberos-Related Memory Overhead for Large Clusters</title>
<conbody>
<p conref="../shared/impala_common.xml#common/vm_overcommit_memory_intro"/>
<p conref="../shared/impala_common.xml#common/vm_overcommit_memory_start" conrefend="vm_overcommit_memory_end"/>
</conbody>
</concept>
</concept>