<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept rev="1.1" id="security_guidelines">
<title>Security Guidelines for Impala</title>
<prolog>
<metadata>
<data name="Category" value="Security"/>
<data name="Category" value="Impala"/>
<data name="Category" value="Planning"/>
<data name="Category" value="Guidelines"/>
<data name="Category" value="Best Practices"/>
<data name="Category" value="Administrators"/>
</metadata>
</prolog>
<conbody>
<p>
The following are the major steps to harden a cluster running Impala against accidents, mistakes, and
malicious attackers trying to access sensitive data:
</p>
<ul>
<li>
<p>
Secure the <codeph>root</codeph> account. The <codeph>root</codeph> user can tamper with the
<cmdname>impalad</cmdname> daemon, read and write the data files in HDFS, log into other user accounts, and
access other system services that are beyond the control of Impala.
</p>
</li>
<li>
<p>
Restrict membership in the <codeph>sudoers</codeph> list (in the <filepath>/etc/sudoers</filepath> file).
The users who can run the <codeph>sudo</codeph> command can do many of the same things as the
<codeph>root</codeph> user.
</p>
</li>
<li>
<p>
Ensure the Hadoop ownership and permissions for Impala data files are restricted.
</p>
</li>
<li>
<p>
Ensure the Hadoop ownership and permissions for Impala log files are restricted.
</p>
</li>
<li>
<p>
Ensure that the Impala web UI (available by default on port 25000 on each Impala node) is
password-protected. See <xref href="impala_webui.xml#webui"/> for details.
</p>
</li>
<li>
<p>
Create a policy file that specifies which Impala privileges are available to users in particular Hadoop
groups (which by default map to Linux OS groups). Create the associated Linux groups using the
<cmdname>groupadd</cmdname> command if necessary.
</p>
</li>
<li>
<p>
The Impala authorization feature makes use of the HDFS file ownership and permissions mechanism; for
background information, see the
<xref href="https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html" scope="external" format="html">HDFS Permissions Guide</xref>.
Set up users and assign them to groups at the OS level, corresponding to the
different categories of users with different access levels for various databases, tables, and HDFS
locations (URIs). Create the associated Linux users using the <cmdname>useradd</cmdname> command if
necessary, and add them to the appropriate groups with the <cmdname>usermod</cmdname> command.
</p>
</li>
<li>
<p>
Design your databases, tables, and views so that the database and table structure lets the policy file use
simple, consistent rules. For example, if all tables related to an application are inside a single
database, you can assign privileges for that database and use the <codeph>*</codeph> wildcard for the table
name. If you are creating views with different privileges than the underlying base tables, you might put
the views in a separate database so that you can use the <codeph>*</codeph> wildcard for the database
containing the base tables, while specifying the precise names of the individual views. (When specifying
a database or table name, use either the exact name or <codeph>*</codeph>, which means all the databases
on a server or all the tables and views in a database.)
</p>
</li>
<li>
<p>
Enable authorization by running the <cmdname>impalad</cmdname> daemons with the <codeph>-server_name</codeph>
and <codeph>-authorization_policy_file</codeph> options on all nodes. (The authorization feature does not
apply to the <cmdname>statestored</cmdname> daemon, which has no access to schema objects or data files.)
</p>
</li>
<li>
<p>
Set up authentication using Kerberos to verify that users are who they claim to be.
</p>
</li>
</ul>
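<p>
The policy file and wildcard guidelines above can be illustrated with a minimal sketch. It assumes the
Sentry-style INI policy file format, with a <codeph>[groups]</codeph> section that maps groups to roles and
a <codeph>[roles]</codeph> section that maps roles to privileges. The group names, role names, database
names, view name, and the server name <codeph>server1</codeph> are all hypothetical; adapt them to your
own cluster.
</p>
<codeblock>[groups]
# Hypothetical Hadoop groups (by default, Linux OS groups) mapped to hypothetical roles.
analysts = sales_read
report_users = sales_view_read

[roles]
# All tables for one application sit in a single database, so the * wildcard covers them.
sales_read = server=server1-&gt;db=sales-&gt;table=*-&gt;action=SELECT
# Views with different privileges sit in a separate database and are named individually.
sales_view_read = server=server1-&gt;db=sales_views-&gt;table=customer_summary-&gt;action=SELECT
</codeblock>
<p>
In this sketch, <codeph>server1</codeph> is assumed to match the value passed to the
<codeph>-server_name</codeph> option, and the location of the file is assumed to be the value passed to
the <codeph>-authorization_policy_file</codeph> option when starting each <cmdname>impalad</cmdname> daemon.
</p>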
</conbody>
</concept>