<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept rev="1.1" id="authorization">
<title>Impala Authorization</title>
<prolog>
<metadata>
<data name="Category" value="Security"/>
<data name="Category" value="Sentry"/>
<data name="Category" value="Impala"/>
<data name="Category" value="Configuring"/>
<data name="Category" value="Starting and Stopping"/>
<data name="Category" value="Users"/>
<data name="Category" value="Groups"/>
<data name="Category" value="Administrators"/>
</metadata>
</prolog>
<conbody id="sentry">
<p>
Authorization determines which users are allowed to access which resources, and what
operations they are allowed to perform. You use Apache Sentry or Apache Ranger for
authorization. By default, when authorization is not enabled, Impala does all read and
write operations with the privileges of the <codeph>impala</codeph> user, which is
suitable for a development/test environment but not for a secure production environment.
When authorization is enabled, Impala uses the OS user ID of the user who runs
<cmdname>impala-shell</cmdname> or other client programs, and associates various
privileges with each user.
</p>
<p audience="PDF" outputclass="toc inpage">
See the following sections for details about using the Impala authorization features.
</p>
</conbody>
<concept id="sentry_priv_model">
<title>The Privilege Model</title>
<conbody>
<p>
Privileges can be granted on different objects in the schema. Any privilege that can be
granted is associated with a level in the object hierarchy. If a privilege is granted on
a parent object in the hierarchy, the child object automatically inherits it. This is
the same privilege model as Hive and other database systems.
</p>
<p>
The objects in the Impala schema hierarchy are:
</p>
<codeblock>Server
URI
Database
Table
Column
</codeblock>
<p rev="2.3.0 collevelauth">
The table-level privileges apply to views as well. Anywhere you specify a table name,
you can specify a view name instead.
</p>
<p rev="2.3.0 collevelauth">
In <keyword keyref="impala23_full"/> and higher, you can specify privileges for
individual columns.
</p>
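<p>
As a minimal sketch of a column-level grant, the following statement lets a role read only
two columns of a table. The table <codeph>db1.customers</codeph>, its columns, and the role
<codeph>analyst</codeph> are hypothetical names used for illustration, and the role is
assumed to exist already.
</p>
<codeblock>-- Hypothetical names: db1.customers, its columns, and the analyst role.
GRANT SELECT(customer_id, region) ON TABLE db1.customers TO ROLE analyst;
</codeblock>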
<p conref="../shared/impala_common.xml#common/privileges_objects"/>
<p>
Privileges are managed via the <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> SQL
statements, which require the Sentry or Ranger service to be enabled.
</p>
<p>
If you change privileges outside of Impala, for example by adding a user, removing a user,
or modifying privileges, you must clear the Impala Catalog server cache by running the
<codeph>REFRESH AUTHORIZATION</codeph> statement. <codeph>REFRESH AUTHORIZATION</codeph>
is not required if you change privileges within Impala.
</p>
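<p>
For example, after changing a policy directly in Sentry or Ranger, a statement such as the
following could be run from <cmdname>impala-shell</cmdname> to reload the authorization
metadata:
</p>
<codeblock>-- Reload privileges that were changed outside of Impala.
REFRESH AUTHORIZATION;
</codeblock>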
</conbody>
</concept>
<concept id="object_ownership">
<title>Object Ownership in Sentry</title>
<conbody>
<p>
Impala supports ownership of databases, tables, and views. The
<codeph>CREATE</codeph> statements implicitly make the user running the statement the
owner of the object. The owner has the <codeph>OWNER</codeph> privilege if object
ownership is enabled in Sentry. For example, if <varname>User A</varname> creates a database,
<varname>foo</varname>, via the <codeph>CREATE DATABASE</codeph> statement,
<varname>User A</varname> now owns the <varname>foo</varname> database and is authorized
to perform any operation on the <varname>foo</varname> database.
</p>
<p>
The <codeph>OWNER</codeph> privilege cannot be granted or revoked explicitly, whereas
the <codeph>ALL</codeph> privilege is explicitly granted via the <codeph>GRANT</codeph>
statement.
</p>
<p>
The object ownership feature is controlled by a Sentry configuration. The
<codeph>OWNER</codeph> privilege is only granted when the feature is enabled in Sentry.
When the feature is enabled, the owner receives the <codeph>OWNER</codeph> privilege
with or without the <codeph>GRANT OPTION</codeph>, which is also controlled by the
Sentry configuration.
</p>
<p>
Ownership can be transferred to another user or role via the <codeph>ALTER
DATABASE</codeph>, <codeph>ALTER TABLE</codeph>, or <codeph>ALTER VIEW</codeph> statement
with the <codeph>SET OWNER</codeph> clause.
</p>
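<p>
The following sketch shows the three forms of ownership transfer. The user
<codeph>user_b</codeph>, the role <codeph>etl_role</codeph>, and the table and view names
are hypothetical and are assumed to exist already.
</p>
<codeblock>-- Hypothetical user, role, and object names.
ALTER DATABASE foo SET OWNER USER user_b;
ALTER TABLE foo.t1 SET OWNER ROLE etl_role;
ALTER VIEW foo.v1 SET OWNER USER user_b;
</codeblock>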
</conbody>
</concept>
<concept id="concept_fgf_smj_bjb">
<title>Object Ownership in Ranger</title>
<conbody>
<p>
Object ownership for tables, views and databases is enabled by default in Impala.
</p>
<p>
To define owner-specific privileges, go to the Ranger UI and define appropriate policies
on the <codeph>{OWNER}</codeph> user.
</p>
<p>
The <codeph>CREATE</codeph> statements implicitly make the user running the statement
the owner of the object. For example, if <varname>User A</varname> creates a database,
<varname>foo</varname>, via the <codeph>CREATE DATABASE</codeph> statement,
<varname>User A</varname> now owns the <varname>foo</varname> database and is authorized
to perform any operation on the <varname>foo</varname> database.
</p>
<p>
Ownership can be transferred to another user or role via the <codeph>ALTER
DATABASE</codeph>, <codeph>ALTER TABLE</codeph>, or <codeph>ALTER VIEW</codeph> statement
with the <codeph>SET OWNER</codeph> clause.
</p>
<note id="impala-8937">
Currently, due to a known issue
(<xref
href="https://issues.apache.org/jira/browse/IMPALA-8937" format="html"
scope="external">IMPALA-8937</xref>),
until the ownership information is fully loaded in the coordinator catalog cache, the
owner of a table might not be able to see the table when executing the <codeph>SHOW
TABLES</codeph> statement. The owner can still query the table.
</note>
</conbody>
</concept>
<concept id="secure_startup">
<title>Starting Impala with Sentry Authorization Enabled</title>
<prolog>
<metadata>
<data name="Category" value="Starting and Stopping"/>
</metadata>
</prolog>
<conbody>
<p>
To enable authorization in an Impala cluster using Sentry:
<ol>
<li>
Add the following options to the <codeph>IMPALA_SERVER_ARGS</codeph> and the
<codeph>IMPALA_CATALOG_ARGS</codeph> settings in the
<filepath>/etc/default/impala</filepath> configuration file:
<ul>
<li>
<codeph>-server_name</codeph>: For all <cmdname>impalad</cmdname> nodes and the
<codeph>catalogd</codeph> in the cluster, specify the same name set in the
<codeph>sentry.hive.server</codeph> property in the
<filepath>sentry-site.xml</filepath> configuration file for Hive.
</li>
<li>
<codeph>-sentry_config</codeph>: Specifies the local path to the
<codeph>sentry-site.xml</codeph> configuration file.
</li>
</ul>
</li>
<li>
Restart the <codeph>catalogd</codeph> and all <cmdname>impalad</cmdname> daemons.
</li>
</ol>
</p>
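<p>
As an illustration only, the relevant part of <filepath>/etc/default/impala</filepath>
might look like the following. The server name <codeph>server1</codeph> and the path to
<filepath>sentry-site.xml</filepath> are assumptions. Substitute the values for your
cluster, and keep any options already present in these settings.
</p>
<codeblock>IMPALA_SERVER_ARGS=" \
    -server_name=server1 \
    -sentry_config=/etc/impala/conf/sentry-site.xml"
IMPALA_CATALOG_ARGS=" \
    -server_name=server1 \
    -sentry_config=/etc/impala/conf/sentry-site.xml"
</codeblock>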
</conbody>
</concept>
<concept id="enable_ranger_authz">
<title>Starting Impala with Ranger Authorization Enabled</title>
<conbody>
<p>
To enable authorization in an Impala cluster using Ranger:
</p>
<ol>
<li>
Add the following options to the <codeph>IMPALA_SERVER_ARGS</codeph> and the
<codeph>IMPALA_CATALOG_ARGS</codeph> settings in the
<filepath>/etc/default/impala</filepath> configuration file:
<ul>
<li>
<codeph>-server_name</codeph>: Specify the same name for all
<cmdname>impalad</cmdname> nodes and the <codeph>catalogd</codeph> in the cluster.
</li>
<li>
<codeph>-ranger_service_type=hive</codeph>
</li>
<li>
<codeph>-ranger_app_id</codeph>: Set it to the Ranger application ID.
</li>
<li>
<codeph>-authorization_provider=ranger</codeph>
</li>
</ul>
</li>
<li>
Restart the <codeph>catalogd</codeph> and all <cmdname>impalad</cmdname> daemons.
</li>
</ol>
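<p>
As an illustration only, the relevant part of <filepath>/etc/default/impala</filepath>
might look like the following. The server name <codeph>server1</codeph> and the application
ID <codeph>impala</codeph> are assumed values. Substitute the ones for your cluster, and
keep any options already present in these settings.
</p>
<codeblock>IMPALA_SERVER_ARGS=" \
    -server_name=server1 \
    -ranger_service_type=hive \
    -ranger_app_id=impala \
    -authorization_provider=ranger"
IMPALA_CATALOG_ARGS=" \
    -server_name=server1 \
    -ranger_service_type=hive \
    -ranger_app_id=impala \
    -authorization_provider=ranger"
</codeblock>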
</conbody>
</concept>
<concept id="sentry_service">
<title>Managing Privileges</title>
<conbody>
<p>
You set up privileges through the <codeph>GRANT</codeph> and <codeph>REVOKE</codeph>
statements in either Impala or Hive. Then both components use those same privileges
automatically.
</p>
<p>
For information about using the Impala <codeph>GRANT</codeph> and
<codeph>REVOKE</codeph> statements, see <xref
href="impala_grant.xml#grant"/>
and <xref
href="impala_revoke.xml#revoke"/>.
</p>
</conbody>
<concept id="changing_privileges">
<title>Changing Privileges from Outside of Impala</title>
<conbody>
<p>
If you make a change to privileges in Sentry or Ranger from outside of Impala, for
example by adding a user, removing a user, or modifying privileges, use one of the
following options to propagate the change:
</p>
<ul>
<li>
For Sentry, use the <codeph>catalogd</codeph> flag
<codeph>--sentry_catalog_polling_frequency_s</codeph> to specify how often to do a
Sentry refresh. The flag is set to 60 seconds by default.
</li>
<li>
Use the <codeph>ranger.plugin.hive.policy.pollIntervalMs</codeph> property to
specify how often to do a Ranger refresh. The property is specified in
<codeph>ranger-hive-security.xml</codeph> in the <codeph>conf</codeph> directory
under your Impala home directory.
</li>
<li>
Run the <codeph>INVALIDATE METADATA</codeph> or <codeph>REFRESH
AUTHORIZATION</codeph> statement to force a refresh.
</li>
</ul>
<p>
If you make a change to privileges within Impala, <codeph>INVALIDATE METADATA</codeph>
is not required.
</p>
<note type="warning">
As <codeph>INVALIDATE METADATA</codeph> is an expensive operation, you should use it
judiciously.
</note>
</conbody>
</concept>
<concept id="granting_on_uri">
<title>Granting Privileges on URI</title>
<conbody>
<p>
URIs represent the file paths you specify as part of statements such as <codeph>CREATE
EXTERNAL TABLE</codeph> and <codeph>LOAD DATA</codeph>. Typically, you specify what
look like UNIX paths, but these locations can also be prefixed with
<codeph>hdfs://</codeph> to make clear that they are really URIs. To set privileges
for a URI, specify the name of a directory, and the privilege applies to all the files
in that directory and any directories underneath it.
</p>
<p>
URIs must start with <codeph>hdfs://</codeph>, <codeph>s3a://</codeph>,
<codeph>adl://</codeph>, or <codeph>file://</codeph>. If a URI starts with an absolute
path, the path will be appended to the default filesystem prefix. For example, if you
specify:
<codeblock>
GRANT ALL ON URI '/tmp';
</codeblock>
The above statement effectively becomes the following where the default filesystem is
HDFS.
<codeblock>
GRANT ALL ON URI 'hdfs://localhost:20500/tmp';
</codeblock>
</p>
<p>
When defining URIs for HDFS, you must also specify the NameNode. For example:
<codeblock>GRANT ALL ON URI 'file:///path/to/dir' TO ROLE &lt;role>;
GRANT ALL ON URI 'hdfs://namenode:port/path/to/dir' TO ROLE &lt;role>;</codeblock>
<note type="warning">
Because the NameNode host and port must be specified, it is strongly recommended
that you use High Availability (HA). This ensures that the URI will remain constant
even if the NameNode changes. For example:
<codeblock>GRANT ALL ON URI 'hdfs://ha-nn-uri/path/to/dir' TO ROLE &lt;role>;</codeblock>
</note>
</p>
</conbody>
</concept>
<concept id="concept_k45_lbm_f2b">
<title>Examples of Setting up Authorization for Security Scenarios</title>
<conbody>
<p>
The following examples show how to set up authorization to deal with various
scenarios.
</p>
<example>
<title>A User with No Privileges</title>
<p>
If a user has no privileges at all, that user cannot access any schema objects in
the system. The error messages do not disclose the names or existence of objects
that the user is not authorized to read.
</p>
<p>
This is the experience you want a user to have if they somehow log into a system
where they are not an authorized Impala user. Or in a real deployment, a user might
have no privileges because they are not a member of any of the authorized groups.
</p>
</example>
<example>
<title>Examples of Privileges for Administrative Users</title>
<p>
In this example, the SQL statements grant the <codeph>entire_server</codeph> role
all privileges on both the databases and URIs within the server.
</p>
<codeblock>CREATE ROLE entire_server;
GRANT ROLE entire_server TO GROUP admin_group;
GRANT ALL ON SERVER server1 TO ROLE entire_server;
</codeblock>
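<p>
To check the result, statements like the following can list the roles assigned to the
group and the privileges held by the role. This is a sketch that assumes role-based
authorization is in use and that the current user is allowed to view this information.
</p>
<codeblock>-- List the roles granted to the group, then the privileges of the role.
SHOW ROLE GRANT GROUP admin_group;
SHOW GRANT ROLE entire_server;
</codeblock>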
</example>
<example>
<title>A User with Privileges for Specific Databases and Tables</title>
<p>
If a user has privileges for specific tables in specific databases, the user can
access those objects but nothing else. They can see the tables and their parent
databases in the output of <codeph>SHOW TABLES</codeph> and <codeph>SHOW
DATABASES</codeph>, <codeph>USE</codeph> the appropriate databases, and perform the
relevant actions (<codeph>SELECT</codeph> and/or <codeph>INSERT</codeph>) based on
the table privileges. To actually create a table requires the <codeph>ALL</codeph>
privilege at the database level, so you might define separate roles for the user
that sets up a schema and other users or applications that perform day-to-day
operations on the tables.
</p>
<codeblock>
CREATE ROLE one_database;
GRANT ROLE one_database TO GROUP admin_group;
GRANT ALL ON DATABASE db1 TO ROLE one_database;
CREATE ROLE instructor;
GRANT ROLE instructor TO GROUP trainers;
GRANT ALL ON TABLE db1.lesson TO ROLE instructor;
-- This particular course is all about queries, so the students can SELECT but not INSERT or CREATE/DROP.
CREATE ROLE student;
GRANT ROLE student TO GROUP visitors;
GRANT SELECT ON TABLE db1.training TO ROLE student;</codeblock>
</example>
<example>
<title>Privileges for Working with External Data Files</title>
<p>
When data is being inserted through the <codeph>LOAD DATA</codeph> statement or is
referenced from an HDFS location outside the normal Impala database directories, the
user also needs appropriate permissions on the URIs corresponding to those HDFS
locations.
</p>
<p>
In this example:
</p>
<ul>
<li>
The <codeph>external_table</codeph> role can insert into and query the Impala
table, <codeph>external_table.sample</codeph>.
</li>
<li>
The <codeph>staging_dir</codeph> role can specify the HDFS path
<filepath>/user/impala-user/external_data</filepath> with the <codeph>LOAD
DATA</codeph> statement. When Impala queries or loads data files, it operates on
all the files in that directory, not just a single file, so any Impala
<codeph>LOCATION</codeph> parameters refer to a directory rather than an
individual file.
</li>
</ul>
<codeblock>CREATE ROLE external_table;
GRANT ROLE external_table TO GROUP impala_users;
GRANT ALL ON TABLE external_table.sample TO ROLE external_table;
CREATE ROLE staging_dir;
GRANT ROLE staging_dir TO GROUP impala_users;
GRANT ALL ON URI 'hdfs://127.0.0.1:8020/user/impala-user/external_data' TO ROLE staging_dir;</codeblock>
</example>
<example>
<title>Separating Administrator Responsibility from Read and Write Privileges</title>
<p>
Creating a database requires the <codeph>ALL</codeph> privilege on that database, whereas
day-to-day operations on tables within that database can be performed with lower levels
of privilege on a specific table. Thus, you might set up separate roles for each
database or application: an administrative one that can create or drop the
database, and a user-level one that can access only the relevant tables.
</p>
<p>
In this example, the responsibilities are divided between users in 3 different
groups:
</p>
<ul>
<li>
Members of the <codeph>supergroup</codeph> group have the
<codeph>training_sysadmin</codeph> role and so can set up a database named
<codeph>training1</codeph>.
</li>
<li>
Members of the <codeph>impala_users</codeph> group have the
<codeph>instructor</codeph> role and so have all privileges on the
<codeph>training1.course1</codeph> table, such as inserting into and querying it,
but cannot create or drop the database itself.
</li>
<li>
Members of the <codeph>visitor</codeph> group have the <codeph>student</codeph>
role and so can query the <codeph>training1.course1</codeph> table.
</li>
</ul>
<codeblock>CREATE ROLE training_sysadmin;
GRANT ROLE training_sysadmin TO GROUP supergroup;
GRANT ALL ON DATABASE training1 TO ROLE training_sysadmin;
CREATE ROLE instructor;
GRANT ROLE instructor TO GROUP impala_users;
GRANT ALL ON TABLE training1.course1 TO ROLE instructor;
CREATE ROLE student;
GRANT ROLE student TO GROUP visitor;
GRANT SELECT ON TABLE training1.course1 TO ROLE student;</codeblock>
</example>
</conbody>
</concept>
</concept>
<concept id="security_schema">
<title>Setting Up Schema Objects for a Secure Impala Deployment</title>
<conbody>
<p>
In your role definitions, you must specify privileges at the level of individual
databases and tables, or all databases or all tables within a database. To simplify the
structure of these rules, plan ahead of time how to name your schema objects so that
data with different authorization requirements are divided into separate databases.
</p>
<p>
If you are adding security on top of an existing Impala deployment, you can rename
tables or even move them between databases using the <codeph>ALTER TABLE</codeph>
statement.
</p>
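<p>
For instance, a table created in the <codeph>DEFAULT</codeph> database could be moved into
a dedicated database as sketched below. The names <codeph>finance</codeph> and
<codeph>payroll</codeph> are hypothetical.
</p>
<codeblock>-- Hypothetical database and table names.
CREATE DATABASE IF NOT EXISTS finance;
ALTER TABLE default.payroll RENAME TO finance.payroll;
</codeblock>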
</conbody>
</concept>
<concept id="sec_ex_default">
<title>The DEFAULT Database in a Secure Deployment</title>
<conbody>
<p>
Because of the extra emphasis on granular access controls in a secure deployment, you
should move any important or sensitive information out of the <codeph>DEFAULT</codeph>
database into a named database. Sometimes you might need to give privileges on the
<codeph>DEFAULT</codeph> database for administrative reasons, for example, as a place
you can reliably specify with a <codeph>USE</codeph> statement when preparing to drop a
database.
</p>
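<p>
A sketch of that pattern, using a hypothetical database name <codeph>old_db</codeph> that
is assumed to be empty:
</p>
<codeblock>-- Switch to the DEFAULT database first, then drop the other database.
USE default;
DROP DATABASE old_db;
</codeblock>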
</conbody>
</concept>
</concept>