| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
| <concept rev="1.1" id="authorization"> |
| |
| <title>Impala Authorization</title> |
| |
| <prolog> |
| <metadata> |
| <data name="Category" value="Security"/> |
| <data name="Category" value="Sentry"/> |
| <data name="Category" value="Impala"/> |
| <data name="Category" value="Configuring"/> |
| <data name="Category" value="Starting and Stopping"/> |
| <data name="Category" value="Users"/> |
| <data name="Category" value="Groups"/> |
| <data name="Category" value="Administrators"/> |
| </metadata> |
| </prolog> |
| |
| <conbody id="sentry"> |
| |
| <p> |
| Authorization determines which users are allowed to access which resources, and what |
| operations they are allowed to perform. You use Apache Sentry or Apache Ranger for |
| authorization. By default, when authorization is not enabled, Impala does all read and |
| write operations with the privileges of the <codeph>impala</codeph> user, which is |
| suitable for a development/test environment but not for a secure production environment. |
| When authorization is enabled, Impala uses the OS user ID of the user who runs |
| <cmdname>impala-shell</cmdname> or other client programs, and associates various |
| privileges with each user. |
| </p> |
| |
| <p audience="PDF" outputclass="toc inpage"> |
| See the following sections for details about using the Impala authorization features. |
| </p> |
| |
| </conbody> |
| |
| <concept id="sentry_priv_model"> |
| |
| <title>The Privilege Model</title> |
| |
| <conbody> |
| |
| <p> |
| Privileges can be granted on different objects in the schema. Any privilege that can be |
| granted is associated with a level in the object hierarchy. If a privilege is granted on |
| a parent object in the hierarchy, the child object automatically inherits it. This is |
| the same privilege model as Hive and other database systems. |
| </p> |
| |
| <p> |
| The objects in the Impala schema hierarchy are: |
| </p> |
| |
| <codeblock>Server |
| URI |
| Database |
| Table |
| Column |
| </codeblock> |
| |
| <p rev="2.3.0 collevelauth"> |
| The table-level privileges apply to views as well. Anywhere you specify a table name, |
| you can specify a view name instead. |
| </p> |
| |
| <p rev="2.3.0 collevelauth"> |
| In <keyword keyref="impala23_full"/> and higher, you can specify privileges for |
| individual columns. |
| </p> |
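| |
| <p> |
| As a minimal sketch of a column-level grant, assume a hypothetical role |
| <codeph>analyst</codeph>, a hypothetical group <codeph>analysts</codeph>, and a |
| hypothetical table <codeph>db1.customers</codeph> with a <codeph>city</codeph> column. |
| None of these names come from an actual deployment: |
| </p> |
| |
| <codeblock>-- All role, group, table, and column names here are placeholders. |
| CREATE ROLE analyst; |
| GRANT ROLE analyst TO GROUP analysts; |
| -- Members of the role can read only the city column of the table. |
| GRANT SELECT(city) ON TABLE db1.customers TO ROLE analyst; |
| </codeblock> |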
| |
| <p conref="../shared/impala_common.xml#common/privileges_objects"/> |
| |
| <p> |
| Privileges are managed via the <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> SQL |
| statements, which require the Sentry or Ranger service to be enabled. |
| </p> |
| |
| <p> |
| If you change privileges outside of Impala, for example, by adding a user, removing a |
| user, or modifying privileges, you must clear the Impala Catalog server cache by running the |
| <codeph>REFRESH AUTHORIZATION</codeph> statement. <codeph>REFRESH AUTHORIZATION</codeph> |
| is not required if you make the changes to privileges within Impala. |
| </p> |
| |
| </conbody> |
| |
| </concept> |
| |
| <concept id="object_ownership"> |
| |
| <title>Object Ownership in Sentry</title> |
| |
| <conbody> |
| |
| <p> |
| Impala supports ownership of databases, tables, and views. The |
| <codeph>CREATE</codeph> statements implicitly make the user running the statement the |
| owner of the object. An owner has the <codeph>OWNER</codeph> privilege if enabled in |
| Sentry. For example, if <varname>User A</varname> creates a database, |
| <varname>foo</varname>, via the <codeph>CREATE DATABASE</codeph> statement, |
| <varname>User A</varname> now owns the <varname>foo</varname> database and is authorized |
| to perform any operation on the <varname>foo</varname> database. |
| </p> |
| |
| <p> |
| The <codeph>OWNER</codeph> privilege is not a grantable or revocable privilege, whereas |
| the <codeph>ALL</codeph> privilege is explicitly granted via the <codeph>GRANT</codeph> |
| statement. |
| </p> |
| |
| <p> |
| The object ownership feature is controlled by a Sentry configuration. The |
| <codeph>OWNER</codeph> privilege is only granted when the feature is enabled in Sentry. |
| When the feature is enabled, the object owner gets the <codeph>OWNER</codeph> privilege, |
| with or without the <codeph>GRANT OPTION</codeph>, which is also controlled by the Sentry |
| configuration. |
| </p> |
| |
| <p> |
| Ownership can be transferred to another user or role via the <codeph>ALTER |
| DATABASE</codeph>, <codeph>ALTER TABLE</codeph>, or <codeph>ALTER VIEW</codeph> statement |
| with the <codeph>SET OWNER</codeph> clause. |
| </p> |
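| |
| <p> |
| For illustration, the following sketch transfers ownership of a database, a table, and |
| a view. The user <codeph>user_b</codeph>, the role <codeph>foo_admins</codeph>, and the |
| object names <codeph>foo.t1</codeph> and <codeph>foo.v1</codeph> are hypothetical: |
| </p> |
| |
| <codeblock>-- user_b, foo_admins, foo.t1, and foo.v1 are placeholder names. |
| ALTER DATABASE foo SET OWNER USER user_b; |
| ALTER TABLE foo.t1 SET OWNER ROLE foo_admins; |
| ALTER VIEW foo.v1 SET OWNER USER user_b; |
| </codeblock> |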
| |
| </conbody> |
| |
| </concept> |
| |
| <concept id="concept_fgf_smj_bjb"> |
| |
| <title>Object Ownership in Ranger</title> |
| |
| <conbody> |
| |
| <p> |
| Object ownership for tables, views and databases is enabled by default in Impala. |
| </p> |
| |
| <p> |
| To define owner-specific privileges, go to the Ranger UI and define appropriate policies on |
| the <codeph>{OWNER}</codeph> user. |
| </p> |
| |
| <p> |
| The <codeph>CREATE</codeph> statements implicitly make the user running the statement |
| the owner of the object. For example, if <varname>User A</varname> creates a database, |
| <varname>foo</varname>, via the <codeph>CREATE DATABASE</codeph> statement, |
| <varname>User A</varname> now owns the <varname>foo</varname> database and is authorized |
| to perform any operation on the <varname>foo</varname> database. |
| </p> |
| |
| <p> |
| Ownership can be transferred to another user or role via the <codeph>ALTER |
| DATABASE</codeph>, <codeph>ALTER TABLE</codeph>, or <codeph>ALTER VIEW</codeph> statement |
| with the <codeph>SET OWNER</codeph> clause. |
| </p> |
| |
| <note id="impala-8937"> |
| Currently, due to a known issue |
| (<xref |
| href="https://issues.apache.org/jira/browse/IMPALA-8937" format="html" |
| scope="external">IMPALA-8937</xref>), |
| until the ownership information is fully loaded in the coordinator catalog cache, the |
| owner of a table might not be able to see the table when executing the <codeph>SHOW |
| TABLES</codeph> statement. The owner can still query the table. |
| </note> |
| |
| </conbody> |
| |
| </concept> |
| |
| <concept id="secure_startup"> |
| |
| <title>Starting Impala with Sentry Authorization Enabled</title> |
| |
| <prolog> |
| <metadata> |
| <data name="Category" value="Starting and Stopping"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <p> |
| To enable authorization in an Impala cluster using Sentry: |
| <ol> |
| <li> |
| Add the following options to the <codeph>IMPALA_SERVER_ARGS</codeph> and the |
| <codeph>IMPALA_CATALOG_ARGS</codeph> settings in the |
| <filepath>/etc/default/impala</filepath> configuration file: |
| <ul> |
| <li> |
| <codeph>-server_name</codeph>: For all <cmdname>impalad</cmdname> nodes and the |
| <codeph>catalogd</codeph> in the cluster, specify the same name set in the |
| <codeph>sentry.hive.server</codeph> property in the |
| <filepath>sentry-site.xml</filepath> configuration file for Hive. |
| </li> |
| |
| <li> |
| <codeph>-sentry_config</codeph>: Specifies the local path to the |
| <codeph>sentry-site.xml</codeph> configuration file. |
| </li> |
| </ul> |
| </li> |
| |
| <li> |
| Restart the <codeph>catalogd</codeph> and all <cmdname>impalad</cmdname> daemons. |
| </li> |
| </ol> |
| </p> |
| |
| </conbody> |
| |
| </concept> |
| |
| <concept id="enable_ranger_authz"> |
| |
| <title>Starting Impala with Ranger Authorization Enabled</title> |
| |
| <conbody> |
| |
| <p> |
| To enable authorization in an Impala cluster using Ranger: |
| </p> |
| |
| <ol> |
| <li> |
| Add the following options to the <codeph>IMPALA_SERVER_ARGS</codeph> and the |
| <codeph>IMPALA_CATALOG_ARGS</codeph> settings in the |
| <filepath>/etc/default/impala</filepath> configuration file: |
| <ul> |
| <li> |
| <codeph>-server_name</codeph>: Specify the same name for all |
| <cmdname>impalad</cmdname> nodes and the <codeph>catalogd</codeph> in the cluster. |
| </li> |
| |
| <li> |
| <codeph>-ranger_service_type=hive</codeph> |
| </li> |
| |
| <li> |
| <codeph>-ranger_app_id</codeph>: Set it to the Ranger application ID. |
| </li> |
| |
| <li> |
| <codeph>-authorization_provider=ranger</codeph> |
| </li> |
| </ul> |
| </li> |
| |
| <li> |
| Restart the <codeph>catalogd</codeph> and all <cmdname>impalad</cmdname> daemons. |
| </li> |
| </ol> |
| |
| </conbody> |
| |
| </concept> |
| |
| <concept id="sentry_service"> |
| |
| <title>Managing Privileges</title> |
| |
| <conbody> |
| |
| <p> |
| You set up privileges through the <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> |
| statements in either Impala or Hive. Then both components use those same privileges |
| automatically. |
| </p> |
| |
| <p> |
| For information about using the Impala <codeph>GRANT</codeph> and |
| <codeph>REVOKE</codeph> statements, see <xref |
| href="impala_grant.xml#grant"/> |
| and <xref |
| href="impala_revoke.xml#revoke"/>. |
| </p> |
| |
| </conbody> |
| |
| <concept id="changing_privileges"> |
| |
| <title>Changing Privileges from Outside of Impala</title> |
| |
| <conbody> |
| |
| <p> |
| If you make a change to privileges in Sentry or Ranger from outside of Impala, for |
| example, by adding a user, removing a user, or modifying privileges, use one of the |
| following options to propagate the change: |
| </p> |
| |
| <ul> |
| <li> |
| Use the <codeph>catalogd</codeph> flag, |
| <codeph>--sentry_catalog_polling_frequency_s</codeph> to specify how often to do a |
| Sentry refresh. The flag is set to 60 seconds by default. |
| </li> |
| |
| <li> |
| Use the <codeph>ranger.plugin.hive.policy.pollIntervalMs</codeph> property to |
| specify how often to do a Ranger refresh. The property is specified in |
| <codeph>ranger-hive-security.xml</codeph> in the <codeph>conf</codeph> directory |
| under your Impala home directory. |
| </li> |
| |
| <li> |
| Run the <codeph>INVALIDATE METADATA</codeph> or <codeph>REFRESH |
| AUTHORIZATION</codeph> statement to force a refresh. |
| </li> |
| </ul> |
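| |
| <p> |
| For example, after changing a policy outside of Impala, a forced refresh issued from |
| <cmdname>impala-shell</cmdname> might look like the following sketch: |
| </p> |
| |
| <codeblock>-- Refresh the cached authorization data after an external change. |
| REFRESH AUTHORIZATION; |
| </codeblock> |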
| |
| <p> |
| If you make a change to privileges within Impala, <codeph>INVALIDATE METADATA</codeph> |
| is not required. |
| </p> |
| |
| <note type="warning"> |
| As <codeph>INVALIDATE METADATA</codeph> is an expensive operation, you should use it |
| judiciously. |
| </note> |
| |
| </conbody> |
| |
| </concept> |
| |
| <concept id="granting_on_uri"> |
| |
| <title>Granting Privileges on URI</title> |
| |
| <conbody> |
| |
| <p> |
| URIs represent the file paths you specify as part of statements such as <codeph>CREATE |
| EXTERNAL TABLE</codeph> and <codeph>LOAD DATA</codeph>. Typically, you specify what |
| look like UNIX paths, but these locations can also be prefixed with |
| <codeph>hdfs://</codeph> to make clear that they are really URIs. To set privileges |
| for a URI, specify the name of a directory, and the privilege applies to all the files |
| in that directory and any directories underneath it. |
| </p> |
| |
| <p> |
| URIs must start with <codeph>hdfs://</codeph>, <codeph>s3a://</codeph>, |
| <codeph>adl://</codeph>, or <codeph>file://</codeph>. If a URI starts with an absolute |
| path, the path will be appended to the default filesystem prefix. For example, if you |
| specify: |
| <codeblock> |
| GRANT ALL ON URI '/tmp'; |
| </codeblock> |
| The above statement effectively becomes the following where the default filesystem is |
| HDFS. |
| <codeblock> |
| GRANT ALL ON URI 'hdfs://localhost:20500/tmp'; |
| </codeblock> |
| </p> |
| |
| <p> |
| When defining URIs for HDFS, you must also specify the NameNode. For example: |
| <codeblock>GRANT ALL ON URI file:///path/to/dir TO &lt;role&gt; |
| GRANT ALL ON URI hdfs://namenode:port/path/to/dir TO &lt;role&gt;</codeblock> |
| <note type="warning"> |
| Because the NameNode host and port must be specified, it is strongly recommended |
| that you use High Availability (HA). This ensures that the URI will remain constant |
| even if the NameNode changes. For example: |
| <codeblock>GRANT ALL ON URI hdfs://ha-nn-uri/path/to/dir TO &lt;role&gt;</codeblock> |
| </note> |
| </p> |
| |
| </conbody> |
| |
| </concept> |
| |
| <concept id="concept_k45_lbm_f2b"> |
| |
| <title>Examples of Setting up Authorization for Security Scenarios</title> |
| |
| <conbody> |
| |
| <p> |
| The following examples show how to set up authorization to deal with various |
| scenarios. |
| </p> |
| |
| <example> |
| |
| <title>A User with No Privileges</title> |
| |
| <p> |
| If a user has no privileges at all, that user cannot access any schema objects in |
| the system. The error messages do not disclose the names or existence of objects |
| that the user is not authorized to read. |
| </p> |
| |
| <p> |
| This is the experience you want a user to have if they somehow log into a system |
| where they are not an authorized Impala user. Or in a real deployment, a user might |
| have no privileges because they are not a member of any of the authorized groups. |
| </p> |
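| |
| <p> |
| As a rough sketch, and assuming role-based authorization is in use, a user could check |
| which roles apply to the current session before concluding that no privileges have been |
| granted. An empty result suggests the user is not a member of any authorized group: |
| </p> |
| |
| <codeblock>-- Lists the roles assigned to the user running the statement. |
| SHOW CURRENT ROLES; |
| </codeblock> |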
| |
| </example> |
| |
| <example> |
| |
| <title>Examples of Privileges for Administrative Users</title> |
| |
| <p> |
| In this example, the SQL statements grant the <codeph>entire_server</codeph> role |
| all privileges on both the databases and URIs within the server. |
| </p> |
| |
| <codeblock>CREATE ROLE entire_server; |
| GRANT ROLE entire_server TO GROUP admin_group; |
| GRANT ALL ON SERVER server1 TO ROLE entire_server; |
| </codeblock> |
| |
| </example> |
| |
| <example> |
| |
| <title>A User with Privileges for Specific Databases and Tables</title> |
| |
| <p> |
| If a user has privileges for specific tables in specific databases, the user can |
| access those objects but nothing else. They can see the tables and their parent |
| databases in the output of <codeph>SHOW TABLES</codeph> and <codeph>SHOW |
| DATABASES</codeph>, <codeph>USE</codeph> the appropriate databases, and perform the |
| relevant actions (<codeph>SELECT</codeph> and/or <codeph>INSERT</codeph>) based on |
| the table privileges. To actually create a table requires the <codeph>ALL</codeph> |
| privilege at the database level, so you might define separate roles for the user |
| that sets up a schema and other users or applications that perform day-to-day |
| operations on the tables. |
| </p> |
| |
| <codeblock> |
| CREATE ROLE one_database; |
| GRANT ROLE one_database TO GROUP admin_group; |
| GRANT ALL ON DATABASE db1 TO ROLE one_database; |
| |
| CREATE ROLE instructor; |
| GRANT ROLE instructor TO GROUP trainers; |
| GRANT ALL ON TABLE db1.lesson TO ROLE instructor; |
| |
| -- This particular course is all about queries, so the students can SELECT but not INSERT or CREATE/DROP. |
| CREATE ROLE student; |
| GRANT ROLE student TO GROUP visitors; |
| GRANT SELECT ON TABLE db1.training TO ROLE student;</codeblock> |
| |
| </example> |
| |
| <example> |
| |
| <title>Privileges for Working with External Data Files</title> |
| |
| <p> |
| When data is being inserted through the <codeph>LOAD DATA</codeph> statement or is |
| referenced from an HDFS location outside the normal Impala database directories, the |
| user also needs appropriate permissions on the URIs corresponding to those HDFS |
| locations. |
| </p> |
| |
| <p> |
| In this example: |
| </p> |
| |
| <ul> |
| <li> |
| The <codeph>external_table</codeph> role can insert into and query the Impala |
| table, <codeph>external_table.sample</codeph>. |
| </li> |
| |
| <li> |
| The <codeph>staging_dir</codeph> role can specify the HDFS path |
| <filepath>/user/impala-user/external_data</filepath> with the <codeph>LOAD |
| DATA</codeph> statement. When Impala queries or loads data files, it operates on |
| all the files in that directory, not just a single file, so any Impala |
| <codeph>LOCATION</codeph> parameters refer to a directory rather than an |
| individual file. |
| </li> |
| </ul> |
| |
| <codeblock>CREATE ROLE external_table; |
| GRANT ROLE external_table TO GROUP impala_users; |
| GRANT ALL ON TABLE external_table.sample TO ROLE external_table; |
| |
| CREATE ROLE staging_dir; |
| GRANT ROLE staging_dir TO GROUP impala_users; |
| GRANT ALL ON URI 'hdfs://127.0.0.1:8020/user/impala-user/external_data' TO ROLE staging_dir;</codeblock> |
| |
| </example> |
| |
| <example> |
| |
| <title>Separating Administrator Responsibility from Read and Write Privileges</title> |
| |
| <p> |
| To create a database, you need the full privilege on that database while day-to-day |
| operations on tables within that database can be performed with lower levels of |
| privilege on a specific table. Thus, you might set up separate roles for each |
| database or application: an administrative one that could create or drop the |
| database, and a user-level one that can access only the relevant tables. |
| </p> |
| |
| <p> |
| In this example, the responsibilities are divided between users in three different |
| groups: |
| </p> |
| |
| <ul> |
| <li> |
| Members of the <codeph>supergroup</codeph> group have the |
| <codeph>training_sysadmin</codeph> role and so can set up a database named |
| <codeph>training1</codeph>. |
| </li> |
| |
| <li> |
| Members of the <codeph>impala_users</codeph> group have the |
| <codeph>instructor</codeph> role and so can perform any operation on the |
| <codeph>training1.course1</codeph> table, but cannot create or drop the |
| database itself. |
| </li> |
| |
| <li> |
| Members of the <codeph>visitor</codeph> group have the <codeph>student</codeph> |
| role and so can query the <codeph>training1.course1</codeph> table. |
| </li> |
| </ul> |
| |
| <codeblock>CREATE ROLE training_sysadmin; |
| GRANT ROLE training_sysadmin TO GROUP supergroup; |
| GRANT ALL ON DATABASE training1 TO ROLE training_sysadmin; |
| |
| CREATE ROLE instructor; |
| GRANT ROLE instructor TO GROUP impala_users; |
| GRANT ALL ON TABLE training1.course1 TO ROLE instructor; |
| |
| CREATE ROLE student; |
| GRANT ROLE student TO GROUP visitor; |
| GRANT SELECT ON TABLE training1.course1 TO ROLE student;</codeblock> |
| |
| </example> |
| |
| </conbody> |
| |
| </concept> |
| |
| </concept> |
| |
| <concept id="security_schema"> |
| |
| <title>Setting Up Schema Objects for a Secure Impala Deployment</title> |
| |
| <conbody> |
| |
| <p> |
| In your role definitions, you must specify privileges at the level of individual |
| databases and tables, or all databases or all tables within a database. To simplify the |
| structure of these rules, plan ahead of time how to name your schema objects so that |
| data with different authorization requirements are divided into separate databases. |
| </p> |
| |
| <p> |
| If you are adding security on top of an existing Impala deployment, you can rename |
| tables or even move them between databases using the <codeph>ALTER TABLE</codeph> |
| statement. |
| </p> |
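| |
| <p> |
| As a sketch of such a move, assume a hypothetical table <codeph>payroll</codeph> that |
| currently lives in the <codeph>default</codeph> database and a hypothetical target |
| database <codeph>hr_secure</codeph>: |
| </p> |
| |
| <codeblock>-- payroll and hr_secure are placeholder names. |
| CREATE DATABASE IF NOT EXISTS hr_secure; |
| -- Qualifying the new name with a database moves the table into that database. |
| ALTER TABLE default.payroll RENAME TO hr_secure.payroll; |
| </codeblock> |
| |
| <p> |
| Privileges can then be granted on <codeph>hr_secure</codeph> as a whole rather than on |
| individual tables scattered across databases. |
| </p> |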
| |
| </conbody> |
| |
| </concept> |
| |
| <concept id="sec_ex_default"> |
| |
| <title>The DEFAULT Database in a Secure Deployment</title> |
| |
| <conbody> |
| |
| <p> |
| Because of the extra emphasis on granular access controls in a secure deployment, you |
| should move any important or sensitive information out of the <codeph>DEFAULT</codeph> |
| database into a named database. Sometimes you might need to give privileges on the |
| <codeph>DEFAULT</codeph> database for administrative reasons, for example, as a place |
| you can reliably specify with a <codeph>USE</codeph> statement when preparing to drop a |
| database. |
| </p> |
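| |
| <p> |
| A minimal sketch of that administrative use follows. The database name |
| <codeph>old_db</codeph> is hypothetical, and the sketch assumes that the database is |
| already empty and that the session cannot drop the database it is currently using: |
| </p> |
| |
| <codeblock>-- old_db is a placeholder name. |
| -- Switch away from the database that is about to be dropped. |
| USE default; |
| DROP DATABASE old_db; |
| </codeblock> |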
| |
| </conbody> |
| |
| </concept> |
| |
| </concept> |