| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
| <concept rev="1.1" id="authorization"> |
| |
| <title>Impala Authorization</title> |
| |
| <prolog> |
| <metadata> |
| <data name="Category" value="Security"/> |
| <data name="Category" value="Sentry"/> |
| <data name="Category" value="Impala"/> |
| <data name="Category" value="Configuring"/> |
| <data name="Category" value="Starting and Stopping"/> |
| <data name="Category" value="Users"/> |
| <data name="Category" value="Groups"/> |
| <data name="Category" value="Administrators"/> |
| </metadata> |
| </prolog> |
| |
| <conbody id="sentry"> |
| |
| <p> |
| Authorization determines which users are allowed to access which resources, and what |
| operations they are allowed to perform. You use Apache Ranger for authorization. |
| By default, when authorization is not enabled, Impala does all read and |
| write operations with the privileges of the <codeph>impala</codeph> user, which is |
| suitable for a development/test environment but not for a secure production environment. |
| When authorization is enabled, Impala uses the OS user ID of the user who runs |
| <cmdname>impala-shell</cmdname> or other client programs, and associates various |
| privileges with each user. |
| </p> |
| |
| <p audience="PDF" outputclass="toc inpage"> |
| See the following sections for details about using the Impala authorization features. |
| </p> |
| |
| </conbody> |
| |
| <concept id="sentry_priv_model"> |
| |
| <title>The Privilege Model</title> |
| |
| <conbody> |
| |
| <p> |
| Privileges can be granted on different objects in the schema. Any privilege that can be |
| granted is associated with a level in the object hierarchy. If a privilege is granted on |
| a parent object in the hierarchy, the child object automatically inherits it. This is |
| the same privilege model as Hive and other database systems. |
| </p> |
| |
| <p> |
| The objects in the Impala schema hierarchy are: |
| </p> |
| |
| <codeblock>Server |
| URI |
| Database |
| Table |
| Column |
| </codeblock> |
| |
| <p rev="2.3.0 collevelauth"> |
| The table-level privileges apply to views as well. Anywhere you specify a table name, |
| you can specify a view name instead. |
| </p> |
| |
| <p rev="2.3.0 collevelauth"> |
| In <keyword keyref="impala23_full"/> and higher, you can specify privileges for |
| individual columns. |
| </p> |
| |
| <p conref="../shared/impala_common.xml#common/privileges_objects"/> |
| |
| <p> |
| Privileges are managed via the <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> SQL |
| statements that require the Ranger service enabled. |
| </p> |
| |
| <p> |
| If you change privileges outside of Impala, e.g. adding a user, removing a user, |
| modifying privileges, you must clear the Impala Catalog server cache by running the |
| <codeph>REFRESH AUTHORIZATION</codeph> statement. <codeph>REFRESH AUTHORIZATION</codeph> |
| is not required if you make the changes to privileges within Impala. |
| </p> |
| |
| </conbody> |
| |
| </concept> |
| |
| <concept id="concept_fgf_smj_bjb"> |
| |
| <title>Object Ownership in Ranger</title> |
| |
| <conbody> |
| |
| <p> |
| Object ownership for tables, views and databases is enabled by default in Impala. |
| </p> |
| |
| <p> |
| To define owner specific privileges, go to ranger UI and define appropriate policies on |
| the <codeph>{OWNER}</codeph> user. |
| </p> |
| |
| <p> |
| The <codeph>CREATE</codeph> statements implicitly make the user running the statement |
| the owner of the object. For example, if <varname>User A</varname> creates a database, |
| <varname>foo</varname>, via the <codeph>CREATE DATABASE</codeph> statement, |
| <varname>User A</varname> now owns the <varname>foo</varname> database and is authorized |
| to perform any operation on the <varname>foo</varname> database. |
| </p> |
| |
| <p> |
| An ownership can be transferred to another user or role via the <codeph>ALTER |
| DATABASE</codeph>, <codeph>ALTER TABLE</codeph>, or <codeph>ALTER VIEW</codeph> with the |
| <codeph>SET OWNER</codeph> clause. |
| </p> |
| |
| <note id="impala-8937"> |
| Currently, due to a known issue |
| (<xref |
| href="https://issues.apache.org/jira/browse/IMPALA-8937" format="html" |
| scope="external">IMPALA-8937</xref>), |
| until the ownership information is fully loaded in the coordinator catalog cache, the |
| owner of a table might not be able to see the table when executing the <codeph>SHOW |
| TABLES</codeph> statement The owner can still query the table. |
| </note> |
| |
| </conbody> |
| |
| </concept> |
| |
| <concept id="enable_ranger_authz"> |
| |
| <title>Starting Impala with Ranger Authorization Enabled</title> |
| |
| <conbody> |
| |
| <p> |
| To enable authorization in an Impala cluster using Ranger: |
| </p> |
| |
| <ol> |
| <li> |
| Add the following options to the <codeph>IMPALA_SERVER_ARGS</codeph> and the |
| <codeph>IMPALA_CATALOG_ARGS</codeph> settings in the |
| <filepath>/etc/default/impala</filepath> configuration file: |
| <ul> |
| <li> |
| <codeph>-server_name</codeph>: Specify the same name for all |
| <cmdname>impalad</cmdname> nodes and the <codeph>catalogd</codeph> in the cluster. </li> |
| <li> |
| <codeph>-ranger_service_type=hive</codeph> |
| </li> |
| <li> |
| <codeph>-ranger_app_id</codeph>: Set it to the Ranger application id. </li> |
| <li> |
| <codeph>-authorization_provider=ranger</codeph> |
| </li> |
| </ul> |
| </li> |
| |
| <li> |
| Restart the <codeph>catalogd</codeph> and all <cmdname>impalad</cmdname> daemons. |
| </li> |
| </ol> |
| |
| </conbody> |
| |
| </concept> |
| |
| <concept id="sentry_service"> |
| |
| <title>Managing Privileges</title> |
| |
| <conbody> |
| |
| <p> |
| You set up privileges through the <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> |
| statements in either Impala or Hive. |
| </p> |
| |
| <p> |
| For information about using the Impala <codeph>GRANT</codeph> and |
| <codeph>REVOKE</codeph> statements, see <xref |
| href="impala_grant.xml#grant"/> |
| and <xref |
| href="impala_revoke.xml#revoke"/>. |
| </p> |
| |
| </conbody> |
| |
| <concept id="changing_privileges"> |
| |
| <title>Changing Privileges from Outside of Impala</title> |
| |
| <conbody> |
| |
| <p> |
| If you make a change to privileges in Ranger from outside of Impala, e.g. |
| adding a user, removing a user, modifying privileges, there are two options to |
| propagate the change: |
| </p> |
| |
| <ul> |
| <li> |
| Use the <codeph>ranger.plugin.hive.policy.pollIntervalMs</codeph> property to |
| specify how often to do a Ranger refresh. The property is specified in |
| <codeph>ranger-hive-security.xml</codeph> in the <codeph>conf</codeph> directory |
| under your Impala home directory. |
| </li> |
| |
| <li> |
| Run the <codeph>INVALIDATE METADATA</codeph> or <codeph>REFRESH |
| AUTHORIZATION</codeph> statement to force a refresh. |
| </li> |
| </ul> |
| |
| <p> |
| If you make a change to privileges within Impala, <codeph>INVALIDATE METADATA</codeph> |
| is not required. |
| </p> |
| |
| <note type="warning"> |
| As <codeph>INVALIDATE METADATA</codeph> is an expensive operation, you should use it |
| judiciously. |
| </note> |
| |
| </conbody> |
| |
| </concept> |
| |
| <concept id="granting_on_uri"> |
| |
| <title>Granting Privileges on URI</title> |
| |
| <conbody> |
| |
| <p> |
| URIs represent the file paths you specify as part of statements such as <codeph>CREATE |
| EXTERNAL TABLE</codeph> and <codeph>LOAD DATA</codeph>. Typically, you specify what |
| look like UNIX paths, but these locations can also be prefixed with |
| <codeph>hdfs://</codeph> to make clear that they are really URIs. To set privileges |
| for a URI, specify the name of a directory, and the privilege applies to all the files |
| in that directory and any directories underneath it. |
| </p> |
| |
| <p> |
| URIs must start with <codeph>hdfs://</codeph>, <codeph>s3a://</codeph>, |
| <codeph>adl://</codeph>, or <codeph>file://</codeph>. If a URI starts with an absolute |
| path, the path will be appended to the default filesystem prefix. For example, if you |
| specify: |
| <codeblock> |
| GRANT ALL ON URI '/tmp'; |
| </codeblock> |
| The above statement effectively becomes the following where the default filesystem is |
| HDFS. |
| <codeblock> |
| GRANT ALL ON URI 'hdfs://localhost:20500/tmp'; |
| </codeblock> |
| </p> |
| |
| <p> |
| When defining URIs for HDFS, you must also specify the NameNode. For example: |
| <codeblock>GRANT ALL ON URI file:///path/to/dir TO <role> |
| GRANT ALL ON URI hdfs://namenode:port/path/to/dir TO <role></codeblock> |
| <note type="warning"> |
| Because the NameNode host and port must be specified, it is strongly recommended |
| that you use High Availability (HA). This ensures that the URI will remain constant |
| even if the NameNode changes. For example: |
| <codeblock>GRANT ALL ON URI hdfs://ha-nn-uri/path/to/dir TO <role></codeblock> |
| </note> |
| </p> |
| |
| </conbody> |
| |
| </concept> |
| |
| <concept id="concept_k45_lbm_f2b"> |
| |
| <title>Examples of Setting up Authorization for Security Scenarios</title> |
| |
| <conbody> |
| |
| <p> |
| The following examples show how to set up authorization to grant privileges on objects |
| to groups of users via roles. |
| </p> |
| |
| <example> |
| <title>A User with No Privileges</title> |
| <p> |
| If a user has no privileges at all, that user cannot access any schema objects in the |
| system. The error messages do not disclose the names or existence of objects that the |
| user is not authorized to read. |
| </p> |
| <p> |
| This is the experience you want a user to have if they somehow log into a system where |
| they are not an authorized Impala user. Or in a real deployment, a user might have no |
| privileges because they are not a member of any of the authorized groups. |
| </p> |
| </example> |
| |
| <example> |
| <title>Examples of Privileges for Administrative Users</title> |
| <p> |
| In this example, the SQL statements grant the <codeph>entire_server</codeph> role all |
| privileges on both the databases and URIs within the server. |
| </p> |
| |
| <codeblock>CREATE ROLE entire_server; |
| GRANT ROLE entire_server TO GROUP admin_group; |
| GRANT ALL ON SERVER server1 TO ROLE entire_server; |
| </codeblock> |
| </example> |
| |
| <example> |
| <title>A User with Privileges for Specific Databases and Tables</title> |
| <p> |
| If a user has privileges for specific tables in specific databases, the user can |
| access those things but nothing else. They can see the tables and their parent databases |
| in the output of <codeph>SHOW TABLES</codeph> and <codeph>SHOW DATABASES</codeph>, |
| <codeph>USE</codeph> the appropriate databases, and perform the relevant actions |
| (<codeph>SELECT</codeph> and/or <codeph>INSERT</codeph>) based on the table |
| privileges. To actually create a table requires the <codeph>ALL</codeph> privilege at |
| the database level, so you might define separate roles for the user that sets up a |
| schema and other users or applications that perform day-to-day operations on the tables. |
| </p> |
| <codeblock> |
| CREATE ROLE one_database; |
| GRANT ROLE one_database TO GROUP admin_group; |
| GRANT ALL ON DATABASE db1 TO ROLE one_database; |
| |
| CREATE ROLE instructor; |
| GRANT ROLE instructor TO GROUP trainers; |
| GRANT ALL ON TABLE db1.lesson TO ROLE instructor; |
| |
| # This particular course is all about queries, so the students can SELECT but not INSERT or CREATE/DROP. |
| CREATE ROLE student; |
| GRANT ROLE student TO GROUP visitors; |
| GRANT SELECT ON TABLE db1.training TO ROLE student;</codeblock> |
| </example> |
| |
| <example> |
| <title>Privileges for Working with External Data Files</title> |
| <p> When data is being inserted through the <codeph>LOAD DATA</codeph> statement or is |
| referenced from an HDFS location outside the normal Impala database directories, the |
| user also needs appropriate permissions on the URIs corresponding to those HDFS |
| locations. </p> |
| <p> In this example: </p> |
| <ul> |
| <li> The <codeph>external_table</codeph> role can insert into and query the Impala |
| table, <codeph>external_table.sample</codeph>. </li> |
| |
| <li> The <codeph>staging_dir</codeph> role can specify the HDFS path |
| <filepath>/user/impala-user/external_data</filepath> with the <codeph>LOAD |
| DATA</codeph> statement. When Impala queries or loads data files, it operates on all |
| the files in that directory, not just a single file, so any Impala |
| <codeph>LOCATION</codeph> parameters refer to a directory rather than an individual |
| file. </li> |
| </ul> |
| <codeblock>CREATE ROLE external_table; |
| GRANT ROLE external_table TO GROUP impala_users; |
| GRANT ALL ON TABLE external_table.sample TO ROLE external_table; |
| |
| CREATE ROLE staging_dir; |
| GRANT ROLE staging TO GROUP impala_users; |
| GRANT ALL ON URI 'hdfs://127.0.0.1:8020/user/impala-user/external_data' TO ROLE staging_dir;</codeblock> |
| </example> |
| |
| <example> |
| <title>Separating Administrator Responsibility from Read and Write Privileges</title> |
| <p> To create a database, you need the full privilege on that database while day-to-day |
| operations on tables within that database can be performed with lower levels of |
| privilege on a specific table. Thus, you might set up separate roles for each database |
| or application: an administrative one that could create or drop the database, and a |
| user-level one that can access only the relevant tables. </p> |
| <p> In this example, the responsibilities are divided between users in 3 different groups: </p> |
| <ul> |
| <li> Members of the <codeph>supergroup</codeph> group have the |
| <codeph>training_sysadmin</codeph> role and so can set up a database named |
| <codeph>training</codeph>. </li> |
| |
| <li> Members of the <codeph>impala_users</codeph> group have the |
| <codeph>instructor</codeph> role and so can create, insert into, and query any |
| tables in the <codeph>training</codeph> database, but cannot create or drop the |
| database itself. </li> |
| |
| <li> Members of the <codeph>visitor</codeph> group have the <codeph>student</codeph> |
| role and so can query those tables in the <codeph>training</codeph> database. </li> |
| </ul> |
| <codeblock>CREATE ROLE training_sysadmin; |
| GRANT ROLE training_sysadmin TO GROUP supergroup; |
| GRANT ALL ON DATABASE training TO ROLE training_sysadmin; |
| |
| CREATE ROLE instructor; |
| GRANT ROLE instructor TO GROUP impala_users; |
| GRANT ALL ON TABLE training.course1 TO ROLE instructor; |
| |
| CREATE ROLE student; |
| GRANT ROLE student TO GROUP visitor; |
| GRANT SELECT ON TABLE training.course1 TO ROLE student;</codeblock> |
| </example> |
| |
| </conbody> |
| |
| </concept> |
| |
| </concept> |
| |
| <concept id="security_schema"> |
| |
| <title>Setting Up Schema Objects for a Secure Impala Deployment</title> |
| |
| <conbody> |
| |
| <p> |
| In your role definitions, you must specify privileges at the level of individual |
| databases and tables, or all databases or all tables within a database. To simplify the |
| structure of these rules, plan ahead of time how to name your schema objects so that |
| data with different authorization requirements are divided into separate databases. |
| </p> |
| |
| <p> |
| If you are adding security on top of an existing Impala deployment, you can rename |
| tables or even move them between databases using the <codeph>ALTER TABLE</codeph> |
| statement. |
| </p> |
| |
| </conbody> |
| |
| </concept> |
| |
| <concept id="sec_ex_default"> |
| |
| <title>The DEFAULT Database in a Secure Deployment</title> |
| |
| <conbody> |
| |
| <p> |
| Because of the extra emphasis on granular access controls in a secure deployment, you |
| should move any important or sensitive information out of the <codeph>DEFAULT</codeph> |
| database into a named database. Sometimes you might need to give privileges on the |
| <codeph>DEFAULT</codeph> database for administrative reasons, for example, as a place |
| you can reliably specify with a <codeph>USE</codeph> statement when preparing to drop a |
| database. |
| </p> |
| |
| </conbody> |
| |
| </concept> |
| <concept id="sec_ranger_col_masking"> |
| <title>Ranger Column Masking</title> |
| <conbody> |
| <p> Ranger column masking hides sensitive columnar data in Impala query output. For example, you |
| can define a policy that reveals only the first or last four characters of column data. |
| Column masking is enabled by default. The Impala behavior mimics Hive behavior with respect |
| to column masking. For more information, see the <xref |
| href="https://cwiki.apache.org/confluence/display/RANGER/Row-level+filtering+and+column-masking+using+Apache+Ranger+policies+in+Apache+Hive" |
| format="html" scope="external">Apache Ranger documentation</xref>.</p> |
| |
| <p> |
| The following table lists all supported, built-in mask types for defining column masking in |
| a policy using the Ranger REST API. <table rowsep="1" colsep="1" id="table_mask_types"> |
| <tgroup cols="4"> |
| <colspec colname="c1" colnum="1"/> |
| <colspec colname="c2" colnum="2"/> |
| <colspec colname="c3" colnum="3"/> |
| <colspec colname="c4" colnum="4"/> |
| <thead> |
| <row> |
| <entry>Type</entry> |
| <entry>Name</entry> |
| <entry>Description</entry> |
| <entry>Transformer</entry> |
| </row> |
| </thead> |
| <tbody> |
| <row> |
| <entry>MASK</entry> |
| <entry>Redact</entry> |
| <entry>Replace lowercase with 'x', uppercase with 'X', digits with '0'</entry> |
| <entry>mask({col})</entry> |
| </row> |
| <row> |
| <entry>MASK_SHOW_LAST_4</entry> |
| <entry>Partial mask: show last 4</entry> |
| <entry>Show last 4 characters; replace rest with 'x'</entry> |
| <entry>mask_show_last_n({col}, 4, 'x', 'x', 'x', -1, '1')</entry> |
| </row> |
| <row> |
| <entry>MASK_SHOW_FIRST_4</entry> |
| <entry>Partial mask: show first 4</entry> |
| <entry>Show first 4 characters; replace rest with 'x'</entry> |
| <entry>mask_show_first_n({col}, 4, 'x', 'x', 'x', -1, '1')</entry> |
| </row> |
| <row> |
| <entry>MASK_HASH</entry> |
| <entry>Hash</entry> |
| <entry>Hash the value</entry> |
| <entry>mask_hash({col})</entry> |
| </row> |
| <row> |
| <entry>MASK_NULL</entry> |
| <entry>Nullify</entry> |
| <entry>Replace with NULL</entry> |
| <entry> N/A</entry> |
| </row> |
| <row> |
| <entry>MASK_NONE</entry> |
| <entry>Unmasked (retain original value)</entry> |
| <entry>No masking</entry> |
| <entry> N/A</entry> |
| </row> |
| <row> |
| <entry>MASK_DATE_SHOW_YEAR</entry> |
| <entry>Date: show only year</entry> |
| <entry>Date: show only year</entry> |
| <entry>mask({col}, 'x', 'x', 'x', -1, '1', 1, 0, -1)</entry> |
| </row> |
| <row> |
| <entry>CUSTOM</entry> |
| <entry>Custom</entry> |
| <entry>Custom</entry> |
| <entry>N/A</entry> |
| </row> |
| </tbody> |
| </tgroup> |
| </table> |
| </p> |
| </conbody> |
| </concept> |
| <concept id="lim_on_mask_functions"> |
| <title>Limitations on Mask Functions</title> |
| <conbody> |
| <p>The mask functions in Hive are implemented through GenericUDFs. Even though Impala users |
| can call Hive UDFs, Impala does not yet support Hive GenericUDFs, so you cannot use Hive's |
| mask functions in Impala. However, Impala has builtin mask functions that are implemented |
| through overloads. In Impala, when using mask functions, not all parameter combinations are |
| supported. These mask functions are introduced in Impala 3.4</p> |
| <p>The following list includes all the implemented overloads.</p> |
| <ul> |
| <li>Overloads used by Ranger default masking policies,</li> |
| <li>Overloads with simple arguments,</li> |
| <li>Overload with all arguments in <codeph>int</codeph> type for full functionality. Char |
| argument needs to be converted to their ASCII value.</li> |
| </ul> |
| <p>To list the available overloads, use the following query:</p> |
| <codeblock>show functions in _impala_builtins like "mask*";</codeblock> |
| <note> |
| <ul> |
| <li>An error message that states "<i>No matching function with signature: mask...</i>" |
| implies that Impala does not contain the corresponding overload.</li> |
| </ul> |
| </note> |
| </conbody> |
| </concept> |
| </concept> |