| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
| <concept rev="1.1" id="authorization"> |
| |
| <title>Enabling Sentry Authorization for Impala</title> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Security"/> |
| <data name="Category" value="Sentry"/> |
| <data name="Category" value="Impala"/> |
| <data name="Category" value="Configuring"/> |
| <data name="Category" value="Starting and Stopping"/> |
| <data name="Category" value="Users"/> |
| <data name="Category" value="Groups"/> |
| <data name="Category" value="Administrators"/> |
| </metadata> |
| </prolog> |
| |
| <conbody id="sentry"> |
| |
| <p> |
| Authorization determines which users are allowed to access which resources, and what operations they are |
| allowed to perform. In Impala 1.1 and higher, you use Apache Sentry for |
| authorization. Sentry adds a fine-grained authorization framework for Hadoop. By default (when authorization |
| is not enabled), Impala does all read and write operations with the privileges of the <codeph>impala</codeph> |
| user, which is suitable for a development/test environment but not for a secure production environment. When |
| authorization is enabled, Impala uses the OS user ID of the user who runs <cmdname>impala-shell</cmdname> or |
| other client program, and associates various privileges with each user. |
| </p> |
| |
| <note> |
| Sentry is typically used in conjunction with Kerberos authentication, which defines which hosts are allowed |
| to connect to each server. Using the combination of Sentry and Kerberos prevents malicious users from being |
| able to connect by creating a named account on an untrusted machine. See |
| <xref href="impala_kerberos.xml#kerberos"/> for details about Kerberos authentication. |
| </note> |
| |
| <p audience="PDF" outputclass="toc inpage"> |
| See the following sections for details about using the Impala authorization features: |
| </p> |
| </conbody> |
| |
| <concept id="sentry_priv_model"> |
| |
| <title>The Sentry Privilege Model</title> |
| |
| <conbody> |
| |
| <p> |
| Privileges can be granted on different objects in the schema. Any privilege that can be granted is |
| associated with a level in the object hierarchy. If a privilege is granted on a container object in the |
| hierarchy, the child object automatically inherits it. This is the same privilege model as Hive and other |
| database systems such as MySQL. |
| </p> |
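| |
| <p> |
| As a rough sketch of this inheritance, written in the policy file notation covered later in this topic, a |
| rule that stops at the database level covers every table in that database, while a rule that names a table |
| covers only that table. The role names and the <codeph>daily_summary</codeph> table are hypothetical, and |
| <codeph>server1</codeph> is assumed to match your <codeph>-server_name</codeph> setting: |
| </p> |
| |
| <codeblock>[roles] |
| # Granted on the database (the container): all tables and views in reporting_db inherit it. |
| db_admin = server=server1->db=reporting_db |
| # Granted on one table: applies to daily_summary only. |
| summary_reader = server=server1->db=reporting_db->table=daily_summary->action=SELECT |
| </codeblock> |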
| |
| <p rev="2.3.0 collevelauth"> |
| The object hierarchy for Impala covers Server, URI, Database, Table, and Column. (The Table privileges apply to views as well; |
| anywhere you specify a table name, you can specify a view name instead.) |
| Column-level authorization is available in <keyword keyref="impala23_full"/> and higher. |
| Previously, you constructed views to query specific columns and assigned privilege based on |
| the views rather than the base tables. Now, you can use Impala's <xref href="impala_grant.xml"/> and |
| <xref href="impala_revoke.xml"/> statements to assign and revoke privileges from specific columns |
| in a table. |
| </p> |
| |
| <p> |
| A restricted set of privileges determines what you can do with each object: |
| </p> |
| |
| <dl> |
| <dlentry id="select_priv"> |
| |
| <dt> |
| SELECT privilege |
| </dt> |
| |
| <dd> |
| Lets you read data from a table or view, for example with the <codeph>SELECT</codeph> statement, the |
| <codeph>INSERT...SELECT</codeph> syntax, or <codeph>CREATE TABLE...LIKE</codeph>. Also required to |
| issue the <codeph>DESCRIBE</codeph> statement or the <codeph>EXPLAIN</codeph> statement for a query |
| against a particular table. Only objects for which a user has this privilege are shown in the output |
| for <codeph>SHOW DATABASES</codeph> and <codeph>SHOW TABLES</codeph> statements. The |
| <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph> statements only access |
| metadata for tables for which the user has this privilege. |
| </dd> |
| |
| </dlentry> |
| |
| <dlentry id="insert_priv"> |
| |
| <dt> |
| INSERT privilege |
| </dt> |
| |
| <dd> |
| Lets you write data to a table. Applies to the <codeph>INSERT</codeph> and <codeph>LOAD DATA</codeph> |
| statements. |
| </dd> |
| |
| </dlentry> |
| |
| <dlentry id="all_priv"> |
| |
| <dt> |
| ALL privilege |
| </dt> |
| |
| <dd> |
| Lets you create or modify the object. Required to run DDL statements such as <codeph>CREATE |
| TABLE</codeph>, <codeph>ALTER TABLE</codeph>, or <codeph>DROP TABLE</codeph> for a table, |
| <codeph>CREATE DATABASE</codeph> or <codeph>DROP DATABASE</codeph> for a database, or <codeph>CREATE |
| VIEW</codeph>, <codeph>ALTER VIEW</codeph>, or <codeph>DROP VIEW</codeph> for a view. Also required for |
| the URI of the <q>location</q> parameter for the <codeph>CREATE EXTERNAL TABLE</codeph> and |
| <codeph>LOAD DATA</codeph> statements. |
| <!-- Have to think about the best wording, how often to repeat, how best to conref this caveat. |
| You do not actually code the keyword <codeph>ALL</codeph> in the policy file; instead you use |
| <codeph>action=*</codeph> or shorten the right-hand portion of the rule. |
| --> |
| </dd> |
| |
| </dlentry> |
| </dl> |
| |
| <p> |
| Privileges can be specified for a table or view before that object actually exists. If you do not have |
| sufficient privilege to perform an operation, the error message does not disclose whether the object exists. |
| </p> |
| |
| <p> |
| Originally, privileges were encoded in a policy file, stored in HDFS. This mode of operation is still an |
| option, but the emphasis of privilege management is moving towards being SQL-based. In |
| <keyword keyref="impala20_full"/> and higher, you can issue <codeph>GRANT</codeph> and |
| <codeph>REVOKE</codeph> statements directly in Impala. Impala can also make use of |
| privileges assigned through <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements issued through |
| Hive. The mode of operation with <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements instead of |
| the policy file requires that a special Sentry service be enabled; this service stores, retrieves, and |
| manipulates privilege information stored inside the metastore database. |
| </p> |
| </conbody> |
| </concept> |
| |
| <concept id="secure_startup"> |
| |
| <title>Starting the impalad Daemon with Sentry Authorization Enabled</title> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Starting and Stopping"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <p> |
| To run the <cmdname>impalad</cmdname> daemon with authorization enabled, you add one or more options to the |
| <codeph>IMPALA_SERVER_ARGS</codeph> declaration in the <filepath>/etc/default/impala</filepath> |
| configuration file: |
| </p> |
| |
| <ul> |
| <li> |
| The <codeph>-server_name</codeph> option turns on Sentry authorization for Impala. The authorization |
| rules refer to a symbolic server name, and you specify the name to use as the argument to the |
| <codeph>-server_name</codeph> option. |
| </li> |
| |
| <li rev="1.4.0"> |
| If you specify just <codeph>-server_name</codeph>, Impala uses the Sentry service for authorization, |
| relying on the results of <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements issued through |
| Hive. (This mode of operation is available in Impala 1.4.0 and higher.) Prior to Impala 1.4.0, or if you |
| want to continue storing privilege rules in the policy file, also specify the |
| <codeph>-authorization_policy_file</codeph> option as in the following item. |
| </li> |
| |
| <li> |
| Specifying the <codeph>-authorization_policy_file</codeph> option in addition to |
| <codeph>-server_name</codeph> makes Impala read privilege information from a policy file, rather than |
| from the metastore database. The argument to the <codeph>-authorization_policy_file</codeph> option |
| specifies the HDFS path to the policy file that defines the privileges on different schema objects. |
| </li> |
| </ul> |
| |
| <p rev="1.4.0"> |
| For example, you might adapt your <filepath>/etc/default/impala</filepath> configuration to contain lines |
| like the following. To use the Sentry service rather than the policy file: |
| </p> |
| |
| <codeblock rev="1.4.0">IMPALA_SERVER_ARGS=" \ |
| -server_name=server1 \ |
| ... |
| </codeblock> |
| |
| <p> |
| Or to use the policy file, as in releases prior to Impala 1.4: |
| </p> |
| |
| <codeblock>IMPALA_SERVER_ARGS=" \ |
| -authorization_policy_file=/user/hive/warehouse/auth-policy.ini \ |
| -server_name=server1 \ |
| ... |
| </codeblock> |
| |
| <p> |
| The preceding examples set up a symbolic name of <codeph>server1</codeph> to refer to the current instance |
| of Impala. This symbolic name is used in the following ways: |
| </p> |
| |
| <ul> |
| <li> |
| <p> |
| Specify the <codeph>server1</codeph> value for the <codeph>sentry.hive.server</codeph> property in the |
| <filepath>sentry-site.xml</filepath> configuration file for Hive, as well as in the |
| <codeph>-server_name</codeph> option for <cmdname>impalad</cmdname>. |
| </p> |
| <p> |
| If the <cmdname>impalad</cmdname> daemon is not already running, start it as described in |
| <xref href="impala_processes.xml#processes"/>. If it is already running, restart it with the command |
| <codeph>sudo /etc/init.d/impala-server restart</codeph>. Run the appropriate commands on all the nodes |
| where <cmdname>impalad</cmdname> normally runs. |
| </p> |
| </li> |
| |
| <li> |
| <p> |
| If you use the mode of operation using the policy file, the rules in the <codeph>[roles]</codeph> |
| section of the policy file refer to this same <codeph>server1</codeph> name. For example, the following |
| rule sets up a role <codeph>report_generator</codeph> that lets users with that role query any table in |
| a database named <codeph>reporting_db</codeph> on a node where the <cmdname>impalad</cmdname> daemon |
| was started up with the <codeph>-server_name=server1</codeph> option: |
| </p> |
| <codeblock>[roles] |
| report_generator = server=server1->db=reporting_db->table=*->action=SELECT |
| </codeblock> |
| </li> |
| </ul> |
| |
| <p> |
| When <cmdname>impalad</cmdname> is started with one or both of the <codeph>-server_name=server1</codeph> |
| and <codeph>-authorization_policy_file</codeph> options, Impala authorization is enabled. If Impala detects |
| any errors or inconsistencies in the authorization settings or the policy file, the daemon refuses to |
| start. |
| </p> |
| </conbody> |
| </concept> |
| |
| <concept id="sentry_service"> |
| |
| <title>Using Impala with the Sentry Service (<keyword keyref="impala14"/> or higher only)</title> |
| |
| <conbody> |
| |
| <p> |
| When you use the Sentry service rather than the policy file, you set up privileges through |
| <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements in either Impala or Hive, then both components |
| use those same privileges automatically. (Impala added the <codeph>GRANT</codeph> and |
| <codeph>REVOKE</codeph> statements in <keyword keyref="impala20_full"/>.) |
| </p> |
| |
| </conbody> |
| </concept> |
| |
| <concept id="security_policy_file"> |
| |
| <title>Using Impala with the Sentry Policy File</title> |
| |
| <conbody> |
| |
| <p> |
| The policy file is a file that you put in a designated location in HDFS, and is read during the startup of |
| the <cmdname>impalad</cmdname> daemon when you specify both the <codeph>-server_name</codeph> and |
| <codeph>-authorization_policy_file</codeph> startup options. It controls which objects (databases, tables, |
| and HDFS directory paths) can be accessed by the user who connects to <cmdname>impalad</cmdname>, and what |
| operations that user can perform on the objects. |
| </p> |
| |
| <note rev="1.4.0"> |
| <p rev="1.4.0"> |
| The Sentry service, as described in <xref href="impala_authorization.xml#sentry_service"/>, stores |
| authorization metadata in a relational database. This means you can manage user privileges for Impala tables |
| using traditional <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> SQL statements, rather than the |
| policy file approach described here. If you are still using policy files, migrate to the |
| database-backed service whenever practical. |
| </p> |
| </note> |
| |
| <p> |
| The location of the policy file is listed in the <filepath>auth-site.xml</filepath> configuration file. To |
| minimize overhead, the security information from this file is cached by each <cmdname>impalad</cmdname> |
| daemon and refreshed automatically, with a default interval of 5 minutes. After making a substantial change |
| to security policies, restart all Impala daemons to pick up the changes immediately. |
| </p> |
| |
| <p outputclass="toc inpage"/> |
| </conbody> |
| |
| <concept id="security_policy_file_details"> |
| |
| <title>Policy File Location and Format</title> |
| |
| <conbody> |
| |
| <p> |
| The policy file uses the familiar <codeph>.ini</codeph> format, divided into the major sections |
| <codeph>[groups]</codeph> and <codeph>[roles]</codeph>. There is also an optional |
| <codeph>[databases]</codeph> section, which allows you to specify a specific policy file for a particular |
| database, as explained in <xref href="#security_multiple_policy_files"/>. Another optional section, |
| <codeph>[users]</codeph>, allows you to override the OS-level mapping of users to groups; that is an |
| advanced technique primarily for testing and debugging, and is beyond the scope of this document. |
| </p> |
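| |
| <p> |
| As a minimal sketch of that layout, the following policy file contains only the two major sections. The |
| <codeph>analysts</codeph> group and the <codeph>report_reader</codeph> role are hypothetical names chosen for |
| illustration, and <codeph>server1</codeph> is assumed to match the value of the |
| <codeph>-server_name</codeph> option: |
| </p> |
| |
| <codeblock>[groups] |
| # group_name = role_name, ... (assigns one or more roles to a group) |
| analysts = report_reader |
| |
| [roles] |
| # role_name = privilege, ... (assigns one or more privileges to a role) |
| report_reader = server=server1->db=reporting_db->table=*->action=SELECT |
| </codeblock> |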
| |
| <p> |
| In the <codeph>[groups]</codeph> section, you define various categories of users and select which roles |
| are associated with each category. |
| </p> |
| |
| <p> |
| The group names and usernames in the <codeph>[groups]</codeph> section correspond to Linux groups and users on |
| the server where the <cmdname>impalad</cmdname> daemon runs. When you access Impala through the |
| <cmdname>impala-shell</cmdname> interpreter, for purposes of authorization, the user is the logged-in Linux |
| user and the groups are the Linux groups that user is a member of. When you access Impala through the |
| ODBC or JDBC interfaces, the user and password specified through the connection string are used as login |
| credentials for the Linux server, and authorization is based on that username and the associated Linux |
| group membership. |
| </p> |
| |
| <p> |
| In the <codeph>[roles]</codeph> section, you define a set of roles. For each role, you specify precisely |
| which privileges are available: that is, which objects users with that role can access, and what operations |
| they can perform on those objects. This is the lowest-level category of security information; the other |
| sections in the policy file map the privileges to higher-level divisions of groups and users. The |
| privileges are specified using patterns like: |
| <codeblock>server=<varname>server_name</varname>->db=<varname>database_name</varname>->table=<varname>table_name</varname>->action=SELECT |
| server=<varname>server_name</varname>->db=<varname>database_name</varname>->table=<varname>table_name</varname>->action=CREATE |
| server=<varname>server_name</varname>->db=<varname>database_name</varname>->table=<varname>table_name</varname>->action=ALL |
| </codeblock> |
| For the <varname>server_name</varname> value, substitute the same symbolic name you specify with the |
| <cmdname>impalad</cmdname> <codeph>-server_name</codeph> option. You can use <codeph>*</codeph> wildcard |
| characters at each level of the privilege specification to allow access to all such objects. For example: |
| <codeblock>server=impala-host.example.com->db=default->table=t1->action=SELECT |
| server=impala-host.example.com->db=*->table=*->action=CREATE |
| server=impala-host.example.com->db=*->table=audit_log->action=SELECT |
| server=impala-host.example.com->db=default->table=t1->action=* |
| </codeblock> |
| </p> |
| |
| <p> |
| When authorization is enabled, Impala uses the policy file as a <i>whitelist</i>, representing every |
| privilege available to any user on any object. That is, only operations specified for the appropriate |
| combination of object, role, group, and user are allowed; all other operations are not allowed. If a |
| group or role is defined multiple times in the policy file, the last definition takes precedence. |
| </p> |
| |
| <p> |
| To understand the notion of whitelisting, set up a minimal policy file that does not provide any |
| privileges for any object. When you connect to an Impala node where this policy file is in effect, you |
| get no results for <codeph>SHOW DATABASES</codeph>, and an error when you issue any <codeph>SHOW |
| TABLES</codeph>, <codeph>USE <varname>database_name</varname></codeph>, <codeph>DESCRIBE |
| <varname>table_name</varname></codeph>, <codeph>SELECT</codeph>, or other statements that expect to |
| access databases or tables, even if the corresponding databases and tables exist. |
| </p> |
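| |
| <p> |
| By contrast, a policy file along the lines of the following sketch allows exactly one operation. Assuming a |
| hypothetical Linux group <codeph>analysts</codeph> and a server name of <codeph>server1</codeph>, members of |
| that group can query <codeph>default.t1</codeph>; every other operation, and every user outside that group, |
| is denied because nothing else is listed: |
| </p> |
| |
| <codeblock>[groups] |
| analysts = t1_reader |
| |
| [roles] |
| # The only privilege in the file. Anything not listed here is not allowed. |
| t1_reader = server=server1->db=default->table=t1->action=SELECT |
| </codeblock> |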
| |
| <p> |
| The contents of the policy file are cached, to avoid a performance penalty for each query. The policy |
| file is re-checked by each <cmdname>impalad</cmdname> node every 5 minutes. When you make a |
| non-time-sensitive change such as adding new privileges or new users, you can let the change take effect |
| automatically a few minutes later. If you remove or reduce privileges, and want the change to take effect |
| immediately, restart the <cmdname>impalad</cmdname> daemon on all nodes, again specifying the |
| <codeph>-server_name</codeph> and <codeph>-authorization_policy_file</codeph> options so that the rules |
| from the updated policy file are applied. |
| </p> |
| </conbody> |
| </concept> |
| |
| <concept id="security_examples"> |
| |
| <title>Examples of Policy File Rules for Security Scenarios</title> |
| |
| <conbody> |
| |
| <p> |
| The following examples show rules that might go in the policy file to deal with various |
| authorization-related scenarios. For illustration purposes, this section shows several very small policy |
| files with only a few rules each. In your environment, typically you would define many roles to cover all |
| the scenarios involving your own databases, tables, and applications, and a smaller number of groups, |
| whose members are given the privileges from one or more roles. |
| </p> |
| |
| <example id="sec_ex_unprivileged"> |
| |
| <title>A User with No Privileges</title> |
| |
| <p> |
| If a user has no privileges at all, that user cannot access any schema objects in the system. The error |
| messages do not disclose the names or existence of objects that the user is not authorized to read. |
| </p> |
| |
| <p> |
| <!-- This example demonstrates the lack of privileges using a blank policy file, so no users have any privileges. --> |
| This is the experience you want a user to have if they somehow log into a system where they are not an |
| authorized Impala user. In a real deployment with a filled-in policy file, a user might have no |
| privileges because they are not a member of any of the relevant groups mentioned in the policy file. |
| </p> |
| |
| <!-- Have the raw material but not formatted into easily digestible example. Do for first 1.1 doc refresh. |
| <codeblock></codeblock> --> |
| |
| </example> |
| |
| <example id="sec_ex_superuser"> |
| |
| <title>Examples of Privileges for Administrative Users</title> |
| |
| <p> |
| When an administrative user has broad access to tables or databases, the associated rules in the |
| <codeph>[roles]</codeph> section typically use wildcards and/or inheritance. For example, in the |
| following sample policy file, <codeph>db=*</codeph> refers to all databases and |
| <codeph>db=*->table=*</codeph> refers to all tables in all databases. |
| </p> |
| |
| <p> |
| Omitting the rightmost portion of a rule means that the privileges apply to all the objects that could |
| be specified there. For example, in the following sample policy file, the |
| <codeph>all_databases</codeph> role has all privileges for all tables in all databases, while the |
| <codeph>one_database</codeph> role has all privileges for all tables in one specific database. The |
| <codeph>all_databases</codeph> role does not grant privileges on URIs, so a group with that role could |
| not issue a <codeph>CREATE TABLE</codeph> statement with a <codeph>LOCATION</codeph> clause. The |
| <codeph>entire_server</codeph> role has all privileges on both databases and URIs within the server. |
| </p> |
| |
| <codeblock>[groups] |
| supergroup = all_databases |
| |
| [roles] |
| read_all_tables = server=server1->db=*->table=*->action=SELECT |
| all_tables = server=server1->db=*->table=* |
| all_databases = server=server1->db=* |
| one_database = server=server1->db=test_db |
| entire_server = server=server1 |
| </codeblock> |
| |
| </example> |
| |
| <example id="sec_ex_detailed"> |
| |
| <title>A User with Privileges for Specific Databases and Tables</title> |
| |
| <p> |
| If a user has privileges for specific tables in specific databases, the user can access those objects |
| but nothing else. They can see the tables and their parent databases in the output of <codeph>SHOW |
| TABLES</codeph> and <codeph>SHOW DATABASES</codeph>, <codeph>USE</codeph> the appropriate databases, |
| and perform the relevant actions (<codeph>SELECT</codeph> and/or <codeph>INSERT</codeph>) based on the |
| table privileges. To actually create a table requires the <codeph>ALL</codeph> privilege at the |
| database level, so you might define separate roles for the user that sets up a schema and other users |
| or applications that perform day-to-day operations on the tables. |
| </p> |
| |
| <p> |
| The following sample policy file shows some of the syntax that is appropriate as the policy file grows, |
| such as the <codeph>#</codeph> comment syntax, <codeph>\</codeph> continuation syntax, and comma |
| separation for roles assigned to groups or privileges assigned to roles. |
| </p> |
| |
| <codeblock>[groups] |
| employee = training_sysadmin, instructor |
| visitor = student |
| |
| [roles] |
| training_sysadmin = server=server1->db=training, \ |
| server=server1->db=instructor_private, \ |
| server=server1->db=lesson_development |
| instructor = server=server1->db=training->table=*->action=*, \ |
| server=server1->db=instructor_private->table=*->action=*, \ |
| server=server1->db=lesson_development->table=lesson* |
| # This particular course is all about queries, so the students can SELECT but not INSERT or CREATE/DROP. |
| student = server=server1->db=training->table=lesson_*->action=SELECT |
| </codeblock> |
| |
| </example> |
| |
| <!-- |
| <example id="sec_ex_superuser_single_db"> |
| <title>A User with Full Privileges for a Specific Database</title> |
| <p> |
| |
| </p> |
| <codeblock></codeblock> |
| </example> |
| |
| <example id="sec_ex_readonly_single_db"> |
| <title>A User with Read-Only Privileges for a Specific Database</title> |
| <p> |
| |
| </p> |
| <codeblock></codeblock> |
| <p> |
| If a user has <codeph>SELECT</codeph> privilege for a database, they can issue a <codeph>USE</codeph> statement |
| for that database. Whether or not they can access tables within the database depends on further privileges |
| defined at the table level. |
| </p> |
| |
| <codeblock></codeblock> |
| |
| </example> |
| |
| <example id="sec_ex_superuser_single_table"> |
| <title>A User with Full Privileges for a Specific Table</title> |
| <p> |
| If a user has <codeph>SELECT</codeph> privilege for a table, they can query, describe, or explain queries for |
| that table. |
| </p> |
| |
| <codeblock></codeblock> |
| </example> |
| |
| <example id="sec_ex_load_data"> |
| <title>A User with Privileges to Load Data but not Read Data</title> |
| |
| <p> |
| If a user has <codeph>INSERT</codeph> privilege for a table, they can write to the table if it already exists. |
| They cannot create or alter the table; those operations require the <codeph>ALL</codeph> privilege. |
| </p> |
| <codeblock></codeblock> |
| </example> |
| --> |
| |
| <example id="sec_ex_external_files"> |
| |
| <title>Privileges for Working with External Data Files</title> |
| |
| <p> |
| When data is being inserted through the <codeph>LOAD DATA</codeph> statement, or is referenced from an |
| HDFS location outside the normal Impala database directories, the user also needs appropriate |
| permissions on the URIs corresponding to those HDFS locations. |
| </p> |
| |
| <p> |
| In this sample policy file: |
| </p> |
| |
| <ul> |
| <li> |
| The <codeph>external_table</codeph> role lets us insert into and query the Impala table, |
| <codeph>external_table.sample</codeph>. |
| </li> |
| |
| <li> |
| The <codeph>staging_dir</codeph> role lets us specify the HDFS path |
| <filepath>/user/username/external_data</filepath> with the <codeph>LOAD DATA</codeph> statement. |
| Remember, when Impala queries or loads data files, it operates on all the files in that directory, |
| not just a single file, so any Impala <codeph>LOCATION</codeph> parameters refer to a directory |
| rather than an individual file. |
| </li> |
| |
| <li> |
| We included the IP address and port of the Hadoop name node in the HDFS URI of the |
| <codeph>staging_dir</codeph> rule. We found those details in |
| <filepath>/etc/hadoop/conf/core-site.xml</filepath>, under the <codeph>fs.default.name</codeph> |
| element. That is what we use in any roles that specify URIs (that is, the locations of directories in |
| HDFS). |
| </li> |
| |
| <li> |
| We start this example after the table <codeph>external_table.sample</codeph> is already created. In |
| the policy file for the example, we have already taken away the <codeph>external_table_admin</codeph> |
| role from the <codeph>username</codeph> group, and replaced it with the lesser-privileged |
| <codeph>external_table</codeph> role. |
| </li> |
| |
| <li> |
| We assign privileges to a subdirectory underneath <filepath>/user/username</filepath> in HDFS, |
| because such privileges also apply to any subdirectories underneath. If we had assigned privileges to |
| the parent directory <filepath>/user/username</filepath>, it would be too easy to affect other |
| files by specifying the wrong location by mistake. |
| </li> |
| |
| <li> |
| The <codeph>username</codeph> under the <codeph>[groups]</codeph> section refers to the |
| <codeph>username</codeph> group. (In this example, there is a <codeph>username</codeph> user |
| that is a member of a <codeph>username</codeph> group.) |
| </li> |
| </ul> |
| |
| <p> |
| Policy file: |
| </p> |
| |
| <codeblock>[groups] |
| username = external_table, staging_dir |
| |
| [roles] |
| external_table_admin = server=server1->db=external_table |
| external_table = server=server1->db=external_table->table=sample->action=* |
| staging_dir = server=server1->uri=hdfs://127.0.0.1:8020/user/username/external_data->action=* |
| </codeblock> |
| |
| <p> |
| <cmdname>impala-shell</cmdname> session: |
| </p> |
| |
| <codeblock>[localhost:21000] > use external_table; |
| Query: use external_table |
| [localhost:21000] > show tables; |
| Query: show tables |
| Query finished, fetching results ... |
| +--------+ |
| | name | |
| +--------+ |
| | sample | |
| +--------+ |
| Returned 1 row(s) in 0.02s |
| |
| [localhost:21000] > select * from sample; |
| Query: select * from sample |
| Query finished, fetching results ... |
| +-----+ |
| | x | |
| +-----+ |
| | 1 | |
| | 5 | |
| | 150 | |
| +-----+ |
| Returned 3 row(s) in 1.04s |
| |
| [localhost:21000] > load data inpath '/user/username/external_data' into table sample; |
| Query: load data inpath '/user/username/external_data' into table sample |
| Query finished, fetching results ... |
| +----------------------------------------------------------+ |
| | summary | |
| +----------------------------------------------------------+ |
| | Loaded 1 file(s). Total files in destination location: 2 | |
| +----------------------------------------------------------+ |
| Returned 1 row(s) in 0.26s |
| [localhost:21000] > select * from sample; |
| Query: select * from sample |
| Query finished, fetching results ... |
| +-------+ |
| | x | |
| +-------+ |
| | 2 | |
| | 4 | |
| | 6 | |
| | 8 | |
| | 64738 | |
| | 49152 | |
| | 1 | |
| | 5 | |
| | 150 | |
| +-------+ |
| Returned 9 row(s) in 0.22s |
| |
| [localhost:21000] > load data inpath '/user/username/unauthorized_data' into table sample; |
| Query: load data inpath '/user/username/unauthorized_data' into table sample |
| ERROR: AuthorizationException: User 'username' does not have privileges to access: hdfs://127.0.0.1:8020/user/username/unauthorized_data |
| </codeblock> |
| |
| </example> |
| |
| <example audience="hidden" id="sec_ex_views" rev="2.3.0 collevelauth"> |
| |
| <title>Controlling Access at the Column Level through Views</title> |
| |
| <p> |
| If a user has <codeph>SELECT</codeph> privilege for a view, they can query the view, even if they do |
| not have any privileges on the underlying table. To see the details about the underlying table through |
| <codeph>EXPLAIN</codeph> or <codeph>DESCRIBE FORMATTED</codeph> statements on the view, the user must |
| also have <codeph>SELECT</codeph> privilege for the underlying table. |
| </p> |
| |
| <note type="important"> |
| <p> |
| The types of data that are considered sensitive and confidential differ depending on the jurisdiction, |
| the type of industry, or both. For fine-grained access controls, set up appropriate privileges based |
| on all applicable laws and regulations. |
| </p> |
| <p> |
| Be careful using the <codeph>ALTER VIEW</codeph> statement to point an existing view at a different |
| base table or a new set of columns that includes sensitive or restricted data. Make sure that any |
| users who have <codeph>SELECT</codeph> privilege on the view do not gain access to any additional |
| information they are not authorized to see. |
| </p> |
| </note> |
| |
| <p> |
| The following example shows how a system administrator could set up a table containing some columns |
| with sensitive information, then create a view that only exposes the non-confidential columns. |
| </p> |
| |
| <codeblock>[localhost:21000] > create table sensitive_info |
| > ( |
| > name string, |
| > address string, |
| > credit_card string, |
| > taxpayer_id string |
| > ); |
| [localhost:21000] > create view name_address_view as select name, address from sensitive_info; |
| </codeblock> |
| |
| <p> |
| Then the following policy file specifies read-only privilege for that view, without authorizing access |
| to the underlying table: |
| </p> |
| |
| <codeblock>[groups] |
| employee = view_only_privs |
| |
| [roles] |
| view_only_privs = server=server1->db=reports->table=name_address_view->action=SELECT |
| </codeblock> |
| |
| <p> |
| Thus, a user with the <codeph>view_only_privs</codeph> role could query the basic information through |
| Impala but not the sensitive information, even if both kinds of information were part of the |
| same data file. |
| </p> |
| |
| <p> |
| You might define other views to allow users from different groups to query different sets of columns. |
| </p> |
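| |
| <p> |
| As a sketch of that idea, the following policy file adds a second group that is limited to a different |
| view. The <codeph>billing</codeph> group, the <codeph>billing_view_privs</codeph> role, and the |
| <codeph>name_card_view</codeph> view are hypothetical names used only for illustration. The sketch assumes |
| that such a view has already been created over the <codeph>name</codeph> and <codeph>credit_card</codeph> |
| columns of <codeph>sensitive_info</codeph> in the <codeph>reports</codeph> database. |
| </p> |
| |
| <codeblock>[groups] |
| employee = view_only_privs |
| # Hypothetical group whose members can query only name_card_view. |
| billing = billing_view_privs |
| |
| [roles] |
| view_only_privs = server=server1->db=reports->table=name_address_view->action=SELECT |
| # Hypothetical role: read-only access to the view, none to the underlying table. |
| billing_view_privs = server=server1->db=reports->table=name_card_view->action=SELECT |
| </codeblock> |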
| |
| </example> |
| |
| <example id="sec_sysadmin"> |
| |
| <title>Separating Administrator Responsibility from Read and Write Privileges</title> |
| |
| <p> |
| Remember that creating a database requires full privilege on that database, while day-to-day |
| operations on tables within that database can be performed with lower levels of privilege on specific |
| tables. Thus, you might set up separate roles for each database or application: an administrative one |
| that can create or drop the database, and a user-level one that can access only the relevant tables. |
| </p> |
| |
| <p> |
| For example, this policy file divides responsibilities between users in three different groups: |
| </p> |
| |
| <ul> |
| <li> |
| Members of the <codeph>supergroup</codeph> group have the <codeph>training_sysadmin</codeph> role and |
| so can set up a database named <codeph>training</codeph>. |
| </li> |
| |
| <li> Members of the <codeph>employee</codeph> group have the |
| <codeph>instructor</codeph> role and so can create, insert into, |
| and query any tables in the <codeph>training</codeph> database, |
| but cannot create or drop the database itself. </li> |
| |
| <li> |
| Members of the <codeph>visitor</codeph> group have the <codeph>student</codeph> role and so can query |
| those tables in the <codeph>training</codeph> database. |
| </li> |
| </ul> |
| |
| <codeblock>[groups] |
| supergroup = training_sysadmin |
| employee = instructor |
| visitor = student |
| |
| [roles] |
| training_sysadmin = server=server1->db=training |
| instructor = server=server1->db=training->table=*->action=* |
| student = server=server1->db=training->table=*->action=SELECT |
| </codeblock> |
| |
| </example> |
| </conbody> |
| </concept> |
| |
| <concept id="security_multiple_policy_files"> |
| |
| <title>Using Multiple Policy Files for Different Databases</title> |
| |
| <conbody> |
| |
| <p> |
| For an Impala cluster with many databases being accessed by many users and applications, it might be |
| cumbersome to update the security policy file for each privilege change or each new database, table, or |
| view. You can allow security to be managed separately for individual databases, by setting up a separate |
| policy file for each database: |
| </p> |
| |
| <ul> |
| <li> |
| Add the optional <codeph>[databases]</codeph> section to the main policy file. |
| </li> |
| |
| <li> |
| Add entries in the <codeph>[databases]</codeph> section for each database that has its own policy file. |
| </li> |
| |
| <li> |
| For each listed database, specify the HDFS path of the appropriate policy file. |
| </li> |
| </ul> |
| |
| <p> |
| For example: |
| </p> |
| |
| <codeblock>[databases] |
| # Defines the location of the per-DB policy files for the 'customers' and 'sales' databases. |
| customers = hdfs://ha-nn-uri/etc/access/customers.ini |
| sales = hdfs://ha-nn-uri/etc/access/sales.ini |
| </codeblock> |
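| |
| <p> |
| As a minimal sketch, a per-DB file such as <codeph>customers.ini</codeph> might look like the following. |
| This assumes that the per-DB file uses the same <codeph>[groups]</codeph> and <codeph>[roles]</codeph> |
| layout as the main policy file. The <codeph>cust_analysts</codeph> group and the |
| <codeph>customers_read</codeph> role are hypothetical names, and <codeph>server1</codeph> is assumed to |
| be the server name used elsewhere in the policy. |
| </p> |
| |
| <codeblock>[groups] |
| # Hypothetical group and role names, for illustration only. |
| cust_analysts = customers_read |
| |
| [roles] |
| # The grant is kept within the customers database that this file is registered for. |
| customers_read = server=server1->db=customers->table=*->action=SELECT |
| </codeblock> |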
| |
| <p> |
| To enable URIs in per-DB policy files, the Java configuration option <codeph>sentry.allow.uri.db.policyfile</codeph> |
| must be set to <codeph>true</codeph>. For example: |
| </p> |
| |
| <codeblock>JAVA_TOOL_OPTIONS="-Dsentry.allow.uri.db.policyfile=true" |
| </codeblock> |
| |
| <note type="important"> |
| Enabling URIs in per-DB policy files introduces a security risk by allowing the owner of the db-level |
| policy file to grant themselves load privileges to anything the <codeph>impala</codeph> user has |
| read permissions for in HDFS (including data in other databases controlled by different db-level policy |
| files). |
| </note> |
| </conbody> |
| </concept> |
| </concept> |
| |
| <concept id="security_schema"> |
| |
| <title>Setting Up Schema Objects for a Secure Impala Deployment</title> |
| |
| <conbody> |
| |
| <p> |
| Remember that in your role definitions, you specify privileges at the level of individual databases and |
| tables, or all databases or all tables within a database. To simplify the structure of these rules, plan |
| ahead of time how to name your schema objects so that data with different authorization requirements is |
| divided into separate databases. |
| </p> |
| |
| <p> |
| If you are adding security on top of an existing Impala deployment, remember that you can rename tables or |
| even move them between databases using the <codeph>ALTER TABLE</codeph> statement. In Impala, creating new |
| databases is a relatively inexpensive operation, basically just creating a new directory in HDFS. |
| </p> |
| |
| <p> |
| You can also plan the security scheme and set up the policy file before the actual schema objects named in |
| the policy file exist. Because the authorization capability is based on whitelisting, a user can only |
| create a new database or table if the required privilege is already in the policy file: either by listing |
| the exact name of the object being created, or a <codeph>*</codeph> wildcard to match all the applicable |
| objects within the appropriate container. |
| </p> |
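| |
| <p> |
| The two approaches can be sketched as follows. The database names <codeph>staging</codeph> and |
| <codeph>analytics</codeph> and both role names are hypothetical, and neither database is assumed to |
| exist yet when the policy file is written. |
| </p> |
| |
| <codeblock>[roles] |
| # Exact name: full privilege on the database named staging, so it can be created later. |
| staging_admin = server=server1->db=staging |
| # Wildcard: any action on any table, existing or future, within the analytics database. |
| analytics_writer = server=server1->db=analytics->table=*->action=* |
| </codeblock> |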
| </conbody> |
| </concept> |
| |
| <concept id="security_privileges"> |
| |
| <title>Privilege Model and Object Hierarchy</title> |
| |
| <conbody> |
| |
| <p> |
| Privileges can be granted on different objects in the schema. Any privilege that can be granted is |
| associated with a level in the object hierarchy. If a privilege is granted on a container object in the |
| hierarchy, its child objects automatically inherit it. This is the same privilege model as Hive and other |
| database systems such as MySQL. |
| </p> |
| |
| <p> |
| The kinds of objects in the schema hierarchy are: |
| </p> |
| |
| <codeblock>Server |
| URI |
| Database |
| Table |
| </codeblock> |
| |
| <p> |
| The server name is specified by the <codeph>-server_name</codeph> option when <cmdname>impalad</cmdname> |
| starts. Specify the same name for all <cmdname>impalad</cmdname> nodes in the cluster. |
| </p> |
| |
| <p> |
| URIs represent the HDFS paths you specify as part of statements such as <codeph>CREATE EXTERNAL |
| TABLE</codeph> and <codeph>LOAD DATA</codeph>. Typically, you specify what look like UNIX paths, but these |
| locations can also be prefixed with <codeph>hdfs://</codeph> to make clear that they are really URIs. To |
| set privileges for a URI, specify the name of a directory, and the privilege applies to all the files in |
| that directory and any directories underneath it. |
| </p> |
| |
| <p rev="2.3.0 collevelauth"> |
| In <keyword keyref="impala23_full"/> and higher, you can specify privileges for individual columns. |
| Formerly, to specify read privileges at this level, you created a view that queried specific columns |
| and/or partitions from a base table, and gave <codeph>SELECT</codeph> privilege on the view but not |
| the underlying table. Now, you can use Impala's <xref href="impala_grant.xml"/> and |
| <xref href="impala_revoke.xml"/> statements to assign and revoke privileges from specific columns |
| in a table. |
| </p> |
| |
| <p> |
| URIs must start with either <codeph>hdfs://</codeph> or <codeph>file://</codeph>. If a URI starts with |
| anything else, it will cause an exception and the policy file will be invalid. When defining URIs for HDFS, |
| you must also specify the NameNode. For example: |
| <codeblock>data_read = server=server1->uri=file:///path/to/dir, \ |
| server=server1->uri=hdfs://namenode:port/path/to/dir |
| </codeblock> |
| <note type="warning"> |
| <p> |
| Because the NameNode host and port must be specified, enable High Availability (HA) to ensure |
| that the URI will remain constant even if the NameNode changes. |
| </p> |
| <codeblock>data_read = server=server1->uri=file:///path/to/dir, \ |
| server=server1->uri=hdfs://ha-nn-uri/path/to/dir |
| </codeblock> |
| </note> |
| </p> |
| |
| <!-- Experiment with replacing my original copied table with a conref'ed version of Ambreen's from the Security Guide. --> |
| |
| <!-- |
| <table> |
| <title>Sentry privilege types and objects they apply to</title> |
| <tgroup cols="2"> |
| <colspec colnum="1" colname="col1"/> |
| <colspec colnum="2" colname="col2"/> |
| <tbody> |
| <row> |
| <entry>Privilege</entry> |
| <entry>Object</entry> |
| </row> |
| <row> |
| <entry>INSERT</entry> |
| <entry>TABLE, URI</entry> |
| </row> |
| <row> |
| <entry>SELECT</entry> |
| <entry>TABLE, VIEW, URI</entry> |
| </row> |
| <row> |
| <entry>ALL</entry> |
| <entry>SERVER, DB, URI</entry> |
| </row> |
| </tbody> |
| </tgroup> |
| </table> |
| --> |
| |
| <table conref="../shared/impala_common.xml#common/sentry_privileges_objects"> |
| <tgroup cols="2"> |
| <colspec colnum="1" colname="col1" colwidth="1*"/> |
| <tbody> |
| <row> |
| <entry/> |
| </row> |
| </tbody> |
| </tgroup> |
| </table> |
| |
| <note> |
| <p> |
| Although this document refers to the <codeph>ALL</codeph> privilege, currently if you use the policy file |
| mode, you do not use the actual keyword <codeph>ALL</codeph> in the policy file. When you code role |
| entries in the policy file: |
| </p> |
| <ul> |
| <li> |
| To specify the <codeph>ALL</codeph> privilege for a server, use a role like |
| <codeph>server=<varname>server_name</varname></codeph>. |
| </li> |
| |
| <li> |
| To specify the <codeph>ALL</codeph> privilege for a database, use a role like |
| <codeph>server=<varname>server_name</varname>->db=<varname>database_name</varname></codeph>. |
| </li> |
| |
| <li> |
| To specify the <codeph>ALL</codeph> privilege for a table, use a role like |
| <codeph>server=<varname>server_name</varname>->db=<varname>database_name</varname>->table=<varname>table_name</varname>->action=*</codeph>. |
| </li> |
| </ul> |
| </note> |
| <table> |
| <tgroup cols="4"> |
| <colspec colnum="1" colname="col1" colwidth="1.31*"/> |
| <colspec colnum="2" colname="col2" colwidth="1.17*"/> |
| <colspec colnum="3" colname="col3" colwidth="1*"/> |
| <colspec colname="newCol4" colnum="4" colwidth="1*"/> |
| <thead> |
| <row> |
| <entry> |
| Operation |
| </entry> |
| <entry> |
| Scope |
| </entry> |
| <entry> |
| Privileges |
| </entry> |
| <entry> |
| URI |
| </entry> |
| </row> |
| </thead> |
| <tbody> |
| <row conref="../shared/impala_common.xml#common/explain_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/load_data_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/create_database_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/drop_database_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/create_table_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/drop_table_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/describe_table_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/alter_table_add_columns_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/alter_table_replace_columns_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/alter_table_change_column_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/alter_table_rename_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/alter_table_set_tblproperties_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/alter_table_set_fileformat_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/alter_table_set_location_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/alter_table_add_partition_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/alter_table_add_partition_location_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/alter_table_drop_partition_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/alter_table_partition_set_fileformat_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/alter_table_set_serdeproperties_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/create_view_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/drop_view_privs"> |
| <entry/> |
| </row> |
| <row id="alter_view_privs"> |
| <entry> |
| ALTER VIEW |
| </entry> |
| <entry rev="2.3.0 collevelauth"> |
| You need <codeph>ALL</codeph> privilege on the named view <ph rev="1.4.0">and the parent |
| database</ph>, plus <codeph>SELECT</codeph> privilege for any tables or views referenced by the |
| view query. Once the view is created or altered by a high-privileged system administrator, it can |
| be queried by a lower-privileged user who does not have full query privileges for the base tables. |
| </entry> |
| <entry> |
| ALL, SELECT |
| </entry> |
| <entry/> |
| </row> |
| <row id="create_external_table_privs"> |
| <entry> |
| CREATE EXTERNAL TABLE |
| </entry> |
| <entry> |
| Database (ALL), URI (SELECT) |
| </entry> |
| <entry> |
| ALL, SELECT |
| </entry> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/select_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/use_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/create_function_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/drop_function_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/refresh_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/invalidate_metadata_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/invalidate_metadata_table_privs"> |
| <entry/> |
| </row> |
| <row conref="../shared/impala_common.xml#common/compute_stats_privs"> |
| <entry/> |
| </row> |
| <row id="show_table_stats_privs"> |
| <entry> |
| SHOW TABLE STATS, SHOW PARTITIONS |
| </entry> |
| <entry> |
| TABLE |
| </entry> |
| <entry> |
| SELECT/INSERT |
| </entry> |
| <entry/> |
| </row> |
| <row> |
| <entry id="show_column_stats_privs"> |
| SHOW COLUMN STATS |
| </entry> |
| <entry> |
| TABLE |
| </entry> |
| <entry> |
| SELECT/INSERT |
| </entry> |
| <entry/> |
| </row> |
| <row> |
| <entry id="show_functions_privs"> |
| SHOW FUNCTIONS |
| </entry> |
| <entry> |
| DATABASE |
| </entry> |
| <entry> |
| SELECT |
| </entry> |
| <entry/> |
| </row> |
| <row id="show_tables_privs"> |
| <entry> |
| SHOW TABLES |
| </entry> |
| <entry/> |
| <entry> |
| No special privileges needed to issue the statement, but only shows objects you are authorized for |
| </entry> |
| <entry/> |
| </row> |
| <row id="show_databases_privs"> |
| <entry> |
| SHOW DATABASES, SHOW SCHEMAS |
| </entry> |
| <entry/> |
| <entry> |
| No special privileges needed to issue the statement, but only shows objects you are authorized for |
| </entry> |
| <entry/> |
| </row> |
| </tbody> |
| </tgroup> |
| </table> |
| |
| </conbody> |
| </concept> |
| |
| <concept id="sentry_debug"> |
| |
| <title><ph conref="../shared/impala_common.xml#common/title_sentry_debug"/></title> |
| |
| <conbody> |
| |
| <p conref="../shared/impala_common.xml#common/sentry_debug"/> |
| </conbody> |
| </concept> |
| |
| <concept id="sec_ex_default"> |
| |
| <title>The DEFAULT Database in a Secure Deployment</title> |
| |
| <conbody> |
| |
| <p> |
| Because of the extra emphasis on granular access controls in a secure deployment, you should move any |
| important or sensitive information out of the <codeph>DEFAULT</codeph> database into a named database whose |
| privileges are specified in the policy file. Sometimes you might need to give privileges on the |
| <codeph>DEFAULT</codeph> database for administrative reasons; for example, as a place you can reliably |
| specify with a <codeph>USE</codeph> statement when preparing to drop a database. |
| </p> |
| |
| <!-- Maybe have an example later, but not for initial 1.1 release. |
| <codeblock></codeblock> |
| --> |
| </conbody> |
| </concept> |
| </concept> |