 <?xml version="1.0" encoding="UTF-8"?>
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
 <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
 <concept rev="1.1" id="authorization">

   <title>Enabling Sentry Authorization for Impala</title>
   <prolog>
     <metadata>
       <data name="Category" value="Security"/>
       <data name="Category" value="Sentry"/>
       <data name="Category" value="Impala"/>
       <data name="Category" value="Configuring"/>
       <data name="Category" value="Starting and Stopping"/>
       <data name="Category" value="Users"/>
       <data name="Category" value="Groups"/>
       <data name="Category" value="Administrators"/>
     </metadata>
   </prolog>

   <conbody id="sentry">

     <p>
       Authorization determines which users are allowed to access which resources, and what operations they are
       allowed to perform. In Impala 1.1 and higher, you use Apache Sentry for
       authorization. Sentry adds a fine-grained authorization framework for Hadoop. By default (when authorization
       is not enabled), Impala does all read and write operations with the privileges of the <codeph>impala</codeph>
       user, which is suitable for a development/test environment but not for a secure production environment. When
       authorization is enabled, Impala uses the OS user ID of the user who runs <cmdname>impala-shell</cmdname> or
       other client program, and associates various privileges with each user.
     </p>

     <note>
       Sentry is typically used in conjunction with Kerberos authentication, which defines which hosts are allowed
       to connect to each server. Using the combination of Sentry and Kerberos prevents malicious users from being
       able to connect by creating a named account on an untrusted machine. See
       <xref href="impala_kerberos.xml#kerberos"/> for details about Kerberos authentication.
     </note>

     <p audience="PDF" outputclass="toc inpage">
       See the following sections for details about using the Impala authorization features:
     </p>
   </conbody>

   <concept id="sentry_priv_model">

     <title>The Sentry Privilege Model</title>

     <conbody>

       <p>
         Privileges can be granted on different objects in the schema. Any privilege that can be granted is
         associated with a level in the object hierarchy. If a privilege is granted on a container object in the
         hierarchy, the child object automatically inherits it. This is the same privilege model as Hive and other
         database systems such as MySQL.
       </p>

       <p rev="2.3.0 collevelauth">
         The object hierarchy for Impala covers Server, URI, Database, Table, and Column. (The Table privileges apply to views as well;
         anywhere you specify a table name, you can specify a view name instead.)
         Column-level authorization is available in <keyword keyref="impala23_full"/> and higher.
         Previously, you constructed views to query specific columns and assigned privilege based on
         the views rather than the base tables. Now, you can use Impala's <xref href="impala_grant.xml"/> and
         <xref href="impala_revoke.xml"/> statements to assign and revoke privileges from specific columns
         in a table.
       </p>

       <p>
         A restricted set of privileges determines what you can do with each object:
       </p>

       <dl>
         <dlentry id="select_priv">

           <dt>
             SELECT privilege
           </dt>

           <dd>
             Lets you read data from a table or view, for example with the <codeph>SELECT</codeph> statement, the
             <codeph>INSERT...SELECT</codeph> syntax, or <codeph>CREATE TABLE...LIKE</codeph>. Also required to
             issue the <codeph>DESCRIBE</codeph> statement or the <codeph>EXPLAIN</codeph> statement for a query
             against a particular table. Only objects for which a user has this privilege are shown in the output
             for <codeph>SHOW DATABASES</codeph> and <codeph>SHOW TABLES</codeph> statements. The
             <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph> statements only access
             metadata for tables for which the user has this privilege.
           </dd>

         </dlentry>

         <dlentry id="insert_priv">

           <dt>
             INSERT privilege
           </dt>

           <dd>
             Lets you write data to a table. Applies to the <codeph>INSERT</codeph> and <codeph>LOAD DATA</codeph>
             statements.
           </dd>

         </dlentry>

         <dlentry id="all_priv">

           <dt>
             ALL privilege
           </dt>

           <dd>
             Lets you create or modify the object. Required to run DDL statements such as <codeph>CREATE
             TABLE</codeph>, <codeph>ALTER TABLE</codeph>, or <codeph>DROP TABLE</codeph> for a table,
             <codeph>CREATE DATABASE</codeph> or <codeph>DROP DATABASE</codeph> for a database, or <codeph>CREATE
             VIEW</codeph>, <codeph>ALTER VIEW</codeph>, or <codeph>DROP VIEW</codeph> for a view. Also required for
             the URI of the <q>location</q> parameter for the <codeph>CREATE EXTERNAL TABLE</codeph> and
             <codeph>LOAD DATA</codeph> statements.
 <!-- Have to think about the best wording, how often to repeat, how best to conref this caveat.
           You do not actually code the keyword <codeph>ALL</codeph> in the policy file; instead you use
           <codeph>action=*</codeph> or shorten the right-hand portion of the rule.
           -->
           </dd>

         </dlentry>
       </dl>
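
       <p>
         As a minimal sketch of how these privileges might appear as rules in a policy file (described later in
         this topic), the following <codeph>[roles]</codeph> section defines one role for each privilege. The
         role names, the <codeph>sales_db</codeph> database, and the <codeph>orders</codeph> table are
         hypothetical names chosen for illustration, and <codeph>server1</codeph> is assumed to match the value
         of the <codeph>-server_name</codeph> startup option.
       </p>

 <codeblock>[roles]
 # Hypothetical role: read-only access to one table.
 orders_reader = server=server1-&gt;db=sales_db-&gt;table=orders-&gt;action=SELECT
 # Hypothetical role: permission to write data to the same table.
 orders_writer = server=server1-&gt;db=sales_db-&gt;table=orders-&gt;action=INSERT
 # Hypothetical role: all privileges on that table.
 orders_owner = server=server1-&gt;db=sales_db-&gt;table=orders-&gt;action=*
 </codeblock>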

       <p>
         Privileges can be specified for a table or view before that object actually exists. If you do not have
         sufficient privilege to perform an operation, the error message does not disclose whether the object
         exists.
       </p>

       <p>
         Originally, privileges were encoded in a policy file, stored in HDFS. This mode of operation is still an
         option, but the emphasis of privilege management is moving towards being SQL-based. In Impala 1.4.0 and
         higher, Impala can make use of privileges assigned through <codeph>GRANT</codeph> and
         <codeph>REVOKE</codeph> statements issued through Hive, and in <keyword keyref="impala20_full"/> and
         higher you can issue those statements in Impala itself. The mode of operation with
         <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements instead of the policy file requires that
         a special Sentry service be enabled; this service stores, retrieves, and manipulates privilege
         information stored inside the metastore database.
       </p>
     </conbody>
   </concept>

   <concept id="secure_startup">

     <title>Starting the impalad Daemon with Sentry Authorization Enabled</title>
   <prolog>
     <metadata>
       <data name="Category" value="Starting and Stopping"/>
     </metadata>
   </prolog>

     <conbody>

       <p>
         To run the <cmdname>impalad</cmdname> daemon with authorization enabled, you add one or more options to the
         <codeph>IMPALA_SERVER_ARGS</codeph> declaration in the <filepath>/etc/default/impala</filepath>
         configuration file:
       </p>

       <ul>
         <li>
           The <codeph>-server_name</codeph> option turns on Sentry authorization for Impala. The authorization
           rules refer to a symbolic server name, and you specify the name to use as the argument to the
           <codeph>-server_name</codeph> option.
         </li>

         <li rev="1.4.0">
           If you specify just <codeph>-server_name</codeph>, Impala uses the Sentry service for authorization,
           relying on the results of <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements issued through
           Hive. (This mode of operation is available in Impala 1.4.0 and higher.) Prior to Impala 1.4.0, or if you
           want to continue storing privilege rules in the policy file, also specify the
           <codeph>-authorization_policy_file</codeph> option as in the following item.
         </li>

         <li>
           Specifying the <codeph>-authorization_policy_file</codeph> option in addition to
           <codeph>-server_name</codeph> makes Impala read privilege information from a policy file, rather than
           from the metastore database. The argument to the <codeph>-authorization_policy_file</codeph> option
           specifies the HDFS path to the policy file that defines the privileges on different schema objects.
         </li>
       </ul>

       <p rev="1.4.0">
         For example, you might adapt your <filepath>/etc/default/impala</filepath> configuration to contain lines
         like the following. To use the Sentry service rather than the policy file:
       </p>

 <codeblock rev="1.4.0">IMPALA_SERVER_ARGS=" \
 -server_name=server1 \
 ...
 </codeblock>

       <p>
         Or to use the policy file, as in releases prior to Impala 1.4:
       </p>

 <codeblock>IMPALA_SERVER_ARGS=" \
 -authorization_policy_file=/user/hive/warehouse/auth-policy.ini \
 -server_name=server1 \
 ...
 </codeblock>

       <p>
         The preceding examples set up a symbolic name of <codeph>server1</codeph> to refer to the current instance
         of Impala. This symbolic name is used in the following ways:
       </p>

       <ul>
         <li>
           <p>
             Specify the <codeph>server1</codeph> value for the <codeph>sentry.hive.server</codeph> property in the
             <filepath>sentry-site.xml</filepath> configuration file for Hive, as well as in the
             <codeph>-server_name</codeph> option for <cmdname>impalad</cmdname>.
           </p>
           <p>
             If the <cmdname>impalad</cmdname> daemon is not already running, start it as described in
             <xref href="impala_processes.xml#processes"/>. If it is already running, restart it with the command
             <codeph>sudo /etc/init.d/impala-server restart</codeph>. Run the appropriate commands on all the nodes
             where <cmdname>impalad</cmdname> normally runs.
           </p>
         </li>

         <li>
           <p>
             If you use the mode of operation using the policy file, the rules in the <codeph>[roles]</codeph>
             section of the policy file refer to this same <codeph>server1</codeph> name. For example, the following
             rule sets up a role <codeph>report_generator</codeph> that lets users with that role query any table in
             a database named <codeph>reporting_db</codeph> on a node where the <cmdname>impalad</cmdname> daemon
             was started up with the <codeph>-server_name=server1</codeph> option:
           </p>
 <codeblock>[roles]
 report_generator = server=server1-&gt;db=reporting_db-&gt;table=*-&gt;action=SELECT
 </codeblock>
         </li>
       </ul>

       <p>
         When <cmdname>impalad</cmdname> is started with one or both of the <codeph>-server_name=server1</codeph>
         and <codeph>-authorization_policy_file</codeph> options, Impala authorization is enabled. If Impala detects
         any errors or inconsistencies in the authorization settings or the policy file, the daemon refuses to
         start.
       </p>
     </conbody>
   </concept>

   <concept id="sentry_service">

     <title>Using Impala with the Sentry Service (<keyword keyref="impala14"/> or higher only)</title>

     <conbody>

       <p>
         When you use the Sentry service rather than the policy file, you set up privileges through
         <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements in either Impala or Hive; both components
         then use those same privileges automatically. (Impala added the <codeph>GRANT</codeph> and
         <codeph>REVOKE</codeph> statements in <keyword keyref="impala20_full"/>.)
       </p>

     </conbody>
   </concept>

   <concept id="security_policy_file">

     <title>Using Impala with the Sentry Policy File</title>

     <conbody>

       <p>
         You put the policy file in a designated location in HDFS. It is read during the startup of
         the <cmdname>impalad</cmdname> daemon when you specify both the <codeph>-server_name</codeph> and
         <codeph>-authorization_policy_file</codeph> startup options. It controls which objects (databases, tables,
         and HDFS directory paths) can be accessed by the user who connects to <cmdname>impalad</cmdname>, and what
         operations that user can perform on the objects.
       </p>

       <note rev="1.4.0">
         <p rev="1.4.0">
           The Sentry service, as described in <xref href="impala_authorization.xml#sentry_service"/>, stores
           authorization metadata in a relational database. This means you can manage user privileges for Impala tables
           using traditional <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> SQL statements, rather than the
           policy file approach described here. If you are still using policy files, migrate to the
           database-backed service whenever practical.
         </p>
       </note>

       <p>
         The location of the policy file is listed in the <filepath>auth-site.xml</filepath> configuration file. To
         minimize overhead, the security information from this file is cached by each <cmdname>impalad</cmdname>
         daemon and refreshed automatically, with a default interval of 5 minutes. After making a substantial change
         to security policies, restart all Impala daemons to pick up the changes immediately.
       </p>

       <p outputclass="toc inpage"/>
     </conbody>

     <concept id="security_policy_file_details">

       <title>Policy File Location and Format</title>

       <conbody>

         <p>
           The policy file uses the familiar <codeph>.ini</codeph> format, divided into the major sections
           <codeph>[groups]</codeph> and <codeph>[roles]</codeph>. There is also an optional
           <codeph>[databases]</codeph> section, which allows you to specify a specific policy file for a particular
           database, as explained in <xref href="#security_multiple_policy_files"/>. Another optional section,
           <codeph>[users]</codeph>, allows you to override the OS-level mapping of users to groups; that is an
           advanced technique primarily for testing and debugging, and is beyond the scope of this document.
         </p>

         <p>
           In the <codeph>[groups]</codeph> section, you define various categories of users and select which roles
           are associated with each category. The group and usernames correspond to Linux groups and users on the
           server where the <cmdname>impalad</cmdname> daemon runs.
         </p>

         <p>
           When you access Impala through the <cmdname>impala-shell</cmdname> interpreter, for purposes of
           authorization, the user is the logged-in Linux user and the groups are the Linux groups that user is
           a member of. When you access Impala through the
           ODBC or JDBC interfaces, the user and password specified through the connection string are used as login
           credentials for the Linux server, and authorization is based on that username and the associated Linux
           group membership.
         </p>
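
         <p>
           For illustration, a <codeph>[groups]</codeph> section might look like the following sketch. The group
           names <codeph>analysts</codeph> and <codeph>etl_operators</codeph> and the role names are
           hypothetical. Each group name is assumed to be a Linux group on the hosts where
           <cmdname>impalad</cmdname> runs, and each role is assumed to be defined in the
           <codeph>[roles]</codeph> section of the same file.
         </p>

 <codeblock>[groups]
 # Members of the hypothetical Linux group "analysts" receive one role.
 analysts = report_reader
 # A group can be associated with several roles, separated by commas.
 etl_operators = report_reader, staging_loader
 </codeblock>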

         <p>
           In the <codeph>[roles]</codeph> section, you define a set of roles. For each role, you specify
           precisely the set of privileges that is available: which objects users with that role can access, and
           what operations they can perform on those objects. This is the lowest-level category of security
           information; the other sections in the policy file map the privileges to higher-level divisions of
           groups and users. In the <codeph>[groups]</codeph> section, you specify which roles are associated
           with which groups. The privileges are specified using patterns like:
 <codeblock>server=<varname>server_name</varname>-&gt;db=<varname>database_name</varname>-&gt;table=<varname>table_name</varname>-&gt;action=SELECT
 server=<varname>server_name</varname>-&gt;db=<varname>database_name</varname>-&gt;table=<varname>table_name</varname>-&gt;action=CREATE
 server=<varname>server_name</varname>-&gt;db=<varname>database_name</varname>-&gt;table=<varname>table_name</varname>-&gt;action=ALL
 </codeblock>
           For the <varname>server_name</varname> value, substitute the same symbolic name you specify with the
           <cmdname>impalad</cmdname> <codeph>-server_name</codeph> option. You can use <codeph>*</codeph> wildcard
           characters at each level of the privilege specification to allow access to all such objects. For example:
 <codeblock>server=impala-host.example.com-&gt;db=default-&gt;table=t1-&gt;action=SELECT
 server=impala-host.example.com-&gt;db=*-&gt;table=*-&gt;action=CREATE
 server=impala-host.example.com-&gt;db=*-&gt;table=audit_log-&gt;action=SELECT
 server=impala-host.example.com-&gt;db=default-&gt;table=t1-&gt;action=*
 </codeblock>
         </p>

         <p>
           When authorization is enabled, Impala uses the policy file as a <i>whitelist</i>, representing every
           privilege available to any user on any object. That is, only operations specified for the appropriate
           combination of object, role, group, and user are allowed; all other operations are not allowed. If a
           group or role is defined multiple times in the policy file, the last definition takes precedence.
         </p>

         <p>
           To understand the notion of whitelisting, set up a minimal policy file that does not provide any
           privileges for any object. When you connect to an Impala node where this policy file is in effect, you
           get no results for <codeph>SHOW DATABASES</codeph>, and an error when you issue <codeph>SHOW
           TABLES</codeph>, <codeph>USE <varname>database_name</varname></codeph>, <codeph>DESCRIBE
           <varname>table_name</varname></codeph>, <codeph>SELECT</codeph>, or any other statement that expects
           to access databases or tables, even if the corresponding databases and tables exist.
         </p>

         <p>
           The contents of the policy file are cached, to avoid a performance penalty for each query. The policy
           file is re-checked by each <cmdname>impalad</cmdname> node every 5 minutes. When you make a
           non-time-sensitive change such as adding new privileges or new users, you can let the change take effect
           automatically a few minutes later. If you remove or reduce privileges, and want the change to take effect
           immediately, restart the <cmdname>impalad</cmdname> daemon on all nodes, again specifying the
           <codeph>-server_name</codeph> and <codeph>-authorization_policy_file</codeph> options so that the rules
           from the updated policy file are applied.
         </p>
       </conbody>
     </concept>

     <concept id="security_examples">

       <title>Examples of Policy File Rules for Security Scenarios</title>

       <conbody>

         <p>
           The following examples show rules that might go in the policy file to deal with various
           authorization-related scenarios. For illustration purposes, this section shows several very small policy
           files with only a few rules each. In your environment, typically you would define many roles to cover all
           the scenarios involving your own databases, tables, and applications, and a smaller number of groups,
           whose members are given the privileges from one or more roles.
         </p>

         <example id="sec_ex_unprivileged">

           <title>A User with No Privileges</title>

           <p>
             If a user has no privileges at all, that user cannot access any schema objects in the system. The error
             messages do not disclose the names or existence of objects that the user is not authorized to read.
           </p>

           <p>
 <!--        This example demonstrates the lack of privileges using a blank policy file, so no users have any privileges. -->
             This is the experience you want a user to have if they somehow log into a system where they are not an
             authorized Impala user. In a real deployment with a filled-in policy file, a user might have no
             privileges because they are not a member of any of the relevant groups mentioned in the policy file.
           </p>
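
           <p>
             As a sketch of a filled-in policy file that still leaves some users without privileges, the
             following rules associate a role only with a hypothetical <codeph>analysts</codeph> group. A user
             who is not a member of that group has no privileges on any object. The group and role names are
             assumptions made for illustration.
           </p>

 <codeblock>[groups]
 # Only the hypothetical Linux group "analysts" is mapped to a role.
 analysts = report_reader

 [roles]
 report_reader = server=server1-&gt;db=reporting_db-&gt;table=*-&gt;action=SELECT
 </codeblock>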

 <!-- Have the raw material but not formatted into easily digestible example. Do for first 1.1 doc refresh.
 <codeblock></codeblock> -->

         </example>

         <example id="sec_ex_superuser">

           <title>Examples of Privileges for Administrative Users</title>

           <p>
             When an administrative user has broad access to tables or databases, the associated rules in the
             <codeph>[roles]</codeph> section typically use wildcards and/or inheritance. For example, in the
             following sample policy file, <codeph>db=*</codeph> refers to all databases and
             <codeph>db=*-&gt;table=*</codeph> refers to all tables in all databases.
           </p>

           <p>
             Omitting the rightmost portion of a rule means that the privileges apply to all the objects that could
             be specified there. For example, in the following sample policy file, the
             <codeph>all_databases</codeph> role has all privileges for all tables in all databases, while the
             <codeph>one_database</codeph> role has all privileges for all tables in one specific database. The
             <codeph>all_databases</codeph> role does not grant privileges on URIs, so a group with that role could
             not issue a <codeph>CREATE TABLE</codeph> statement with a <codeph>LOCATION</codeph> clause. The
             <codeph>entire_server</codeph> role has all privileges on both databases and URIs within the server.
           </p>

 <codeblock>[groups]
 supergroup = all_databases

 [roles]
 read_all_tables = server=server1-&gt;db=*-&gt;table=*-&gt;action=SELECT
 all_tables = server=server1-&gt;db=*-&gt;table=*
 all_databases = server=server1-&gt;db=*
 one_database = server=server1-&gt;db=test_db
 entire_server = server=server1
 </codeblock>

         </example>

         <example id="sec_ex_detailed">

           <title>A User with Privileges for Specific Databases and Tables</title>

           <p>
             If a user has privileges for specific tables in specific databases, the user can access those objects
             but nothing else. They can see the tables and their parent databases in the output of <codeph>SHOW
             TABLES</codeph> and <codeph>SHOW DATABASES</codeph>, <codeph>USE</codeph> the appropriate databases,
             and perform the relevant actions (<codeph>SELECT</codeph> and/or <codeph>INSERT</codeph>) based on the
             table privileges. To actually create a table requires the <codeph>ALL</codeph> privilege at the
             database level, so you might define separate roles for the user that sets up a schema and other users
             or applications that perform day-to-day operations on the tables.
           </p>

           <p>
             The following sample policy file shows some of the syntax that is appropriate as the policy file grows,
             such as the <codeph>#</codeph> comment syntax, <codeph>\</codeph> continuation syntax, and comma
             separation for roles assigned to groups or privileges assigned to roles.
           </p>

 <codeblock>[groups]
 employee = training_sysadmin, instructor
 visitor = student

 [roles]
 training_sysadmin = server=server1-&gt;db=training, \
 server=server1-&gt;db=instructor_private, \
 server=server1-&gt;db=lesson_development
 instructor = server=server1-&gt;db=training-&gt;table=*-&gt;action=*, \
 server=server1-&gt;db=instructor_private-&gt;table=*-&gt;action=*, \
 server=server1-&gt;db=lesson_development-&gt;table=lesson*
 # This particular course is all about queries, so the students can SELECT but not INSERT or CREATE/DROP.
 student = server=server1-&gt;db=training-&gt;table=lesson_*-&gt;action=SELECT
 </codeblock>

         </example>

 <!--
 <example id="sec_ex_superuser_single_db">
 <title>A User with Full Privileges for a Specific Database</title>
 <p>

 </p>
 <codeblock></codeblock>
 </example>

 <example id="sec_ex_readonly_single_db">
 <title>A User with Read-Only Privileges for a Specific Database</title>
 <p>

 </p>
 <codeblock></codeblock>
     <p>
       If a user has <codeph>SELECT</codeph> privilege for a database, they can issue a <codeph>USE</codeph> statement
       for that database. Whether or not they can access tables within the database depends on further privileges
       defined at the table level.
     </p>

 <codeblock></codeblock>

 </example>

 <example id="sec_ex_superuser_single_table">
 <title>A User with Full Privileges for a Specific Table</title>
     <p>
       If a user has <codeph>SELECT</codeph> privilege for a table, they can query, describe, or explain queries for
       that table.
     </p>

 <codeblock></codeblock>
 </example>

 <example id="sec_ex_load_data">
 <title>A User with Privileges to Load Data but not Read Data</title>

     <p>
       If a user has <codeph>INSERT</codeph> privilege for a table, they can write to the table if it already exists.
       They cannot create or alter the table; those operations require the <codeph>ALL</codeph> privilege.
     </p>
 <codeblock></codeblock>
 </example>
 -->

         <example id="sec_ex_external_files">

           <title>Privileges for Working with External Data Files</title>

           <p>
             When data is being inserted through the <codeph>LOAD DATA</codeph> statement, or is referenced from an
             HDFS location outside the normal Impala database directories, the user also needs appropriate
             permissions on the URIs corresponding to those HDFS locations.
           </p>

           <p>
             In this sample policy file:
           </p>

           <ul>
             <li>
               The <codeph>external_table</codeph> role lets us insert into and query the Impala table,
               <codeph>external_table.sample</codeph>.
             </li>

             <li>
               The <codeph>staging_dir</codeph> role lets us specify the HDFS path
               <filepath>/user/username/external_data</filepath> with the <codeph>LOAD DATA</codeph> statement.
               Remember, when Impala queries or loads data files, it operates on all the files in that directory,
               not just a single file, so any Impala <codeph>LOCATION</codeph> parameters refer to a directory
               rather than an individual file.
             </li>

             <li>
               We included the IP address and port of the Hadoop name node in the HDFS URI of the
               <codeph>staging_dir</codeph> rule. We found those details in
               <filepath>/etc/hadoop/conf/core-site.xml</filepath>, under the <codeph>fs.default.name</codeph>
               element. That is what we use in any roles that specify URIs (that is, the locations of directories in
               HDFS).
             </li>

             <li>
               We start this example after the table <codeph>external_table.sample</codeph> is already created. In
               the policy file for the example, we have already taken away the <codeph>external_table_admin</codeph>
               role from the <codeph>username</codeph> group, and replaced it with the lesser-privileged
               <codeph>external_table</codeph> role.
             </li>

             <li>
               We assign privileges to a subdirectory underneath <filepath>/user/username</filepath> in HDFS,
               because such privileges also apply to any subdirectories underneath. If we had assigned privileges to
               the parent directory <filepath>/user/username</filepath>, a mistaken location could too easily
               affect other files.
             </li>

             <li>
               The <codeph>username</codeph> under the <codeph>[groups]</codeph> section refers to the
               <codeph>username</codeph> group. (In this example, there is a <codeph>username</codeph> user
               that is a member of a <codeph>username</codeph> group.)
             </li>
           </ul>

           <p>
             Policy file:
           </p>

 <codeblock>[groups]
 username = external_table, staging_dir

 [roles]
 external_table_admin = server=server1-&gt;db=external_table
 external_table = server=server1-&gt;db=external_table-&gt;table=sample-&gt;action=*
 staging_dir = server=server1-&gt;uri=hdfs://127.0.0.1:8020/user/username/external_data-&gt;action=*
 </codeblock>

           <p>
             <cmdname>impala-shell</cmdname> session:
           </p>

 <codeblock>[localhost:21000] &gt; use external_table;
 Query: use external_table
 [localhost:21000] &gt; show tables;
 Query: show tables
 Query finished, fetching results ...
 +--------+
 | name   |
 +--------+
 | sample |
 +--------+
 Returned 1 row(s) in 0.02s

 [localhost:21000] &gt; select * from sample;
 Query: select * from sample
 Query finished, fetching results ...
 +-----+
 | x   |
 +-----+
 | 1   |
 | 5   |
 | 150 |
 +-----+
 Returned 3 row(s) in 1.04s

 [localhost:21000] &gt; load data inpath '/user/username/external_data' into table sample;
 Query: load data inpath '/user/username/external_data' into table sample
 Query finished, fetching results ...
 +----------------------------------------------------------+
 | summary                                                  |
 +----------------------------------------------------------+
 | Loaded 1 file(s). Total files in destination location: 2 |
 +----------------------------------------------------------+
 Returned 1 row(s) in 0.26s
 [localhost:21000] &gt; select * from sample;
 Query: select * from sample
 Query finished, fetching results ...
 +-------+
 | x     |
 +-------+
 | 2     |
 | 4     |
 | 6     |
 | 8     |
 | 64738 |
 | 49152 |
 | 1     |
 | 5     |
 | 150   |
 +-------+
 Returned 9 row(s) in 0.22s

 [localhost:21000] &gt; load data inpath '/user/username/unauthorized_data' into table sample;
 Query: load data inpath '/user/username/unauthorized_data' into table sample
 ERROR: AuthorizationException: User 'username' does not have privileges to access: hdfs://127.0.0.1:8020/user/username/unauthorized_data
 </codeblock>

         </example>

         <example audience="hidden" id="sec_ex_views" rev="2.3.0 collevelauth">

           <title>Controlling Access at the Column Level through Views</title>

           <p>
             If a user has <codeph>SELECT</codeph> privilege for a view, they can query the view, even if they do
             not have any privileges on the underlying table. To see the details about the underlying table through
             <codeph>EXPLAIN</codeph> or <codeph>DESCRIBE FORMATTED</codeph> statements on the view, the user must
             also have <codeph>SELECT</codeph> privilege for the underlying table.
           </p>

           <note type="important">
             <p>
               The types of data that are considered sensitive and confidential differ depending on the jurisdiction,
               the type of industry, or both. For fine-grained access controls, set up appropriate privileges based
               on all applicable laws and regulations.
             </p>
             <p>
               Be careful using the <codeph>ALTER VIEW</codeph> statement to point an existing view at a different
               base table or a new set of columns that includes sensitive or restricted data. Make sure that any
               users who have <codeph>SELECT</codeph> privilege on the view do not gain access to any additional
               information they are not authorized to see.
             </p>
           </note>

           <p>
             The following example shows how a system administrator could set up a table containing some columns
             with sensitive information, then create a view that only exposes the non-confidential columns.
           </p>

 <codeblock>[localhost:21000] &gt; create table sensitive_info
                 &gt; (
                 &gt;   name string,
                 &gt;   address string,
                 &gt;   credit_card string,
                 &gt;   taxpayer_id string
                 &gt; );
 [localhost:21000] &gt; create view name_address_view as select name, address from sensitive_info;
 </codeblock>

           <p>
             Then the following policy file specifies read-only privilege for that view, without authorizing access
             to the underlying table:
           </p>

 <codeblock>[groups]
 employee = view_only_privs

 [roles]
 view_only_privs = server=server1-&gt;db=reports-&gt;table=name_address_view-&gt;action=SELECT
 </codeblock>

           <p>
             Thus, a user with the <codeph>view_only_privs</codeph> role could use Impala queries to access the
             basic information but not the sensitive information, even if both kinds of information were part of the
             same data file.
           </p>

           <p>
             You might define other views to allow users from different groups to query different sets of columns.
           </p>
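
           <p>
             As a minimal sketch, a second view could expose a different subset of columns to another group. The
             view name <codeph>name_taxpayer_view</codeph>, the group <codeph>accounting</codeph>, and the role
             <codeph>taxpayer_view_privs</codeph> are hypothetical names chosen for illustration:
           </p>

 <codeblock>[localhost:21000] &gt; create view name_taxpayer_view as select name, taxpayer_id from sensitive_info;
 </codeblock>

           <p>
             The policy file could then map each group to the role for its own view:
           </p>

 <codeblock>[groups]
 employee = view_only_privs
 # Hypothetical group for users who need the taxpayer ID column.
 accounting = taxpayer_view_privs

 [roles]
 view_only_privs = server=server1-&gt;db=reports-&gt;table=name_address_view-&gt;action=SELECT
 taxpayer_view_privs = server=server1-&gt;db=reports-&gt;table=name_taxpayer_view-&gt;action=SELECT
 </codeblock>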

         </example>

         <example id="sec_sysadmin">

           <title>Separating Administrator Responsibility from Read and Write Privileges</title>

           <p>
             Remember that creating a database requires full privilege on that database, while day-to-day
             operations on tables within that database can be performed with lower levels of privilege on specific
             tables. Thus, you might set up separate roles for each database or application: an administrative one
             that can create or drop the database, and a user-level one that can access only the relevant tables.
           </p>

           <p>
             For example, this policy file divides responsibilities between users in 3 different groups:
           </p>

           <ul>
             <li>
               Members of the <codeph>supergroup</codeph> group have the <codeph>training_sysadmin</codeph> role and
               so can set up a database named <codeph>training</codeph>.
             </li>

             <li> Members of the <codeph>employee</codeph> group have the
                 <codeph>instructor</codeph> role and so can create, insert into,
               and query any tables in the <codeph>training</codeph> database,
               but cannot create or drop the database itself. </li>

             <li>
               Members of the <codeph>visitor</codeph> group have the <codeph>student</codeph> role and so can query
               those tables in the <codeph>training</codeph> database.
             </li>
           </ul>

 <codeblock>[groups]
 supergroup = training_sysadmin
 employee = instructor
 visitor = student

 [roles]
 training_sysadmin = server=server1-&gt;db=training
 instructor = server=server1-&gt;db=training-&gt;table=*-&gt;action=*
 student = server=server1-&gt;db=training-&gt;table=*-&gt;action=SELECT
 </codeblock>

         </example>
       </conbody>
     </concept>

     <concept id="security_multiple_policy_files">

       <title>Using Multiple Policy Files for Different Databases</title>

       <conbody>

         <p>
           For an Impala cluster with many databases being accessed by many users and applications, it might be
           cumbersome to update the security policy file for each privilege change or each new database, table, or
           view. You can allow security to be managed separately for individual databases, by setting up a separate
           policy file for each database:
         </p>

         <ul>
           <li>
             Add the optional <codeph>[databases]</codeph> section to the main policy file.
           </li>

           <li>
             Add entries in the <codeph>[databases]</codeph> section for each database that has its own policy file.
           </li>

           <li>
             For each listed database, specify the HDFS path of the appropriate policy file.
           </li>
         </ul>

         <p>
           For example:
         </p>

 <codeblock>[databases]
 # Defines the location of the per-DB policy files for the 'customers' and 'sales' databases.
 customers = hdfs://ha-nn-uri/etc/access/customers.ini
 sales = hdfs://ha-nn-uri/etc/access/sales.ini
 </codeblock>
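
         <p>
           As a sketch of what one of these per-database files might contain, the following
           <codeph>customers.ini</codeph> assumes that a per-database policy file uses the same
           <codeph>[groups]</codeph> and <codeph>[roles]</codeph> sections as the main policy file, with privileges
           scoped to that one database. The group and role names are hypothetical:
         </p>

 <codeblock>[groups]
 # Hypothetical group names.
 manager = customers_insert, customers_select
 analyst = customers_select

 [roles]
 # Every privilege in this file refers only to the 'customers' database.
 customers_insert = server=server1-&gt;db=customers-&gt;table=*-&gt;action=INSERT
 customers_select = server=server1-&gt;db=customers-&gt;table=*-&gt;action=SELECT
 </codeblock>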

         <p>
           To enable URIs in per-DB policy files, the Java configuration option <codeph>sentry.allow.uri.db.policyfile</codeph>
           must be set to <codeph>true</codeph>. For example:
         </p>

 <codeblock>JAVA_TOOL_OPTIONS="-Dsentry.allow.uri.db.policyfile=true"
 </codeblock>

         <note type="important">
           Enabling URIs in per-DB policy files introduces a security risk by allowing the owner of the db-level
           policy file to grant themselves load privileges on anything the <codeph>impala</codeph> user has
           read permissions for in HDFS (including data in other databases controlled by different db-level policy
           files).
         </note>
       </conbody>
     </concept>
   </concept>

   <concept id="security_schema">

     <title>Setting Up Schema Objects for a Secure Impala Deployment</title>

     <conbody>

       <p>
         Remember that in your role definitions, you specify privileges at the level of individual databases and
         tables, or all databases or all tables within a database. To simplify the structure of these rules, plan
         ahead of time how to name your schema objects so that data with different authorization requirements is
         divided into separate databases.
       </p>

       <p>
         If you are adding security on top of an existing Impala deployment, remember that you can rename tables or
         even move them between databases using the <codeph>ALTER TABLE</codeph> statement. In Impala, creating new
         databases is a relatively inexpensive operation, basically just creating a new directory in HDFS.
       </p>
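
       <p>
         For instance, a table could be moved into a database reserved for restricted data along the following
         lines. The database name <codeph>restricted_db</codeph> and the table name <codeph>payroll</codeph> are
         hypothetical:
       </p>

 <codeblock>-- Create the destination database, then move the table by renaming it.
 create database if not exists restricted_db;
 alter table default.payroll rename to restricted_db.payroll;
 </codeblock>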

       <p>
         You can also plan the security scheme and set up the policy file before the actual schema objects named in
         the policy file exist. Because the authorization capability is based on whitelisting, a user can only
         create a new database or table if the required privilege is already in the policy file: either by listing
         the exact name of the object being created, or a <codeph>*</codeph> wildcard to match all the applicable
         objects within the appropriate container.
       </p>
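
       <p>
         The two approaches might look like the following sketch, which assumes a hypothetical database named
         <codeph>analytics</codeph> and hypothetical role names:
       </p>

 <codeblock>[roles]
 # Exact name: applies only to a table named 'events', whether or not it exists yet.
 events_rw = server=server1-&gt;db=analytics-&gt;table=events-&gt;action=*
 # Wildcard: applies to every table in 'analytics', including tables created later.
 any_table_rw = server=server1-&gt;db=analytics-&gt;table=*-&gt;action=*
 </codeblock>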
     </conbody>
   </concept>

   <concept id="security_privileges">

     <title>Privilege Model and Object Hierarchy</title>

     <conbody>

       <p>
         Privileges can be granted on different objects in the schema. Any privilege that can be granted is
         associated with a level in the object hierarchy. If a privilege is granted on a container object in the
         hierarchy, the child object automatically inherits it. This is the same privilege model as Hive and other
         database systems such as MySQL.
       </p>

       <p>
         The kinds of objects in the schema hierarchy are:
       </p>

 <codeblock>Server
   URI
   Database
     Table
 </codeblock>

       <p>
         The server name is specified by the <codeph>-server_name</codeph> option when <cmdname>impalad</cmdname>
         starts. Specify the same name for all <cmdname>impalad</cmdname> nodes in the cluster.
       </p>
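
       <p>
         As an illustration only, the following shows the option with the value <codeph>server1</codeph>, the
         server name used in the policy file examples in this topic. The other startup options that a real
         deployment needs are omitted here:
       </p>

 <codeblock># Assumed invocation; all other startup options are left out.
 impalad -server_name=server1
 </codeblock>

       <p>
         Role entries in the policy file would then begin with <codeph>server=server1</codeph>.
       </p>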

       <p>
         URIs represent the HDFS paths you specify as part of statements such as <codeph>CREATE EXTERNAL
         TABLE</codeph> and <codeph>LOAD DATA</codeph>. Typically, you specify what look like UNIX paths, but these
         locations can also be prefixed with <codeph>hdfs://</codeph> to make clear that they are really URIs. To
         set privileges for a URI, specify the name of a directory, and the privilege applies to all the files in
         that directory and any directories underneath it.
       </p>

       <p rev="2.3.0 collevelauth">
         In <keyword keyref="impala23_full"/> and higher, you can specify privileges for individual columns.
         Formerly, to specify read privileges at this level, you created a view that queried specific columns
         and/or partitions from a base table, and gave <codeph>SELECT</codeph> privilege on the view but not
         the underlying table. Now, you can use Impala's <xref href="impala_grant.xml"/> and
         <xref href="impala_revoke.xml"/> statements to grant privileges on, and revoke privileges from,
         specific columns in a table.
       </p>
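
       <p>
         A column-level grant and the matching revoke might look like the following sketch. The table
         <codeph>reports.customer_info</codeph>, its columns, and the role <codeph>analyst_role</codeph> are
         hypothetical, and the role is assumed to exist already:
       </p>

 <codeblock>-- Let the role read two columns, one statement per column.
 grant select(name) on table reports.customer_info to role analyst_role;
 grant select(address) on table reports.customer_info to role analyst_role;

 -- Later, withdraw access to one of the columns.
 revoke select(address) on table reports.customer_info from role analyst_role;
 </codeblock>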

       <p>
         URIs must start with either <codeph>hdfs://</codeph> or <codeph>file://</codeph>. If a URI starts with
         anything else, it will cause an exception and the policy file will be invalid. When defining URIs for HDFS,
         you must also specify the NameNode. For example:
 <codeblock>data_read = server=server1-&gt;uri=file:///path/to/dir, \
 server=server1-&gt;uri=hdfs://namenode:port/path/to/dir
 </codeblock>
         <note type="warning">
           <p>
             Because the NameNode host and port must be specified, enable High Availability (HA) to ensure
             that the URI will remain constant even if the NameNode changes.
           </p>
 <codeblock>data_read = server=server1-&gt;uri=file:///path/to/dir, \
 server=server1-&gt;uri=hdfs://ha-nn-uri/path/to/dir
 </codeblock>
         </note>
       </p>

 <!-- Experiment with replacing my original copied table with a conref'ed version of Ambreen's from the Security Guide. -->

 <!--
           <table>
           <title>Sentry privilege types and objects they apply to</title>
           <tgroup cols="2">
               <colspec colnum="1" colname="col1"/>
               <colspec colnum="2" colname="col2"/>
               <tbody>
                   <row>
                       <entry>Privilege</entry>
                       <entry>Object</entry>
                   </row>
                   <row>
                       <entry>INSERT</entry>
                       <entry>TABLE, URI</entry>
                   </row>
                   <row>
                       <entry>SELECT</entry>
                       <entry>TABLE, VIEW, URI</entry>
                   </row>
                   <row>
                       <entry>ALL</entry>
                       <entry>SERVER, DB, URI</entry>
                   </row>
               </tbody>
           </tgroup>
           </table>
 -->

       <table conref="../shared/impala_common.xml#common/sentry_privileges_objects">
         <tgroup cols="2">
           <colspec colnum="1" colname="col1" colwidth="1*"/>
           <tbody>
             <row>
               <entry/>
             </row>
           </tbody>
         </tgroup>
       </table>

       <note>
         <p>
           Although this document refers to the <codeph>ALL</codeph> privilege, currently if you use the policy file
           mode, you do not use the actual keyword <codeph>ALL</codeph> in the policy file. When you code role
           entries in the policy file:
         </p>
         <ul>
           <li>
             To specify the <codeph>ALL</codeph> privilege for a server, use a role like
             <codeph>server=<varname>server_name</varname></codeph>.
           </li>

           <li>
             To specify the <codeph>ALL</codeph> privilege for a database, use a role like
             <codeph>server=<varname>server_name</varname>-&gt;db=<varname>database_name</varname></codeph>.
           </li>

           <li>
             To specify the <codeph>ALL</codeph> privilege for a table, use a role like
             <codeph>server=<varname>server_name</varname>-&gt;db=<varname>database_name</varname>-&gt;table=<varname>table_name</varname>-&gt;action=*</codeph>.
           </li>
         </ul>
       </note>
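
       <p>
         Put together, the three forms described in the preceding note might appear in a policy file as in this
         sketch. The role, database, and table names are hypothetical:
       </p>

 <codeblock>[roles]
 # ALL on the server: no db, table, or action component.
 server_admin = server=server1
 # ALL on one database.
 sales_admin = server=server1-&gt;db=sales
 # ALL on one table, written as action=* rather than the keyword ALL.
 orders_admin = server=server1-&gt;db=sales-&gt;table=orders-&gt;action=*
 </codeblock>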
       <table>
         <tgroup cols="4">
           <colspec colnum="1" colname="col1" colwidth="1.31*"/>
           <colspec colnum="2" colname="col2" colwidth="1.17*"/>
           <colspec colnum="3" colname="col3" colwidth="1*"/>
           <colspec colname="newCol4" colnum="4" colwidth="1*"/>
           <thead>
             <row>
               <entry>
                 Operation
               </entry>
               <entry>
                 Scope
               </entry>
               <entry>
                 Privileges
               </entry>
               <entry>
                 URI
               </entry>
             </row>
           </thead>
           <tbody>
             <row conref="../shared/impala_common.xml#common/explain_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/load_data_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/create_database_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/drop_database_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/create_table_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/drop_table_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/describe_table_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/alter_table_add_columns_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/alter_table_replace_columns_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/alter_table_change_column_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/alter_table_rename_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/alter_table_set_tblproperties_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/alter_table_set_fileformat_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/alter_table_set_location_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/alter_table_add_partition_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/alter_table_add_partition_location_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/alter_table_drop_partition_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/alter_table_partition_set_fileformat_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/alter_table_set_serdeproperties_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/create_view_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/drop_view_privs">
               <entry/>
             </row>
             <row id="alter_view_privs">
               <entry>
                 ALTER VIEW
               </entry>
               <entry rev="2.3.0 collevelauth">
                 You need <codeph>ALL</codeph> privilege on the named view <ph rev="1.4.0">and the parent
                 database</ph>, plus <codeph>SELECT</codeph> privilege for any tables or views referenced by the
                 view query. Once the view is created or altered by a high-privileged system administrator, it can
                 be queried by a lower-privileged user who does not have full query privileges for the base tables.
               </entry>
               <entry>
                 ALL, SELECT
               </entry>
               <entry/>
             </row>
             <row id="create_external_table_privs">
               <entry>
                 CREATE EXTERNAL TABLE
               </entry>
               <entry>
                 Database (ALL), URI (SELECT)
               </entry>
               <entry>
                 ALL, SELECT
               </entry>
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/select_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/use_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/create_function_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/drop_function_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/refresh_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/invalidate_metadata_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/invalidate_metadata_table_privs">
               <entry/>
             </row>
             <row conref="../shared/impala_common.xml#common/compute_stats_privs">
               <entry/>
             </row>
             <row id="show_table_stats_privs">
               <entry>
                 SHOW TABLE STATS, SHOW PARTITIONS
               </entry>
               <entry>
                 TABLE
               </entry>
               <entry>
                 SELECT/INSERT
               </entry>
               <entry/>
             </row>
             <row>
               <entry id="show_column_stats_privs">
                 SHOW COLUMN STATS
               </entry>
               <entry>
                 TABLE
               </entry>
               <entry>
                 SELECT/INSERT
               </entry>
               <entry/>
             </row>
             <row>
               <entry id="show_functions_privs">
                 SHOW FUNCTIONS
               </entry>
               <entry>
                 DATABASE
               </entry>
               <entry>
                 SELECT
               </entry>
               <entry/>
             </row>
             <row id="show_tables_privs">
               <entry>
                 SHOW TABLES
               </entry>
               <entry/>
               <entry>
                 No special privileges needed to issue the statement, but only shows objects you are authorized for
               </entry>
               <entry/>
             </row>
             <row id="show_databases_privs">
               <entry>
                 SHOW DATABASES, SHOW SCHEMAS
               </entry>
               <entry/>
               <entry>
                 No special privileges needed to issue the statement, but only shows objects you are authorized for
               </entry>
               <entry/>
             </row>
           </tbody>
         </tgroup>
       </table>

     </conbody>
   </concept>

   <concept id="sentry_debug">

     <title><ph conref="../shared/impala_common.xml#common/title_sentry_debug"/></title>

     <conbody>

       <p conref="../shared/impala_common.xml#common/sentry_debug"/>
     </conbody>
   </concept>

   <concept id="sec_ex_default">

     <title>The DEFAULT Database in a Secure Deployment</title>

     <conbody>

       <p>
         Because of the extra emphasis on granular access controls in a secure deployment, you should move any
         important or sensitive information out of the <codeph>DEFAULT</codeph> database into a named database whose
         privileges are specified in the policy file. Sometimes you might need to give privileges on the
         <codeph>DEFAULT</codeph> database for administrative reasons; for example, as a place you can reliably
         specify with a <codeph>USE</codeph> statement when preparing to drop a database.
       </p>
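
       <p>
         For example, a session that removes a database might first switch to <codeph>DEFAULT</codeph>, as in
         this sketch. The database name <codeph>scratch_db</codeph> is hypothetical, and the database is assumed
         to be empty already:
       </p>

 <codeblock>-- Switch away from the database that is about to be dropped.
 use default;
 drop database scratch_db;
 </codeblock>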

 <!-- Maybe have an example later, but not for initial 1.1 release.
 <codeblock></codeblock>
 -->
     </conbody>
   </concept>
 </concept>