blob: f729bac96cfb0e9d1b657acecd746a48bf8020aa [file] [log] [blame]
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<meta name="description" content="A new open source Apache Hadoop ecosystem project, Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data" />
<meta name="author" content="Cloudera" />
<title>Apache Kudu - Fine-Grained Authorization with Apache Kudu and Impala</title>
<!-- Bootstrap core CSS -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css"
integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7"
crossorigin="anonymous">
<!-- Custom styles for this template -->
<link href="/css/kudu.css" rel="stylesheet"/>
<link href="/css/asciidoc.css" rel="stylesheet"/>
<link rel="shortcut icon" href="/img/logo-favicon.ico" />
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.1/css/font-awesome.min.css" />
<link rel="alternate" type="application/atom+xml"
title="RSS Feed for Apache Kudu blog"
href="/feed.xml" />
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<div class="kudu-site container-fluid">
<!-- Static navbar -->
<nav class="navbar navbar-default">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="logo" href="/"><img
src="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png"
srcset="//d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_80px.png 1x, //d3dr9sfxru4sde.cloudfront.net/i/k/apachekudu_logo_0716_160px.png 2x"
alt="Apache Kudu"/></a>
</div>
<div id="navbar" class="collapse navbar-collapse">
<ul class="nav navbar-nav navbar-right">
<li >
<a href="/">Home</a>
</li>
<li >
<a href="/overview.html">Overview</a>
</li>
<li >
<a href="/docs/">Documentation</a>
</li>
<li >
<a href="/releases/">Releases</a>
</li>
<li class="active">
<a href="/blog/">Blog</a>
</li>
<!-- NOTE: this dropdown menu does not appear on Mobile, so don't add anything here
that doesn't also appear elsewhere on the site. -->
<li class="dropdown">
<a href="/community.html" role="button" aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a>
<ul class="dropdown-menu">
<li class="dropdown-header">GET IN TOUCH</li>
<li><a class="icon email" href="/community.html">Mailing Lists</a></li>
<li><a class="icon slack" href="https://getkudu-slack.herokuapp.com/">Slack Channel</a></li>
<li role="separator" class="divider"></li>
<li><a href="/community.html#meetups-user-groups-and-conference-presentations">Events and Meetups</a></li>
<li><a href="/committers.html">Project Committers</a></li>
<!--<li><a href="/roadmap.html">Roadmap</a></li>-->
<li><a href="/community.html#contributions">How to Contribute</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">DEVELOPER RESOURCES</li>
<li><a class="icon github" href="https://github.com/apache/incubator-kudu">GitHub</a></li>
<li><a class="icon gerrit" href="http://gerrit.cloudera.org:8080/#/q/status:open+project:kudu">Gerrit Code Review</a></li>
<li><a class="icon jira" href="https://issues.apache.org/jira/browse/KUDU">JIRA Issue Tracker</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">SOCIAL MEDIA</li>
<li><a class="icon twitter" href="https://twitter.com/ApacheKudu">Twitter</a></li>
<li><a href="https://www.reddit.com/r/kudu/">Reddit</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">APACHE SOFTWARE FOUNDATION</li>
<li><a href="https://www.apache.org/security/" target="_blank">Security</a></li>
<li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a></li>
<li><a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a></li>
<li><a href="https://www.apache.org/licenses/" target="_blank">License</a></li>
</ul>
</li>
<li >
<a href="/faq.html">FAQ</a>
</li>
</ul><!-- /.nav -->
</div><!-- /#navbar -->
</div><!-- /.container-fluid -->
</nav>
<div class="row header">
<div class="col-lg-12">
<h2><a href="/blog">Apache Kudu Blog</a></h2>
</div>
</div>
<div class="row-fluid">
<div class="col-lg-9">
<article>
<header>
<h1 class="entry-title">Fine-Grained Authorization with Apache Kudu and Impala</h1>
<p class="meta">Posted 22 Apr 2019 by Grant Henke</p>
</header>
<div class="entry-content">
<p>Note: This is a cross-post from the Cloudera Engineering Blog
<a href="https://blog.cloudera.com/blog/2019/04/fine-grained-authorization-with-apache-kudu-and-impala/">Fine-Grained Authorization with Apache Kudu and Impala</a></p>
<p>Apache Impala supports fine-grained authorization via Apache Sentry on all of the tables it
manages including Apache Kudu tables. Given Impala is a very common way to access the data stored
in Kudu, this capability allows users deploying Impala and Kudu to fully secure the Kudu data in
multi-tenant clusters even though Kudu does not yet have native fine-grained authorization of its
own. This solution works because Kudu natively supports coarse-grained (all or nothing)
authorization which enables blocking all access to Kudu directly except for the impala user and
an optional whitelist of other trusted users. This post will describe how to use Apache Impala’s
fine-grained authorization support along with Apache Kudu’s coarse-grained authorization to
achieve a secure multi-tenant deployment.</p>
<!--more-->
<h2 id="sample-workflow">Sample Workflow</h2>
<p>The examples in this post enable a workflow that uses Apache Spark to ingest data directly into
Kudu and Impala to run analytic queries on that data. The Spark job, run as the <code>etl_service</code> user,
is permitted to access the Kudu data via coarse-grained authorization. Even though this gives
access to all the data in Kudu, the <code>etl_service</code> user is only used for scheduled jobs or by an
administrator. All queries on the data, from a wide array of users, will use Impala and leverage
Impala’s fine-grained authorization. Impala’s
<a href="https://impala.apache.org/docs/build/html/topics/impala_grant.html"><code>GRANT</code> statements</a>
allow you to flexibly control the privileges on the Kudu storage tables. Impala’s fine-grained
privileges along with support for
<a href="https://impala.apache.org/docs/build/html/topics/impala_select.html"><code>SELECT</code></a>,
<a href="https://impala.apache.org/docs/build/html/topics/impala_insert.html"><code>INSERT</code></a>,
<a href="https://impala.apache.org/docs/build/html/topics/impala_update.html"><code>UPDATE</code></a>,
<a href="https://impala.apache.org/docs/build/html/topics/impala_upsert.html"><code>UPSERT</code></a>,
and <a href="https://impala.apache.org/docs/build/html/topics/impala_delete.html"><code>DELETE</code></a>
statements, allow you to finely control who can read and write data to your Kudu tables while
using Impala. Below is a diagram showing the workflow described:</p>
<p><img src="/img/fine-grained-authorization-with-apache-kudu.png" alt="png" class="img-responsive" /></p>
<p><em>Note</em>: The examples below assume that Authorization has already been configured for Kudu, Impala,
and Spark. For help configuring authorization see the Cloudera
<a href="https://www.cloudera.com/documentation/enterprise/latest/topics/sg_auth_overview.html">authorization documentation</a>.</p>
<h2 id="configuring-kudus-coarse-grained-authorization">Configuring Kudu’s Coarse-Grained Authorization</h2>
<p>Kudu supports coarse-grained authorization of client requests based on the authenticated client
Kerberos principal. The two levels of access which can be configured are:</p>
<ul>
<li><em>Superuser</em> – principals authorized as a superuser are able to perform certain administrative
functionality such as using the kudu command line tool to diagnose or repair cluster issues.</li>
<li><em>User</em> – principals authorized as a user are able to access and modify all data in the Kudu
cluster. This includes the ability to create, drop, and alter tables as well as read, insert,
update, and delete data.</li>
</ul>
<p>Access levels are granted using whitelist-style Access Control Lists (ACLs), one for each of the
two levels. Each access control list either specifies a comma-separated list of users, or may be
set to <code>*</code> to indicate that all authenticated users are able to gain access at the specified level.</p>
<p><em>Note</em>: The default value for the User ACL is <code>*</code>, which allows all users access to the cluster.</p>
<h3 id="example-configuration">Example Configuration</h3>
<p>The first and most important step is to remove the default ACL of <code>*</code> from Kudu’s
<a href="https://kudu.apache.org/docs/configuration_reference.html#kudu-master_user_acl"><code>–user_acl</code> configuration</a>.
This will ensure only the users you list will have access to the Kudu cluster. Then, to allow the
Impala service to access all of the data in Kudu, the Impala service user, usually impala, should
be added to the Kudu <code>–user_acl</code> configuration. Any user that is not using Impala will also need
to be added to this list. For example, an Apache Spark job might be used to load data directly
into Kudu. Generally, a single user is used to run scheduled jobs of applications that do not
support fine-grained authorization on their own. For this example, that user is <code>etl_service</code>. The
full <code>–user_acl</code> configuration is:</p>
<div class="highlight"><pre><code class="language-bash" data-lang="bash">--user_acl<span class="o">=</span>impala,etl_service</code></pre></div>
<p>For more details see the Kudu
<a href="https://kudu.apache.org/docs/security.html#_coarse_grained_authorization">authorization documentation</a>.</p>
<h2 id="using-impalas-fine-grained-authorization">Using Impala’s Fine-Grained Authorization</h2>
<p>Follow Impala’s
<a href="https://impala.apache.org/docs/build/html/topics/impala_authorization.html">authorization documentation</a>
to configure fine-grained authorization. Once configured, you can use Impala’s
<a href="https://impala.apache.org/docs/build/html/topics/impala_grant.html"><code>GRANT</code> statements</a>
to control the privileges of Kudu tables. These fine-grained privileges can be set at the database,
table and column level. Additionally you can individually control <code>SELECT</code>, <code>INSERT</code>, <code>CREATE</code>,
<code>ALTER</code>, and <code>DROP</code> privileges.</p>
<p><em>Note</em>: A user needs the <code>ALL</code> privilege in order to run <code>DELETE</code>, <code>UPDATE</code>, or <code>UPSERT</code>
statements against a Kudu table.</p>
<p>Below is a brief example with a couple tables stored in Kudu:</p>
<div class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">messages</span>
<span class="p">(</span>
<span class="n">name</span> <span class="n">STRING</span><span class="p">,</span>
<span class="n">time</span> <span class="k">TIMESTAMP</span><span class="p">,</span>
<span class="n">message</span> <span class="n">STRING</span><span class="p">,</span>
<span class="k">PRIMARY</span> <span class="k">KEY</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">time</span><span class="p">)</span>
<span class="p">)</span>
<span class="n">PARTITION</span> <span class="k">BY</span> <span class="n">HASH</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="n">PARTITIONS</span> <span class="mi">4</span>
<span class="n">STORED</span> <span class="k">AS</span> <span class="n">KUDU</span><span class="p">;</span>
<span class="k">GRANT</span> <span class="k">ALL</span> <span class="k">ON</span> <span class="k">TABLE</span> <span class="n">messages</span> <span class="k">TO</span> <span class="n">userA</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">metrics</span>
<span class="p">(</span>
<span class="k">host</span> <span class="n">STRING</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">metric</span> <span class="n">STRING</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">time</span> <span class="n">INT64</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">value</span> <span class="n">DOUBLE</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="p">(</span><span class="k">host</span><span class="p">,</span> <span class="n">metric</span><span class="p">,</span> <span class="n">time</span><span class="p">)</span>
<span class="p">)</span>
<span class="n">PARTITION</span> <span class="k">BY</span> <span class="n">HASH</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="n">PARTITIONS</span> <span class="mi">4</span>
<span class="n">STORED</span> <span class="k">AS</span> <span class="n">KUDU</span><span class="p">;</span>
<span class="k">GRANT</span> <span class="k">ALL</span> <span class="k">ON</span> <span class="k">TABLE</span> <span class="n">messages</span> <span class="k">TO</span> <span class="n">userB</span><span class="p">;</span></code></pre></div>
<h2 id="conclusion">Conclusion</h2>
<p>This brief example that combines Kudu’s coarse-grained authorization and Impala’s fine-grained
authorization should enable you to meet the security needs of your data workflow today. The
pattern described here can be applied to other services and workflows using Kudu as well. For
greater authorization flexibility, you can look forward to the near future when Kudu supports
native fine-grained authorization on its own. The Apache Kudu contributors understand the
importance of native fine-grained authorization and they are working on integrations with
Apache Sentry and Apache Ranger.</p>
</div>
</article>
</div>
<div class="col-lg-3 recent-posts">
<h3>Recent posts</h3>
<ul>
<li> <a href="/2019/11/20/apache-kudu-1-11-1-release.html">Apache Kudu 1.11.1 released</a> </li>
<li> <a href="/2019/11/20/apache-kudu-1-10-1-release.html">Apache Kudu 1.10.1 released</a> </li>
<li> <a href="/2019/07/09/apache-kudu-1-10-0-release.html">Apache Kudu 1.10.0 Released</a> </li>
<li> <a href="/2019/04/30/location-awareness.html">Location Awareness in Kudu</a> </li>
<li> <a href="/2019/04/22/fine-grained-authorization-with-apache-kudu-and-impala.html">Fine-Grained Authorization with Apache Kudu and Impala</a> </li>
<li> <a href="/2019/03/19/testing-apache-kudu-applications-on-the-jvm.html">Testing Apache Kudu Applications on the JVM</a> </li>
<li> <a href="/2019/03/15/apache-kudu-1-9-0-release.html">Apache Kudu 1.9.0 Released</a> </li>
<li> <a href="/2019/03/05/transparent-hierarchical-storage-management-with-apache-kudu-and-impala.html">Transparent Hierarchical Storage Management with Apache Kudu and Impala</a> </li>
<li> <a href="/2018/12/11/call-for-posts.html">Call for Posts</a> </li>
<li> <a href="/2018/10/26/apache-kudu-1-8-0-released.html">Apache Kudu 1.8.0 Released</a> </li>
<li> <a href="/2018/09/26/index-skip-scan-optimization-in-kudu.html">Index Skip Scan Optimization in Kudu</a> </li>
<li> <a href="/2018/09/11/simplified-pipelines-with-kudu.html">Simplified Data Pipelines with Kudu</a> </li>
<li> <a href="/2018/08/06/getting-started-with-kudu-an-oreilly-title.html">Getting Started with Kudu - an O'Reilly Title</a> </li>
<li> <a href="/2018/07/10/instrumentation-in-kudu.html">Instrumentation in Apache Kudu</a> </li>
<li> <a href="/2018/03/23/apache-kudu-1-7-0-released.html">Apache Kudu 1.7.0 released</a> </li>
</ul>
</div>
</div>
<footer class="footer">
<div class="row">
<div class="col-md-9">
<p class="small">
Copyright &copy; 2019 The Apache Software Foundation.
</p>
<p class="small">
Apache Kudu, Kudu, Apache, the Apache feather logo, and the Apache Kudu
project logo are either registered trademarks or trademarks of The
Apache Software Foundation in the United States and other countries.
</p>
</div>
<div class="col-md-3">
<a class="pull-right" href="https://www.apache.org/events/current-event.html">
<img src="https://www.apache.org/events/current-event-234x60.png"/>
</a>
</div>
</div>
</footer>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script>
// Try to detect touch-screen devices. Note: Many laptops have touch screens.
$(document).ready(function() {
if ("ontouchstart" in document.documentElement) {
$(document.documentElement).addClass("touch");
} else {
$(document.documentElement).addClass("no-touch");
}
});
</script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"
integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS"
crossorigin="anonymous"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-68448017-1', 'auto');
ga('send', 'pageview');
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/anchor-js/3.1.0/anchor.js"></script>
<script>
anchors.options = {
placement: 'right',
visible: 'touch',
};
anchors.add();
</script>
</body>
</html>