// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
[[security]]
= Apache Kudu Security
:author: Kudu Team
:imagesdir: ./images
:icons: font
:toc: left
:toclevels: 3
:doctype: book
:backend: html5
:sectlinks:
:experimental:
Kudu includes security features which allow Kudu clusters to be hardened against
access from unauthorized users. This guide describes the security features
provided by Kudu. <<configuration>> lists essential configuration options when
deploying a secure Kudu cluster. <<known-limitations>> contains a list of
known deficiencies in Kudu's security capabilities.
== Authentication
Kudu can be configured to enforce secure authentication among servers, and
between clients and servers. Authentication prevents untrusted actors from
gaining access to Kudu, and securely identifies the connecting user or service
for authorization checks. Authentication in Kudu is designed to interoperate
with other secure Hadoop components by utilizing Kerberos.
Authentication can be configured on Kudu servers using the
`--rpc-authentication` flag, which can be set to `required`, `optional`, or
`disabled`. By default, the flag is set to `optional`. When `required`, Kudu
rejects connections from clients and servers that lack authentication
credentials. When `optional`, Kudu attempts to use strong authentication.
When the flag is `disabled`, or when strong authentication fails under
`optional`, Kudu by default allows unauthenticated connections only from
trusted subnets: the private networks (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12,
192.168.0.0/16, 169.254.0.0/16) and the local subnets of all local network
interfaces. Unauthenticated connections from publicly routable IPs are rejected.
The trusted subnets can be configured using the `--trusted_subnets` flag,
which takes a comma-separated list of IP blocks in CIDR notation. Set it to
`0.0.0.0/0` to allow unauthenticated connections from all remote IP addresses.
However, if network access is not otherwise restricted by a firewall,
malicious users may then be able to gain unauthorized access. Setting
`--rpc-authentication=required` mitigates this risk.
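
The accept/reject rules above can be illustrated with a short Python sketch.
This is only a model of the documented behavior, not Kudu's implementation: the
function names are hypothetical, and the local subnets of the host's network
interfaces, which Kudu also trusts by default, are left out.

```python
import ipaddress

# Default private networks listed in the text. Kudu also trusts the local
# subnets of the host's own network interfaces; this sketch omits them.
DEFAULT_TRUSTED_SUBNETS = (
    "127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,169.254.0.0/16"
)


def parse_subnets(flag_value):
    """Parse a comma-separated list of CIDR blocks (the --trusted_subnets format)."""
    return [
        ipaddress.ip_network(block.strip())
        for block in flag_value.split(",")
        if block.strip()
    ]


def is_trusted(peer_ip, subnets):
    """Return True if peer_ip falls inside any of the given subnets."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in subnets)


def accept_connection(rpc_authentication, authenticated, peer_ip, subnets):
    """Hypothetical model of the decision described for --rpc-authentication."""
    if rpc_authentication == "required":
        # Only connections with valid credentials are accepted.
        return authenticated
    if rpc_authentication == "optional" and authenticated:
        return True
    # 'disabled', or 'optional' where strong authentication failed:
    # unauthenticated connections are allowed only from trusted subnets.
    return is_trusted(peer_ip, subnets)


subnets = parse_subnets(DEFAULT_TRUSTED_SUBNETS)
print(accept_connection("optional", False, "10.1.2.3", subnets))  # private address
print(accept_connection("optional", False, "8.8.8.8", subnets))   # public address
```

In this model, passing `parse_subnets("0.0.0.0/0")` makes every IPv4 peer
trusted, which is why that setting is only reasonable when a firewall
restricts network access or authentication is required.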
WARNING: When the `--rpc-authentication` flag is set to `optional`,
the cluster does not prevent access from unauthenticated users. To secure a
cluster, use `--rpc-authentication=required`.
=== Internal PKI
Kudu uses an internal PKI system to issue X.509 certificates to servers in
the cluster. Connections between peers who have both obtained certificates will
use TLS for authentication, which doesn't require contacting the Kerberos KDC.
These certificates are _only_ used for internal communication among Kudu
servers, and between Kudu clients and servers. The certificates are never
presented in a public facing protocol.
By using internally-issued certificates, Kudu offers strong authentication which
scales to huge clusters, and allows TLS encryption to be used without requiring
you to manually deploy certificates on every node.
=== Authentication Tokens
After authenticating to a secure cluster, the Kudu client will automatically
request an authentication token from the Kudu master. An authentication token
encapsulates the identity of the authenticated user and carries the master's
RSA signature so that its authenticity can be verified.
This token will be used to authenticate subsequent connections. By default,
authentication tokens are only valid for seven days, so that even if a token
were compromised, it could not be used indefinitely. For the most part,
authentication tokens should be completely transparent to users. By using
authentication tokens, Kudu takes advantage of strong authentication without
paying the scalability cost of communicating with a central authority for every
connection.
When used with distributed compute frameworks such as Spark, authentication
tokens can simplify configuration and improve security. For example, the Kudu
Spark connector will automatically retrieve an authentication token during the
planning stage, and distribute the token to tasks. This allows Spark to work
against a secured Kudu cluster where only the planner node has Kerberos
credentials.
=== Client Authentication to Secure Kudu Clusters
Users running client Kudu applications must first run the `kinit` command to
obtain a Kerberos ticket-granting ticket. For example:
[source,bash]
----
$ kinit admin@EXAMPLE-REALM.COM
----
Once authenticated, you can use the same client code to read from and write to
Kudu servers, whether or not Kerberos is configured.
== Scalability
Kudu authentication is designed to scale to thousands of nodes, which requires
avoiding unnecessary coordination with a central authentication authority (such
as the Kerberos KDC). Instead, Kudu servers and clients will use Kerberos to
establish initial trust with the Kudu master, and then use alternate credentials
for subsequent connections. In particular, the master will issue internal
X.509 certificates to servers, and temporary authentication tokens to clients.
== Encryption
Kudu allows all communications among servers and between clients and servers
to be encrypted with TLS.
Encryption can be configured on Kudu servers using the `--rpc-encryption` flag,
which can be set to `required`, `optional`, or `disabled`. By default, the flag
is set to `optional`. When `required`, Kudu will reject unencrypted connections.
When `optional`, Kudu will attempt to use encryption. As with authentication,
when the flag is `disabled`, or when encryption fails under `optional`, Kudu
will only allow unencrypted connections from trusted subnets and will reject
unencrypted connections from publicly routable IPs. To secure a cluster, use
`--rpc-encryption=required`.
NOTE: Kudu will automatically turn off encryption on local loopback connections,
since traffic from these connections is never exposed externally. This allows
locality-aware compute frameworks like Spark and Impala to avoid encryption
overhead, while still ensuring data confidentiality.
== Coarse-Grained Authorization
Kudu supports coarse-grained authorization of client requests based on the
authenticated client Kerberos principal (i.e. user or service). The two levels
of access which can be configured are:
* *Superuser* - principals authorized as a superuser are able to perform
certain administrative functionality such as using the `kudu` command line tool
to diagnose or repair cluster issues.
* *User* - principals authorized as a user are able to access and modify all
data in the Kudu cluster. This includes the ability to create, drop, and alter
tables as well as read, insert, update, and delete data.
NOTE: Internally, Kudu has a third access level for the daemons themselves.
This ensures that users cannot connect to the cluster and pose as tablet
servers.
Access levels are granted using whitelist-style Access Control Lists (ACLs), one
for each of the two levels. Each access control list either specifies a
comma-separated list of users, or may be set to `*` to indicate that all
authenticated users are able to gain access at the specified level. See
<<configuration>> below for examples.
NOTE: The default value for the User ACL is `*`, which allows all users access
to the cluster. However, if authentication is enabled, this still restricts access
to only those users who are able to successfully authenticate via Kerberos.
Unauthenticated users on the same network as the Kudu servers will be unable
to access the cluster.
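
As a rough illustration of how such an ACL value can be read, the Python
sketch below treats `*` as "all authenticated users" and anything else as a
comma-separated list of names. The helper names are hypothetical, and the
sketch does not model how Kudu maps a full Kerberos principal to the short
name that appears in the list.

```python
def parse_acl(flag_value):
    """Return None for the wildcard '*', otherwise the set of listed names."""
    if flag_value.strip() == "*":
        return None
    return {name.strip() for name in flag_value.split(",") if name.strip()}


def is_authorized(principal, acl):
    """True if an already-authenticated principal is granted access by the ACL."""
    return acl is None or principal in acl


# Values taken from the example configuration in this guide.
user_acl = parse_acl("impala,nightly_etl_service_account")
superuser_acl = parse_acl("hadoopadmin")

print(is_authorized("impala", user_acl))       # listed in the user ACL
print(is_authorized("impala", superuser_acl))  # not listed as a superuser
print(is_authorized("alice", parse_acl("*")))  # wildcard admits any authenticated user
```

The check assumes the principal has already been authenticated; the ACL
itself says nothing about how identity was established.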
[[web-ui]]
== Web UI Encryption
The Kudu web UI can be configured to use secure HTTPS encryption by providing
each server with TLS certificates. See <<configuration>> for more information on
web UI HTTPS configuration.
== Web UI Redaction
To prevent sensitive data from being exposed in the web UI, all row data is
redacted. Table metadata, such as table names, column names, and partitioning
information, is not redacted. The web UI can be completely disabled by setting
the `--webserver-enabled=false` flag on Kudu servers.
WARNING: Disabling the web UI will also disable REST endpoints such as
`/metrics`. Monitoring systems rely on these endpoints to gather metrics data.
[[logs]]
== Log Security
To prevent sensitive data from being included in Kudu server logs, all row data
is redacted by default. Setting the `--redact=log` flag disables redaction in
the web UI but retains it for server logs. Alternatively, `--redact=none`
disables redaction completely.
// TODO(dan): add link to configuration reference.
[[configuration]]
== Configuring a Secure Kudu Cluster
The following configuration parameters should be set on all servers (master and
tablet server) in order to ensure that a Kudu cluster is secure:
```
# Connection Security
#--------------------
--rpc-authentication=required
--rpc-encryption=required
--keytab-file=<path-to-kerberos-keytab>
# Web UI Security
#--------------------
--webserver-certificate-file=<path-to-cert-pem>
--webserver-private-key-file=<path-to-key-pem>
# optional
--webserver-private-key-password-cmd=<password-cmd>
# If you prefer to disable the web UI entirely:
--webserver-enabled=false
# Coarse-grained authorization
#--------------------------------
# This example ACL setup allows the 'impala' user as well as the
# 'nightly_etl_service_account' principal access to all data in the
# Kudu cluster. The 'hadoopadmin' user is allowed to use administrative
# tooling. Note that, by granting access to 'impala', other users
# may access data in Kudu via the Impala service subject to its own
# authorization rules.
--user-acl=impala,nightly_etl_service_account
--superuser-acl=hadoopadmin
```
Further information about these flags can be found in the configuration
flag reference.
// TODO(todd) add a link
[[known-limitations]]
== Known Limitations
Kudu has a few known security limitations:
// TODO(danburkert): add JIRA links for each of these.
Custom Kerberos Principal:: Kudu does not support setting a custom service
principal for Kudu processes. The principal must be 'kudu'.
External PKI:: Kudu does not support externally-issued certificates for internal
wire encryption (server to server and client to server).
Fine-grained Authorization:: Kudu does not have the ability to restrict access
based on operation type or target (table, column, etc). ACLs currently do not
support authorization based on membership in a group.
On-disk Encryption:: Kudu does not have built-in on-disk encryption. However,
Kudu can be used with whole-disk encryption tools such as dm-crypt.
Web UI Authentication:: The Kudu web UI lacks Kerberos-based authentication
(SPNEGO), so access cannot be restricted based on Kerberos principals.
Flume Integration:: Flume integration is not supported with secure Kudu clusters
which require authentication or encryption.