| // Licensed to the Apache Software Foundation (ASF) under one |
| // or more contributor license agreements. See the NOTICE file |
| // distributed with this work for additional information |
| // regarding copyright ownership. The ASF licenses this file |
| // to you under the Apache License, Version 2.0 (the |
| // "License"); you may not use this file except in compliance |
| // with the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, |
| // software distributed under the License is distributed on an |
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| // KIND, either express or implied. See the License for the |
| // specific language governing permissions and limitations |
| // under the License. |
| |
| [[security]] |
| = Apache Kudu Security |
| |
| :author: Kudu Team |
| :imagesdir: ./images |
| :icons: font |
| :toc: left |
| :toclevels: 3 |
| :doctype: book |
| :backend: html5 |
| :sectlinks: |
| :experimental: |
| |
| Kudu includes security features that allow Kudu clusters to be hardened against |
| access from unauthorized users. This guide describes those features. |
| <<configuration>> lists essential configuration options for deploying a secure |
| Kudu cluster. <<known-limitations>> lists known deficiencies in Kudu's security |
| capabilities. |
| |
| == Authentication |
| |
| Kudu can be configured to enforce secure authentication among servers, and |
| between clients and servers. Authentication prevents untrusted actors from |
| gaining access to Kudu, and securely identifies the connecting user or service |
| for authorization checks. Authentication in Kudu is designed to interoperate |
| with other secure Hadoop components by utilizing Kerberos. |
| |
| Authentication can be configured on Kudu servers using the |
| `--rpc-authentication` flag, which can be set to `required`, `optional`, or |
| `disabled`. By default, the flag is set to `optional`. When `required`, Kudu |
| will reject connections from clients and servers that lack authentication |
| credentials. When `optional`, Kudu will attempt to use strong authentication. |
| When the flag is `disabled`, or when it is `optional` and strong authentication |
| fails, Kudu by default allows unauthenticated connections only from trusted |
| subnets: the private networks (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, |
| 192.168.0.0/16, 169.254.0.0/16) and the local subnets of all local network |
| interfaces. Unauthenticated connections from publicly routable IPs will be |
| rejected. |
| |
| The trusted subnets can be configured using the `--trusted_subnets` flag, |
| which takes a comma-separated list of IP blocks in CIDR notation. Set it to |
| `0.0.0.0/0` to allow unauthenticated connections from all remote IP addresses. |
| However, if network access is not otherwise restricted by a firewall, |
| malicious users may then be able to gain unauthorized access. Requiring |
| authentication mitigates this risk. |
| |
| WARNING: When the `--rpc-authentication` flag is set to `optional`, |
| the cluster does not prevent access from unauthenticated users. To secure a |
| cluster, use `--rpc-authentication=required`. |
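
The trusted-subnet rule described above can be illustrated with a short Python sketch. This is not Kudu's implementation: the helper names (`parse_subnets`, `is_trusted`, `accept_unauthenticated`) are hypothetical, and the sketch covers only the private ranges listed in this guide, not the local subnets of the server's own network interfaces.

```python
import ipaddress

# Private ranges listed in this guide. A real server also trusts the local
# subnets of its own network interfaces, which this sketch leaves out.
DEFAULT_TRUSTED_SUBNETS = (
    "127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,169.254.0.0/16"
)


def parse_subnets(flag_value):
    """Parse a comma-separated list of CIDR blocks, the format --trusted_subnets takes."""
    return [
        ipaddress.ip_network(block.strip())
        for block in flag_value.split(",")
        if block.strip()
    ]


def is_trusted(remote_ip, subnets):
    """Return True if remote_ip falls inside any of the given subnets."""
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in subnets)


def accept_unauthenticated(remote_ip, rpc_authentication, subnets):
    """Hypothetical decision helper for a connection that presents no credentials."""
    if rpc_authentication == "required":
        return False  # unauthenticated peers are always rejected
    # 'optional' (after strong authentication failed) and 'disabled'
    # both fall back to the trusted-subnet check.
    return is_trusted(remote_ip, subnets)
```

Under these assumptions, a peer at `10.1.2.3` is accepted without credentials when the flag is `optional`, while a peer at a publicly routable address is accepted only if the subnet list has been widened, for example to `0.0.0.0/0`.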
| |
| === Internal PKI |
| |
| Kudu uses an internal PKI system to issue X.509 certificates to servers in |
| the cluster. Connections between peers that have both obtained certificates will |
| use TLS for authentication, which doesn't require contacting the Kerberos KDC. |
| These certificates are _only_ used for internal communication among Kudu |
| servers, and between Kudu clients and servers. The certificates are never |
| presented in a public-facing protocol. |
| |
| By using internally-issued certificates, Kudu offers strong authentication which |
| scales to huge clusters, and allows TLS encryption to be used without requiring |
| you to manually deploy certificates on every node. |
| |
| === Authentication Tokens |
| |
| After authenticating to a secure cluster, the Kudu client will automatically |
| request an authentication token from the Kudu master. An authentication token |
| encapsulates the identity of the authenticated user and carries the master's |
| RSA signature so that its authenticity can be verified. |
| |
| This token will be used to authenticate subsequent connections. By default, |
| authentication tokens are only valid for seven days, so that even if a token |
| were compromised, it could not be used indefinitely. For the most part, |
| authentication tokens should be completely transparent to users. By using |
| authentication tokens, Kudu takes advantage of strong authentication without |
| paying the scalability cost of communicating with a central authority for every |
| connection. |
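
The idea of a signed, time-limited token can be sketched in Python as follows. This is only an illustration: Kudu's tokens carry the master's RSA signature, whereas this sketch substitutes an HMAC from the Python standard library so that it stays self-contained. The token layout and the function names `issue_token` and `verify_token` are hypothetical.

```python
import hashlib
import hmac
import time

# Default validity period described in this guide.
SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60


def issue_token(user, key, now=None):
    """Create a token that names the user and expires after seven days."""
    now = time.time() if now is None else now
    expires_at = int(now) + SEVEN_DAYS_SECONDS
    payload = f"{user}|{expires_at}".encode()
    # HMAC stands in for the master's RSA signature in this sketch.
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"user": user, "expires_at": expires_at, "signature": signature}


def verify_token(token, key, now=None):
    """Accept the token only if the signature matches and it has not expired."""
    now = time.time() if now is None else now
    payload = f"{token['user']}|{token['expires_at']}".encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["signature"]):
        return False  # identity or expiry was tampered with
    return now < token["expires_at"]
```

Because verification needs only the signing key material and the clock, a server in this model can check a token locally without contacting a central authority for each connection.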
| |
| When used with distributed compute frameworks such as Spark, authentication |
| tokens can simplify configuration and improve security. For example, the Kudu |
| Spark connector will automatically retrieve an authentication token during the |
| planning stage, and distribute the token to tasks. This allows Spark to work |
| against a secured Kudu cluster where only the planner node has Kerberos |
| credentials. |
| |
| === Client Authentication to Secure Kudu Clusters |
| |
| Users running Kudu client applications must first run the `kinit` command to |
| obtain a Kerberos ticket-granting ticket. For example: |
| |
| [source,bash] |
| ---- |
| $ kinit admin@EXAMPLE-REALM.COM |
| ---- |
| |
| Once authenticated, you can use the same client code to read from and write to |
| Kudu servers, whether or not they are configured for Kerberos. |
| |
| == Scalability |
| |
| Kudu authentication is designed to scale to thousands of nodes, which requires |
| avoiding unnecessary coordination with a central authentication authority (such |
| as the Kerberos KDC). Instead, Kudu servers and clients will use Kerberos to |
| establish initial trust with the Kudu master, and then use alternate credentials |
| for subsequent connections. In particular, the master will issue internal |
| X.509 certificates to servers, and temporary authentication tokens to clients. |
| |
| == Encryption |
| |
| Kudu allows all communications among servers and between clients and servers |
| to be encrypted with TLS. |
| |
| Encryption can be configured on Kudu servers using the `--rpc-encryption` flag, |
| which can be set to `required`, `optional`, or `disabled`. By default, the flag |
| is set to `optional`. When `required`, Kudu will reject unencrypted connections. |
| When `optional`, Kudu will attempt to use encryption. As with authentication, |
| when the flag is `disabled`, or when it is `optional` and encryption fails, Kudu |
| will only allow unencrypted connections from trusted subnets and will reject |
| unencrypted connections from publicly routable IPs. To secure a cluster, use |
| `--rpc-encryption=required`. |
| |
| NOTE: Kudu will automatically turn off encryption on local loopback connections, |
| since traffic from these connections is never exposed externally. This allows |
| locality-aware compute frameworks like Spark and Impala to avoid encryption |
| overhead, while still ensuring data confidentiality. |
| |
| == Coarse-Grained Authorization |
| |
| Kudu supports coarse-grained authorization of client requests based on the |
| authenticated client Kerberos principal (i.e. user or service). The two levels |
| of access which can be configured are: |
| |
| * *Superuser* - principals authorized as a superuser are able to perform |
| certain administrative functions, such as using the `kudu` command line tool |
| to diagnose or repair cluster issues. |
| |
| * *User* - principals authorized as a user are able to access and modify all |
| data in the Kudu cluster. This includes the ability to create, drop, and alter |
| tables as well as read, insert, update, and delete data. |
| |
| NOTE: Internally, Kudu has a third access level for the daemons themselves. |
| This ensures that users cannot connect to the cluster and pose as tablet |
| servers. |
| |
| Access levels are granted using whitelist-style Access Control Lists (ACLs), one |
| for each of the two levels. Each access control list either specifies a |
| comma-separated list of users, or may be set to `*` to indicate that all |
| authenticated users are able to gain access at the specified level. See |
| <<configuration>> below for examples. |
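
The ACL format described above can be illustrated with a minimal Python sketch. The helper names `parse_acl` and `is_allowed` are hypothetical and are not part of Kudu. The sketch assumes the principal has already been authenticated and only models the list-or-wildcard check for a single access level.

```python
def parse_acl(flag_value):
    """Parse an ACL flag value: '*' or a comma-separated list of users."""
    value = flag_value.strip()
    if value == "*":
        return None  # None stands for "all authenticated users"
    return {name.strip() for name in value.split(",") if name.strip()}


def is_allowed(principal, acl):
    """Return True if an authenticated principal is granted this access level."""
    return acl is None or principal in acl


# Values taken from the example configuration in this guide.
user_acl = parse_acl("impala,nightly_etl_service_account")
superuser_acl = parse_acl("hadoopadmin")
```

With these values, `is_allowed("impala", user_acl)` is true and `is_allowed("impala", superuser_acl)` is false, while an ACL of `*` admits any authenticated principal.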
| |
| NOTE: The default value for the User ACL is `*`, which allows all users access |
| to the cluster. However, if authentication is enabled, this still restricts access |
| to only those users who are able to successfully authenticate via Kerberos. |
| Unauthenticated users on the same network as the Kudu servers will be unable |
| to access the cluster. |
| |
| [[web-ui]] |
| == Web UI Encryption |
| |
| The Kudu web UI can be configured to use secure HTTPS encryption by providing |
| each server with TLS certificates. See <<configuration>> for more information on |
| web UI HTTPS configuration. |
| |
| == Web UI Redaction |
| |
| To prevent sensitive data from being exposed in the web UI, all row data is |
| redacted. Table metadata, such as table names, column names, and partitioning |
| information, is not redacted. The web UI can be completely disabled by setting |
| the `--webserver-enabled=false` flag on Kudu servers. |
| |
| WARNING: Disabling the web UI will also disable REST endpoints such as |
| `/metrics`. Monitoring systems rely on these endpoints to gather metrics data. |
| |
| [[logs]] |
| == Log Security |
| |
| To prevent sensitive data from being included in Kudu server logs, all row data |
| is redacted by default. Setting the `--redact=log` flag disables redaction in |
| the web UI but retains it for server logs. Alternatively, `--redact=none` |
| disables redaction completely. |
| // TODO(dan): add link to configuration reference. |
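
The effect of each `--redact` value can be summarized as a small lookup table, sketched here in Python. The `log` and `none` rows follow the description above. The name `all` for the default setting is an assumption, since this guide only states that row data is redacted by default. The helper `is_redacted` is hypothetical.

```python
# Whether row data is still redacted on each surface for a given --redact value.
REDACTION_BY_FLAG = {
    "all": {"web_ui": True, "logs": True},    # assumed name of the default
    "log": {"web_ui": False, "logs": True},   # web UI unredacted, logs redacted
    "none": {"web_ui": False, "logs": False}, # redaction fully disabled
}


def is_redacted(flag_value, surface):
    """Return True if row data is redacted on 'web_ui' or 'logs'."""
    return REDACTION_BY_FLAG[flag_value][surface]
```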
| |
| [[configuration]] |
| == Configuring a Secure Kudu Cluster |
| |
| The following configuration parameters should be set on all servers (master and |
| tablet server) in order to ensure that a Kudu cluster is secure: |
| |
| ``` |
| # Connection Security |
| #-------------------- |
| --rpc-authentication=required |
| --rpc-encryption=required |
| --keytab-file=<path-to-kerberos-keytab> |
| |
| # Web UI Security |
| #-------------------- |
| --webserver-certificate-file=<path-to-cert-pem> |
| --webserver-private-key-file=<path-to-key-pem> |
| # optional |
| --webserver-private-key-password-cmd=<password-cmd> |
| |
| # If you prefer to disable the web UI entirely: |
| --webserver-enabled=false |
| |
| # Coarse-grained authorization |
| #-------------------------------- |
| |
| # This example ACL setup allows the 'impala' user as well as the |
| # 'nightly_etl_service_account' principal access to all data in the |
| # Kudu cluster. The 'hadoopadmin' user is allowed to use administrative |
| # tooling. Note that, by granting access to 'impala', other users |
| # may access data in Kudu via the Impala service subject to its own |
| # authorization rules. |
| --user-acl=impala,nightly_etl_service_account |
| --superuser-acl=hadoopadmin |
| ``` |
| |
| Further information about these flags can be found in the configuration |
| flag reference. |
| // TODO(todd) add a link |
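
As a rough aid when reviewing a deployment, the connection security settings above can be checked with a short script. The following Python sketch is not a Kudu tool: the function names are hypothetical, it assumes one `--flag=value` entry per line, and it treats `-` and `_` in flag names as interchangeable.

```python
# Connection security settings that this guide says a secure cluster needs.
REQUIRED_SETTINGS = {
    "rpc-authentication": "required",
    "rpc-encryption": "required",
}


def parse_flags(text):
    """Collect --name=value lines, skipping blank lines and '#' comments."""
    flags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("--"):
            continue
        name, _, value = line[2:].partition("=")
        flags[name.replace("_", "-")] = value
    return flags


def insecure_settings(text):
    """Return a list of human-readable problems found in the flag text."""
    flags = parse_flags(text)
    problems = []
    for name, wanted in REQUIRED_SETTINGS.items():
        if flags.get(name) != wanted:
            problems.append(f"--{name} should be {wanted}")
    if not flags.get("keytab-file"):
        problems.append("--keytab-file is not set")
    return problems
```

An empty result means the three connection security flags are present with the recommended values. The sketch does not inspect the web UI or ACL flags.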
| |
| |
| [[known-limitations]] |
| == Known Limitations |
| |
| Kudu has a few known security limitations: |
| |
| // TODO(danburkert): add JIRA links for each of these. |
| |
| Custom Kerberos Principal:: Kudu does not support setting a custom service |
| principal for Kudu processes. The principal must be `kudu`. |
| |
| External PKI:: Kudu does not support externally-issued certificates for internal |
| wire encryption (server to server and client to server). |
| |
| Fine-grained Authorization:: Kudu does not have the ability to restrict access |
| based on operation type or target (table, column, etc.). ACLs currently do not |
| support authorization based on membership in a group. |
| |
| On-disk Encryption:: Kudu does not have built-in on-disk encryption. However, |
| Kudu can be used with whole-disk encryption tools such as dm-crypt. |
| |
| Web UI Authentication:: The Kudu web UI lacks Kerberos-based authentication |
| (SPNEGO), so access cannot be restricted based on Kerberos principals. |
| |
| Flume Integration:: Flume integration is not supported with secure Kudu clusters |
| that require authentication or encryption. |