| // Licensed to the Apache Software Foundation (ASF) under one |
| // or more contributor license agreements. See the NOTICE file |
| // distributed with this work for additional information |
| // regarding copyright ownership. The ASF licenses this file |
| // to you under the Apache License, Version 2.0 (the |
| // "License"); you may not use this file except in compliance |
| // with the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, |
| // software distributed under the License is distributed on an |
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| // KIND, either express or implied. See the License for the |
| // specific language governing permissions and limitations |
| // under the License. |
| |
| [[security]] |
| = Apache Kudu Security |
| |
| :author: Kudu Team |
| :imagesdir: ./images |
| :icons: font |
| :toc: left |
| :toclevels: 3 |
| :doctype: book |
| :backend: html5 |
| :sectlinks: |
| :experimental: |
| |
| Kudu includes security features which allow Kudu clusters to be hardened against |
| access from unauthorized users. This guide describes the security features |
| provided by Kudu. <<configuration>> lists essential configuration options when |
| deploying a secure Kudu cluster. <<known-limitations>> contains a list of |
| known deficiencies in Kudu's security capabilities. |
| |
| == Authentication |
| |
| Kudu can be configured to enforce secure authentication among servers, and |
| between clients and servers. Authentication prevents untrusted actors from |
| gaining access to Kudu, and securely identifies the connecting user or services |
| for authorization checks. Authentication in Kudu is designed to interoperate |
| with other secure Hadoop components by utilizing Kerberos. |
| |
| Authentication can be configured on Kudu servers using the |
| `--rpc_authentication` flag, which can be set to `required`, `optional`, or |
| `disabled`. By default, the flag is set to `optional`. When `required`, Kudu |
| will reject connections from clients and servers who lack authentication |
| credentials. When `optional`, Kudu will attempt to use strong authentication. |
| When `disabled` or strong authentication fails for 'optional', by default Kudu |
| will only allow unauthenticated connections from trusted subnets, which are |
| private networks (127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16, |
| 169.254.0.0/16) and local subnets of all local network interfaces. Unauthenticated |
| connections from publicly routable IPs will be rejected. |
| |
| The trusted subnets can be configured using the `--trusted_subnets` flag, |
| which can be set to IP blocks in CIDR notation separated by comma. Set it to |
| '0.0.0.0/0' to allow unauthenticated connections from all remote IP addresses. |
| However, if network access is not otherwise restricted by a firewall, |
| malicious users may be able to gain unauthorized access. This can be mitigated |
| if authentication is configured to be required. |
| |
| WARNING: When the `--rpc_authentication` flag is set to `optional`, |
| the cluster does not prevent access from unauthenticated users. To secure a |
| cluster, use `--rpc_authentication=required`. |
| |
| === Internal PKI |
| |
| Kudu uses an internal PKI system to issue X.509 certificates to servers in |
| the cluster. Connections between peers who have both obtained certificates will |
| use TLS for authentication, which doesn't require contacting the Kerberos KDC. |
| These certificates are _only_ used for internal communication among Kudu |
| servers, and between Kudu clients and servers. The certificates are never |
| presented in a public facing protocol. |
| |
| By using internally-issued certificates, Kudu offers strong authentication which |
| scales to huge clusters, and allows TLS encryption to be used without requiring |
| you to manually deploy certificates on every node. |
| |
| === Authentication Tokens |
| |
| After authenticating to a secure cluster, the Kudu client will automatically |
| request an authentication token from the Kudu master. An authentication token |
| encapsulates the identity of the authenticated user and carries the master's |
| RSA signature so that its authenticity can be verified. |
| |
| This token will be used to authenticate subsequent connections. By default, |
| authentication tokens are only valid for seven days, so that even if a token |
| were compromised, it could not be used indefinitely. For the most part, |
| authentication tokens should be completely transparent to users. By using |
| authentication tokens, Kudu takes advantage of strong authentication without |
| paying the scalability cost of communicating with a central authority for every |
| connection. |
| |
| When used with distributed compute frameworks such as Spark, authentication |
| tokens can simplify configuration and improve security. For example, the Kudu |
| Spark connector will automatically retrieve an authentication token during the |
| planning stage, and distribute the token to tasks. This allows Spark to work |
| against a secured Kudu cluster where only the planner node has Kerberos |
| credentials. |
| |
| === Client Authentication to Secure Kudu Clusters |
| |
| Users running client Kudu applications must first run the `kinit` command to |
| obtain a Kerberos ticket-granting ticket. For example: |
| |
| [source,bash] |
| ---- |
| $ kinit admin@EXAMPLE-REALM.COM |
| ---- |
| |
| Once authenticated, you use the same client code to read from and write to Kudu |
| servers with and without Kerberos configuration. |
| |
| === Scalability |
| |
| Kudu authentication is designed to scale to thousands of nodes, which requires |
| avoiding unnecessary coordination with a central authentication authority (such |
| as the Kerberos KDC). Instead, Kudu servers and clients will use Kerberos to |
| establish initial trust with the Kudu master, and then use alternate credentials |
| for subsequent connections. In particular, the master will issue internal |
| X.509 certificates to servers, and temporary authentication tokens to clients. |
| |
| == Coarse-Grained Authorization |
| |
| Kudu supports coarse-grained authorization of client requests based on the |
| authenticated client Kerberos principal (i.e. user or service). The two levels |
| of access which can be configured are: |
| |
| * *Superuser* - principals authorized as a superuser are able to perform |
| certain administrative functionality such as using the `kudu` command line tool |
| to diagnose or repair cluster issues. |
| |
| * *User* - principals authorized as a user are able to access and modify all |
| data in the Kudu cluster. This includes the ability to create, drop, and alter |
| tables as well as read, insert, update, and delete data. |
| |
| NOTE: Internally, Kudu has a third access level for the daemons themselves. |
| This ensures that users cannot connect to the cluster and pose as tablet |
| servers. |
| |
| Access levels are granted using whitelist-style Access Control Lists (ACLs), one |
| for each of the two levels. Each access control list either specifies a |
| comma-separated list of users, or may be set to `*` to indicate that all |
| authenticated users are able to gain access at the specified level. See |
| <<configuration>> below for examples. |
| |
| NOTE: The default value for the User ACL is `*`, which allows all users access |
| to the cluster. However, if authentication is enabled, this still restricts access |
| to only those users who are able to successfully authenticate via Kerberos. |
| Unauthenticated users on the same network as the Kudu servers will be unable |
| to access the cluster. |
| |
| == Fine-Grained Authorization |
| |
| As of Kudu 1.10.0, Kudu can be configured to enforce fine-grained authorization |
| across servers. This ensures that users can see only the data they are |
| explicitly authorized to see. Kudu currently supports this by leveraging |
| policies defined in Apache Sentry 2.2 and later. |
| |
| WARNING: Fine-grained authorization policies are not enforced when accessing |
| the web UI. User data may appear on various pages of the web UI (e.g. in logs, |
| metrics, scans, etc.). As such, it is recommended to either limit access to the |
| web UI ports, or redact or disable the web UI entirely, as desired. See the |
| <<web-ui,instructions for securing the web UI>> for more details. |
| |
| === Apache Sentry |
| |
| Apache Sentry models tabular objects in the following hierarchy: |
| |
| * *Server* - indicated by the Kudu configuration flag `--server_name`. |
| Everything stored in a Kudu cluster falls within the given "server". |
| |
| * *Database* - indicated as a prefix of table names with the format |
| `<database>.<table>`. |
| |
| * *Table* - a single Kudu table. |
| |
| * *Column* - a column within a Kudu table. |
| |
| Each level of this hierarchy defines a "scope" on which privileges can be |
| granted. Privileges granted on a higher scope imply privileges on a lower |
| scope. For example, if a user has `SELECT` privilege on a database, that user |
| implicitly has `SELECT` privileges on every table belonging to that database. |
| |
| Privileges are also associated with specific actions. Access to Kudu tables may |
| rely on privileges on the following actions: |
| |
| * `ALTER` |
| * `CREATE` |
| * `DELETE` |
| * `DROP` |
| * `INSERT` |
| * `UPDATE` |
| * `SELECT` |
| |
| Additionally, there are three special actions recognized by Kudu: `ALL`, |
| `OWNER`, and `METADATA`. If a user has the `ALL` or `OWNER` privileges on a |
| given table, that user has all of the above privileges on the table. |
| `METADATA` privilege is not an actual privilege per se, rather, it is a |
| conceptual privilege with which Kudu models any privilege. If a user has any |
| privilege on a given table, that user has `METADATA` privileges on the table, |
| i.e. a privilege granted on any action on a table implies that the user has |
| the `METADATA` privilege on that table. |
| |
| For more details about Sentry privileges, see the Apache Sentry |
| link:https://cwiki.apache.org/confluence/display/SENTRY/Sentry+Privileges[documentation]. |
| |
| NOTE: Depending on the value of the `sentry.db.explicit.grants.permitted` |
| configuration in Sentry, certain privileges may not be grantable in Sentry. For |
| example, in Sentry deployments that don't support `UPDATE` privileges, to |
| perform an operation that requires `UPDATE` privileges, a user must instead |
| have `ALL` privileges. |
| |
| When a Kudu master receives a request, it consults Sentry to determine what |
| privileges a user has. If the user is not authorized to perform the requested |
| action, the request is rejected. Kudu leverages the authenticated identity of a |
| user to decide whether to perform or reject a request. |
| |
| === Authorization Tokens |
| |
| Rather than having every tablet server communicate directly with Sentry, |
| privileges are propagated and checked via *authorization tokens*. These tokens |
| encapsulate what privileges a user has on a given table. Tokens are generated |
| by the master and returned to Kudu clients upon opening a Kudu table. Kudu |
| clients automatically attach authorization tokens when sending requests to |
| tablet servers. |
| |
| NOTE: Authorization tokens are a means to limiting the number of nodes directly |
| accessing Sentry to retrieve privileges. As such, since the expected number of |
| tablet servers in a cluster is much higher than the number of Kudu masters, |
| they are only used to authorize requests sent to tablet servers. Kudu masters |
| fetch privileges directly from Sentry or cache. See <<privilege-caching>> for |
| more details of Kudu's privilege cache. |
| |
| Similar to the validity interval for authentication tokens, to limit the |
| window of potential unwanted access if a token becomes compromised, |
| authorization tokens are valid for five minutes by default. The acquisition and |
| renewal of a token is hidden from the user, as Kudu clients automatically |
| retrieve new tokens when existing tokens expire. |
| |
| When a tablet server that has been configured to enforce fine-grained access |
| control receives a request, it checks the privileges in the attached token, |
| rejecting it if the privileges are not sufficient to perform the requested |
| operation, or if it is invalid (e.g. expired). |
| |
| [[trusted-users]] |
| === Trusted Users |
| |
| It may be desirable to allow certain users to view and modify any data stored |
| in Kudu. Such users can be specified via the `--trusted_user_acl` master |
| configuration. Trusted users can perform any operation that would otherwise |
| require fine-grained privileges, without Kudu consulting Sentry. |
| |
| Additionally, some services that interact with Kudu may authorize requests on |
| behalf of their end users. For example, Apache Impala authorizes queries on |
| behalf of its users, and sends requests to Kudu as the Impala service user, |
| commonly "impala". Since Impala authorizes requests on its own, to avoid |
| extraneous communication between Sentry and Kudu, the Impala service user |
| should be listed as a trusted user. |
| |
| NOTE: When accessing Kudu through Impala, Impala enforces its own fine-grained |
| authorization policy. This policy is similar to Kudu's and can be found in |
| Impala's |
| link:https://impala.apache.org/docs/build/html/topics/impala_authorization.html#authorization[authorization |
| documentation]. |
| |
| [[sentry-configuration]] |
| === Configuring the Integration with Apache Sentry |
| |
| NOTE: Sentry is often configured with Kerberos authentication. See |
| <<configuration>> for how to configure Kudu to authenticate via Kerberos. |
| |
| NOTE: In order to enable integration with Sentry, a cluster must first be |
| integrated with the Apache Hive Metastore. See the |
| <<hive_metastore.adoc#enabling-the-hive-metastore-integration,documentation>> |
| for how to configure Kudu to synchronize its internal catalog with the Hive |
| Metastore. |
| |
| The following configurations must be set on the master: |
| |
| ``` |
| --sentry_service_rpc_addresses=<Sentry RPC address> |
| --server_name=<value of HiveServer2's hive.sentry.server configuration> |
| --kudu_service_name=kudu |
| --sentry_service_kerberos_principal=sentry |
| --sentry_service_security_mode=kerberos |
| |
| # This example ACL setup allows the 'impala' user to access all data stored in |
| # Kudu, assuming Impala will authorize requests on its own. The 'hadoopadmin' |
| # user is also granted access to all Kudu data, which may facilitate testing |
| # and debugging. |
| --trusted_user_acl=impala,hadoopadmin |
| ``` |
| |
| The following configurations must be set on the tablet servers: |
| |
| ``` |
| --tserver_enforce_access_control=true |
| ``` |
| |
| The following configurations must be set in `sentry-site.xml` on the Sentry servers: |
| ```xml |
| # This example setup configures the Kudu service user as a privileged user to be |
| # able to retrieve authorization policies stored in Sentry. |
| <property> |
| <name>sentry.service.allow.connect</name> |
| <value>kudu</value> |
| </property> |
| |
| <property> |
| <name>sentry.service.admin.group</name> |
| <value>kudu</value> |
| </property> |
| ``` |
| [[privilege-caching]] |
| === Caching |
| |
| To avoid overwhelming Sentry with requests to fetch user privileges, the Kudu |
| master can be configured to cache user privileges. A by-product of this caching |
| is that when privileges are changed in Sentry, they may not be reflected in |
| Kudu for a configurable amount of time, defined by the following Kudu master |
| configurations: |
| |
| `--sentry_privileges_cache_ttl_factor * --authz_token_validity_interval_secs` |
| |
| The default value is fifty minutes. If privilege updates need to be reflected |
| in Kudu sooner than this, the Kudu CLI tool can be used to invalidate the |
| cached privileges to force Kudu to fetch new ones from Sentry: |
| |
| [source,bash] |
| ---- |
| kudu master authz_cache reset <master-addresses> |
| ---- |
| |
| === Policy for Kudu Masters |
| |
| The following authorization policy is enforced by Kudu masters. |
| |
| .Authorization Policy for Masters |
| [options="header"] |
| |=== |
| | Operation | Required Privilege |
| | `CreateTable` | `CREATE ON DATABASE` |
| | `CreateTable` with a different owner specified than the requesting user | `ALL ON DATABASE` with the Sentry `GRANT OPTION` (see link:https://cwiki.apache.org/confluence/display/SENTRY/Support+Delegated+GRANT+and+REVOKE+in+Hive+and+Impala[here]) |
| | `DeleteTable` | `DROP ON TABLE` |
| | `AlterTable` (with no rename) | `ALTER ON TABLE` |
| | `AlterTable` (with rename) | `ALL ON TABLE <old-table>` and `CREATE ON DATABASE <new-database>` |
| | `IsCreateTableDone` | `METADATA ON TABLE` |
| | `IsAlterTableDone` | `METADATA ON TABLE` |
| | `ListTables` | `METADATA ON TABLE` |
| | `GetTableLocations` | `METADATA ON TABLE` |
| | `GetTableSchema` | `METADATA ON TABLE` |
| | `GetTabletLocations` | `METADATA ON TABLE` |
| |=== |
| |
| === Policy for Kudu Tablet Servers |
| |
| The following authorization policy is enforced by Kudu tablet servers. |
| |
| .Authorization Policy for Tablet Servers |
| [options="header"] |
| |=== |
| | Operation | Required Privilege |
| | `Scan` | `SELECT ON TABLE`, or |
| |
| `METADATA ON TABLE` and `SELECT ON COLUMN` for each projected column and each predicate column |
| | `Scan` (no projected columns, equivalent to `COUNT(*)`) | `SELECT ON TABLE`, or |
| |
| `SELECT ON COLUMN` for each column in the table |
| | `Scan` (with virtual columns) | `SELECT ON TABLE`, or |
| |
| `SELECT ON COLUMN` for each column in the table |
| | `Scan` (in `ORDERED` mode) | `<privileges required for a Scan>` and `SELECT ON COLUMN` for each primary key column |
| | `Insert` | `INSERT ON TABLE` |
| | `Update` | `UPDATE ON TABLE` |
| | `Upsert` | `INSERT ON TABLE` and `UPDATE ON TABLE` |
| | `Delete` | `DELETE ON TABLE` |
| | `SplitKeyRange` | `SELECT ON COLUMN` for each primary key column and `SELECT ON COLUMN` for each projected column |
| | `Checksum` | User must be configured in `--superuser_acl` |
| | `ListTablets` | User must be configured in `--superuser_acl` |
| |=== |
| |
| NOTE: Unlike Impala, Kudu only supports all-or-nothing access to a table's |
| schema, rather than showing only authorized columns. |
| |
| == Encryption |
| |
| Kudu allows all communications among servers and between clients and servers |
| to be encrypted with TLS. |
| |
| Encryption can be configured on Kudu servers using the `--rpc_encryption` flag, |
| which can be set to `required`, `optional`, or `disabled`. By default, the flag |
| is set to `optional`. When `required`, Kudu will reject unencrypted connections. |
| When `optional`, Kudu will attempt to use encryption. Same as authentication, |
| when `disabled` or encryption fails for `optional`, Kudu will only allow |
| unencrypted connections from trusted subnets and reject any unencrypted connections |
| from publicly routable IPs. To secure a cluster, use `--rpc_encryption=required`. |
| |
| NOTE: Kudu will automatically turn off encryption on local loopback connections, |
| since traffic from these connections is never exposed externally. This allows |
| locality-aware compute frameworks like Spark and Impala to avoid encryption |
| overhead, while still ensuring data confidentiality. |
| |
| [[web-ui]] |
| == Web UI Encryption |
| |
| The Kudu web UI can be configured to use secure HTTPS encryption by providing |
| each server with TLS certificates. See <<configuration>> for more information on |
| web UI HTTPS configuration. |
| |
| == Web UI Redaction |
| |
| To prevent sensitive data from being exposed in the web UI, all row data is |
| redacted. Table metadata, such as table names, column names, and partitioning |
| information is not redacted. The web UI can be completely disabled by setting |
| the `--webserver_enabled=false` flag on Kudu servers. |
| |
| WARNING: Disabling the web UI will also disable REST endpoints such as |
| `/metrics`. Monitoring systems rely on these endpoints to gather metrics data. |
| |
| [[logs]] |
| == Log Security |
| |
| To prevent sensitive data from being included in Kudu server logs, all row data |
| is redacted by default. By setting the `--redact=log` flag, redaction will be |
| disabled in the web UI but retained for server logs. Alternatively, `--redact=none` |
| can be used to disable redaction completely. |
| // TODO(dan): add link to configuration reference. |
| |
| [[configuration]] |
| == Configuring a Secure Kudu Cluster |
| |
| The following configuration parameters should be set on all servers (master and |
| tablet server) in order to ensure that a Kudu cluster is secure: |
| |
| ``` |
| # Connection Security |
| #-------------------- |
| --rpc_authentication=required |
| --rpc_encryption=required |
| --keytab_file=<path-to-kerberos-keytab> |
| |
| # Web UI Security |
| #-------------------- |
| --webserver_certificate_file=<path-to-cert-pem> |
| --webserver_private_key_file=<path-to-key-pem> |
| # optional |
| --webserver_private_key_password_cmd=<password-cmd> |
| |
| # If you prefer to disable the web UI entirely: |
| --webserver_enabled=false |
| |
| # Coarse-grained authorization |
| #-------------------------------- |
| |
| # This example ACL setup allows the 'impala' user as well as the |
| # 'nightly_etl_service_account' principal access to all data in the |
| # Kudu cluster. The 'hadoopadmin' user is allowed to use administrative |
| # tooling. Note that, by granting access to 'impala', other users |
| # may access data in Kudu via the Impala service subject to its own |
| # authorization rules. |
| --user_acl=impala,nightly_etl_service_account |
| --superuser_acl=hadoopadmin |
| ``` |
| |
| See <<sentry-configuration>> to see an example of how to enable fine-grained |
| authorization via Apache Sentry. |
| |
| Further information about these flags can be found in the configuration |
| flag reference. |
| // TODO(todd) add a link |
| |
| |
| [[known-limitations]] |
| == Known Limitations |
| |
| Kudu has a few known security limitations: |
| |
| // TODO(danburkert): add JIRA links for each of these. |
| |
| Custom Kerberos Principal:: Kudu does not support setting a custom service |
| principal for Kudu processes. The principal must be 'kudu'. |
| |
| External PKI:: Kudu does not support externally-issued certificates for internal |
| wire encryption (server to server and client to server). |
| |
| On-disk Encryption:: Kudu does not have built-in on-disk encryption. However, |
| Kudu can be used with whole-disk encryption tools such as dm-crypt. |
| |