| // Licensed to the Apache Software Foundation (ASF) under one |
| // or more contributor license agreements. See the NOTICE file |
| // distributed with this work for additional information |
| // regarding copyright ownership. The ASF licenses this file |
| // to you under the Apache License, Version 2.0 (the |
| // "License"); you may not use this file except in compliance |
| // with the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, |
| // software distributed under the License is distributed on an |
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| // KIND, either express or implied. See the License for the |
| // specific language governing permissions and limitations |
| // under the License. |
| |
| [[configuration]] |
| = Configuring Apache Kudu |
| |
| :author: Kudu Team |
| :imagesdir: ./images |
| :icons: font |
| :toc: left |
| :toclevels: 3 |
| :doctype: book |
| :backend: html5 |
| :sectlinks: |
| :experimental: |
| |
| include::top.adoc[tags=version] |
| |
| == Configure Kudu |
| |
| === Configuration Basics |
| To configure the behavior of each Kudu process, you can pass command-line flags when |
| you start it, or read those options from configuration files by passing one or more |
| `--flagfile=<file>` options. You can even include the `--flagfile` option within a |
| configuration file to include other files. Learn more about gflags |
| by reading link:https://gflags.github.io/gflags/[its documentation]. |
| |
| You can place options for masters and tablet servers into the same configuration |
| file, and each will ignore options that do not apply. |
| |
| Flags can be prefixed with either one or two `-` characters. This |
| documentation standardizes on two: `--example_flag`. |
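
As a sketch of the layout described above, the following script writes a shared flagfile and a master-specific flagfile. The master file pulls in the shared one through a nested `--flagfile` option. The directory (taken from a hypothetical `CONF_DIR` variable, or a fresh temporary directory), the file names, and all flag values are illustrative placeholders. gflags flagfiles hold one flag per line, and lines beginning with `#` are comments.

```shell
#!/usr/bin/env bash
# Sketch only: CONF_DIR, the file names, and every flag value are hypothetical.
set -euo pipefail

conf_dir="${CONF_DIR:-$(mktemp -d)}"
mkdir -p "$conf_dir"

# Options shared by masters and tablet servers; each process ignores
# the options that do not apply to it.
cat > "$conf_dir/common.gflagfile" <<'EOF'
# Flags shared by masters and tablet servers.
--log_dir=/var/log/kudu
--time_source=system
EOF

# Master-only options, plus a nested --flagfile that includes the shared file.
cat > "$conf_dir/master.gflagfile" <<EOF
--flagfile=$conf_dir/common.gflagfile
--fs_wal_dir=/data/kudu/master/wal
EOF

echo "start with: kudu-master --flagfile=$conf_dir/master.gflagfile"
```

A tablet server flagfile could include the same `common.gflagfile` in the same way. This keeps shared settings such as the time source in a single place.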
| |
| === Discovering Configuration Options |
| Only the most common configuration options are documented here. For a more exhaustive |
| list of configuration options, see the link:configuration_reference.html[Configuration Reference]. |
| |
| To see all configuration flags for a given executable, run it with the `--help` option. |
| Take care when configuring undocumented flags, as not every possible |
| configuration has been tested, and undocumented options are not guaranteed to be |
| maintained in future releases. |
| |
| [[clock_and_time_source]] |
| === Configuring Clock and Time Source |
| Kudu relies on timestamps generated by its clock implementation for |
| multi-version concurrency control (MVCC) and for providing consistency |
| guarantees when processing write and read requests. Aside from the test-only |
| mock clock, Kudu has two clock implementations: one is based on logical time |
| and the other on so-called hybrid time. The former is a plain Lamport clock; |
| the latter combines the node's system clock with a Lamport clock. Below, the |
| former is referred to as `LogicalClock` and the latter as `HybridClock`. |
| |
| The `HybridClock` implementation is required for production-grade, |
| proof-of-concept (POC), and other regular Kudu deployments, which is why |
| `--use_hybrid_clock` is set to `true` by default. Setting the flag to `false` |
| makes Kudu servers use the `LogicalClock` implementation. Running with this |
| clock implementation is acceptable only for specifically crafted test |
| scenarios in a Kudu development environment. |
| |
| WARNING: Setting `--use_hybrid_clock=false` is strongly discouraged in any |
| production-grade deployment, since it can introduce uncontrolled latency and |
| unexpected behavior, especially when working with multiple tables in a |
| multi-node Kudu cluster. |
| |
| To provide better accuracy for multi-node cluster deployments where each node |
| maintains its own system clock, the `HybridClock` implementation requires each |
| node's system clock to be synchronized by NTP. |
| |
| NOTE: Setting `--time_source=system_unsync` removes the requirement for the |
| node's system clock to be synchronized by NTP. This allows users to run test |
| clusters on a single node, where all Kudu servers use the same clock. |
| Setting `--time_source=system_unsync` is strongly discouraged in any |
| multi-node Kudu cluster, unless the system clocks of all Kudu nodes are |
| guaranteed to always be synchronized with each other. |
| |
| For Kudu masters and tablet servers, there are two options to make the |
| `HybridClock` implementation use a clock synchronized by NTP: |
| |
| - Ensure that the system clock of the Kudu node is synchronized with reference |
| servers using an NTP daemon running on the node. Usually, the NTP daemon is |
| part of the node's OS distribution. Kudu 1.12.0 and newer versions support both |
| `ntpd` and `chronyd`. Earlier Kudu versions were tested only with `ntpd`, but |
| might also work with `chronyd` if `chronyd` is |
| configured as recommended by the |
| link:troubleshooting.html#chronyd[chronyd configuration tips for Kudu]. |
| - Make Kudu servers maintain their own local clock, synchronizing it with |
| reference NTP servers. For that, Kudu servers use their built-in NTP client. |
| This option is available in Kudu 1.11.0 and newer versions. |
| |
| The latter option is provided as a last resort for deployments where properly |
| configuring NTP daemons on every node of a Kudu cluster is not feasible, and |
| to simplify Kudu deployments in public cloud environments such as EC2 and GCP. |
| For on-prem deployments, the former option is still recommended, since the |
| current implementation of Kudu's built-in NTP client might not be as robust |
| as the battle-tested `ntpd` and `chronyd` system NTP daemons. |
| |
| To choose between the two options above, use the `--time_source` flag: |
| |
| - Setting `--time_source=system` makes the `HybridClock` rely on the node's |
| system clock. |
| - Setting `--time_source=builtin` turns on the built-in NTP client in |
| Kudu masters and tablet servers. Use the `--builtin_ntp_servers` flag to |
| customize the set of reference NTP servers for the built-in NTP client: the |
| value is expected to be a comma-separated list. |
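
The following is a minimal sketch of the time-source flags for the built-in NTP client. It shows how an array of reference servers becomes the comma-separated value that `--builtin_ntp_servers` expects. The helper `join_by_comma` and the `ntp*.example.com` hostnames are hypothetical; substitute the reference servers your deployment actually uses.

```shell
#!/usr/bin/env bash
# Sketch only: join_by_comma and the NTP hostnames are hypothetical examples.
set -euo pipefail

# Join all arguments into a single comma-separated string.
join_by_comma() {
  local IFS=','
  echo "$*"
}

ntp_servers=(ntp1.example.com ntp2.example.com ntp3.example.com)

# Flagfile lines that turn on the built-in NTP client with those servers.
time_flags="--time_source=builtin
--builtin_ntp_servers=$(join_by_comma "${ntp_servers[@]}")"

printf '%s\n' "$time_flags"
```

The printed lines can be appended to each server's flagfile. For `--time_source=system`, only the first line (with `system` instead of `builtin`) is needed.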
| |
| NOTE: The default setting for the `--builtin_ntp_servers` flag might require |
| access to the NTP servers hosted by the |
| link:https://www.ntppool.org/[NTP Pool Project]. |
| |
| If deploying a Kudu cluster in AWS/EC2 or GCE/GCP public clouds, it might make |
| sense to set `--time_source=auto` for all Kudu masters and tablet servers in |
| the cluster. In this context, setting `--time_source=auto` leads to the |
| following: |
| |
| - Upon every start, a Kudu server runs an auto-detection procedure to |
| determine the type of cloud environment it runs in. |
| - If cloud type auto-detection completes successfully, |
| the Kudu server uses its built-in NTP client to synchronize with the |
| NTP server provided by the cloud environment (see the corresponding |
| documentation for |
| link:https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html[EC2] |
| and link:https://cloud.google.com/compute/docs/instances/configure-ntp[GCP]). |
| |
| NOTE: Running a Kudu server with `--time_source=auto` in cloud environments |
| other than EC2 and GCP, or when cloud type auto-detection fails, makes |
| the Kudu server fall back to the built-in NTP client with the list |
| of NTP servers specified by the `--builtin_ntp_servers` flag, unless that |
| list is empty or otherwise unparsable. When `--builtin_ntp_servers` is set to |
| an empty list and cloud type auto-detection fails, the Kudu server runs as if |
| it were configured with the `system` time source, provided the OS/platform |
| supports the `ntp_gettime()` API. Finally, if none of the above applies, the |
| time source falls back to `system_unsync`. As already mentioned, the |
| `system_unsync` time source is intended only for development platforms or |
| single-node, run-it-all proof-of-concept Kudu clusters. |
| |
| The `kudu cluster ksck` CLI utility reports the configured and the effective |
| time source for every Kudu master and tablet server in a cluster. When the |
| effective time source is `builtin`, it also reports the list of NTP servers |
| used by the built-in client. The utility also shows differences in the |
| settings of the related time source flags and warns operators if it detects |
| a discrepancy. In addition, the configured and effective time sources are |
| reported by the embedded web server in the `Time Source` panel of the |
| `/config` page. |
| |
| NOTE: Changing the value of the `--time_source` flag requires restarting the |
| Kudu server. Keep the time source the same for all masters and tablet servers |
| in a Kudu cluster. If using Kudu's built-in NTP client, make sure to use |
| the same list of reference NTP servers for every Kudu server in the cluster. |
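
When servers are configured through flagfiles, a small pre-restart check can compare the time-source flags across those files. This sketch assumes the following:

* Each server has its own flagfile.
* Flags are written as `--name=value` on separate lines.
* Nested `--flagfile` includes are not followed.

The functions `flag_value` and `check_time_source_consistency` and the demo file names are hypothetical. The check complements, rather than replaces, the `kudu cluster ksck` report.

```shell
#!/usr/bin/env bash
# Sketch only: function names and file layout are hypothetical.
# Nested --flagfile includes are NOT followed by this check.
set -euo pipefail

# Print the last value of flag $1 in flagfile $2 (empty if the flag is absent).
flag_value() {
  grep -E -- "^--$1=" "$2" | tail -n 1 | cut -d= -f2- || true
}

# Return non-zero if any flagfile disagrees with the first one on the
# time-source flags.
check_time_source_consistency() {
  local first="$1" f flag status=0
  for f in "$@"; do
    for flag in time_source builtin_ntp_servers; do
      if [[ "$(flag_value "$flag" "$f")" != "$(flag_value "$flag" "$first")" ]]; then
        echo "mismatch in --$flag: $f vs $first" >&2
        status=1
      fi
    done
  done
  return "$status"
}

# Demo with two hypothetical flagfiles that agree with each other.
demo_dir="$(mktemp -d)"
printf '%s\n' '--time_source=builtin' '--builtin_ntp_servers=ntp1.example.com' \
  > "$demo_dir/master-1.gflagfile"
printf '%s\n' '--time_source=builtin' '--builtin_ntp_servers=ntp1.example.com' \
  > "$demo_dir/tserver-1.gflagfile"
check_time_source_consistency "$demo_dir"/*.gflagfile && echo "time source flags are consistent"
```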
| |
| [[directory_configuration]] |
| === Directory Configurations |
| Every Kudu node requires directory flags to be specified. The |
| `--fs_wal_dir` configuration indicates where Kudu will place its write-ahead |
| logs. The `--fs_metadata_dir` configuration indicates where Kudu will place |
| metadata for each tablet. It is recommended, although not necessary, that these |
| directories be placed on high-performance drives with high bandwidth and low |
| latency, e.g. solid-state drives. If `--fs_metadata_dir` is not specified, |
| metadata will be placed in the directory specified by `--fs_wal_dir`. Since |
| a Kudu node cannot tolerate the loss of its WAL or metadata directories, it |
| may be wise to mirror the drives containing these directories in order to |
| make recovering from a drive failure easier; however, mirroring may increase |
| the latency of Kudu writes. |
| |
| The `--fs_data_dirs` configuration indicates where Kudu will write its data |
| blocks. This is a comma-separated list of directories; if multiple values are |
| specified, data will be striped across the directories. If not specified, data |
| blocks will be placed in the directory specified by `--fs_wal_dir`. Note that |
| while a single data directory backed by a RAID-0 array will outperform a single |
| data directory backed by a single storage device, it is better to let Kudu |
| manage its own striping over multiple devices rather than delegating the |
| striping to a RAID-0 array. |
| |
| Additionally, `--fs_wal_dir` and `--fs_metadata_dir` may be the same as _one |
| of_ the directories listed in `--fs_data_dirs`, but must not be sub-directories |
| of any of them. |
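
The rule above can be checked mechanically before a server is first started. This is a minimal sketch: the helper names `is_strictly_under` and `check_kudu_dir` and the example paths are hypothetical. The comparison is purely textual, so symlinks and relative paths are not resolved.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and paths are hypothetical; symlinks and
# relative paths are not resolved (plain string comparison).
set -euo pipefail

# Succeed if directory $1 lies strictly inside directory $2.
is_strictly_under() {
  local dir="${1%/}" parent="${2%/}"
  [[ "$dir" == "$parent"/* ]]
}

# Check a WAL or metadata directory against a comma-separated data dir list:
# it may equal one of the data dirs but must not be a sub-directory of any.
check_kudu_dir() {
  local dir="$1" d
  local -a data_dirs
  IFS=',' read -r -a data_dirs <<< "$2"
  for d in "${data_dirs[@]}"; do
    if is_strictly_under "$dir" "$d"; then
      echo "invalid: $dir is inside data directory $d" >&2
      return 1
    fi
  done
  return 0
}

data_dirs="/data1/kudu,/data2/kudu"
check_kudu_dir /data1/kudu "$data_dirs" && echo "/data1/kudu: OK (same as a data dir)"
check_kudu_dir /data1/kudu/wal "$data_dirs" || echo "/data1/kudu/wal: rejected"
```

Matching on `"$parent"/*` rather than a bare prefix avoids treating `/data1/kudu2` as being inside `/data1/kudu`.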
| |
| WARNING: Each directory specified by a configuration flag on a given machine |
| should be used by at most one Kudu process. If multiple Kudu processes on the |
| same machine are configured to use the same directory, Kudu may refuse to start |
| up. |
| |
| WARNING: Once `--fs_data_dirs` is set, extra tooling is required to change it. |
| For more details, see the link:administration.html#change_dir_config[Kudu |
| Administration docs]. |
| |
| NOTE: The `--fs_wal_dir` and `--fs_metadata_dir` configurations can be changed, |
| provided the contents of the directories are also moved to match the flags. |
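
Such a move can be sketched as a helper that relocates the directory and rewrites the matching flag in one step. The helper `relocate_kudu_dir` is hypothetical. It assumes the following:

* The Kudu server using the directory is already stopped.
* The destination does not exist yet.
* The flag appears in the flagfile as a single `--name=value` line.

```shell
#!/usr/bin/env bash
# Sketch only: relocate_kudu_dir is hypothetical. It assumes the Kudu server
# using the directory is stopped and the flag is one --name=value line.
set -euo pipefail

# Usage: relocate_kudu_dir <flag_name> <old_dir> <new_dir> <flagfile>
relocate_kudu_dir() {
  local flag="$1" old="$2" new="$3" flagfile="$4" tmp
  [[ -d "$old" ]] || { echo "no such directory: $old" >&2; return 1; }
  [[ ! -e "$new" ]] || { echo "refusing to overwrite: $new" >&2; return 1; }
  mkdir -p "$(dirname "$new")"
  mv "$old" "$new"   # move the directory contents together with the flag change
  tmp="$(mktemp)"
  # Rewrite only the line that starts with --<flag>=; keep every other line.
  awk -v prefix="--$flag=" -v value="$new" \
    'index($0, prefix) == 1 { print prefix value; next } { print }' \
    "$flagfile" > "$tmp"
  cat "$tmp" > "$flagfile"   # write in place so the file keeps its permissions
  rm -f "$tmp"
}
```

For example, `relocate_kudu_dir fs_wal_dir /data/kudu/wal /ssd/kudu/wal /etc/kudu/tserver.gflagfile` (all paths hypothetical) would move the WAL directory and update `--fs_wal_dir` before the server is restarted.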
| |
| === Configuring the Kudu Master |
| To see all available configuration options for the `kudu-master` executable, run it |
| with the `--help` option: |
| ---- |
| $ kudu-master --help |
| ---- |
| |
| [cols="m,d,m,d"] |
| .Supported Configuration Flags for Kudu Masters |
| |=== |
| | Flag | Valid Options | Default | Description |
| |
| |--master_addresses | string | localhost | Comma-separated list of the RPC |
| addresses of all masters in the consensus configuration. If not specified, a standalone master is assumed. |
| |--fs_data_dirs | string | | List of directories where the Master will place its data blocks. |
| |--fs_metadata_dir | string | | The directory where the Master will place its tablet metadata. |
| |--fs_wal_dir | string | | The directory where the Master will place its write-ahead logs. |
| |--log_dir | string | /tmp | The directory to store Master log files. |
| |=== |
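
As an illustration of the flags in the table above, the following sketch writes a flagfile for one master in a three-master deployment. The hostnames, the `CONF_DIR` variable, and the directory paths are hypothetical. Port 7051 matches the master address used in the tablet server defaults below. Every master is started with the same `--master_addresses` list.

```shell
#!/usr/bin/env bash
# Sketch only: hostnames, CONF_DIR, and all paths are hypothetical.
set -euo pipefail

masters=(master-1.example.com:7051 master-2.example.com:7051 master-3.example.com:7051)
conf_dir="${CONF_DIR:-$(mktemp -d)}"
mkdir -p "$conf_dir"

# Join the master RPC addresses into one comma-separated value.
master_addresses="$(IFS=','; echo "${masters[*]}")"

cat > "$conf_dir/master.gflagfile" <<EOF
--master_addresses=$master_addresses
--fs_wal_dir=/data/kudu/master/wal
--fs_metadata_dir=/data/kudu/master/meta
--fs_data_dirs=/data1/kudu/master,/data2/kudu/master
--log_dir=/var/log/kudu
EOF

echo "start with: kudu-master --flagfile=$conf_dir/master.gflagfile"
```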
| |
| For the full list of flags for masters, see the |
| link:configuration_reference.html#kudu-master_supported[Kudu Master Configuration Reference]. |
| |
| === Configuring Tablet Servers |
| To see all available configuration options for the `kudu-tserver` executable, |
| run it with the `--help` option: |
| ---- |
| $ kudu-tserver --help |
| ---- |
| |
| [cols="m,d,m,d"] |
| .Supported Configuration Flags for Kudu Tablet Servers |
| |=== |
| | Flag | Valid Options | Default | Description |
| |
| |--fs_data_dirs | string | | List of directories where the Tablet Server will place its data blocks. |
| |--fs_metadata_dir | string | | The directory where the Tablet Server will place its tablet metadata. |
| |--fs_wal_dir | string | | The directory where the Tablet Server will place its write-ahead logs. |
| |--log_dir | string | /tmp | The directory to store Tablet Server log files. |
| |--tserver_master_addrs | string | `127.0.0.1:7051` | Comma-separated |
| addresses of the masters to which the tablet server should connect. The masters |
| do not read this flag. |
| |--block_cache_capacity_mb | integer | 512 | Maximum amount of memory allocated to the Kudu Tablet Server's block cache. |
| |--memory_limit_hard_bytes | integer | 4294967296 | Maximum amount of memory a Tablet Server can consume before it starts rejecting all incoming writes. |
| |=== |
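
The sketch below writes a tablet server flagfile using the flags from the table above. It also shows the byte arithmetic for `--memory_limit_hard_bytes`: the table's default of 4294967296 bytes is 4 GiB (4 × 1024³). The 8 GiB limit, 1024 MB block cache, hostnames, `CONF_DIR` variable, and paths are hypothetical values chosen for illustration, not sizing recommendations.

```shell
#!/usr/bin/env bash
# Sketch only: sizes, hostnames, CONF_DIR, and paths are hypothetical.
set -euo pipefail

conf_dir="${CONF_DIR:-$(mktemp -d)}"
mkdir -p "$conf_dir"

memory_limit_gib=8
block_cache_mb=1024
# Convert GiB to bytes: 8 * 1024^3 = 8589934592.
memory_limit_bytes=$(( memory_limit_gib * 1024 * 1024 * 1024 ))

cat > "$conf_dir/tserver.gflagfile" <<EOF
--tserver_master_addrs=master-1.example.com:7051,master-2.example.com:7051,master-3.example.com:7051
--fs_wal_dir=/data/kudu/tserver/wal
--fs_metadata_dir=/data/kudu/tserver/meta
--fs_data_dirs=/data1/kudu/tserver,/data2/kudu/tserver,/data3/kudu/tserver
--block_cache_capacity_mb=$block_cache_mb
--memory_limit_hard_bytes=$memory_limit_bytes
EOF

echo "start with: kudu-tserver --flagfile=$conf_dir/tserver.gflagfile"
```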
| |
| For the full list of flags for tablet servers, see the |
| link:configuration_reference.html#kudu-tserver_supported[Kudu Tablet Server Configuration Reference]. |
| |
| == Configure Kudu Tables |
| Kudu allows certain configurations to be set per table. To configure the behavior of a Kudu table, |
| you can set these configurations at table creation, or alter them via the Kudu API or the Kudu |
| command-line tool. |
| |
| [cols="m,d,m,d"] |
| .Supported Configurable Properties for Kudu Tables |
| |=== |
| | Configuration | Valid Options | Default | Description |
| |
| | kudu.table.history_max_age_sec | integer | | Number of seconds to retain history for tablets in this table. |
| | kudu.table.maintenance_priority | integer | 0 | Priority level of a table for maintenance. |
| | kudu.table.disable_compaction | false, true | false | Whether to disable data compaction maintenance tasks for all tablets of this table. |
| |=== |
| |
| == Next Steps |
| - link:quickstart.html[Get Started With Kudu] |
| - link:developing.html[Developing Applications With Kudu] |