// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
[[configuration]]
= Configuring Apache Kudu
:author: Kudu Team
:imagesdir: ./images
:icons: font
:toc: left
:toclevels: 3
:doctype: book
:backend: html5
:sectlinks:
:experimental:
include::top.adoc[tags=version]
== Configure Kudu
=== Configuration Basics
To configure the behavior of each Kudu process, you can pass command-line flags when
you start it, or read them from configuration files specified with
one or more `--flagfile=<file>` options. A configuration file can itself contain a
`--flagfile` option to include other files. Kudu parses flags with the gflags library; learn more
by reading link:https://gflags.github.io/gflags/[its documentation].
You can place options for masters and tablet servers into the same configuration
file, and each will ignore options that do not apply.
Flags can be prefixed with either one or two `-` characters. This
documentation standardizes on two: `--example_flag`.
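As a minimal sketch of the flag-file mechanism described above, the following shell snippet writes two flag files, one of which includes the other through `--flagfile`. The scratch directory, file names, and flag values are assumptions chosen for illustration only. The final command is printed rather than executed.

```shell
# Hypothetical scratch location for the example flag files.
CONF_DIR="$(mktemp -d)"

# Settings shared by masters and tablet servers (illustrative values).
cat > "$CONF_DIR/common.gflagfile" <<EOF
# gflags ignores blank lines and lines starting with '#'.
--log_dir=/var/log/kudu
EOF

# Tablet server flag file that pulls in the shared settings.
cat > "$CONF_DIR/tserver.gflagfile" <<EOF
--flagfile=$CONF_DIR/common.gflagfile
--fs_wal_dir=/data/kudu/tserver/wal
EOF

# The server would then be started along these lines (not executed here).
echo "kudu-tserver --flagfile=$CONF_DIR/tserver.gflagfile"
```

Because each process ignores options that do not apply to it, the shared file can be included by both the master and the tablet server flag files.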
=== Discovering Configuration Options
Only the most common configuration options are documented here. For a more exhaustive
list of configuration options, see the link:configuration_reference.html[Configuration Reference].
To see all configuration flags for a given executable, run it with the `--help` option.
Take care when configuring undocumented flags, as not every possible
configuration has been tested, and undocumented options are not guaranteed to be
maintained in future releases.
[[clock_and_time_source]]
=== Configuring Clock and Time Source
Kudu relies on timestamps generated by its clock implementation for multi-version
concurrency control (MVCC) and for providing consistency guarantees when processing write and read
requests. Aside from the test-only mock clock, Kudu has two clock
implementations: one is based on logical time and the other on
so-called hybrid time. The former is a plain Lamport clock; the latter
combines the node's system clock with a Lamport clock. Below,
the former is referred to as `LogicalClock` and the latter as `HybridClock`.
Using the `HybridClock` implementation is a must for any production-grade, POC,
or other regular Kudu deployment, which is why `--use_hybrid_clock` is set to
`true` by default. Setting the flag to `false` makes Kudu servers use the
`LogicalClock` implementation, which is
acceptable only when running specifically crafted test scenarios
in a Kudu development environment.
WARNING: Setting `--use_hybrid_clock=false` is strongly discouraged in any
production-grade deployment since that could introduce out-of-control latency
and not-quite-expected behavior, especially when working with multiple tables
in a multi-node Kudu cluster.
To provide better accuracy for multi-node cluster deployments where each node
maintains its own system clock, the `HybridClock` implementation requires each
node's system clock to be synchronized by NTP.
NOTE: Setting `--time_source=system_unsync` removes the requirement for the
node's system clock to be synchronized by NTP -- this allows users to run test
clusters on a single node where there is only one clock used by all Kudu
servers. Setting `--time_source=system_unsync` is strongly discouraged in any
multi-node Kudu cluster, unless system clocks of all Kudu nodes are guaranteed
to always be synchronized with each other.
For Kudu masters and tablet servers, there are two options to make the
`HybridClock` implementation use a clock synchronized by NTP:
- Ensure that the system clock of the Kudu node is synchronized with reference
servers using an NTP daemon running on the node. Usually, the NTP daemon is
a part of the node's OS distribution. As of Kudu 1.12.0, both
`ntpd` and `chronyd` are supported. Prior Kudu versions were tested only
with `ntpd`, but might work just fine with `chronyd` as well if `chronyd` is
configured as recommended by the
link:troubleshooting.html#chronyd[chronyd configuration tips for Kudu].
- Make Kudu servers maintain their own local clock, synchronizing it with
reference NTP servers. For that, Kudu servers use their built-in NTP client.
This option is available in Kudu 1.11.0 and newer versions.
The latter option is provided as a last resort for deployments where properly
configuring NTP daemons at every node of a Kudu cluster is not feasible for
some reason and to simplify Kudu deployments in public cloud environments such
as EC2 and GCP. For on-prem deployments, it's still recommended to use the
former option since the current implementation of the Kudu built-in NTP client
might not be as robust as the battle-tested `ntpd` and `chronyd` system
NTP daemons.
To switch between the two options above, use the `--time_source` flag:
- Setting `--time_source=system` makes the `HybridClock` rely on the node's
system clock.
- Setting `--time_source=builtin` turns on the built-in NTP client in
Kudu masters and tablet servers. Use the `--builtin_ntp_servers` flag to
customize the set of reference NTP servers for the built-in NTP client: the
value is expected to be a comma-separated list.
NOTE: The default setting for the `--builtin_ntp_servers` flag might require
access to the NTP servers hosted by the
link:https://www.ntppool.org/[NTP Pool Project].
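The snippet below sketches how the two flags above might be combined for the built-in NTP client. The server names are assumptions (public NTP Pool hosts); substitute reference servers reachable from your nodes. The command is printed rather than executed.

```shell
# Assumed reference servers, given as a comma-separated list.
NTP_SERVERS="0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org"

# Flags intended for every kudu-master and kudu-tserver in the cluster.
TIME_FLAGS="--time_source=builtin --builtin_ntp_servers=$NTP_SERVERS"

# Example invocation (not executed here).
echo "kudu-tserver $TIME_FLAGS"
```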
If deploying a Kudu cluster in AWS/EC2 or GCE/GCP public clouds, it might make
sense to set `--time_source=auto` for all Kudu masters and tablet servers in
the cluster. In this context, setting `--time_source=auto` leads to the
following:
- Upon every start, a Kudu server runs the auto-detection procedure to
determine the type of the cloud environment it runs in.
- If the procedure of the cloud type auto-detection completes successfully,
the Kudu server starts using its built-in NTP client to synchronize with the
NTP server provided by the cloud environment (see the appropriate
documentation for
link:https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html[EC2]
and link:https://cloud.google.com/compute/docs/instances/configure-ntp[GCP]
respectively).
NOTE: Running a Kudu server with `--time_source=auto` in cloud environments
other than EC2 and GCP, or when the cloud type auto-detection fails, makes
the Kudu server fall back to using the built-in NTP client with the list
of NTP servers as specified by the `--builtin_ntp_servers` flag, unless it's
empty or otherwise unparsable. When `--builtin_ntp_servers` is set to an empty
list and the cloud type auto-detection fails, the Kudu server runs as if it
were configured with the `system` time source if the OS/platform supports the
`get_ntptime()` API. Finally, the catch-all case is `system_unsync` for the
time source. As already mentioned, the `system_unsync` time source is targeted
for development-only platforms or single-node-runs-it-all proof-of-concept
Kudu clusters.
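The fallback order described above for `--time_source=auto` can be summarized as a small decision function. This is only an illustrative model of the documented behavior, not Kudu code: the function name and its three arguments are hypothetical, and an unparsable server list is treated the same as an empty one.

```shell
# effective_time_source CLOUD_DETECTED NTP_SERVERS HAS_GET_NTPTIME
#   CLOUD_DETECTED:  "yes" if EC2/GCP auto-detection succeeded, else "no"
#   NTP_SERVERS:     usable value of --builtin_ntp_servers (empty if unset
#                    or unparsable)
#   HAS_GET_NTPTIME: "yes" if the platform supports get_ntptime()
effective_time_source() {
  if [ "$1" = "yes" ]; then
    # Built-in client synchronizes with the cloud provider's NTP server.
    echo "builtin (cloud-provided NTP server)"
  elif [ -n "$2" ]; then
    # Built-in client with the operator-supplied server list.
    echo "builtin ($2)"
  elif [ "$3" = "yes" ]; then
    echo "system"
  else
    # Catch-all case.
    echo "system_unsync"
  fi
}

effective_time_source no "" yes
```

The last line models a server outside EC2 and GCP with an empty `--builtin_ntp_servers` list on a platform that supports `get_ntptime()`.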
The `kudu cluster ksck` CLI utility reports the configured and the effective
time source for every Kudu master and tablet server in a cluster. The list of
the NTP servers for the built-in client is reported as well when the effective
time source is `builtin`. The utility is also able to show the difference in
settings of the related time source flags and warn operators if a discrepancy
is detected. In addition, the information on the configured and effective time
source is reported by the embedded Web server in the `Time Source` panel at
the `/config` page.
NOTE: Changing the value of the `--time_source` flag requires restarting the Kudu
server. Keep the time source the same for all masters and tablet servers in
a Kudu cluster. If using the Kudu built-in NTP client, make sure to use
the same list of reference NTP servers for every Kudu server in the cluster.
[[directory_configuration]]
=== Directory Configurations
Every Kudu node requires the specification of directory flags. The
`--fs_wal_dir` configuration indicates where Kudu will place its write-ahead
logs. The `--fs_metadata_dir` configuration indicates where Kudu will place
metadata for each tablet. It is recommended, although not necessary, that these
directories be placed on high-performance drives with high bandwidth and low
latency, e.g. solid-state drives. If `--fs_metadata_dir` is not specified,
metadata will be placed in the directory specified by `--fs_wal_dir`. Since
a Kudu node cannot tolerate the loss of its WAL or metadata directories, it
may be wise to mirror the drives containing these directories in order to
make recovering from a drive failure easier; however, mirroring may increase
the latency of Kudu writes.
The `--fs_data_dirs` configuration indicates where Kudu will write its data
blocks. This is a comma-separated list of directories; if multiple values are
specified, data will be striped across the directories. If not specified, data
blocks will be placed in the directory specified by `--fs_wal_dir`. Note that
while a single data directory backed by a RAID-0 array will outperform a single
data directory backed by a single storage device, it is better to let Kudu
manage its own striping over multiple devices rather than delegating the
striping to a RAID-0 array.
Additionally, `--fs_wal_dir` and `--fs_metadata_dir` may be the same as _one
of_ the directories listed in `--fs_data_dirs`, but must not be sub-directories
of any of them.
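The nesting rule above can be checked before starting a server. The helper below is a hypothetical pre-flight sketch, not part of Kudu: it compares paths textually, so it does not resolve symbolic links or relative paths, and the directory names in the usage line are assumptions.

```shell
# check_wal_dir WAL_DIR DATA_DIRS
#   Returns 0 if WAL_DIR equals one of the comma-separated DATA_DIRS or lies
#   outside all of them, and 1 if it is nested inside one of them.
check_wal_dir() {
  wal="${1%/}"                       # strip one trailing slash, if any
  old_ifs="$IFS"
  IFS=","
  for dir in $2; do
    dir="${dir%/}"
    case "$wal" in
      "$dir") ;;                     # same directory: allowed
      "$dir"/*)                      # sub-directory: not allowed
        IFS="$old_ifs"
        return 1 ;;
    esac
  done
  IFS="$old_ifs"
  return 0
}

# Usage with assumed paths: the WAL lives on its own drive.
check_wal_dir /data/0/kudu/wal /data/1/kudu,/data/2/kudu && echo "layout OK"
```

The same function can be called with the value of `--fs_metadata_dir` as its first argument, since the rule applies to both directories.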
WARNING: Each directory specified by a configuration flag on a given machine
should be used by at most one Kudu process. If multiple Kudu processes on the
same machine are configured to use the same directory, Kudu may refuse to start
up.
WARNING: Once `--fs_data_dirs` is set, extra tooling is required to change it.
For more details, see the link:administration.html#change_dir_config[Kudu
Administration docs].
NOTE: The `--fs_wal_dir` and `--fs_metadata_dir` configurations can be changed,
provided the contents of the directories are also moved to match the flags.
=== Configuring the Kudu Master
To see all available configuration options for the `kudu-master` executable, run it
with the `--help` option:
----
$ kudu-master --help
----
[cols="m,d,m,d"]
.Supported Configuration Flags for Kudu Masters
|===
| Flag | Valid Options | Default | Description
|--master_addresses | string | localhost | Comma-separated list of all the RPC
addresses for Master consensus-configuration. If not specified, assumes a standalone Master.
|--fs_data_dirs | string | | List of directories where the Master will place its data blocks.
|--fs_metadata_dir | string | | The directory where the Master will place its tablet metadata.
|--fs_wal_dir | string | | The directory where the Master will place its write-ahead logs.
|--log_dir | string | /tmp | The directory to store Master log files.
|===
For the full list of flags for masters, see the
link:configuration_reference.html#kudu-master_supported[Kudu Master Configuration Reference].
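For illustration, a flag file for one member of a three-master deployment might be assembled as follows. The host names and directory paths are assumptions; port 7051 matches the master address used in this document's defaults. The command is printed rather than executed.

```shell
# Hypothetical master hosts; every master lists all members, itself included.
MASTERS="master-1.example.com:7051,master-2.example.com:7051,master-3.example.com:7051"

MASTER_FLAGFILE="$(mktemp)"
cat > "$MASTER_FLAGFILE" <<EOF
--master_addresses=$MASTERS
--fs_wal_dir=/data/0/kudu/master/wal
--fs_metadata_dir=/data/0/kudu/master/meta
--fs_data_dirs=/data/1/kudu/master/data
--log_dir=/var/log/kudu
EOF

# Example invocation (not executed here).
echo "kudu-master --flagfile=$MASTER_FLAGFILE"
```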
=== Configuring Tablet Servers
To see all available configuration options for the `kudu-tserver` executable,
run it with the `--help` option:
----
$ kudu-tserver --help
----
.Supported Configuration Flags for Kudu Tablet Servers
|===
| Flag | Valid Options | Default | Description
|--fs_data_dirs | string | | List of directories where the Tablet Server will place its data blocks.
|--fs_metadata_dir | string | | The directory where the Tablet Server will place its tablet metadata.
|--fs_wal_dir | string | | The directory where the Tablet Server will place its write-ahead logs.
|--log_dir | string | /tmp | The directory to store Tablet Server log files.
|--tserver_master_addrs | string | `127.0.0.1:7051` | Comma-separated
addresses of the masters which the tablet server should connect to. The masters
do not read this flag.
|--block_cache_capacity_mb | integer | 512 | Maximum amount of memory allocated to the Kudu Tablet Server's block cache.
|--memory_limit_hard_bytes | integer | 4294967296 | Maximum amount of memory a Tablet Server can consume before it starts rejecting all incoming writes.
|===
For the full list of flags for tablet servers, see the
link:configuration_reference.html#kudu-tserver_supported[Kudu Tablet Server Configuration Reference].
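Because `--memory_limit_hard_bytes` is expressed in bytes, a small conversion helper can make sizing easier to read. The sketch below is illustrative: the helper name, the master host, and the chosen sizes are assumptions, not recommendations. Note that 4 GiB converts to 4294967296, the default shown in the table above.

```shell
# Convert a whole number of GiB to bytes.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Assumed sizing for illustration: 8 GiB hard limit, 1024 MB block cache.
HARD_LIMIT_BYTES="$(gib_to_bytes 8)"
TSERVER_FLAGS="--tserver_master_addrs=master-1.example.com:7051"
TSERVER_FLAGS="$TSERVER_FLAGS --block_cache_capacity_mb=1024"
TSERVER_FLAGS="$TSERVER_FLAGS --memory_limit_hard_bytes=$HARD_LIMIT_BYTES"

# Example invocation (not executed here).
echo "kudu-tserver $TSERVER_FLAGS"
```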
== Configure Kudu Tables
Kudu allows certain configurations to be set per table. To configure the behavior of a Kudu table,
you can set these configurations at table creation, or alter them via the Kudu API or the Kudu
command-line tool.
.Supported Configurable Properties for Kudu Tables
|===
| Configuration | Valid Options | Default | Description
| kudu.table.history_max_age_sec | integer | | Number of seconds to retain history for tablets in this table.
| kudu.table.maintenance_priority | integer | 0 | Priority level of a table for maintenance.
| kudu.table.disable_compaction | false, true | false | Whether to disable data compaction maintenance tasks for all tablets of this table.
|===
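As a sketch of altering a table property from the command line, the snippet below computes a seven-day history retention in seconds and prints a matching command. It assumes the `kudu table set_extra_config` subcommand and its argument order; confirm both with `kudu table --help` for your release. The master address and table name are hypothetical.

```shell
MASTERS="master-1.example.com:7051"   # hypothetical master address
TABLE="metrics"                       # hypothetical table name

# kudu.table.history_max_age_sec is given in seconds.
RETENTION_DAYS=7
HISTORY_MAX_AGE_SEC=$(( RETENTION_DAYS * 24 * 60 * 60 ))

# Assumed CLI form; printed here rather than executed.
echo "kudu table set_extra_config $MASTERS $TABLE kudu.table.history_max_age_sec $HISTORY_MAX_AGE_SEC"
```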
== Next Steps
- link:quickstart.html[Get Started With Kudu]
- link:developing.html[Developing Applications With Kudu]