 // Licensed to the Apache Software Foundation (ASF) under one
 // or more contributor license agreements.  See the NOTICE file
 // distributed with this work for additional information
 // regarding copyright ownership.  The ASF licenses this file
 // to you under the Apache License, Version 2.0 (the
 // "License"); you may not use this file except in compliance
 // with the License.  You may obtain a copy of the License at
 //
 //   http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing,
 // software distributed under the License is distributed on an
 // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 // KIND, either express or implied.  See the License for the
 // specific language governing permissions and limitations
 // under the License.

 [[administration]]
 = Apache Kudu Administration

 :author: Kudu Team
 :imagesdir: ./images
 :icons: font
 :toc: left
 :toclevels: 3
 :doctype: book
 :backend: html5
 :sectlinks:
 :experimental:

 include::top.adoc[tags=version]

 == Starting and Stopping Kudu Processes

 NOTE: These instructions are relevant only when Kudu is installed using operating system packages
 (e.g. `rpm` or `deb`).

 include::installation.adoc[tags=start_stop]

 == Kudu Web Interfaces

 Kudu tablet servers and masters expose useful operational information on a built-in web interface.

 === Kudu Master Web Interface

 Kudu master processes serve their web interface on port 8051. The interface exposes several pages
 with information about the cluster state:

 - A list of tablet servers, their host names, and the time of their last heartbeat.
 - A list of tables, including schema and tablet location information for each.
 - SQL code which you can paste into Impala Shell to add an existing table to Impala's list of known data sources.

 === Kudu Tablet Server Web Interface

 Each tablet server serves a web interface on port 8050. The interface exposes information
 about each tablet hosted on the server, its current state, and debugging information
 about maintenance background operations.

 === Common Web Interface Pages

 Both Kudu masters and tablet servers expose a common set of information via their web interfaces:

 - HTTP access to server logs.
 - an `/rpcz` endpoint which lists currently running RPCs via JSON.
 - pages giving an overview and detailed information on the memory usage of different
   components of the process.
 - information on the current set of configuration flags.
 - information on the currently running threads and their resource consumption.
 - a JSON endpoint exposing metrics about the server.
 - information on the deployed version number of the daemon.

 These interfaces are linked from the landing page of each daemon's web UI.

 == Kudu Metrics

 Kudu daemons expose a large number of metrics. Some metrics are associated with an entire
 server process, whereas others are associated with a particular tablet replica.

 === Listing available metrics

 The full set of available metrics for a Kudu server can be dumped via a special command
 line flag:

 [source,bash]
 ----
 $ kudu-tserver --dump_metrics_json
 $ kudu-master --dump_metrics_json
 ----

 This will output a large JSON document. Each metric indicates its name, label, description,
 units, and type. Because the output is JSON-formatted, this information can easily be
 parsed and fed into other tooling which collects metrics from Kudu servers.

 === Collecting metrics via HTTP

 Metrics can be collected from a server process via its HTTP interface by visiting
 `/metrics`. The output of this page is JSON for easy parsing by monitoring services.
 This endpoint accepts several `GET` parameters in its query string:

 - `/metrics?metrics=<substring1>,<substring2>,...` - limits the returned metrics to those which contain
 at least one of the provided substrings. The substrings also match entity names, so this
 may be used to collect metrics for a specific tablet.

 - `/metrics?include_schema=1` - includes metrics schema information such as unit, description,
 and label in the JSON output. This information is typically elided to save space.

 - `/metrics?compact=1` - eliminates unnecessary whitespace from the resulting JSON, which can decrease
 bandwidth when fetching this page from a remote host.

 - `/metrics?include_raw_histograms=1` - includes the raw buckets and values for histogram metrics,
 enabling accurate aggregation of percentile metrics over time and across hosts.

 - `/metrics?level=info` - limits the returned metrics based on their severity level.
 The levels are ordered and lower levels include the levels above them. If no level is specified,
 `debug` is used to include all metrics. The valid values are:
    * `debug` - Metrics that are diagnostically helpful but generally not monitored
                during normal operation.
    * `info` - Generally useful metrics that operators always want to have available
               but may not be monitored under normal circumstances.
    * `warn` - Metrics which can often indicate operational oddities that may need
               more investigation.

 - `/metrics?types=type1,type2` - limits the returned metrics based on their types.

 - `/metrics?ids=id1,id2` - limits the returned metrics based on their ids.

 - `/metrics?attributes=table_name,table1,table_name,table2` - limits the returned metrics based on their attributes.

 For example:

 [source,bash]
 ----
 $ curl -s 'http://example-ts:8050/metrics?include_schema=1&metrics=connections_accepted'
 ----

 NOTE: See the `link:metrics_reference.html[metrics reference]`
 page for more information on the available metrics.

 [source,json]
 ----
 [
     {
         "type": "server",
         "id": "kudu.tabletserver",
         "attributes": {},
         "metrics": [
             {
                 "name": "rpc_connections_accepted",
                 "label": "RPC Connections Accepted",
                 "type": "counter",
                 "unit": "connections",
                 "description": "Number of incoming TCP connections made to the RPC server",
                 "value": 92
             }
         ]
     }
 ]
 ----

 [source,bash]
 ----
 $ curl -s 'http://example-ts:8050/metrics?metrics=log_append_latency'
 ----

 [source,json]
 ----
 [
     {
         "type": "tablet",
         "id": "c0ebf9fef1b847e2a83c7bd35c2056b1",
         "attributes": {
             "table_name": "lineitem",
             "partition": "hash buckets: (55), range: [(<start>), (<end>))",
             "table_id": ""
         },
         "metrics": [
             {
                 "name": "log_append_latency",
                 "total_count": 7498,
                 "min": 4,
                 "mean": 69.3649,
                 "percentile_75": 29,
                 "percentile_95": 38,
                 "percentile_99": 45,
                 "percentile_99_9": 95,
                 "percentile_99_99": 167,
                 "max": 367244,
                 "total_sum": 520098
             }
         ]
     }
 ]
 ----

 NOTE: All histograms and counters are measured since the server start time, and are not reset upon collection.
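
Because counters are cumulative since server start, a monitoring system usually derives a rate from two successive samples. The following is a minimal sketch of that arithmetic; the function name and the sample values are illustrative and not part of Kudu.

```python
def counter_rate(prev_value, prev_time_s, curr_value, curr_time_s):
    """Return the per-second rate between two samples of a cumulative counter.

    Returns None when the rate cannot be computed: no time elapsed, or the
    counter went backwards (which suggests the server restarted).
    """
    elapsed = curr_time_s - prev_time_s
    if elapsed <= 0 or curr_value < prev_value:
        return None
    return (curr_value - prev_value) / elapsed


# Two hypothetical samples of rpc_connections_accepted taken 60 seconds apart.
print(counter_rate(92, 1000.0, 152, 1060.0))
```

Treating a decreasing counter as "unknown" rather than as a negative rate avoids a spurious spike after a server restart.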

 === Diagnostics Logging

 Kudu may be configured to dump various diagnostics information to a local log file.
 The diagnostics log will be written to the same directory as the other Kudu log files, with a
 similar naming format, substituting `diagnostics` for a log level like `INFO`.
 After any diagnostics log file reaches 64MB uncompressed, the log will be rolled and
 the previous file will be gzip-compressed.

 Each line in the diagnostics log consists of the following components:

 * A human-readable timestamp formatted in the same fashion as the other Kudu log files.
 * The type of record. For example, a metrics record consists of the word `metrics`.
 * A machine-readable timestamp, in microseconds since the Unix epoch.
 * The record itself.

 Currently, the only type of diagnostics record is a periodic dump of the server metrics.
 Each record is encoded in compact JSON format, and the server attempts to elide any metrics
 which have not changed since the previous record. In addition, counters which have never
 been incremented are elided. Otherwise, the format of the JSON record is identical to the
 format exposed by the HTTP endpoint above.

 The frequency with which metrics are dumped to the diagnostics log is configured using the
 `--metrics_log_interval_ms` flag. By default, Kudu logs metrics every 60 seconds.
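
The line layout described above can be parsed with a few lines of code. This is a sketch under stated assumptions: the JSON record is taken to start at the first `{` or `[` on the line, and the sample line (including its leading human-readable timestamp) is made up for illustration.

```python
import json


def parse_diagnostics_line(line):
    """Split one diagnostics log line into (record_type, timestamp_us, record).

    Assumes the record is a JSON document that starts at the first '{' or '['
    on the line, and that the two whitespace-separated tokens just before it
    are the record type and the microsecond timestamp.
    """
    starts = [i for i in (line.find("{"), line.find("[")) if i != -1]
    if not starts:
        raise ValueError("no JSON record found on line")
    json_start = min(starts)
    prefix_tokens = line[:json_start].split()
    if len(prefix_tokens) < 2:
        raise ValueError("missing record type or timestamp")
    record_type, timestamp_us = prefix_tokens[-2], int(prefix_tokens[-1])
    return record_type, timestamp_us, json.loads(line[json_start:])


# Hypothetical line; the leading human-readable timestamp is illustrative only.
sample = ('I0220 17:38:09.950546 metrics 1519177089950546 '
          '[{"type":"server","id":"kudu.tabletserver","metrics":[]}]')
record_type, timestamp_us, record = parse_diagnostics_line(sample)
print(record_type, timestamp_us, record[0]["id"])
```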

 [[rack_awareness]]
 == Rack Awareness

 As of version 1.9, Kudu supports a rack awareness feature. Kudu's ordinary
 re-replication methods ensure the availability of the cluster in the event of a
 single node failure. However, clusters can be vulnerable to correlated failures
 of multiple nodes. For example, all of the physical hosts on the same rack in
 a datacenter may become unavailable simultaneously if the top-of-rack switch
 fails. Kudu's rack awareness feature provides protection from some kinds of
 correlated failures, like the failure of a whole rack in a datacenter. Rack
 awareness increases the availability of a Kudu cluster if there are at least
 three different _locations_ defined in the cluster.

 The first element of Kudu's rack awareness feature is _location assignment_.
 When a tablet server or client registers with a master, the master assigns it a
 _location_. A location is a `/`-separated string that begins with a `/` and
 where each `/`-separated component consists of characters from the set
 `[a-zA-Z0-9_-.]`. For example, `/dc-0/rack-09` is a valid location, while
 `rack-04` and `/rack=1` are not valid locations. Thus location strings resemble
 absolute UNIX file paths where characters in directory and file names are
 restricted to the set `[a-zA-Z0-9_-.]`. Presently, Kudu does not use the
 hierarchical structure of locations, but it may in the future. Location
 assignment is done by a user-provided command, whose path should be specified
 using the `--location_mapping_cmd` master flag. The command should take a single
 argument, the IP address or hostname of a tablet server or client, and return
 the location for the tablet server or client. Make sure that all Kudu masters
 are using the same location mapping command.
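
As an illustration, a location mapping command could be a small script like the one below. This is only a sketch: the subnet table, the fallback location, and the function names are hypothetical, hostnames are not resolved, and the validity check is one reading of the character set described above rather than Kudu's own validation code.

```python
#!/usr/bin/env python3
import ipaddress
import re
import sys

# Hypothetical subnet-to-location table; replace with your own topology.
SUBNET_LOCATIONS = {
    "10.0.1.0/24": "/dc-0/rack-01",
    "10.0.2.0/24": "/dc-0/rack-02",
}
DEFAULT_LOCATION = "/dc-0/unknown"  # hypothetical fallback location

# A leading '/' followed by one or more '/'-separated components.
LOCATION_RE = re.compile(r"(/[a-zA-Z0-9_.-]+)+")


def is_valid_location(location):
    """Check a location string against the format described in the text."""
    return LOCATION_RE.fullmatch(location) is not None


def map_location(host):
    """Return the location for an IP address, or the default for anything else."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return DEFAULT_LOCATION  # hostnames are not resolved in this sketch
    for subnet, location in SUBNET_LOCATIONS.items():
        if address in ipaddress.ip_network(subnet):
            return location
    return DEFAULT_LOCATION


# When run as a command with one argument, print the location for it.
if __name__ == "__main__" and len(sys.argv) == 2:
    print(map_location(sys.argv[1]))
```

Saved as an executable file, a script of this shape takes the single argument the master passes and prints the location on standard output.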

 The second element of Kudu's rack awareness feature is the _placement policy_,
 which is:

   Do not place a majority of replicas of a tablet on tablet servers in the same location.

 The leader master, when placing newly created replicas on tablet servers and
 when re-replicating existing tablets, will attempt to place the replicas in a
 way that complies with the placement policy. For example, in a cluster with five
 tablet servers `A`, `B`, `C`, `D`, and `E`, with respective locations `/L0`,
 `/L0`, `/L1`, `/L1`, `/L2`, to comply with the placement policy a new 3x
 replicated tablet could have its replicas placed on `A`, `C`, and `E`, but not
 on `A`, `B`, and `C`, because then the tablet would have 2/3 replicas in
 location `/L0`. As another example, if a tablet has replicas on tablet servers
 `A`, `C`, and `E`, and then `C` fails, the replacement replica must be placed on
 `D` in order to comply with the placement policy.
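
The policy can be expressed as a simple check over replica locations. The sketch below reuses the five tablet servers from the example; the function name is illustrative, and this is not the master's actual placement code.

```python
from collections import Counter

# Tablet servers and locations from the example above.
LOCATIONS = {"A": "/L0", "B": "/L0", "C": "/L1", "D": "/L1", "E": "/L2"}


def complies_with_placement_policy(replica_servers, locations=LOCATIONS):
    """True if no single location holds a majority of the tablet's replicas."""
    per_location = Counter(locations[s] for s in replica_servers)
    majority = len(replica_servers) // 2 + 1
    return all(count < majority for count in per_location.values())


print(complies_with_placement_policy(["A", "C", "E"]))  # True
print(complies_with_placement_policy(["A", "B", "C"]))  # False: 2/3 in /L0
print(complies_with_placement_policy(["A", "D", "E"]))  # True: C replaced by D
```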

 At least three locations must be defined in a Kudu cluster for the rack awareness
 feature to improve availability. If only one or two locations are defined, any
 tablet will inevitably have a majority of its replicas placed in a single
 location.

 In the case where it is impossible to place replicas in a way that complies with
 the placement policy, Kudu will violate the policy and place a replica anyway.
 For example, using the setup described above, if a tablet
 has replicas on tablet servers `A`, `C`, and `E`, and then `E` fails, Kudu will
 re-replicate the tablet onto one of `B` or `D`, violating the placement policy,
 rather than leaving the tablet under-replicated indefinitely. The
 `kudu cluster rebalance` tool can reestablish the placement policy if it is
 possible to do so. The `kudu cluster rebalance` tool can also be used to
 establish the placement policy on a cluster if the cluster has just been
 configured to use the rack awareness feature and existing replicas need to be
 moved to comply with the placement policy. See
 <<rebalancer_tool_with_rack_awareness,running the tablet rebalancing tool on a rack-aware cluster>>
 for more information.

 The third and final element of Kudu's rack awareness feature is the _use of
 client locations to find "nearby" servers_. As mentioned, the masters also
 assign a location to clients when they connect to the cluster. The client
 (whether Java, {cpp}, or Python) uses its own location and the locations of
 tablet servers in the cluster to prefer "nearby" replicas when scanning in
 `CLOSEST_REPLICA` mode. Clients choose replicas to scan in the following order:

 . Scan a replica on a tablet server on the same host, if there is one.
 . Scan a replica on a tablet server in the same location, if there is one.
 . Scan some replica.

 For example, using the cluster setup described above, if a client on the same
 host as tablet server `A` scans a tablet with replicas on tablet servers
 `A`, `C`, and `E` in `CLOSEST_REPLICA` mode, it will choose to scan from the
 replica on `A`, since the client and the replica on `A` are on the same host.
 If the client scans a tablet with replicas on tablet servers `B`, `C`, and `E`,
 it will choose to scan from the replica on `B`, since it is in the same
 location as the client, `/L0`. If there are multiple replicas meeting a
 criterion, one is chosen arbitrarily.
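
The selection order can be sketched as follows. This illustrates the ordering described above and is not the client libraries' actual implementation; hosts are named after the tablet servers in the example cluster.

```python
import random


def choose_replica(client_host, client_location, replicas):
    """Pick a replica to scan, preferring same host, then same location.

    `replicas` is a list of (host, location) tuples; ties within a tier are
    broken arbitrarily, as in the description above.
    """
    same_host = [r for r in replicas if r[0] == client_host]
    same_location = [r for r in replicas if r[1] == client_location]
    for tier in (same_host, same_location, replicas):
        if tier:
            return random.choice(tier)
    raise ValueError("tablet has no replicas")


# The client runs on the same host as tablet server A, in location /L0.
client = ("A", "/L0")
print(choose_replica(*client, [("A", "/L0"), ("C", "/L1"), ("E", "/L2")])[0])  # A
print(choose_replica(*client, [("B", "/L0"), ("C", "/L1"), ("E", "/L2")])[0])  # B
```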

 [[backup]]
 == Backup and Restore

 [[logical_backup]]
 === Logical backup and restore

 As of Kudu 1.10.0, Kudu supports both full and incremental table backups via a
 job implemented using Apache Spark. Additionally, it supports restoring tables
 from full and incremental backups via a restore job implemented using Apache Spark.

 Because the Kudu backup and restore jobs use Apache Spark, ensure Apache Spark
 is installed in your environment following the
 link:https://spark.apache.org/docs/latest/#downloading[Spark documentation].
 Additionally, review the Apache Spark documentation for
 link:https://spark.apache.org/docs/latest/submitting-applications.html[Submitting Applications].

 ==== Backing up tables

 To back up one or more Kudu tables, use the `KuduBackup` Spark job.
 The first time the job is run for a table, a full backup will be run.
 Additional runs will perform incremental backups which will only contain the
 rows that have changed since the initial full backup. A new set of full
 backups can be forced at any time by passing the `--forceFull` flag to the
 backup job.

 The common flags that will be used when taking a backup are:

 * `--rootPath`: The root path to output backup data. Accepts any Spark-compatible path.
 ** See <<backup_directory>> for the directory structure used in the `rootPath`.
 * `--kuduMasterAddresses`: Comma-separated addresses of Kudu masters. Default: `localhost`
 * `<table>...`:  A list of tables to be backed up.

 Note: You can see the full list of job options at any time by passing the `--help` flag.

 Below is a full example of a `KuduBackup` job execution which will back up the tables
 `foo` and `bar` to the HDFS directory `kudu-backups`:

 [source,bash]
 ----
 spark-submit --class org.apache.kudu.backup.KuduBackup kudu-backup2_2.11-1.14.0.jar \
   --kuduMasterAddresses master-1-host,master-2-host,master-3-host \
   --rootPath hdfs:///kudu-backups \
   foo bar
 ----

 ==== Restoring tables from Backups

 To restore one or more Kudu tables, the `KuduRestore` Spark job can be used.
 For each backed up table, the `KuduRestore` job will restore the full backup
 and each associated incremental backup until the full table state is restored.
 Restoring the full series of full and incremental backups is possible because
 the backups are linked via the `from_ms` and `to_ms` fields in the backup metadata.
 By default the restore job will create tables with the same name as the table
 that was backed up. If you want to side-load the tables without affecting the
 existing tables, you can pass `--tableSuffix` to append a suffix to each
 restored table.
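
To make the linking concrete, the sketch below orders a full backup and its incrementals for restore. It assumes that an incremental follows another backup when its `from_ms` equals that backup's `to_ms`; the dictionaries, the function name, and the sample timestamps are hypothetical and do not reflect the real metadata file format.

```python
def restore_chain(full_backup, incrementals, timestamp_ms=None):
    """Order backups for restore: the full backup, then each linked incremental.

    Assumes an incremental follows a backup when its from_ms equals that
    backup's to_ms. Backups ending after timestamp_ms are ignored.
    """
    by_from = {b["from_ms"]: b for b in incrementals}
    chain = [full_backup]
    while True:
        nxt = by_from.get(chain[-1]["to_ms"])
        if nxt is None or nxt["to_ms"] <= nxt["from_ms"]:
            return chain  # no further link (or a malformed entry)
        if timestamp_ms is not None and nxt["to_ms"] > timestamp_ms:
            return chain  # later than the requested restore point
        chain.append(nxt)


# Hypothetical metadata, listed out of order on purpose.
full = {"from_ms": 0, "to_ms": 100}
incs = [{"from_ms": 200, "to_ms": 300}, {"from_ms": 100, "to_ms": 200}]
print([b["to_ms"] for b in restore_chain(full, incs)])
print([b["to_ms"] for b in restore_chain(full, incs, timestamp_ms=250)])
```

The optional `timestamp_ms` argument mirrors the idea behind the `--timestampMs` flag: backups that end after the requested time are left out of the chain.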

 The common flags that will be used when restoring are:

 * `--rootPath`: The root path to the backup data. Accepts any Spark-compatible path.
 ** See <<backup_directory>> for the directory structure used in the `rootPath`.
 * `--kuduMasterAddresses`: Comma-separated addresses of Kudu masters. Default: `localhost`
 * `--createTables`: If set to `true`, the restore process creates the tables.
   Set to `false` if the target tables already exist.  Default: `true`.
 * `--tableSuffix`: If set, the suffix to add to the restored table names.
   Only used when `createTables` is `true`.
 * `--timestampMs`: A UNIX timestamp in milliseconds that defines the latest time
   to use when selecting restore candidates. Default: `System.currentTimeMillis()`
 * `<table>...`:  A list of tables to restore.

 Note: You can see the full list of job options at any time by passing the `--help` flag.

 Below is a full example of a `KuduRestore` job execution which will restore the tables
 `foo` and `bar` from the HDFS directory `kudu-backups`:

 [source,bash]
 ----
 spark-submit --class org.apache.kudu.backup.KuduRestore kudu-backup2_2.11-1.14.0.jar \
   --kuduMasterAddresses master-1-host,master-2-host,master-3-host \
   --rootPath hdfs:///kudu-backups \
   foo bar
 ----

 ==== Backup tools

 An additional `backup-tools` jar is available to provide some backup exploration and
 garbage collection capabilities. This jar does not use Spark directly, but instead
 only requires the Hadoop classpath to run.

 Commands:

 * `list`: Lists the backups in the rootPath.
 * `clean`: Cleans up old backup data in the rootPath.

 Note: You can see the full list of command options at any time by passing the `--help` flag.

 Below is an example execution which will print the command options:

 [source,bash]
 ----
 java -cp $(hadoop classpath):kudu-backup-tools-1.14.0.jar org.apache.kudu.backup.KuduBackupCLI --help
 ----

 [[backup_directory]]
 ==== Backup Directory Structure

 The backup directory structure in the `rootPath` is considered an internal detail
 and could change in future versions of Kudu. Additionally the format and content
 of the data and metadata files is meant for the backup and restore process only
 and could change in future versions of Kudu. That said, understanding the structure
 of the backup `rootPath` and how it is used can be useful when working with Kudu backups.

 The backup directory structure in the `rootPath` is as follows:

 [source,bash]
 ----
 /<rootPath>/<tableId>-<tableName>/<backup-id>/
    .kudu-metadata.json
    part-*.<format>
 ----

 * `rootPath`: Can be used to distinguish separate backup groups, jobs, or concerns.
 * `tableId`: The unique internal ID of the table being backed up.
 * `tableName`: The name of the table being backed up.
 ** Note: Table names are URL encoded to prevent pathing issues.
 * `backup-id`: A way to uniquely identify/group the data for a single backup run.
 * `.kudu-metadata.json`: Contains all of the metadata to support recreating the table,
   linking backups by time, and handling data format changes.
 ** Written last so that failed backups will not have a metadata file and will not be
   considered at restore time or backup linking time.
 * `part-*.<format>`: The data files containing the table's data.
 ** Currently 1 part file per Kudu partition.
 ** Incremental backups contain an additional “RowAction” byte column at the end.
 ** Currently the only supported format/suffix is `parquet`
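
As a rough illustration of the layout above, the following sketch assembles the directory for one backup run. The table ID, the backup ID, and the use of percent-encoding for the table name are assumptions made for the example; the real job may encode names differently, and the structure is an internal detail as noted.

```python
from urllib.parse import quote


def backup_dir(root_path, table_id, table_name, backup_id):
    """Build /<rootPath>/<tableId>-<tableName>/<backup-id>/ for one backup run."""
    encoded_name = quote(table_name, safe="")  # assumption: percent-encoding
    return f"{root_path.rstrip('/')}/{table_id}-{encoded_name}/{backup_id}/"


# All values below are hypothetical placeholders.
print(backup_dir("hdfs:///kudu-backups", "0123abcd", "impala::default.foo", "1700000000000"))
```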

 ==== Troubleshooting

 ===== Generating a table list

 To generate a list of tables to back up, the `kudu table list` tool combined
 with `grep` can be useful. Below is an example that will generate a list
 of all tables that start with `my_db.`:

 [source,bash]
 ----
 kudu table list <master_addresses> | grep "^my_db\." | tr '\n' ' '
 ----

 *Note*: This list could be saved as a part of your backup process to be used
 at restore time as well.

 ===== Spark Tuning

 In general the Spark jobs were designed to run with minimal tuning and configuration.
 You can adjust the number of executors and resources to increase parallelism and performance
 using Spark's
 link:https://spark.apache.org/docs/latest/configuration.html[configuration options].

 If your tables are very wide and your default memory allocation is fairly low, you
 may see jobs fail. To resolve this, increase the Spark executor memory. A conservative
 rule of thumb is 1 GiB per 50 columns.
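
The rule of thumb works out as in the sketch below. The helper name and the table width are hypothetical; `--executor-memory` is the standard `spark-submit` option for executor memory.

```python
import math


def executor_memory_gib(column_count, columns_per_gib=50):
    """Conservative executor memory estimate: 1 GiB per 50 columns, rounded up."""
    return max(1, math.ceil(column_count / columns_per_gib))


columns = 320  # hypothetical table width
print(f"--executor-memory {executor_memory_gib(columns)}g")
```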

 If your Spark resources drastically outscale the Kudu cluster, you may want to limit the
 number of concurrent tasks allowed to run on restore.

 ===== Backups on Kudu 1.9 and earlier

 If your Kudu cluster is version 1.9 or earlier you can still use the backup tool
 introduced in Kudu 1.10 to back up your tables. However, because the incremental
 backup feature requires server side changes, you are limited to full backups only.
 The process to back up tables is the same as documented above, but you will need to
 download and use the kudu-backup jar from a Kudu 1.10+ release. Before running
 the backup job you should adjust the configuration of your servers by setting
 `--tablet_history_max_age_sec=604800`. This is the new default value in Kudu 1.10+
 to ensure long running backup jobs can complete successfully and consistently.
 Additionally, when running the backup you need to pass `--forceFull` to disable
 the incremental backup feature. Now each time the job is run a full backup will be taken.

 NOTE: Taking full backups on a regular basis is far more resource and time intensive
 than incremental backups. It is recommended to upgrade to Kudu 1.10+ as soon as possible.

 [[physical_backup]]
 === Physical backups of an entire node

 Kudu does not yet provide built-in physical backup and restore functionality.
 However, it is possible to create a physical backup of a Kudu node (either
 tablet server or master) and restore it later.

 WARNING: The node to be backed up must be offline during the procedure, or else
 the backed up (or restored) data will be inconsistent.

 WARNING: Certain aspects of the Kudu node (such as its hostname) are embedded in
 the on-disk data. As such, it's not yet possible to restore a physical backup of
 a node onto another machine.

 . Stop all Kudu processes in the cluster. This prevents the tablets on the
   backed up node from being re-replicated elsewhere unnecessarily.

 . If creating a backup, make a copy of the WAL, metadata, and data directories
   on each node to be backed up. It is important that this copy preserve all file
   attributes as well as sparseness.

 . If restoring from a backup, delete the existing WAL, metadata, and data
   directories, then restore the backup via move or copy. As with creating a
   backup, it is important that the restore preserve all file attributes and
   sparseness.

 . Start all Kudu processes in the cluster.

 == Common Kudu workflows

 [[migrate_to_multi_master]]
 === Migrating to Multiple Kudu Masters

 For high availability and to avoid a single point of failure, Kudu clusters should be created with
 multiple masters. Many Kudu clusters were created with just a single master, either for simplicity
 or because Kudu multi-master support was still experimental at the time. This workflow demonstrates
 how to migrate to a multi-master configuration. It can also be used to migrate from two masters to
 three, with straightforward modifications. Note that the number of masters must be odd.

 NOTE: From Kudu version 1.16.0 onwards, new Kudu Masters automatically attempt to contact other
 masters, determine whether a quorum has already been established, and if so, attempt to join the
 quorum. The below steps do not need to be performed in these versions.

 NOTE: From Kudu version 1.15.0 onwards, the `kudu master add` command has been added that simplifies
 the orchestration to migrate an existing Kudu cluster to multiple masters.

 WARNING: An even number of masters doesn't provide any benefit over having one fewer master.  This
 guide should always be used for migrating to three masters.
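
The reasoning behind the odd-number recommendation is majority arithmetic: the masters need a majority of the configured set to be available. A minimal sketch (the function name is illustrative):

```python
def tolerated_master_failures(master_count):
    """Masters that can fail while a majority of the configured set remains."""
    return (master_count - 1) // 2


for n in (1, 2, 3, 4, 5):
    print(n, tolerated_master_failures(n))
```

Two masters tolerate no failures, the same as one, and four tolerate one failure, the same as three.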

 WARNING: All of the command line steps below should be executed as the Kudu UNIX user. The example
 commands assume the Kudu Unix user is `kudu`, which is typical.

 WARNING: The workflow presupposes at least basic familiarity with Kudu configuration management. If
 using vendor-specific tools, the workflow also presupposes familiarity with
 them, and the vendor's instructions should be used instead, as details may differ.

 ==== Prepare for the migration

 . Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster
   will be unavailable.

 . Decide how many masters to use. The number of masters should be odd. Three or five node master
   configurations are recommended; they can tolerate one or two failures respectively.

 . Perform the following preparatory steps for the existing masters:
 * If migrating from a single master to multiple masters, ensure `--master_addresses` is specified
 for a single master configuration as it's required to migrate to multiple masters. This can be
 checked using the `kudu master get_flags` command.
 If not specified, supply `--master_addresses=<hostname>:<port>` to the master's configuration
 and restart the single master.

 * Optional: configure a DNS alias for the master. The alias is a DNS CNAME
 record.
 +
 WARNING: Without DNS aliases it is not possible to recover from permanent
 master failures without restarting the tablet servers in the cluster
 to pick up the replacement master node at a different hostname. It is highly
 recommended to use DNS aliases for Kudu master nodes to avoid that.
 +
 . Perform the following preparatory steps for each new master:
 * Choose an unused machine in the cluster. The master generates very little load
   so it can be collocated with other data services or load-generating processes,
   though not with another Kudu master from the same configuration.
 * Ensure Kudu is installed on the machine, either via system packages (in which case the `kudu` and
   `kudu-master` packages should be installed), or via some other means.
 * Choose and record the directory where the master's data will live.
 * Choose and record the port the master should use for RPCs.

 [[perform-the-migration]]
 ==== Perform the migration
 From version 1.15.0, a new `kudu master add` CLI command has been added that orchestrates migration
 to multiple masters in an existing Kudu cluster.

 The procedure doesn't require stopping all the Kudu processes in the entire cluster but once the
 migration procedure is complete, all the Kudu processes must be restarted to
 incorporate the newly added master which can be done without incurring downtime as mentioned in
 the steps below.

 The procedure supports adding only one master at a time. In order to add multiple masters follow
 the same procedure again for the next new master.

 . On the new master host (not on any of the existing masters), run the `kudu master add` command
 to add the master. Look for any success or error messages on the console or in the new master's log file.
 The command is designed to be idempotent, so if an error occurs, fix the issue mentioned in the
 error messages and run the same command again to make forward progress. When the procedure
 completes, whether or not it succeeded, the new master is shut down.
 The example below adds `master-2` to an existing Kudu cluster with `master-1`.
 +
 WARNING: If your Kudu cluster is secure, in addition to running as the Kudu UNIX user, you must
 authenticate as the Kudu service user prior to running this command.
 +
 [source, bash]
 ----
 $ sudo -u kudu kudu master add master-1 master-2 --fs_wal_dir=/data/kudu/master/wal \
 --fs_data_dirs=/data/kudu/master/data
 ----
 +
 . Modify the value of the `master_addresses` configuration parameter for existing masters only
 as the new master is already configured with the updated `master_addresses`.
 The new value must be a comma-separated list of all of the masters.
 Each entry is a string of the form `<hostname>:<port>`
 hostname:: master's previously recorded hostname or alias
 port:: master's previously recorded RPC port number

 . Restart the existing masters one by one.
 . Start the new master.
 . Repeat the steps above to add the desired number of masters into the cluster
 (for example, in case of adding two extra masters `master-2` and `master-3`,
 the sequence of those steps should be run for `master-2` and then for
 `master-3`).
 . *After adding all the desired masters into the cluster*, modify the
 value of the `tserver_master_addrs` configuration parameter for each tablet
 server. The new value must be a comma-separated list of masters where each
 entry is a string of the form `<hostname>:<port>`
 hostname:: master's hostname
 port:: master's RPC port number
 . Restart all the tablet servers to pick up the new masters' configuration.
 . If you have Kudu tables that are accessed from Impala, update the master
 addresses in the Apache Hive Metastore (HMS) database.
 * The following is an example SQL statement you should run manually in the
 underlying database that provides the storage for HMS (e.g., PostgreSQL)
 after migrating from one to three masters in a Kudu cluster:
 +
 [source,sql]
 ----
 UPDATE TABLE_PARAMS
 SET PARAM_VALUE =
   'master-1.example.com,master-2.example.com,master-3.example.com'
 WHERE PARAM_KEY = 'kudu.master_addresses' AND PARAM_VALUE = 'master-1.example.com';
 ----
 +
 * Alternatively, instead of running a single query in the HMS database,
 you can run the following query for every Kudu table in `impala-shell`:
 +
 [source,sql]
 ----
 ALTER TABLE <table_name>
 SET TBLPROPERTIES
 ('kudu.master_addresses' = 'master-1.example.com,master-2.example.com,master-3.example.com');
 ----
 +
 . In `impala-shell`, run:
 +
 [source,sql]
 ----
 INVALIDATE METADATA;
 ----
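
The comma-separated `<hostname>:<port>` values used for `master_addresses` and `tserver_master_addrs` in the steps above can be assembled mechanically. In this sketch the hostnames are hypothetical and port 7051 is assumed to be the masters' RPC port; substitute the hostnames (or aliases) and the port you recorded earlier.

```python
def master_addresses(hosts, rpc_port):
    """Join hostnames (or DNS aliases) into a '<hostname>:<port>,...' list."""
    return ",".join(f"{host}:{rpc_port}" for host in hosts)


# Hypothetical hostnames; 7051 is assumed to be the masters' RPC port here.
masters = ["master-1.example.com", "master-2.example.com", "master-3.example.com"]
value = master_addresses(masters, 7051)
print(f"--master_addresses={value}")
print(f"--tserver_master_addrs={value}")
```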

 ==== Verify the migration was successful
 To verify that all masters are working properly, perform the following sanity checks:

 * Using a browser, visit each master's web UI. Look at the `/masters` page. All the masters should
   be listed there with one master in the LEADER role and the others in the FOLLOWER role. The
   contents of `/masters` on each master should be the same.

 * Run a Kudu system check (ksck) on the cluster using the `kudu` command line
   tool. See <<ksck>> for more details.

 === Recovering from a dead Kudu Master in a Multi-Master Deployment

 Kudu multi-master deployments function normally in the event of a master loss. However, it is
 important to replace the dead master; otherwise a second failure may lead to a loss of availability,
 depending on the number of available masters. This workflow describes how to replace the dead
 master.

 WARNING: Replacing a master created without DNS aliases requires an unavailability window
 when tablet servers are restarted to pick up the replacement master at a different hostname.
 See the <<migrate_to_multi_master,multi-master migration workflow>> for more details on deploying
 with DNS aliases.

 WARNING: The workflow presupposes at least basic familiarity with Kudu configuration management. If
 using vendor-specific tools, the workflow also presupposes familiarity with
 them, and the vendor's instructions should be used instead, as details may differ.

 WARNING: All of the command line steps below should be executed as the Kudu UNIX user, typically
 `kudu`.

 ==== Prepare for the recovery

 . If the deployment was configured without DNS aliases perform the following steps:
 * Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster
   will be unavailable.
 * Shut down all Kudu tablet server processes in the cluster.

 . Ensure that the dead master is well and truly dead. Take whatever steps are needed to prevent it
   from accidentally restarting; this can be quite dangerous for the cluster post-recovery.

 . Choose an unused machine in the cluster where the new master will live. The master generates very
   little load, so it can be collocated with other data services or load-generating processes, though
   not with another Kudu master from the same configuration.
   The rest of this workflow will refer to this master as the "replacement" master.

 . Perform the following preparatory steps for the replacement master:
 * If reusing the dead master's machine for the replacement master, delete the master's directories.
 * Ensure Kudu is installed on the machine, either via system packages (in which case the `kudu` and
 `kudu-master` packages should be installed), or via some other means.
 * Choose and record the directory where the master's data will live.

 ==== Perform the recovery
 . Remove the dead master from the Raft configuration of the master using the `kudu master remove`
 command. In the example below, dead master `master-2` is being recovered.
 +
 [source,bash]
 ----
 $ sudo -u kudu kudu master remove master-1,master-2 master-2
 ----
 +
 . On the replacement master host, add the replacement master to the cluster using the
 `kudu master add` command. Look for success or error messages on the console or in the replacement
 master's log file. The command is designed to be idempotent: if it fails, fix the issue
 reported in the error messages and run the same command again to make forward progress.
 When the procedure completes, whether or not it succeeded,
 the replacement master is shut down. In the example below, replacement master `master-2` is used.
 If DNS aliases are not being used, use the hostname of the replacement master.
 +
 [source,bash]
 ----
 $ sudo -u kudu kudu master add master-1 master-2 --fs_wal_dir=/data/kudu/master/wal \
 --fs_data_dirs=/data/kudu/master/data
 ----
 +

 . If the cluster was set up with DNS aliases, reconfigure the DNS alias for the dead master to point
   at the replacement master.

 . If the cluster was set up without DNS aliases, perform the following steps:
 .. Modify the value of the `master_addresses` configuration parameter for each live master,
 replacing the dead master with the replacement master.
 The new value must be a comma-separated list of masters where each entry is a string of the form
 `<hostname>:<port>`
 hostname:: master's previously recorded hostname or alias
 port:: master's previously recorded RPC port number
 .. Restart the remaining live masters.

 . Start the replacement master.

 . If the cluster was set up without DNS aliases, follow the steps below for tablet servers:
 .. Modify the value of the `tserver_master_addrs` configuration parameter for each tablet server,
 replacing the dead master with the replacement master.
 The new value must be a comma-separated list of masters where each entry is a string of the form
 `<hostname>:<port>`
 hostname:: master's previously recorded hostname or alias
 port:: master's previously recorded RPC port number

 .. Restart all the tablet servers.
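
When DNS aliases are not in use, the new value for `master_addresses` and `tserver_master_addrs` is the old comma-separated list with the dead master's entry swapped for the replacement. As a minimal sketch, the hypothetical helper `replace_master` below performs that substitution on a list of `<hostname>:<port>` entries. The hostname `master-4` in the usage comment is illustrative only.

```bash
# Hypothetical helper: swap one entry in a comma-separated master list.
# $1: current list, e.g. "master-1:7051,master-2:7051,master-3:7051"
# $2: entry of the dead master
# $3: entry of the replacement master
replace_master() {
  printf '%s\n' "$1" | awk -v dead="$2" -v repl="$3" '
    BEGIN { FS = ","; OFS = "," }
    {
      # Replace only exact matches so that similar hostnames are left alone.
      for (i = 1; i <= NF; i++) if ($i == dead) $i = repl
      print
    }'
}

# Example (assumed hostnames):
#   replace_master "master-1:7051,master-2:7051,master-3:7051" "master-2:7051" "master-4:7051"
```

The order of the remaining entries is preserved, which keeps configuration diffs small.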

 Congratulations, the dead master has been replaced! To verify that all masters are working properly,
 consider performing the following sanity checks:

 * Using a browser, visit each master's web UI. Look at the `/masters` page. All of the masters should
   be listed there with one master in the LEADER role and the others in the FOLLOWER role. The
   contents of `/masters` on each master should be the same.

 * Run a Kudu system check (ksck) on the cluster using the `kudu` command line
   tool. See <<ksck>> for more details.

 === Removing Kudu Masters from a Multi-Master Deployment

 In the event that a multi-master deployment has been overallocated nodes, the following steps should
 be taken to remove the unwanted masters.

 WARNING: In planning the new multi-master configuration, keep in mind that the number of masters
 should be odd and that three or five node master configurations are recommended.

 WARNING: Dropping the number of masters below the number of masters currently needed for a Raft
 majority can incur data loss. To mitigate this, ensure that the leader master is not removed during
 this process.
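
The arithmetic behind these warnings is standard Raft quorum math: a configuration of N voters needs floor(N/2) + 1 of them for a majority, so it tolerates floor((N - 1)/2) failures. The helper names below are illustrative and not part of the `kudu` CLI.

```bash
# Number of masters that must be available to form a Raft majority.
raft_majority() {
  echo $(( $1 / 2 + 1 ))
}

# Number of master failures a configuration of the given size tolerates.
raft_tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

# Example:
#   for n in 3 4 5; do
#     echo "$n masters: majority $(raft_majority "$n"), tolerates $(raft_tolerated_failures "$n")"
#   done
```

A four-master configuration tolerates the same single failure as a three-master one, which is why odd sizes are recommended.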

 ==== Prepare for the removal

 . Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster
 will be unavailable.

 . Identify the UUID and RPC address of the current leader of the multi-master deployment by
 visiting the `/masters` page of any master's web UI. This master must not be removed during this
 process; its removal may result in severe data loss.

 . Stop the unwanted Kudu master processes.

 ==== Perform the removal

 . Perform the Raft configuration change. Run the `kudu master remove` tool.
 Only a single master can be removed at a time. If multiple masters need to be removed, run the
 tool multiple times. In the example below, `master-2` is being removed from a Kudu cluster with two
 masters `master-1,master-2`.
 +
 [source,bash]
 ----
 $ sudo -u kudu kudu master remove master-1,master-2 master-2
 ----
 +

 . Remove the data directories and WAL directory on the unwanted masters. This is a precaution to
 ensure that they cannot start up again and interfere with the new multi-master deployment.

 . Modify the value of the `master_addresses` configuration parameter for the masters of the new
 multi-master deployment.

 . Restart all the masters that were not removed.

 . Modify the value of the `tserver_master_addrs` configuration parameter for the tablet servers to
 remove any unwanted masters.

 . Restart all the tablet servers.

 ==== Verify the migration was successful

 To verify that all masters are working properly, perform the following sanity checks:

 * Using a browser, visit each master's web UI. Look at the `/masters` page. All of the masters should
   be listed there with one master in the LEADER role and the others in the FOLLOWER role. The
   contents of `/masters` on each master should be the same.

 * Run a Kudu system check (ksck) on the cluster using the `kudu` command line
   tool. See <<ksck>> for more details.

 === Changing the master hostnames

 To prevent long maintenance windows when replacing dead masters, DNS aliases should be used. If the
 cluster was set up without aliases, the hostnames can be changed by following the steps
 below.

 ==== Prepare for the hostname change

 . Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster
 will be unavailable.

 . Note the UUID and RPC address of every master by visiting the `/masters` page of any master's web
 UI.

 . Stop all the Kudu processes in the entire cluster.

 . Set up the new hostnames to point to the masters and verify all servers and clients properly
 resolve them.

 ==== Perform the hostname change

 . Rewrite each master’s Raft configuration with the following command, executed on all master hosts:
 +
 [source,bash]
 ----
 $ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] 00000000000000000000000000000000 <all_masters>
 ----
 +
 For example:
 +
 [source,bash]
 ----
 $ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 4aab798a69e94fab8d77069edff28ce0:new-master-name-1:7051 f5624e05f40649b79a757629a69d061e:new-master-name-2:7051 988d8ac6530f426cbe180be5ba52033d:new-master-name-3:7051
 ----

 . Change the masters' gflagfile so the `master_addresses` parameter reflects the new hostnames.

 . Change the `tserver_master_addrs` parameter in the tablet servers' gflagfiles to the new
 hostnames.

 . Start up the masters.

 . To verify that all masters are working properly, perform the following sanity checks:

 .. Using a browser, visit each master's web UI. Look at the `/masters` page. All of the masters should
   be listed there with one master in the LEADER role and the others in the FOLLOWER role. The
   contents of `/masters` on each master should be the same.

 .. Run the following command to verify all masters are up and listening. The UUIDs
 should be the same and belong to the same master as before the hostname change:
 +
 ----
 $ sudo -u kudu kudu master list new-master-name-1:7051,new-master-name-2:7051,new-master-name-3:7051
 ----

 . Start all of the tablet servers.

 . Run a Kudu system check (ksck) on the cluster using the `kudu` command line
 tool. See <<ksck>> for more details. After startup, some tablets may be
 unavailable as it takes some time to initialize all of them.

 . If you have Kudu tables that are accessed from Impala, manually update the master addresses
 stored in the underlying database that backs the Hive Metastore (HMS).

 .. The following is an example SQL statement you should run in the HMS database:
 +
 [source,sql]
 ----
 UPDATE TABLE_PARAMS
 SET PARAM_VALUE =
   'new-master-name-1:7051,new-master-name-2:7051,new-master-name-3:7051'
 WHERE PARAM_KEY = 'kudu.master_addresses'
 AND PARAM_VALUE = 'master-1:7051,master-2:7051,master-3:7051';
 ----
 +
 .. In `impala-shell`, run:
 +
 [source,sql]
 ----
 INVALIDATE METADATA;
 ----
 +
 .. Verify updating the metadata worked by running a simple `SELECT` query on a
 Kudu-backed Impala table.
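
In the `rewrite_raft_config` example above, each master is passed as a `<uuid>:<hostname>:<port>` argument. As a minimal sketch, the hypothetical helper `build_raft_peers` assembles those arguments from the UUIDs noted during preparation and the new hostnames. It assumes that all masters share one RPC port and that the input has one `<uuid> <new-hostname>` pair per line.

```bash
# Hypothetical helper: build the peer arguments for rewrite_raft_config.
# $1: RPC port shared by all masters (an assumption of this sketch)
# stdin: one "<uuid> <new-hostname>" pair per line
build_raft_peers() {
  awk -v port="$1" '
    NF == 2 { printf "%s%s:%s:%s", sep, $1, $2, port; sep = " " }
    END     { print "" }'
}

# Example (UUIDs taken from the example above):
#   printf '%s\n' \
#     "4aab798a69e94fab8d77069edff28ce0 new-master-name-1" \
#     "f5624e05f40649b79a757629a69d061e new-master-name-2" \
#     "988d8ac6530f426cbe180be5ba52033d new-master-name-3" | build_raft_peers 7051
```

Generating the arguments from a recorded mapping reduces the risk of pairing a UUID with the wrong hostname.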

 [[adding_tablet_servers]]
 === Best Practices When Adding New Tablet Servers

 A common workflow when administering a Kudu cluster is adding additional tablet
 server instances, in an effort to increase storage capacity, decrease load or
 utilization on individual hosts, increase compute power, etc.

 By default, any newly added tablet servers will not be utilized immediately
 after their addition to the cluster. Instead, newly added tablet servers will
 only be utilized when new tablets are created or when existing tablets need to
 be replicated, which can lead to imbalanced nodes. It's recommended to run
 the rebalancer CLI tool just after adding a new tablet server into the cluster,
 as described in the enumerated steps below.

 Avoid placing multiple tablet servers on a single node. Doing so
 nullifies the point of increasing the overall storage capacity of a Kudu
 cluster and increases the likelihood of tablet unavailability when a single
 node fails (the latter drawback is not applicable if the cluster is properly
 configured to use the
 link:https://kudu.apache.org/docs/administration.html#rack_awareness[location
 awareness] feature).

 To add additional tablet servers to an existing cluster, the
 following steps can be taken to ensure tablet replicas are uniformly
 distributed across the cluster:

 1. Ensure that Kudu is installed on the new machines being added to the
 cluster, and that the new instances have been
 link:https://kudu.apache.org/docs/configuration.html#_configuring_tablet_servers[
 correctly configured] to point to the pre-existing cluster. Then, start up
 the new tablet server instances.
 2. Verify that the new instances check in with the Kudu Master(s)
 successfully. A quick method for verifying they've successfully checked in
 with the existing Master instances is to view the Kudu Master WebUI,
 specifically the `/tablet-servers` section, and validate that the newly
 added instances are registered, and heartbeating.
 3. Once the tablet server(s) are successfully online and healthy, follow
 the steps to run the
 link:https://kudu.apache.org/docs/administration.html#rebalancer_tool[
 rebalancing tool] which will spread existing tablet replicas to the newly added
 tablet servers.
 4. After the rebalancer tool has completed, or even during its execution,
 you can check on the health of the cluster using the `ksck` command-line utility
 (see <<ksck>> for more details).
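
The registration check in step 2 can also be scripted. The sketch below is not part of Kudu: `missing_tservers` is a hypothetical helper that takes the text of a tablet server listing (for example, output captured from `kudu tserver list <master_addresses>`, assuming it prints each server's address) and reports which expected hostnames do not appear in it.

```bash
# Hypothetical helper: report expected tablet servers absent from a listing.
# $1: text of a tablet server listing
# remaining arguments: hostnames that should appear in the listing
missing_tservers() {
  listing=$1
  shift
  for host in "$@"; do
    # -w avoids matching "tserver-1" inside "tserver-10"; -F disables regex.
    if ! printf '%s\n' "$listing" | grep -qwF -- "$host"; then
      printf '%s\n' "$host"
    fi
  done
}

# Example (assumed usage):
#   listing=$(sudo -u kudu kudu tserver list master-01.example.com)
#   missing_tservers "$listing" tserver-04.example.com tserver-05.example.com
```

Empty output means every expected host was found. This only confirms registration, not that the server is heartbeating.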

 [[ksck]]
 === Checking Cluster Health with `ksck`

 The `kudu` CLI includes a tool named `ksck` that can be used for gathering
 information about the state of a Kudu cluster, including checking its health.
 `ksck` will identify issues such as under-replicated tablets, unreachable
 tablet servers, or tablets without a leader.

 `ksck` should be run from the command line as the Kudu admin user, and requires
 the full list of master addresses to be specified:

 [source,bash]
 ----
 $ sudo -u kudu kudu cluster ksck master-01.example.com,master-02.example.com,master-03.example.com
 ----

 To see a full list of the options available with `ksck`, use the `--help` flag.
 If the cluster is healthy, `ksck` will print information about the cluster, a
 success message, and return a zero (success) exit status.

 ----
 Master Summary
                UUID               |       Address         | Status
 ----------------------------------+-----------------------+---------
  a811c07b99394df799e6650e7310f282 | master-01.example.com | HEALTHY
  b579355eeeea446e998606bcb7e87844 | master-02.example.com | HEALTHY
  cfdcc8592711485fad32ec4eea4fbfcd | master-03.example.com | HEALTHY

 Tablet Server Summary
                UUID               |        Address         | Status
 ----------------------------------+------------------------+---------
  a598f75345834133a39c6e51163245db | tserver-01.example.com | HEALTHY
  e05ca6b6573b4e1f9a518157c0c0c637 | tserver-02.example.com | HEALTHY
  e7e53a91fe704296b3a59ad304e7444a | tserver-03.example.com | HEALTHY

 Version Summary
  Version |      Servers
 ---------+-------------------------
   1.7.1  | all 6 server(s) checked

 Summary by table
    Name   | RF | Status  | Total Tablets | Healthy | Recovering | Under-replicated | Unavailable
 ----------+----+---------+---------------+---------+------------+------------------+-------------
  my_table | 3  | HEALTHY | 8             | 8       | 0          | 0                | 0

                 | Total Count
 ----------------+-------------
  Masters        | 3
  Tablet Servers | 3
  Tables         | 1
  Tablets        | 8
  Replicas       | 24
 OK
 ----

 If the cluster is unhealthy, for instance if a tablet server process has
 stopped, `ksck` will report the issue(s) and return a non-zero exit status, as
 shown in the abbreviated snippet of `ksck` output below:

 ----
 Tablet Server Summary
                UUID               |        Address         |   Status
 ----------------------------------+------------------------+-------------
  a598f75345834133a39c6e51163245db | tserver-01.example.com | HEALTHY
  e05ca6b6573b4e1f9a518157c0c0c637 | tserver-02.example.com | HEALTHY
  e7e53a91fe704296b3a59ad304e7444a | tserver-03.example.com | UNAVAILABLE
 Error from 127.0.0.1:7150: Network error: could not get status from server: Client connection negotiation failed: client connection to 127.0.0.1:7150: connect: Connection refused (error 61) (UNAVAILABLE)

 ... (full output elided)

 ==================
 Errors:
 ==================
 Network error: error fetching info from tablet servers: failed to gather info for all tablet servers: 1 of 3 had errors
 Corruption: table consistency check error: 1 out of 1 table(s) are not healthy

 FAILED
 Runtime error: ksck discovered errors
 ----
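
For scripted monitoring, the summary tables in the plain-text output can be filtered for servers that are not `HEALTHY`. The following is a minimal sketch that assumes the three-column `UUID | Address | Status` layout shown above. `unhealthy_servers` is a hypothetical helper, and the layout may differ between Kudu versions.

```bash
# Hypothetical helper: list servers whose status is not HEALTHY.
# stdin: plain-text ksck output containing "UUID | Address | Status" rows
unhealthy_servers() {
  awk -F'|' '
    NF == 3 {
      # Trim the padding around each column.
      for (i = 1; i <= 3; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
      # Keep only rows whose first column looks like a 32-character hex UUID.
      if (length($1) == 32 && $1 ~ /^[0-9a-f]+$/ && $3 != "HEALTHY") print $2, $3
    }'
}

# Example (assumed usage):
#   sudo -u kudu kudu cluster ksck master-01.example.com,master-02.example.com,master-03.example.com | unhealthy_servers
```

For simple pass/fail checks, the exit status of `ksck` itself is the more robust signal.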

 To verify data integrity, the optional `--checksum_scan` flag can be set, which
 will ensure the cluster has consistent data by scanning each tablet replica and
 comparing results. The `--tables` or `--tablets` flags can be used to limit the
 scope of the checksum scan to specific tables or tablets, respectively. For
 example, checking data integrity on the `my_table` table can be done with the
 following command:

 [source,bash]
 ----
 $ sudo -u kudu kudu cluster ksck --checksum_scan --tables my_table master-01.example.com,master-02.example.com,master-03.example.com
 ----

 By default, `ksck` will attempt to use a snapshot scan of the table, so the
 checksum scan can be done while writes continue.

 Finally, `ksck` also supports output in JSON format using the `--ksck_format`
 flag. JSON output contains the same information as the plain text output, but
 in a format that can be used by other tools. See `kudu cluster ksck --help` for
 more information.

 [[change_dir_config]]
 === Changing Directory Configurations

 For higher read parallelism and larger volumes of storage per server, users may
 want to configure servers to store data in multiple directories on different
 devices. Users can add or remove data directories to an existing master or
 tablet server by updating the `--fs_data_dirs` gflag configuration and
 restarting the server. Data is striped across data directories, and when a new
 data directory is added, new data will be striped across the union of the old
 and new directories.
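
Because `--fs_data_dirs` is a comma-separated list, the new value after removing a directory can be derived mechanically. The helper below, `fs_data_dirs_without`, is a hypothetical sketch that only computes the new flag value. It does not touch the server or its configuration files.

```bash
# Hypothetical helper: drop one directory from a comma-separated list.
# $1: current --fs_data_dirs value, e.g. "/data/1,/data/2,/data/3"
# $2: directory to remove
fs_data_dirs_without() {
  printf '%s\n' "$1" | tr ',' '\n' | grep -vxF -- "$2" | paste -sd, -
}

# Example:
#   fs_data_dirs_without "/data/1,/data/2,/data/3" "/data/3"
```

Matching whole entries (`grep -x`) ensures that removing `/data/1` does not also remove `/data/10`.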

 WARNING: Removing a data directory from `--fs_data_dirs` may result in failed tablet
 replicas in cases where there were data blocks in the directory that was
 removed. Use `ksck` to ensure the cluster can fully recover from the directory
 removal before moving on to another server.

 WARNING: In versions of Kudu below 1.12, Kudu requires that the `kudu fs
 update_dirs` tool be run before restarting with a different set of data
 directories. Such versions will fail to start if the tool is not run.

 If on a Kudu version below 1.12, once a server is started, users must go
 through the steps below to change the directory configuration:

 NOTE: Unless the `--force` flag is specified, Kudu will not allow for the
 removal of a directory across which tablets are configured to spread data. If
 `--force` is specified, all tablets configured to use that directory will fail
 upon starting up and be replicated elsewhere.

 NOTE: If the link:configuration.html#directory_configuration[metadata
 directory] overlaps with a data directory, as was the default prior to Kudu
 1.7, or if a non-default metadata directory is configured, the
 `--fs_metadata_dir` configuration must be specified when running the `kudu fs
 update_dirs` tool.

 NOTE: Only new tablet replicas (i.e. brand new tablets' replicas and replicas
 that are copied to the server for high availability) will use the new
 directory. Existing tablet replicas on the server will not be rebalanced across
 the new directory.

 WARNING: All of the command line steps below should be executed as the Kudu
 UNIX user, typically `kudu`.

 . Use `ksck` to ensure the cluster is healthy, and establish a
   <<minimizing_cluster_disruption_during_temporary_single_ts_downtime,maintenance
   window>> to bring the tablet server offline.

 . Run the tool with the desired directory configuration flags. For example, if a
   cluster was set up with `--fs_wal_dir=/wals`, `--fs_metadata_dir=/meta`, and
   `--fs_data_dirs=/data/1,/data/2,/data/3`, and `/data/3` is to be removed (e.g.
   due to a disk error), run the command:

 +
 [source,bash]
 ----
 $ sudo -u kudu kudu fs update_dirs --force --fs_wal_dir=/wals --fs_metadata_dir=/meta --fs_data_dirs=/data/1,/data/2
 ----
 +

 . Modify the value of the `--fs_data_dirs` flag for the updated server. If using
   Cloudera Manager (CM), make sure to only update the configurations of the updated
   server, rather than of the entire Kudu service.

 . Once complete, the server process can be started. When Kudu is installed using
   system packages, `service` is typically used:

 +
 [source,bash]
 ----
 $ sudo service kudu-tserver start
 ----
 +

 . Use `ksck` to ensure Kudu returns to a healthy state before resuming normal
   operation.
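
Because `ksck` returns a zero exit status only when the cluster is healthy, the final step can be automated by polling. The following is a minimal sketch: `wait_until_healthy` is a hypothetical helper, and the attempt count and interval in the usage comment are arbitrary values, not recommendations.

```bash
# Hypothetical helper: re-run a health check until it succeeds.
# $1: maximum number of attempts
# $2: seconds to wait between attempts
# remaining arguments: the health-check command to run
wait_until_healthy() {
  local attempts=$1 interval=$2 n=1
  shift 2
  while [ "$n" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "healthy after $n attempt(s)"
      return 0
    fi
    n=$((n + 1))
    # Only sleep if another attempt will follow.
    if [ "$n" -le "$attempts" ]; then sleep "$interval"; fi
  done
  echo "still unhealthy after $attempts attempt(s)" >&2
  return 1
}

# Example (assumed usage):
#   wait_until_healthy 30 10 sudo -u kudu kudu cluster ksck master-01.example.com,master-02.example.com,master-03.example.com
```

The check's own output is discarded here. Run `ksck` directly to see details if the loop gives up.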


 [[disk_failure_recovery]]
 === Recovering from Disk Failure
 Kudu nodes can only survive failures of disks on which certain Kudu directories
 are mounted. For more information about the different Kudu directory types, see
 the section on link:configuration.html#directory_configuration[Kudu Directory
 Configurations]. The table below describes this behavior across different Apache Kudu
 releases.

 [[disk_failure_behavior]]
 .Kudu Disk Failure Behavior
 [cols="<,<,<",options="header"]
 |===
 | Node Type | Kudu Directory Type | Kudu Releases that Crash on Disk Failure
 | Master | All | All
 | Tablet Server | Directory containing WALs | All
 | Tablet Server | Directory containing tablet metadata | All
 | Tablet Server | Directory containing data blocks only | Pre-1.6.0
 |===

 When a disk failure occurs that does not lead to a crash, Kudu will stop using
 the affected directory, shut down tablets with blocks on the affected
 directories, and automatically re-replicate the affected tablets to other
 tablet servers. The affected server will remain alive and print messages to the
 log indicating the disk failure, for example:

 ----
 E1205 19:06:24.163748 27115 data_dirs.cc:1011] Directory /data/8/kudu/data marked as failed
 E1205 19:06:30.324795 27064 log_block_manager.cc:1822] Not using report from /data/8/kudu/data: IO error: Could not open container 0a6283cab82d4e75848f49772d2638fe: /data/8/kudu/data/0a6283cab82d4e75848f49772d2638fe.metadata: Read-only file system (error 30)
 E1205 19:06:33.564638 27220 ts_tablet_manager.cc:946] T 4957808439314e0d97795c1394348d80 P 70f7ee61ead54b1885d819f354eb3405: aborting tablet bootstrap: tablet has data in a failed directory
 ----

 While in this state, the affected node will avoid using the failed disk,
 leading to lower storage volume and reduced read parallelism. The administrator
 can remove the failed directory from the `--fs_data_dirs` gflag to avoid seeing
 these errors.

 WARNING: In versions of Kudu below 1.12, in order to start Kudu with a
 different set of directories, the administrator should schedule a brief window
 to <<change_dir_config,update the node's directory configuration>>. Kudu will
 fail to start otherwise.

 When the disk is repaired, remounted, and ready to be reused by Kudu, take the
 following steps:

 . Make sure that the Kudu portion of the disk is completely empty.
 . Stop the tablet server.
 . Update the `--fs_data_dirs` gflag to add the repaired directory back (`/data/3` in this
   example), potentially using the `update_dirs` tool if on a version of Kudu that is below 1.12:
 +
 [source,bash]
 ----
 $ sudo -u kudu kudu fs update_dirs --force --fs_wal_dir=/wals --fs_data_dirs=/data/1,/data/2,/data/3
 ----
 +

 . Start the tablet server.
 . Run `ksck` to verify cluster health.
 +
 [source,bash]
 ----
 $ sudo -u kudu kudu cluster ksck master-01.example.com,master-02.example.com,master-03.example.com
 ----
 +


 Note that existing tablets will not stripe to the restored disk, but any new tablets
 will stripe to the restored disk.
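
The first step above, confirming that the Kudu portion of the repaired disk is empty, can be checked before the directory is added back. A minimal sketch follows. `dir_is_empty` is a hypothetical helper, and the path in the usage comment is illustrative.

```bash
# Hypothetical helper: succeed only if the path is an existing, empty directory.
# $1: directory to check
dir_is_empty() {
  # ls -A lists hidden entries too, but omits "." and "..".
  [ -d "$1" ] && [ -z "$(ls -A -- "$1")" ]
}

# Example (assumed path):
#   dir_is_empty /data/3 || echo "/data/3 is missing or not empty" >&2
```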

 [[disk_full_recovery]]
 === Recovering from Full Disks
 By default, Kudu reserves a small amount of space (1% by capacity) in its
 directories; Kudu considers a disk full if there is less free space available
 than the reservation. Kudu nodes can only tolerate running out of space on disks
 on which certain Kudu directories are mounted. For more information about the
 different Kudu directory types, see
 link:configuration.html#directory_configuration[Kudu Directory Configurations].
 The table below describes this behavior for each type of directory. The behavior
 is uniform across masters and tablet servers.

 [[disk_full_behavior]]
 .Kudu Full Disk Behavior
 [options="header"]
 |===
 | Kudu Directory Type | Crash on a Full Disk?
 | Directory containing WALs | Yes
 | Directory containing tablet metadata | Yes
 | Directory containing data blocks only | No (see below)
 |===
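
The full-disk rule stated at the start of this section (less free space than the reservation, which is 1% of capacity by default) can be sketched as simple arithmetic. `disk_is_full` is a hypothetical helper. Capacity and free space are passed in as bytes, for example as obtained from `df`, and Kudu's own accounting may differ in detail.

```bash
# Hypothetical helper: apply the "free space below reservation" rule.
# $1: capacity of the disk in bytes
# $2: free space on the disk in bytes
# $3: reserved percentage (defaults to 1, as described above)
disk_is_full() {
  local capacity=$1 free=$2 pct=${3:-1}
  local reserved=$(( capacity * pct / 100 ))
  [ "$free" -lt "$reserved" ]
}

# Example: a 1,000,000-byte disk reserves 10,000 bytes at 1%.
#   disk_is_full 1000000 5000 && echo "full"
```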

 Prior to Kudu 1.7.0, Kudu stripes tablet data across all directories, and will
 avoid writing data to full directories. Kudu will crash if all data directories
 are full.

 In 1.7.0 and later, new tablets are assigned a disk group consisting of
 `--fs_target_data_dirs_per_tablet` data dirs (default 3). If Kudu is not
 configured with enough data directories for a full disk group, all data
 directories are used. When a data directory is full, Kudu will stop writing new
 data to it and each tablet that uses that data directory will write new data to
 other data directories within its group. If all data directories for a tablet
 are full, Kudu will crash. Periodically, Kudu will check if full data
 directories are still full, and will resume writing to those data directories
 if space has become available.
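
As a small worked example of the disk group rule in 1.7.0 and later, the number of data directories a new tablet spreads across is the smaller of the configured directory count and `--fs_target_data_dirs_per_tablet`. `disk_group_size` is a hypothetical helper that covers only the case described above.

```bash
# Hypothetical helper: size of the disk group assigned to a new tablet.
# $1: number of data directories configured on the server
# $2: value of --fs_target_data_dirs_per_tablet (defaults to 3)
disk_group_size() {
  local dirs=$1 target=${2:-3}
  if [ "$dirs" -lt "$target" ]; then
    echo "$dirs"
  else
    echo "$target"
  fi
}

# Example:
#   disk_group_size 8    # eight directories, default target
#   disk_group_size 2    # fewer directories than the target
```

With a group of three directories, a tablet keeps accepting writes until all three of its directories are full.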

 If Kudu does crash because its data directories are full, freeing space on the
 full directories will allow the affected daemon to restart and resume writing.
 Note that it may be possible for Kudu to free some space by running

 [source,bash]
 ----
 $ sudo -u kudu kudu fs check --repair
 ----

 but this command may also fail if there is too little space left.

 It's also possible to allocate additional data directories to Kudu in order to
 increase the overall amount of storage available. See the documentation on
 <<change_dir_config,updating a node's directory configuration>> for more
 information. Note that existing tablets will not use new data directories, so
 adding a new data directory does not resolve issues with full disks.

 [[tablet_majority_down_recovery]]
 === Bringing a tablet that has lost a majority of replicas back online

 If a tablet has permanently lost a majority of its replicas, it cannot recover
 automatically and operator intervention is required. If the tablet servers
 hosting a majority of the replicas are down (i.e. ones reported as "TS
 unavailable" by ksck), they should be recovered instead if possible.

 WARNING: The steps below may cause recent edits to the tablet to be lost,
 potentially resulting in permanent data loss. Only attempt the procedure below
 if it is impossible to bring a majority back online.

 Suppose a tablet has lost a majority of its replicas. The first step in
 diagnosing and fixing the problem is to examine the tablet's state using ksck:

 [source,bash]
 ----
 $ sudo -u kudu kudu cluster ksck --tablets=e822cab6c0584bc0858219d1539a17e6 master-00,master-01,master-02
 Connected to the Master
 Fetched info from all 5 Tablet Servers
 Tablet e822cab6c0584bc0858219d1539a17e6 of table 'my_table' is unavailable: 2 replica(s) not RUNNING
   638a20403e3e4ae3b55d4d07d920e6de (tserver-00:7150): RUNNING
   9a56fa85a38a4edc99c6229cba68aeaa (tserver-01:7150): bad state
     State:       FAILED
     Data state:  TABLET_DATA_READY
     Last status: <failure message>
   c311fef7708a4cf9bb11a3e4cbcaab8c (tserver-02:7150): bad state
     State:       FAILED
     Data state:  TABLET_DATA_READY
     Last status: <failure message>
 ----

 This output shows that, for tablet `e822cab6c0584bc0858219d1539a17e6`, the two
 tablet replicas on `tserver-01` and `tserver-02` failed. The remaining replica
 is not the leader, so the leader replica failed as well. This means the chance
 of data loss is higher since the remaining replica on `tserver-00` may have
 been lagging. In general, to accept the potential data loss and restore the
 tablet from the remaining replicas, divide the tablet replicas into two groups:

 1. Healthy replicas: Those in `RUNNING` state as reported by ksck
 2. Unhealthy replicas

 For example, in the above ksck output, the replica on tablet server `tserver-00`
 is healthy, while the replicas on `tserver-01` and `tserver-02` are unhealthy.
 On each tablet server with a healthy replica, alter the consensus configuration
 to remove unhealthy replicas. In the typical case of 1 out of 3 surviving
 replicas, there will be only one healthy replica, so the consensus configuration
 will be rewritten to include only the healthy replica.

 [source,bash]
 ----
 $ sudo -u kudu kudu remote_replica unsafe_change_config tserver-00:7150 <tablet-id> <tserver-00-uuid>
 ----

 where `<tablet-id>` is `e822cab6c0584bc0858219d1539a17e6` and
 `<tserver-00-uuid>` is the uuid of `tserver-00`,
 `638a20403e3e4ae3b55d4d07d920e6de`.
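
The split into healthy and unhealthy replicas can be derived from the `ksck` output shown earlier. The sketch below assumes the replica line layout from that output (`<uuid> (<address>): <state>`). `classify_replicas` is a hypothetical helper, not a Kudu tool.

```bash
# Hypothetical helper: label each replica in ksck tablet output.
# stdin: ksck output containing lines such as
#   638a20403e3e4ae3b55d4d07d920e6de (tserver-00:7150): RUNNING
classify_replicas() {
  awk '
    length($1) == 32 && $1 ~ /^[0-9a-f]+$/ && $2 ~ /^\(.*\):$/ {
      addr = $2
      gsub(/^\(|\):$/, "", addr)   # strip the surrounding "(" and "):"
      state = ($3 == "RUNNING") ? "healthy" : "unhealthy"
      print state, $1, addr
    }'
}

# Example (assumed usage):
#   sudo -u kudu kudu cluster ksck --tablets=<tablet-id> master-00,master-01,master-02 | classify_replicas
```

The `healthy` lines give the tablet server address and UUID needed for `unsafe_change_config`. Review the list by hand before running that command, since it can discard data.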

 Once the healthy replicas' consensus configurations have been forced to exclude
 the unhealthy replicas, the healthy replicas will be able to elect a leader.
 The tablet will become available for writes, though it will still be
 under-replicated. Shortly after the tablet becomes available, the leader master
 will notice that it is under-replicated, and will cause the tablet to
 re-replicate until the proper replication factor is restored. The unhealthy
 replicas will be tombstoned by the master, causing their remaining data to be
 deleted.

 [[rebuilding_kudu]]
 === Rebuilding a Kudu Filesystem Layout

 In the event that critical files are lost, i.e. WALs or tablet-specific
 metadata, all Kudu directories on the server must be deleted and rebuilt to
 ensure correctness. Doing so will destroy the copy of the data for each tablet
 replica hosted on the local server. Kudu will automatically re-replicate tablet
 replicas removed in this way, provided the replication factor is at least three
 and all other servers are online and healthy.

 NOTE: These steps use a tablet server as an example, but the steps are the same
 for Kudu master servers.

 WARNING: If multiple nodes need their FS layouts rebuilt, wait until all
 replicas previously hosted on each node have finished automatically
 re-replicating elsewhere before continuing. Failure to do so can result in
 permanent data loss.

 WARNING: Before proceeding, ensure the contents of the directories are backed
 up, either as a copy or in the form of other tablet replicas.

 . The first step to rebuilding a server with a new directory configuration is
   emptying all of the server's existing directories. For example, if a tablet
   server is configured with `--fs_wal_dir=/data/0/kudu-tserver-wal`,
   `--fs_metadata_dir=/data/0/kudu-tserver-meta`, and
   `--fs_data_dirs=/data/1/kudu-tserver,/data/2/kudu-tserver`, the following
   command will remove the contents of the WAL, metadata, and data directories:

 +
 [source,bash]
 ----
 # Note: this will delete all of the data from the local tablet server.
 $ rm -rf /data/0/kudu-tserver-wal/* /data/0/kudu-tserver-meta/* /data/1/kudu-tserver/* /data/2/kudu-tserver/*
 ----
 +

 . If using Cloudera Manager (CM), update the configurations for the rebuilt server to include
   only the desired directories. Make sure to only update the configurations of servers
   to which changes were applied, rather than of the entire Kudu service.

 . After directories are deleted, the server process can be started with the new
   directory configuration. The appropriate sub-directories will be created by
   Kudu upon starting up.

 [[minimizing_cluster_disruption_during_temporary_single_ts_downtime]]
 === Minimizing cluster disruption during temporary planned downtime of a single tablet server

 If a single tablet server is brought down temporarily in a healthy cluster, all
 tablets will remain available and clients will function as normal, after
 potential short delays due to leader elections. However, if the downtime lasts
 for more than `--follower_unavailable_considered_failed_sec` (default 300)
 seconds, the tablet replicas on the down tablet server will be replaced by new
 replicas on available tablet servers. This will cause stress on the cluster
 as tablets re-replicate and, if the downtime lasts long enough, a significant
 reduction in the number of replicas on the down tablet server, which would
 require the rebalancer to fix.

 To work around this, in Kudu versions from 1.11 onwards, the `kudu` CLI
 contains a tool to put tablet servers into maintenance mode. While in this
 state, the tablet server’s replicas are not re-replicated due to its downtime
 alone, though re-replication may still occur in the event that the server in
 maintenance suffers from a disk failure or if a follower replica on the tablet
 server falls too far behind its leader replica. Upon exiting maintenance,
 re-replication is triggered for any remaining under-replicated tablets.

 The `kudu tserver state enter_maintenance` and `kudu tserver state
 exit_maintenance` tools are provided to orchestrate tablet server maintenance.
 The following can be run from a tablet server to put it into maintenance:

 [source,bash]
 ----
 $ TS_UUID=$(sudo -u kudu kudu fs dump uuid --fs_wal_dir=<wal_dir> --fs_data_dirs=<data_dirs>)
 $ sudo -u kudu kudu tserver state enter_maintenance <master_addresses> "$TS_UUID"
 ----

 The tablet server maintenance mode is shown in the "Tablet Servers" page of the
 Kudu leader master's web UI, and in the output of `kudu cluster ksck`.  To exit
 maintenance mode, run the following:

 [source,bash]
 ----
 $ sudo -u kudu kudu tserver state exit_maintenance <master_addresses> "$TS_UUID"
 ----
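
 To make sure a tablet server leaves maintenance mode even when the maintenance
 work itself fails, the two commands can be paired in a small wrapper. The
 following is a minimal sketch and not part of Kudu: `run_kudu` and
 `with_maintenance` are hypothetical helper names, and the wrapped command is
 whatever maintenance task applies in your environment.

 [source,bash]
 ----
 # Hypothetical helper: runs the kudu CLI as the kudu user.
 run_kudu() { sudo -u kudu kudu "$@"; }

 # Usage: with_maintenance <master_addresses> <ts_uuid> <command> [args...]
 # Enters maintenance mode, runs the command, then exits maintenance mode
 # regardless of whether the command succeeded. Returns the command's status.
 with_maintenance() {
   local masters ts_uuid cmd_status
   masters=$1; ts_uuid=$2; shift 2
   run_kudu tserver state enter_maintenance "$masters" "$ts_uuid" || return 1
   "$@"
   cmd_status=$?
   run_kudu tserver state exit_maintenance "$masters" "$ts_uuid" || return 1
   return "$cmd_status"
 }
 ----

 For example, `with_maintenance <master_addresses> "$TS_UUID" <command>` runs
 `<command>` while the tablet server is in maintenance mode.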

 In versions prior to 1.11, a different approach must be used to prevent
 unwanted re-replication. Increase
 `--follower_unavailable_considered_failed_sec` on all tablet servers so the
 amount of time before re-replication starts is longer than the expected
 downtime of the tablet server, including the time it takes the tablet server to
 restart and bootstrap its tablet replicas. To do this, run the following
 command for each tablet server:

 [source,bash]
 ----
 $ sudo -u kudu kudu tserver set_flag <tserver_address> follower_unavailable_considered_failed_sec <num_seconds>
 ----

 where `<num_seconds>` is the number of seconds that will encompass the downtime.
 Once the downtime is finished, reset the flag to its original value:

 [source,bash]
 ----
 $ sudo -u kudu kudu tserver set_flag <tserver_address> follower_unavailable_considered_failed_sec <original_value>
 ----

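 WARNING: Be sure to reset the value of `--follower_unavailable_considered_failed_sec`
 to its original value. While the flag remains raised, re-replication after a
 genuine tablet server failure is delayed by the same amount of time.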

 NOTE: On Kudu versions prior to 1.8, the `--force` flag must be provided in the above
 `set_flag` commands.
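
 Because the flag must be changed on every tablet server, the `set_flag` command
 is usually run in a loop. The following is a minimal sketch: `run_kudu` and
 `set_failed_sec` are hypothetical helper names, and the tablet server addresses
 are supplied by the operator.

 [source,bash]
 ----
 # Hypothetical helper: runs the kudu CLI as the kudu user.
 run_kudu() { sudo -u kudu kudu "$@"; }

 # Usage: set_failed_sec <num_seconds> <tserver_address>...
 # Sets --follower_unavailable_considered_failed_sec on every listed tablet
 # server. Stops at the first failure so that a partial change is noticed.
 set_failed_sec() {
   local secs addr
   secs=$1; shift
   for addr in "$@"; do
     run_kudu tserver set_flag "$addr" \
       follower_unavailable_considered_failed_sec "$secs" || return 1
   done
 }
 ----

 Call the function once with the raised value before the downtime and once more
 with the original value afterwards.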

 [[rolling_restart]]
 === Orchestrating a rolling restart with no downtime

 As of Kudu 1.12, tooling is available to restart a cluster with no downtime. To
 perform such a "rolling restart", follow these steps:

 . Restart the master(s) one-by-one. If there is only a single master, this may
   cause brief interference with on-going workloads.
 . Starting with a single tablet server, put the tablet server into
   <<minimizing_cluster_disruption_during_temporary_single_ts_downtime,maintenance
   mode>> by using the `kudu tserver state enter_maintenance` tool.
 . Start quiescing the tablet server using the `kudu tserver quiesce start`
   tool. This will signal to Kudu to stop hosting leaders on the specified
   tablet server and to redirect new scan requests to other tablet servers.
 . Periodically run `kudu tserver quiesce start` with the
   `--error_if_not_fully_quiesced` option, until it returns success, indicating
   that all leaders have been moved away from the tablet server and all on-going
   scans have completed.
 . Restart the tablet server.
 . Periodically run `ksck` until the cluster is reported to be healthy.
 . Exit maintenance mode on the tablet server by running `kudu tserver state
   exit_maintenance`. This will allow new tablet replicas to be placed on the
   tablet server.
 . Repeat these steps for all tablet servers in the cluster.

 WARNING: If any tables in the cluster have a replication factor of 1, some
 quiescing tablet servers will never become fully quiesced, as single-replica
 tablets will not naturally relinquish leadership. If such tables exist, use the
 `kudu cluster rebalance` tool to move replicas of these tables away from the
 quiescing tablet server by specifying the `--ignored_tservers`,
 `--move_replicas_from_ignored_tservers`, and `--tables` options.
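
 The rebalancer invocation described in this warning might look like the
 following sketch. `run_kudu` and `move_tables_off_tserver` are hypothetical
 helper names; the table list is a comma-separated list of the
 single-replica tables.

 [source,bash]
 ----
 # Hypothetical helper: runs the kudu CLI as the kudu user.
 run_kudu() { sudo -u kudu kudu "$@"; }

 # Usage: move_tables_off_tserver <master_addresses> <ts_uuid> <tables>
 # Moves replicas of the listed tables away from the given tablet server.
 move_tables_off_tserver() {
   run_kudu cluster rebalance "$1" \
     --ignored_tservers="$2" \
     --move_replicas_from_ignored_tservers \
     --tables="$3"
 }
 ----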

 NOTE: If running with <<rack_awareness,rack awareness>>, the above steps can be
 performed while restarting multiple tablet servers within a single rack at the
 same time. Users should use `ksck` to ensure the location assignment policy is
 enforced while going through these steps, and that no more than a single
 location is restarted at the same time. At least three locations should be
 defined in the cluster to safely restart multiple tablet servers within one
 location.
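
 The per-tablet-server part of the rolling restart sequence can be sketched as a
 shell function. This is an illustration under stated assumptions, not an
 official script: `run_kudu`, `restart_tserver`, `rolling_restart_one`, and
 `POLL_INTERVAL_SEC` are hypothetical names, the restart hook must be filled in
 for your environment, the sketch assumes `ksck` exits with a non-zero status
 while the cluster is unhealthy, and the polling loops have no timeout.

 [source,bash]
 ----
 # Hypothetical helper: runs the kudu CLI as the kudu user.
 run_kudu() { sudo -u kudu kudu "$@"; }

 # Hypothetical hook: replace the body with however the tablet server process
 # is restarted in your environment. Fails by default so it cannot be skipped.
 restart_tserver() { echo "no restart command configured for $1" >&2; return 1; }

 POLL_INTERVAL_SEC=${POLL_INTERVAL_SEC:-10}

 # Usage: rolling_restart_one <master_addresses> <ts_uuid> <tserver_address>
 rolling_restart_one() {
   local masters ts_uuid ts_addr
   masters=$1; ts_uuid=$2; ts_addr=$3
   run_kudu tserver state enter_maintenance "$masters" "$ts_uuid" || return 1
   run_kudu tserver quiesce start "$ts_addr" || return 1
   # Poll until all leaders have moved away and on-going scans have completed.
   until run_kudu tserver quiesce start "$ts_addr" --error_if_not_fully_quiesced; do
     sleep "$POLL_INTERVAL_SEC"
   done
   restart_tserver "$ts_addr" || return 1
   # Poll until ksck reports the cluster as healthy.
   until run_kudu cluster ksck "$masters"; do
     sleep "$POLL_INTERVAL_SEC"
   done
   run_kudu tserver state exit_maintenance "$masters" "$ts_uuid"
 }
 ----

 Run the function for one tablet server at a time, and only after the masters
 have been restarted.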

 [[rebalancer_tool]]
 === Running the tablet rebalancing tool

 The `kudu` CLI contains a rebalancing tool that can be used to rebalance
 tablet replicas among tablet servers. For each table, the tool attempts to
 balance the number of replicas per tablet server. It will also, without
 unbalancing any table, attempt to even out the number of replicas per tablet
 server across the cluster as a whole. The rebalancing tool should be run as the
 Kudu admin user, specifying all master addresses:

 [source,bash]
 ----
 $ sudo -u kudu kudu cluster rebalance master-01.example.com,master-02.example.com,master-03.example.com
 ----

 When run, the rebalancer will report on the initial tablet replica distribution
 in the cluster, log the replicas it moves, and print a final summary of the
 distribution when it terminates:

 ----
 Per-server replica distribution summary:
        Statistic       |   Value
 -----------------------+-----------
  Minimum Replica Count | 0
  Maximum Replica Count | 24
  Average Replica Count | 14.400000

 Per-table replica distribution summary:
  Replica Skew |  Value
 --------------+----------
  Minimum      | 8
  Maximum      | 8
  Average      | 8.000000

 I0613 14:18:49.905897 3002065792 rebalancer.cc:779] tablet e7ee9ade95b342a7a94649b7862b345d: 206a51de1486402bbb214b5ce97a633c -> 3b4d9266ac8c45ff9a5d4d7c3e1cb326 move scheduled
 I0613 14:18:49.917578 3002065792 rebalancer.cc:779] tablet 5f03944529f44626a0d6ec8b1edc566e: 6e64c4165b864cbab0e67ccd82091d60 -> ba8c22ab030346b4baa289d6d11d0809 move scheduled
 I0613 14:18:49.928683 3002065792 rebalancer.cc:779] tablet 9373fee3bfe74cec9054737371a3b15d: fab382adf72c480984c6cc868fdd5f0e -> 3b4d9266ac8c45ff9a5d4d7c3e1cb326 move scheduled

 ... (full output elided)

 I0613 14:19:01.162802 3002065792 rebalancer.cc:842] tablet f4c046f18b174cc2974c65ac0bf52767: 206a51de1486402bbb214b5ce97a633c -> 3b4d9266ac8c45ff9a5d4d7c3e1cb326 move completed: OK

 rebalancing is complete: cluster is balanced (moved 28 replicas)
 Per-server replica distribution summary:
        Statistic       |   Value
 -----------------------+-----------
  Minimum Replica Count | 14
  Maximum Replica Count | 15
  Average Replica Count | 14.400000

 Per-table replica distribution summary:
  Replica Skew |  Value
 --------------+----------
  Minimum      | 1
  Maximum      | 1
  Average      | 1.000000
 ----

 If more details are needed in addition to the replica distribution summary,
 use the `--output_replica_distribution_details` flag. If added, the flag makes
 the tool print per-table and per-tablet server replica distribution statistics
 as well.

 Use the `--report_only` flag to get a report on table- and cluster-wide
 replica distribution statistics without starting any rebalancing activity.

 The rebalancer can also be restricted to run on a subset of the tables by
 supplying the `--tables` flag. Note that, when running on a subset of tables,
 the tool will not attempt to balance the cluster as a whole.

 To limit how long the rebalancer runs, use the `--max_run_time_sec` flag. By
 default, the rebalancer runs until the cluster is balanced. To control the
 amount of resources devoted to rebalancing, adjust the
 `--max_moves_per_server` flag. See `kudu cluster rebalance --help` for more.
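
 The flags described above can be combined as in the following sketch.
 `run_kudu`, `rebalance_report`, and `rebalance_limited` are hypothetical helper
 names, and the limits passed to them are chosen by the operator.

 [source,bash]
 ----
 # Hypothetical helper: runs the kudu CLI as the kudu user.
 run_kudu() { sudo -u kudu kudu "$@"; }

 # Usage: rebalance_report <master_addresses>
 # Prints detailed distribution statistics without moving any replicas.
 rebalance_report() {
   run_kudu cluster rebalance "$1" --report_only \
     --output_replica_distribution_details
 }

 # Usage: rebalance_limited <master_addresses> <max_run_time_sec> \
 #          <max_moves_per_server> [<tables>]
 # Runs a time- and resource-limited rebalancing session, optionally
 # restricted to a comma-separated list of tables.
 rebalance_limited() {
   local masters tables
   masters=$1; tables=${4:-}
   set -- --max_run_time_sec="$2" --max_moves_per_server="$3"
   if [ -n "$tables" ]; then
     set -- "$@" --tables="$tables"
   fi
   run_kudu cluster rebalance "$masters" "$@"
 }
 ----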

 It's safe to stop the rebalancer tool at any time. When restarted, the
 rebalancer will continue rebalancing the cluster.

 The rebalancer requires all registered tablet servers to be up and running
 to proceed with the rebalancing process. This avoids possible conflicts and
 races with automatic re-replication and keeps replica placement optimal for
 the current configuration of the cluster. If a tablet server becomes
 unavailable during the rebalancing session, the rebalancer will exit. As noted
 above, it's safe to restart the rebalancer after resolving the issue with
 unavailable tablet servers.

 The rebalancing tool can rebalance Kudu clusters running older versions as well,
 with some restrictions. Consult the following table for more information. In the
 table, "RF" stands for "replication factor".

 [[rebalancer_compatibility]]
 .Kudu Rebalancing Tool Compatibility
 [options="header"]
 |===
 | Version Range | Rebalances RF = 1 Tables? | Rebalances RF > 1 Tables?
 | v < 1.4.0 | No | No
 | 1.4.0 +<=+ v < 1.7.1 | No | Yes
 | v >= 1.7.1 | Yes | Yes
 |===

 If the rebalancer runs against a cluster on which rebalancing tables with
 replication factor 1 is not supported, it rebalances all the other tables
 and the cluster as if those singly-replicated tables did not exist.
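
 The compatibility table can be expressed as a small version check, which may be
 useful in scripts that manage clusters on several Kudu versions. This is a
 sketch only: `version_lt` and `rebalancer_support` are hypothetical helper
 names, and versions are assumed to have the form `MAJOR.MINOR.PATCH`.

 [source,bash]
 ----
 # Usage: version_lt <a> <b>
 # Succeeds when version a is strictly older than version b.
 version_lt() {
   [ "$1" != "$2" ] &&
     [ "$(printf '%s\n%s\n' "$1" "$2" |
          sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)" = "$1" ]
 }

 # Usage: rebalancer_support <cluster_version>
 # Prints which kinds of tables the rebalancer can rebalance on that version.
 rebalancer_support() {
   if version_lt "$1" 1.4.0; then
     echo "RF=1: no, RF>1: no"
   elif version_lt "$1" 1.7.1; then
     echo "RF=1: no, RF>1: yes"
   else
     echo "RF=1: yes, RF>1: yes"
   fi
 }
 ----

 For example, `rebalancer_support 1.7.0` prints `RF=1: no, RF>1: yes`.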

 [[rebalancer_tool_with_rack_awareness]]
 === Running the tablet rebalancing tool on a rack-aware cluster

 As detailed in the <<rack_awareness, rack awareness>> section, it's possible
 to use the `kudu cluster rebalance` tool to establish the placement policy on a
 cluster. This might be necessary when the rack awareness feature is first
 configured or when re-replication violated the placement policy. The rebalancing
 tool breaks its work into three phases:

 . The rack-aware rebalancer tries to establish the placement policy. Use the
   `--disable_policy_fixer` flag to skip this phase.
 . The rebalancer tries to balance load by location, moving tablet replicas
   between locations in an attempt to spread tablet replicas among locations
   evenly. The load of a location is measured as the total number of replicas in
   the location divided by the number of tablet servers in the location. Use the
   `--disable_cross_location_rebalancing` flag to skip this phase.
 . The rebalancer tries to balance the tablet replica distribution within each
   location, as if the location were a cluster on its own. Use the
   `--disable_intra_location_rebalancing` flag to skip this phase.
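
 The location load used in the second phase is a simple ratio. The following
 sketch computes it for illustration; `location_load` is a hypothetical helper
 name and the replica and tablet server counts are made-up inputs.

 [source,bash]
 ----
 # Usage: location_load <replica_count> <tserver_count>
 # Prints the load of a location: replicas divided by tablet servers.
 location_load() {
   [ "$2" -gt 0 ] || return 1
   LC_ALL=C awk -v r="$1" -v t="$2" 'BEGIN { printf "%.2f\n", r / t }'
 }
 ----

 A location with 120 replicas on 4 tablet servers has a load of `30.00`, while a
 location with 90 replicas on 2 tablet servers has a load of `45.00`. The second
 location is therefore the more loaded one, even though it hosts fewer replicas.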

 By using the `--report_only` flag, it's also possible to check if all tablets in
 the cluster conform to the placement policy without attempting any replica
 movement.
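
 As a sketch of how the phase flags combine, the following functions check the
 placement policy and then fix only policy violations, skipping both
 load-balancing phases. `run_kudu`, `check_placement_policy`, and
 `fix_placement_policy_only` are hypothetical helper names.

 [source,bash]
 ----
 # Hypothetical helper: runs the kudu CLI as the kudu user.
 run_kudu() { sudo -u kudu kudu "$@"; }

 # Usage: check_placement_policy <master_addresses>
 # Reports on the replica distribution without moving any replicas.
 check_placement_policy() {
   run_kudu cluster rebalance "$1" --report_only
 }

 # Usage: fix_placement_policy_only <master_addresses>
 # Runs only the first phase: establishing the placement policy.
 fix_placement_policy_only() {
   run_kudu cluster rebalance "$1" \
     --disable_cross_location_rebalancing \
     --disable_intra_location_rebalancing
 }
 ----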

 [[tablet_server_decommissioning]]
 === Decommissioning or Permanently Removing a Tablet Server From a Cluster

 Starting with Kudu 1.12, the Kudu rebalancer tool can be used to decommission a
 tablet server by supplying the `--ignored_tservers` and
 `--move_replicas_from_ignored_tservers` arguments.

 WARNING: Do not decommission multiple tablet servers at once. To remove
 multiple tablet servers from the cluster, follow the instructions below for
 each tablet server, ensuring that the previous tablet server is removed from
 the cluster and `ksck` is healthy before shutting down the next.

 . Ensure the cluster is in good health by using `ksck`. See <<ksck>>.
 . Put the tablet server into
   <<minimizing_cluster_disruption_during_temporary_single_ts_downtime,maintenance
   mode>> by using the `kudu tserver state enter_maintenance` tool.
 . Run the `kudu cluster rebalance` tool, supplying the `--ignored_tservers`
   argument with the UUIDs of the tablet servers to be decommissioned, and the
   `--move_replicas_from_ignored_tservers` flag.
 . Wait for the moves to complete and for `ksck` to show the cluster in a
   healthy state.
 . The decommissioned tablet server can be brought offline.
 . To completely remove it from the cluster so `ksck` shows the cluster as
   completely healthy, restart the masters. In the case of a single master,
   this will cause cluster downtime. With multi-master, restart the masters in
   sequence to avoid cluster downtime.
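
 The first four steps above can be sketched as a shell function. This is an
 illustration, not an official script: `run_kudu` and `decommission_tserver`
 are hypothetical helper names, and the sketch assumes `ksck` exits with a
 non-zero status when the cluster is unhealthy.

 [source,bash]
 ----
 # Hypothetical helper: runs the kudu CLI as the kudu user.
 run_kudu() { sudo -u kudu kudu "$@"; }

 # Usage: decommission_tserver <master_addresses> <ts_uuid>
 # Succeeds only if the cluster is healthy after all replicas have moved.
 decommission_tserver() {
   local masters ts_uuid
   masters=$1; ts_uuid=$2
   # Refuse to start on an unhealthy cluster.
   run_kudu cluster ksck "$masters" || return 1
   run_kudu tserver state enter_maintenance "$masters" "$ts_uuid" || return 1
   # Move every replica away from the tablet server being decommissioned.
   run_kudu cluster rebalance "$masters" \
     --ignored_tservers="$ts_uuid" \
     --move_replicas_from_ignored_tservers || return 1
   # Confirm the cluster is healthy before the server is taken offline.
   run_kudu cluster ksck "$masters"
 }
 ----

 Bringing the tablet server offline and restarting the masters remain manual
 steps once the function succeeds.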

 In Kudu versions that do not support the above tooling, different steps must be
 followed to decommission a tablet server:

 . Ensure the cluster is in good health using `ksck`. See <<ksck>>.
 . If the tablet server contains any replicas of tables with replication factor
   1, these replicas must be manually moved off the tablet server prior to
   shutting it down. The `kudu tablet change_config move_replica` tool can be
   used for this.
 . Shut down the tablet server. After
   `--follower_unavailable_considered_failed_sec`, which defaults to 5 minutes,
   Kudu will begin to re-replicate the tablet server's replicas to other servers.
   Wait until the process is finished. Progress can be monitored using `ksck`.
 . Once all the copies are complete, `ksck` will continue to report the tablet
   server as unavailable. The cluster will otherwise operate fine without the
   tablet server. To completely remove it from the cluster so `ksck` shows the
   cluster as completely healthy, restart the masters. In the case of a single
   master, this will cause cluster downtime. With multi-master, restart the
   masters in sequence to avoid cluster downtime.
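
 Moving the single-replica tablets in the second step can be scripted as in the
 following sketch. `run_kudu` and `move_replicas_off` are hypothetical helper
 names; the tablet IDs and the destination tablet server UUID must be supplied
 by the operator.

 [source,bash]
 ----
 # Hypothetical helper: runs the kudu CLI as the kudu user.
 run_kudu() { sudo -u kudu kudu "$@"; }

 # Usage: move_replicas_off <master_addresses> <from_ts_uuid> <to_ts_uuid> \
 #          <tablet_id>...
 # Moves the replica of each listed tablet from one tablet server to another,
 # stopping at the first failure.
 move_replicas_off() {
   local masters from to tablet
   masters=$1; from=$2; to=$3; shift 3
   for tablet in "$@"; do
     run_kudu tablet change_config move_replica \
       "$masters" "$tablet" "$from" "$to" || return 1
   done
 }
 ----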

 [[using_cluster_names_in_kudu_tool]]
 === Using cluster names in the `kudu` command line tool

 When using the `kudu` command line tool, it can be difficult to remember the
 precise list of Kudu master RPC addresses needed to communicate with a cluster,
 especially when managing multiple clusters. As an alternative, the command line
 tool can identify clusters by name. To use this functionality:

 . Create a new directory to store the Kudu configuration file.
 . Export the path to this directory in the `KUDU_CONFIG` environment variable.
 . Create a file called `kudurc` in the new directory.
 . Populate `kudurc` as follows, substituting your own cluster names and RPC
   addresses:
 +
 ----
 clusters_info:
   cluster_name1:
     master_addresses: ip1:port1,ip2:port2,ip3:port3
   cluster_name2:
     master_addresses: ip4:port4
 ----
 +
 . When using the `kudu` command line tool, replace the list of Kudu master RPC
   addresses with the cluster name, prepended with the character `@`. For
   example:
 +
 [source,bash]
 ----
 $ sudo -u kudu kudu cluster ksck @cluster_name1
 ----


 NOTE: Cluster names may be used as input in any invocation of the `kudu` command
 line tool that expects a list of Kudu master RPC addresses.
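
 The setup steps above can be scripted. The following is a minimal sketch for a
 configuration with a single cluster: `write_kudurc` is a hypothetical helper
 name, the directory is chosen by the operator, and any existing `kudurc` in
 that directory is overwritten.

 [source,bash]
 ----
 # Usage: write_kudurc <config_dir> <cluster_name> <master_addresses>
 # Creates the directory if needed and writes a kudurc with one cluster entry.
 write_kudurc() {
   mkdir -p "$1" || return 1
   printf '%s\n' \
     'clusters_info:' \
     "  $2:" \
     "    master_addresses: $3" > "$1/kudurc"
 }
 ----

 After calling the function, export the directory with
 `export KUDU_CONFIG=<config_dir>` so that the `kudu` tool can find the file.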