| // Licensed to the Apache Software Foundation (ASF) under one |
| // or more contributor license agreements. See the NOTICE file |
| // distributed with this work for additional information |
| // regarding copyright ownership. The ASF licenses this file |
| // to you under the Apache License, Version 2.0 (the |
| // "License"); you may not use this file except in compliance |
| // with the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, |
| // software distributed under the License is distributed on an |
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| // KIND, either express or implied. See the License for the |
| // specific language governing permissions and limitations |
| // under the License. |
| |
| [[administration]] |
| = Apache Kudu Administration |
| |
| :author: Kudu Team |
| :imagesdir: ./images |
| :icons: font |
| :toc: left |
| :toclevels: 3 |
| :doctype: book |
| :backend: html5 |
| :sectlinks: |
| :experimental: |
| |
| NOTE: Kudu is easier to manage with link:http://www.cloudera.com/content/www/en-us/products/cloudera-manager.html[Cloudera Manager] |
| than in a standalone installation. See Cloudera's |
| link:http://www.cloudera.com/documentation/kudu/latest/topics/kudu_installation.html[Kudu documentation] |
| for more details about using Kudu with Cloudera Manager. |
| |
| == Starting and Stopping Kudu Processes |
| |
| NOTE: These instructions are relevant only when Kudu is installed using operating system packages |
| (e.g. `rpm` or `deb`). |
| |
| include::installation.adoc[tags=start_stop] |
| |
| == Kudu Web Interfaces |
| |
Kudu tablet servers and masters expose useful operational information on a built-in web interface.
| |
| === Kudu Master Web Interface |
| |
| Kudu master processes serve their web interface on port 8051. The interface exposes several pages |
| with information about the cluster state: |
| |
| - A list of tablet servers, their host names, and the time of their last heartbeat. |
| - A list of tables, including schema and tablet location information for each. |
| - SQL code which you can paste into Impala Shell to add an existing table to Impala's list of known data sources. |
| |
| === Kudu Tablet Server Web Interface |
| |
| Each tablet server serves a web interface on port 8050. The interface exposes information |
| about each tablet hosted on the server, its current state, and debugging information |
| about maintenance background operations. |
| |
| === Common Web Interface Pages |
| |
| Both Kudu masters and tablet servers expose a common set of information via their web interfaces: |
| |
- HTTP access to server logs.
- An `/rpcz` endpoint which lists currently running RPCs via JSON.
- Pages giving an overview and detailed information on the memory usage of different
  components of the process.
- Information on the current set of configuration flags.
- Information on the currently running threads and their resource consumption.
- A JSON endpoint exposing metrics about the server.
- Information on the deployed version number of the daemon.
| |
| These interfaces are linked from the landing page of each daemon's web UI. |
| |
| == Kudu Metrics |
| |
| Kudu daemons expose a large number of metrics. Some metrics are associated with an entire |
| server process, whereas others are associated with a particular tablet replica. |
| |
| === Listing available metrics |
| |
| The full set of available metrics for a Kudu server can be dumped via a special command |
| line flag: |
| |
| [source,bash] |
| ---- |
| $ kudu-tserver --dump_metrics_json |
| $ kudu-master --dump_metrics_json |
| ---- |
| |
| This will output a large JSON document. Each metric indicates its name, label, description, |
| units, and type. Because the output is JSON-formatted, this information can easily be |
| parsed and fed into other tooling which collects metrics from Kudu servers. |
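As a sketch of such post-processing, the following Python snippet builds an index of counter units from a dump. The JSON sample is abbreviated to the entity/metric layout shown in the sample outputs later in this section; a real dump is much larger:

[source,python]
----
import json

# Abbreviated sample of the structure produced by --dump_metrics_json;
# the layout follows the sample outputs shown later in this section.
dump = json.loads("""
[
  {
    "type": "server",
    "id": "kudu.tabletserver",
    "attributes": {},
    "metrics": [
      {
        "name": "rpc_connections_accepted",
        "label": "RPC Connections Accepted",
        "type": "counter",
        "unit": "connections",
        "description": "Number of incoming TCP connections made to the RPC server"
      }
    ]
  }
]
""")

# Build a {metric name: unit} index for every counter in the dump.
counter_units = {
    m["name"]: m.get("unit", "")
    for entity in dump
    for m in entity["metrics"]
    if m.get("type") == "counter"
}
print(counter_units)
----

The same iteration pattern applies to histogram and gauge metrics by changing the `type` filter.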
| |
| === Collecting metrics via HTTP |
| |
| Metrics can be collected from a server process via its HTTP interface by visiting |
| `/metrics`. The output of this page is JSON for easy parsing by monitoring services. |
| This endpoint accepts several `GET` parameters in its query string: |
| |
| - `/metrics?metrics=<substring1>,<substring2>,...` - limits the returned metrics to those which contain |
| at least one of the provided substrings. The substrings also match entity names, so this |
| may be used to collect metrics for a specific tablet. |
| |
| - `/metrics?include_schema=1` - includes metrics schema information such as unit, description, |
| and label in the JSON output. This information is typically elided to save space. |
| |
| - `/metrics?compact=1` - eliminates unnecessary whitespace from the resulting JSON, which can decrease |
| bandwidth when fetching this page from a remote host. |
| |
| - `/metrics?include_raw_histograms=1` - include the raw buckets and values for histogram metrics, |
| enabling accurate aggregation of percentile metrics over time and across hosts. |
| |
| For example: |
| |
| [source,bash] |
| ---- |
| $ curl -s 'http://example-ts:8050/metrics?include_schema=1&metrics=connections_accepted' |
| ---- |
| |
| [source,json] |
| ---- |
| [ |
| { |
| "type": "server", |
| "id": "kudu.tabletserver", |
| "attributes": {}, |
| "metrics": [ |
| { |
| "name": "rpc_connections_accepted", |
| "label": "RPC Connections Accepted", |
| "type": "counter", |
| "unit": "connections", |
| "description": "Number of incoming TCP connections made to the RPC server", |
| "value": 92 |
| } |
| ] |
| } |
| ] |
| ---- |
| |
| [source,bash] |
| ---- |
| $ curl -s 'http://example-ts:8050/metrics?metrics=log_append_latency' |
| ---- |
| |
| [source,json] |
| ---- |
| [ |
| { |
| "type": "tablet", |
| "id": "c0ebf9fef1b847e2a83c7bd35c2056b1", |
| "attributes": { |
| "table_name": "lineitem", |
| "partition": "hash buckets: (55), range: [(<start>), (<end>))", |
| "table_id": "" |
| }, |
| "metrics": [ |
| { |
| "name": "log_append_latency", |
| "total_count": 7498, |
| "min": 4, |
| "mean": 69.3649, |
| "percentile_75": 29, |
| "percentile_95": 38, |
| "percentile_99": 45, |
| "percentile_99_9": 95, |
| "percentile_99_99": 167, |
| "max": 367244, |
| "total_sum": 520098 |
| } |
| ] |
| } |
| ] |
| ---- |
| |
| NOTE: All histograms and counters are measured since the server start time, and are not reset upon collection. |
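Because counters are cumulative since server start, a monitoring system must subtract successive samples itself to obtain per-interval rates. A minimal sketch, using made-up sample values:

[source,python]
----
# Two successive scrapes of a cumulative counter, 60 seconds apart.
# The values and timestamps here are hypothetical.
prev_value, prev_time = 92, 1000.0
curr_value, curr_time = 152, 1060.0

# Per-second rate over the scrape interval.
rate = (curr_value - prev_value) / (curr_time - prev_time)
print(rate)
----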
| |
| === Diagnostics Logging |
| |
| Kudu may be configured to dump various diagnostics information to a local log file. |
| The diagnostics log will be written to the same directory as the other Kudu log files, with a |
| similar naming format, substituting `diagnostics` instead of a log level like `INFO`. |
| After any diagnostics log file reaches 64MB uncompressed, the log will be rolled and |
| the previous file will be gzip-compressed. |
| |
| Each line in the diagnostics log consists of the following components: |
| |
| * A human-readable timestamp formatted in the same fashion as the other Kudu log files. |
| * The type of record. For example, a metrics record consists of the word `metrics`. |
| * A machine-readable timestamp, in microseconds since the Unix epoch. |
| * The record itself. |
| |
| Currently, the only type of diagnostics record is a periodic dump of the server metrics. |
| Each record is encoded in compact JSON format, and the server attempts to elide any metrics |
| which have not changed since the previous record. In addition, counters which have never |
| been incremented are elided. Otherwise, the format of the JSON record is identical to the |
| format exposed by the HTTP endpoint above. |
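A diagnostics log line can therefore be split into its components mechanically. The line below is synthetic, constructed from the component list above; the exact timestamp layout is illustrative only:

[source,python]
----
import json

# A synthetic diagnostics log line: a human-readable timestamp, the
# record type ("metrics"), a microsecond timestamp, and the JSON record.
line = ('0608 09:13:39.373797 metrics 1465399419373797 '
        '[{"name": "example_counter", "value": 7}]')

# Split on the record type; everything before it is the readable timestamp.
readable_ts, _, rest = line.partition(" metrics ")
micros, _, record_json = rest.partition(" ")
record = json.loads(record_json)
print(readable_ts, int(micros), record[0]["name"])
----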
| |
| The frequency with which metrics are dumped to the diagnostics log is configured using the |
| `--metrics_log_interval_ms` flag. By default, Kudu logs metrics every 60 seconds. |
| |
| == Common Kudu workflows |
| |
| [[migrate_to_multi_master]] |
| === Migrating to Multiple Kudu Masters |
| |
| For high availability and to avoid a single point of failure, Kudu clusters should be created with |
| multiple masters. Many Kudu clusters were created with just a single master, either for simplicity |
| or because Kudu multi-master support was still experimental at the time. This workflow demonstrates |
| how to migrate to a multi-master configuration. It can also be used to migrate from two masters to |
| three, with straightforward modifications. Note that the number of masters must be odd. |
| |
| WARNING: The workflow is unsafe for adding new masters to an existing configuration that already has |
| three or more masters. Do not use it for that purpose. |
| |
WARNING: An even number of masters doesn't provide any benefit over having one fewer master. This
guide should always be used for migrating to three masters.
| |
| WARNING: All of the command line steps below should be executed as the Kudu UNIX user. The example |
| commands assume the Kudu Unix user is `kudu`, which is typical. |
| |
WARNING: The workflow presupposes at least basic familiarity with Kudu configuration management. If
you are using vendor-specific tools (e.g. Cloudera Manager), the workflow also presupposes familiarity
with those tools; in that case, follow the vendor's instructions instead, as details may differ.
| |
| ==== Prepare for the migration |
| |
| . Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster |
| will be unavailable. |
| |
| . Decide how many masters to use. The number of masters should be odd. Three or five node master |
| configurations are recommended; they can tolerate one or two failures respectively. |
| |
| . Perform the following preparatory steps for the existing master: |
* Identify and record the directories where the master's write-ahead log (WAL) and data live. If
using Kudu system packages, the default location for both is `/var/lib/kudu/master`, but they may be
customized via the `fs_wal_dir` and `fs_data_dirs` configuration parameters. The commands below
assume that `fs_wal_dir` is `/data/kudu/master/wal` and `fs_data_dirs` is `/data/kudu/master/data`.
Your configuration may differ. For more information on configuring these directories, see the
link:configuration.html#directory_configuration[Kudu Configuration docs].
| * Identify and record the port the master is using for RPCs. The default port value is 7051, but it |
| may have been customized using the `rpc_bind_addresses` configuration parameter. |
| * Identify the master's UUID. It can be fetched using the following command: |
| + |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu fs dump uuid --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] 2>/dev/null |
| ---- |
| master_data_dir:: existing master's previously recorded data directory |
| + |
Example::
+
[source,bash]
| ---- |
| $ sudo -u kudu kudu fs dump uuid --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 2>/dev/null |
| 4aab798a69e94fab8d77069edff28ce0 |
| ---- |
| + |
* Optional: configure a DNS alias for the master. The alias could be a DNS CNAME record (if the machine
already has an A record in DNS), an A record (if the machine is only known by its IP address),
or an alias in `/etc/hosts`. The alias should be an abstract representation of the master (e.g.
`master-1`).
| + |
WARNING: Without DNS aliases it is not possible to recover from permanent master failures without
bringing the cluster down for maintenance, so configuring aliases is highly recommended.
| + |
| . If you have Kudu tables that are accessed from Impala, you must update |
| the master addresses in the Apache Hive Metastore (HMS) database. |
| * If you set up the DNS aliases, run the following statement in `impala-shell`, |
| replacing `master-1`, `master-2`, and `master-3` with your actual aliases. |
| + |
| [source,sql] |
| ---- |
| ALTER TABLE table_name |
| SET TBLPROPERTIES |
| ('kudu.master_addresses' = 'master-1,master-2,master-3'); |
| ---- |
| + |
| * If you do not have DNS aliases set up, see Step #11 in the Performing |
| the migration section for updating HMS. |
| + |
| . Perform the following preparatory steps for each new master: |
| * Choose an unused machine in the cluster. The master generates very little load so it can be |
| colocated with other data services or load-generating processes, though not with another Kudu |
| master from the same configuration. |
| * Ensure Kudu is installed on the machine, either via system packages (in which case the `kudu` and |
| `kudu-master` packages should be installed), or via some other means. |
| * Choose and record the directory where the master's data will live. |
| * Choose and record the port the master should use for RPCs. |
| * Optional: configure a DNS alias for the master (e.g. `master-2`, `master-3`, etc). |
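The fault-tolerance figures given above follow from Raft's majority requirement: a deployment of `n` masters tolerates `floor((n - 1) / 2)` failures. As a quick sketch:

[source,python]
----
# Raft needs a strict majority of masters alive, so an n-master
# deployment tolerates floor((n - 1) / 2) failures.
def tolerated_failures(n_masters: int) -> int:
    return (n_masters - 1) // 2

# Three masters tolerate one failure; five tolerate two.
print(tolerated_failures(3), tolerated_failures(5))
----

This is also why an even number of masters provides no benefit: four masters tolerate the same single failure as three.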
| |
| [[perform-the-migration]] |
| ==== Perform the migration |
| |
| . Stop all the Kudu processes in the entire cluster. |
| |
| . Format the data directory on each new master machine, and record the generated UUID. Use the |
| following command sequence: |
| + |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu fs format --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] |
| $ sudo -u kudu kudu fs dump uuid --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] 2>/dev/null |
| ---- |
| + |
| master_data_dir:: new master's previously recorded data directory |
| + |
Example::
+
[source,bash]
| ---- |
| $ sudo -u kudu kudu fs format --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data |
| $ sudo -u kudu kudu fs dump uuid --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 2>/dev/null |
| f5624e05f40649b79a757629a69d061e |
| ---- |
| |
. If using Cloudera Manager (CM), add the new Kudu master roles now, but do not start them.
| * If using DNS aliases, override the empty value of the `Master Address` parameter for each role |
| (including the existing master role) with that master's alias. |
| * Add the port number (separated by a colon) if using a non-default RPC port value. |
| |
| . Rewrite the master's Raft configuration with the following command, executed on the existing |
| master machine: |
| + |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> <all_masters> |
| ---- |
| + |
| master_data_dir:: existing master's previously recorded data directory |
| tablet_id:: must be the string `00000000000000000000000000000000` |
| all_masters:: space-separated list of masters, both new and existing. Each entry in the list must be |
| a string of the form `<uuid>:<hostname>:<port>` |
| uuid::: master's previously recorded UUID |
| hostname::: master's previously recorded hostname or alias |
| port::: master's previously recorded RPC port number |
| + |
Example::
+
[source,bash]
| ---- |
| $ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 4aab798a69e94fab8d77069edff28ce0:master-1:7051 f5624e05f40649b79a757629a69d061e:master-2:7051 988d8ac6530f426cbe180be5ba52033d:master-3:7051 |
| ---- |
| |
. Modify the value of the `master_addresses` configuration parameter for both the existing master and the new masters.
| The new value must be a comma-separated list of all of the masters. Each entry is a string of the form `<hostname>:<port>` |
| hostname:: master's previously recorded hostname or alias |
| port:: master's previously recorded RPC port number |
| |
| . Start the existing master. |
| |
| . Copy the master data to each new master with the following command, executed on each new master |
| machine. |
| + |
| WARNING: If your Kudu cluster is secure, in addition to running as the Kudu UNIX user, you must |
| authenticate as the Kudu service user prior to running this command. |
| + |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> <existing_master> |
| ---- |
| + |
| master_data_dir:: new master's previously recorded data directory |
| tablet_id:: must be the string `00000000000000000000000000000000` |
| existing_master:: RPC address of the existing master and must be a string of the form |
| `<hostname>:<port>` |
| hostname::: existing master's previously recorded hostname or alias |
| port::: existing master's previously recorded RPC port number |
| + |
Example::
+
[source,bash]
| ---- |
| $ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 master-1:7051 |
| ---- |
| |
| . Start all of the new masters. |
| + |
| WARNING: Skip the next step if using CM. |
| + |
| . Modify the value of the `tserver_master_addrs` configuration parameter for each tablet server. |
| The new value must be a comma-separated list of masters where each entry is a string of the form |
| `<hostname>:<port>` |
| hostname:: master's previously recorded hostname or alias |
| port:: master's previously recorded RPC port number |
| |
| . Start all of the tablet servers. |
| . If you have Kudu tables that are accessed from Impala and you didn't set up |
| DNS aliases, update the HMS database manually in the underlying database that |
| provides the storage for HMS. |
| * The following is an example SQL statement you should run in the HMS database: |
| + |
| [source,sql] |
| ---- |
| UPDATE TABLE_PARAMS |
| SET PARAM_VALUE = |
| 'master-1.example.com,master-2.example.com,master-3.example.com' |
| WHERE PARAM_KEY = 'kudu.master_addresses' AND PARAM_VALUE = 'old-master'; |
| ---- |
| + |
| * In `impala-shell`, run: |
| + |
[source,sql]
| ---- |
| INVALIDATE METADATA; |
| ---- |
| + |
| |
| ==== Verify the migration was successful |
| |
| To verify that all masters are working properly, perform the following sanity checks: |
| |
| * Using a browser, visit each master's web UI. Look at the /masters page. All of the masters should |
| be listed there with one master in the LEADER role and the others in the FOLLOWER role. The |
| contents of /masters on each master should be the same. |
| |
| * Run a Kudu system check (ksck) on the cluster using the `kudu` command line |
| tool. See <<ksck>> for more details. |
| |
| === Recovering from a dead Kudu Master in a Multi-Master Deployment |
| |
| Kudu multi-master deployments function normally in the event of a master loss. However, it is |
| important to replace the dead master; otherwise a second failure may lead to a loss of availability, |
| depending on the number of available masters. This workflow describes how to replace the dead |
| master. |
| |
| Due to https://issues.apache.org/jira/browse/KUDU-1620[KUDU-1620], it is not possible to perform |
| this workflow without also restarting the live masters. As such, the workflow requires a |
| maintenance window, albeit a potentially brief one if the cluster was set up with DNS aliases. |
| |
| WARNING: Kudu does not yet support live Raft configuration changes for masters. As such, it is only |
| possible to replace a master if the deployment was created with DNS aliases or if every node in the |
| cluster is first shut down. See the <<migrate_to_multi_master,multi-master migration workflow>> for |
| more details on deploying with DNS aliases. |
| |
WARNING: The workflow presupposes at least basic familiarity with Kudu configuration management. If
you are using vendor-specific tools (e.g. Cloudera Manager), the workflow also presupposes familiarity
with those tools; in that case, follow the vendor's instructions instead, as details may differ.
| |
| WARNING: All of the command line steps below should be executed as the Kudu UNIX user, typically |
| `kudu`. |
| |
| ==== Prepare for the recovery |
| |
| . If the deployment was configured without DNS aliases perform the following steps: |
| * Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster |
| will be unavailable. |
| * Shut down all Kudu tablet server processes in the cluster. |
| |
. Ensure that the dead master is well and truly dead. Take whatever steps are needed to prevent it from
accidentally restarting; an accidental restart can be quite dangerous for the cluster post-recovery.
| |
| . Choose one of the remaining live masters to serve as a basis for recovery. The rest of this |
| workflow will refer to this master as the "reference" master. |
| |
| . Choose an unused machine in the cluster where the new master will live. The master generates very |
| little load so it can be colocated with other data services or load-generating processes, though |
| not with another Kudu master from the same configuration. The rest of this workflow will refer to |
| this master as the "replacement" master. |
| |
| . Perform the following preparatory steps for the replacement master: |
| * Ensure Kudu is installed on the machine, either via system packages (in which case the `kudu` and |
| `kudu-master` packages should be installed), or via some other means. |
| * Choose and record the directory where the master's data will live. |
| |
| . Perform the following preparatory steps for each live master: |
* Identify and record the directory where the master's data lives. If using Kudu system packages,
the default value is `/var/lib/kudu/master`, but it may be customized via the `fs_wal_dir` and
`fs_data_dirs` configuration parameters. Note that if `fs_data_dirs` is set to directories other
than the value of `fs_wal_dir`, it must be explicitly included in every command below where
`fs_wal_dir` is also included. For more information on configuring these directories, see the
link:configuration.html#directory_configuration[Kudu Configuration docs].
| * Identify and record the master's UUID. It can be fetched using the following command: |
| + |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu fs dump uuid --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] 2>/dev/null |
| ---- |
| master_data_dir:: live master's previously recorded data directory |
| + |
Example::
+
[source,bash]
| ---- |
| $ sudo -u kudu kudu fs dump uuid --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 2>/dev/null |
| 80a82c4b8a9f4c819bab744927ad765c |
| ---- |
| + |
| . Perform the following preparatory steps for the reference master: |
* Identify and record the directory where the master's data lives. If using Kudu system packages,
the default value is `/var/lib/kudu/master`, but it may be customized via the `fs_wal_dir` and
`fs_data_dirs` configuration parameters. Note that if `fs_data_dirs` is set to directories other
than the value of `fs_wal_dir`, it must be explicitly included in every command below where
`fs_wal_dir` is also included. For more information on configuring these directories, see the
link:configuration.html#directory_configuration[Kudu Configuration docs].
| * Identify and record the UUIDs of every master in the cluster, using the following command: |
| + |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu local_replica cmeta print_replica_uuids --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> 2>/dev/null |
| ---- |
| master_data_dir:: reference master's previously recorded data directory |
| tablet_id:: must be the string `00000000000000000000000000000000` |
| + |
Example::
+
[source,bash]
| ---- |
| $ sudo -u kudu kudu local_replica cmeta print_replica_uuids --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 2>/dev/null |
| 80a82c4b8a9f4c819bab744927ad765c 2a73eeee5d47413981d9a1c637cce170 1c3f3094256347528d02ec107466aef3 |
| ---- |
| + |
| . Using the two previously-recorded lists of UUIDs (one for all live masters and one for all |
| masters), determine and record (by process of elimination) the UUID of the dead master. |
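The process of elimination is a simple set difference. As a sketch, using the UUIDs from the examples above, where the third master is presumed dead:

[source,python]
----
# UUIDs of every master, from the reference master's Raft configuration
# (values taken from the example above).
all_masters = {
    "80a82c4b8a9f4c819bab744927ad765c",
    "2a73eeee5d47413981d9a1c637cce170",
    "1c3f3094256347528d02ec107466aef3",
}

# UUIDs of the masters that are still alive.
live_masters = {
    "80a82c4b8a9f4c819bab744927ad765c",
    "2a73eeee5d47413981d9a1c637cce170",
}

# Exactly one UUID should remain: the dead master's.
(dead_master_uuid,) = all_masters - live_masters
print(dead_master_uuid)
----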
| |
| ==== Perform the recovery |
| |
| . Format the data directory on the replacement master machine using the previously recorded |
| UUID of the dead master. Use the following command sequence: |
| + |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu fs format --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] --uuid=<uuid> |
| ---- |
| + |
| master_data_dir:: replacement master's previously recorded data directory |
| uuid:: dead master's previously recorded UUID |
| + |
Example::
+
[source,bash]
| ---- |
| $ sudo -u kudu kudu fs format --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data --uuid=80a82c4b8a9f4c819bab744927ad765c |
| ---- |
| + |
| . Copy the master data to the replacement master with the following command: |
| + |
| WARNING: If your Kudu cluster is secure, in addition to running as the Kudu UNIX user, you must |
| authenticate as the Kudu service user prior to running this command. |
| + |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> <reference_master> |
| ---- |
| + |
| master_data_dir:: replacement master's previously recorded data directory |
| tablet_id:: must be the string `00000000000000000000000000000000` |
| reference_master:: RPC address of the reference master and must be a string of the form |
| `<hostname>:<port>` |
| hostname::: reference master's previously recorded hostname or alias |
| port::: reference master's previously recorded RPC port number |
| + |
Example::
+
[source,bash]
| ---- |
| $ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 master-2:7051 |
| ---- |
| + |
| . If using CM, add the replacement Kudu master role now, but do not start it. |
| * Override the empty value of the `Master Address` parameter for the new role with the replacement |
| master's alias. |
| * Add the port number (separated by a colon) if using a non-default RPC port value. |
| |
| . If the cluster was set up with DNS aliases, reconfigure the DNS alias for the dead master to point |
| at the replacement master. |
| |
| . If the cluster was set up without DNS aliases, perform the following steps: |
| * Stop the remaining live masters. |
| * Rewrite the Raft configurations on these masters to include the replacement master. See Step 4 of |
| <<perform-the-migration, Perform the Migration>> for more details. |
| |
| . Start the replacement master. |
| |
| . Restart the remaining masters in the new multi-master deployment. While the masters are shut down, |
| there will be an availability outage, but it should last only as long as it takes for the masters |
| to come back up. |
| |
| Congratulations, the dead master has been replaced! To verify that all masters are working properly, |
| consider performing the following sanity checks: |
| |
| * Using a browser, visit each master's web UI. Look at the /masters page. All of the masters should |
| be listed there with one master in the LEADER role and the others in the FOLLOWER role. The |
| contents of /masters on each master should be the same. |
| |
| * Run a Kudu system check (ksck) on the cluster using the `kudu` command line |
| tool. See <<ksck>> for more details. |
| |
| === Removing Kudu Masters from a Multi-Master Deployment |
| |
| In the event that a multi-master deployment has been overallocated nodes, the following steps should |
| be taken to remove the unwanted masters. |
| |
| WARNING: In planning the new multi-master configuration, keep in mind that the number of masters |
| should be odd and that three or five node master configurations are recommended. |
| |
| WARNING: Dropping the number of masters below the number of masters currently needed for a Raft |
| majority can incur data loss. To mitigate this, ensure that the leader master is not removed during |
| this process. |
| |
| ==== Prepare for the removal |
| |
| . Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster |
| will be unavailable. |
| |
. Identify the UUID and RPC address of the current leader of the multi-master deployment by visiting the
`/masters` page of any master's web UI. This master must not be removed during this process; its
removal may result in severe data loss.
| |
| . Stop all the Kudu processes in the entire cluster. |
| |
| . If using CM, remove the unwanted Kudu master. |
| |
| ==== Perform the removal |
| |
| . Rewrite the Raft configuration on the remaining masters to include only the remaining masters. See |
| Step 4 of <<perform-the-migration,Perform the Migration>> for more details. |
| |
| . Remove the data directories and WAL directory on the unwanted masters. This is a precaution to |
| ensure that they cannot start up again and interfere with the new multi-master deployment. |
| |
| . Modify the value of the `master_addresses` configuration parameter for the masters of the new |
| multi-master deployment. If migrating to a single-master deployment, the `master_addresses` flag |
| should be omitted entirely. |
| |
| . Start all of the masters that were not removed. |
| |
| . Modify the value of the `tserver_master_addrs` configuration parameter for the tablet servers to |
| remove any unwanted masters. |
| |
| . Start all of the tablet servers. |
| |
==== Verify the removal was successful
| |
| To verify that all masters are working properly, perform the following sanity checks: |
| |
| * Using a browser, visit each master's web UI. Look at the /masters page. All of the masters should |
| be listed there with one master in the LEADER role and the others in the FOLLOWER role. The |
| contents of /masters on each master should be the same. |
| |
| * Run a Kudu system check (ksck) on the cluster using the `kudu` command line |
| tool. See <<ksck>> for more details. |
| |
| === Changing the master hostnames |
| |
To prevent long maintenance windows when replacing dead masters, DNS aliases should be used. If the
cluster was set up without aliases, the master hostnames can be changed by following the steps
below.
| |
| ==== Prepare for the hostname change |
| |
| . Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster |
| will be unavailable. |
| |
| . Note the UUID and RPC address of every master by visiting the `/masters` page of any master's web |
| UI. |
| |
| . Stop all the Kudu processes in the entire cluster. |
| |
| . Set up the new hostnames to point to the masters and verify all servers and clients properly |
| resolve them. |
| |
| ==== Perform the hostname change |
| |
. Rewrite each master's Raft configuration with the following command, executed on all master hosts:
+
[source,bash]
----
| $ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] 00000000000000000000000000000000 <all_masters> |
| ---- |
| |
| For example: |
| |
| ---- |
| $ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 4aab798a69e94fab8d77069edff28ce0:new-master-name-1:7051 f5624e05f40649b79a757629a69d061e:new-master-name-2:7051 988d8ac6530f426cbe180be5ba52033d:new-master-name-3:7051 |
| ---- |
| |
| . Change the masters' gflagfile so the `master_addresses` parameter reflects the new hostnames. |
| |
| . Change the `tserver_master_addrs` parameter in the tablet servers' gflagfiles to the new |
| hostnames. |
| |
| . Start up the masters. |
| |
| . To verify that all masters are working properly, perform the following sanity checks: |
| |
| .. Using a browser, visit each master's web UI. Look at the /masters page. All of the masters should |
| be listed there with one master in the LEADER role and the others in the FOLLOWER role. The |
| contents of /masters on each master should be the same. |
| |
.. Run the command below to verify that all masters are up and listening. The UUIDs
should be the same, and belong to the same masters, as before the hostname change:
| + |
| ---- |
| $ sudo -u kudu kudu master list new-master-name-1:7051,new-master-name-2:7051,new-master-name-3:7051 |
| ---- |
| |
| . Start all of the tablet servers. |
| |
| . Run a Kudu system check (ksck) on the cluster using the `kudu` command line |
| tool. See <<ksck>> for more details. After startup, some tablets may be |
| unavailable as it takes some time to initialize all of them. |
| |
| . If you have Kudu tables that are accessed from Impala, update the HMS |
| database manually in the underlying database that provides the storage for HMS. |
| |
| .. The following is an example SQL statement you should run in the HMS database: |
| + |
| [source,sql] |
| ---- |
| UPDATE TABLE_PARAMS |
| SET PARAM_VALUE = |
| 'new-master-name-1:7051,new-master-name-2:7051,new-master-name-3:7051' |
| WHERE PARAM_KEY = 'kudu.master_addresses' |
| AND PARAM_VALUE = 'master-1:7051,master-2:7051,master-3:7051'; |
| ---- |
| + |
| .. In `impala-shell`, run: |
| + |
[source,sql]
| ---- |
| INVALIDATE METADATA; |
| ---- |
| + |
| .. Verify updating the metadata worked by running a simple `SELECT` query on a |
| Kudu-backed Impala table. |
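
The gflagfile edits in steps 2 and 3 above can be scripted. A minimal sed-based
sketch, assuming one `--flag=value` entry per line; the paths and hostnames in the
example are placeholders for your deployment:

[source,bash]
----
# Hypothetical helper: rewrite a "--FLAG=..." line in a gflagfile in place.
update_flag() {
  # $1 = gflagfile path, $2 = flag name, $3 = new value
  sed -i "s|^--$2=.*|--$2=$3|" "$1"
}

# Example (placeholder path and hostnames):
# update_flag /etc/kudu/conf/master.gflagfile master_addresses \
#     new-master-name-1:7051,new-master-name-2:7051,new-master-name-3:7051
----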
| |
| [[ksck]] |
| === Checking Cluster Health with `ksck` |
| |
| The `kudu` CLI includes a tool named `ksck` that can be used for gathering |
| information about the state of a Kudu cluster, including checking its health. |
| `ksck` will identify issues such as under-replicated tablets, unreachable |
| tablet servers, or tablets without a leader. |
| |
| `ksck` should be run from the command line as the Kudu admin user, and requires |
| the full list of master addresses to be specified: |
| |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu cluster ksck master-01.example.com,master-02.example.com,master-03.example.com |
| ---- |
| |
| To see a full list of the options available with `ksck`, use the `--help` flag. |
| If the cluster is healthy, `ksck` will print information about the cluster, a |
| success message, and return a zero (success) exit status. |
| |
| ---- |
| Master Summary |
| UUID | Address | Status |
| ----------------------------------+-----------------------+--------- |
| a811c07b99394df799e6650e7310f282 | master-01.example.com | HEALTHY |
| b579355eeeea446e998606bcb7e87844 | master-02.example.com | HEALTHY |
 cfdcc8592711485fad32ec4eea4fbfcd | master-03.example.com | HEALTHY
| |
| Tablet Server Summary |
| UUID | Address | Status |
| ----------------------------------+------------------------+--------- |
| a598f75345834133a39c6e51163245db | tserver-01.example.com | HEALTHY |
| e05ca6b6573b4e1f9a518157c0c0c637 | tserver-02.example.com | HEALTHY |
| e7e53a91fe704296b3a59ad304e7444a | tserver-03.example.com | HEALTHY |
| |
| Version Summary |
| Version | Servers |
| ---------+------------------------- |
| 1.7.1 | all 6 server(s) checked |
| |
| Summary by table |
| Name | RF | Status | Total Tablets | Healthy | Recovering | Under-replicated | Unavailable |
| ----------+----+---------+---------------+---------+------------+------------------+------------- |
| my_table | 3 | HEALTHY | 8 | 8 | 0 | 0 | 0 |
| |
| | Total Count |
| ----------------+------------- |
| Masters | 3 |
| Tablet Servers | 3 |
| Tables | 1 |
| Tablets | 8 |
| Replicas | 24 |
| OK |
| ---- |
| |
| If the cluster is unhealthy, for instance if a tablet server process has |
| stopped, `ksck` will report the issue(s) and return a non-zero exit status, as |
| shown in the abbreviated snippet of `ksck` output below: |
| |
| ---- |
| Tablet Server Summary |
| UUID | Address | Status |
| ----------------------------------+------------------------+------------- |
| a598f75345834133a39c6e51163245db | tserver-01.example.com | HEALTHY |
| e05ca6b6573b4e1f9a518157c0c0c637 | tserver-02.example.com | HEALTHY |
| e7e53a91fe704296b3a59ad304e7444a | tserver-03.example.com | UNAVAILABLE |
| Error from 127.0.0.1:7150: Network error: could not get status from server: Client connection negotiation failed: client connection to 127.0.0.1:7150: connect: Connection refused (error 61) (UNAVAILABLE) |
| |
| ... (full output elided) |
| |
| ================== |
| Errors: |
| ================== |
| Network error: error fetching info from tablet servers: failed to gather info for all tablet servers: 1 of 3 had errors |
| Corruption: table consistency check error: 1 out of 1 table(s) are not healthy |
| |
| FAILED |
| Runtime error: ksck discovered errors |
| ---- |
| |
| To verify data integrity, the optional `--checksum_scan` flag can be set, which |
| will ensure the cluster has consistent data by scanning each tablet replica and |
| comparing results. The `--tables` or `--tablets` flags can be used to limit the |
| scope of the checksum scan to specific tables or tablets, respectively. For |
| example, checking data integrity on the `my_table` table can be done with the |
| following command: |
| |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu cluster ksck --checksum_scan --tables my_table master-01.example.com,master-02.example.com,master-03.example.com |
| ---- |
| |
| By default, `ksck` will attempt to use a snapshot scan of the table, so the |
| checksum scan can be done while writes continue. |
| |
| Finally, `ksck` also supports output in JSON format using the `--ksck_format` |
| flag. JSON output contains the same information as the plain text output, but |
| in a format that can be used by other tools. See `kudu cluster ksck --help` for |
| more information. |
| |
| [[change_dir_config]] |
| === Changing Directory Configurations |
| |
| For higher read parallelism and larger volumes of storage per server, users may |
| want to configure servers to store data in multiple directories on different |
| devices. Once a server is started, users must go through the following steps |
| to change the directory configuration. |
| |
| Users can add or remove data directories to an existing master or tablet server |
| via the `kudu fs update_dirs` tool. Data is striped across data directories, |
| and when a new data directory is added, new data will be striped across the |
| union of the old and new directories. |
| |
| NOTE: Unless the `--force` flag is specified, Kudu will not allow for the |
| removal of a directory across which tablets are configured to spread data. If |
| `--force` is specified, all tablets configured to use that directory will fail |
| upon starting up and be replicated elsewhere. |
| |
| NOTE: If the link:configuration.html#directory_configuration[metadata |
| directory] overlaps with a data directory, as was the default prior to Kudu |
| 1.7, or if a non-default metadata directory is configured, the |
| `--fs_metadata_dir` configuration must be specified when running the `kudu fs |
| update_dirs` tool. |
| |
| NOTE: Only new tablet replicas (i.e. brand new tablets' replicas and replicas |
| that are copied to the server for high availability) will use the new |
| directory. Existing tablet replicas on the server will not be rebalanced across |
| the new directory. |
| |
| WARNING: All of the command line steps below should be executed as the Kudu |
| UNIX user, typically `kudu`. |
| |
| . The tool can only run while the server is offline, so establish a maintenance |
| window to update the server. The tool itself runs quickly, so this offline |
| window should be brief, and as such, only the server to update needs to be |
| offline. However, if the server is offline for too long (see the |
| `follower_unavailable_considered_failed_sec` flag), the tablet replicas on it |
| may be evicted from their Raft groups. To avoid this, it may be desirable to |
| bring the entire cluster offline while performing the update. |
| |
| . Run the tool with the desired directory configuration flags. For example, if a |
| cluster was set up with `--fs_wal_dir=/wals`, `--fs_metadata_dir=/meta`, and |
| `--fs_data_dirs=/data/1,/data/2,/data/3`, and `/data/3` is to be removed (e.g. |
due to a disk error), run the command:
+
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu fs update_dirs --force --fs_wal_dir=/wals --fs_metadata_dir=/meta --fs_data_dirs=/data/1,/data/2 |
| ---- |
| + |
| |
. Modify the values of the `fs_data_dirs` flags for the updated server. If using
| CM, make sure to only update the configurations of the updated server, rather |
| than of the entire Kudu service. |
| |
| . Once complete, the server process can be started. When Kudu is installed using |
system packages, `service` is typically used:
+
| [source,bash] |
| ---- |
| $ sudo service kudu-tserver start |
| ---- |
| + |
| |
| |
| [[disk_failure_recovery]] |
| === Recovering from Disk Failure |
| Kudu nodes can only survive failures of disks on which certain Kudu directories |
| are mounted. For more information about the different Kudu directory types, see |
| the section on link:configuration.html#directory_configuration[Kudu Directory |
Configurations]. The table below describes this behavior across different Apache
Kudu releases.
| |
| [[disk_failure_behavior]] |
| .Kudu Disk Failure Behavior |
| [cols="<,<,<",options="header"] |
| |=== |
| | Node Type | Kudu Directory Type | Kudu Releases that Crash on Disk Failure |
| | Master | All | All |
| | Tablet Server | Directory containing WALs | All |
| | Tablet Server | Directory containing tablet metadata | All |
| | Tablet Server | Directory containing data blocks only | Pre-1.6.0 |
| |=== |
| |
| When a disk failure occurs that does not lead to a crash, Kudu will stop using |
| the affected directory, shut down tablets with blocks on the affected |
| directories, and automatically re-replicate the affected tablets to other |
| tablet servers. The affected server will remain alive and print messages to the |
| log indicating the disk failure, for example: |
| |
| ---- |
| E1205 19:06:24.163748 27115 data_dirs.cc:1011] Directory /data/8/kudu/data marked as failed |
| E1205 19:06:30.324795 27064 log_block_manager.cc:1822] Not using report from /data/8/kudu/data: IO error: Could not open container 0a6283cab82d4e75848f49772d2638fe: /data/8/kudu/data/0a6283cab82d4e75848f49772d2638fe.metadata: Read-only file system (error 30) |
| E1205 19:06:33.564638 27220 ts_tablet_manager.cc:946] T 4957808439314e0d97795c1394348d80 P 70f7ee61ead54b1885d819f354eb3405: aborting tablet bootstrap: tablet has data in a failed directory |
| ---- |
| |
| While in this state, the affected node will avoid using the failed disk, |
| leading to lower storage volume and reduced read parallelism. The administrator |
| should schedule a brief window to <<change_dir_config,update the node's |
| directory configuration>> to exclude the failed disk. |
| |
| [[disk_full_recovery]] |
| === Recovering from Full Disks |
| By default, Kudu reserves a small amount of space (1% by capacity) in its |
| directories; Kudu considers a disk full if there is less free space available |
| than the reservation. Kudu nodes can only tolerate running out of space on disks |
| on which certain Kudu directories are mounted. For more information about the |
| different Kudu directory types, see |
| link:configuration.html#directory_configuration[Kudu Directory Configurations]. |
| The table below describes this behavior for each type of directory. The behavior |
| is uniform across masters and tablet servers. |
| [[disk_full_behavior]] |
| .Kudu Full Disk Behavior |
| [options="header"] |
| |=== |
| | Kudu Directory Type | Crash on a Full Disk? |
| | Directory containing WALs | Yes |
| | Directory containing tablet metadata | Yes |
| | Directory containing data blocks only | No (see below) |
| |=== |
| |
Prior to Kudu 1.7.0, Kudu striped tablet data across all directories and
avoided writing data to full directories, crashing only once all data
directories were full.
| |
In 1.7.0 and later, new tablets are assigned a disk group consisting of
`--fs_target_data_dirs_per_tablet` data directories (default 3). If Kudu is not configured
| with enough data directories for a full disk group, all data directories are |
| used. When a data directory is full, Kudu will stop writing new data to it and |
| each tablet that uses that data directory will write new data to other data |
| directories within its group. If all data directories for a tablet are full, Kudu |
| will crash. Periodically, Kudu will check if full data directories are still |
| full, and will resume writing to those data directories if space has become |
| available. |
| |
| If Kudu does crash because its data directories are full, freeing space on the |
| full directories will allow the affected daemon to restart and resume writing. |
| Note that it may be possible for Kudu to free some space by running |
| |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu fs check --repair |
| ---- |
| |
| but this command may also fail if there is too little space left. |
| |
| It's also possible to allocate additional data directories to Kudu in order to |
| increase the overall amount of storage available. See the documentation on |
| <<change_dir_config,updating a node's directory configuration>> for more |
| information. Note that existing tablets will not use new data directories, so |
| adding a new data directory does not resolve issues with full disks. |
| |
| [[tablet_majority_down_recovery]] |
| === Bringing a tablet that has lost a majority of replicas back online |
| |
| If a tablet has permanently lost a majority of its replicas, it cannot recover |
| automatically and operator intervention is required. If the tablet servers |
| hosting a majority of the replicas are down (i.e. ones reported as "TS |
| unavailable" by ksck), they should be recovered instead if possible. |
| |
| WARNING: The steps below may cause recent edits to the tablet to be lost, |
| potentially resulting in permanent data loss. Only attempt the procedure below |
| if it is impossible to bring a majority back online. |
| |
| Suppose a tablet has lost a majority of its replicas. The first step in |
| diagnosing and fixing the problem is to examine the tablet's state using ksck: |
| |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu cluster ksck --tablets=e822cab6c0584bc0858219d1539a17e6 master-00,master-01,master-02 |
| Connected to the Master |
| Fetched info from all 5 Tablet Servers |
| Tablet e822cab6c0584bc0858219d1539a17e6 of table 'my_table' is unavailable: 2 replica(s) not RUNNING |
| 638a20403e3e4ae3b55d4d07d920e6de (tserver-00:7150): RUNNING |
| 9a56fa85a38a4edc99c6229cba68aeaa (tserver-01:7150): bad state |
| State: FAILED |
| Data state: TABLET_DATA_READY |
| Last status: <failure message> |
| c311fef7708a4cf9bb11a3e4cbcaab8c (tserver-02:7150): bad state |
| State: FAILED |
| Data state: TABLET_DATA_READY |
| Last status: <failure message> |
| ---- |
| |
| This output shows that, for tablet `e822cab6c0584bc0858219d1539a17e6`, the two |
| tablet replicas on `tserver-01` and `tserver-02` failed. The remaining replica |
| is not the leader, so the leader replica failed as well. This means the chance |
| of data loss is higher since the remaining replica on `tserver-00` may have |
| been lagging. In general, to accept the potential data loss and restore the |
| tablet from the remaining replicas, divide the tablet replicas into two groups: |
| |
| 1. Healthy replicas: Those in `RUNNING` state as reported by ksck |
| 2. Unhealthy replicas |
| |
| For example, in the above ksck output, the replica on tablet server `tserver-00` |
| is healthy, while the replicas on `tserver-01` and `tserver-02` are unhealthy. |
| On each tablet server with a healthy replica, alter the consensus configuration |
| to remove unhealthy replicas. In the typical case of 1 out of 3 surviving |
| replicas, there will be only one healthy replica, so the consensus configuration |
| will be rewritten to include only the healthy replica. |
| |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu remote_replica unsafe_change_config tserver-00:7150 <tablet-id> <tserver-00-uuid> |
| ---- |
| |
| where `<tablet-id>` is `e822cab6c0584bc0858219d1539a17e6` and |
| `<tserver-00-uuid>` is the uuid of `tserver-00`, |
| `638a20403e3e4ae3b55d4d07d920e6de`. |
| |
| Once the healthy replicas' consensus configurations have been forced to exclude |
| the unhealthy replicas, the healthy replicas will be able to elect a leader. |
| The tablet will become available for writes, though it will still be |
| under-replicated. Shortly after the tablet becomes available, the leader master |
| will notice that it is under-replicated, and will cause the tablet to |
| re-replicate until the proper replication factor is restored. The unhealthy |
| replicas will be tombstoned by the master, causing their remaining data to be |
| deleted. |
| |
| [[rebuilding_kudu]] |
| === Rebuilding a Kudu Filesystem Layout |
| |
| In the event that critical files are lost, i.e. WALs or tablet-specific |
| metadata, all Kudu directories on the server must be deleted and rebuilt to |
| ensure correctness. Doing so will destroy the copy of the data for each tablet |
| replica hosted on the local server. Kudu will automatically re-replicate tablet |
| replicas removed in this way, provided the replication factor is at least three |
| and all other servers are online and healthy. |
| |
| NOTE: These steps use a tablet server as an example, but the steps are the same |
| for Kudu master servers. |
| |
| WARNING: If multiple nodes need their FS layouts rebuilt, wait until all |
| replicas previously hosted on each node have finished automatically |
| re-replicating elsewhere before continuing. Failure to do so can result in |
| permanent data loss. |
| |
| WARNING: Before proceeding, ensure the contents of the directories are backed |
| up, either as a copy or in the form of other tablet replicas. |
| |
| . The first step to rebuilding a server with a new directory configuration is |
| emptying all of the server's existing directories. For example, if a tablet |
| server is configured with `--fs_wal_dir=/data/0/kudu-tserver-wal`, |
| `--fs_metadata_dir=/data/0/kudu-tserver-meta`, and |
| `--fs_data_dirs=/data/1/kudu-tserver,/data/2/kudu-tserver`, the following |
commands will remove the WAL directory's and data directories' contents:
+
| [source,bash] |
| ---- |
| # Note: this will delete all of the data from the local tablet server. |
| $ rm -rf /data/0/kudu-tserver-wal/* /data/0/kudu-tserver-meta/* /data/1/kudu-tserver/* /data/2/kudu-tserver/* |
| ---- |
| + |
| |
| . If using CM, update the configurations for the rebuilt server to include only |
| the desired directories. Make sure to only update the configurations of servers |
| to which changes were applied, rather than of the entire Kudu service. |
| |
| . After directories are deleted, the server process can be started with the new |
| directory configuration. The appropriate sub-directories will be created by |
| Kudu upon starting up. |
| |
| [[physical_backup]] |
| === Physical backups of an entire node |
| |
| As documented in the link:known_issues.html#_replication_and_backup_limitations[Known Issues and Limitations], |
| Kudu does not yet provide any built-in backup and restore functionality. However, |
| it is possible to create a physical backup of a Kudu node (either tablet server |
| or master) and restore it later. |
| |
| WARNING: The node to be backed up must be offline during the procedure, or else |
| the backed up (or restored) data will be inconsistent. |
| |
| WARNING: Certain aspects of the Kudu node (such as its hostname) are embedded in |
| the on-disk data. As such, it's not yet possible to restore a physical backup of |
| a node onto another machine. |
| |
| . Stop all Kudu processes in the cluster. This prevents the tablets on the |
| backed up node from being rereplicated elsewhere unnecessarily. |
| |
| . If creating a backup, make a copy of the WAL, metadata, and data directories |
| on each node to be backed up. It is important that this copy preserve all file |
| attributes as well as sparseness. |
| |
| . If restoring from a backup, delete the existing WAL, metadata, and data |
| directories, then restore the backup via move or copy. As with creating a |
| backup, it is important that the restore preserve all file attributes and |
| sparseness. |
| |
| . Start all Kudu processes in the cluster. |
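
The copy in step 2 can be done with GNU `cp` (`rsync -aS` is an alternative). A
minimal sketch, with placeholder paths, that preserves file attributes and
sparseness:

[source,bash]
----
# Hypothetical helper: copy a Kudu directory preserving attributes (-a)
# and sparse regions (--sparse=always). GNU coreutils is assumed.
backup_dir() {
  # $1 = source directory, $2 = destination directory
  mkdir -p "$2"
  cp -a --sparse=always "$1"/. "$2"/
}

# Example (placeholder paths):
# backup_dir /data/0/kudu-tserver-wal /backup/kudu-tserver-wal
----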
| |
| [[minimizing_cluster_disruption_during_temporary_single_ts_downtime]] |
| === Minimizing cluster disruption during temporary planned downtime of a single tablet server |
| |
| If a single tablet server is brought down temporarily in a healthy cluster, all |
| tablets will remain available and clients will function as normal, after |
| potential short delays due to leader elections. However, if the downtime lasts |
| for more than `--follower_unavailable_considered_failed_sec` (default 300) |
| seconds, the tablet replicas on the down tablet server will be replaced by new |
replicas on available tablet servers. This will cause stress on the cluster
as tablets re-replicate and, if the downtime lasts long enough, a significant
reduction in the number of replicas on the down tablet server, which may
require the rebalancer to fix.
| |
| To work around this, increase `--follower_unavailable_considered_failed_sec` on |
| all tablet servers so the amount of time before re-replication will start is |
| longer than the expected downtime of the tablet server, including the time it |
| takes the tablet server to restart and bootstrap its tablet replicas. To do |
| this, run the following command for each tablet server: |
| |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu tserver set_flag <tserver_address> follower_unavailable_considered_failed_sec <num_seconds> |
| ---- |
| |
| where `<num_seconds>` is the number of seconds that will encompass the downtime. |
| Once the downtime is finished, reset the flag to its original value. |
| |
| ---- |
| $ sudo -u kudu kudu tserver set_flag <tserver_address> follower_unavailable_considered_failed_sec <original_value> |
| ---- |
| |
| WARNING: Be sure to reset the value of `--follower_unavailable_considered_failed_sec` |
| to its original value. |
| |
| NOTE: On Kudu versions prior to 1.8, the `--force` flag must be provided in the above |
| commands. |
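
Applying the change across a cluster is a simple loop. A sketch with placeholder
addresses; the helper just runs the command shown above once per server, and
assumes passwordless `sudo` as the `kudu` user:

[source,bash]
----
# Hypothetical helper: run "kudu tserver set_flag" against each address.
set_flag_on_all() {
  # $1 = flag name, $2 = value; remaining args = tserver addresses
  local flag=$1 value=$2; shift 2
  for ts in "$@"; do
    sudo -u kudu kudu tserver set_flag "$ts" "$flag" "$value"
  done
}

# Example (placeholder addresses):
# set_flag_on_all follower_unavailable_considered_failed_sec 7200 \
#     tserver-01.example.com:7050 tserver-02.example.com:7050
----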
| |
| [[rebalancer_tool]] |
| === Running the tablet rebalancing tool |
| |
| The `kudu` CLI contains a rebalancing tool that can be used to rebalance |
| tablet replicas among tablet servers. For each table, the tool attempts to |
| balance the number of replicas per tablet server. It will also, without |
| unbalancing any table, attempt to even out the number of replicas per tablet |
| server across the cluster as a whole. The rebalancing tool should be run as the |
| Kudu admin user, specifying all master addresses: |
| |
| [source,bash] |
| ---- |
$ sudo -u kudu kudu cluster rebalance master-01.example.com,master-02.example.com,master-03.example.com
| ---- |
| |
| When run, the rebalancer will report on the initial tablet replica distribution |
| in the cluster, log the replicas it moves, and print a final summary of the |
| distribution when it terminates: |
| |
| ---- |
| Per-server replica distribution summary: |
| Statistic | Value |
| -----------------------+----------- |
| Minimum Replica Count | 0 |
| Maximum Replica Count | 24 |
| Average Replica Count | 14.400000 |
| |
| Per-table replica distribution summary: |
| Replica Skew | Value |
| --------------+---------- |
| Minimum | 8 |
| Maximum | 8 |
| Average | 8.000000 |
| |
| I0613 14:18:49.905897 3002065792 rebalancer.cc:779] tablet e7ee9ade95b342a7a94649b7862b345d: 206a51de1486402bbb214b5ce97a633c -> 3b4d9266ac8c45ff9a5d4d7c3e1cb326 move scheduled |
| I0613 14:18:49.917578 3002065792 rebalancer.cc:779] tablet 5f03944529f44626a0d6ec8b1edc566e: 6e64c4165b864cbab0e67ccd82091d60 -> ba8c22ab030346b4baa289d6d11d0809 move scheduled |
| I0613 14:18:49.928683 3002065792 rebalancer.cc:779] tablet 9373fee3bfe74cec9054737371a3b15d: fab382adf72c480984c6cc868fdd5f0e -> 3b4d9266ac8c45ff9a5d4d7c3e1cb326 move scheduled |
| |
| ... (full output elided) |
| |
| I0613 14:19:01.162802 3002065792 rebalancer.cc:842] tablet f4c046f18b174cc2974c65ac0bf52767: 206a51de1486402bbb214b5ce97a633c -> 3b4d9266ac8c45ff9a5d4d7c3e1cb326 move completed: OK |
| |
| rebalancing is complete: cluster is balanced (moved 28 replicas) |
| Per-server replica distribution summary: |
| Statistic | Value |
| -----------------------+----------- |
| Minimum Replica Count | 14 |
| Maximum Replica Count | 15 |
| Average Replica Count | 14.400000 |
| |
| Per-table replica distribution summary: |
| Replica Skew | Value |
| --------------+---------- |
| Minimum | 1 |
| Maximum | 1 |
| Average | 1.000000 |
| ---- |
| |
| If more details are needed in addition to the replica distribution summary, |
| use the `--output_replica_distribution_details` flag. If added, the flag makes |
| the tool print per-table and per-tablet server replica distribution statistics |
| as well. |
| |
| Use the `--report_only` flag to get a report on table- and cluster-wide |
| replica distribution statistics without starting any rebalancing activity. |
| |
| The rebalancer can also be restricted to run on a subset of the tables by |
| supplying the `--tables` flag. Note that, when running on a subset of tables, |
| the tool will not attempt to balance the cluster as a whole. |
| |
| The length of time rebalancing is run for can be controlled with the flag |
| `--max_run_time_sec`. By default, the rebalancer will run until the cluster is |
| balanced. To control the amount of resources devoted to rebalancing, modify |
| the flag `--max_moves_per_server`. See `kudu cluster rebalance --help` for more. |
| |
| The rebalancing tool can rebalance Kudu clusters running older versions as well, |
| with some restrictions. Consult the following table for more information. In the |
| table, "RF" stands for "replication factor". |
| |
| [[rebalancer_compatibility]] |
| .Kudu Rebalancing Tool Compatibility |
| [options="header"] |
| |=== |
| | Version Range | Rebalances RF = 1 Tables? | Rebalances RF > 1 Tables? |
| | v < 1.4.0 | No | No |
| | 1.4.0 +<=+ v < 1.7.1 | No | Yes |
| | v >= 1.7.1 | Yes | Yes |
| |=== |
| |
| If the rebalancer is running against a cluster where rebalancing replication |
| factor one tables is not supported, it will rebalance all the other tables |
| and the cluster as if those singly-replicated tables did not exist. |
| |
| [[tablet_server_decommissioning]] |
| === Decommissioning or Permanently Removing a Tablet Server From a Cluster |
| |
| Kudu does not currently have an automated way to remove a tablet server from |
| a cluster permanently. Instead, use the following steps: |
| |
| . Ensure the cluster is in good health using `ksck`. See <<ksck>>. |
| . If the tablet server contains any replicas of tables with replication factor |
| 1, these replicas must be manually moved off the tablet server prior to |
| shutting it down. The `kudu tablet change_config move_replica` tool can be |
| used for this. |
| . Shut down the tablet server. After |
`--follower_unavailable_considered_failed_sec`, which defaults to 5 minutes,
| Kudu will begin to re-replicate the tablet server's replicas to other servers. |
| Wait until the process is finished. Progress can be monitored using `ksck`. |
| . Once all the copies are complete, `ksck` will continue to report the tablet |
| server as unavailable. The cluster will otherwise operate fine without the |
| tablet server. To completely remove it from the cluster so `ksck` shows the |
| cluster as completely healthy, restart the masters. In the case of a single |
| master, this will cause cluster downtime. With multimaster, restart the |
| masters in sequence to avoid cluster downtime. |
| |
| WARNING: Do not shut down multiple tablet servers at once. To remove multiple |
| tablet servers from the cluster, follow the above instructions for each tablet |
| server, ensuring that the previous tablet server is removed from the cluster and |
| `ksck` is healthy before shutting down the next. |