| // Licensed to the Apache Software Foundation (ASF) under one |
| // or more contributor license agreements. See the NOTICE file |
| // distributed with this work for additional information |
| // regarding copyright ownership. The ASF licenses this file |
| // to you under the Apache License, Version 2.0 (the |
| // "License"); you may not use this file except in compliance |
| // with the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, |
| // software distributed under the License is distributed on an |
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| // KIND, either express or implied. See the License for the |
| // specific language governing permissions and limitations |
| // under the License. |
| |
| [[administration]] |
| = Apache Kudu Administration |
| |
| :author: Kudu Team |
| :imagesdir: ./images |
| :icons: font |
| :toc: left |
| :toclevels: 3 |
| :doctype: book |
| :backend: html5 |
| :sectlinks: |
| :experimental: |
| |
| == Starting and Stopping Kudu Processes |
| |
| NOTE: These instructions are relevant only when Kudu is installed using operating system packages |
| (e.g. `rpm` or `deb`). |
| |
| include::installation.adoc[tags=start_stop] |
| |
| == Kudu Web Interfaces |
| |
Kudu tablet servers and masters expose useful operational information on a built-in web interface.
| |
| === Kudu Master Web Interface |
| |
| Kudu master processes serve their web interface on port 8051. The interface exposes several pages |
| with information about the cluster state: |
| |
| - A list of tablet servers, their host names, and the time of their last heartbeat. |
| - A list of tables, including schema and tablet location information for each. |
| - SQL code which you can paste into Impala Shell to add an existing table to Impala's list of known data sources. |
| |
| === Kudu Tablet Server Web Interface |
| |
| Each tablet server serves a web interface on port 8050. The interface exposes information |
| about each tablet hosted on the server, its current state, and debugging information |
| about maintenance background operations. |
| |
| === Common Web Interface Pages |
| |
| Both Kudu masters and tablet servers expose a common set of information via their web interfaces: |
| |
| - HTTP access to server logs. |
| - an `/rpcz` endpoint which lists currently running RPCs via JSON. |
| - pages giving an overview and detailed information on the memory usage of different |
| components of the process. |
| - information on the current set of configuration flags. |
| - information on the currently running threads and their resource consumption. |
| - a JSON endpoint exposing metrics about the server. |
| - information on the deployed version number of the daemon. |
| |
| These interfaces are linked from the landing page of each daemon's web UI. |
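
For example, the list of currently running RPCs can be fetched from a tablet server's `/rpcz`
endpoint with any HTTP client (the host name is a placeholder, matching the metrics examples
later in this document):

[source,bash]
----
$ curl -s 'http://example-ts:8050/rpcz'
----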
| |
| == Kudu Metrics |
| |
| Kudu daemons expose a large number of metrics. Some metrics are associated with an entire |
| server process, whereas others are associated with a particular tablet replica. |
| |
| === Listing available metrics |
| |
| The full set of available metrics for a Kudu server can be dumped via a special command |
| line flag: |
| |
| [source,bash] |
| ---- |
| $ kudu-tserver --dump_metrics_json |
| $ kudu-master --dump_metrics_json |
| ---- |
| |
| This will output a large JSON document. Each metric indicates its name, label, description, |
| units, and type. Because the output is JSON-formatted, this information can easily be |
| parsed and fed into other tooling which collects metrics from Kudu servers. |
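
As a sketch, assuming the `jq` utility is available, the metric names could be skimmed from
the dump as follows (the exact nesting of the dumped JSON may vary between versions, so treat
this as illustrative):

[source,bash]
----
$ kudu-tserver --dump_metrics_json | jq -r '.. | .name? // empty' | sort -u
----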
| |
| === Collecting metrics via HTTP |
| |
| Metrics can be collected from a server process via its HTTP interface by visiting |
| `/metrics`. The output of this page is JSON for easy parsing by monitoring services. |
| This endpoint accepts several `GET` parameters in its query string: |
| |
| - `/metrics?metrics=<substring1>,<substring2>,...` - limits the returned metrics to those which contain |
| at least one of the provided substrings. The substrings also match entity names, so this |
| may be used to collect metrics for a specific tablet. |
| |
| - `/metrics?include_schema=1` - includes metrics schema information such as unit, description, |
| and label in the JSON output. This information is typically elided to save space. |
| |
| - `/metrics?compact=1` - eliminates unnecessary whitespace from the resulting JSON, which can decrease |
| bandwidth when fetching this page from a remote host. |
| |
- `/metrics?include_raw_histograms=1` - includes the raw buckets and values for histogram metrics,
| enabling accurate aggregation of percentile metrics over time and across hosts. |
| |
| - `/metrics?level=info` - limits the returned metrics based on their severity level. |
The levels are ordered: requesting a given level returns metrics at that level and at all
more severe levels. If no level is specified, `debug` is used, which includes all metrics. The valid values are:
| * `debug` - Metrics that are diagnostically helpful but generally not monitored |
| during normal operation. |
| * `info` - Generally useful metrics that operators always want to have available |
| but may not be monitored under normal circumstances. |
| * `warn` - Metrics which can often indicate operational oddities that may need |
| more investigation. |
| |
| For example: |
| |
| [source,bash] |
| ---- |
| $ curl -s 'http://example-ts:8050/metrics?include_schema=1&metrics=connections_accepted' |
| ---- |
| |
NOTE: See the link:metrics_reference.html[metrics reference]
| page for more information on the available metrics. |
| |
| [source,json] |
| ---- |
| [ |
| { |
| "type": "server", |
| "id": "kudu.tabletserver", |
| "attributes": {}, |
| "metrics": [ |
| { |
| "name": "rpc_connections_accepted", |
| "label": "RPC Connections Accepted", |
| "type": "counter", |
| "unit": "connections", |
| "description": "Number of incoming TCP connections made to the RPC server", |
| "value": 92 |
| } |
| ] |
| } |
| ] |
| ---- |
| |
| [source,bash] |
| ---- |
| $ curl -s 'http://example-ts:8050/metrics?metrics=log_append_latency' |
| ---- |
| |
| [source,json] |
| ---- |
| [ |
| { |
| "type": "tablet", |
| "id": "c0ebf9fef1b847e2a83c7bd35c2056b1", |
| "attributes": { |
| "table_name": "lineitem", |
| "partition": "hash buckets: (55), range: [(<start>), (<end>))", |
| "table_id": "" |
| }, |
| "metrics": [ |
| { |
| "name": "log_append_latency", |
| "total_count": 7498, |
| "min": 4, |
| "mean": 69.3649, |
| "percentile_75": 29, |
| "percentile_95": 38, |
| "percentile_99": 45, |
| "percentile_99_9": 95, |
| "percentile_99_99": 167, |
| "max": 367244, |
| "total_sum": 520098 |
| } |
| ] |
| } |
| ] |
| ---- |
| |
| NOTE: All histograms and counters are measured since the server start time, and are not reset upon collection. |
| |
| === Diagnostics Logging |
| |
Kudu may be configured to dump various diagnostic information to a local log file.
The diagnostics log is written to the same directory as the other Kudu log files, with a
similar naming format, substituting `diagnostics` for a log level like `INFO`.
After any diagnostics log file reaches 64MB uncompressed, the log is rolled and
the previous file is gzip-compressed.
| |
| Each line in the diagnostics log consists of the following components: |
| |
| * A human-readable timestamp formatted in the same fashion as the other Kudu log files. |
| * The type of record. For example, a metrics record consists of the word `metrics`. |
| * A machine-readable timestamp, in microseconds since the Unix epoch. |
| * The record itself. |
| |
| Currently, the only type of diagnostics record is a periodic dump of the server metrics. |
| Each record is encoded in compact JSON format, and the server attempts to elide any metrics |
| which have not changed since the previous record. In addition, counters which have never |
| been incremented are elided. Otherwise, the format of the JSON record is identical to the |
| format exposed by the HTTP endpoint above. |
| |
| The frequency with which metrics are dumped to the diagnostics log is configured using the |
| `--metrics_log_interval_ms` flag. By default, Kudu logs metrics every 60 seconds. |
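
For example, to dump metrics every 30 seconds instead (an illustrative value), the flag could
be added to the server's gflagfile:

[source,bash]
----
--metrics_log_interval_ms=30000
----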
| |
| [[rack_awareness]] |
| == Rack Awareness |
| |
| As of version 1.9, Kudu supports a rack awareness feature. Kudu's ordinary |
| re-replication methods ensure the availability of the cluster in the event of a |
| single node failure. However, clusters can be vulnerable to correlated failures |
| of multiple nodes. For example, all of the physical hosts on the same rack in |
| a datacenter may become unavailable simultaneously if the top-of-rack switch |
| fails. Kudu's rack awareness feature provides protection from some kinds of |
| correlated failures, like the failure of a whole rack in a datacenter. Rack |
| awareness increases the availability of a Kudu cluster if there are at least |
| three different _locations_ defined in the cluster. |
| |
| The first element of Kudu's rack awareness feature is _location assignment_. |
| When a tablet server or client registers with a master, the master assigns it a |
| _location_. A location is a `/`-separated string that begins with a `/` and |
| where each `/`-separated component consists of characters from the set |
| `[a-zA-Z0-9_-.]`. For example, `/dc-0/rack-09` is a valid location, while |
| `rack-04` and `/rack=1` are not valid locations. Thus location strings resemble |
| absolute UNIX file paths where characters in directory and file names are |
| restricted to the set `[a-zA-Z0-9_-.]`. Presently, Kudu does not use the |
| hierarchical structure of locations, but it may in the future. Location |
| assignment is done by a user-provided command, whose path should be specified |
| using the `--location_mapping_cmd` master flag. The command should take a single |
| argument, the IP address or hostname of a tablet server or client, and return |
| the location for the tablet server or client. Make sure that all Kudu masters |
| are using the same location mapping command. |
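
A minimal sketch of such a command, written as a shell script with a hypothetical static
mapping (the host names and locations below are purely illustrative):

[source,bash]
----
#!/bin/bash
# Hypothetical location mapping script for --location_mapping_cmd.
# Takes one argument, the IP address or hostname of a tablet server or
# client, and prints that host's location.
case "$1" in
  tserver-0[1-4].example.com) echo "/dc-0/rack-01" ;;
  tserver-0[5-8].example.com) echo "/dc-0/rack-02" ;;
  *)                          echo "/dc-0/rack-03" ;;
esac
----

The script would then be referenced on every master, for example with
`--location_mapping_cmd=/path/to/location_mapping.sh` (the path is illustrative).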
| |
The second element of Kudu's rack awareness feature is the _placement policy_:
do not place a majority of a tablet's replicas on tablet servers in the same location.
| |
| The leader master, when placing newly created replicas on tablet servers and |
| when re-replicating existing tablets, will attempt to place the replicas in a |
| way that complies with the placement policy. For example, in a cluster with five |
| tablet servers `A`, `B`, `C`, `D`, and `E`, with respective locations `/L0`, |
| `/L0`, `/L1`, `/L1`, `/L2`, to comply with the placement policy a new 3x |
| replicated tablet could have its replicas placed on `A`, `C`, and `E`, but not |
| on `A`, `B`, and `C`, because then the tablet would have 2/3 replicas in |
| location `/L0`. As another example, if a tablet has replicas on tablet servers |
| `A`, `C`, and `E`, and then `C` fails, the replacement replica must be placed on |
| `D` in order to comply with the placement policy. |
| |
It's necessary to have at least three locations defined in a Kudu cluster to
improve its high availability with the location awareness feature. If only one
or two locations are defined in a Kudu cluster, any tablet will inevitably have
a majority of its replicas placed in a single location.
| |
| In the case where it is impossible to place replicas in a way that complies with |
| the placement policy, Kudu will violate the policy and place a replica anyway. |
| For example, using the setup described in the previous paragraph, if a tablet |
| has replicas on tablet servers `A`, `C`, and `E`, and then `E` fails, Kudu will |
| re-replicate the tablet onto one of `B` or `D`, violating the placement policy, |
| rather than leaving the tablet under-replicated indefinitely. The |
| `kudu cluster rebalance` tool can reestablish the placement policy if it is |
| possible to do so. The `kudu cluster rebalance` tool can also be used to |
| establish the placement policy on a cluster if the cluster has just been |
| configured to use the rack awareness feature and existing replicas need to be |
| moved to comply with the placement policy. See |
| <<rebalancer_tool_with_rack_awareness,running the tablet rebalancing tool on a rack-aware cluster>> |
| for more information. |
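
For example, the rebalancer could be invoked as follows after enabling rack awareness on an
existing cluster (master host names are placeholders):

[source,bash]
----
$ sudo -u kudu kudu cluster rebalance master-1:7051,master-2:7051,master-3:7051
----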
| |
| The third and final element of Kudu's rack awareness feature is the _use of |
| client locations to find "nearby" servers_. As mentioned, the masters also |
| assign a location to clients when they connect to the cluster. The client |
| (whether Java, {cpp}, or Python) uses its own location and the locations of |
| tablet servers in the cluster to prefer "nearby" replicas when scanning in |
| `CLOSEST_REPLICA` mode. Clients choose replicas to scan in the following order: |
| |
| . Scan a replica on a tablet server on the same host, if there is one. |
| . Scan a replica on a tablet server in the same location, if there is one. |
| . Scan some replica. |
| |
| For example, using the cluster setup described above, if a client on the same |
| host as tablet server `A` scans a tablet with replicas on tablet servers |
| `A`, `C`, and `E` in `CLOSEST_REPLICA` mode, it will choose to scan from the |
| replica on `A`, since the client and the replica on `A` are on the same host. |
| If the client scans a tablet with replicas on tablet servers `B`, `C`, and `E`, |
| it will choose to scan from the replica on `B`, since it is in the same |
| location as the client, `/L0`. If there are multiple replicas meeting a |
| criterion, one is chosen arbitrarily. |
| |
| [[backup]] |
| == Backup and Restore |
| |
| [[logical_backup]] |
| === Logical backup and restore |
| |
As of Kudu 1.10.0, Kudu supports both full and incremental table backups via a
job implemented using Apache Spark. Additionally, it supports restoring tables
from full and incremental backups via a restore job, also implemented using Apache Spark.
| |
Given that the Kudu backup and restore jobs use Apache Spark, ensure Apache Spark
is installed in your environment by following the
| link:https://spark.apache.org/docs/latest/#downloading[Spark documentation]. |
| Additionally review the Apache Spark documentation for |
| link:https://spark.apache.org/docs/latest/submitting-applications.html[Submitting Applications]. |
| |
| ==== Backing up tables |
| |
To back up one or more Kudu tables, the `KuduBackup` Spark job can be used.
The first time the job is run for a table, a full backup is taken.
Subsequent runs perform incremental backups, which contain only the
rows that have changed since the previous backup. A new set of full
backups can be forced at any time by passing the `--forceFull` flag to the
backup job.
| |
| The common flags that will be used when taking a backup are: |
| |
| * `--rootPath`: The root path to output backup data. Accepts any Spark-compatible path. |
| ** See <<backup_directory>> for the directory structure used in the `rootPath`. |
* `--kuduMasterAddresses`: Comma-separated addresses of Kudu masters. Default: `localhost`
| * `<table>...`: A list of tables to be backed up. |
| |
Note: You can see the full list of job options at any time by passing the `--help` flag.
| |
Below is a full example of a `KuduBackup` job execution which backs up the tables
`foo` and `bar` to the HDFS directory `kudu-backups`:
| |
| [source,bash] |
| ---- |
| spark-submit --class org.apache.kudu.backup.KuduBackup kudu-backup2_2.11-1.10.0.jar \ |
--kuduMasterAddresses master-1-host,master-2-host,master-3-host \
| --rootPath hdfs:///kudu-backups \ |
| foo bar |
| ---- |
| |
| ==== Restoring tables from Backups |
| |
| To restore one or more Kudu tables, the `KuduRestore` Spark job can be used. |
| For each backed up table, the `KuduRestore` job will restore the full backup |
| and each associated incremental backup until the full table state is restored. |
| Restoring the full series of full and incremental backups is possible because |
| the backups are linked via the `from_ms` and `to_ms` fields in the backup metadata. |
| By default the restore job will create tables with the same name as the table |
| that was backed up. If you want to side-load the tables without affecting the |
| existing tables, you can pass `--tableSuffix` to append a suffix to each |
| restored table. |
| |
| The common flags that will be used when restoring are: |
| |
| * `--rootPath`: The root path to the backup data. Accepts any Spark-compatible path. |
| ** See <<backup_directory>> for the directory structure used in the `rootPath`. |
| * `--kuduMasterAddresses`: Comma-separated addresses of Kudu masters. Default: `localhost` |
| * `--createTables`: If set to `true`, the restore process creates the tables. |
| Set to `false` if the target tables already exist. Default: `true`. |
| * `--tableSuffix`: If set, the suffix to add to the restored table names. |
| Only used when `createTables` is `true`. |
| * `--timestampMs`: A UNIX timestamp in milliseconds that defines the latest time |
| to use when selecting restore candidates. Default: `System.currentTimeMillis()` |
| * `<table>...`: A list of tables to restore. |
| |
Note: You can see the full list of job options at any time by passing the `--help` flag.
| |
| Below is a full example of a `KuduRestore` job execution which will restore the tables |
| `foo` and `bar` from the HDFS directory `kudu-backups`: |
| |
| [source,bash] |
| ---- |
| spark-submit --class org.apache.kudu.backup.KuduRestore kudu-backup2_2.11-1.10.0.jar \ |
--kuduMasterAddresses master-1-host,master-2-host,master-3-host \
| --rootPath hdfs:///kudu-backups \ |
| foo bar |
| ---- |
| |
| ==== Backup tools |
| |
| An additional `backup-tools` jar is available to provide some backup exploration and |
| garbage collection capabilities. This jar does not use Spark directly, but instead |
| only requires the Hadoop classpath to run. |
| |
| Commands: |
| |
* `list`: Lists the backups in the `rootPath`.
* `clean`: Cleans up old backup data in the `rootPath`.
| |
Note: You can see the full list of command options at any time by passing the `--help` flag.
| |
| Below is an example execution which will print the command options: |
| |
| [source,bash] |
| ---- |
| java -cp $(hadoop classpath):kudu-backup-tools-1.10.0.jar org.apache.kudu.backup.KuduBackupCLI --help |
| ---- |
| |
| [[backup_directory]] |
| ==== Backup Directory Structure |
| |
| The backup directory structure in the `rootPath` is considered an internal detail |
| and could change in future versions of Kudu. Additionally the format and content |
| of the data and metadata files is meant for the backup and restore process only |
| and could change in future versions of Kudu. That said, understanding the structure |
| of the backup `rootPath` and how it is used can be useful when working with Kudu backups. |
| |
| The backup directory structure in the `rootPath` is as follows: |
| |
| [source,bash] |
| ---- |
| /<rootPath>/<tableId>-<tableName>/<backup-id>/ |
| .kudu-metadata.json |
| part-*.<format> |
| ---- |
| |
| * `rootPath`: Can be used to distinguish separate backup groups, jobs, or concerns. |
| * `tableId`: The unique internal ID of the table being backed up. |
| * `tableName`: The name of the table being backed up. |
| ** Note: Table names are URL encoded to prevent pathing issues. |
| * `backup-id`: A way to uniquely identify/group the data for a single backup run. |
| * `.kudu-metadata.json`: Contains all of the metadata to support recreating the table, |
| linking backups by time, and handling data format changes. |
| ** Written last so that failed backups will not have a metadata file and will not be |
| considered at restore time or backup linking time. |
* `part-*.<format>`: The data files containing the table's data.
** Currently there is one part file per Kudu partition.
** Incremental backups contain an additional `RowAction` byte column at the end.
** Currently the only supported format/suffix is `parquet`.
| |
| ==== Troubleshooting |
| |
| ===== Generating a table list |
| |
To generate a list of tables to back up, the `kudu table list` tool can be used
along with `grep`. Below is an example that generates a list
of all tables that start with `my_db.`:
| |
| [source,bash] |
| ---- |
kudu table list <master_addresses> | grep "^my_db\." | tr '\n' ' '
| ---- |
| |
*Note*: This list could be saved as part of your backup process to be used
at restore time as well.
| |
| ===== Spark Tuning |
| |
In general, the Spark jobs are designed to run with minimal tuning and configuration.
| You can adjust the number of executors and resources to increase parallelism and performance |
| using Spark's |
| link:https://spark.apache.org/docs/latest/configuration.html[configuration options]. |
| |
If your tables are very wide and the default memory allocation is fairly low, you
may see jobs fail. To resolve this, increase the Spark executor memory. A conservative
rule of thumb is 1 GiB per 50 columns.
| |
| If your Spark resources drastically outscale the Kudu cluster you may want to limit the |
| number of concurrent tasks allowed to run on restore. |
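
As an illustrative sketch, executor count and memory could be raised on the backup job as
follows (the values shown are examples only and depend on your cluster manager and data volume):

[source,bash]
----
spark-submit --class org.apache.kudu.backup.KuduBackup \
  --num-executors 8 \
  --executor-memory 4G \
  kudu-backup2_2.11-1.10.0.jar \
  --kuduMasterAddresses master-1-host,master-2-host,master-3-host \
  --rootPath hdfs:///kudu-backups \
  foo bar
----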
| |
| ===== Backups on Kudu 1.9 and earlier |
| |
If your Kudu cluster is version 1.9 or earlier, you can still use the backup tool
introduced in Kudu 1.10 to back up your tables. However, because the incremental
backup feature requires server-side changes, you are limited to full backups only.
The process to back up tables is the same as documented above, but you will need to
download and use the kudu-backup jar from a Kudu 1.10+ release. Before running
the backup job you should adjust the configuration of your servers by setting
`--tablet_history_max_age_sec=604800`. This is the new default value in Kudu 1.10+,
chosen to ensure that long-running backup jobs can complete successfully and consistently.
Additionally, when running the backup you need to pass `--forceFull` to disable
the incremental backup feature; each time the job is run, a full backup will be taken.
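
For example, a full-only backup run against a 1.9 cluster might look like the backup example
above with `--forceFull` added (check `--help` for the exact form of the flag):

[source,bash]
----
spark-submit --class org.apache.kudu.backup.KuduBackup kudu-backup2_2.11-1.10.0.jar \
  --kuduMasterAddresses master-1-host,master-2-host,master-3-host \
  --rootPath hdfs:///kudu-backups \
  --forceFull \
  foo bar
----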
| |
NOTE: Taking full backups on a regular basis is far more resource- and time-intensive
than taking incremental backups. It is recommended to upgrade to Kudu 1.10+ as soon as possible.
| |
| [[physical_backup]] |
| === Physical backups of an entire node |
| |
| Kudu does not yet provide built-in physical backup and restore functionality. |
| However, it is possible to create a physical backup of a Kudu node (either |
| tablet server or master) and restore it later. |
| |
| WARNING: The node to be backed up must be offline during the procedure, or else |
| the backed up (or restored) data will be inconsistent. |
| |
| WARNING: Certain aspects of the Kudu node (such as its hostname) are embedded in |
| the on-disk data. As such, it's not yet possible to restore a physical backup of |
| a node onto another machine. |
| |
. Stop all Kudu processes in the cluster. This prevents the tablets on the
backed-up node from being re-replicated elsewhere unnecessarily.
| |
. If creating a backup, make a copy of the WAL, metadata, and data directories
on each node to be backed up. It is important that this copy preserve all file
attributes as well as sparseness; see the example after these steps.
| |
| . If restoring from a backup, delete the existing WAL, metadata, and data |
| directories, then restore the backup via move or copy. As with creating a |
| backup, it is important that the restore preserve all file attributes and |
| sparseness. |
| |
| . Start all Kudu processes in the cluster. |
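
One way to perform the copy and restore steps above while preserving file attributes and
sparseness is GNU `rsync` (a sketch only; the paths are placeholders for your configured WAL,
metadata, and data directories):

[source,bash]
----
# -a preserves permissions, timestamps, and (when run with sufficient privileges) ownership;
# -S handles sparse files efficiently.
rsync -aS /data/kudu/tserver/wal/ /backup/kudu/tserver/wal/
rsync -aS /data/kudu/tserver/data/ /backup/kudu/tserver/data/
----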
| |
| == Common Kudu workflows |
| |
| [[migrate_to_multi_master]] |
| === Migrating to Multiple Kudu Masters |
| |
| For high availability and to avoid a single point of failure, Kudu clusters should be created with |
| multiple masters. Many Kudu clusters were created with just a single master, either for simplicity |
| or because Kudu multi-master support was still experimental at the time. This workflow demonstrates |
| how to migrate to a multi-master configuration. It can also be used to migrate from two masters to |
| three, with straightforward modifications. Note that the number of masters must be odd. |
| |
| WARNING: The workflow is unsafe for adding new masters to an existing configuration that already has |
| three or more masters. Do not use it for that purpose. |
| |
WARNING: An even number of masters doesn't provide any benefit over having one fewer master. This
guide should always be used for migrating to three masters.
| |
WARNING: All of the command line steps below should be executed as the Kudu UNIX user. The example
commands assume the Kudu UNIX user is `kudu`, which is typical.
| |
WARNING: The workflow presupposes at least basic familiarity with Kudu configuration management. If
using vendor-specific tools, the workflow also presupposes familiarity with them, and the vendor's
instructions should be used instead, as details may differ.
| |
| ==== Prepare for the migration |
| |
| . Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster |
| will be unavailable. |
| |
. Decide how many masters to use. The number of masters should be odd. Three- or five-master
configurations are recommended; they can tolerate one or two failures, respectively.
| |
| . Perform the following preparatory steps for the existing master: |
| * Identify and record the directories where the master's write-ahead log (WAL) and data live. If |
| using Kudu system packages, their default locations are /var/lib/kudu/master, but they may be |
| customized via the `fs_wal_dir` and `fs_data_dirs` configuration parameters. The commands below |
| assume that `fs_wal_dir` is /data/kudu/master/wal and `fs_data_dirs` is /data/kudu/master/data. |
| Your configuration may differ. For more information on configuring these directories, see the |
| link:configuration.html#directory_configuration[Kudu Configuration docs]. |
| * Identify and record the port the master is using for RPCs. The default port value is 7051, but it |
| may have been customized using the `rpc_bind_addresses` configuration parameter. |
| * Identify the master's UUID. It can be fetched using the following command: |
| + |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu fs dump uuid --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] 2>/dev/null |
| ---- |
| master_data_dir:: existing master's previously recorded data directory |
| + |
| [source,bash] |
| Example:: |
| + |
| ---- |
| $ sudo -u kudu kudu fs dump uuid --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 2>/dev/null |
| 4aab798a69e94fab8d77069edff28ce0 |
| ---- |
| + |
| * Optional: configure a DNS alias for the master. The alias could be a DNS cname (if the machine |
| already has an A record in DNS), an A record (if the machine is only known by its IP address), |
| or an alias in /etc/hosts. The alias should be an abstract representation of the master (e.g. |
| `master-1`). |
| + |
WARNING: Without DNS aliases, it is not possible to recover from permanent master failures without
bringing the cluster down for maintenance; as such, using DNS aliases is highly recommended.
| + |
| . If you have Kudu tables that are accessed from Impala, you must update |
| the master addresses in the Apache Hive Metastore (HMS) database. |
| * If you set up the DNS aliases, run the following statement in `impala-shell`, |
| replacing `master-1`, `master-2`, and `master-3` with your actual aliases. |
| + |
| [source,sql] |
| ---- |
| ALTER TABLE table_name |
| SET TBLPROPERTIES |
| ('kudu.master_addresses' = 'master-1,master-2,master-3'); |
| ---- |
| + |
* If you do not have DNS aliases set up, see Step 11 in the
<<perform-the-migration,Perform the migration>> section for updating HMS.
| + |
| . Perform the following preparatory steps for each new master: |
| * Choose an unused machine in the cluster. The master generates very little load |
| so it can be collocated with other data services or load-generating processes, |
| though not with another Kudu master from the same configuration. |
| * Ensure Kudu is installed on the machine, either via system packages (in which case the `kudu` and |
| `kudu-master` packages should be installed), or via some other means. |
| * Choose and record the directory where the master's data will live. |
| * Choose and record the port the master should use for RPCs. |
| * Optional: configure a DNS alias for the master (e.g. `master-2`, `master-3`, etc). |
| |
| [[perform-the-migration]] |
| ==== Perform the migration |
| |
| . Stop all the Kudu processes in the entire cluster. |
| |
| . Format the data directory on each new master machine, and record the generated UUID. Use the |
| following command sequence: |
| + |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu fs format --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] |
| $ sudo -u kudu kudu fs dump uuid --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] 2>/dev/null |
| ---- |
| + |
| master_data_dir:: new master's previously recorded data directory |
| + |
| [source,bash] |
| Example:: |
| + |
| ---- |
| $ sudo -u kudu kudu fs format --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data |
| $ sudo -u kudu kudu fs dump uuid --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 2>/dev/null |
| f5624e05f40649b79a757629a69d061e |
| ---- |
| |
| . If using CM, add the new Kudu master roles now, but do not start them. |
| * If using DNS aliases, override the empty value of the `Master Address` parameter for each role |
| (including the existing master role) with that master's alias. |
| * Add the port number (separated by a colon) if using a non-default RPC port value. |
| |
| . Rewrite the master's Raft configuration with the following command, executed on the existing |
| master machine: |
| + |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> <all_masters> |
| ---- |
| + |
| master_data_dir:: existing master's previously recorded data directory |
| tablet_id:: must be the string `00000000000000000000000000000000` |
| all_masters:: space-separated list of masters, both new and existing. Each entry in the list must be |
| a string of the form `<uuid>:<hostname>:<port>` |
| uuid::: master's previously recorded UUID |
| hostname::: master's previously recorded hostname or alias |
| port::: master's previously recorded RPC port number |
| + |
| [source,bash] |
| Example:: |
| + |
| ---- |
| $ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 4aab798a69e94fab8d77069edff28ce0:master-1:7051 f5624e05f40649b79a757629a69d061e:master-2:7051 988d8ac6530f426cbe180be5ba52033d:master-3:7051 |
| ---- |
| |
. Modify the value of the `master_addresses` configuration parameter for both the existing master and the new masters.
| The new value must be a comma-separated list of all of the masters. Each entry is a string of the form `<hostname>:<port>` |
| hostname:: master's previously recorded hostname or alias |
| port:: master's previously recorded RPC port number |
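+
For example, the corresponding gflagfile entry might look like this (the host names are the
aliases used earlier in this workflow):
+
[source,bash]
----
--master_addresses=master-1:7051,master-2:7051,master-3:7051
----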
| |
| . Start the existing master. |
| |
| . Copy the master data to each new master with the following command, executed on each new master |
| machine. |
| + |
| WARNING: If your Kudu cluster is secure, in addition to running as the Kudu UNIX user, you must |
| authenticate as the Kudu service user prior to running this command. |
| + |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> <existing_master> |
| ---- |
| + |
| master_data_dir:: new master's previously recorded data directory |
| tablet_id:: must be the string `00000000000000000000000000000000` |
| existing_master:: RPC address of the existing master and must be a string of the form |
| `<hostname>:<port>` |
| hostname::: existing master's previously recorded hostname or alias |
| port::: existing master's previously recorded RPC port number |
| + |
| [source,bash] |
| Example:: |
| + |
| ---- |
| $ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 master-1:7051 |
| ---- |
| |
| . Start all of the new masters. |
| + |
| WARNING: Skip the next step if using CM. |
| + |
| . Modify the value of the `tserver_master_addrs` configuration parameter for each tablet server. |
| The new value must be a comma-separated list of masters where each entry is a string of the form |
| `<hostname>:<port>` |
| hostname:: master's previously recorded hostname or alias |
| port:: master's previously recorded RPC port number |
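+
For example (again using the aliases from this workflow):
+
[source,bash]
----
--tserver_master_addrs=master-1:7051,master-2:7051,master-3:7051
----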
| |
| . Start all of the tablet servers. |
| . If you have Kudu tables that are accessed from Impala and you didn't set up |
| DNS aliases, update the HMS database manually in the underlying database that |
| provides the storage for HMS. |
| * The following is an example SQL statement you should run in the HMS database: |
| + |
| [source,sql] |
| ---- |
| UPDATE TABLE_PARAMS |
| SET PARAM_VALUE = |
| 'master-1.example.com,master-2.example.com,master-3.example.com' |
| WHERE PARAM_KEY = 'kudu.master_addresses' AND PARAM_VALUE = 'old-master'; |
| ---- |
| + |
| * In `impala-shell`, run: |
| + |
[source,sql]
| ---- |
| INVALIDATE METADATA; |
| ---- |
| |
| |
| ==== Verify the migration was successful |
| |
| To verify that all masters are working properly, perform the following sanity checks: |
| |
| * Using a browser, visit each master's web UI. Look at the /masters page. All of the masters should |
| be listed there with one master in the LEADER role and the others in the FOLLOWER role. The |
| contents of /masters on each master should be the same. |
| |
| * Run a Kudu system check (ksck) on the cluster using the `kudu` command line |
| tool. See <<ksck>> for more details. |
| |
| === Recovering from a dead Kudu Master in a Multi-Master Deployment |
| |
| Kudu multi-master deployments function normally in the event of a master loss. However, it is |
| important to replace the dead master; otherwise a second failure may lead to a loss of availability, |
| depending on the number of available masters. This workflow describes how to replace the dead |
| master. |
| |
| Due to https://issues.apache.org/jira/browse/KUDU-1620[KUDU-1620], it is not possible to perform |
| this workflow without also restarting the live masters. As such, the workflow requires a |
| maintenance window, albeit a potentially brief one if the cluster was set up with DNS aliases. |
| |
| WARNING: Kudu does not yet support live Raft configuration changes for masters. As such, it is only |
| possible to replace a master if the deployment was created with DNS aliases or if every node in the |
| cluster is first shut down. See the <<migrate_to_multi_master,multi-master migration workflow>> for |
| more details on deploying with DNS aliases. |
| |
WARNING: The workflow presupposes at least basic familiarity with Kudu configuration management. If
using vendor-specific tools, the workflow also presupposes familiarity with them, and the vendor's
instructions should be used instead, as details may differ.
| |
| WARNING: All of the command line steps below should be executed as the Kudu UNIX user, typically |
| `kudu`. |
| |
| ==== Prepare for the recovery |
| |
| . If the deployment was configured without DNS aliases perform the following steps: |
| * Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster |
| will be unavailable. |
| * Shut down all Kudu tablet server processes in the cluster. |
| |
. Ensure that the dead master is well and truly dead. Take whatever steps are needed to prevent it from
accidentally restarting; an accidental restart can be quite dangerous for the cluster post-recovery.
| |
| . Choose one of the remaining live masters to serve as a basis for recovery. The rest of this |
| workflow will refer to this master as the "reference" master. |
| |
| . Choose an unused machine in the cluster where the new master will live. The master generates very |
| little load so it can be collocated with other data services or load-generating processes, though |
| not with another Kudu master from the same configuration. |
| The rest of this workflow will refer to this master as the "replacement" master. |
| |
| . Perform the following preparatory steps for the replacement master: |
| * Ensure Kudu is installed on the machine, either via system packages (in which case the `kudu` and |
| `kudu-master` packages should be installed), or via some other means. |
| * Choose and record the directory where the master's data will live. |
| |
| . Perform the following preparatory steps for each live master: |
| * Identify and record the directory where the master's data lives. If using Kudu system packages, |
| the default value is /var/lib/kudu/master, but it may be customized via the `fs_wal_dir` and |
| `fs_data_dirs` configuration parameters. Please note if you've set `fs_data_dirs` to some directories |
| other than the value of `fs_wal_dir`, it should be explicitly included in every command below where |
| `fs_wal_dir` is also included. For more information on configuring these directories, see the |
| link:configuration.html#directory_configuration[Kudu Configuration docs]. |
| * Identify and record the master's UUID. It can be fetched using the following command: |
| + |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu fs dump uuid --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] 2>/dev/null |
| ---- |
| master_data_dir:: live master's previously recorded data directory |
| + |
| [source,bash] |
| Example:: |
| + |
| ---- |
| $ sudo -u kudu kudu fs dump uuid --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 2>/dev/null |
| 80a82c4b8a9f4c819bab744927ad765c |
| ---- |
| + |
| . Perform the following preparatory steps for the reference master: |
| * Identify and record the directory where the master's data lives. If using Kudu system packages, |
| the default value is /var/lib/kudu/master, but it may be customized via the `fs_wal_dir` and |
| `fs_data_dirs` configuration parameters. Please note if you've set `fs_data_dirs` to some directories |
| other than the value of `fs_wal_dir`, it should be explicitly included in every command below where |
| `fs_wal_dir` is also included. For more information on configuring these directories, see the |
| link:configuration.html#directory_configuration[Kudu Configuration docs]. |
| * Identify and record the UUIDs of every master in the cluster, using the following command: |
| + |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu local_replica cmeta print_replica_uuids --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> 2>/dev/null |
| ---- |
| master_data_dir:: reference master's previously recorded data directory |
| tablet_id:: must be the string `00000000000000000000000000000000` |
| + |
| [source,bash] |
| Example:: |
| + |
| ---- |
| $ sudo -u kudu kudu local_replica cmeta print_replica_uuids --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 2>/dev/null |
| 80a82c4b8a9f4c819bab744927ad765c 2a73eeee5d47413981d9a1c637cce170 1c3f3094256347528d02ec107466aef3 |
| ---- |
| + |
| . Using the two previously-recorded lists of UUIDs (one for all live masters and one for all |
| masters), determine and record (by process of elimination) the UUID of the dead master. |
| |
| ==== Perform the recovery |
| |
| . Format the data directory on the replacement master machine using the previously recorded |
| UUID of the dead master. Use the following command sequence: |
| + |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu fs format --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] --uuid=<uuid> |
| ---- |
| + |
| master_data_dir:: replacement master's previously recorded data directory |
| uuid:: dead master's previously recorded UUID |
| + |
| [source,bash] |
| Example:: |
| + |
| ---- |
| $ sudo -u kudu kudu fs format --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data --uuid=80a82c4b8a9f4c819bab744927ad765c |
| ---- |
| + |
| . Copy the master data to the replacement master with the following command: |
| + |
| WARNING: If your Kudu cluster is secure, in addition to running as the Kudu UNIX user, you must |
| authenticate as the Kudu service user prior to running this command. |
| + |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> <reference_master> |
| ---- |
| + |
| master_data_dir:: replacement master's previously recorded data directory |
| tablet_id:: must be the string `00000000000000000000000000000000` |
| reference_master:: RPC address of the reference master and must be a string of the form |
| `<hostname>:<port>` |
| hostname::: reference master's previously recorded hostname or alias |
| port::: reference master's previously recorded RPC port number |
| + |
| [source,bash] |
| Example:: |
| + |
| ---- |
| $ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 master-2:7051 |
| ---- |
| + |
| . If using CM, add the replacement Kudu master role now, but do not start it. |
| * Override the empty value of the `Master Address` parameter for the new role with the replacement |
| master's alias. |
| * Add the port number (separated by a colon) if using a non-default RPC port value. |
| |
| . If the cluster was set up with DNS aliases, reconfigure the DNS alias for the dead master to point |
| at the replacement master. |
| |
| . If the cluster was set up without DNS aliases, perform the following steps: |
| * Stop the remaining live masters. |
| * Rewrite the Raft configurations on these masters to include the replacement master. See Step 4 of |
| <<perform-the-migration, Perform the Migration>> for more details. |
| |
| . Start the replacement master. |
| |
| . Restart the remaining masters in the new multi-master deployment. While the masters are shut down, |
| there will be an availability outage, but it should last only as long as it takes for the masters |
| to come back up. |
| |
| Congratulations, the dead master has been replaced! To verify that all masters are working properly, |
| consider performing the following sanity checks: |
| |
| * Using a browser, visit each master's web UI. Look at the /masters page. All of the masters should |
| be listed there with one master in the LEADER role and the others in the FOLLOWER role. The |
| contents of /masters on each master should be the same. |
| |
| * Run a Kudu system check (ksck) on the cluster using the `kudu` command line |
| tool. See <<ksck>> for more details. |
| |
| === Removing Kudu Masters from a Multi-Master Deployment |
| |
| In the event that a multi-master deployment has been overallocated nodes, the following steps should |
| be taken to remove the unwanted masters. |
| |
WARNING: In planning the new multi-master configuration, keep in mind that the number of masters
should be odd and that three- or five-master configurations are recommended.
| |
| WARNING: Dropping the number of masters below the number of masters currently needed for a Raft |
| majority can incur data loss. To mitigate this, ensure that the leader master is not removed during |
| this process. |
| |
| ==== Prepare for the removal |
| |
| . Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster |
| will be unavailable. |
| |
. Identify the UUID and RPC address of the current leader of the multi-master deployment by visiting the
`/masters` page of any master's web UI. This master must not be removed during this process; its
removal may result in severe data loss.
| |
| . Stop all the Kudu processes in the entire cluster. |
| |
| . If using CM, remove the unwanted Kudu master. |
| |
| ==== Perform the removal |
| |
| . Rewrite the Raft configuration on the remaining masters to include only the remaining masters. See |
| Step 4 of <<perform-the-migration,Perform the Migration>> for more details. |
| |
| . Remove the data directories and WAL directory on the unwanted masters. This is a precaution to |
| ensure that they cannot start up again and interfere with the new multi-master deployment. |
| |
| . Modify the value of the `master_addresses` configuration parameter for the masters of the new |
| multi-master deployment. If migrating to a single-master deployment, the `master_addresses` flag |
| should be omitted entirely. |
| |
| . Start all of the masters that were not removed. |
| |
| . Modify the value of the `tserver_master_addrs` configuration parameter for the tablet servers to |
| remove any unwanted masters. |
| |
| . Start all of the tablet servers. |
| |
| ==== Verify the migration was successful |
| |
| To verify that all masters are working properly, perform the following sanity checks: |
| |
| * Using a browser, visit each master's web UI. Look at the /masters page. All of the masters should |
| be listed there with one master in the LEADER role and the others in the FOLLOWER role. The |
| contents of /masters on each master should be the same. |
| |
| * Run a Kudu system check (ksck) on the cluster using the `kudu` command line |
| tool. See <<ksck>> for more details. |
| |
| === Changing the master hostnames |
| |
To prevent long maintenance windows when replacing dead masters, DNS aliases should be used. If the
cluster was set up without aliases, the host names can be changed by following the steps
below.
| |
| ==== Prepare for the hostname change |
| |
| . Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster |
| will be unavailable. |
| |
| . Note the UUID and RPC address of every master by visiting the `/masters` page of any master's web |
| UI. |
| |
| . Stop all the Kudu processes in the entire cluster. |
| |
| . Set up the new hostnames to point to the masters and verify all servers and clients properly |
| resolve them. |
| |
| ==== Perform the hostname change |
| |
. Rewrite each master's Raft configuration with the following command, executed on all master hosts:
+
[source,bash]
----
$ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] 00000000000000000000000000000000 <all_masters>
----
+
For example:
+
[source,bash]
----
$ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 4aab798a69e94fab8d77069edff28ce0:new-master-name-1:7051 f5624e05f40649b79a757629a69d061e:new-master-name-2:7051 988d8ac6530f426cbe180be5ba52033d:new-master-name-3:7051
----
| |
| . Change the masters' gflagfile so the `master_addresses` parameter reflects the new hostnames. |
| |
| . Change the `tserver_master_addrs` parameter in the tablet servers' gflagfiles to the new |
| hostnames. |
| |
| . Start up the masters. |
| |
| . To verify that all masters are working properly, perform the following sanity checks: |
| |
| .. Using a browser, visit each master's web UI. Look at the /masters page. All of the masters should |
| be listed there with one master in the LEADER role and the others in the FOLLOWER role. The |
| contents of /masters on each master should be the same. |
| |
.. Run the command below to verify that all masters are up and listening. The UUIDs
should be the same and belong to the same master as before the hostname change:
+
[source,bash]
----
| $ sudo -u kudu kudu master list new-master-name-1:7051,new-master-name-2:7051,new-master-name-3:7051 |
| ---- |
| |
| . Start all of the tablet servers. |
| |
| . Run a Kudu system check (ksck) on the cluster using the `kudu` command line |
| tool. See <<ksck>> for more details. After startup, some tablets may be |
| unavailable as it takes some time to initialize all of them. |
| |
| . If you have Kudu tables that are accessed from Impala, update the HMS |
| database manually in the underlying database that provides the storage for HMS. |
| |
| .. The following is an example SQL statement you should run in the HMS database: |
| + |
| [source,sql] |
| ---- |
| UPDATE TABLE_PARAMS |
| SET PARAM_VALUE = |
| 'new-master-name-1:7051,new-master-name-2:7051,new-master-name-3:7051' |
| WHERE PARAM_KEY = 'kudu.master_addresses' |
| AND PARAM_VALUE = 'master-1:7051,master-2:7051,master-3:7051'; |
| ---- |
| + |
| .. In `impala-shell`, run: |
| + |
[source,sql]
| ---- |
| INVALIDATE METADATA; |
| ---- |
| + |
| .. Verify updating the metadata worked by running a simple `SELECT` query on a |
| Kudu-backed Impala table. |
| |
| [[adding_tablet_servers]] |
| === Best Practices When Adding New Tablet Servers |
| |
| A common workflow when administering a Kudu cluster is adding additional tablet |
| server instances, in an effort to increase storage capacity, decrease load or |
| utilization on individual hosts, increase compute power, etc. |
| |
| By default, any newly added tablet servers will not be utilized immediately |
| after their addition to the cluster. Instead, newly added tablet servers will |
| only be utilized when new tablets are created or when existing tablets need to |
be replicated, which can lead to imbalanced nodes. It's recommended to run
the rebalancer CLI tool just after adding a new tablet server to the cluster,
as described in the enumerated steps below.
| |
| Avoid placing multiple tablet servers on a single node. Doing so |
| nullifies the point of increasing the overall storage capacity of a Kudu |
| cluster and increases the likelihood of tablet unavailability when a single |
| node fails (the latter drawback is not applicable if the cluster is properly |
| configured to use the |
| link:https://kudu.apache.org/docs/administration.html#rack_awareness[location |
| awareness] feature). |
| |
| To add additional tablet servers to an existing cluster, the |
| following steps can be taken to ensure tablet replicas are uniformly |
| distributed across the cluster: |
| |
| 1. Ensure that Kudu is installed on the new machines being added to the |
| cluster, and that the new instances have been |
| link:https://kudu.apache.org/docs/configuration.html#_configuring_tablet_servers[ |
| correctly configured] to point to the pre-existing cluster. Then, start up |
| the new tablet server instances. |
| 2. Verify that the new instances check in with the Kudu Master(s) |
| successfully. A quick method for verifying they've successfully checked in |
| with the existing Master instances is to view the Kudu Master WebUI, |
| specifically the `/tablet-servers` section, and validate that the newly |
| added instances are registered, and heartbeating. |
| 3. Once the tablet server(s) are successfully online and healthy, follow |
| the steps to run the |
| link:https://kudu.apache.org/docs/administration.html#rebalancer_tool[ |
| rebalancing tool] which will spread existing tablet replicas to the newly added |
| tablet servers. |
| 4. After the rebalancer tool has completed, or even during its execution, |
| you can check on the health of the cluster using the `ksck` command-line utility |
| (see <<ksck>> for more details). |
| |
| [[ksck]] |
| === Checking Cluster Health with `ksck` |
| |
| The `kudu` CLI includes a tool named `ksck` that can be used for gathering |
| information about the state of a Kudu cluster, including checking its health. |
| `ksck` will identify issues such as under-replicated tablets, unreachable |
| tablet servers, or tablets without a leader. |
| |
| `ksck` should be run from the command line as the Kudu admin user, and requires |
| the full list of master addresses to be specified: |
| |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu cluster ksck master-01.example.com,master-02.example.com,master-03.example.com |
| ---- |
| |
| To see a full list of the options available with `ksck`, use the `--help` flag. |
| If the cluster is healthy, `ksck` will print information about the cluster, a |
| success message, and return a zero (success) exit status. |
| |
| ---- |
| Master Summary |
| UUID | Address | Status |
| ----------------------------------+-----------------------+--------- |
| a811c07b99394df799e6650e7310f282 | master-01.example.com | HEALTHY |
| b579355eeeea446e998606bcb7e87844 | master-02.example.com | HEALTHY |
 cfdcc8592711485fad32ec4eea4fbfcd | master-03.example.com | HEALTHY
| |
| Tablet Server Summary |
| UUID | Address | Status |
| ----------------------------------+------------------------+--------- |
| a598f75345834133a39c6e51163245db | tserver-01.example.com | HEALTHY |
| e05ca6b6573b4e1f9a518157c0c0c637 | tserver-02.example.com | HEALTHY |
| e7e53a91fe704296b3a59ad304e7444a | tserver-03.example.com | HEALTHY |
| |
| Version Summary |
| Version | Servers |
| ---------+------------------------- |
| 1.7.1 | all 6 server(s) checked |
| |
| Summary by table |
| Name | RF | Status | Total Tablets | Healthy | Recovering | Under-replicated | Unavailable |
| ----------+----+---------+---------------+---------+------------+------------------+------------- |
| my_table | 3 | HEALTHY | 8 | 8 | 0 | 0 | 0 |
| |
| | Total Count |
| ----------------+------------- |
| Masters | 3 |
| Tablet Servers | 3 |
| Tables | 1 |
| Tablets | 8 |
| Replicas | 24 |
| OK |
| ---- |
| |
| If the cluster is unhealthy, for instance if a tablet server process has |
| stopped, `ksck` will report the issue(s) and return a non-zero exit status, as |
| shown in the abbreviated snippet of `ksck` output below: |
| |
| ---- |
| Tablet Server Summary |
| UUID | Address | Status |
| ----------------------------------+------------------------+------------- |
| a598f75345834133a39c6e51163245db | tserver-01.example.com | HEALTHY |
| e05ca6b6573b4e1f9a518157c0c0c637 | tserver-02.example.com | HEALTHY |
| e7e53a91fe704296b3a59ad304e7444a | tserver-03.example.com | UNAVAILABLE |
| Error from 127.0.0.1:7150: Network error: could not get status from server: Client connection negotiation failed: client connection to 127.0.0.1:7150: connect: Connection refused (error 61) (UNAVAILABLE) |
| |
| ... (full output elided) |
| |
| ================== |
| Errors: |
| ================== |
| Network error: error fetching info from tablet servers: failed to gather info for all tablet servers: 1 of 3 had errors |
| Corruption: table consistency check error: 1 out of 1 table(s) are not healthy |
| |
| FAILED |
| Runtime error: ksck discovered errors |
| ---- |
| |
| To verify data integrity, the optional `--checksum_scan` flag can be set, which |
| will ensure the cluster has consistent data by scanning each tablet replica and |
| comparing results. The `--tables` or `--tablets` flags can be used to limit the |
| scope of the checksum scan to specific tables or tablets, respectively. For |
| example, checking data integrity on the `my_table` table can be done with the |
| following command: |
| |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu cluster ksck --checksum_scan --tables my_table master-01.example.com,master-02.example.com,master-03.example.com |
| ---- |
| |
| By default, `ksck` will attempt to use a snapshot scan of the table, so the |
| checksum scan can be done while writes continue. |
| |
| Finally, `ksck` also supports output in JSON format using the `--ksck_format` |
| flag. JSON output contains the same information as the plain text output, but |
| in a format that can be used by other tools. See `kudu cluster ksck --help` for |
| more information. |
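
For example, a machine-readable report could be produced as follows (`json_pretty` is one of
the supported format values; run `kudu cluster ksck --help` to confirm the values available in
your version):

[source,bash]
----
$ sudo -u kudu kudu cluster ksck --ksck_format=json_pretty master-01.example.com,master-02.example.com,master-03.example.com
----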
| |
| [[change_dir_config]] |
| === Changing Directory Configurations |
| |
| For higher read parallelism and larger volumes of storage per server, users may |
| want to configure servers to store data in multiple directories on different |
| devices. Users can add or remove data directories to an existing master or |
| tablet server by updating the `--fs_data_dirs` gflag configuration and |
| restarting the server. Data is striped across data directories, and when a new |
| data directory is added, new data will be striped across the union of the old |
| and new directories. |
| |
WARNING: Removing a data directory from `--fs_data_dirs` may result in failed tablet
replicas if the removed directory contained data blocks. Use `ksck` to ensure
the cluster has fully recovered from the directory removal before moving on to
another server.
| |
WARNING: In versions of Kudu below 1.12, the `kudu fs update_dirs` tool must be
run before restarting a server with a different set of data directories;
otherwise the server will fail to start.
| |
On Kudu versions below 1.12, once a server has been started, users must go
through the following steps to change its directory configuration:
| |
| NOTE: Unless the `--force` flag is specified, Kudu will not allow for the |
| removal of a directory across which tablets are configured to spread data. If |
| `--force` is specified, all tablets configured to use that directory will fail |
| upon starting up and be replicated elsewhere. |
| |
| NOTE: If the link:configuration.html#directory_configuration[metadata |
| directory] overlaps with a data directory, as was the default prior to Kudu |
| 1.7, or if a non-default metadata directory is configured, the |
| `--fs_metadata_dir` configuration must be specified when running the `kudu fs |
| update_dirs` tool. |
| |
| NOTE: Only new tablet replicas (i.e. brand new tablets' replicas and replicas |
| that are copied to the server for high availability) will use the new |
| directory. Existing tablet replicas on the server will not be rebalanced across |
| the new directory. |
| |
| WARNING: All of the command line steps below should be executed as the Kudu |
| UNIX user, typically `kudu`. |
| |
| . Use `ksck` to ensure the cluster is healthy, and establish a |
| <<minimizing_cluster_disruption_during_temporary_single_ts_downtime,maintenance |
| window>> to bring the tablet server offline. |
| |
| . Run the tool with the desired directory configuration flags. For example, if a |
| cluster was set up with `--fs_wal_dir=/wals`, `--fs_metadata_dir=/meta`, and |
| `--fs_data_dirs=/data/1,/data/2,/data/3`, and `/data/3` is to be removed (e.g. |
| due to a disk error), run the command: |
| |
| + |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu fs update_dirs --force --fs_wal_dir=/wals --fs_metadata_dir=/meta --fs_data_dirs=/data/1,/data/2 |
| ---- |
| + |
| |
| . Modify the value of the `--fs_data_dirs` flag for the updated server. If using |
| CM, make sure to only update the configurations of the updated server, rather |
| than of the entire Kudu service. |
| |
| . Once complete, the server process can be started. When Kudu is installed using |
| system packages, `service` is typically used: |
| |
| + |
| [source,bash] |
| ---- |
| $ sudo service kudu-tserver start |
| ---- |
| + |
| |
| . Use `ksck` to ensure Kudu returns to a healthy state before resuming normal |
| operation. |
| |
| |
| [[disk_failure_recovery]] |
| === Recovering from Disk Failure |
| Kudu nodes can only survive failures of disks on which certain Kudu directories |
| are mounted. For more information about the different Kudu directory types, see |
| the section on link:configuration.html#directory_configuration[Kudu Directory |
Configurations]. The table below describes this behavior across different
Apache Kudu releases.
| |
| [[disk_failure_behavior]] |
| .Kudu Disk Failure Behavior |
| [cols="<,<,<",options="header"] |
| |=== |
| | Node Type | Kudu Directory Type | Kudu Releases that Crash on Disk Failure |
| | Master | All | All |
| | Tablet Server | Directory containing WALs | All |
| | Tablet Server | Directory containing tablet metadata | All |
| | Tablet Server | Directory containing data blocks only | Pre-1.6.0 |
| |=== |
| |
| When a disk failure occurs that does not lead to a crash, Kudu will stop using |
| the affected directory, shut down tablets with blocks on the affected |
| directories, and automatically re-replicate the affected tablets to other |
| tablet servers. The affected server will remain alive and print messages to the |
| log indicating the disk failure, for example: |
| |
| ---- |
| E1205 19:06:24.163748 27115 data_dirs.cc:1011] Directory /data/8/kudu/data marked as failed |
| E1205 19:06:30.324795 27064 log_block_manager.cc:1822] Not using report from /data/8/kudu/data: IO error: Could not open container 0a6283cab82d4e75848f49772d2638fe: /data/8/kudu/data/0a6283cab82d4e75848f49772d2638fe.metadata: Read-only file system (error 30) |
| E1205 19:06:33.564638 27220 ts_tablet_manager.cc:946] T 4957808439314e0d97795c1394348d80 P 70f7ee61ead54b1885d819f354eb3405: aborting tablet bootstrap: tablet has data in a failed directory |
| ---- |
| |
| While in this state, the affected node will avoid using the failed disk, |
| leading to lower storage volume and reduced read parallelism. The administrator |
| can remove the failed directory from the `--fs_data_dirs` gflag to avoid seeing |
| these errors. |
| |
| WARNING: In versions of Kudu below 1.12, in order to start Kudu with a |
| different set of directories, the administrator should schedule a brief window |
| to <<change_dir_config,update the node's directory configuration>>. Kudu will |
| fail to start otherwise. |
| |
| When the disk is repaired, remounted, and ready to be reused by Kudu, take the |
| following steps: |
| |
| . Make sure that the Kudu portion of the disk is completely empty. |
| . Stop the tablet server. |
. Update the `--fs_data_dirs` gflag to add `/data/3`, using the `update_dirs`
tool if running a version of Kudu below 1.12:
| + |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu fs update_dirs --force --fs_wal_dir=/wals --fs_data_dirs=/data/1,/data/2,/data/3 |
| ---- |
| + |
| |
| . Start the tablet server. |
| . Run `ksck` to verify cluster health. |
| + |
| [source,bash] |
| ---- |
$ sudo -u kudu kudu cluster ksck master-01.example.com
| ---- |
| + |
| |
| |
Note that existing tablets will not stripe to the restored disk; only new
tablets will.
| |
| [[disk_full_recovery]] |
| === Recovering from Full Disks |
| By default, Kudu reserves a small amount of space (1% by capacity) in its |
| directories; Kudu considers a disk full if there is less free space available |
| than the reservation. Kudu nodes can only tolerate running out of space on disks |
| on which certain Kudu directories are mounted. For more information about the |
| different Kudu directory types, see |
| link:configuration.html#directory_configuration[Kudu Directory Configurations]. |
| The table below describes this behavior for each type of directory. The behavior |
| is uniform across masters and tablet servers. |
| [[disk_full_behavior]] |
| .Kudu Full Disk Behavior |
| [options="header"] |
| |=== |
| | Kudu Directory Type | Crash on a Full Disk? |
| | Directory containing WALs | Yes |
| | Directory containing tablet metadata | Yes |
| | Directory containing data blocks only | No (see below) |
| |=== |
| |
Prior to Kudu 1.7.0, Kudu striped tablet data across all directories and
avoided writing data to full directories. Kudu would crash if all data
directories were full.
| |
| In 1.7.0 and later, new tablets are assigned a disk group consisting of |
| `--fs_target_data_dirs_per_tablet` data dirs (default 3). If Kudu is not |
| configured with enough data directories for a full disk group, all data |
| directories are used. When a data directory is full, Kudu will stop writing new |
| data to it and each tablet that uses that data directory will write new data to |
| other data directories within its group. If all data directories for a tablet |
| are full, Kudu will crash. Periodically, Kudu will check if full data |
| directories are still full, and will resume writing to those data directories |
| if space has become available. |
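
As a sketch, assuming flags are supplied through the tablet server's gflag file
(the exact location depends on how Kudu was installed), a larger disk group
size could be configured as follows:

----
# Hypothetical configuration: stripe each new tablet across four data directories.
--fs_data_dirs=/data/1,/data/2,/data/3,/data/4
--fs_target_data_dirs_per_tablet=4
----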
| |
| If Kudu does crash because its data directories are full, freeing space on the |
| full directories will allow the affected daemon to restart and resume writing. |
| Note that it may be possible for Kudu to free some space by running |
| |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu fs check --repair |
| ---- |
| |
| but this command may also fail if there is too little space left. |
| |
| It's also possible to allocate additional data directories to Kudu in order to |
| increase the overall amount of storage available. See the documentation on |
| <<change_dir_config,updating a node's directory configuration>> for more |
| information. Note that existing tablets will not use new data directories, so |
| adding a new data directory does not resolve issues with full disks. |
| |
| [[tablet_majority_down_recovery]] |
| === Bringing a tablet that has lost a majority of replicas back online |
| |
| If a tablet has permanently lost a majority of its replicas, it cannot recover |
| automatically and operator intervention is required. If the tablet servers |
| hosting a majority of the replicas are down (i.e. ones reported as "TS |
| unavailable" by ksck), they should be recovered instead if possible. |
| |
| WARNING: The steps below may cause recent edits to the tablet to be lost, |
| potentially resulting in permanent data loss. Only attempt the procedure below |
| if it is impossible to bring a majority back online. |
| |
| Suppose a tablet has lost a majority of its replicas. The first step in |
| diagnosing and fixing the problem is to examine the tablet's state using ksck: |
| |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu cluster ksck --tablets=e822cab6c0584bc0858219d1539a17e6 master-00,master-01,master-02 |
| Connected to the Master |
| Fetched info from all 5 Tablet Servers |
| Tablet e822cab6c0584bc0858219d1539a17e6 of table 'my_table' is unavailable: 2 replica(s) not RUNNING |
| 638a20403e3e4ae3b55d4d07d920e6de (tserver-00:7150): RUNNING |
| 9a56fa85a38a4edc99c6229cba68aeaa (tserver-01:7150): bad state |
| State: FAILED |
| Data state: TABLET_DATA_READY |
| Last status: <failure message> |
| c311fef7708a4cf9bb11a3e4cbcaab8c (tserver-02:7150): bad state |
| State: FAILED |
| Data state: TABLET_DATA_READY |
| Last status: <failure message> |
| ---- |
| |
| This output shows that, for tablet `e822cab6c0584bc0858219d1539a17e6`, the two |
| tablet replicas on `tserver-01` and `tserver-02` failed. The remaining replica |
| is not the leader, so the leader replica failed as well. This means the chance |
| of data loss is higher since the remaining replica on `tserver-00` may have |
| been lagging. In general, to accept the potential data loss and restore the |
| tablet from the remaining replicas, divide the tablet replicas into two groups: |
| |
| 1. Healthy replicas: Those in `RUNNING` state as reported by ksck |
| 2. Unhealthy replicas |
| |
| For example, in the above ksck output, the replica on tablet server `tserver-00` |
| is healthy, while the replicas on `tserver-01` and `tserver-02` are unhealthy. |
| On each tablet server with a healthy replica, alter the consensus configuration |
| to remove unhealthy replicas. In the typical case of 1 out of 3 surviving |
| replicas, there will be only one healthy replica, so the consensus configuration |
| will be rewritten to include only the healthy replica. |
| |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu remote_replica unsafe_change_config tserver-00:7150 <tablet-id> <tserver-00-uuid> |
| ---- |
| |
| where `<tablet-id>` is `e822cab6c0584bc0858219d1539a17e6` and |
| `<tserver-00-uuid>` is the uuid of `tserver-00`, |
| `638a20403e3e4ae3b55d4d07d920e6de`. |
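
In this example, the fully specified command would be:

[source,bash]
----
$ sudo -u kudu kudu remote_replica unsafe_change_config tserver-00:7150 e822cab6c0584bc0858219d1539a17e6 638a20403e3e4ae3b55d4d07d920e6de
----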
| |
| Once the healthy replicas' consensus configurations have been forced to exclude |
| the unhealthy replicas, the healthy replicas will be able to elect a leader. |
| The tablet will become available for writes, though it will still be |
| under-replicated. Shortly after the tablet becomes available, the leader master |
| will notice that it is under-replicated, and will cause the tablet to |
| re-replicate until the proper replication factor is restored. The unhealthy |
| replicas will be tombstoned by the master, causing their remaining data to be |
| deleted. |
| |
| [[rebuilding_kudu]] |
| === Rebuilding a Kudu Filesystem Layout |
| |
| In the event that critical files are lost, i.e. WALs or tablet-specific |
| metadata, all Kudu directories on the server must be deleted and rebuilt to |
| ensure correctness. Doing so will destroy the copy of the data for each tablet |
| replica hosted on the local server. Kudu will automatically re-replicate tablet |
| replicas removed in this way, provided the replication factor is at least three |
| and all other servers are online and healthy. |
| |
| NOTE: These steps use a tablet server as an example, but the steps are the same |
| for Kudu master servers. |
| |
| WARNING: If multiple nodes need their FS layouts rebuilt, wait until all |
| replicas previously hosted on each node have finished automatically |
| re-replicating elsewhere before continuing. Failure to do so can result in |
| permanent data loss. |
| |
| WARNING: Before proceeding, ensure the contents of the directories are backed |
| up, either as a copy or in the form of other tablet replicas. |
| |
| . The first step to rebuilding a server with a new directory configuration is |
| emptying all of the server's existing directories. For example, if a tablet |
| server is configured with `--fs_wal_dir=/data/0/kudu-tserver-wal`, |
| `--fs_metadata_dir=/data/0/kudu-tserver-meta`, and |
| `--fs_data_dirs=/data/1/kudu-tserver,/data/2/kudu-tserver`, the following |
| commands will remove the WAL directory's and data directories' contents: |
| |
| + |
| [source,bash] |
| ---- |
| # Note: this will delete all of the data from the local tablet server. |
| $ rm -rf /data/0/kudu-tserver-wal/* /data/0/kudu-tserver-meta/* /data/1/kudu-tserver/* /data/2/kudu-tserver/* |
| ---- |
| + |
| |
| . If using CM, update the configurations for the rebuilt server to include only |
| the desired directories. Make sure to only update the configurations of servers |
| to which changes were applied, rather than of the entire Kudu service. |
| |
| . After directories are deleted, the server process can be started with the new |
| directory configuration. The appropriate sub-directories will be created by |
| Kudu upon starting up. |
| |
| [[minimizing_cluster_disruption_during_temporary_single_ts_downtime]] |
| === Minimizing cluster disruption during temporary planned downtime of a single tablet server |
| |
| If a single tablet server is brought down temporarily in a healthy cluster, all |
| tablets will remain available and clients will function as normal, after |
| potential short delays due to leader elections. However, if the downtime lasts |
| for more than `--follower_unavailable_considered_failed_sec` (default 300) |
| seconds, the tablet replicas on the down tablet server will be replaced by new |
replicas on available tablet servers. This will cause stress on the cluster as
tablets re-replicate and, if the downtime lasts long enough, will leave the
returning tablet server with significantly fewer replicas, which the rebalancer
would then need to fix.
| |
| To work around this, in Kudu versions from 1.11 onwards, the `kudu` CLI |
| contains a tool to put tablet servers into maintenance mode. While in this |
| state, the tablet server’s replicas are not re-replicated due to its downtime |
| alone, though re-replication may still occur in the event that the server in |
| maintenance suffers from a disk failure or if a follower replica on the tablet |
| server falls too far behind its leader replica. Upon exiting maintenance, |
| re-replication is triggered for any remaining under-replicated tablets. |
| |
The `kudu tserver state enter_maintenance` and `kudu tserver state
exit_maintenance` tools are used to orchestrate tablet server maintenance.
| The following can be run from a tablet server to put it into maintenance: |
| |
| [source,bash] |
| ---- |
| $ TS_UUID=$(sudo -u kudu kudu fs dump uuid --fs_wal_dir=<wal_dir> --fs_data_dirs=<data_dirs>) |
| $ sudo -u kudu kudu tserver state enter_maintenance <master_addresses> "$TS_UUID" |
| ---- |
| |
| The tablet server maintenance mode is shown in the "Tablet Servers" page of the |
| Kudu leader master's web UI, and in the output of `kudu cluster ksck`. To exit |
| maintenance mode, run the following: |
| |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu tserver state exit_maintenance <master_addresses> "$TS_UUID" |
| ---- |
| |
| In versions prior to 1.11, a different approach must be used to prevent |
| unwanted re-replication. Increase |
| `--follower_unavailable_considered_failed_sec` on all tablet servers so the |
| amount of time before re-replication starts is longer than the expected |
| downtime of the tablet server, including the time it takes the tablet server to |
| restart and bootstrap its tablet replicas. To do this, run the following |
| command for each tablet server: |
| |
| [source,bash] |
| ---- |
| $ sudo -u kudu kudu tserver set_flag <tserver_address> follower_unavailable_considered_failed_sec <num_seconds> |
| ---- |
| |
| where `<num_seconds>` is the number of seconds that will encompass the downtime. |
| Once the downtime is finished, reset the flag to its original value. |
| |
| ---- |
| $ sudo -u kudu kudu tserver set_flag <tserver_address> follower_unavailable_considered_failed_sec <original_value> |
| ---- |
| |
| WARNING: Be sure to reset the value of `--follower_unavailable_considered_failed_sec` |
| to its original value. |
| |
| NOTE: On Kudu versions prior to 1.8, the `--force` flag must be provided in the above |
| `set_flag` commands. |
| |
| [[rolling_restart]] |
| === Orchestrating a rolling restart with no downtime |
| |
| As of Kudu 1.12, tooling is available to restart a cluster with no downtime. To |
| perform such a "rolling restart", perform the following sequence: |
| |
| . Restart the master(s) one-by-one. If there is only a single master, this may |
| cause brief interference with on-going workloads. |
| . Starting with a single tablet server, put the tablet server into |
| <<minimizing_cluster_disruption_during_temporary_single_ts_downtime,maintenance |
| mode>> by using the `kudu tserver state enter_maintenance` tool. |
. Start quiescing the tablet server using the `kudu tserver quiesce start`
tool (see the example following this list). This will signal to Kudu to stop
hosting leaders on the specified tablet server and to redirect new scan
requests to other tablet servers.
| . Periodically run `kudu tserver quiesce start` with the |
| `--error_if_not_fully_quiesced` option, until it returns success, indicating |
| that all leaders have been moved away from the tablet server and all on-going |
| scans have completed. |
| . Restart the tablet server. |
| . Periodically run `ksck` until the cluster is reported to be healthy. |
| . Exit maintenance mode on the tablet server by running `kudu tserver state |
| exit_maintenance`. This will allow new tablet replicas to be placed on the |
| tablet server. |
| . Repeat these steps for all tablet servers in the cluster. |
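
For reference, quiescing a single tablet server (steps 3 and 4 above) might
look like the following, where `<tserver_address>` is the RPC address of the
tablet server being restarted:

[source,bash]
----
$ sudo -u kudu kudu tserver quiesce start <tserver_address>
$ sudo -u kudu kudu tserver quiesce start <tserver_address> --error_if_not_fully_quiesced
----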
| |
| WARNING: If any tables in the cluster have a replication factor of 1, some |
| quiescing tablet servers will never become fully quiesced, as single-replica |
| tablets will not naturally relinquish leadership. If such tables exist, use the |
| `kudu cluster rebalance` tool to move replicas of these tables away from the |
| quiescing tablet server by specifying the `--ignored_tservers`, |
| `--move_replicas_from_ignored_tservers`, and `--tables` options. |
| |
NOTE: If running with <<rack_awareness,rack awareness>>, the above steps can be
performed on multiple tablet servers within a single rack at the same time.
Users should use `ksck` to ensure that the location assignment policy is
enforced while going through these steps, and that no more than a single
location is restarted at the same time. At least three locations should be
defined in the cluster to safely restart multiple tablet servers within one
location.
| |
| [[rebalancer_tool]] |
| === Running the tablet rebalancing tool |
| |
| The `kudu` CLI contains a rebalancing tool that can be used to rebalance |
| tablet replicas among tablet servers. For each table, the tool attempts to |
| balance the number of replicas per tablet server. It will also, without |
| unbalancing any table, attempt to even out the number of replicas per tablet |
| server across the cluster as a whole. The rebalancing tool should be run as the |
| Kudu admin user, specifying all master addresses: |
| |
| [source,bash] |
| ---- |
$ sudo -u kudu kudu cluster rebalance master-01.example.com,master-02.example.com,master-03.example.com
| ---- |
| |
| When run, the rebalancer will report on the initial tablet replica distribution |
| in the cluster, log the replicas it moves, and print a final summary of the |
| distribution when it terminates: |
| |
| ---- |
| Per-server replica distribution summary: |
| Statistic | Value |
| -----------------------+----------- |
| Minimum Replica Count | 0 |
| Maximum Replica Count | 24 |
| Average Replica Count | 14.400000 |
| |
| Per-table replica distribution summary: |
| Replica Skew | Value |
| --------------+---------- |
| Minimum | 8 |
| Maximum | 8 |
| Average | 8.000000 |
| |
| I0613 14:18:49.905897 3002065792 rebalancer.cc:779] tablet e7ee9ade95b342a7a94649b7862b345d: 206a51de1486402bbb214b5ce97a633c -> 3b4d9266ac8c45ff9a5d4d7c3e1cb326 move scheduled |
| I0613 14:18:49.917578 3002065792 rebalancer.cc:779] tablet 5f03944529f44626a0d6ec8b1edc566e: 6e64c4165b864cbab0e67ccd82091d60 -> ba8c22ab030346b4baa289d6d11d0809 move scheduled |
| I0613 14:18:49.928683 3002065792 rebalancer.cc:779] tablet 9373fee3bfe74cec9054737371a3b15d: fab382adf72c480984c6cc868fdd5f0e -> 3b4d9266ac8c45ff9a5d4d7c3e1cb326 move scheduled |
| |
| ... (full output elided) |
| |
| I0613 14:19:01.162802 3002065792 rebalancer.cc:842] tablet f4c046f18b174cc2974c65ac0bf52767: 206a51de1486402bbb214b5ce97a633c -> 3b4d9266ac8c45ff9a5d4d7c3e1cb326 move completed: OK |
| |
| rebalancing is complete: cluster is balanced (moved 28 replicas) |
| Per-server replica distribution summary: |
| Statistic | Value |
| -----------------------+----------- |
| Minimum Replica Count | 14 |
| Maximum Replica Count | 15 |
| Average Replica Count | 14.400000 |
| |
| Per-table replica distribution summary: |
| Replica Skew | Value |
| --------------+---------- |
| Minimum | 1 |
| Maximum | 1 |
| Average | 1.000000 |
| ---- |
| |
| If more details are needed in addition to the replica distribution summary, |
| use the `--output_replica_distribution_details` flag. If added, the flag makes |
| the tool print per-table and per-tablet server replica distribution statistics |
| as well. |
| |
| Use the `--report_only` flag to get a report on table- and cluster-wide |
| replica distribution statistics without starting any rebalancing activity. |
| |
| The rebalancer can also be restricted to run on a subset of the tables by |
| supplying the `--tables` flag. Note that, when running on a subset of tables, |
| the tool will not attempt to balance the cluster as a whole. |
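
For example, to rebalance only the replicas of the `my_table` table used in
earlier examples:

[source,bash]
----
$ sudo -u kudu kudu cluster rebalance --tables=my_table master-01.example.com,master-02.example.com,master-03.example.com
----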
| |
| The length of time rebalancing is run for can be controlled with the flag |
| `--max_run_time_sec`. By default, the rebalancer will run until the cluster is |
| balanced. To control the amount of resources devoted to rebalancing, modify |
| the flag `--max_moves_per_server`. See `kudu cluster rebalance --help` for more. |
| |
| It's safe to stop the rebalancer tool at any time. When restarted, the |
| rebalancer will continue rebalancing the cluster. |
| |
| The rebalancer requires all registered tablet servers to be up and running |
to proceed with the rebalancing process. This is to avoid possible conflicts
and races with automatic re-replication and to keep replica placement optimal
for the current configuration of the cluster. If a tablet server becomes
| unavailable during the rebalancing session, the rebalancer will exit. As noted |
| above, it's safe to restart the rebalancer after resolving the issue with |
| unavailable tablet servers. |
| |
| The rebalancing tool can rebalance Kudu clusters running older versions as well, |
| with some restrictions. Consult the following table for more information. In the |
| table, "RF" stands for "replication factor". |
| |
| [[rebalancer_compatibility]] |
| .Kudu Rebalancing Tool Compatibility |
| [options="header"] |
| |=== |
| | Version Range | Rebalances RF = 1 Tables? | Rebalances RF > 1 Tables? |
| | v < 1.4.0 | No | No |
| | 1.4.0 +<=+ v < 1.7.1 | No | Yes |
| | v >= 1.7.1 | Yes | Yes |
| |=== |
| |
| If the rebalancer is running against a cluster where rebalancing replication |
| factor one tables is not supported, it will rebalance all the other tables |
| and the cluster as if those singly-replicated tables did not exist. |
| |
| [[rebalancer_tool_with_rack_awareness]] |
| === Running the tablet rebalancing tool on a rack-aware cluster |
| |
| As detailed in the <<rack_awareness, rack awareness>> section, it's possible |
| to use the `kudu cluster rebalance` tool to establish the placement policy on a |
| cluster. This might be necessary when the rack awareness feature is first |
configured or when re-replication has violated the placement policy. The rebalancing
| tool breaks its work into three phases: |
| |
| . The rack-aware rebalancer tries to establish the placement policy. Use the |
| `--disable_policy_fixer` flag to skip this phase. |
| . The rebalancer tries to balance load by location, moving tablet replicas |
| between locations in an attempt to spread tablet replicas among locations |
| evenly. The load of a location is measured as the total number of replicas in |
| the location divided by the number of tablet servers in the location. Use the |
| `--disable_cross_location_rebalancing` flag to skip this phase. |
| . The rebalancer tries to balance the tablet replica distribution within each |
| location, as if the location were a cluster on its own. Use the |
| `--disable_intra_location_rebalancing` flag to skip this phase. |
| |
| By using the `--report_only` flag, it's also possible to check if all tablets in |
| the cluster conform to the placement policy without attempting any replica |
| movement. |
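
For example, to check placement policy compliance without moving any replicas
(using the placeholder master addresses from earlier examples):

[source,bash]
----
$ sudo -u kudu kudu cluster rebalance --report_only master-01.example.com,master-02.example.com,master-03.example.com
----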
| |
| [[tablet_server_decommissioning]] |
| === Decommissioning or Permanently Removing a Tablet Server From a Cluster |
| |
| Starting with Kudu 1.12, the Kudu rebalancer tool can be used to decommission a |
| tablet server by supplying the `--ignored_tservers` and |
| `--move_replicas_from_ignored_tservers` arguments. |
| |
| WARNING: Do not decommission multiple tablet servers at once. To remove |
| multiple tablet servers from the cluster, follow the below instructions for |
| each tablet server, ensuring that the previous tablet server is removed from |
| the cluster and `ksck` is healthy before shutting down the next. |
| |
| . Ensure the cluster is in good health by using `ksck`. See <<ksck>>. |
| . Put the tablet server into |
| <<minimizing_cluster_disruption_during_temporary_single_ts_downtime,maintenance |
| mode>> by using the `kudu tserver state enter_maintenance` tool. |
. Run the `kudu cluster rebalance` tool, supplying the `--ignored_tservers`
argument with the UUIDs of the tablet servers to be decommissioned, and the
`--move_replicas_from_ignored_tservers` flag (see the example following this
list).
| . Wait for the moves to complete and for `ksck` to show the cluster in a |
| healthy state. |
| . The decommissioned tablet server can be brought offline. |
| . To completely remove it from the cluster so `ksck` shows the cluster as |
| completely healthy, restart the masters. In the case of a single master, |
| this will cause cluster downtime. With multi-master, restart the masters in |
| sequence to avoid cluster downtime. |
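
For reference, decommissioning a single tablet server might look like the
following, where `<tserver_uuid>` is the UUID of the tablet server being
removed and the master addresses are the placeholders used in earlier examples:

[source,bash]
----
$ sudo -u kudu kudu cluster rebalance --ignored_tservers=<tserver_uuid> --move_replicas_from_ignored_tservers master-01.example.com,master-02.example.com,master-03.example.com
----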
| |
| In Kudu versions that do not support the above tooling, different steps must be |
| followed to decommission a tablet server: |
| |
| . Ensure the cluster is in good health using `ksck`. See <<ksck>>. |
| . If the tablet server contains any replicas of tables with replication factor |
| 1, these replicas must be manually moved off the tablet server prior to |
| shutting it down. The `kudu tablet change_config move_replica` tool can be |
| used for this. |
| . Shut down the tablet server. After |
| `-follower_unavailable_considered_failed_sec`, which defaults to 5 minutes, |
| Kudu will begin to re-replicate the tablet server's replicas to other servers. |
| Wait until the process is finished. Progress can be monitored using `ksck`. |
| . Once all the copies are complete, `ksck` will continue to report the tablet |
| server as unavailable. The cluster will otherwise operate fine without the |
| tablet server. To completely remove it from the cluster so `ksck` shows the |
| cluster as completely healthy, restart the masters. In the case of a single |
| master, this will cause cluster downtime. With multi-master, restart the |
| masters in sequence to avoid cluster downtime. |
| |
| [[using_cluster_names_in_kudu_tool]] |
| === Using cluster names in the `kudu` command line tool |
| |
| When using the `kudu` command line tool, it can be difficult to remember the |
| precise list of Kudu master RPC addresses needed to communicate with a cluster, |
| especially when managing multiple clusters. As an alternative, the command line |
| tool can identify clusters by name. To use this functionality: |
| |
| . Create a new directory to store the Kudu configuration file. |
| . Export the path to this directory in the `KUDU_CONFIG` environment variable. |
| . Create a file called `kudurc` in the new directory. |
| . Populate `kudurc` as follows, substituting your own cluster names and RPC |
| addresses: |
| + |
| ---- |
| clusters_info: |
| cluster_name1: |
| master_addresses: ip1:port1,ip2:port2,ip3:port3 |
| cluster_name2: |
| master_addresses: ip4:port4 |
| ---- |
| + |
| . When using the `kudu` command line tool, replace the list of Kudu master RPC |
| addresses with the cluster name, prepended with the character `@`. |
| |
| Example:: |
| + |
| ---- |
| $ sudo -u kudu kudu cluster ksck @cluster_name1 |
| ---- |
| + |
| |
| |
| NOTE: Cluster names may be used as input in any invocation of the `kudu` command |
| line tool that expects a list of Kudu master RPC addresses. |