| // Licensed to the Apache Software Foundation (ASF) under one |
| // or more contributor license agreements. See the NOTICE file |
| // distributed with this work for additional information |
| // regarding copyright ownership. The ASF licenses this file |
| // to you under the Apache License, Version 2.0 (the |
| // "License"); you may not use this file except in compliance |
| // with the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, |
| // software distributed under the License is distributed on an |
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| // KIND, either express or implied. See the License for the |
| // specific language governing permissions and limitations |
| // under the License. |
| |
| [[administration]] |
| = Apache Kudu Administration |
| |
| :author: Kudu Team |
| :imagesdir: ./images |
| :icons: font |
| :toc: left |
| :toclevels: 3 |
| :doctype: book |
| :backend: html5 |
| :sectlinks: |
| :experimental: |
| |
| NOTE: Kudu is easier to manage with link:http://www.cloudera.com/content/www/en-us/products/cloudera-manager.html[Cloudera Manager] |
| than in a standalone installation. See Cloudera's |
| link:http://www.cloudera.com/content/www/en-us/documentation/betas/kudu/latest/topics/kudu_installation.html[Kudu documentation] |
| for more details about using Kudu with Cloudera Manager. |
| |
| == Starting and Stopping Kudu Processes |
| |
| include::installation.adoc[tags=start_stop] |
| |
| == Kudu Web Interfaces |
| |
| Kudu tablet servers and masters expose useful operational information on a built-in web interface, |
| |
| === Kudu Master Web Interface |
| |
| Kudu master processes serve their web interface on port 8051. The interface exposes several pages |
| with information about the cluster state: |
| |
| - A list of tablet servers, their host names, and the time of their last heartbeat. |
| - A list of tables, including schema and tablet location information for each. |
| - SQL code which you can paste into Impala Shell to add an existing table to Impala's list of known data sources. |
| |
| === Kudu Tablet Server Web Interface |
| |
| Each tablet server serves a web interface on port 8050. The interface exposes information |
| about each tablet hosted on the server, its current state, and debugging information |
| about maintenance background operations. |
| |
| === Common Web Interface Pages |
| |
| Both Kudu masters and tablet servers expose a common set of information via their web interfaces: |
| |
| - HTTP access to server logs. |
| - an `/rpcz` endpoint which lists currently running RPCs via JSON. |
| - pages giving an overview and detailed information on the memory usage of different |
| components of the process. |
| - information on the current set of configuration flags. |
| - information on the currently running threads and their resource consumption. |
| - a JSON endpoint exposing metrics about the server. |
| - information on the deployed version number of the daemon. |
| |
| These interfaces are linked from the landing page of each daemon's web UI. |
| |
| == Kudu Metrics |
| |
| Kudu daemons expose a large number of metrics. Some metrics are associated with an entire |
| server process, whereas others are associated with a particular tablet replica. |
| |
| === Listing available metrics |
| |
| The full set of available metrics for a Kudu server can be dumped via a special command |
| line flag: |
| |
| [source,bash] |
| ---- |
| $ kudu-tserver --dump_metrics_json |
| $ kudu-master --dump_metrics_json |
| ---- |
| |
| This will output a large JSON document. Each metric indicates its name, label, description, |
| units, and type. Because the output is JSON-formatted, this information can easily be |
| parsed and fed into other tooling which collects metrics from Kudu servers. |
| |
| === Collecting metrics via HTTP |
| |
| Metrics can be collected from a server process via its HTTP interface by visiting |
| `/metrics`. The output of this page is JSON for easy parsing by monitoring services. |
| This endpoint accepts several `GET` parameters in its query string: |
| |
| - `/metrics?metrics=<substring1>,<substring2>,...` - limits the returned metrics to those which contain |
| at least one of the provided substrings. The substrings also match entity names, so this |
| may be used to collect metrics for a specific tablet. |
| |
| - `/metrics?include_schema=1` - includes metrics schema information such as unit, description, |
| and label in the JSON output. This information is typically elided to save space. |
| |
| - `/metrics?compact=1` - eliminates unnecessary whitespace from the resulting JSON, which can decrease |
| bandwidth when fetching this page from a remote host. |
| |
| - `/metrics?include_raw_histograms=1` - include the raw buckets and values for histogram metrics, |
| enabling accurate aggregation of percentile metrics over time and across hosts. |
| |
| For example: |
| |
| [source,bash] |
| ---- |
| $ curl -s 'http://example-ts:8050/metrics?include_schema=1&metrics=connections_accepted' |
| ---- |
| |
| [source,json] |
| ---- |
| [ |
| { |
| "type": "server", |
| "id": "kudu.tabletserver", |
| "attributes": {}, |
| "metrics": [ |
| { |
| "name": "rpc_connections_accepted", |
| "label": "RPC Connections Accepted", |
| "type": "counter", |
| "unit": "connections", |
| "description": "Number of incoming TCP connections made to the RPC server", |
| "value": 92 |
| } |
| ] |
| } |
| ] |
| ---- |
| |
| [source,bash] |
| ---- |
| $ curl -s 'http://example-ts:8050/metrics?metrics=log_append_latency' |
| ---- |
| |
| [source,json] |
| ---- |
| [ |
| { |
| "type": "tablet", |
| "id": "c0ebf9fef1b847e2a83c7bd35c2056b1", |
| "attributes": { |
| "table_name": "lineitem", |
| "partition": "hash buckets: (55), range: [(<start>), (<end>))", |
| "table_id": "" |
| }, |
| "metrics": [ |
| { |
| "name": "log_append_latency", |
| "total_count": 7498, |
| "min": 4, |
| "mean": 69.3649, |
| "percentile_75": 29, |
| "percentile_95": 38, |
| "percentile_99": 45, |
| "percentile_99_9": 95, |
| "percentile_99_99": 167, |
| "max": 367244, |
| "total_sum": 520098 |
| } |
| ] |
| } |
| ] |
| ---- |
| |
| NOTE: All histograms and counters are measured since the server start time, and are not reset upon collection. |
| |
| === Collecting metrics to a log |
| |
| Kudu may be configured to periodically dump all of its metrics to a local log file using the |
| `--metrics_log_interval_ms` flag. Set this flag to the interval at which metrics should be written |
| to a log file. |
| |
| The metrics log will be written to the same directory as the other Kudu log files, with the same |
| naming format. After any metrics log file reaches 64MB uncompressed, the log will be rolled and |
| the previous file will be gzip-compressed. |
| |
| The log file generated has three space-separated fields. The first field is the word |
| `metrics`. The second field is the current timestamp in microseconds since the Unix epoch. |
| The third is the current value of all metrics on the server, using a compact JSON encoding. |
| The encoding is the same as the metrics fetched via HTTP described above. |
| |
| WARNING: Although metrics logging automatically rolls and compresses previous log files, it does |
| not remove old ones. Since metrics logging can use significant amounts of disk space, |
| consider setting up a system utility to monitor space in the log directory and archive or |
| delete old segments. |
| |
| == Common Kudu workflows |
| |
| [[migrate_to_multi_master]] |
| === Migrating to Multiple Kudu Masters |
| |
| For high availability and to avoid a single point of failure, Kudu clusters should be created with |
| multiple masters. Many Kudu clusters were created with just a single master, either for simplicity |
| or because Kudu multi-master support was still experimental at the time. This workflow demonstrates |
| how to migrate to a multi-master configuration. |
| |
| WARNING: The workflow is unsafe for adding new masters to an existing multi-master configuration. |
| Do not use it for that purpose. |
| |
| WARNING: The workflow presupposes at least basic familiarity with Kudu configuration management. If |
| using Cloudera Manager (CM), the workflow also presupposes familiarity with it. |
| |
| WARNING: All of the command line steps below should be executed as the Kudu UNIX user, typically |
| `kudu`. |
| |
| ==== Prepare for the migration |
| |
| . Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster |
| will be unavailable. |
| |
| . Decide how many masters to use. The number of masters should be odd. Three or five node master |
| configurations are recommendeded; they can tolerate one or two failures respectively. |
| |
| . Perform the following preparatory steps for the existing master: |
| * Identify and record the directory where the master's data lives. If using Kudu system packages, |
| the default value is /var/lib/kudu/master, but it may be customized via the `fs_wal_dir` |
| configuration parameter. |
| * Identify and record the port the master is using for RPCs. The default port value is 7051, but it |
| may have been customized using the `rpc_bind_addresses` configuration parameter. |
| * Identify the master's UUID. It can be fetched using the following command: |
| + |
| [source,bash] |
| ---- |
| $ kudu fs dump uuid --fs_wal_dir=<master_data_dir> 2>/dev/null |
| ---- |
| master_data_dir:: existing master's previously recorded data directory |
| + |
| [source,bash] |
| Example:: |
| + |
| ---- |
| $ kudu fs dump uuid --fs_wal_dir=/var/lib/kudu/master 2>/dev/null |
| 4aab798a69e94fab8d77069edff28ce0 |
| ---- |
| + |
| * Optional: configure a DNS alias for the master. The alias could be a DNS cname (if the machine |
| already has an A record in DNS), an A record (if the machine is only known by its IP address), |
| or an alias in /etc/hosts. The alias should be an abstract representation of the master (e.g. |
| `master-1`). |
| + |
| WARNING: Without DNS aliases it is not possible to recover from permanent master failures, and as |
| such it is highly recommended. |
| + |
| . Perform the following preparatory steps for each new master: |
| * Choose an unused machine in the cluster. The master generates very little load so it can be |
| colocated with other data services or load-generating processes, though not with another Kudu |
| master from the same configuration. |
| * Ensure Kudu is installed on the machine, either via system packages (in which case the `kudu` and |
| `kudu-master` packages should be installed), or via some other means. |
| * Choose and record the directory where the master's data will live. |
| * Choose and record the port the master should use for RPCs. |
| * Optional: configure a DNS alias for the master (e.g. `master-2`, `master-3`, etc). |
| |
| ==== Perform the migration |
| |
| . Stop all the Kudu processes in the entire cluster. |
| |
| . Format the data directory on each new master machine, and record the generated UUID. Use the |
| following command sequence: |
| + |
| [source,bash] |
| ---- |
| $ kudu fs format --fs_wal_dir=<master_data_dir> |
| $ kudu fs dump uuid --fs_wal_dir=<master_data_dir> 2>/dev/null |
| ---- |
| + |
| master_data_dir:: new master's previously recorded data directory |
| + |
| [source,bash] |
| Example:: |
| + |
| ---- |
| $ kudu fs format --fs_wal_dir=/var/lib/kudu/master |
| $ kudu fs dump uuid --fs_wal_dir=/var/lib/kudu/master 2>/dev/null |
| f5624e05f40649b79a757629a69d061e |
| ---- |
| |
| . If using CM, add the new Kudu master roles now, but do not start them. |
| * If using DNS aliases, override the empty value of the `Master Address` parameter for each role |
| (including the existing master role) with that master's alias. |
| * Add the port number (separated by a colon) if using a non-default RPC port value. |
| |
| . Rewrite the master's Raft configuration with the following command, executed on the existing |
| master machine: |
| + |
| [source,bash] |
| ---- |
| $ kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=<master_data_dir> <tablet_id> <all_masters> |
| ---- |
| + |
| master_data_dir:: existing master's previously recorded data directory |
| tablet_id:: must be the string `00000000000000000000000000000000` |
| all_masters:: space-separated list of masters, both new and existing. Each entry in the list must be |
| a string of the form `<uuid>:<hostname>:<port>` |
| uuid::: master's previously recorded UUID |
| hostname::: master's previously recorded hostname or alias |
| port::: master's previously recorded RPC port number |
| + |
| [source,bash] |
| Example:: |
| + |
| ---- |
| $ kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/var/lib/kudu/master 00000000000000000000000000000000 4aab798a69e94fab8d77069edff28ce0:master-1:7051 f5624e05f40649b79a757629a69d061e:master-2:7051 988d8ac6530f426cbe180be5ba52033d:master-3:7051 |
| ---- |
| |
| . Start the existing master. |
| |
| . Copy the master data to each new master with the following command, executed on each new master |
| machine: |
| + |
| [source,bash] |
| ---- |
| $ kudu local_replica copy_from_remote --fs_wal_dir=<master_data_dir> <tablet_id> <existing_master> |
| ---- |
| + |
| master_data_dir:: new master's previously recorded data directory |
| tablet_id:: must be the string `00000000000000000000000000000000` |
| existing_master:: RPC address of the existing master and must be a string of the form |
| `<hostname>:<port>` |
| hostname::: existing master's previously recorded hostname or alias |
| port::: existing master's previously recorded RPC port number |
| + |
| [source,bash] |
| Example:: |
| + |
| ---- |
| $ kudu local_replica copy_from_remote --fs_wal_dir=/var/lib/kudu/master 00000000000000000000000000000000 master-1:7051 |
| ---- |
| |
| . Start all of the new masters. |
| + |
| WARNING: Skip the next step if using CM. |
| + |
| . Modify the value of the `tserver_master_addrs` configuration parameter for each tablet server. |
| The new value must be a comma-separated list of masters where each entry is a string of the form |
| `<hostname>:<port>` |
| hostname:: master's previously recorded hostname or alias |
| port:: master's previously recorded RPC port number |
| |
| . Start all of the tablet servers. |
| |
| Congratulations, the cluster has now been migrated to multiple masters! To verify that all masters |
| are working properly, consider performing the following sanity checks: |
| |
| * Using a browser, visit each master's web UI. Look at the /masters page. All of the masters should |
| be listed there with one master in the LEADER role and the others in the FOLLOWER role. The |
| contents of /masters on each master should be the same. |
| |
| * Run a Kudu system check (ksck) on the cluster using the `kudu` command line tool. Help for ksck |
| can be viewed via `kudu cluster ksck --help`. |
| |
| === Recovering from a dead Kudu Master in a Multi-Master Deployment |
| |
| Kudu multi-master deployments function normally in the event of a master loss. However, it is |
| important to replace the dead master; otherwise a second failure may lead to a loss of availability, |
| depending on the number of available masters. This workflow describes how to replace the dead |
| master. |
| |
| Due to https://issues.apache.org/jira/browse/KUDU-1620[KUDU-1620], it is not possible to perform |
| this workflow without also restarting the live masters. As such, the workflow requires a |
| maintenance window, albeit a brief one as masters generally restart quickly. |
| |
| WARNING: Kudu does not yet support Raft configuration changes for masters. As such, it is only |
| possible to replace a master if the deployment was created with DNS aliases. See the |
| <<migrate_to_multi_master,multi-master migration workflow>> for more details. |
| |
| WARNING: The workflow presupposes at least basic familiarity with Kudu configuration management. If |
| using Cloudera Manager (CM), the workflow also presupposes familiarity with it. |
| |
| WARNING: All of the command line steps below should be executed as the Kudu UNIX user, typically |
| `kudu`. |
| |
| ==== Prepare for the recovery |
| |
| . Ensure that the dead master is well and truly dead. Take whatever steps needed to prevent it from |
| accidentally restarting; this can be quite dangerous for the cluster post-recovery. |
| |
| . Choose one of the remaining live masters to serve as a basis for recovery. The rest of this |
| workflow will refer to this master as the "reference" master. |
| |
| . Choose an unused machine in the cluster where the new master will live. The master generates very |
| little load so it can be colocated with other data services or load-generating processes, though |
| not with another Kudu master from the same configuration. The rest of this workflow will refer to |
| this master as the "replacement" master. |
| |
| . Perform the following preparatory steps for the replacement master: |
| * Ensure Kudu is installed on the machine, either via system packages (in which case the `kudu` and |
| `kudu-master` packages should be installed), or via some other means. |
| * Choose and record the directory where the master's data will live. |
| |
| . Perform the following preparatory steps for each live master: |
| * Identify and record the directory where the master's data lives. If using Kudu system packages, |
| the default value is /var/lib/kudu/master, but it may be customized via the `fs_wal_dir` |
| configuration parameter. |
| * Identify and record the master's UUID. It can be fetched using the following command: |
| + |
| [source,bash] |
| ---- |
| $ kudu fs dump uuid --fs_wal_dir=<master_data_dir> 2>/dev/null |
| ---- |
| master_data_dir:: live master's previously recorded data directory |
| + |
| [source,bash] |
| Example:: |
| + |
| ---- |
| $ kudu fs dump uuid --fs_wal_dir=/var/lib/kudu/master 2>/dev/null |
| 80a82c4b8a9f4c819bab744927ad765c |
| ---- |
| + |
| . Perform the following preparatory steps for the reference master: |
| * Identify and record the directory where the master's data lives. If using Kudu system packages, |
| the default value is /var/lib/kudu/master, but it may be customized via the `fs_wal_dir` |
| configuration parameter. |
| * Identify and record the UUIDs of every master in the cluster, using the following command: |
| + |
| [source,bash] |
| ---- |
| $ kudu local_replica cmeta print_replica_uuids --fs_wal_dir=<master_data_dir> <tablet_id> 2>/dev/null |
| ---- |
| master_data_dir:: reference master's previously recorded data directory |
| tablet_id:: must be the string `00000000000000000000000000000000` |
| + |
| [source,bash] |
| Example:: |
| + |
| ---- |
| $ kudu local_replica cmeta print_replica_uuids --fs_wal_dir=/var/lib/kudu/master 00000000000000000000000000000000 2>/dev/null |
| 80a82c4b8a9f4c819bab744927ad765c 2a73eeee5d47413981d9a1c637cce170 1c3f3094256347528d02ec107466aef3 |
| ---- |
| + |
| . Using the two previously-recorded lists of UUIDs (one for all live masters and one for all |
| masters), determine and record (by process of elimination) the UUID of the dead master. |
| |
| ==== Perform the recovery |
| |
| . Format the data directory on the replacement master machine using the previously recorded |
| UUID of the dead master. Use the following command sequence: |
| + |
| [source,bash] |
| ---- |
| $ kudu fs format --fs_wal_dir=<master_data_dir> --uuid=<uuid> |
| ---- |
| + |
| master_data_dir:: replacement master's previously recorded data directory |
| uuid:: dead master's previously recorded UUID |
| + |
| [source,bash] |
| Example:: |
| + |
| ---- |
| $ kudu fs format --fs_wal_dir=/var/lib/kudu/master --uuid=80a82c4b8a9f4c819bab744927ad765c |
| ---- |
| + |
| . Copy the master data to the replacement master with the following command: |
| + |
| [source,bash] |
| ---- |
| $ kudu local_replica copy_from_remote --fs_wal_dir=<master_data_dir> <tablet_id> <reference_master> |
| ---- |
| + |
| master_data_dir:: replacement master's previously recorded data directory |
| tablet_id:: must be the string `00000000000000000000000000000000` |
| reference_master:: RPC address of the reference master and must be a string of the form |
| `<hostname>:<port>` |
| hostname::: reference master's previously recorded hostname or alias |
| port::: reference master's previously recorded RPC port number |
| + |
| [source,bash] |
| Example:: |
| + |
| ---- |
| $ kudu local_replica copy_from_remote --fs_wal_dir=/var/lib/kudu/master 00000000000000000000000000000000 master-2:7051 |
| ---- |
| + |
| . If using CM, add the replacement Kudu master role now, but do not start it. |
| * Override the empty value of the `Master Address` parameter for the new role with the replacement |
| master's alias. |
| * Add the port number (separated by a colon) if using a non-default RPC port value. |
| |
| . Reconfigure the DNS alias for the dead master to point at the replacement master. |
| |
| . Start the replacement master. |
| |
| . Restart the existing live masters. This results in a brief availability outage, but it should |
| last only as long as it takes for the masters to come back up. |
| |
| Congratulations, the dead master has been replaced! To verify that all masters are working properly, |
| consider performing the following sanity checks: |
| |
| * Using a browser, visit each master's web UI. Look at the /masters page. All of the masters should |
| be listed there with one master in the LEADER role and the others in the FOLLOWER role. The |
| contents of /masters on each master should be the same. |
| |
| * Run a Kudu system check (ksck) on the cluster using the `kudu` command line tool. Help for ksck |
| can be viewed via `kudu cluster ksck --help`. |