| .. Licensed under the Apache License, Version 2.0 (the "License"); you may not |
| .. use this file except in compliance with the License. You may obtain a copy of |
| .. the License at |
| .. |
| .. http://www.apache.org/licenses/LICENSE-2.0 |
| .. |
| .. Unless required by applicable law or agreed to in writing, software |
| .. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT |
| .. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the |
| .. License for the specific language governing permissions and limitations under |
| .. the License. |
| |
| .. _cluster/troubleshooting: |
| |
| ============================================ |
| Troubleshooting CouchDB 3 with WeatherReport |
| ============================================ |
| |
| .. _cluster/troubleshooting/overview: |
| |
| Overview |
| ======== |
| |
| WeatherReport is an OTP application and set of tools that diagnoses |
| common problems which could affect a CouchDB version 3 node or cluster |
| (version 4 or later is not supported). It is accessed via the |
| ``weatherreport`` command line escript. |
| |
| Here is a basic example of using ``weatherreport`` followed immediately |
| by the command's output: |
| |
| .. code-block:: bash |
| |
| $ weatherreport --etc /path/to/etc |
| [warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down. |
| |
| .. _cluster/troubleshooting/usage: |
| |
| Usage |
| ===== |
| |
| For most cases, you can just run the ``weatherreport`` command as |
| shown above. However, sometimes you might want to know some extra |
| detail, or run only specific checks. For that, there are command-line |
| options. Execute ``weatherreport --help`` to learn more about these |
| options: |
| |
| .. code-block:: bash |
| |
| $ weatherreport --help |
| Usage: weatherreport [-c <path>] [-d <level>] [-e] [-h] [-l] [check_name ...] |
| |
| -c, --etc Path to the CouchDB configuration directory |
| -d, --level Minimum message severity level (default: notice) |
| -l, --list Describe available diagnostic tasks |
| -e, --expert Perform more detailed diagnostics |
| -h, --help Display help/usage |
| check_name A specific check to run |
| |
| To get an idea of what checks will be run, use the `--list` option: |
| |
| .. code-block:: bash |
| |
| $ weatherreport --list |
| Available diagnostic checks: |
| |
| custodian Shard safety/liveness checks |
| disk Data directory permissions and atime |
| internal_replication Check the number of pending internal replication jobs |
| ioq Check the total number of active IOQ requests |
| mem3_sync Check there is a registered mem3_sync process |
| membership Cluster membership validity |
| memory_use Measure memory usage |
| message_queues Check for processes with large mailboxes |
| node_stats Check useful erlang statistics for diagnostics |
| nodes_connected Cluster node liveness |
| process_calls Check for large numbers of processes with the same current/initial call |
| process_memory Check for processes with high memory usage |
| safe_to_rebuild Check whether the node can safely be taken out of service |
| search Check the local search node is responsive |
| tcp_queues Measure the length of tcp queues in the kernel |
| |
| If you want all the gory details about what WeatherReport is doing, |
| you can run the checks at a more verbose logging level with |
| the ``--level`` option: |
| |
| .. code-block:: bash |
| |
| $ weatherreport --etc /path/to/etc --level debug |
| [debug] Not connected to the local cluster node, trying to connect. alive:false connect_failed:undefined |
| [debug] Starting distributed Erlang. |
| [debug] Connected to local cluster node 'node1@127.0.0.1'. |
| [debug] Local RPC: mem3:nodes([]) [5000] |
| [debug] Local RPC: os:getpid([]) [5000] |
| [debug] Running shell command: ps -o pmem,rss -p 73905 |
| [debug] Shell command output: |
| %MEM RSS |
| 0.3 25116 |
| |
| [debug] Local RPC: erlang:nodes([]) [5000] |
| [debug] Local RPC: mem3:nodes([]) [5000] |
| [warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down. |
| [info] Process is using 0.3% of available RAM, totalling 25116 KB of real memory. |
| |
| Most times you'll want to use the defaults, but any syslog severity |
| name will do (from most to least verbose): ``debug, info, notice, |
| warning, error, critical, alert, emergency``. |
| |
| Finally, if you want to run just a single diagnostic or a list of |
| specific ones, you can pass their name(s): |
| |
| .. code-block:: bash |
| |
| $ weatherreport --etc /path/to/etc nodes_connected |
| [warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down. |