Couch Scanner

An application that scans the cluster, with a plugin system to report various
things about databases and documents. The initial idea was to use something
like this to scan all the JavaScript design docs to check for compatibility
with the new QuickJS engine. It has since been split apart from the QuickJS
branch and made into a separate pull request.

The current implementation includes two plugins:
  * couch_scanner_plugin_find : scan for regexes in doc bodies
  * couch_scanner_plugin_ddoc_features : report various design doc features

A more detailed description is in the README.md file. The plugin API is defined
in the `couch_scanner_plugin` module. There are additional details in the
comments in the included Erlang modules. What follows is a summary of some of
the implementation details and features.

Plugins are managed as individual processes by the `couch_scanner_server` with
the `start_link/1` and `stop/1` functions. After a plugin runner process is
spawned, `couch_scanner_server` waits for it to exit. If the process exits with
an error, it is penalized with an exponential back-off. It may also exit with a
special `{shutdown, {reschedule, TSec}}` value, in which case it is rescheduled
to run again on or after the `TSec` time.
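
To make the two exit paths concrete, here is a rough Python sketch of the
decision the server has to make when a runner exits. It is not the Erlang
implementation: the `next_start` name, the 60 second base delay, the one day
cap, the reset of the error count on reschedule, and the choice to treat every
other exit as an error are all assumptions made for illustration.

```python
def next_start(now, exit_reason, error_count, base_sec=60, max_sec=86400):
    """Return (start_time, new_error_count) after a plugin runner exits.

    `exit_reason` mimics the Erlang exit value as nested tuples, e.g.
    ("shutdown", ("reschedule", tsec)). base_sec and max_sec are
    hypothetical values, not taken from the implementation.
    """
    is_reschedule = (
        isinstance(exit_reason, tuple)
        and len(exit_reason) == 2
        and exit_reason[0] == "shutdown"
        and isinstance(exit_reason[1], tuple)
        and len(exit_reason[1]) == 2
        and exit_reason[1][0] == "reschedule"
    )
    if is_reschedule:
        # Run again on or after the requested time; assume errors are forgiven.
        tsec = exit_reason[1][1]
        return max(now, tsec), 0
    # Anything else is treated as an error here: double the delay per
    # consecutive failure, up to the cap.
    error_count += 1
    delay = min(base_sec * 2 ** (error_count - 1), max_sec)
    return now + delay, error_count
```

With these assumed values, three consecutive errors would push the next start
out by 60, 120 and then 240 seconds.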

After the plugin process starts, it will load and validate its plugin
module. Then, it will start scanning all the dbs and docs on the local node.
Each shard range will be scanned on only one of the cluster nodes to avoid
duplicating work. For instance, if there are 2 shard ranges, `0-7` and `8-f`,
with copies on nodes `n1`, `n2`, `n3`, then `0-7` might be scanned on `n1` only,
and `8-f` on `n3`.
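
One way to get this "exactly one node per range" behavior without any
coordination is for every node to apply the same deterministic rule to the
same inputs. The Python sketch below hashes the range name to pick an owner.
The actual selection rule is not described here, so the hashing scheme and the
`owner` / `should_scan` names are assumptions.

```python
import hashlib


def owner(shard_range, nodes):
    """Pick the single node that scans shard_range.

    Every node computes the same answer because the node list is sorted
    and the hash depends only on the range name. (Hypothetical rule.)
    """
    ordered = sorted(nodes)
    digest = hashlib.sha256(shard_range.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ordered)
    return ordered[index]


def should_scan(local_node, shard_range, nodes):
    """True only on the node that owns the range."""
    return owner(shard_range, nodes) == local_node
```

Each node would call `should_scan` with its own name and skip any range it
does not own.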

During various events the plugin process will call into the plugin module: on
startup, when resuming from a checkpoint, when checkpointing, when processing a
new db, a design doc, or a document, and when completing a scan. The plugin may
accumulate reporting data, or may indicate that some parts of the scan should
be skipped, or that the scanning session should be reset.

By default all plugins are disabled. Plugins are enabled and managed via the
config system. To enable a plugin, add a `$plugin = true` entry in the
`[couch_scanner_plugins]` section. For example:
```
[couch_scanner_plugins]
couch_scanner_plugin_ddoc_features = true
```

Plugins can be configured to run on or after a particular date and time or to
run periodically. That can be configured via `[$plugin] after = ...` and
`[$plugin] repeat = ...` settings. For instance, to run after 2024-03-20T15:00
and then run every Monday:

```
[couch_scanner_plugin_ddoc_features]
after = 2024-03-20T15:00
repeat = monday
```

The default value for both `after` and `repeat` is `restart`, meaning the
plugin runs once after the node starts up.
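
As an illustration of how `after` and a weekday `repeat` could combine, here is
a small Python sketch that computes the next allowed start time. The exact
scheduling semantics are an assumption: it assumes a weekly repeat fires at the
start of the named day following the last run, and it does not model the
`restart` default. The `next_run` function is hypothetical.

```python
from datetime import datetime, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def next_run(now, after=None, repeat=None, last_run=None):
    """Return the earliest datetime the plugin may run, or None if done."""
    earliest = after if after is not None else now
    if last_run is None:
        # First run: on or after the configured `after` time.
        return max(now, earliest)
    if repeat is None:
        # Already ran once and no repeat is configured.
        return None
    # Next occurrence of the weekday strictly after the day of the last run.
    days_ahead = (WEEKDAYS.index(repeat) - last_run.weekday()) % 7 or 7
    candidate = (last_run + timedelta(days=days_ahead)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return max(candidate, earliest)
```

For the config above, a first run on Wednesday 2024-03-20 at 15:00 would be
followed, under these assumptions, by a run on Monday 2024-03-25.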

To prevent the plugins from consuming too many resources, there is a simple
rate limiter which limits how many databases, shards and documents can be
processed by all the plugins. Rate limits are configurable:
```
[couch_scanner]
db_rate_limit = 50
shard_rate_limit = 50
doc_rate_limit = 500
```
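
For readers unfamiliar with rate limiting, the general idea can be sketched as
a token bucket, shown below in Python. This is not necessarily the algorithm
the scanner uses, and treating the configured numbers as operations per second
is an assumption. The `RateLimiter` class is hypothetical.

```python
import time


class RateLimiter:
    """Token bucket: refills at `rate` tokens per second, bursts up to `rate`."""

    def __init__(self, rate, clock=time.monotonic):
        self.rate = float(rate)
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = float(rate)   # start with a full bucket
        self.updated = clock()

    def try_acquire(self):
        """Consume one token if available; return whether the caller may proceed."""
        now = self.clock()
        refill = (now - self.updated) * self.rate
        self.tokens = min(self.rate, self.tokens + refill)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One limiter per resource type, shared by all plugins (values from the
# config example above; the per-second unit is assumed).
limiters = {
    "db": RateLimiter(50),
    "shard": RateLimiter(50),
    "doc": RateLimiter(500),
}
```

A plugin runner would call `limiters["doc"].try_acquire()` before processing
each document and wait briefly when it returns `False`.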