Smoosh

Smoosh is CouchDB's auto-compaction daemon. It is notified when databases and views are updated and may then elect to enqueue them for compaction.

API

All API functions live in smoosh.erl. Only the functions exported from this module should be called from outside the smoosh application.

Smoosh also responds to configuration changes at runtime, and changing its configuration is the principal way to interact with it.

Top-Level Settings

The main settings one interacts with are:

Sometimes it's necessary to use the following:

Channel Settings

A channel has several important settings that control runtime behavior.

Structure

Smoosh consists of a central gen_server (smoosh_server) that manages a number of subordinate smoosh_channel gen_servers. This arrangement is not yet properly managed by OTP.

Compaction Scheduling Algorithm

Smoosh decides whether to compact a database or view by evaluating the item against the selection criteria of each channel, in the order the channels are configured. By default there are three channels for databases (“ratio_dbs”, “slack_dbs” and “upgrade_dbs”) and three channels for views (“ratio_views”, “slack_views” and “upgrade_views”). The “cleanup_channels” setting contains only the “index_cleanup” channel, which is used to enqueue stale index file cleanup jobs.

Smoosh enqueues the item in the first channel that accepts it. If no channel accepts it, the item is not enqueued for compaction.
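The first-match rule above can be sketched as a short recursive function. This is a hypothetical illustration, not smoosh's actual code: the module name smoosh_select_sketch, the function select_channel/2 and the modelling of each channel as a {Name, AcceptFun} pair are all assumptions made for the example.

```erlang
-module(smoosh_select_sketch).
-export([select_channel/2]).

%% Hypothetical sketch, not smoosh's actual code: each channel is modelled
%% as a {Name, AcceptFun} pair, where AcceptFun(Item) returns true when the
%% item meets that channel's selection criteria.
-spec select_channel(Item :: term(),
                     Channels :: [{string(), fun((term()) -> boolean())}]) ->
    {ok, string()} | not_enqueued.
%% No channel accepted the item, so it is not enqueued for compaction.
select_channel(_Item, []) ->
    not_enqueued;
%% Try channels in configured order; the first one that accepts wins.
select_channel(Item, [{Name, Accepts} | Rest]) ->
    case Accepts(Item) of
        true -> {ok, Name};
        false -> select_channel(Item, Rest)
    end.
```

Because the list is walked in order, reordering the configured channels changes which channel claims an item that satisfies several of them. Any thresholds passed in the accept funs (for example a ratio of 2.0 or 512 MiB of slack) are illustrative values, not settings taken from this document.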

Notes on the data_size value

Every database and view shard has an active size value. In CouchDB, this value accurately reflects the file size after compaction, including the b+tree metadata and database footer overhead.
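Since the active size approximates the post-compaction file size, comparing it with the current file size estimates how much a compaction would reclaim. The sketch below computes two such measures: a ratio of file size to active size, and the slack in bytes between them. The names echo the ratio_* and slack_* channels, but the module smoosh_priority_sketch and its functions are hypothetical illustrations, not smoosh's implementation; returning undefined for a zero active size is likewise an assumption of this sketch.

```erlang
-module(smoosh_priority_sketch).
-export([ratio/2, slack/2]).

%% Hypothetical helpers, not smoosh's actual code. FileSize is the shard's
%% on-disk file size and ActiveSize its active size, both in bytes.

%% File size relative to active size: 3.0 means the file is roughly three
%% times larger than it would be after compaction.
-spec ratio(non_neg_integer(), non_neg_integer()) -> float() | undefined.
ratio(_FileSize, 0) ->
    undefined;  % assumption: an empty shard has no meaningful ratio
ratio(FileSize, ActiveSize) ->
    FileSize / ActiveSize.

%% Slack: an estimate of the bytes a compaction could reclaim.
-spec slack(non_neg_integer(), non_neg_integer()) -> integer().
slack(FileSize, ActiveSize) ->
    FileSize - ActiveSize.
```

A ratio is scale-independent, so it flags small shards that are mostly garbage, while slack is an absolute byte count, so it favours large shards where even a modest fraction of waste is a lot of disk.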

Example config commands

Change the set of database channels:

config:set("smoosh", "db_channels", "small_dbs,medium_dbs,large_dbs").

Change the set of database channels on all live nodes in the cluster:

rpc:multicall(config, set, ["smoosh", "db_channels", "small_dbs,medium_dbs,large_dbs"]).
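The db_channels value is a single comma-separated string, and its order matters because channels are evaluated in the order they are configured. The sketch below shows one way such a string could be split into an ordered list of channel names. It is a hypothetical helper (smoosh_channels_sketch:parse_channels/1 is not part of smoosh), and the whitespace trimming and dropping of empty entries are assumptions of this example rather than documented smoosh behaviour.

```erlang
-module(smoosh_channels_sketch).
-export([parse_channels/1]).

%% Hypothetical helper, not smoosh's actual parser: split a comma-separated
%% channel setting such as "small_dbs,medium_dbs,large_dbs" into an ordered
%% list of names. Order is preserved, since channels are evaluated in the
%% order they are configured.
-spec parse_channels(string()) -> [string()].
parse_channels(Value) ->
    %% Trim whitespace around each entry (an assumption of this sketch).
    Trimmed = [string:trim(Name) || Name <- string:split(Value, ",", all)],
    %% Drop empty entries, e.g. from a trailing comma.
    [Name || Name <- Trimmed, Name =/= ""].
```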

Change the concurrency of the ratio_dbs database channel to 2:

config:set("smoosh.ratio_dbs", "concurrency", "2").

Change it on all live nodes in the cluster:

rpc:multicall(config, set, ["smoosh.ratio_dbs", "concurrency", "2"]).

Example API commands

smoosh:status()

This prints the state of each channel: how many jobs it is currently running and how many jobs are enqueued, along with the lowest and highest priority of the enqueued items. The goal is to give an operator enough insight, at a glance, to judge whether smoosh is adequately targeting the reclaimable space in the cluster. In general, a healthy status output will have items in the ratio_dbs and ratio_views channels. Owing to the default settings, the slack_dbs and slack_views channels will almost certainly contain items as well. Historically, we have not found the slack channels, on their own, to be particularly effective at keeping things well compacted.

smoosh:enqueue_all_dbs(), smoosh:enqueue_all_views()

These functions do just what their names say, but you should not generally need to call them, because smoosh is designed to be autonomous. Call them if you are alerted to a disk space issue; they might well help. If they do help, that indicates a bug in smoosh, since it should already have enqueued eligible shards once they met the configured settings.