20. Cassandra Mailbox object consistency

Date: 2020-02-27

Status

Accepted (lazy consensus) & implemented

Context

Mailboxes are denormalized in Cassandra in order to access them both by their immutable identifier and their mailbox path (name):

  • mailbox table stores mailboxes by their immutable identifier
  • mailboxPathV2 table stores mailboxes by their mailbox path

We furthermore maintain two invariants on top of these tables:

  • mailboxPath uniqueness. Each mailbox path can be used at most once. This is ensured by writing the mailbox path first using Lightweight Transactions.
  • mailboxId uniqueness. Each mailbox identifier is used by only a single path. We have no real way to ensure a given mailbox is not referenced by two paths.

Failures during the denormalization process will lead to inconsistencies between the two tables.

This can lead to the following user experience:

BOB creates mailbox A
Denormalization fails and an error is returned to BOB

BOB retries mailbox A creation
BOB is being told mailbox A already exists

BOB tries to access mailbox A
BOB is being told mailbox A does not exist
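
The scenario above can be reproduced with a small in-memory model. This is only a sketch, not James code: the two maps stand in for the mailboxPathV2 and mailbox tables, `putIfAbsent` plays the role of the Lightweight Transaction, and the class name, the `failMailboxWrite` switch and the method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class DenormalizationFailureSketch {
    // Stand-in for the mailboxPathV2 table: path -> mailbox id.
    final ConcurrentHashMap<String, String> mailboxPathV2 = new ConcurrentHashMap<>();
    // Stand-in for the mailbox table: mailbox id -> path.
    final Map<String, String> mailbox = new HashMap<>();
    // Hypothetical fault injection: simulates a failed write to the mailbox table.
    boolean failMailboxWrite = false;
    private int nextId = 1;

    String create(String path) {
        String id = "id" + nextId++;
        // The path is written first; putIfAbsent mimics an "insert if not exists".
        if (mailboxPathV2.putIfAbsent(path, id) != null) {
            throw new IllegalStateException("mailbox " + path + " already exists");
        }
        if (failMailboxWrite) {
            // The path registration is already stored and is not rolled back.
            throw new RuntimeException("write to mailbox table failed");
        }
        mailbox.put(id, path);
        return id;
    }

    Optional<String> find(String path) {
        String id = mailboxPathV2.get(path);
        // A registration pointing to a missing mailbox row looks like "no mailbox".
        return (id == null || !mailbox.containsKey(id)) ? Optional.empty() : Optional.of(id);
    }

    public static void main(String[] args) {
        DenormalizationFailureSketch store = new DenormalizationFailureSketch();
        store.failMailboxWrite = true;
        try {
            store.create("bob/A");
        } catch (RuntimeException e) {
            System.out.println("first attempt: " + e.getMessage());
        }
        store.failMailboxWrite = false;
        try {
            store.create("bob/A");
        } catch (IllegalStateException e) {
            System.out.println("retry: " + e.getMessage());
        }
        System.out.println("lookup found a mailbox: " + store.find("bob/A").isPresent());
    }
}
```

In this model the first failure leaves an orphan path registration behind, which is why the retry is rejected while the lookup still finds nothing.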

Decision

We should provide an offline (meaning absence of user traffic via, for example, SMTP, IMAP or JMAP) webadmin task to solve mailbox object inconsistencies.

This task will read the mailbox table and adapt path registrations in mailboxPathV2:

  • Missing registrations will be added
  • Orphan registrations will be removed
  • Mismatches in content between the two tables will require merging the two mailboxes together.
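
The three rules above can be sketched as a planning function over in-memory copies of the two tables. This is a simplified illustration under stated assumptions, not the implemented task: the class name, the string-based action format and the two-pass ordering are hypothetical, and mailboxes and paths are reduced to plain strings.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class MailboxReconciliationSketch {

    // pathByMailboxId models the mailbox table, mailboxIdByPath models mailboxPathV2.
    static Set<String> plan(Map<String, String> pathByMailboxId, Map<String, String> mailboxIdByPath) {
        Set<String> actions = new LinkedHashSet<>();
        // Working copy of the registrations, updated as the plan is built.
        Map<String, String> registrations = new TreeMap<>(mailboxIdByPath);

        // Pass 1: inspect every path registration (sorted for a stable order).
        for (Map.Entry<String, String> r : new TreeMap<>(mailboxIdByPath).entrySet()) {
            String actualPath = pathByMailboxId.get(r.getValue());
            if (actualPath == null) {
                // Orphan registration: it points to a mailbox that does not exist.
                actions.add("REMOVE " + r.getKey());
                registrations.remove(r.getKey());
            } else if (!actualPath.equals(r.getKey())) {
                // The referenced mailbox claims another path: content mismatch.
                actions.add("MERGE " + r.getKey());
            }
        }

        // Pass 2: inspect every mailbox.
        for (Map.Entry<String, String> m : new TreeMap<>(pathByMailboxId).entrySet()) {
            String registeredId = registrations.get(m.getValue());
            if (registeredId == null) {
                // Missing registration: add it.
                actions.add("ADD " + m.getValue() + " -> " + m.getKey());
                registrations.put(m.getValue(), m.getKey());
            } else if (!registeredId.equals(m.getKey())) {
                // Two mailboxes compete for the same path: they need merging.
                actions.add("MERGE " + m.getValue());
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, String> mailboxes = new HashMap<>();
        mailboxes.put("id1", "bob/A");
        mailboxes.put("id2", "bob/B");
        Map<String, String> paths = new HashMap<>();
        paths.put("bob/B", "id2");
        paths.put("bob/D", "id8");
        System.out.println(plan(mailboxes, paths));
    }
}
```

The function only computes a plan and never touches storage. It still assumes that nothing else modifies the tables while it runs.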

Consequences

As an administrator, if some of my users report the bugs mentioned above, I have a way to sanitize my Cassandra mailbox database.

However, due to the two invariants mentioned above, we cannot identify a clear source of truth based on existing tables for the mailbox object. The task previously mentioned is subject to concurrency issues that might cancel legitimate concurrent user actions.

Hence this task must be run offline (meaning absence of user traffic via, for example, SMTP, IMAP or JMAP). This can be achieved via reconfiguration (disabling the given protocols and restarting James) or via firewall rules.

Due to all of those risks, a confirmation header I-KNOW-WHAT-I-M-DOING should be set to ALL-SERVICES-ARE-OFFLINE in order to prevent accidental calls.
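
A call carrying the confirmation header could be built as follows (Java 11 or later). The header name and value come from this record. The base URL and the `/mailboxes?task=SolveInconsistencies` route are assumptions made for illustration and should be checked against the webadmin documentation of the deployed James version. The class name is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class SolveInconsistenciesRequestSketch {

    // Builds the request without sending it, so it can be inspected first.
    static HttpRequest build(String webadminBaseUrl) {
        // Assumed route; not stated in this record.
        URI uri = URI.create(webadminBaseUrl + "/mailboxes?task=SolveInconsistencies");
        return HttpRequest.newBuilder(uri)
            // Explicit confirmation that all user-facing services are offline.
            .header("I-KNOW-WHAT-I-M-DOING", "ALL-SERVICES-ARE-OFFLINE")
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    public static void main(String[] args) {
        // Assumed local webadmin address.
        HttpRequest request = build("http://localhost:8000");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```

Sending the request is left out on purpose: it should only happen once the protocols have been disabled or blocked as described above.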

In the future, we should revisit the mailbox object data-model and restructure it, to identify a source of truth to base the inconsistency fixing task on. Event sourcing is a good candidate for this.

References

This thread provides significant discussions leading to this Architecture Decision Record