| = 20. Cassandra Mailbox object consistency |
| |
| Date: 2020-02-27 |
| |
| == Status |
| |
| Accepted (lazy consensus) |
| |
| == Context |
| |
| Mailboxes are denormalized in Cassandra in order to access them both by their immutable identifier and their mailbox path (name): |
| |
| * `mailbox` table stores mailboxes by their immutable identifier |
| * `mailboxPathV2` table stores mailboxes by their mailbox path |
| |
| We furthermore maintain two invariants on top of these tables: |
| |
| * *mailboxPath* uniqueness. |
| Each mailbox path can be used at most once. |
| This is ensured by writing the mailbox path first using Lightweight Transactions. |
| * *mailboxId* uniqueness. |
| Each mailbox identifier is used by only a single path. |
| We have no real way to ensure a given mailbox is not referenced by two paths. |
| |
| A failure during the denormalization process leaves the two tables inconsistent with each other. |
| |
| This can lead to the following user experience: |
| |
| ---- |
| BOB creates mailbox A |
| Denormalization fails and an error is returned to BOB |
| |
| BOB retries mailbox A creation |
| BOB is told mailbox A already exists |
| |
| BOB tries to access mailbox A |
| BOB is told mailbox A does not exist |
| ---- |
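
The scenario above can be reproduced with a small in-memory model. This is only a sketch under stated assumptions: two dictionaries stand in for the `mailbox` and `mailboxPathV2` tables, an insert-if-absent check stands in for the Lightweight Transaction, and the class and method names (`Tables`, `create`, `find_by_path`, `replay_scenario`) are hypothetical and not taken from the James code base.

```python
class WriteFailed(Exception):
    """Simulated failure of the second (denormalization) write."""


class MailboxExists(Exception):
    """The path is already registered."""


class MailboxNotFound(Exception):
    """No usable mailbox behind the requested path."""


class Tables:
    def __init__(self):
        self.mailbox = {}            # mailbox id -> path (stand-in for `mailbox`)
        self.mailbox_path_v2 = {}    # path -> mailbox id (stand-in for `mailboxPathV2`)
        self.fail_next_mailbox_write = False

    def create(self, mailbox_id, path):
        # Step 1: register the path first; insert-if-absent models the LWT.
        if path in self.mailbox_path_v2:
            raise MailboxExists(path)
        self.mailbox_path_v2[path] = mailbox_id
        # Step 2: write the mailbox by id; this is the write that may fail.
        if self.fail_next_mailbox_write:
            self.fail_next_mailbox_write = False
            raise WriteFailed(path)
        self.mailbox[mailbox_id] = path

    def find_by_path(self, path):
        # Assumption: access resolves the path to an id, then reads by id.
        mailbox_id = self.mailbox_path_v2.get(path)
        if mailbox_id is None or mailbox_id not in self.mailbox:
            raise MailboxNotFound(path)
        return mailbox_id


def replay_scenario():
    """Replay BOB's three actions and return the name of each error raised."""
    tables = Tables()
    tables.fail_next_mailbox_write = True
    actions = [
        lambda: tables.create("id-1", "bob/A"),   # first creation, fails midway
        lambda: tables.create("id-2", "bob/A"),   # retry
        lambda: tables.find_by_path("bob/A"),     # access
    ]
    outcomes = []
    for action in actions:
        try:
            action()
            outcomes.append("ok")
        except (WriteFailed, MailboxExists, MailboxNotFound) as error:
            outcomes.append(type(error).__name__)
    return outcomes
```

In this model the path registration survives the failed creation, which is why the retry is rejected while the lookup still finds nothing.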
| |
| == Decision |
| |
| We should provide an offline webadmin task to solve mailbox object inconsistencies. Offline means that there is no user traffic via, for example, SMTP, IMAP or JMAP. |
| |
| This task reads the `mailbox` table and adapts the path registrations in `mailboxPathV2`: |
| |
| * Missing registrations will be added |
| * Orphan registrations will be removed |
| * Mismatch in content between the two tables will require merging the two mailboxes together. |
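
The three cases above can be sketched as a pure function over two dictionaries. This is an illustration only, not the actual task: `reconcile` is a hypothetical name, the dictionaries stand in for the two tables, and mismatches are merely reported, since merging mailboxes is out of scope for the sketch.

```python
def reconcile(mailbox, mailbox_path_v2):
    """Return (fixed registrations, conflicts).

    mailbox:         mailbox id -> path   (stand-in for the `mailbox` table)
    mailbox_path_v2: path -> mailbox id   (stand-in for `mailboxPathV2`)
    """
    fixed = {}
    conflicts = []

    # Pass 1: walk existing registrations.
    for path, mailbox_id in mailbox_path_v2.items():
        if mailbox_id not in mailbox:
            continue                      # orphan registration: removed
        fixed[path] = mailbox_id
        if mailbox[mailbox_id] != path:
            conflicts.append((path, mailbox_id))   # id stored under another path

    # Pass 2: walk mailboxes.
    for mailbox_id, path in mailbox.items():
        if path not in fixed:
            fixed[path] = mailbox_id      # missing registration: added
        elif fixed[path] != mailbox_id:
            conflicts.append((path, mailbox_id))   # path held by another id

    return fixed, conflicts


mailbox = {
    "id-1": "bob/INBOX",
    "id-2": "bob/Sent",      # has no registration
    "id-3": "bob/Drafts",    # path is registered to id-4
    "id-4": "bob/Drafts",
}
registrations = {
    "bob/INBOX": "id-1",
    "bob/Trash": "id-9",     # points to a mailbox that does not exist
    "bob/Drafts": "id-4",
}
fixed, conflicts = reconcile(mailbox, registrations)
```

With this sample data, `bob/Trash` is dropped, `bob/Sent` is registered, and `("bob/Drafts", "id-3")` is reported as a conflict that would need a merge.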
| |
| == Consequences |
| |
| As an administrator, if some of my users report the bugs mentioned above, I have a way to sanitize my Cassandra mailbox database. |
| |
| However, due to the two invariants mentioned above, we cannot identify a clear source of truth for the mailbox object based on the existing tables. |
| The task is therefore subject to concurrency issues that might cancel legitimate concurrent user actions. |
| |
| Hence this task must be run offline, meaning without user traffic via, for example, SMTP, IMAP or JMAP. |
| This can be achieved via reconfiguration (disabling the given protocols and restarting James) or via firewall rules. |
| |
| Because of these risks, a confirmation header `I-KNOW-WHAT-I-M-DOING` should be set to `ALL-SERVICES-ARE-OFFLINE` in order to prevent accidental calls. |
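
A server-side guard for such a confirmation header could look like the sketch below. Only the header name and its expected value come from this record; the function name `is_confirmed` and the plain-dictionary representation of request headers are assumptions made for illustration.

```python
REQUIRED_HEADER = "I-KNOW-WHAT-I-M-DOING"
REQUIRED_VALUE = "ALL-SERVICES-ARE-OFFLINE"


def is_confirmed(headers):
    """Return True only if the confirmation header carries the expected value.

    `headers` is a plain dict of request header names to values (an assumption
    of this sketch). HTTP header names are case-insensitive, so the name is
    compared case-insensitively; the value must match exactly.
    """
    for name, value in headers.items():
        if name.lower() == REQUIRED_HEADER.lower():
            return value == REQUIRED_VALUE
    return False
```

A request that lacks the header, or carries any other value, would then be rejected before the task is scheduled.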
| |
| In the future, we should revisit the mailbox object data model and restructure it so that it exposes a source of truth on which the inconsistency fixing task can be based. |
| Event sourcing is a good candidate for this. |
| |
| == References |
| |
| * https://issues.apache.org/jira/browse/JAMES-3058[JAMES-3058 Webadmin task to solve Cassandra Mailbox inconsistencies] |
| * https://github.com/linagora/james-project/pull/3110[Pull Request: mailbox-cassandra utility to solve Mailbox inconsistency] |
| * https://github.com/linagora/james-project/pull/3130[Pull Request: JAMES-3058 Concurrency testing for fixing Cassandra mailbox inconsistencies] |
| |
| This https://github.com/linagora/james-project/pull/3130#discussion_r383349596[thread] contains much of the discussion that led to this Architecture Decision Record. |
| |
| * https://www.mail-archive.com/server-dev@james.apache.org/msg64432.html[Discussion on the mailing list] |