| = Distributed James Server — Architecture |
| :navtitle: Architecture |
| |
| This section presents the Distributed Server architecture. |
| |
| == Storage |
| |
| In order to deliver its promises, the Distributed Server leverages the following storage strategies: |
| |
| image::storage.png[Storage responsibilities for the Distributed Server] |
| |
| * *Cassandra* is used for metadata storage. Cassandra is efficient for a very high workload of small queries following |
| a known pattern. |
| * The *blob store* storage interface is responsible for storing potentially large binary data, for instance |
| email bodies, headers or attachments. Different technologies can be used: *Cassandra*, or S3 compatible *Object Storage* |
| (S3 or Swift). |
| * The *OpenSearch* component empowers full text search on emails. It also enables querying data with unplanned access |
| patterns. However, OpenSearch throughput does not match that of Cassandra, so its use is avoided for regular workloads. |
| * *RabbitMQ* enables James nodes of the same cluster to collaborate. It is used to implement connected protocols, |
| notification patterns, as well as distributed resilient work queues and the mail queue. |
| * *Tika* (optional) enables text extraction from attachments, thus improving full text search results. |
| * *link:https://spamassassin.apache.org/[SpamAssassin] or link:https://rspamd.com/[Rspamd]* (optional) can be used for spam detection. User feedback is supported. |
| |
| xref:architecture/consistency-model.adoc[This page] further details the Distributed James consistency model. |
| |
| == Protocols |
| |
| The following protocols are supported and can be used to interact with the Distributed Server: |
| |
| * *SMTP* |
| * *IMAP* |
| * xref:operate/webadmin.adoc[WebAdmin] REST Administration API |
| * *LMTP* |
| * *POP3* |
| |
| The following protocols should be considered experimental: |
| |
| * *JMAP* (RFC-8620 & RFC-8621 specifications and known limitations of the James implementation are defined link:https://github.com/apache/james-project/tree/master/server/protocols/jmap-rfc-8621/doc[here]) |
| * *ManagedSieve* |
| |
| Read more on xref:architecture/implemented-standards.adoc[implemented standards]. |
| |
| == Topology |
| |
| While it is perfectly possible to deploy homogeneous James instances, with the same configuration and thus the same |
| protocols and the same responsibilities, one might want to look into |
| xref:architecture/specialized-instances.adoc['Specialized instances']. |
| |
| == Components |
| |
| This section presents the various components of the Distributed server, providing context about |
| their interactions, and about their implementations. |
| |
| === High level view |
| |
| Here is a high level view of the various server components and their interactions: |
| |
| image::server-components.png[Server components mobilized for SMTP & IMAP] |
| |
| - The SMTP protocol receives a mail and enqueues it on the MailQueue. |
| - The MailetContainer will start processing the mail asynchronously and will take business decisions like storing the |
| email locally in a user mailbox. The behaviour of the MailetContainer is highly customizable thanks to the composability |
| of Mailets and Matchers. |
| - The Mailbox component is responsible for storing a user's mails. |
| - The user can use the IMAP or the JMAP protocol to retrieve and read their mails. |
| |
| These components are presented in more depth below. |
| |
| === Mail processing |
| |
| Mail processing allows taking business decisions asynchronously on |
| received emails. |
| |
| Here are its components: |
| |
| * The `spooler` takes mail out of the mailQueue and executes mail |
| processing within the `mailet container`. |
| * The `mailet container` synchronously executes the user defined logic. |
| This logic is written through the use of `mailet`, `matcher` and |
| `processor`. |
| * A `mailet` represents an action: mail modification, envelope |
| modification, a side effect, or stop processing. |
| * A `matcher` represents a condition to execute a mailet. |
| * A `processor` is a flow of `matcher` and `mailet` pairs executed |
| sequentially. The `ToProcessor` mailet is a `goto` instruction to start |
| executing another `processor`. |
| * A `mail repository` allows storage of a mail as part of its |
| processing. Standard configuration relies on the following mail |
| repositories: |
| ** `cassandra://var/mail/error/` : unexpected errors that occurred |
| during mail processing. Emails impacted by performance related |
| exceptions, or logic bugs within James code, are typically stored here. |
| These mails could be reprocessed once the cause of the error is fixed. |
| The `Mail.error` field can help diagnose the issue. Correlation with |
| logs can be achieved via the use of the `Mail.name` field. |
| ** `cassandra://var/mail/address-error/` : mail addressed to a |
| non-existing recipient of a handled local domain. These mails could be |
| reprocessed once the user is created, for instance. |
| ** `cassandra://var/mail/relay-denied/` : mail for which relay was |
| denied: missing authentication can, for instance, be a cause. In |
| addition to preventing disasters upon misconfiguration, a review |
| of this mail repository can help refine a host spammer blacklist. |
| ** `cassandra://var/mail/rrt-error/` : a runtime error occurred during |
| recipient rewriting. This is typically due to a loop. |
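| |
| The relationship between processors, matchers and mailets can be illustrated with a minimal sketch. It is written in |
| Python for brevity and is not the James API: the `Mail` fields, the `run` function and the processor names are |
| assumptions made for the example, and `to_processor` is only a stand-in for the `ToProcessor` mailet. |
| |
```python
from dataclasses import dataclass, field


@dataclass
class Mail:
    name: str
    recipients: list
    state: str = "root"  # name of the processor that handles the mail next (assumption)
    attributes: dict = field(default_factory=dict)


def all_mail(mail):
    """Matcher: a condition that always holds."""
    return True


def has_no_recipient(mail):
    """Matcher: holds when the mail has no recipient."""
    return not mail.recipients


def tag(key, value):
    """Mailet factory: an action that records an attribute on the mail."""
    def action(mail):
        mail.attributes[key] = value
    return action


def to_processor(target):
    """Hypothetical stand-in for the ToProcessor mailet: a 'goto'."""
    def action(mail):
        mail.state = target
    return action


def run(processors, mail, max_hops=10):
    """Execute matcher/mailet pairs sequentially, following processor changes."""
    for _ in range(max_hops):
        current = mail.state
        for matcher, mailet in processors[current]:
            if matcher(mail):
                mailet(mail)
            if mail.state != current:  # a mailet redirected the mail
                break
        else:
            return mail  # reached the end of the current processor
    raise RuntimeError("too many processor hops")


# Each processor is an ordered list of (matcher, mailet) pairs.
processors = {
    "root": [
        (has_no_recipient, to_processor("error")),
        (all_mail, tag("delivered", True)),
    ],
    "error": [
        (all_mail, tag("stored_in", "var/mail/error")),
    ],
}
```
| |
| In this sketch a mail without recipients leaves the `root` processor as soon as the first pair matches, so the |
| remaining pairs of `root` are never applied to it. |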
| |
| === Mail Queue |
| |
| An email queue is a mandatory component of SMTP servers. It is a system |
| that creates a queue of emails that are waiting to be processed for |
| delivery. Email queuing is a form of Message Queuing – an asynchronous |
| service-to-service communication. A message queue is meant to decouple a |
| producing process from a consuming one. An email queue decouples email |
| reception from email processing. It allows them to communicate without |
| being connected. As such, the queued emails wait for processing until |
| the recipient is available to receive them. As James is an email server, |
| it supports a mail queue as well. |
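| |
| The decoupling described above can be sketched with an in-process queue. This is only an illustration using the |
| Python standard library: the `receive` and `spool` names are assumptions, and the distributed mail queue of James |
| is not implemented this way. |
| |
```python
import queue
import threading

mail_queue = queue.Queue()
processed = []


def receive(mail):
    """Reception side: enqueue the mail and return immediately."""
    mail_queue.put(mail)


def spool():
    """Processing side: dequeue mails until a None sentinel is seen."""
    while True:
        mail = mail_queue.get()
        if mail is None:
            break
        processed.append(mail.upper())  # placeholder for real processing


# Reception can happen before any consumer is running.
receive("mail-1")
receive("mail-2")

worker = threading.Thread(target=spool)
worker.start()
receive("mail-3")
mail_queue.put(None)  # ask the worker to stop once the queue is drained
worker.join()
```
| |
| The producer never waits for the consumer: mails enqueued while no worker is running are simply processed later, |
| in the order they were received. |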
| |
| ==== Why Mail Queue is necessary |
| |
| You might often need to check the mail queue to make sure all emails are |
| delivered properly. To do so, you first need to know why email queues get |
| clogged. Here are the two core reasons: |
| |
| * Exceeded volume of emails |
| |
| Some mailbox providers enforce email rate limits on IP addresses. The |
| limits are based on the sender reputation. If you exceed this rate and |
| queue too many emails, the delivery speed will decrease. |
| |
| * Spam-related issues |
| |
| Another common reason is that your email has been flagged by spam |
| filters. The filters will let the emails pass gradually to analyze how |
| the rest of the recipients react to the message. If progress is slow, |
| this is expected: your email campaign is being observed and assessed. |
| If it is stuck, there could be different reasons, including the blocking |
| of your IP address. |
| |
| ==== Why combining Cassandra, RabbitMQ and Object storage for MailQueue |
| |
| * RabbitMQ ensures the messaging function, and avoids polling. |
| * Cassandra enables administrative operations such as browsing and deleting, |
| using a time series which might require fine performance tuning (see |
| http://cassandra.apache.org/doc/latest/operating/index.html[Operating |
| Cassandra documentation]). |
| * Object Storage stores potentially large binary payload. |
| |
| However, the current design does not implement delays. Delays define |
| the time a mail has to stay in the mail queue before being |
| dequeued and are used, for example, for exponential wait delays upon remote |
| delivery retries, or |
| |
| === Mailbox |
| |
| Storage for emails belonging to users. |
| |
| Metadata are stored in Cassandra while headers, bodies and attachments are stored |
| within the xref:#_blobstore[BlobStore]. |
| |
| ==== Search index |
| |
| Emails are indexed asynchronously in OpenSearch via the xref:#_event_bus[EventBus] |
| in order to empower advanced and fast email full text search. |
| |
| Text extraction can be set up using link:https://tika.apache.org/[Tika] to |
| extract the text from attachments, which allows searching emails based on the textual |
| content of their attachments. In such a case, the OpenSearch indexer will call a Tika server prior |
| to indexing. |
| |
| ==== Quotas |
| |
| Current quotas of users are held in a Cassandra projection. Limitations can be defined per |
| user, per domain or globally. |
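| |
| A minimal sketch of how limits defined at these three levels could be combined is given here. The precedence used |
| (user first, then domain, then global) is an assumption made for the example, and `resolve_limit` and |
| `is_over_quota` are hypothetical helpers rather than James functions. |
| |
```python
from typing import Optional


def resolve_limit(user: str, user_limits: dict, domain_limits: dict,
                  global_limit: Optional[int]) -> Optional[int]:
    """Return the most specific limit that applies (assumed precedence)."""
    if user in user_limits:
        return user_limits[user]
    domain = user.rsplit("@", 1)[-1]
    if domain in domain_limits:
        return domain_limits[domain]
    return global_limit  # None means unlimited in this sketch


def is_over_quota(used: int, limit: Optional[int]) -> bool:
    """A user is over quota only when a limit exists and is exceeded."""
    return limit is not None and used > limit
```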
| |
| ==== Event Bus |
| |
| Distributed James relies on an event bus system to enrich mailbox capabilities. Each |
| operation performed on the mailbox will trigger related events, that can |
| be processed asynchronously by potentially any James node on a |
| distributed system. |
| |
| Many different kinds of events can be triggered during a mailbox |
| operation, such as: |
| |
| * `MailboxEvent`: event related to an operation regarding a mailbox: |
| ** `MailboxDeletion`: a mailbox has been deleted |
| ** `MailboxAdded`: a mailbox has been added |
| ** `MailboxRenamed`: a mailbox has been renamed |
| ** `MailboxACLUpdated`: a mailbox got its rights and permissions updated |
| * `MessageEvent`: event related to an operation regarding a message: |
| ** `Added`: messages have been added to a mailbox |
| ** `Expunged`: messages have been expunged from a mailbox |
| ** `FlagsUpdated`: messages had their flags updated |
| ** `MessageMoveEvent`: messages have been moved from a mailbox to |
| another |
| * `QuotaUsageUpdatedEvent`: event related to quota update |
| |
| Mailbox listeners can register themselves on this event bus system to be |
| called when an event is fired, which allows performing various extra |
| operations on the system, like: |
| |
| * Current quota calculation |
| * Message indexing with OpenSearch |
| * Mailbox annotations cleanup |
| * Ham/spam reporting to Spam filtering system |
| * … |
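| |
| The listener registration pattern can be sketched as follows. This is a simplified, in-process and synchronous |
| illustration: the real event bus is distributed and asynchronous. The `Added` and `Expunged` names come from the |
| list above, but their fields, the `EventBus` class and both listeners are assumptions made for the example. |
| |
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Added:
    mailbox: str
    size: int  # hypothetical field: total size of the added messages


@dataclass(frozen=True)
class Expunged:
    mailbox: str
    size: int  # hypothetical field: total size of the expunged messages


class EventBus:
    """Delivers each dispatched event to every registered listener."""

    def __init__(self):
        self._listeners = []

    def register(self, listener):
        self._listeners.append(listener)

    def dispatch(self, event):
        for listener in self._listeners:
            listener(event)


class QuotaListener:
    """Keeps a running total of the space used."""

    def __init__(self):
        self.used = 0

    def __call__(self, event):
        if isinstance(event, Added):
            self.used += event.size
        elif isinstance(event, Expunged):
            self.used -= event.size


class IndexingListener:
    """Records which mailboxes received messages to index."""

    def __init__(self):
        self.indexed = []

    def __call__(self, event):
        if isinstance(event, Added):
            self.indexed.append(event.mailbox)


bus = EventBus()
quota = QuotaListener()
indexer = IndexingListener()
bus.register(quota)
bus.register(indexer)

bus.dispatch(Added("INBOX", 1200))
bus.dispatch(Added("INBOX", 300))
bus.dispatch(Expunged("INBOX", 300))
```
| |
| The mailbox operation only has to dispatch an event. It does not need to know which listeners exist, which is |
| what lets new capabilities be added without touching the mailbox code. |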
| |
| ==== Deleted Messages Vault |
| |
| The Deleted Messages Vault is a feature that makes it possible to: |
| |
| * retain users' deleted messages for some time. |
| * restore & export deleted messages by various criteria. |
| * permanently delete some retained messages. |
| |
| If the Deleted Messages Vault is enabled, when users delete their mails, |
| and by that we mean when they try to definitively delete them by emptying |
| the trash, James will retain these mails in the Deleted Messages |
| Vault before the email or mailbox is actually deleted. Only |
| administrators can interact with this component, via the |
| xref:webadmin.adoc#_deleted-messages-vault[WebAdmin REST APIs]. |
| |
| However, mails are not retained forever as you have to configure a |
| retention period before using it (with one-year retention by default if |
| not defined). It’s also possible to permanently delete a mail if needed. |
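| |
| The retention behaviour can be sketched as follows. This is an in-memory illustration only: the |
| `DeletedMessagesVault` class and its methods are hypothetical, and one year is approximated as 365 days. |
| |
```python
from datetime import datetime, timedelta, timezone


class DeletedMessagesVault:
    """In-memory illustration of retain / restore / purge."""

    def __init__(self, retention=timedelta(days=365)):  # one year, approximated
        self.retention = retention
        self._entries = {}  # message id -> (deletion date, content)

    def retain(self, message_id, content, deleted_at):
        self._entries[message_id] = (deleted_at, content)

    def restore(self, message_id):
        return self._entries[message_id][1]

    def purge_expired(self, now):
        """Drop entries whose retention period has elapsed; return their ids."""
        expired = sorted(
            message_id
            for message_id, (deleted_at, _) in self._entries.items()
            if deleted_at + self.retention <= now
        )
        for message_id in expired:
            del self._entries[message_id]
        return expired
```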
| |
| === Data |
| |
| Storage for domains and users. |
| |
| Domains are persisted in Cassandra. |
| |
| Users can be managed in Cassandra, or via LDAP (read only). |
| |
| === Recipient rewrite tables |
| |
| Storage of Recipients Rewriting rules, in Cassandra. |
| |
| ==== Mapping types |
| |
| James allows using various mapping types for better expressing the intent of your address rewriting logic: |
| |
| * *Domain mapping*: Rewrites the domain of mail addresses. Use it for technical purposes; users will not |
| be allowed to use the source in their FROM address headers. Domain mappings can be managed via the CLI and |
| added via xref:operate/webadmin.adoc#_domain_mappings[WebAdmin]. |
| * *Domain aliases*: Rewrites the domain of mail addresses. Expresses the idea that both domains can be used |
| interchangeably. Users will be allowed to use the source in their FROM address headers. Domain aliases can |
| be managed via xref:operate/webadmin.adoc#_get_the_list_of_aliases_for_a_domain[WebAdmin]. |
| * *Forwards*: Replaces the source address by another one. Conveys the intent of forwarding incoming mails |
| to other users. Listing the forward source in the forward destinations keeps a local copy. Users will not be |
| allowed to use the source in their FROM address headers. Forwards can |
| be managed via xref:operate/webadmin.adoc#_address_forwards[WebAdmin]. |
| * *Groups*: Replaces the source address by another one. Conveys the intent of a group registration: the group |
| address will be swapped for the group member addresses (a feature-poor mailing list). Users will not be |
| allowed to use the source in their FROM address headers. Groups can |
| be managed via xref:operate/webadmin.adoc#_address_group[WebAdmin]. |
| * *Aliases*: Replaces the source address by another one. Represents a user-owned mail address, with which |
| the user can interact as if it were their main mail address. Users will be allowed to use the source in their FROM |
| address headers. Aliases can be managed via xref:operate/webadmin.adoc#_address_aliases[WebAdmin]. |
| * *Address mappings*: Replaces the source address by another one. Use it for technical purposes; this mapping type does |
| not hold a specific intent. Prefer using one of the above mapping types. Users will not be allowed to use the source |
| in their FROM address headers. Address mappings can be managed via the CLI or via |
| xref:operate/webadmin.adoc#_address_mappings[WebAdmin]. |
| * *Regex mappings*: Applies the regex on the supplied address. Users will not be allowed to use the source |
| in their FROM address headers. Regex mappings can be managed via the CLI or via |
| xref:operate/webadmin.adoc#_regex_mapping[WebAdmin]. |
| * *Error*: Throws an error upon processing. Users will not be allowed to use the source |
| in their FROM address headers. Errors can be managed via the CLI. |
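| |
| To illustrate how groups, forwards with a local copy and rewriting loops interact, here is a minimal sketch. The |
| data model (a dictionary from source address to destination addresses), the `rewrite` function and the `max_depth` |
| guard are assumptions made for the example, not the James implementation. |
| |
```python
class RewriteLoopError(Exception):
    """Raised when rewriting does not terminate within the depth limit."""


def rewrite(address, mappings, max_depth=10):
    """Return the sorted final recipients for an address."""
    resolved = set()

    def visit(current, depth):
        if depth > max_depth:
            raise RewriteLoopError(address)
        targets = mappings.get(current)
        if targets is None:
            resolved.add(current)  # no rule: the address is final
            return
        for target in targets:
            if target == current:
                resolved.add(current)  # source listed as destination: local copy
            else:
                visit(target, depth + 1)

    visit(address, 0)
    return sorted(resolved)


mappings = {
    # group: swapped for its members
    "team@example.com": ["alice@example.com", "bob@example.com"],
    # forward that keeps a local copy
    "bob@example.com": ["bob@example.com", "bob@other.org"],
    # misconfiguration: two addresses pointing at each other
    "a@example.com": ["b@example.com"],
    "b@example.com": ["a@example.com"],
}
```
| |
| With these rules, rewriting `team@example.com` yields Alice, Bob's local mailbox and Bob's external address, while |
| rewriting `a@example.com` raises `RewriteLoopError`. |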
| |
| === BlobStore |
| |
| Stores potentially large binary data. |
| |
| The Mailbox component, the Mail Queue component and the Deleted Messages Vault |
| component rely on it. |
| |
| Supported backends include S3 compatible ObjectStorage (link:https://wiki.openstack.org/wiki/Swift[Swift], S3 API). |
| |
| Encryption can be configured on top of ObjectStorage. |
| |
| Blobs can currently be deduplicated in order to reduce storage space. This means that two blobs with |
| the same content will be stored only once. |
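| |
| Deduplication is commonly obtained by deriving the blob identifier from the content itself. The sketch below |
| assumes SHA-256 as the hash function and uses a hypothetical in-memory `DeduplicatingBlobStore` class; it does not |
| describe the actual identifiers used by James. |
| |
```python
import hashlib


class DeduplicatingBlobStore:
    """Content-addressed store: identical content maps to the same id."""

    def __init__(self):
        self._blobs = {}

    def save(self, data: bytes) -> str:
        blob_id = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(blob_id, data)  # stored only once
        return blob_id

    def read(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]

    def __len__(self):
        return len(self._blobs)
```
| |
| Saving the same attachment twice returns the same identifier and does not grow the store. |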
| |
| The downside is that deletion is more complicated, and a garbage collection needs to be run. A first implementation |
| based on bloom filters can be used and triggered using the WebAdmin REST API. |
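| |
| The idea behind a bloom filter based garbage collection can be sketched as follows: the filter is populated with |
| the identifiers of all referenced blobs, and a stored blob is deleted only when the filter is certain it is not |
| referenced. The filter size, the hashing scheme and the `collect_garbage` helper are assumptions made for the |
| example. |
| |
```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, possible false positives."""

    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, item: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str):
        for position in self._positions(item):
            self.bits[position] = 1

    def might_contain(self, item: str) -> bool:
        return all(self.bits[position] for position in self._positions(item))


def collect_garbage(stored_blob_ids, referenced_blob_ids):
    """Return the stored blob ids that are certainly unreferenced."""
    references = BloomFilter()
    for blob_id in referenced_blob_ids:
        references.add(blob_id)
    return [b for b in stored_blob_ids if not references.might_contain(b)]
```
| |
| A false positive only means that an orphan blob survives until a later run. A referenced blob is never reported |
| as deletable, which is the property that makes the approach safe. |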
| |
| === Task Manager |
| |
| Allows controlling and scheduling long running tasks run by other |
| components. Among other things, it enables scheduling, progress monitoring and |
| cancellation of long running tasks. |
| |
| Distributed James leverages a task manager using Event Sourcing and RabbitMQ for messaging. |
| |
| === Event sourcing |
| |
| The link:https://martinfowler.com/eaaDev/EventSourcing.html[Event sourcing] implementation |
| for the Distributed server stores events in Cassandra. It enables components |
| to rely on event sourcing techniques for taking decisions. |
| |
| A short list of usages: |
| |
| * Data leak prevention storage |
| * JMAP filtering rules storage |
| * Validation of the MailQueue configuration |
| * Sending email warnings to users close to their quota |
| * Implementation of the TaskManager |
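| |
| As an illustration of taking decisions from stored events, the sketch below uses the quota warning use case: a |
| warning is sent only if the history of the user does not already contain one. The `EventStore` class, the |
| `ThresholdCrossed` event and the 90% threshold are assumptions made for the example, and events are kept in memory |
| rather than in Cassandra. |
| |
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdCrossed:
    user: str
    ratio: float


class EventStore:
    """Append-only history of events per aggregate (in memory here)."""

    def __init__(self):
        self._history = {}

    def append(self, aggregate_id, event):
        self._history.setdefault(aggregate_id, []).append(event)

    def history(self, aggregate_id):
        return list(self._history.get(aggregate_id, []))


def handle_usage(store, user, used, limit, threshold=0.9):
    """Return True when a warning should be sent for this usage report."""
    ratio = used / limit
    already_warned = any(
        isinstance(event, ThresholdCrossed) for event in store.history(user)
    )
    if ratio >= threshold and not already_warned:
        store.append(user, ThresholdCrossed(user, ratio))
        return True
    return False
```
| |
| The decision is derived from the event history rather than from a mutable flag, so replaying the events always |
| leads to the same outcome. |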