= 11. Disable ElasticSearch source
Date: 2019-10-17
== Status
Rejected
The benefits do not outweigh the costs.
== Context
Though very handy to have around, the `_source` field incurs storage overhead within the index.
== Decision
Disable `_source` for ElasticSearch indexed documents.
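
As a minimal sketch of what this decision would look like in practice, the snippet below builds a create-index request body with `_source` disabled, following the Elasticsearch 6.x mapping syntax. The mapping type name `_doc` and the field names are illustrative assumptions, not the actual James mapping.

```python
import json


def build_index_body(source_enabled: bool) -> dict:
    """Return a create-index request body for Elasticsearch 6.x.

    The mapping type `_doc` and the fields below are hypothetical
    placeholders chosen for illustration only.
    """
    return {
        "mappings": {
            "_doc": {
                # When disabled, Elasticsearch no longer stores the
                # original JSON document alongside the indexed terms.
                "_source": {"enabled": source_enabled},
                "properties": {
                    "subject": {"type": "text"},       # hypothetical field
                    "textBody": {"type": "text"},      # hypothetical field
                    "isFlagged": {"type": "boolean"},  # hypothetical field
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(source_enabled=False), indent=2))
```

The resulting JSON would be sent as the body of the index creation request.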
== Consequences
On a dataset composed of small `text/plain` messages, we observed a 20% reduction in the space used by data stored in ElasticSearch.
However, partial (patch) updates can no longer be performed when flags change.
On each flag update we need to read the full mail content, MIME-parse it, potentially HTML-parse it, extract attachment content again, and finally index the full document again.
Without the `_source` field, flag updates are two times slower, and four times slower at the 99th percentile, which negatively impacts other requests.
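
The difference between the two update paths can be sketched as follows. This is a simplified illustration under stated assumptions: the field names (`subject`, `textBody`, `isFlagged`) are hypothetical, and the HTML parsing and attachment extraction steps are left out. The first function builds a partial-update body for the Update API, which relies on the stored `_source`; the second rebuilds the whole document from the raw message, which is what each flag update requires once `_source` is disabled.

```python
from email import message_from_bytes, policy


def partial_flag_update(flagged: bool) -> dict:
    """Body for the Update API: only the changed field is sent.

    Elasticsearch merges it into the stored `_source`, so this path
    is only available when `_source` is enabled.
    """
    return {"doc": {"isFlagged": flagged}}  # hypothetical field name


def full_document(raw_mail: bytes, flagged: bool) -> dict:
    """Rebuild the complete document from the raw message.

    Simplified: a real implementation would also HTML-parse the body
    and extract attachment content before indexing.
    """
    message = message_from_bytes(raw_mail, policy=policy.default)
    body = message.get_body(preferencelist=("plain",))
    return {
        "subject": str(message["subject"] or ""),
        "textBody": body.get_content() if body is not None else "",
        "isFlagged": flagged,
    }


if __name__ == "__main__":
    raw = b"Subject: Hello\r\n\r\nBody text\r\n"
    print(partial_flag_update(True))
    print(full_document(raw, True))
```

The partial update carries a single field, whereas the full rebuild has to read and parse the entire message for every flag change.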
Note also that `_source` gives administrators flexibility, such as performing index-level changes without downtime, including:
* Increasing the number of shards
* Modifying the replication factor
* Changing analysers (e.g. allowing an admin to configure a FR analyser instead of an EN analyser)
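
As a hedged sketch of how such changes are commonly carried out, an administrator can create a new index with the desired settings and copy documents into it with the Reindex API, which reads each document's `_source`. The index names and the shard and replica counts below are hypothetical.

```python
def build_dest_settings(shards: int, replicas: int) -> dict:
    """Create-index body for the destination index with new settings."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


def build_reindex_body(source_index: str, dest_index: str) -> dict:
    """Body for `POST _reindex`.

    Reindex copies documents by reading their `_source`, so it cannot
    be used on an index where `_source` is disabled.
    """
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    # Hypothetical index names, for illustration only.
    print(build_dest_settings(shards=10, replicas=1))
    print(build_reindex_body("mailbox_v1", "mailbox_v2"))
```

Without `_source`, the same migration would require rebuilding every document from the original mail content.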
== References
* https://www.elastic.co/guide/en/elasticsearch/reference/6.3/mapping-source-field.html
* https://issues.apache.org/jira/browse/JAMES-2906[JIRA]