39. Distributed blob garbage collector
Date: 2020-02-18
Status
Proposed, not implemented yet.
Work has been started on this topic.
An alternative approach is proposed in Deduplicated blobs GC with bloom filters and has been implemented.
Context
The body, headers and attachments of mails are stored as blobs in a blob store. In order to save space in those stores, blobs are de-duplicated using a hash of their content. To achieve this, the current blob store reads the content of a blob before saving it and generates its id from a hash of this content. This way two blobs with the same content share the same id and are saved only once. This makes the safe deletion of one of those blobs a non-trivial problem: we can't delete a blob without ensuring that all references to it have themselves been deleted. For example, if two messages share the same blob, there is currently no way to tell, when we delete one of the messages, whether the blob is still referenced by the other.
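As an illustration, here is a minimal sketch of how such a content-addressed blob id could be derived. The class name and the SHA-256/Base64 encoding are assumptions made for this example, not the actual James implementation:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

// Hypothetical content-addressed blob id: two blobs with the same bytes get the same id,
// which is what makes naive deletion of a shared blob unsafe.
public class ContentBasedBlobId {
    public static String of(byte[] content) throws Exception {
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(content);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(hash);
    }

    public static void main(String[] args) throws Exception {
        byte[] mail1 = "same attachment bytes".getBytes(StandardCharsets.UTF_8);
        byte[] mail2 = "same attachment bytes".getBytes(StandardCharsets.UTF_8);
        // Same content => same id => the blob is stored only once for both messages.
        System.out.println(ContentBasedBlobId.of(mail1).equals(ContentBasedBlobId.of(mail2))); // true
    }
}
```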
Decision
To address this issue, we propose to implement a distributed blob garbage collector built upon the previously developed Distributed Task Manager. The de-duplicating blob store will keep track of the references pointing toward a blob in a References table, and of the deletion requests for a blob in a Deletions table. When the garbage collector algorithm runs, it will fetch from the Deletions table the blobs whose deletion has been requested, and check in the References table whether some references to them remain. If no reference to a blob remains, it will be effectively deleted from the blob store.
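A minimal sketch of this check, assuming hypothetical `DeletionsDAO`, `ReferencesDAO` and `LowLevelBlobStore` abstractions (the names are illustrative, not the actual James classes):

```java
import java.util.List;

// Hypothetical garbage collection pass: only blobs with a pending deletion request
// and no remaining reference are removed from the underlying store.
public class GarbageCollectionIteration {
    interface DeletionsDAO { List<String> blobsWithDeletionRequest(String iterationId); }
    interface ReferencesDAO { boolean hasReferences(String blobId); }
    interface LowLevelBlobStore { void delete(String blobId); }

    private final DeletionsDAO deletions;
    private final ReferencesDAO references;
    private final LowLevelBlobStore blobStore;

    GarbageCollectionIteration(DeletionsDAO deletions, ReferencesDAO references, LowLevelBlobStore blobStore) {
        this.deletions = deletions;
        this.references = references;
        this.blobStore = blobStore;
    }

    void run(String iterationId) {
        for (String blobId : deletions.blobsWithDeletionRequest(iterationId)) {
            if (!references.hasReferences(blobId)) {
                blobStore.delete(blobId);
            }
        }
    }
}
```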
To avoid concurrency issues, where we could garbage collect a blob at the same time a new reference to it appears, a notion of reference generation will be added. The de-duplicating id of a blob, which was previously constructed using only the hash of its content, will now also include this reference generation. At a given interval a new reference generation will be emitted; from then on, all new blobs will point to this new generation.
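A sketch of what such a generation-aware id could look like. The prefixing scheme and class name are assumptions for illustration, not the actual implementation:

```java
import java.security.MessageDigest;
import java.util.Base64;

// Hypothetical generation-aware blob id: the same content written under two different
// reference generations yields two distinct ids, hence two distinct stored blobs.
public class GenerationAwareBlobId {
    public static String of(int referenceGeneration, byte[] content) throws Exception {
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(content);
        return referenceGeneration + "_" + Base64.getUrlEncoder().withoutPadding().encodeToString(hash);
    }
}
```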
So a garbage collection iteration will run only on reference generation n-2 (n being the current generation) to avoid concurrency issues.
The switch of generation will be triggered by a task running on the distributed task manager. This task will emit an event into the event sourcing system to increment the reference generation.
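A hedged sketch of what the generation-bumping task could append to the event sourcing system; the class and field names are illustrative, not James' actual event sourcing API:

```java
import java.time.Instant;

// Hypothetical event appended by the scheduled task; the current reference generation
// is derived by replaying these events and keeping the highest generation seen.
public class ReferenceGenerationIncremented {
    private final int newGeneration;
    private final Instant occurredAt;

    public ReferenceGenerationIncremented(int newGeneration, Instant occurredAt) {
        this.newGeneration = newGeneration;
        this.occurredAt = occurredAt;
    }

    public int newGeneration() {
        return newGeneration;
    }

    public Instant occurredAt() {
        return occurredAt;
    }
}
```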
Alternatives
Not de-duplicating the blobs' content: this simple approach, which involves storing the same blob many times, can in some scenarios be slow and costly. It can nevertheless be preferred in some cases for the sake of simplicity, data security...
Consequences
This change will require extracting the base blob store responsibilities (store a blob, delete a blob, read a blob) from the current blob store implementation, which also handles de-duplication, id generation... The garbage collector will use this low-level blob store in order to effectively delete blobs.
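The extracted low-level store used by the garbage collector (already referenced in the sketch above) could look like the following interface; the name and blocking style are assumptions, not the actual James API:

```java
import java.io.InputStream;

// Hypothetical low-level blob store: no de-duplication, no id generation,
// just raw store/read/delete operations, the last one being used by the garbage collector.
public interface LowLevelBlobStore {
    void store(String blobId, InputStream content);

    InputStream read(String blobId);

    void delete(String blobId);
}
```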
Another consequence of this work is that there will be no de-duplication across different reference generations, i.e. two blobs with the same content will now be stored twice if they were created during two different reference generations.
When writing a blob into the de-duplicating blob store, we will need to specify the reference to the object (MessageId, AttachmentId...) we store the blob for. This can make some components harder to implement, as we will have to propagate these references.
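For instance, the write path could take the referencing object explicitly, along the lines of this hedged sketch (interface and parameter names are illustrative):

```java
// Hypothetical de-duplicating store API: every write carries the reference
// (message id, attachment id, ...) so that the References table can be maintained
// alongside the blob itself.
public interface DeduplicatingBlobStore {
    // Returns the generation-aware, content-based blob id under which the content was stored.
    String save(byte[] content, String referencingObjectId);

    void delete(String blobId, String referencingObjectId);
}
```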
Since we will not build a distributed task scheduler, the incrementing of the reference generation and the periodic launching of garbage collection iterations will be handled by an external scheduler (cron job, Kubernetes CronJob...) calling a webadmin endpoint to trigger these tasks.
Algorithm visualisation
Generation 1 and Iteration 1
- Events
  - rg1 reference generation is emitted
  - gci1 garbage collection iteration is emitted
  - An email is sent to user1: a m1 message and a blob b1 are stored with rg1
  - An email is sent to user1 and user2: m2 and m3 messages and a blob b2 are stored with rg1
Tables
Generations
| reference generation id |
|---|
| rg1 |
| garbage collection iteration id |
|---|
| gci1 |
Blobs
| blob id | reference generation id |
|---|---|
| b1 | rg1 |
| b2 | rg1 |
References
| message id | blob id | reference generation id |
|---|---|---|
| m1 | b1 | rg1 |
| m2 | b2 | rg1 |
| m3 | b2 | rg1 |
Deletions
Empty
Generation 2 / Iteration 2
- Events
  - rg2 reference generation is emitted
  - gci2 garbage collection iteration is emitted
  - An email is sent to user1: a m4 message and a blob b3 are stored with rg2
  - An email is sent to user1 and user2: m5 and m6 messages and a blob b4 are stored with rg2
Tables
Generations
| reference generation id |
|---|
| rg1 |
| rg2 |
| garbage collection iteration id |
|---|
| gci1 |
| gci2 |
Blobs
| blob id | reference generation id |
|---|---|
| b1 | rg1 |
| b2 | rg1 |
| b3 | rg2 |
| b4 | rg2 |
References
| message id | blob id | reference generation id |
|---|---|---|
| m1 | b1 | rg1 |
| m2 | b2 | rg1 |
| m3 | b2 | rg1 |
| m4 | b3 | rg2 |
| m5 | b4 | rg2 |
| m6 | b4 | rg2 |
Deletions
Empty
Generation 3 / Iteration 3
- Events
  - rg3 reference generation is emitted
  - gci3 garbage collection iteration is emitted
  - An email is sent to user1: a m7 message and a blob b5 are stored with rg3
  - An email is sent to user1 and user2: m8 and m9 messages and a blob b6 are stored with rg3
  - user1 deletes m1, m2, m7, and m8 with gci3
  - user2 deletes m3 with gci3
Tables: before deletions
Generations
| reference generation id |
|---|
| rg1 |
| rg2 |
| rg3 |
| garbage collection iteration id |
|---|
| gci1 |
| gci2 |
| gci3 |
Blobs
| blob id | reference generation id |
|---|---|
| b1 | rg1 |
| b2 | rg1 |
| b3 | rg2 |
| b4 | rg2 |
| b5 | rg3 |
| b6 | rg3 |
References
| message id | blob id | reference generation id |
|---|---|---|
| m1 | b1 | rg1 |
| m2 | b2 | rg1 |
| m3 | b2 | rg1 |
| m4 | b3 | rg2 |
| m5 | b4 | rg2 |
| m6 | b4 | rg2 |
| m7 | b5 | rg3 |
| m8 | b6 | rg3 |
| m9 | b6 | rg3 |
Deletions
Empty
Tables: after deletions
Generations
| reference generation id |
|---|
| rg1 |
| rg2 |
| rg3 |
| garbage collection iteration id |
|---|
| gci1 |
| gci2 |
| gci3 |
Blobs
| blob id | reference generation id |
|---|---|
| b1 | rg1 |
| b2 | rg1 |
| b3 | rg2 |
| b4 | rg2 |
| b5 | rg3 |
| b6 | rg3 |
References
| message id | blob id | reference generation id |
|---|---|---|
| m4 | b3 | rg2 |
| m5 | b4 | rg2 |
| m6 | b4 | rg2 |
| m9 | b6 | rg3 |
Deletions
| blob id | reference generation id | date | garbage collection iteration id |
|---|---|---|---|
| b1 | rg1 | 10:42 | gci3 |
| b2 | rg1 | 10:42 | gci3 |
| b2 | rg1 | 13:37 | gci3 |
| b5 | rg3 | 10:42 | gci3 |
| b6 | rg3 | 10:42 | gci3 |
Running the algorithm
- fetch Deletions for gci3 in deletions
- find distinct reference-generation-id of deletions in generations = {rg1, rg3}
- For each generation
  - rg1
    - filter deletions to keep only rg1 entries and extract blob-ids in concernedBlobs = {b1, b2}
    - fetch all references to concernedBlobs and build a Bloom-Filter in foundedReferences = {} (this check is sketched in code after these steps)
    - filter concernedBlobs to keep only those which are not present in foundedReferences in blobsToDelete = {b1, b2}
    - Remove blobsToDelete from Blobs and Deletions
  - rg3
    - filter deletions to keep only rg3 entries and extract blob-ids in concernedBlobs = {b5, b6}
    - fetch all references to concernedBlobs and build a Bloom-Filter in foundedReferences = {b6}
    - filter concernedBlobs to keep only those which are not present in foundedReferences in blobsToDelete = {b5}
    - Remove blobsToDelete from Blobs and Deletions
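A minimal sketch of the reference check above using a Guava Bloom filter; the method shape is an assumption for this example. A false positive only causes a blob to be kept for a later iteration, never wrongly deleted:

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical reference check for one reference generation:
// blobs whose deletion was requested and which do not appear in the Bloom filter
// of currently referenced blob ids are safe to delete.
public class GenerationCollection {
    public static List<String> blobsToDelete(List<String> concernedBlobs, List<String> referencedBlobIds) {
        BloomFilter<String> foundedReferences = BloomFilter.create(
            Funnels.stringFunnel(StandardCharsets.UTF_8),
            Math.max(referencedBlobIds.size(), 1));
        referencedBlobIds.forEach(foundedReferences::put);

        return concernedBlobs.stream()
            // A false positive keeps a garbage blob around until a later iteration, which is safe.
            .filter(blobId -> !foundedReferences.mightContain(blobId))
            .collect(Collectors.toList());
    }
}
```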
Tables: after garbage collection
Generations
| reference generation id |
|---|
| rg1 |
| rg2 |
| rg3 |
| garbage collection iteration id |
|---|
| gci1 |
| gci2 |
| gci3 |
Blobs
| blob id | reference generation id |
|---|---|
| b3 | rg2 |
| b4 | rg2 |
| b6 | rg3 |
References
| message id | blob id | reference generation id |
|---|---|---|
| m4 | b3 | rg2 |
| m5 | b4 | rg2 |
| m6 | b4 | rg2 |
| m9 | b6 | rg3 |
Deletions
| blob id | reference generation id | date | garbage collection iteration id |
|---|---|---|---|
| b6 | rg3 | 10:42 | gci3 |
Generation 4 / Iteration 4
- Events
  - rg4 reference generation is emitted
  - gci4 garbage collection iteration is emitted
  - user2 deletes m9 with gci4
Tables: before deletions
Generations
| reference generation id |
|---|
| rg1 |
| rg2 |
| rg3 |
| rg4 |
| garbage collection iteration id |
|---|
| gci1 |
| gci2 |
| gci3 |
| gci4 |
Blobs
| blob id | reference generation id |
|---|---|
| b3 | rg2 |
| b4 | rg2 |
| b6 | rg3 |
References
| message id | blob id | reference generation id |
|---|---|---|
| m4 | b3 | rg2 |
| m5 | b4 | rg2 |
| m6 | b4 | rg2 |
| m9 | b6 | rg3 |
Deletions
| blob id | reference generation id | date | garbage collection iteration id |
|---|---|---|---|
| b6 | rg3 | 10:42 | gci3 |
Tables: after deletions
Generations
| reference generation id |
|---|
| rg1 |
| rg2 |
| rg3 |
| rg4 |
| garbage collection iteration id |
|---|
| gci1 |
| gci2 |
| gci3 |
| gci4 |
Blobs
| blob id | reference generation id |
|---|---|
| b3 | rg2 |
| b4 | rg2 |
| b6 | rg3 |
References
| message id | blob id | reference generation id |
|---|---|---|
| m4 | b3 | rg2 |
| m5 | b4 | rg2 |
| m6 | b4 | rg2 |
Deletions
| blob id | reference generation id | date | garbage collection iteration id |
|---|---|---|---|
| b6 | rg3 | 10:42 | gci3 |
| b6 | rg3 | 18:42 | gci4 |
Running the algorithm
- fetch Deletions for gci4 in deletions
- find distinct reference-generation-id of deletions in generations = {rg3}
- For each generation
  - rg3
    - filter deletions to keep only rg3 entries and extract blob-ids in concernedBlobs = {b6}
    - fetch all references to concernedBlobs and build a Bloom-Filter in foundedReferences = {}
    - filter concernedBlobs to keep only those which are not present in foundedReferences in blobsToDelete = {b6}
    - Remove blobsToDelete from Blobs and Deletions
Tables: after garbage collection
Generations
| reference generation id |
|---|
| rg1 |
| rg2 |
| rg3 |
| rg4 |
| garbage collection iteration id |
|---|
| gci1 |
| gci2 |
| gci3 |
| gci4 |
Blobs
| blob id | reference generation id |
|---|---|
| b3 | rg2 |
| b4 | rg2 |
References
| message id | blob id | reference generation id |
|---|---|---|
| m4 | b3 | rg2 |
| m5 | b4 | rg2 |
| m6 | b4 | rg2 |
Deletions
Empty