Fix persistent 409s in _replicator with serialize_worker_startup
Replicator state updates are written only to the local shard copy, and the other
copies catch up via the internal replicator. Internal replicator pushes can be
slow to start (there is a hold-off wait, there could be a backlog, etc.). With
`serialize_worker_startup=true`, when the job owner node and the primary update
node diverge (a 60% chance), users could get stuck with persistent 409s when
trying to update or delete docs, even after fetching the bona fide latest rev
with a quorum doc GET.
To fix that, eagerly update the doc states (they are small). After we update the
local shard copy, immediately do an async update of all the other copies, using
the same calls and options as the internal replicator. There is still a small
chance of a 409, but a quick retry should let the user make progress. This
should also help in the `serialize_worker_startup=false` case by making
replicator state updates propagate a bit faster.
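A toy model may make the failure mode and the fix easier to see. The Python
sketch below is not CouchDB code: `ShardCopy`, `write_state`, `user_update`,
`fresh_copies` and the idea of a single "primary" copy that checks the user's
rev are hypothetical simplifications, and revisions are plain integers rather
than revision trees. It shows a local-only state write leading to a 409, and an
eager async push to all copies letting the same user update through.

```python
from __future__ import annotations

from concurrent.futures import Future, ThreadPoolExecutor


class ShardCopy:
    """Hypothetical, simplified copy of one _replicator doc on one node.

    Revisions are plain integers here; real CouchDB uses revision trees.
    """

    def __init__(self, node: str, rev: int = 1) -> None:
        self.node = node
        self.rev = rev

    def update(self, expected_rev: int) -> int:
        # MVCC check: the caller must present this copy's current rev.
        if expected_rev != self.rev:
            return 409  # conflict
        self.rev += 1
        return 201

    def catch_up(self, rev: int) -> None:
        # Stand-in for pushing a newer rev to this copy; it only moves forward.
        self.rev = max(self.rev, rev)


def write_state(owner: ShardCopy, copies: list[ShardCopy],
                pool: ThreadPoolExecutor | None) -> list[Future]:
    """Record a replication state change on the job owner's copy.

    pool=None models the old behavior (local copy only; the others wait
    for the internal replicator). With a pool, the new rev is pushed to
    every other copy asynchronously right away.
    """
    owner.rev += 1
    if pool is None:
        return []
    return [pool.submit(c.catch_up, owner.rev) for c in copies if c is not owner]


def user_update(primary: ShardCopy, copies: list[ShardCopy]) -> int:
    # A quorum read returns the latest rev (simplified: max over all copies)...
    latest = max(c.rev for c in copies)
    # ...but the update is checked against the primary update node's copy.
    return primary.update(latest)


def fresh_copies() -> list[ShardCopy]:
    return [ShardCopy(n) for n in ("node1", "node2", "node3")]


if __name__ == "__main__":
    # Old behavior: job owner (node1) and primary update node (node2) diverge.
    copies = fresh_copies()
    write_state(copies[0], copies, pool=None)
    print("local-only write:", user_update(copies[1], copies))  # 409

    # New behavior: eager async update of all copies.
    copies = fresh_copies()
    with ThreadPoolExecutor() as pool:
        for f in write_state(copies[0], copies, pool):
            f.result()  # wait here, so the race window is closed
    print("eager fan-out:", user_update(copies[1], copies))  # 201
```

The sketch waits on the futures before the user update, so there is no race.
If the user update arrived before the async pushes finished, the model would
still return 409, which mirrors the small remaining window where a quick retry
is needed.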
Fix #6029