Stop replications on target document write failures
Previously, when a target document write failed, either because a VDU
(validate_doc_update) function prevented the write or because the document
exceeded size limits on the target cluster, replication continued and a
`doc_write_failures` statistic counter was incremented. The counter was visible
in `_active_tasks` output for continuous replications and in the completion
record written back to the replication documents.
That behavior might not be suitable in many cases. For instance, when migrating
data, a user who ignores the counter and sees a successful replication
completion could end up with a perceived data loss. Until recently
(00b28c265d97df675b725cd68897dc371cbd7168) this was even worse, because the
replication scheduler would reset statistics counters on every job start. So if
a job restarted at least once, the user might never find out that not all of
their data was copied from the source to the target.
Introduce a replicator config setting `stop_on_doc_write_failure = true | false`.
When it is `true`, the replication crashes if a single document write fails.
This will make write failures visible to the user, and they would know exactly
which document failed.
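As a rough illustration, enabling the setting might look like the fragment below. The `[replicator]` section name follows from the description of this as a replicator config; placing it in a `local.ini` override file is an assumption, and the default value is not stated here.

```ini
; Assumed location: a local.ini override file
[replicator]
; Crash the replication job on the first failed target document write
stop_on_doc_write_failure = true
```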
Replications which crash because of doc write failures retry periodically,
just like after any other crash, such as those related to missing source or
target connectivity. They do not fail permanently because users could adjust
the document size limit on the target cluster or change the VDU function, after
which the replication will start working again and complete. Here we rely on
the exponential backoffs which were introduced with the scheduling replicator
work. At the maximum backoff interval, replications retry about once every 8
hours.
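The backoff behavior described above can be sketched roughly as follows. This is a Python illustration rather than the scheduler's actual code: the 30 second base interval and the exponent cap of 10 are assumptions, chosen only because they produce a maximum wait of roughly 8 hours, and `backoff_seconds` is a hypothetical name.

```python
# Assumed constants; the real scheduler's values may differ.
BASE_INTERVAL_SEC = 30
MAX_EXPONENT = 10


def backoff_seconds(consecutive_crashes: int) -> int:
    """Return the wait before the next retry after N consecutive crashes."""
    # Double the wait for each consecutive crash, up to a fixed cap.
    exponent = min(max(consecutive_crashes, 0), MAX_EXPONENT)
    return BASE_INTERVAL_SEC * 2 ** exponent


if __name__ == "__main__":
    for crashes in (0, 1, 5, 10, 20):
        wait = backoff_seconds(crashes)
        print(f"{crashes:>2} crashes: wait {wait} s (~{wait / 3600:.2f} h)")
```

Under these assumed values, a job that keeps failing on the same document settles at one retry per 30720 seconds, so fixing the VDU function or the size limit is picked up on the next retry without any manual restart.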
After the failure is handled and bubbles up to the main replication job process,
there is an attempt to checkpoint before exiting with the error. That is done
in order to reduce changes feed reprocessing during each retry attempt.
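The checkpoint-before-exit flow can be sketched as below. This is a simplified Python model, not the replicator's implementation: `run_job`, `DocWriteFailure`, `write_doc` and `checkpoint` are hypothetical names, and a real job would also have to tolerate a failing checkpoint attempt.

```python
class DocWriteFailure(Exception):
    """Hypothetical error raised when the target rejects a document write."""


def run_job(changes, write_doc, checkpoint, stop_on_doc_write_failure):
    """Process (seq, doc) pairs; checkpoint the last fully handled seq."""
    stats = {"docs_written": 0, "doc_write_failures": 0}
    last_seq = None
    for seq, doc in changes:
        try:
            write_doc(doc)
            stats["docs_written"] += 1
        except DocWriteFailure:
            stats["doc_write_failures"] += 1
            if stop_on_doc_write_failure:
                # Save progress up to the last good sequence so the next
                # retry does not reprocess the whole changes feed.
                if last_seq is not None:
                    checkpoint(last_seq)
                raise
            # Old behavior: count the failure and keep going.
        last_seq = seq
    if last_seq is not None:
        checkpoint(last_seq)
    return stats
```

With the flag enabled, the retry resumes from the checkpointed sequence and hits the failing document again almost immediately, rather than replaying everything before it.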
6 files changed