Fix purge infos replicating to the wrong shards during shard splitting.

Previously, the internal replicator (mem3_rep) replicated purge infos to/from
all the target shards. Instead, it should push/pull purge infos only to the
target ranges they belong to, as determined by the database's hash function.
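The intended behavior can be sketched roughly as follows. This is a
hypothetical Python illustration, not the actual mem3_rep code: the function
names, the purge-info shape, and the use of crc32 over the doc id are all
assumptions made for the sake of the example.

```python
import zlib

RING_TOP = 2**32 - 1  # top of the assumed 32-bit hash ring

def in_range(doc_id, range_begin, range_end):
    # Hash the doc id and check whether it lands in the shard's range.
    h = zlib.crc32(doc_id.encode("utf-8"))
    return range_begin <= h <= range_end

def filter_purge_infos(purge_infos, range_begin, range_end):
    # Keep only the purge infos whose doc id hashes into the target range,
    # instead of pushing every purge info to every target shard.
    return [pi for pi in purge_infos
            if in_range(pi["docid"], range_begin, range_end)]
```

With this filter in place, splitting a range in half routes each purge info to
exactly one of the two halves.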

Users experienced this bug as a failure when a database containing purges was
split twice in a row. For example, if a Q=8 database is split to Q=16, then
split again from Q=16 to Q=32, the second split operation could fail with a
`split_state:initial_copy ...{{badkey,not_in_range}` error. The misplaced
purge infos would be noticed only during the second split, when the initial
copy phase would crash because some purge infos did not hash to either of the
two target ranges. Moreover, the crash triggered repeated retries, which
generated a huge job history log.

The fix consists of three improvements:

  1) The internal replicator is updated to filter purge infos based on the
     database's hash function.

  2) Account for the fact that some users' databases might already contain
     misplaced purge infos. Since this is a known bug, we anticipate the error
     and ignore misplaced purge infos during the second shard split operation,
     emitting a warning in the logs.

  3) Make similar range errors fatal, and emit a clear error in the logs and
     the job history so any future range errors are immediately obvious.
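The tolerant routing in improvement 2 could look roughly like the sketch
below. Again, this is a hypothetical Python illustration under the same
assumptions as above (crc32 over the doc id, illustrative names), not the
actual implementation:

```python
import logging
import zlib

log = logging.getLogger("shard_split")

def route_purge_infos(purge_infos, targets):
    # targets: list of (range_begin, range_end, bucket) tuples, one per
    # target shard of the split. Returns the misplaced purge infos.
    misplaced = []
    for pi in purge_infos:
        h = zlib.crc32(pi["docid"].encode("utf-8"))
        for begin, end, bucket in targets:
            if begin <= h <= end:
                bucket.append(pi)
                break
        else:
            # Previously the initial copy phase crashed here with
            # {badkey,not_in_range}; now the misplaced purge info is
            # skipped with a warning instead.
            log.warning("ignoring misplaced purge info for doc %s",
                        pi["docid"])
            misplaced.append(pi)
    return misplaced
```

Routing through all target ranges leaves nothing misplaced, while routing
through only a subset skips (and warns about) the purge infos that hash
outside it.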

Fixes #4624