tree 5003952196405d9c180d3e68eef34ca83f6095e0
parent 646e59089bc1fd881eff4cc0fb4070220d23dc86
author Jack Vanlightly <vanlightly@gmail.com> 1618250812 -0700
committer GitHub <noreply@github.com> 1618250812 -0700
gpgsig -----BEGIN PGP SIGNATURE-----
 
 wsBcBAABCAAQBQJgdIw8CRBK7hj4Ov3rIwAAdHIIAAaAksFQeB3r8TXdC/fOKxb7
 uToswFJbabcyueLNkp7W9rAHC14jQHNcJ2jLlDo9qmNBjh7CTwUclCAj9HU9LfG0
 +n6DRTlsbU7oqpzTK4Yw65Cjharrv4+OLUopoSJUv9zvN0NeY5XWsZqmTgHASIhH
 5RJcSpPeIwLFBZsAx35UcZrhQsGNXAcbs1VNZy1CldfMheBttWQywyOdFM5dyagk
 Lq8H6JazG4xg7AFEpEb+Gig08f6ch2XbjA+K0AgrFWxEA9Yl/1h/LugUkuVcfAxL
 fq0N4j220QWLqVmSSfHk+dXmIo85F806nL0xniLlXAPvueO7USoxSxziGMe3zIo=
 =qPpF
 -----END PGP SIGNATURE-----
 

ISSUE #2615: Fix for invalid ensemble issue during ledger recovery

Ensures that only entries of the current ensemble are included in the ledger recovery process. This avoids a recovery failure in which recovery tries to append an ensemble whose first entry id is lower than that of the prior ensemble.

Descriptions of the changes in this PR:

This PR makes a small change to LedgerRecoveryOp that prevents ledger recovery from trying to create an invalid ensemble and failing as a result. Such a failure could leave the ledger's data unavailable for as long as the triggering conditions persist.

During ledger recovery, only entries of the current ensemble should be included in the read and write-back phase; prior ensembles, if any, are immutable. However, in a multi-ensemble ledger the current ensemble can return an LAC of -1. Recovery then starts reading from entry 0, so it reads entries from prior ensembles and writes them back to the current ensemble. This does not cause any data loss, but it wastes both space and time. The main issue is that if an ensemble change occurs while entries are being written back, recovery tries to create a new ensemble with a first entry id of 0. This causes an IllegalStateException, because a check performed before the metadata CAS operation ensures that a new ensemble does not have a first entry id lower than that of an existing ensemble.
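
A minimal sketch of the kind of ordering check described above, assuming ensembles are kept in a map keyed by the first entry id each ensemble stores. The class `EnsembleCheckSketch`, the method `addEnsemble`, and the use of plain bookie-name strings are all hypothetical illustrations, not BookKeeper's actual metadata code.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class EnsembleCheckSketch {
    // Hypothetical helper: ensembles are keyed by the id of the first entry they store.
    static void addEnsemble(NavigableMap<Long, List<String>> ensembles,
                            long firstEntryId, List<String> bookies) {
        // Reject an ensemble that would start before the latest existing ensemble.
        if (!ensembles.isEmpty() && firstEntryId < ensembles.lastKey()) {
            throw new IllegalStateException("New ensemble first entry " + firstEntryId
                    + " is lower than existing ensemble first entry " + ensembles.lastKey());
        }
        ensembles.put(firstEntryId, bookies);
    }

    public static void main(String[] args) {
        NavigableMap<Long, List<String>> ensembles = new TreeMap<>();
        addEnsemble(ensembles, 0L, List.of("b1", "b2", "b3"));
        addEnsemble(ensembles, 100L, List.of("b1", "b2", "b4"));
        try {
            // A recovery that wrongly writes back from entry 0 would attempt this.
            addEnsemble(ensembles, 0L, List.of("b1", "b4", "b5"));
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

In this sketch the rejected call leaves the map unchanged, which mirrors how the real check aborts before the metadata is written.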

As a result, if a bookie of the current ensemble were down, the write-back would trigger this failing ensemble change, and the ledger would be unrecoverable until that bookie became available again.

The fix is to treat the first entry id of the current ensemble minus 1 as the lowest safe LAC for recovery.
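
A minimal sketch of that rule, assuming recovery starts reading at LAC + 1 and that a lower LAC reported by the current ensemble is simply raised to the safe value. The names `RecoveryStartSketch` and `firstEntryToRecover` are hypothetical; this is not the actual LedgerRecoveryOp code.

```java
public class RecoveryStartSketch {
    /**
     * Hypothetical helper: returns the first entry id that recovery should read,
     * given the LAC reported by the current ensemble and the first entry id of
     * that ensemble.
     */
    static long firstEntryToRecover(long lacFromCurrentEnsemble, long firstEntryOfCurrentEnsemble) {
        // Entries before the current ensemble belong to prior, immutable ensembles,
        // so the lowest safe LAC is the entry just before the current ensemble starts.
        long lowestSafeLac = firstEntryOfCurrentEnsemble - 1;
        long lac = Math.max(lacFromCurrentEnsemble, lowestSafeLac);
        return lac + 1;
    }

    public static void main(String[] args) {
        // Ensembles start at entries 0 and 100; the current ensemble reports LAC -1.
        System.out.println(firstEntryToRecover(-1, 100)); // prints 100, not 0
        // A single-ensemble ledger with LAC -1 still starts recovery at entry 0.
        System.out.println(firstEntryToRecover(-1, 0)); // prints 0
        // A LAC inside the current ensemble is used as-is.
        System.out.println(firstEntryToRecover(150, 100)); // prints 151
    }
}
```

For a single-ensemble ledger the safe value is -1, so the behaviour there is unchanged; only multi-ensemble ledgers are affected.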

### Changes

Change to LedgerRecoveryOp as described above.
New unit test in LedgerRecoveryTest2.

Master Issue: #2615


Reviewers: Andrey Yegorov, Enrico Olivelli <eolivelli@gmail.com>, Flavio Junqueira

This closes #2654 from Vanlightly/fix-invalid-ensemble-change, closes #2615