tree 9e97c0b3e4075ba508618a241eb12c41e3a7613c
parent 43bd09c652889919ad888d31d4e3a5461c93f21d
author Junfan Zhang <zuston@apache.org> 1715225452 +0800
committer GitHub <noreply@github.com> 1715225452 +0800
gpgsig -----BEGIN PGP SIGNATURE-----
 
 wsFcBAABCAAQBQJmPENsCRC1aQ7uu5UhlAAAJ5gQAJZKXVPF/RLOa2DIpsUXQUhf
 PBjrFh9wwAT//opnmClCO/nIlmhOBD0kzUSJRGIjXz8F9IbxKGtKtAzWC1/mVbGt
 cUuV+9HS5l/B+TgUWLqbFprz9JC29qSIuoHBxjtBCPw6c7vwle12ixmf77RpY81r
 EsbyTHhElb6ig7Bp2k6Ep4Gya0v/sC7mOIAdk5YmdO3K5HXQnt1Pzf0b37f3T5gL
 fnWQusYvssAYI0qYuKU0BT2MVHOYhQUcl4kfdLCCGQau12vOC+k6WBG0Fp+xq2K/
 Q02kvC9sa+8Nrm7aiK0KoJLDSdS4EXMpd3VqQye/1cW2EcHphgFJOJwCEmUuxPih
 qDEJMCZO0dNrywzUnWCOVMSTRVMM+YrLzM1qaxs6jKgDh2ynsxEjv9FM0GOu306z
 nw65qy9XCjRuM/YoCnZlGwNM9voX+PQ0ifXLyrmvt1EUU9zxGRWt0iphS70Z7IcU
 nlng/nx6b3VT4j/RDYfcic06Cu+dkcqVll3SRETv0XphD0snAvnN/P8HcHj9q7kK
 16epAxEsFFYPNPf4CPhAyeHpzgtC+v0j7yd/UnPcxglpnQCupYnbG5pprFUu0rEb
 uTedkqq9X85qA6/IJfHwgXRiNCo3Nb6Tceh/5v7eirt18pWki7HG1ytdPxegM03u
 ycnepoJX+zLER31yjI+c
 =+LHO
 -----END PGP SIGNATURE-----
 

[#1608][part-5] feat(spark3): always use the available assignment (#1652)

### What changes were proposed in this pull request?

1. Make the write client always use the latest available assignment for subsequent writes after a block reassignment happens.
2. Support multiple retries for partition reassignment.
3. Limit the maximum number of reassigned servers per partition.
4. Refactor the reassign RPC.
5. Rename `faultyServer` to `receivingFailureServer`.
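Items 2 and 3 can be read together as a bounded, repeatable reassignment per partition. The Java sketch below illustrates only that idea. `PartitionReassignSketch`, `maxReassignServerNum`, and the candidate-server pool are hypothetical, and the real reassign logic may select replacement servers differently. For illustration it also assumes that the failed server stays in the partition's server list and the replacement is appended.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: replaces a server that failed to receive blocks for a
// partition, allowing repeated retries up to a per-partition cap.
class PartitionReassignSketch {
  private final int maxReassignServerNum; // hypothetical cap per partition
  private final List<String> candidateServers; // hypothetical pool of servers
  private final Map<Integer, List<String>> assignment = new HashMap<>();
  private final Map<Integer, Integer> reassignCount = new HashMap<>();
  private final Set<String> receivingFailureServers = new HashSet<>();

  PartitionReassignSketch(int maxReassignServerNum, List<String> candidateServers) {
    this.maxReassignServerNum = maxReassignServerNum;
    this.candidateServers = new ArrayList<>(candidateServers);
  }

  void assign(int partitionId, String server) {
    assignment.computeIfAbsent(partitionId, k -> new ArrayList<>()).add(server);
  }

  // Returns the replacement server, or null when the partition has used up its
  // reassignment budget or no healthy, unused candidate is left.
  String reassign(int partitionId, String failedServer) {
    receivingFailureServers.add(failedServer); // remembered for all partitions
    int used = reassignCount.getOrDefault(partitionId, 0);
    if (used >= maxReassignServerNum) {
      return null;
    }
    List<String> servers = assignment.computeIfAbsent(partitionId, k -> new ArrayList<>());
    for (String candidate : candidateServers) {
      if (!receivingFailureServers.contains(candidate) && !servers.contains(candidate)) {
        servers.add(candidate); // assumption: append, keep the failed server
        reassignCount.put(partitionId, used + 1);
        return candidate;
      }
    }
    return null;
  }

  List<String> serversOf(int partitionId) {
    return Collections.unmodifiableList(assignment.getOrDefault(partitionId, List.of()));
  }
}
```

In this sketch, with a cap of 2 the first two failures on a partition each get a fresh server, and a third call returns `null`, signalling that the partition's reassignment budget is exhausted.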

#### Reassign whole process
![image](https://github.com/apache/incubator-uniffle/assets/8609142/8afa5386-be39-4ccb-9c10-95ffb3154939)

#### Always using the latest assignment

To always use the latest assignment, I introduce `TaskAttemptAssignment`, which provides the latest assignment for the current task. When an `AddBlockEvent` is created, it also applies the latest assignment obtained from `TaskAttemptAssignment`.

`TaskAttemptAssignment` is in turn updated by the `reassignOnBlockSendFailure` RPC. This means the original reassign RPC response is refactored and replaced by the whole latest `shuffleHandleInfo`.
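A minimal Java sketch of the two ideas in this subsection, under stated assumptions: a per-task holder whose whole snapshot is swapped when a reassign response arrives, and block events that resolve their target servers from that holder when they are created. `LatestAssignmentSketch`, `BlockEventSketch`, and `applyReassignResponse` are hypothetical names, not the actual `TaskAttemptAssignment`, `AddBlockEvent`, or `shuffleHandleInfo` API, and a plain `partitionId -> servers` map stands in for `shuffleHandleInfo`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-task holder of the latest assignment (partitionId -> servers).
// A plain map stands in for the real shuffleHandleInfo.
final class LatestAssignmentSketch {
  // volatile: writer threads always read the most recently published snapshot
  private volatile Map<Integer, List<String>> snapshot;

  LatestAssignmentSketch(Map<Integer, List<String>> initial) {
    this.snapshot = Collections.unmodifiableMap(new HashMap<>(initial));
  }

  // Hypothetical handler for a reassign RPC response carrying the WHOLE latest
  // handle: the snapshot is replaced rather than patched, so partitions that
  // were reassigned on behalf of other tasks become visible as well.
  void applyReassignResponse(Map<Integer, List<String>> wholeLatestHandle) {
    this.snapshot = Collections.unmodifiableMap(new HashMap<>(wholeLatestHandle));
  }

  List<String> serversOf(int partitionId) {
    return snapshot.get(partitionId);
  }
}

// Hypothetical block event: target servers are resolved at creation time, so
// events built after a reassignment go to the replacement servers.
final class BlockEventSketch {
  final int partitionId;
  final List<String> targetServers;

  BlockEventSketch(int partitionId, LatestAssignmentSketch assignment) {
    this.partitionId = partitionId;
    this.targetServers = assignment.serversOf(partitionId);
  }
}
```

Swapping the whole snapshot keeps each update to a single reference write. In this sketch, an event created before the response keeps its old targets, while events created afterwards pick up the new servers.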

### Why are the changes needed?

This PR is a subtask of #1608.

Leveraging #1615 / #1610 / #1609, we have implemented a server reassignment mechanism for when the write client encounters a failed or unhealthy server. However, this is not good enough: the faulty server state is not shared with tasks that have not started yet or with later `AddBlockEvent`s.

### Does this PR introduce _any_ user-facing change?

Yes. 

### How was this patch tested?

Unit and integration tests.

The integration tests are:
1. `PartitionBlockDataReassignBasicTest` validates that the basic reassign mechanism works.
2. `PartitionBlockDataReassignMultiTimesTest` tests the partition reassign mechanism with multiple retries.

---------

Co-authored-by: Enrico Minack <github@enrico.minack.dev>