Added retry mechanisms in Synchronizer.syncToPeerChain()

Until now, we required a perfect success rate when syncing with a peer via Synchronizer.syncToPeerChain(). Blocks were requested individually, but the node would give up and lose all progress if a single request failed. In practice, this happened very regularly, and it was difficult to succeed when there were a large number of blocks (e.g. 20+) that needed to be requested.

This commit adds two retry mechanisms, causing each of the two request types (block sigs and blocks) to retry 3 times before giving up, potentially avoiding a lot of wasted work. The number of retries is configurable in the MAXIMUM_RETRIES constant, which we could move to settings at some point if this feature proves useful.

The original issue seemed to result in a few side effects:

1. Nodes would spend a large amount of time requesting blocks from peers, only to throw it all away afterwards. This potentially added to network congestion, as nodes were using unnecessary network time to unproductively serve peers.

2. A large number of sync attempts were failing, particularly when a fork emerged with a significant number of divergent blocks (20+). This issue reduced the ability for nodes to sync to the correct chain while they still had time to do so. With every block that passed, it became made it more and more difficult to switch to the correct chain. Eventually, the correct chain would become TOO_DIVERGENT at which point there is no way to automatically switch without manual intervention. I hope that this retry mechanism will increase the chances of nodes automatically moving onto the right chain quickly, avoiding the need for a user to intervene.

3. The POST /admin/forcesync API was unlikely to succeed when the peer's chain had started to diverge from the user's chain. This should increase the success rate.

Also included in this commit is a MAXIMUM_BLOCK_SIGNATURES_PER_REQUEST constant. This limits the number of block sigs requested in each batch (default 200). Without this, we are unable to increase MAXIMUM_COMMON_DELTA because it can try and request thousands of block sigs at once, which unsurprisingly doesn't succeed.
This commit is contained in:
CalDescent 2021-03-21 09:41:36 +00:00
parent f92f4dc1e2
commit f1422af95b

View File

@ -43,6 +43,8 @@ public class Synchronizer {
private static final int MAXIMUM_BLOCK_STEP = 500;
private static final int MAXIMUM_COMMON_DELTA = 240; // XXX move to Settings?
private static final int SYNC_BATCH_SIZE = 200;
private static final int MAXIMUM_RETRIES = 3; // Number of retry attempts if a peer fails to respond with the requested data
private static final int MAXIMUM_BLOCK_SIGNATURES_PER_REQUEST = 200; // Limit the amount of block signatures in a single request
private static Synchronizer instance;
@ -356,38 +358,69 @@ public class Synchronizer {
int numberSignaturesRequired = additionalPeerBlocksAfterCommonBlock - peerBlockSignatures.size();
// Fetch remaining block signatures, if needed
if (numberSignaturesRequired > 0) {
int retryCount = 0;
while (numberSignaturesRequired > 0) {
byte[] latestPeerSignature = peerBlockSignatures.isEmpty() ? commonBlockSig : peerBlockSignatures.get(peerBlockSignatures.size() - 1);
int lastPeerHeight = commonBlockHeight + peerBlockSignatures.size();
int numberOfSignaturesToRequest = Math.min(numberSignaturesRequired, MAXIMUM_BLOCK_SIGNATURES_PER_REQUEST);
LOGGER.trace(String.format("Requesting %d signature%s after height %d, sig %.8s",
numberSignaturesRequired, (numberSignaturesRequired != 1 ? "s": ""), lastPeerHeight, Base58.encode(latestPeerSignature)));
numberOfSignaturesToRequest, (numberOfSignaturesToRequest != 1 ? "s": ""), lastPeerHeight, Base58.encode(latestPeerSignature)));
List<byte[]> moreBlockSignatures = this.getBlockSignatures(peer, latestPeerSignature, numberSignaturesRequired);
List<byte[]> moreBlockSignatures = this.getBlockSignatures(peer, latestPeerSignature, numberOfSignaturesToRequest);
if (moreBlockSignatures == null || moreBlockSignatures.isEmpty()) {
LOGGER.info(String.format("Peer %s failed to respond with more block signatures after height %d, sig %.8s", peer,
lastPeerHeight, Base58.encode(latestPeerSignature)));
return SynchronizationResult.NO_REPLY;
if (retryCount >= MAXIMUM_RETRIES) {
// Give up with this peer
return SynchronizationResult.NO_REPLY;
}
else {
// Retry until retryCount reaches MAXIMUM_RETRIES
retryCount++;
int triesRemaining = MAXIMUM_RETRIES - retryCount;
LOGGER.info(String.format("Re-issuing request to peer %s (%d attempt%s remaining)", peer, triesRemaining, (triesRemaining != 1 ? "s": "")));
continue;
}
}
// Reset retryCount because the last request succeeded
retryCount = 0;
LOGGER.trace(String.format("Received %s signature%s", peerBlockSignatures.size(), (peerBlockSignatures.size() != 1 ? "s" : "")));
peerBlockSignatures.addAll(moreBlockSignatures);
numberSignaturesRequired = additionalPeerBlocksAfterCommonBlock - peerBlockSignatures.size();
}
// Fetch blocks using signatures
LOGGER.debug(String.format("Fetching new blocks from peer %s after height %d", peer, commonBlockHeight));
List<Block> peerBlocks = new ArrayList<>();
for (byte[] blockSignature : peerBlockSignatures) {
retryCount = 0;
while (peerBlocks.size() < peerBlockSignatures.size()) {
byte[] blockSignature = peerBlockSignatures.get(peerBlocks.size());
LOGGER.debug(String.format("Fetching block with signature %s", blockSignature));
int blockHeightToRequest = commonBlockHeight + peerBlocks.size() + 1; // +1 because we are requesting the next block, beyond what we already have in the peerBlocks array
Block newBlock = this.fetchBlock(repository, peer, blockSignature);
if (newBlock == null) {
LOGGER.info(String.format("Peer %s failed to respond with block for height %d, sig %.8s", peer,
blockHeightToRequest, Base58.encode(blockSignature)));
return SynchronizationResult.NO_REPLY;
LOGGER.info(String.format("Peer %s failed to respond with block for height %d, sig %.8s", peer, blockHeightToRequest, Base58.encode(blockSignature)));
if (retryCount >= MAXIMUM_RETRIES) {
// Give up with this peer
return SynchronizationResult.NO_REPLY;
}
else {
// Retry until retryCount reaches MAXIMUM_RETRIES
retryCount++;
int triesRemaining = MAXIMUM_RETRIES - retryCount;
LOGGER.info(String.format("Re-issuing request to peer %s (%d attempt%s remaining)", peer, triesRemaining, (triesRemaining != 1 ? "s": "")));
continue;
}
}
if (!newBlock.isSignatureValid()) {
@ -396,6 +429,11 @@ public class Synchronizer {
return SynchronizationResult.INVALID_DATA;
}
// Reset retryCount because the last request succeeded
retryCount = 0;
LOGGER.debug(String.format("Received block with height %d, sig: %.8s", newBlock.getBlockData().getHeight(), newBlock.getSignature()));
// Transactions are transmitted without approval status so determine that now
for (Transaction transaction : newBlock.getTransactions())
transaction.setInitialApprovalStatus();