Compare commits

...

13 Commits

Author SHA1 Message Date
CalDescent
b9a0d489d7 Bump version to 1.4.5 2021-03-21 17:06:10 +00:00
catbref
d9d4c4c302 Bump Peer response timeout from 2s to 3s 2021-03-21 16:17:40 +00:00
catbref
81c6d75d62 Adjust Synchronizer.MAXIMUM_BLOCK_STEP to 128, which means final summaries request will have enough to cover MAXIMUM_COMMON_DELTA (8+16+32+64+128 = 248, which is >240) 2021-03-21 16:12:41 +00:00
catbref
d1419bdfbd Minor comments, adjust max step size when searching for common block 2021-03-21 15:57:00 +00:00
CalDescent
8566d9b7e5 Merge branch 'master' into synchronization-improvements 2021-03-21 15:04:43 +00:00
catbref
b319d6db6b Rework BlockMessage caching with new pseudo outgoing-only message that only caches raw bytes 2021-03-21 14:14:15 +00:00
CalDescent
35fd1d8455 Base58 encode signatures in recently added logs. 2021-03-21 14:12:04 +00:00
CalDescent
be21771e49 Use SYNC_BATCH_SIZE instead of MAXIMUM_BLOCK_SIGNATURES_PER_REQUEST. 2021-03-21 13:58:42 +00:00
catbref
745528a9b1 Peer.sendMessage() should return false when it can't send because it can't build the message 2021-03-21 13:19:59 +00:00
CalDescent
f1422af95b Added retry mechanisms in Synchronizer.syncToPeerChain()
Until now, we required a perfect success rate when syncing with a peer via Synchronizer.syncToPeerChain(). Blocks were requested individually, but the node would give up and lose all progress if a single request failed. In practice, this happened very regularly, and it was difficult to succeed when there were a large number of blocks (e.g. 20+) that needed to be requested.

This commit adds two retry mechanisms, causing each of the two request types (block sigs and blocks) to retry 3 times before giving up, potentially avoiding a lot of wasted work. The number of retries is configurable in the MAXIMUM_RETRIES constant, which we could move to settings at some point if this feature proves useful.

The original issue seemed to result in a few side effects:

1. Nodes would spend a large amount of time requesting blocks from peers, only to throw it all away afterwards. This potentially added to network congestion, as nodes were using unnecessary network time to unproductively serve peers.

2. A large number of sync attempts were failing, particularly when a fork emerged with a significant number of divergent blocks (20+). This issue reduced the ability for nodes to sync to the correct chain while they still had time to do so. With every block that passed, it became made it more and more difficult to switch to the correct chain. Eventually, the correct chain would become TOO_DIVERGENT at which point there is no way to automatically switch without manual intervention. I hope that this retry mechanism will increase the chances of nodes automatically moving onto the right chain quickly, avoiding the need for a user to intervene.

3. The POST /admin/forcesync API was unlikely to succeed when the peer's chain had started to diverge from the user's chain. This should increase the success rate.

Also included in this commit is a MAXIMUM_BLOCK_SIGNATURES_PER_REQUEST constant. This limits the number of block sigs requested in each batch (default 200). Without this, we are unable to increase MAXIMUM_COMMON_DELTA because it can try and request thousands of block sigs at once, which unsurprisingly doesn't succeed.
2021-03-21 09:41:36 +00:00
CalDescent
f92f4dc1e2 Fixed some log entries in Controller.syncToPeerChain() which were incorrectly reporting our height instead of the height of block(s) being requested from the peer. Now reporting the height of the block (or block sigs) being retrieved, which should make it easier to interpret the logs. 2021-03-20 16:18:25 +00:00
catbref
019cfdc1db Minor comment re-org 2021-03-20 11:45:11 +00:00
CalDescent
e694a51cdd Fix for "numberSignaturesRequired" calculation error in Synchronizer.syncToPeerChain()
This bug often prevented the correct amount of block signatures (and blocks) from being requested from a peer, when trying to sync to it.

It could result in quite serious consequences, as it would trigger orphaning back to the common block without first requesting all of the necessary blocks from the peer's chain. Rather than applying a complete copy of the peer's chain, it could orphan back to the common block and then only apply a few blocks beyond that, leaving the node in an unexpected state, potentially hundreds of blocks behind the peer's current height, which it then has to try and obtain from other peers.

When there are forks present, this could result in it hopping from chain to chain, each time being unable to fully synchronise with the peer. Given that we currently discard our chain if it is deemed that our latest block isn't "recent", it is very important that nodes are brought up to the latest block when synchronising with a peer, to avoid constantly triggering discards.

The severity of this bug increased when there was a large disparity between the peer's latest block and the common block height, and prevented us from being able to increase MAXIMUM_COMMON_DELTA.
2021-03-20 10:33:23 +00:00
5 changed files with 155 additions and 25 deletions

View File

@@ -3,7 +3,7 @@
<modelVersion>4.0.0</modelVersion>
<groupId>org.qortal</groupId>
<artifactId>qortal</artifactId>
<version>1.4.4</version>
<version>1.4.5</version>
<packaging>jar</packaging>
<properties>
<skipTests>true</skipTests>

View File

@@ -67,8 +67,8 @@ import org.qortal.gui.SysTray;
import org.qortal.network.Network;
import org.qortal.network.Peer;
import org.qortal.network.message.ArbitraryDataMessage;
import org.qortal.network.message.BlockMessage;
import org.qortal.network.message.BlockSummariesMessage;
import org.qortal.network.message.CachedBlockMessage;
import org.qortal.network.message.GetArbitraryDataMessage;
import org.qortal.network.message.GetBlockMessage;
import org.qortal.network.message.GetBlockSummariesMessage;
@@ -148,9 +148,9 @@ public class Controller extends Thread {
/** Cache of BlockMessages, indexed by block signature */
@SuppressWarnings("serial")
private final LinkedHashMap<ByteArray, BlockMessage> blockMessageCache = new LinkedHashMap<>() {
private final LinkedHashMap<ByteArray, CachedBlockMessage> blockMessageCache = new LinkedHashMap<>() {
@Override
protected boolean removeEldestEntry(Map.Entry<ByteArray, BlockMessage> eldest) {
protected boolean removeEldestEntry(Map.Entry<ByteArray, CachedBlockMessage> eldest) {
return this.size() > Settings.getInstance().getBlockCacheSize();
}
};
@@ -1151,7 +1151,7 @@ public class Controller extends Thread {
ByteArray signatureAsByteArray = new ByteArray(signature);
BlockMessage cachedBlockMessage = this.blockMessageCache.get(signatureAsByteArray);
CachedBlockMessage cachedBlockMessage = this.blockMessageCache.get(signatureAsByteArray);
int blockCacheSize = Settings.getInstance().getBlockCacheSize();
// Check cached latest block message
@@ -1159,7 +1159,7 @@ public class Controller extends Thread {
this.stats.getBlockMessageStats.cacheHits.incrementAndGet();
// We need to duplicate it to prevent multiple threads setting ID on the same message
BlockMessage clonedBlockMessage = cachedBlockMessage.cloneWithNewId(message.getId());
CachedBlockMessage clonedBlockMessage = cachedBlockMessage.cloneWithNewId(message.getId());
if (!peer.sendMessage(clonedBlockMessage))
peer.disconnect("failed to send block");
@@ -1187,12 +1187,15 @@ public class Controller extends Thread {
Block block = new Block(repository, blockData);
BlockMessage blockMessage = new BlockMessage(block);
CachedBlockMessage blockMessage = new CachedBlockMessage(block);
blockMessage.setId(message.getId());
// This call also causes the other needed data to be pulled in from repository
if (!peer.sendMessage(blockMessage))
if (!peer.sendMessage(blockMessage)) {
peer.disconnect("failed to send block");
// Don't fall-through to caching because failure to send might be from failure to build message
return;
}
// If request is for a recent block, cache it
if (getChainHeight() - blockData.getHeight() <= blockCacheSize) {

View File

@@ -39,10 +39,23 @@ public class Synchronizer {
private static final Logger LOGGER = LogManager.getLogger(Synchronizer.class);
/** Max number of new blocks we aim to add to chain tip in each sync round */
private static final int SYNC_BATCH_SIZE = 200; // XXX move to Settings?
/** Initial jump back of block height when searching for common block with peer */
private static final int INITIAL_BLOCK_STEP = 8;
private static final int MAXIMUM_BLOCK_STEP = 500;
/** Maximum jump back of block height when searching for common block with peer */
private static final int MAXIMUM_BLOCK_STEP = 128;
/** Maximum difference in block height between tip and peer's common block before peer is considered TOO DIVERGENT */
private static final int MAXIMUM_COMMON_DELTA = 240; // XXX move to Settings?
private static final int SYNC_BATCH_SIZE = 200;
/** Maximum number of block signatures we ask from peer in one go */
private static final int MAXIMUM_REQUEST_SIZE = 200; // XXX move to Settings?
/** Number of retry attempts if a peer fails to respond with the requested data */
private static final int MAXIMUM_RETRIES = 3; // XXX move to Settings?
private static Synchronizer instance;
@@ -350,46 +363,88 @@ public class Synchronizer {
// Convert any leftover (post-common) block summaries into signatures to request from peer
List<byte[]> peerBlockSignatures = peerBlockSummaries.stream().map(BlockSummaryData::getSignature).collect(Collectors.toList());
// Calculate the total number of additional blocks this peer has beyond the common block
int additionalPeerBlocksAfterCommonBlock = peerHeight - commonBlockHeight;
// Subtract the number of signatures that we already have, as we don't need to request them again
int numberSignaturesRequired = additionalPeerBlocksAfterCommonBlock - peerBlockSignatures.size();
// Fetch remaining block signatures, if needed
int numberSignaturesRequired = peerBlockSignatures.size() - (peerHeight - commonBlockHeight);
if (numberSignaturesRequired > 0) {
int retryCount = 0;
while (numberSignaturesRequired > 0) {
byte[] latestPeerSignature = peerBlockSignatures.isEmpty() ? commonBlockSig : peerBlockSignatures.get(peerBlockSignatures.size() - 1);
int lastPeerHeight = commonBlockHeight + peerBlockSignatures.size();
int numberOfSignaturesToRequest = Math.min(numberSignaturesRequired, MAXIMUM_REQUEST_SIZE);
LOGGER.trace(String.format("Requesting %d signature%s after height %d, sig %.8s",
numberSignaturesRequired, (numberSignaturesRequired != 1 ? "s": ""), ourHeight, Base58.encode(latestPeerSignature)));
numberOfSignaturesToRequest, (numberOfSignaturesToRequest != 1 ? "s": ""), lastPeerHeight, Base58.encode(latestPeerSignature)));
List<byte[]> moreBlockSignatures = this.getBlockSignatures(peer, latestPeerSignature, numberSignaturesRequired);
List<byte[]> moreBlockSignatures = this.getBlockSignatures(peer, latestPeerSignature, numberOfSignaturesToRequest);
if (moreBlockSignatures == null || moreBlockSignatures.isEmpty()) {
LOGGER.info(String.format("Peer %s failed to respond with more block signatures after height %d, sig %.8s", peer,
ourHeight, Base58.encode(latestPeerSignature)));
return SynchronizationResult.NO_REPLY;
lastPeerHeight, Base58.encode(latestPeerSignature)));
if (retryCount >= MAXIMUM_RETRIES) {
// Give up with this peer
return SynchronizationResult.NO_REPLY;
}
else {
// Retry until retryCount reaches MAXIMUM_RETRIES
retryCount++;
int triesRemaining = MAXIMUM_RETRIES - retryCount;
LOGGER.info(String.format("Re-issuing request to peer %s (%d attempt%s remaining)", peer, triesRemaining, (triesRemaining != 1 ? "s": "")));
continue;
}
}
// Reset retryCount because the last request succeeded
retryCount = 0;
LOGGER.trace(String.format("Received %s signature%s", peerBlockSignatures.size(), (peerBlockSignatures.size() != 1 ? "s" : "")));
peerBlockSignatures.addAll(moreBlockSignatures);
numberSignaturesRequired = additionalPeerBlocksAfterCommonBlock - peerBlockSignatures.size();
}
// Fetch blocks using signatures
LOGGER.debug(String.format("Fetching new blocks from peer %s", peer));
LOGGER.debug(String.format("Fetching new blocks from peer %s after height %d", peer, commonBlockHeight));
List<Block> peerBlocks = new ArrayList<>();
for (byte[] blockSignature : peerBlockSignatures) {
retryCount = 0;
while (peerBlocks.size() < peerBlockSignatures.size()) {
byte[] blockSignature = peerBlockSignatures.get(peerBlocks.size());
LOGGER.debug(String.format("Fetching block with signature %.8s", Base58.encode(blockSignature)));
int blockHeightToRequest = commonBlockHeight + peerBlocks.size() + 1; // +1 because we are requesting the next block, beyond what we already have in the peerBlocks array
Block newBlock = this.fetchBlock(repository, peer, blockSignature);
if (newBlock == null) {
LOGGER.info(String.format("Peer %s failed to respond with block for height %d, sig %.8s", peer,
ourHeight, Base58.encode(blockSignature)));
return SynchronizationResult.NO_REPLY;
LOGGER.info(String.format("Peer %s failed to respond with block for height %d, sig %.8s", peer, blockHeightToRequest, Base58.encode(blockSignature)));
if (retryCount >= MAXIMUM_RETRIES) {
// Give up with this peer
return SynchronizationResult.NO_REPLY;
}
else {
// Retry until retryCount reaches MAXIMUM_RETRIES
retryCount++;
int triesRemaining = MAXIMUM_RETRIES - retryCount;
LOGGER.info(String.format("Re-issuing request to peer %s (%d attempt%s remaining)", peer, triesRemaining, (triesRemaining != 1 ? "s": "")));
continue;
}
}
if (!newBlock.isSignatureValid()) {
LOGGER.info(String.format("Peer %s sent block with invalid signature for height %d, sig %.8s", peer,
ourHeight, Base58.encode(blockSignature)));
blockHeightToRequest, Base58.encode(blockSignature)));
return SynchronizationResult.INVALID_DATA;
}
// Reset retryCount because the last request succeeded
retryCount = 0;
LOGGER.debug(String.format("Received block with height %d, sig: %.8s", newBlock.getBlockData().getHeight(), Base58.encode(blockSignature)));
// Transactions are transmitted without approval status so determine that now
for (Transaction transaction : newBlock.getTransactions())
transaction.setInitialApprovalStatus();
@@ -425,7 +480,7 @@ public class Synchronizer {
ValidationResult blockResult = newBlock.isValid();
if (blockResult != ValidationResult.OK) {
LOGGER.info(String.format("Peer %s sent invalid block for height %d, sig %.8s: %s", peer,
ourHeight, Base58.encode(newBlock.getSignature()), blockResult.name()));
newBlock.getBlockData().getHeight(), Base58.encode(newBlock.getSignature()), blockResult.name()));
return SynchronizationResult.INVALID_DATA;
}
@@ -469,7 +524,8 @@ public class Synchronizer {
// Do we need more signatures?
if (peerBlockSignatures.isEmpty()) {
int numberRequested = maxBatchHeight - ourHeight;
int numberRequested = Math.min(maxBatchHeight - ourHeight, MAXIMUM_REQUEST_SIZE);
LOGGER.trace(String.format("Requesting %d signature%s after height %d, sig %.8s",
numberRequested, (numberRequested != 1 ? "s": ""), ourHeight, Base58.encode(latestPeerSignature)));

View File

@@ -46,7 +46,7 @@ public class Peer {
private static final int CONNECT_TIMEOUT = 2000; // ms
/** Maximum time to wait for a message reply to arrive from peer. (ms) */
private static final int RESPONSE_TIMEOUT = 2000; // ms
private static final int RESPONSE_TIMEOUT = 3000; // ms
/**
* Interval between PING messages to a peer. (ms)
@@ -507,6 +507,7 @@ public class Peer {
}
} catch (MessageException e) {
LOGGER.warn(String.format("Failed to send %s message with ID %d to peer %s: %s", message.getType().name(), message.getId(), this, e.getMessage()));
return false;
} catch (IOException e) {
// Send failure
return false;

View File

@@ -0,0 +1,70 @@
package org.qortal.network.message;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import org.qortal.block.Block;
import org.qortal.transform.TransformationException;
import org.qortal.transform.block.BlockTransformer;
import com.google.common.primitives.Ints;
// This is an OUTGOING-only Message which more readily lends itself to being cached
public class CachedBlockMessage extends Message {
private Block block = null;
private byte[] cachedBytes = null;
public CachedBlockMessage(Block block) {
super(MessageType.BLOCK);
this.block = block;
}
private CachedBlockMessage(byte[] cachedBytes) {
super(MessageType.BLOCK);
this.block = null;
this.cachedBytes = cachedBytes;
}
public static Message fromByteBuffer(int id, ByteBuffer byteBuffer) throws UnsupportedEncodingException {
throw new UnsupportedOperationException("CachedBlockMessage is for outgoing messages only");
}
@Override
protected byte[] toData() {
// Already serialized?
if (this.cachedBytes != null)
return cachedBytes;
if (this.block == null)
return null;
try {
ByteArrayOutputStream bytes = new ByteArrayOutputStream();
bytes.write(Ints.toByteArray(this.block.getBlockData().getHeight()));
bytes.write(BlockTransformer.toBytes(this.block));
this.cachedBytes = bytes.toByteArray();
// We no longer need source Block
// and Block contains repository handle which is highly likely to be invalid after this call
this.block = null;
return this.cachedBytes;
} catch (TransformationException | IOException e) {
return null;
}
}
public CachedBlockMessage cloneWithNewId(int newId) {
CachedBlockMessage clone = new CachedBlockMessage(this.cachedBytes);
clone.setId(newId);
return clone;
}
}