The previous query was taking almost half a second to run each time, whereas the new version runs 10-100x faster. This was the main bottleneck with block serialization and should therefore allow for much faster syncing once rolled out to the network. Tested several thousand blocks to ensure that the results returned by the new query match those returned by the old one.
A couple of classes were using the bitcoinj alternative, which is twice as slow. This mostly affected the API on port 12392, as byte arrays were automatically encoded as base58 strings via the Base58TypeAdapter / JAXB package-info.
This is probably more validation than is actually needed, but given that we use the same field for LTC and QORT receiving addresses in the database, it is best to be extra careful.
This returns serialized, base58 encoded data for the entire block. It is the same format as the data sent between nodes when synchronizing, with base58 encoding added so that it can be outputted cleanly in the API response.
This is the equivalent of the refund API but can be used by the seller to redeem LTC from a stuck transaction, by supplying the associated AT address, There are no lockTime requirements; it is redeemable as soon as the buyer has redeemed the QORT and sent the secret to the seller.
This is designed to be called by the buyer, and will force refund their P2SH transaction associated with the supplied AT. The tradebot responsible for this trade must be present in the user's db for this API access the necessary data. It must be called after lockTime has passed, which for LTC is currently 60 minutes from the time that the P2SH was funded. Trying to refund before this time will result in a FOREIGN_BLOCKCHAIN_TOO_SOON error.
This can currently be used by either the buyer or the seller, but it requires the seller's trade private key & receiving address to be specified, along with the buyer's secret. Currently hardcoded to LITECOIN but I will aim to make this generic as we start adding more coins.
This makes them more compatible with the output of the /crosschain/tradebot and /crosschain/trade/{ataddress} APIs which is likely where most people will be retrieving data from, rather than the database itself.
This is similar to the BTC equivalent, but removes secretB as an input parameter. It also signs and broadcasts the transaction, because the wallet isn't needed for this. These transactions have to be signed using the tradePrivateKey from the tradebot data rather than any of the wallet's keys.
There are two other LitecoinACCTv1 APIs still to implement, but I will leave these until they are needed.
This tightens up the decision making by adding two requirements:
1. The peer must return the same number of summaries to the ones requested.
2. The peer must return a summary that matches its latest reported signature.
This ensures we are always making sync decisions based on accurate data, and removes peers that are currently mid re-org. This is probably more validation than is actually necessary, but it's best to be really thorough here so it is as optimized as possible.
We have gone backwards and forwards on this one a lot recently, but now that stability has returned, it is best to tighten this up. Previously it was loosened to help reduce network load, but that is no longer a problem. With this stricter approach, it should prevent a node ending up in an incomplete state after syncing, which is the main cause of the shorter re-orgs we are seeing.
The existing HSQL export/import (PERFORM EXPORT SCRIPT and PERFORM IMPORT SCRIPT) have been replaced with a custom JSON import and export. Whilst this is less generic, it has some significant advantages:
- When exporting data, it is now able to combine the exported data with any data that already exists in the backup file. This prevents a backup after a bootstrap from overwriting data from before the bootstrap, and removes the need for all of the "archive" files that we currently create.
- Adds support for partial imports, and updates. Previously an import would fail if any of the data being imported already existed in the db. It will now add new rows and update existing ones.
- The format and contents of the exported trade bot data now matches the output of the /crosschain/tradebot API.
- Data is retrieved without the need for a database lock, and therefore the export process is much faster and less invasive. This should prevent the lockups and other problems seen when using the trade portal.
For now, there are a couple of trade-offs to using this new approach:
- The minting key import/export has been temporarily removed until there is more time to transition it to this new format.
- Existing .script backups can no longer be imported using versions higher than 1.5.1.
Both of these can be solved by temporarily running version 1.5.1, performing the necessary imports/exports, then returning to the latest version. Longer term the minting keys export/import will be reimplemented using the JSON format.
This controls whether to allow connections with peers below minPeerVersion.
If true, we won't sync with them but they can still sync with us, and will show in the peers list. This is the default, which allows older nodes to continue functioning, but prevents them from interfering with the sync behaviour of updated nodes.
If false, sync will be blocked both ways, and they will not appear in the peers list at all.
Initially set to 10 when used by the /crosschain/price/{blockchain} API, so that the price is based on the last 10 trades rather than every trade that has ever taken place.
Block.calcKeyDistance() cannot be called on some trimmed blocks, because the minter level is unable to be inferred in some cases. This generally hasn't been an issue, but the new Block.logDebugInfo() method is invoking it for all blocks. For now I am adding defensiveness to the debug method, but longer term we might want to add defensiveness to Block.calcKeyDistance() itself, if we ever encounter this issue again. I will leave it alone for now, to reduce risk.
Block.calcKeyDistance() cannot be called on some trimmed blocks, because the minter level is unable to be inferred in some cases. This generally hasn't been an issue, but the new Block.logDebugInfo() method is invoking it for all blocks. For now I am adding defensiveness to the debug method, but longer term we might want to add defensiveness to Block.calcKeyDistance() itself, if we ever encounter this issue again. I will leave it alone for now, to reduce risk.
# Conflicts:
# pom.xml
# src/main/java/org/qortal/controller/Synchronizer.java
Removed all fast sync code from Controller.syncToPeerChain(), so it is now the same as `master`.
Again, this wouldn't have affected anything in 1.5.0 or before, but it will become more significant if we switch to same-length chain weight comparisons.
This gives an insight into the contents of each chain when doing a re-org. To enable this logging, add the following to log4j2.properties:
logger.block.name = org.qortal.block.Block
logger.block.level = debug
This solves a common problem that is mostly seen when starting a node that has been switched off for some time, or when starting from a bootstrap. In these cases, it can be difficult get synced to the latest if you are starting from a small fork. This is because it required that the node was brought up to date via a single peer, and there wasn't much room for error if it failed to retrieve a block a couple of times. This generally caused the blocks to be thrown away and it would try the same process over and over.
The solution is to apply new blocks if the most recently received block is newer than our current latest block. This gets the node back on to the main fork where it can then sync using the regular applyNewBlocks() method.
If a peer fails to reply with all requested blocks, we will now only apply the blocks we have received so far if at least one of them is recent. This should prevent or greatly reduce the scenario where our chain is taken from a recent to an outdated state due to only partially syncing with a peer. It is best to keep our chain "recent" if possible, as this ensures that the peer selection code always runs, and therefore avoids unnecessarily syncing to a random peer on an inferior chain.
Now that we are spending a lot of time to carefully select a peer to sync with, it makes sense to retry a couple more times before giving up and starting the peer selection process all over again.
In these comparisons it's easy to incorrectly identify a bad chain, as we aren't comparing the same number of blocks. It's quite common for one peer to fail to return all blocks and be marked as an inferior chain, yet we have other "good" peers on that exact same chain. In those cases we would have stopped talking to the good peers again until they received another block.
Instead of complicating the logic and keeping track of the various good chain tip signatures, it is simpler to just remove the inferior peers from this round of syncing, and re-test them in the next round, in case they are in fact superior or equal.
The iterator was removing the peer from the "peersSharingCommonBlock" array, when it should have been removing it from the "peers" array. The result was that the bad peer would end up in the final list of good peers, and we could then sync with it when we shouldn't have.
The existing system was unable to resume without manual intervention if it stalled for more than 7.5 minutes. After this time, no peers would have "recent' blocks, which are prerequisites for synchronization and minting.
This new code monitors for such a situation, and enters "recovery mode" if there are no peers with recent blocks for at least 10 minutes. It also requires that there is at least one connected peer, to reduce false positives due to bad network connectivity.
Once in recovery mode, peers with no recent blocks are added back into the pool of available peers to sync with, and restrictions on minting are lifted. This should allow for peers to collaborate to bring the chain back to a "recent" block height. Once we have a peer with a recent block, the node will exit recovery mode and sync as normal.
Previously, lifting minting restrictions could have increased the risk of extra forks, however it is much less risky now that nodes no longer mint multiple blocks in a row.
In all cases, minBlockchainPeers is used, so a minimum number of connected peers is required for syncing and minting in recovery mode, too.
This could drastically reduce the number of forks being created. Currently, if a node is having problems syncing, it will continue adding to its own fork, which adds confusion to the network. With this new idea, the node would be prevented from adding to its own chain and is instead forced to wait until it has retrieved the next block from the network.
We will need to test this on the testnet very carefully. My worry is that, because all minters submit blocks, it could create a situation where the first block is submitted by everyone, and the second block is submitted by no-one, until a different candidate for the first block has been obtained from a peer. This may not be a problem at all, and could actually improve stability in a huge way, but at the same time it has the potential to introduce serious network problems if we are not careful.
It now has a new parameter - keepArchivedCopy - which when set to true will cause it to rename an existing TradeBotStates.script to TradeBotStates-archive-<timestamp>.script before creating a new backup. This should avoid keys being lost if a new backup is taken after replacing the db.
In a future version we can improve this in such a way that it combines existing and new backups into a single file. This is just a "quick fix" to increase the chances of keys being recoverable after accidentally bootstrapping without a backup.
In version 1.4.6, we would still sync with a peer even if we only received a partial number of the requested blocks/summaries. This could create a new problem, because the BlockMinter would often try and make up the difference by minting a new fork of up to 5 blocks in quick succession. This could have added to network confusion.
Longer term we may want to adjust the BlockMinter code to prevent this from taking place altogether, but in the short term I will revert this change from 1.4.6 until we have a better way.
Added a new step, which attempts to filter out peers that are on inferior chains, by comparing them against each other and our chain. The basic logic is as follows:
1. Take the list of peers that we'd previously have chosen from randomly.
2. Figure out our common block with each of those peers (if its within 240 blocks), using cached data if possible.
3. Remove peers with no common block.
4. Find the earliest common block, and compare all peers with that common block against each other (and against our chain) using the chain weight method. This involves fetching (up to 200) summaries from each peer after the common block, and (up to 200) summaries from our own chain after the common block.
5. If our chain was superior, remove all peers with this common block, then move up to the next common block (in ascending order), and repeat from step 4.
6. If our chain was inferior, remove any peers with lower weights, then remove all peers with higher common blocks.
7. We end up with a reduced list of peers, that should in theory be on superior or equal chains to us. Pick one of those at random and sync to it.
This is a high risk feature - we don't yet know the impact on network load. Nor do we know whether it will cause issues due to prioritising longer chains, since the chain weight algorithm currently prefers them.
Main differences / improvements:
- Only request a single batch of signatures upfront, instead of the entire peer's chain. There is no point in requesting them all, as the later ones may not be valid by the time we have finished requesting all the blocks before them.
- If we fail to fetch a block, clear any queued signatures that are in memory and re-fetch signatures after the last block received. This allows us to cope with peers that re-org whilst we are syncing with them.
- If we can't find any more block signatures, or the peer fails to respond to a block, apply our progress anyway. This should reduce wasted work and network congestion, and helps cope with larger peer re-orgs.
- The retry mechanism remains in place, but instead of fetching the same incorrect block over and over, it will attempt to locate a new block signature each time, as described above. To help reduce code complexity, block signature requests are no longer retried.
Currently set to 1, as serialization of the BlocksMessage data on mainnet is too slow to use this for any significant number of blocks right now. Hopefully we can find a way to optimise this process, which will allow us to use this for multiple block syncing.
Until then, sticking with single blocks should still be enough to help solve the network congestion and re-orgs we are seeing, because it gives us the ability to request the next block based on the previous block's signature, which was unavailable using GET_BLOCK. This removes the requirement to fetch all block signatures upfront, and therefore it shouldn't matter if the peer does a partial re-org whilst a node is syncing to it.
If fast syncing is enabled in the settings (by default it's disabled) AND the peer is running at least v1.5.0, then it will route through to a new method which fetches multiple blocks at a time, in a very similar way to Synchronizer.syncToPeerChain().
If fast syncing is disabled in the settings, or we are communicating with a peer on an older version, it will continue to sync blocks individually.
When communicating with a peer that is running at least version 1.5.0, it will now sync multiple blocks at once in Synchronizer.syncToPeerChain(). This allows us to bypass the single block syncing (and retry mechanism), which has proven to be unviable when there are multiple active forks with several blocks in each chain.
For peers below v1.5.0, the logic should remain unaffected and it will continue to sync blocks individually.