This is probably our number one reliability issue at the moment, and has been a problem for a very long time.
The existing CHECKPOINT_LOCK would prevent new connections being created when we are checkpointing or about to checkpoint. However, in many cases we obtain the db connection early on and then don't perform any queries until later. An example would be in synchronization, where the connection is obtained at the start of the process and then retained throughout the sync round. My suspicion is that we were encountering this series of events:
1. Open connection to database
2. Call maybeCheckpoint() and confirm there are no active transactions
3. An existing connection starts a new transaction
4. Checkpointing is performed, but deadlocks due to the in-progress transaction
This potential fix includes preparedStatement.execute() in the CHECKPOINT_LOCK, to block any new transactions being started when we are locked for checkpointing. It is fairly high risk so we need to build some confidence in this before releasing it.
This is probably the most efficient way to process the data on the fly, but it's still not very scalable. A better approach would be to pre-process the HTML when building the file structure, and then serve them completely statically (i.e. using a standard webserver rather than via application memory). But it makes sense to keep it this way for development and maybe early beta testing.
Rename to zh_SC for better distinguish between zh_SC (Simple Chinese)and zh_TC(Traditional Chinese)
Rephrase some of the words for better understanding.
This can be used to preview a site before signing a transaction and announcing it to the network. The response will need reworking to return JSON (along with most of the other new APIs)
This fixes an NPE when trying to send a file that doesn't exist. It also removes the caching, which we can add again later if it turns out to be needed.
Now that we aren't disconnecting mid sync, we can get away with more frequent disconnections. This brings the average connection length to around 9 mins.
Connection limits are defined in settings (denominated in seconds):
"minPeerConnectionTime": 120,
"maxPeerConnectionTime": 3600
Peers will disconnect after a randomly chosen amount of time between the min and the max. The default range is 2 minutes to 1 hour, as above.
This encourages nodes to connect to a wider range of peers across the course of each day, rather than staying connected to an "island" of peers for an extended period of time. Hopefully this will reduce the amount of parallel chains that can form due to permanently connected clusters of peers.
We may find that we need to reduce the defaults to get optimal results, however it is best to do this incrementally, with the option for reducing further via each node's settings. Being too aggressive here could cause some of the earlier problems (e.g. 20% missing blocks minted) to reappear. We can re-evaluate this in the next version. Note that if defaults are reduced significantly, we may need to add code to prevent this from happening mid-sync. With higher defaults, this is less of an issue.
Thanks to @szisti for supplying some base code for this commit, and also to @CWDSYSTEMS for diagnosing the original problem.
This indicates the size of the re-org/rollback that was required in order to perform this sync operation. It is only included if it's greater than 0 blocks.
This deletes a file referenced by a user supplied SHA256 digest string (which we will use as the file's "ID" in the Qortal data system). In the future this could be extended to delete all associated chunks, but first we need to build out the data chain so we have a way to look up chunks associated with a file hash.
When sending or requesting more than 1000 online accounts, peers would be disconnected with an EOF or connection reset error due to an intentional null response. This response has been removed and it will instead now only send the first 1000 accounts, which prevents the disconnections from occurring.
In theory, these accounts should be in a different order on each node, so the 1000 limit should still result in a fairly even propagation of accounts. However, we may want to consider increasing this limit, to maximise the propagation speed.
Thanks to szisti for tracking this one down.
This loops through all sell orders and attempts to redeem the LTC from each one. It will return true if at least one was redeemed, or false if none are available to be redeemed. Details are logged to the log.txt file rather than returned in the API response.
The previous query was taking almost half a second to run each time, whereas the new version runs 10-100x faster. This was the main bottleneck with block serialization and should therefore allow for much faster syncing once rolled out to the network. Tested several thousand blocks to ensure that the results returned by the new query match those returned by the old one.
A couple of classes were using the bitcoinj alternative, which is twice as slow. This mostly affected the API on port 12392, as byte arrays were automatically encoded as base58 strings via the Base58TypeAdapter / JAXB package-info.
This is probably more validation than is actually needed, but given that we use the same field for LTC and QORT receiving addresses in the database, it is best to be extra careful.
This returns serialized, base58 encoded data for the entire block. It is the same format as the data sent between nodes when synchronizing, with base58 encoding added so that it can be outputted cleanly in the API response.
This is the equivalent of the refund API but can be used by the seller to redeem LTC from a stuck transaction, by supplying the associated AT address, There are no lockTime requirements; it is redeemable as soon as the buyer has redeemed the QORT and sent the secret to the seller.
This is designed to be called by the buyer, and will force refund their P2SH transaction associated with the supplied AT. The tradebot responsible for this trade must be present in the user's db for this API access the necessary data. It must be called after lockTime has passed, which for LTC is currently 60 minutes from the time that the P2SH was funded. Trying to refund before this time will result in a FOREIGN_BLOCKCHAIN_TOO_SOON error.
This can currently be used by either the buyer or the seller, but it requires the seller's trade private key & receiving address to be specified, along with the buyer's secret. Currently hardcoded to LITECOIN but I will aim to make this generic as we start adding more coins.
This makes them more compatible with the output of the /crosschain/tradebot and /crosschain/trade/{ataddress} APIs which is likely where most people will be retrieving data from, rather than the database itself.
This is similar to the BTC equivalent, but removes secretB as an input parameter. It also signs and broadcasts the transaction, because the wallet isn't needed for this. These transactions have to be signed using the tradePrivateKey from the tradebot data rather than any of the wallet's keys.
There are two other LitecoinACCTv1 APIs still to implement, but I will leave these until they are needed.
This tightens up the decision making by adding two requirements:
1. The peer must return the same number of summaries to the ones requested.
2. The peer must return a summary that matches its latest reported signature.
This ensures we are always making sync decisions based on accurate data, and removes peers that are currently mid re-org. This is probably more validation than is actually necessary, but it's best to be really thorough here so it is as optimized as possible.
We have gone backwards and forwards on this one a lot recently, but now that stability has returned, it is best to tighten this up. Previously it was loosened to help reduce network load, but that is no longer a problem. With this stricter approach, it should prevent a node ending up in an incomplete state after syncing, which is the main cause of the shorter re-orgs we are seeing.
The existing HSQL export/import (PERFORM EXPORT SCRIPT and PERFORM IMPORT SCRIPT) have been replaced with a custom JSON import and export. Whilst this is less generic, it has some significant advantages:
- When exporting data, it is now able to combine the exported data with any data that already exists in the backup file. This prevents a backup after a bootstrap from overwriting data from before the bootstrap, and removes the need for all of the "archive" files that we currently create.
- Adds support for partial imports, and updates. Previously an import would fail if any of the data being imported already existed in the db. It will now add new rows and update existing ones.
- The format and contents of the exported trade bot data now matches the output of the /crosschain/tradebot API.
- Data is retrieved without the need for a database lock, and therefore the export process is much faster and less invasive. This should prevent the lockups and other problems seen when using the trade portal.
For now, there are a couple of trade-offs to using this new approach:
- The minting key import/export has been temporarily removed until there is more time to transition it to this new format.
- Existing .script backups can no longer be imported using versions higher than 1.5.1.
Both of these can be solved by temporarily running version 1.5.1, performing the necessary imports/exports, then returning to the latest version. Longer term the minting keys export/import will be reimplemented using the JSON format.
This controls whether to allow connections with peers below minPeerVersion.
If true, we won't sync with them but they can still sync with us, and will show in the peers list. This is the default, which allows older nodes to continue functioning, but prevents them from interfering with the sync behaviour of updated nodes.
If false, sync will be blocked both ways, and they will not appear in the peers list at all.
Initially set to 10 when used by the /crosschain/price/{blockchain} API, so that the price is based on the last 10 trades rather than every trade that has ever taken place.
Block.calcKeyDistance() cannot be called on some trimmed blocks, because the minter level is unable to be inferred in some cases. This generally hasn't been an issue, but the new Block.logDebugInfo() method is invoking it for all blocks. For now I am adding defensiveness to the debug method, but longer term we might want to add defensiveness to Block.calcKeyDistance() itself, if we ever encounter this issue again. I will leave it alone for now, to reduce risk.
Block.calcKeyDistance() cannot be called on some trimmed blocks, because the minter level is unable to be inferred in some cases. This generally hasn't been an issue, but the new Block.logDebugInfo() method is invoking it for all blocks. For now I am adding defensiveness to the debug method, but longer term we might want to add defensiveness to Block.calcKeyDistance() itself, if we ever encounter this issue again. I will leave it alone for now, to reduce risk.
# Conflicts:
# pom.xml
# src/main/java/org/qortal/controller/Synchronizer.java
Removed all fast sync code from Controller.syncToPeerChain(), so it is now the same as `master`.