Whilst not ideal, this is necessary to prevent the chain from getting stuck on future blocks due to duplicate name registrations. See Block535658.java for full details on this problem - this is simply a "catch-all" implementation of that class in order to futureproof this fix.
There is still a database inconsistency to be solved, as some nodes are failing to add a registered name to their Names table the first time around, but this will take some time. Once fixed, this commit could potentially be reverted.
Also added unit tests for both scenarios (same and different creator).
TLDR: this allows all past and future invalid blocks caused by NAME_ALREADY_REGISTERED (by the same creator) to now be valid.
This was accidentally missed out of the original code. Some pre-updated nodes on the network will be missing this index, but we can use the upcoming "auto-bootstrap" feature to get those back.
A PUT creates a new base layer meaning anything before that point is no longer needed. These files are now deleted automatically by the cleanup manager. This involved relocating a lot of the cleanup manager methods into a shared utility, so that they could be used by the arbitrary data manager. Without this, they would be fetched from the network again as soon as they were deleted.
This deletes redundant copies of data, and also converts complete files to chunks where needed. The idea being that nodes only hold chunks, since they currently are much more likely to serve a chunk to another peer than they are to serve a complete file.
It doesn't yet cleanup files that are unassociated with transactions, nor does it delete anything from the _temp folder.
This improves scalability but isn't sufficient for a long term solution. TODO: It probably makes sense to add an additional query for recent transactions only, so that they are fetched quickly.
This is needed because we want to allow brand new accounts to publish data without a fee. A similar approach to CrossChainResource.buildAtMessage(). We already require PoW on all arbitrary transactions, so no additional logic beyond this should be needed.
This adds the loadAsynchronously() method to ArbitraryDataReader, in addition to the existing loadSynchronously() method.
When requesting a website in a browser, previously the building of the resource's layers would be done synchronously in the API handler. This understandably caused many issues, so the building is now done asynchronously by a dedicated thread. A loading screen is shown in its place which auto refreshes every second until the build has completed.
It's possible that this concept will struggle in the real world if operating systems, virus scanners, etc start interfering with our file stucture. Right now it is using a zero tolerance approach when checking the validity of each layer. We may choose to loosen this slightly if we encounter problems, e.g. by excluding hidden files. But for now it is best to be as strict as possible.
This decides whether to build a new state or use an existing cached state when serving a data resource. It will cache a built resource until a new transaction (i.e. layer) arrives. This drastically reduces load, and still allows for almost instant propagation of new layers.
This is used to store the transaction signature and build timestamp for each built data resource. It involved a refactor of the ArbitraryDataMetadata class to introduce a subclass for each file ("patch" and "cache"). This allows more files to be easily added later.
This defends against a missing or out-of-order transaction. If this ever fails validation, we may need to rethink the way we are requesting transactions. But in theory this shouldn't happen, given that the "last reference" field of a transaction ensures that out-of-order transactions are invalid already.
This bug was introduced now that the temp directory is contained within the data directory. Without this, it would leave it in the temp folder and then fail at a later stage.
This ensures that the temporary files are being kept with the rest of the data, rather than somewhere inappropriate such as on flash storage. It also allows the user to locate them somewhere else, such as on a dedicated drive.
This adds support for the PATCH method in addition to the existing PUT method.
Currently, a patch includes only files that have been added or modified, as well as placeholder files to indicate those that have been removed.
This is not production ready, as I am hoping to create patches on a more granular level - i.e. just the modified bytes of each file. It would also make sense to track deletions using a metadata/manifest file in a hidden folder.
It also adds early support of accessing files using a name rather than a signature or hash.
Now only skipping the HTLC redemption if the AT is finished and the balance has been redeemed by the buyer. This allows HTLCs to be refunded for ATs that have been refunded or cancelled.
Previously, if an error was returned from an Electrum server (such as "server busy") it would throw a NetworkException that would be caught outside of the server loop and cause the entire request to fail.
Instead of throwing an exception, I am now logging the error and returning null, in the same way we do for IOException and NoSuchElementException further up in the same method.
This allows the caller - most likely connectedRpc() - to move on to the next server in the list and try again.
This should fix an issue seen where a "server busy" response from a single server was essentially breaking our implementation, as we would give up altogether instead of trying another server.
This is a workaround for an UnsupportedOperationException thrown when using X2Go, due to PERPIXEL_TRANSLUCENT translucency being unsupported in splashDialog.setBackground(). We could choose to use a different version of the splash screen with an opaque background in these cases, but it is low priority.
Updated the "localeLang" files with new keys and removed old unused keys for English, German, Dutch, Italian, Finnish, Hungarian, Russian and Chinese translations
These are the same as the /lists/blacklist/address/{address} endpoints but allow a JSON array of addresses to be specified in the request body. They currently return true if
The ResourceList class creates or updates a list for the purpose of tracking resources on the Qortal network. This can be used for local blocking, or even for curating and sharing content lists. Lists are backed off to JSON files (in the lists folder) to ease sharing between nodes and users.
This first implementation allows access to an address blacklist only, but has been written in such a way that other lists can be easily added. This might be needed in the future, e.g. to blacklist a group, a poll, or some hosted data. It could also be used by community members to curate lists of favourite or problematic content, which could then be shared or even subscribed to on the chain by other users.
The inputs and outputs contain a simpler version than the ones in the raw transaction, consisting of `address`, `amount`, and `addressInWallet`. The latter of the three is to know whether the address is one that is derived from the supplied xpub master public key.
The previous criteria was to stop searching for more leaf keys as soon as we found a batch of keys with no transactions, but it seems that there are occasions when subsequent batches do actually contain transactions. The solution/workaround is to require 5 consecutive empty batches before giving up. There may be ways to improve this further by copying approaches from other BIP32 implementations, but this is a quick fix that should solve the problem for now.
This involved a small refactor of the ACCT code to expose findSecretA() in a more generic way. Bitcoin is disabled for refunding and redeeming as it uses a legacy approach that we no longer support. The {blockchain} URL parameter has also been removed from the redeem and refund APIs, because it can be obtained from the ACCT via the code hash in the AT.
The "dust" threshold is around 1 DOGE - meaning orders below this size cannot be refunded or redeemed. The simplest solution is to prevent orders of this size being placed to begin with.
This ensures that nodes are storing unreadable files, outside of the context of Qortal. For public data, the decryption keys themselves are on-chain, included in the "secret" field of arbitrary transactions. When we introduce the concept of private data, we can simply exclude the secret key from the transaction so that only the owner can decrypt it.
When encrypting the file, I have added the 16 byte initialization vector as a prefix to the cyphertext, and it is then automatically extracted back out when decrypting. This gives us the option to encrypt more than one file with the same key, if we ever need it. Right now, we are using a unique key per file, so it's not actually needed, but it's good to have support.
Adds "name", "method", "secret", and "compression" properties. These are the foundations needed in order to handle updates, encryption, and name registration. Compression has been added so that we have the option of switching to different algorithms whilst maintaining support for existing transactions.
These combine some Qora services (SERVICE_NAME_STORAGE, SERVICE_BLOG_POST, and SERVICE_BLOG_COMMENT) with existing Qortal services (SERVICE_AUTO_UPDATE), and some new additions (SERVICE_ARBITRARY_DATA, SERVICE_WEBSITE, and SERVICE_GIT_REPOSITORY)
Previously we would ask all connected peers for the file itself, but this caused the network to be swamped when multiple peers responded with the same file.
This new approach instead asks all connected peers to send back a list of hashes for all files they have relating to a transaction signature. The requesting node then uses these lists to make separate requests for each missing file.
This is a quick solution to rebuild directory structures with missing files. This whole area of the code needs some reworking, as serving the site from a temporary folder is not a robust long term solution.
Domain names can be mapped to arbitrary transaction signatures via the node's settings, and then served over port 80 or 443. This allows Qortal hosted sites to be accessible via a traditional domain name.
Example configuration to map two domains:
"domainMapServiceEnabled": true,
"domainMapServicePort": 80,
"domainMap": [
{
"domain": "example.com",
"signature": "tEsw4kUn4ZJfPha7CotUL6BHkFPs79BwKXdY6yrf28YTpDn4KSY6ZKX3nwZCkqDF9RyXbgaVnB1rTEExY3h9CQA"
},
{
"domain": "demo.qortal.org",
"signature": "ZdBWWPMhR7AZwSu5xZm89mQEacekqkNfAimSCqFP6rQGKaGnXR9G4SWYpY5awFGfhmNBWzvRnXkWZKCsj6EMgc8"
}
]
Each domain needs to be pointed to the Qortal data node via an A record or CNAME. You can add redundant nodes by adding multiple A records for the same domain (this is known as DNS Failover).
Note that running a webserver on port 80 (or anything less than 1024) requires running the data node as root. There are workarounds to this, such as disabling privileged ports, or using a reverse proxy. I will investigate this more as time goes on, but this is okay for a proof of concept.
It's now capable of syncing chunks as well as complete files. This isn't production ready as it currently requests/receives the same file from multiple peers at once, which slows down the sync and wastes lots of bandwidth. Ideally we would find an appropriate peer first and then sync the file from them.
This introduces the hash58 property, which stores the base58 hash of the file passed in at initialization. It leaves digest() and digest58() for when we need to compute a new hash from the file itself.
Until now it wasn't possible to set up a chain with zero transaction fees due to a hardcoded zero check in Payment.isValid(), and a divide by zero error in Transaction.hasMinimumFeePerByte()
- Adds support for files up 500MiB per transaction (at 2MiB chunk sizes). Previously, the max data size was 4000 bytes.
- Adds a nonce, giving us the option to remove the transaction fees altogether on the data chain.
These features become enabled in version 5 of arbitrary transactions.
This is probably our number one reliability issue at the moment, and has been a problem for a very long time.
The existing CHECKPOINT_LOCK would prevent new connections being created when we are checkpointing or about to checkpoint. However, in many cases we obtain the db connection early on and then don't perform any queries until later. An example would be in synchronization, where the connection is obtained at the start of the process and then retained throughout the sync round. My suspicion is that we were encountering this series of events:
1. Open connection to database
2. Call maybeCheckpoint() and confirm there are no active transactions
3. An existing connection starts a new transaction
4. Checkpointing is performed, but deadlocks due to the in-progress transaction
This potential fix includes preparedStatement.execute() in the CHECKPOINT_LOCK, to block any new transactions being started when we are locked for checkpointing. It is fairly high risk so we need to build some confidence in this before releasing it.
This is probably the most efficient way to process the data on the fly, but it's still not very scalable. A better approach would be to pre-process the HTML when building the file structure, and then serve them completely statically (i.e. using a standard webserver rather than via application memory). But it makes sense to keep it this way for development and maybe early beta testing.
Rename to zh_SC for better distinguish between zh_SC (Simple Chinese)and zh_TC(Traditional Chinese)
Rephrase some of the words for better understanding.
This can be used to preview a site before signing a transaction and announcing it to the network. The response will need reworking to return JSON (along with most of the other new APIs)
This fixes an NPE when trying to send a file that doesn't exist. It also removes the caching, which we can add again later if it turns out to be needed.
Now that we aren't disconnecting mid sync, we can get away with more frequent disconnections. This brings the average connection length to around 9 mins.
Connection limits are defined in settings (denominated in seconds):
"minPeerConnectionTime": 120,
"maxPeerConnectionTime": 3600
Peers will disconnect after a randomly chosen amount of time between the min and the max. The default range is 2 minutes to 1 hour, as above.
This encourages nodes to connect to a wider range of peers across the course of each day, rather than staying connected to an "island" of peers for an extended period of time. Hopefully this will reduce the amount of parallel chains that can form due to permanently connected clusters of peers.
We may find that we need to reduce the defaults to get optimal results, however it is best to do this incrementally, with the option for reducing further via each node's settings. Being too aggressive here could cause some of the earlier problems (e.g. 20% missing blocks minted) to reappear. We can re-evaluate this in the next version. Note that if defaults are reduced significantly, we may need to add code to prevent this from happening mid-sync. With higher defaults, this is less of an issue.
Thanks to @szisti for supplying some base code for this commit, and also to @CWDSYSTEMS for diagnosing the original problem.
This indicates the size of the re-org/rollback that was required in order to perform this sync operation. It is only included if it's greater than 0 blocks.
This deletes a file referenced by a user supplied SHA256 digest string (which we will use as the file's "ID" in the Qortal data system). In the future this could be extended to delete all associated chunks, but first we need to build out the data chain so we have a way to look up chunks associated with a file hash.
When sending or requesting more than 1000 online accounts, peers would be disconnected with an EOF or connection reset error due to an intentional null response. This response has been removed and it will instead now only send the first 1000 accounts, which prevents the disconnections from occurring.
In theory, these accounts should be in a different order on each node, so the 1000 limit should still result in a fairly even propagation of accounts. However, we may want to consider increasing this limit, to maximise the propagation speed.
Thanks to szisti for tracking this one down.
This loops through all sell orders and attempts to redeem the LTC from each one. It will return true if at least one was redeemed, or false if none are available to be redeemed. Details are logged to the log.txt file rather than returned in the API response.
The previous query was taking almost half a second to run each time, whereas the new version runs 10-100x faster. This was the main bottleneck with block serialization and should therefore allow for much faster syncing once rolled out to the network. Tested several thousand blocks to ensure that the results returned by the new query match those returned by the old one.
A couple of classes were using the bitcoinj alternative, which is twice as slow. This mostly affected the API on port 12392, as byte arrays were automatically encoded as base58 strings via the Base58TypeAdapter / JAXB package-info.
This is probably more validation than is actually needed, but given that we use the same field for LTC and QORT receiving addresses in the database, it is best to be extra careful.
This returns serialized, base58 encoded data for the entire block. It is the same format as the data sent between nodes when synchronizing, with base58 encoding added so that it can be outputted cleanly in the API response.
This is the equivalent of the refund API but can be used by the seller to redeem LTC from a stuck transaction, by supplying the associated AT address, There are no lockTime requirements; it is redeemable as soon as the buyer has redeemed the QORT and sent the secret to the seller.
This is designed to be called by the buyer, and will force refund their P2SH transaction associated with the supplied AT. The tradebot responsible for this trade must be present in the user's db for this API access the necessary data. It must be called after lockTime has passed, which for LTC is currently 60 minutes from the time that the P2SH was funded. Trying to refund before this time will result in a FOREIGN_BLOCKCHAIN_TOO_SOON error.
This can currently be used by either the buyer or the seller, but it requires the seller's trade private key & receiving address to be specified, along with the buyer's secret. Currently hardcoded to LITECOIN but I will aim to make this generic as we start adding more coins.
This makes them more compatible with the output of the /crosschain/tradebot and /crosschain/trade/{ataddress} APIs which is likely where most people will be retrieving data from, rather than the database itself.
This is similar to the BTC equivalent, but removes secretB as an input parameter. It also signs and broadcasts the transaction, because the wallet isn't needed for this. These transactions have to be signed using the tradePrivateKey from the tradebot data rather than any of the wallet's keys.
There are two other LitecoinACCTv1 APIs still to implement, but I will leave these until they are needed.
This tightens up the decision making by adding two requirements:
1. The peer must return the same number of summaries to the ones requested.
2. The peer must return a summary that matches its latest reported signature.
This ensures we are always making sync decisions based on accurate data, and removes peers that are currently mid re-org. This is probably more validation than is actually necessary, but it's best to be really thorough here so it is as optimized as possible.
We have gone backwards and forwards on this one a lot recently, but now that stability has returned, it is best to tighten this up. Previously it was loosened to help reduce network load, but that is no longer a problem. With this stricter approach, it should prevent a node ending up in an incomplete state after syncing, which is the main cause of the shorter re-orgs we are seeing.
The existing HSQL export/import (PERFORM EXPORT SCRIPT and PERFORM IMPORT SCRIPT) have been replaced with a custom JSON import and export. Whilst this is less generic, it has some significant advantages:
- When exporting data, it is now able to combine the exported data with any data that already exists in the backup file. This prevents a backup after a bootstrap from overwriting data from before the bootstrap, and removes the need for all of the "archive" files that we currently create.
- Adds support for partial imports, and updates. Previously an import would fail if any of the data being imported already existed in the db. It will now add new rows and update existing ones.
- The format and contents of the exported trade bot data now matches the output of the /crosschain/tradebot API.
- Data is retrieved without the need for a database lock, and therefore the export process is much faster and less invasive. This should prevent the lockups and other problems seen when using the trade portal.
For now, there are a couple of trade-offs to using this new approach:
- The minting key import/export has been temporarily removed until there is more time to transition it to this new format.
- Existing .script backups can no longer be imported using versions higher than 1.5.1.
Both of these can be solved by temporarily running version 1.5.1, performing the necessary imports/exports, then returning to the latest version. Longer term the minting keys export/import will be reimplemented using the JSON format.
This controls whether to allow connections with peers below minPeerVersion.
If true, we won't sync with them but they can still sync with us, and will show in the peers list. This is the default, which allows older nodes to continue functioning, but prevents them from interfering with the sync behaviour of updated nodes.
If false, sync will be blocked both ways, and they will not appear in the peers list at all.
Initially set to 10 when used by the /crosschain/price/{blockchain} API, so that the price is based on the last 10 trades rather than every trade that has ever taken place.
Block.calcKeyDistance() cannot be called on some trimmed blocks, because the minter level is unable to be inferred in some cases. This generally hasn't been an issue, but the new Block.logDebugInfo() method is invoking it for all blocks. For now I am adding defensiveness to the debug method, but longer term we might want to add defensiveness to Block.calcKeyDistance() itself, if we ever encounter this issue again. I will leave it alone for now, to reduce risk.
Block.calcKeyDistance() cannot be called on some trimmed blocks, because the minter level is unable to be inferred in some cases. This generally hasn't been an issue, but the new Block.logDebugInfo() method is invoking it for all blocks. For now I am adding defensiveness to the debug method, but longer term we might want to add defensiveness to Block.calcKeyDistance() itself, if we ever encounter this issue again. I will leave it alone for now, to reduce risk.
# Conflicts:
# pom.xml
# src/main/java/org/qortal/controller/Synchronizer.java
Removed all fast sync code from Controller.syncToPeerChain(), so it is now the same as `master`.
Again, this wouldn't have affected anything in 1.5.0 or before, but it will become more significant if we switch to same-length chain weight comparisons.
This gives an insight into the contents of each chain when doing a re-org. To enable this logging, add the following to log4j2.properties:
logger.block.name = org.qortal.block.Block
logger.block.level = debug
This solves a common problem that is mostly seen when starting a node that has been switched off for some time, or when starting from a bootstrap. In these cases, it can be difficult get synced to the latest if you are starting from a small fork. This is because it required that the node was brought up to date via a single peer, and there wasn't much room for error if it failed to retrieve a block a couple of times. This generally caused the blocks to be thrown away and it would try the same process over and over.
The solution is to apply new blocks if the most recently received block is newer than our current latest block. This gets the node back on to the main fork where it can then sync using the regular applyNewBlocks() method.
If a peer fails to reply with all requested blocks, we will now only apply the blocks we have received so far if at least one of them is recent. This should prevent or greatly reduce the scenario where our chain is taken from a recent to an outdated state due to only partially syncing with a peer. It is best to keep our chain "recent" if possible, as this ensures that the peer selection code always runs, and therefore avoids unnecessarily syncing to a random peer on an inferior chain.
Now that we are spending a lot of time to carefully select a peer to sync with, it makes sense to retry a couple more times before giving up and starting the peer selection process all over again.
In these comparisons it's easy to incorrectly identify a bad chain, as we aren't comparing the same number of blocks. It's quite common for one peer to fail to return all blocks and be marked as an inferior chain, yet we have other "good" peers on that exact same chain. In those cases we would have stopped talking to the good peers again until they received another block.
Instead of complicating the logic and keeping track of the various good chain tip signatures, it is simpler to just remove the inferior peers from this round of syncing, and re-test them in the next round, in case they are in fact superior or equal.
The iterator was removing the peer from the "peersSharingCommonBlock" array, when it should have been removing it from the "peers" array. The result was that the bad peer would end up in the final list of good peers, and we could then sync with it when we shouldn't have.
The existing system was unable to resume without manual intervention if it stalled for more than 7.5 minutes. After this time, no peers would have "recent' blocks, which are prerequisites for synchronization and minting.
This new code monitors for such a situation, and enters "recovery mode" if there are no peers with recent blocks for at least 10 minutes. It also requires that there is at least one connected peer, to reduce false positives due to bad network connectivity.
Once in recovery mode, peers with no recent blocks are added back into the pool of available peers to sync with, and restrictions on minting are lifted. This should allow for peers to collaborate to bring the chain back to a "recent" block height. Once we have a peer with a recent block, the node will exit recovery mode and sync as normal.
Previously, lifting minting restrictions could have increased the risk of extra forks, however it is much less risky now that nodes no longer mint multiple blocks in a row.
In all cases, minBlockchainPeers is used, so a minimum number of connected peers is required for syncing and minting in recovery mode, too.
This could drastically reduce the number of forks being created. Currently, if a node is having problems syncing, it will continue adding to its own fork, which adds confusion to the network. With this new idea, the node would be prevented from adding to its own chain and is instead forced to wait until it has retrieved the next block from the network.
We will need to test this on the testnet very carefully. My worry is that, because all minters submit blocks, it could create a situation where the first block is submitted by everyone, and the second block is submitted by no-one, until a different candidate for the first block has been obtained from a peer. This may not be a problem at all, and could actually improve stability in a huge way, but at the same time it has the potential to introduce serious network problems if we are not careful.
It now has a new parameter - keepArchivedCopy - which when set to true will cause it to rename an existing TradeBotStates.script to TradeBotStates-archive-<timestamp>.script before creating a new backup. This should avoid keys being lost if a new backup is taken after replacing the db.
In a future version we can improve this in such a way that it combines existing and new backups into a single file. This is just a "quick fix" to increase the chances of keys being recoverable after accidentally bootstrapping without a backup.
In version 1.4.6, we would still sync with a peer even if we only received a partial number of the requested blocks/summaries. This could create a new problem, because the BlockMinter would often try and make up the difference by minting a new fork of up to 5 blocks in quick succession. This could have added to network confusion.
Longer term we may want to adjust the BlockMinter code to prevent this from taking place altogether, but in the short term I will revert this change from 1.4.6 until we have a better way.
Added a new step, which attempts to filter out peers that are on inferior chains, by comparing them against each other and our chain. The basic logic is as follows:
1. Take the list of peers that we'd previously have chosen from randomly.
2. Figure out our common block with each of those peers (if its within 240 blocks), using cached data if possible.
3. Remove peers with no common block.
4. Find the earliest common block, and compare all peers with that common block against each other (and against our chain) using the chain weight method. This involves fetching (up to 200) summaries from each peer after the common block, and (up to 200) summaries from our own chain after the common block.
5. If our chain was superior, remove all peers with this common block, then move up to the next common block (in ascending order), and repeat from step 4.
6. If our chain was inferior, remove any peers with lower weights, then remove all peers with higher common blocks.
7. We end up with a reduced list of peers, that should in theory be on superior or equal chains to us. Pick one of those at random and sync to it.
This is a high risk feature - we don't yet know the impact on network load. Nor do we know whether it will cause issues due to prioritising longer chains, since the chain weight algorithm currently prefers them.
Main differences / improvements:
- Only request a single batch of signatures upfront, instead of the entire peer's chain. There is no point in requesting them all, as the later ones may not be valid by the time we have finished requesting all the blocks before them.
- If we fail to fetch a block, clear any queued signatures that are in memory and re-fetch signatures after the last block received. This allows us to cope with peers that re-org whilst we are syncing with them.
- If we can't find any more block signatures, or the peer fails to respond to a block, apply our progress anyway. This should reduce wasted work and network congestion, and helps cope with larger peer re-orgs.
- The retry mechanism remains in place, but instead of fetching the same incorrect block over and over, it will attempt to locate a new block signature each time, as described above. To help reduce code complexity, block signature requests are no longer retried.
Currently set to 1, as serialization of the BlocksMessage data on mainnet is too slow to use this for any significant number of blocks right now. Hopefully we can find a way to optimise this process, which will allow us to use this for multiple block syncing.
Until then, sticking with single blocks should still be enough to help solve the network congestion and re-orgs we are seeing, because it gives us the ability to request the next block based on the previous block's signature, which was unavailable using GET_BLOCK. This removes the requirement to fetch all block signatures upfront, and therefore it shouldn't matter if the peer does a partial re-org whilst a node is syncing to it.
If fast syncing is enabled in the settings (by default it's disabled) AND the peer is running at least v1.5.0, then it will route through to a new method which fetches multiple blocks at a time, in a very similar way to Synchronizer.syncToPeerChain().
If fast syncing is disabled in the settings, or we are communicating with a peer on an older version, it will continue to sync blocks individually.
When communicating with a peer that is running at least version 1.5.0, it will now sync multiple blocks at once in Synchronizer.syncToPeerChain(). This allows us to bypass the single block syncing (and retry mechanism), which has proven to be unviable when there are multiple active forks with several blocks in each chain.
For peers below v1.5.0, the logic should remain unaffected and it will continue to sync blocks individually.
Until now, we required a perfect success rate when syncing with a peer via Synchronizer.syncToPeerChain(). Blocks were requested individually, but the node would give up and lose all progress if a single request failed. In practice, this happened very regularly, and it was difficult to succeed when there were a large number of blocks (e.g. 20+) that needed to be requested.
This commit adds two retry mechanisms, causing each of the two request types (block sigs and blocks) to retry 3 times before giving up, potentially avoiding a lot of wasted work. The number of retries is configurable in the MAXIMUM_RETRIES constant, which we could move to settings at some point if this feature proves useful.
The original issue seemed to result in a few side effects:
1. Nodes would spend a large amount of time requesting blocks from peers, only to throw it all away afterwards. This potentially added to network congestion, as nodes were using unnecessary network time to unproductively serve peers.
2. A large number of sync attempts were failing, particularly when a fork emerged with a significant number of divergent blocks (20+). This issue reduced the ability for nodes to sync to the correct chain while they still had time to do so. With every block that passed, it became made it more and more difficult to switch to the correct chain. Eventually, the correct chain would become TOO_DIVERGENT at which point there is no way to automatically switch without manual intervention. I hope that this retry mechanism will increase the chances of nodes automatically moving onto the right chain quickly, avoiding the need for a user to intervene.
3. The POST /admin/forcesync API was unlikely to succeed when the peer's chain had started to diverge from the user's chain. This should increase the success rate.
Also included in this commit is a MAXIMUM_BLOCK_SIGNATURES_PER_REQUEST constant. This limits the number of block sigs requested in each batch (default 200). Without this, we are unable to increase MAXIMUM_COMMON_DELTA because it can try and request thousands of block sigs at once, which unsurprisingly doesn't succeed.
This bug often prevented the correct amount of block signatures (and blocks) from being requested from a peer, when trying to sync to it.
It could result in quite serious consequences, as it would trigger orphaning back to the common block without first requesting all of the necessary blocks from the peer's chain. Rather than applying a complete copy of the peer's chain, it could orphan back to the common block and then only apply a few blocks beyond that, leaving the node in an unexpected state, potentially hundreds of blocks behind the peer's current height, which it then has to try and obtain from other peers.
When there are forks present, this could result in it hopping from chain to chain, each time being unable to fully synchronise with the peer. Given that we currently discard our chain if it is deemed that our latest block isn't "recent", it is very important that nodes are brought up to the latest block when synchronising with a peer, to avoid constantly triggering discards.
The severity of this bug increased when there was a large disparity between the peer's latest block and the common block height, and prevented us from being able to increase MAXIMUM_COMMON_DELTA.
These are simpler than the level 1+2 tests; they only test that the rewards are correct for each level post-shareBinFix. I don't think we need multiple instances of the pre-shareBinFix or block orphaning tests. There are a few subtle differences between each test, such as the online status of Bob, in order the make the tests slightly more comprehensive.
1. Assign 3 minters (one founder, one level 1, one level 2)
2. Mint a block after the shareBinFix, ensuring that level 1 and 2 are being rewarded evenly from the same share bin.
3. Orphan the block and ensure the rewards are reversed.
4. Orphan two more blocks, each time checking that the balances are being reduced in accordance with the pre-shareBinFix mapping.
As importing a transaction requires blockchain lock, all the network threads
can be used up blocking for that lock, especially if Synchronizer is active.
So we simply discard incoming TRANSACTION messages if we can't immediately
obtain the blockchain lock. Some other peer will probably attempt to
send the transaction soon again anyway.
Plus we swap transaction lists after connection handshake.
Post trigger, account levels will map correctly to share bins, subtracting 1 to account for the 0th element of the shareBinsByLevel array.
Pre-trigger, the legacy mapping will remain in effect.
Post trigger, this change will use all 128 bytes of previous block's signature when
calculating/validating next block's "minter" signature (itself the first 64 bytes of a block signature).
Prior to trigger, current behaviour is to only use first 64 bytes of previous block's
signature, which doesn't encompass transactions signature.
New block sig code should help reduce forking and help improve transactional
security.
Added "newBlockSigHeight" to blockchain.json but initially set to block 999999
pending decision on when to merge, auto-update, go-live, etc.
Symptoms of a CHECKPOINT-related DB deadlock:
On Controller thread:
"Controller" #20 prio=5 os_prio=31 cpu=1577665.56ms elapsed=17666.97s allocated=475G defined_classes=412 tid=0x00007fe99f97b000 nid=0x1644b waiting on condition [0x0000700009a21000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@14.0.2/Native Method)
- parking to wait for <0x0000000602f2a6f8> (a org.hsqldb.lib.CountUpDownLatch$Sync)
[...some more lines...]
[this next line is the best indicator: ]
at org.qortal.repository.hsqldb.HSQLDBRepository.checkpoint(HSQLDBRepository.java:385)
at org.qortal.repository.RepositoryManager.checkpoint(RepositoryManager.java:51)
at org.qortal.controller.Controller.run(Controller.java:544)
Other threads stuck at:
- parking to wait for <0x00000007ff09f0b0> (a org.hsqldb.lib.CountUpDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(java.base@14.0.2/LockSupport.java:211)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@14.0.2/AbstractQueuedSynchronizer.java:714)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@14.0.2/AbstractQueuedSynchronizer.java:1046)
at org.hsqldb.lib.CountUpDownLatch.await(Unknown Source)
at org.hsqldb.Session.executeCompiledStatement(Unknown Source)
Could have affected:
Controller.deleteExpiredTransactions()
Network.getConnectablePeer()
Network.opportunisticMergePeers()
Network.prunePeers()
Symptoms:
2021-02-12 16:46:06 WARN NetworkProcessor:152 - [1556] exception while trying to produce task
java.lang.NullPointerException: null
at org.qortal.repository.hsqldb.HSQLDBRepository.<init>(HSQLDBRepository.java:92) ~[qortal.jar:1.4.1]
at org.qortal.repository.hsqldb.HSQLDBRepositoryFactory.tryRepository(HSQLDBRepositoryFactory.java:97) ~[qortal.jar:1.4.1]
at org.qortal.repository.RepositoryManager.tryRepository(RepositoryManager.java:33) ~[qortal.jar:1.4.1]
at org.qortal.network.Network.getConnectablePeer(Network.java:525) ~[qortal.jar:1.4.1]
This is a serious change as it affects many callers.
Controller.onNetworkTransactionMessager()
-- should be OK to block as it's only one EPC thread
TransactionsResource.processTransaction()
-- should be OK to block as it's only one Jetty thread
AND has its own 30s timeout wrapper anyway
Implementations of AcctTradeBot.progress()
e.g. BitcoinACCTv1TradeBot.progress() & LitecoinACCTv1TradeBot.progress()
TradeBot.updatePresence()
-- these are called via BlockMinter/Synchronizer
when blockchain lock is already held, so will are unaffected
AcctTradeBot.createTrade() and AcctTradeBot.startResponse()
-- these are called via API
and previously would only perform non-blocking blockchainLock.try()
but now perform blocking blockchainLock.lock()
thus preventing NO_BLOCKCHAIN_LOCK when creating/responding to trades
especially after a (wasted) MESSAGE PoW compute
but with potential downside that API response might be delayed
e.g. by a very slow sync round
Future work could look into removing the need for blockchain lock
when calling Transaction.importAsUnconfirmed().
This table was previously defined using the TEMPORARY keyword as the rows were used as a cached/index to speed up AT trimming SQL statements.
However, the table definition was lacking the "ON COMMIT PRESERVE ROWS" clause, and possibly the "GLOBAL" keyword, which caused the table contents to be emptied, immediately after being filled, when <tt>repository.saveChanges()</tt> was called (i.e. "COMMIT").
This caused latest AT states to be trimmed in error.
AtRepositoryTests.testGetLatestATStatePostTrimming also fixed so that it fails with the previous, broken code.
This table was previously defined using the TEMPORARY keyword as the rows were used as a cached/index to speed up AT trimming SQL statements.
However, the table definition was lacking the "ON COMMIT PRESERVE ROWS" clause, and possibly the "GLOBAL" keyword, which caused the table contents to be emptied, immediately after being filled, when <tt>repository.saveChanges()</tt> was called (i.e. "COMMIT").
This caused latest AT states to be trimmed in error.
AtRepositoryTests.testGetLatestATStatePostTrimming also fixed so that it fails with the previous, broken code.
According to Bitcoin source, CheckFinalTx() in validation.cpp ~line 223,
we need to make sure median blocktime has passed P2SH refund transaction's
nLockTime.
Previously we were erroneously checking that median blocktime was in the past.
This should fix issues where refunding P2SH results in a "non-final" error
from the ElectrumX server network.
PRESENCE transactions were previously validated using Bob's trade key (in address form).
But as PRESENCE transactions are already emitted by Alice, her trade key is also used
(if present in trade data by virtue of AT being locked to Alice).
Similarly, Alice's trade-bot won't even try to build PRESENCE transactions if her
trade key isn't publicly visible to other peers, i.e. after AT is locked to Alice.
This aids matching PRESENCE to corresponding trade offers for use in UI.
Also tighten up visibility of some fields in ChainChainOfferSummary and
PresenceInfo to private.
PresenceInfo.address should map to CrossChainOfferSummary.qortalCreatorTradeAddress
which is "AT creator's ephemeral trading key-pair represented as Qortal address"
Add caching of transactions fetched via ElectrumX to reduce network load and speed up API response.
Fix handling ElectrumX servers that don't want to supply verbose transaction JSON.
Hide lots of data in BitcoinyTransaction that isn't needed by current API users.
Previous fixes for "transaction rollback: serialization failure" when updating trim heights
in commits 16397852 and 58ed7205 had the right idea but were broken due to being synchronized
on different objects.
this.repository.trimHeightsLock would be a new Object() for each repository connection/session
and so not actually synchronize concurrent updates.
Implicit saveChanges()/COMMIT is still needed.
Fix is to use a repository-wide object for synchronization - in this case the repositoryFactory
object as held by RepositoryManager.
Added test to cover.
Also reduced DB trim height read to one call at start of thread for both trimming threads.
Under certain conditions, e.g. non-existent database files, the repository would be created
and then immediately be re-created.
Not only was this unnecessary, but HSQLDBDatabaseUpdates would attempt to export the node-local
data twice, which would cause an error due to existing .script files.
The fix is three-pronged:
1. Don't immediately rebuild the repository if it's only just been built
2. Don't export the empty node-local data if repository has only just been built
3. Don't export the node-local data if it's empty
This involves a database reshape, but before this happens the node-local
data is exported to local files, giving the user the option to use a
bootstrap file instead of waiting.
apiKey in settings is null by default at this point, for backwards compatibility.
In the future, the Windows installer could generate a UUID for apiKey.
apiKey in settings needs to be at least 8 characters.
API calls in the documentation engine are now marked with an open/closed padlock
to show where API key might be required.
Add support for API key security, where X-API-KEY header must match apiKey from settings
apiKey in settings is null by default at this point, for backwards compatibility.
In the future, the Windows installer could generate a UUID for apiKey.
apiKey in settings needs to be at least 8 characters.
API calls in the documentation engine are now marked with an open/closed padlock
to show where API key might be required.
Checkpointing interval is 1 hour by default, changable in settings via
"repositoryCheckpointInterval"
plus corresponding "showCheckpointNotifications" SysTray flags (off by default).
Added entries to SysTray_en i18n properties, and converted SysTray_ru to ISO-8559-1.