Note - the rebuildLatestAtStates() must never be used by two different classes at the same time, or AT states could be incorrectly deleted. It is okay at the moment as we don't run the AT states trimmer and pruner in the same app session. However we should probably synchronize this method so that we don't accidentally call it from two places in the future.
When switching from a full node to a pruning node, we need to delete most of the database contents. If we do this entirely as a background process, it is very slow and can interfere with syncing. However, if we take the approach of transferring only the necessary rows to a new table and then deleting the original table, this makes the process much faster. It was taking several days to delete the AT states in the background, but only a couple of minutes to copy them to a new table.
The trade off is that we have to go through a form of "reshape" when starting the app for the first time after enabling pruning mode. But given that this is an opt-in mode, I don't think it will be a problem.
Once the pruning is complete, it automatically performs a CHECKPOINT DEFRAG in order to shrink the database file size down to a fraction of what it was before.
From this point, the original background process will run, but can be dialled right down so not to interfere with syncing.
Initially just deleting old and unused AT states, to get this table under control. I have had to delete them individually as the table can't handle complex queries due to its size.
Nodes in pruning mode will be unable to serve older blocks to peers.
Whilst we would ultimately like to drop these to 24 hours only, for now we need some headroom to allow for orphaning in the event of a problem. Orphaning currently fails if there is no ATStatesData available (which is the case for trimmed blocks). This could ultimately be solved by retaining older unique states, which is essentially what the sleeping AT feature will do.
onlineAccountSignaturesMinLifetime reduced from 720 hours to 12 hours
onlineAccountSignaturesMaxLifetime reduced from 888 hours to 24 hours
These were using up too much space in the database and so it makes sense to trim them more aggressively (assuming testing goes well). We will now stop validating online account signatures after 12 hours, which should be more than enough confirmations, and we will discard them after 24 hours.
Note: this will create some complexity once some of the network is running this code. It could cause out-of-sync nodes on old versions to start treating blocks as invalid from updated peers. It's likely not worth the complexity of a hard fork though, given that almost all nodes will be synced to the chain tip and will therefore be unaffected. And even with a hard fork, we'd still face this problem on out of date nodes.
Problem:
The "Names" table (the latest state of each name) drifts out of sync with the name-related transaction history on a subset of nodes for some unknown and seemingly difficult to find reason.
Solution:
Treat the "Names" table as a cache that can be rebuilt at any time. It now works like this:
- On node startup, rebuild the entire Names table by replaying the transaction history of all registered names. Includes registrations, updates, buys and sells.
- Add a "pre-process" stage to block/transaction processing. If the block contains a name related transaction, rebuild the Names cache for any names referenced by these transactions before validating anything.
The existing "integrity check" has been modified to just check basic attributes based on the latest transaction for a name. It will log if there are any inconsistencies found, but won't correct anything. This adds confidence that the rebuild has worked correctly.
There are also multiple unit tests to ensure that the rebuilds are coping with various different scenarios.
This ensures that all name-related transactions have resulted in correct entries in the Names table. A bug in the code has resulted in some nodes having missing data in their Names table. If this process finds a missing name, it will log it and add the name.
Missing names are added, but ownership issues are only logged. The known bug wasn't related to ownership, so the logging is only to alert us to any issues that may arise in the future.
In hindsight, the code could be rewritten to store all three transaction types in a single list, but this current approach has had a lot of testing, so it is best to stick with it for now.
This is necessary because it's possible (in theory) for a block to be considered invalid due to an internal failure such as an SQLException. This gives them more chances to be considered valid again. 1 hour is more than enough time for the node to find an alternate valid chain if there is one available.
This prevents a valid block candidate being discarded in favour of an invalid one. We can't actually validate a block before orphaning (because it will fail due to various reasons such as already existing transactions, an existing block with the same height, etc) so we will instead just check the signature against the list of known invalid blocks.
This should only be used if all of the following conditions are true:
a) Your node is private and not shared with others
b) Port 12391 (API port) isn't forwarded
c) You have granted access to specific IP addresses using the "apiWhitelist" setting
The node will warn on startup if this setting is used without a sensible access control whitelist.
Until now, a high weight invalid block can cause other valid, lower weight alternatives to be discarded. The solution to this problem is to track invalid blocks and quickly avoid them once discovered. This gives other valid alternative blocks the opportunity to become part of a valid chain, where they would otherwise have been discarded.
As with the block minter update, this will cause a fork when the highest weight block candidate is invalid. But it is likely that the fork would be short lived, assuming that the majority of nodes pick the valid chain.
If it has been more than 10 minutes since receiving the last valid block, but we have had at least one invalid block since then, this is indicative of a stuck chain due to no valid block candidates. In this case, we want to allow the block minter to mint an alternative candidate so that the chain can continue.
This would create a fork at the point of the invalid block, in which two chains (valid an invalid) would diverge. The valid chain could never rejoin the invalid one, however it's likely that the invalid chain would be discarded in favour of the valid one shortly after, on the assumption that the majority of nodes would have picked the valid one.
- Use the "{\"age\":30}" data to make the tests more similar to some real world data.
- Added tests to ensure that registering and orphaning works as expected.
Whilst not ideal, this is necessary to prevent the chain from getting stuck on future blocks due to duplicate name registrations. See Block535658.java for full details on this problem - this is simply a "catch-all" implementation of that class in order to futureproof this fix.
There is still a database inconsistency to be solved, as some nodes are failing to add a registered name to their Names table the first time around, but this will take some time. Once fixed, this commit could potentially be reverted.
Also added unit tests for both scenarios (same and different creator).
TLDR: this allows all past and future invalid blocks caused by NAME_ALREADY_REGISTERED (by the same creator) to now be valid.
This was accidentally missed out of the original code. Some pre-updated nodes on the network will be missing this index, but we can use the upcoming "auto-bootstrap" feature to get those back.
Now only skipping the HTLC redemption if the AT is finished and the balance has been redeemed by the buyer. This allows HTLCs to be refunded for ATs that have been refunded or cancelled.
Previously, if an error was returned from an Electrum server (such as "server busy") it would throw a NetworkException that would be caught outside of the server loop and cause the entire request to fail.
Instead of throwing an exception, I am now logging the error and returning null, in the same way we do for IOException and NoSuchElementException further up in the same method.
This allows the caller - most likely connectedRpc() - to move on to the next server in the list and try again.
This should fix an issue seen where a "server busy" response from a single server was essentially breaking our implementation, as we would give up altogether instead of trying another server.