This takes all trimmed blocks (which should now be all but the last 1450 or so) and moves them into flat files. Each file contains the serialized bytes of as many blocks that can fit within the file size target of 100MiB.
As a result, the HSQLDB size drops to less than 1GB, making it much faster and easier to maintain. It also significantly reduces the total size of each full node, because the data is stored in a highly optimized way.
HSQLDB then works similarly to the way it does in pruning mode - it holds all transactions, the latest state of every AT, as well as the full AT states data and hashes for the past 1450 blocks.
Each archive file contains headers and indexes in order to quickly locate blocks. When a peer requests a block that is within the archive, the serialized bytes are sent directly without the need to go via a BlockData object. Now that there are no slow queries or data serialization processes needed, it should greatly speed up the block serving.
The /block API endpoints have been modified in such a way that they will also check and retrieve blocks from the archive when needed.
A lightweight "BlockArchive" table is needed in HSQLDB to map block heights to signatures minters and timestamps. It made more sense to keep SQL support for these basic attributes of each block. These are located in a separate table from the full blocks, in order to create a clear distinction between HSQLDB blocks and archived blocks, and also to speed up query times in the Blocks table, which is the one we are using 99% of the time.
There is currently a restriction on the /admin/orphan API endpoint to prevent orphaning beyond the threshold of the block archive.
Note - the rebuildLatestAtStates() must never be used by two different classes at the same time, or AT states could be incorrectly deleted. It is okay at the moment as we don't run the AT states trimmer and pruner in the same app session. However we should probably synchronize this method so that we don't accidentally call it from two places in the future.
When switching from a full node to a pruning node, we need to delete most of the database contents. If we do this entirely as a background process, it is very slow and can interfere with syncing. However, if we take the approach of transferring only the necessary rows to a new table and then deleting the original table, this makes the process much faster. It was taking several days to delete the AT states in the background, but only a couple of minutes to copy them to a new table.
The trade off is that we have to go through a form of "reshape" when starting the app for the first time after enabling pruning mode. But given that this is an opt-in mode, I don't think it will be a problem.
Once the pruning is complete, it automatically performs a CHECKPOINT DEFRAG in order to shrink the database file size down to a fraction of what it was before.
From this point, the original background process will run, but can be dialled right down so not to interfere with syncing.
Initially just deleting old and unused AT states, to get this table under control. I have had to delete them individually as the table can't handle complex queries due to its size.
Nodes in pruning mode will be unable to serve older blocks to peers.
Whilst we would ultimately like to drop these to 24 hours only, for now we need some headroom to allow for orphaning in the event of a problem. Orphaning currently fails if there is no ATStatesData available (which is the case for trimmed blocks). This could ultimately be solved by retaining older unique states, which is essentially what the sleeping AT feature will do.
onlineAccountSignaturesMinLifetime reduced from 720 hours to 12 hours
onlineAccountSignaturesMaxLifetime reduced from 888 hours to 24 hours
These were using up too much space in the database and so it makes sense to trim them more aggressively (assuming testing goes well). We will now stop validating online account signatures after 12 hours, which should be more than enough confirmations, and we will discard them after 24 hours.
Note: this will create some complexity once some of the network is running this code. It could cause out-of-sync nodes on old versions to start treating blocks as invalid from updated peers. It's likely not worth the complexity of a hard fork though, given that almost all nodes will be synced to the chain tip and will therefore be unaffected. And even with a hard fork, we'd still face this problem on out of date nodes.
Problem:
The "Names" table (the latest state of each name) drifts out of sync with the name-related transaction history on a subset of nodes for some unknown and seemingly difficult to find reason.
Solution:
Treat the "Names" table as a cache that can be rebuilt at any time. It now works like this:
- On node startup, rebuild the entire Names table by replaying the transaction history of all registered names. Includes registrations, updates, buys and sells.
- Add a "pre-process" stage to block/transaction processing. If the block contains a name related transaction, rebuild the Names cache for any names referenced by these transactions before validating anything.
The existing "integrity check" has been modified to just check basic attributes based on the latest transaction for a name. It will log if there are any inconsistencies found, but won't correct anything. This adds confidence that the rebuild has worked correctly.
There are also multiple unit tests to ensure that the rebuilds are coping with various different scenarios.
This ensures that all name-related transactions have resulted in correct entries in the Names table. A bug in the code has resulted in some nodes having missing data in their Names table. If this process finds a missing name, it will log it and add the name.
Missing names are added, but ownership issues are only logged. The known bug wasn't related to ownership, so the logging is only to alert us to any issues that may arise in the future.
In hindsight, the code could be rewritten to store all three transaction types in a single list, but this current approach has had a lot of testing, so it is best to stick with it for now.
This is necessary because it's possible (in theory) for a block to be considered invalid due to an internal failure such as an SQLException. This gives them more chances to be considered valid again. 1 hour is more than enough time for the node to find an alternate valid chain if there is one available.
This prevents a valid block candidate being discarded in favour of an invalid one. We can't actually validate a block before orphaning (because it will fail due to various reasons such as already existing transactions, an existing block with the same height, etc) so we will instead just check the signature against the list of known invalid blocks.
This should only be used if all of the following conditions are true:
a) Your node is private and not shared with others
b) Port 12391 (API port) isn't forwarded
c) You have granted access to specific IP addresses using the "apiWhitelist" setting
The node will warn on startup if this setting is used without a sensible access control whitelist.
Until now, a high weight invalid block can cause other valid, lower weight alternatives to be discarded. The solution to this problem is to track invalid blocks and quickly avoid them once discovered. This gives other valid alternative blocks the opportunity to become part of a valid chain, where they would otherwise have been discarded.
As with the block minter update, this will cause a fork when the highest weight block candidate is invalid. But it is likely that the fork would be short lived, assuming that the majority of nodes pick the valid chain.
If it has been more than 10 minutes since receiving the last valid block, but we have had at least one invalid block since then, this is indicative of a stuck chain due to no valid block candidates. In this case, we want to allow the block minter to mint an alternative candidate so that the chain can continue.
This would create a fork at the point of the invalid block, in which two chains (valid an invalid) would diverge. The valid chain could never rejoin the invalid one, however it's likely that the invalid chain would be discarded in favour of the valid one shortly after, on the assumption that the majority of nodes would have picked the valid one.
- Use the "{\"age\":30}" data to make the tests more similar to some real world data.
- Added tests to ensure that registering and orphaning works as expected.
Whilst not ideal, this is necessary to prevent the chain from getting stuck on future blocks due to duplicate name registrations. See Block535658.java for full details on this problem - this is simply a "catch-all" implementation of that class in order to futureproof this fix.
There is still a database inconsistency to be solved, as some nodes are failing to add a registered name to their Names table the first time around, but this will take some time. Once fixed, this commit could potentially be reverted.
Also added unit tests for both scenarios (same and different creator).
TLDR: this allows all past and future invalid blocks caused by NAME_ALREADY_REGISTERED (by the same creator) to now be valid.
This was accidentally missed out of the original code. Some pre-updated nodes on the network will be missing this index, but we can use the upcoming "auto-bootstrap" feature to get those back.