Note - rebuildLatestAtStates() must never be called by two different classes at the same time, or AT states could be incorrectly deleted. This is okay for now, as we don't run the AT states trimmer and pruner in the same app session, but we should synchronize this method so that we can't accidentally call it from two places in the future.
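A minimal sketch of that suggested fix; the class name and surrounding context here are placeholders, and the point is only the synchronized keyword:

```java
public class ATRepositoryMaintenance {

    // Both the trimmer and the pruner call this; declaring it synchronized
    // guarantees they can never rebuild (and then delete from) the
    // latest-AT-states table at the same time, even if both end up
    // scheduled in the same app session.
    public synchronized void rebuildLatestAtStates() {
        // existing rebuild logic unchanged; only the synchronized
        // keyword is new
    }
}
```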
When switching from a full node to a pruning node, we need to delete most of the database contents. Doing this entirely as a background process is very slow and can interfere with syncing. Transferring only the necessary rows to a new table and then deleting the original table makes the process much faster: it was taking several days to delete the AT states in the background, but only a couple of minutes to copy them to a new table.
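A hedged sketch of that copy-then-drop approach over JDBC; the table name, column name, and cutoff height below are illustrative, not the actual schema:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class PruningReshape {

    // Copy the rows we want to keep into a fresh table, drop the huge
    // original in one cheap operation, then rename the copy into place.
    public static void reshapeAtStates(Connection connection, int pruneHeight) throws SQLException {
        try (Statement stmt = connection.createStatement()) {
            stmt.execute("CREATE TABLE ATStatesNew AS "
                    + "(SELECT * FROM ATStates WHERE height >= " + pruneHeight + ") WITH DATA");
            stmt.execute("DROP TABLE ATStates");
            stmt.execute("ALTER TABLE ATStatesNew RENAME TO ATStates");
        }
    }
}
```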
The trade-off is that we have to go through a form of "reshape" when starting the app for the first time after enabling pruning mode, but given that this is an opt-in mode, I don't think it will be a problem.
Once pruning is complete, the node automatically performs a CHECKPOINT DEFRAG to shrink the database file down to a fraction of its former size.
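CHECKPOINT DEFRAG is a standard HSQLDB statement; issued through JDBC it rewrites the .data file and reclaims the space left behind by the pruned rows:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class DatabaseCompaction {

    // Compact the HSQLDB data file after mass deletion.
    public static void defrag(Connection connection) throws SQLException {
        try (Statement stmt = connection.createStatement()) {
            stmt.execute("CHECKPOINT DEFRAG");
        }
    }
}
```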
From this point on, the original background process will run, but it can be dialled right down so as not to interfere with syncing.
Initially this just deletes old and unused AT states, to get the table under control. The rows have to be deleted individually, as the table is too large to handle complex queries.
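A sketch of that per-row deletion, with hypothetical table and column names: fetch the candidate keys with a simple query first, then delete them one at a time, so the oversized table never has to execute a complex joined DELETE.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class OldAtStateDeleter {

    public static void deleteOldAtStates(Connection connection, int maxHeight) throws SQLException {
        List<String> addresses = new ArrayList<>();
        List<Integer> heights = new ArrayList<>();

        // Simple, cheap SELECT to find candidate rows.
        try (PreparedStatement select = connection.prepareStatement(
                "SELECT AT_address, height FROM ATStates WHERE height < ?")) {
            select.setInt(1, maxHeight);
            try (ResultSet rs = select.executeQuery()) {
                while (rs.next()) {
                    addresses.add(rs.getString(1));
                    heights.add(rs.getInt(2));
                }
            }
        }

        // Delete each row individually by primary key.
        try (PreparedStatement delete = connection.prepareStatement(
                "DELETE FROM ATStates WHERE AT_address = ? AND height = ?")) {
            for (int i = 0; i < addresses.size(); i++) {
                delete.setString(1, addresses.get(i));
                delete.setInt(2, heights.get(i));
                delete.executeUpdate(); // one row at a time
            }
        }
    }
}
```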
Nodes in pruning mode will be unable to serve older blocks to peers.
This index was accidentally missed out of the original code. Some existing nodes on the network will be missing it, but we can use the upcoming "auto-bootstrap" feature to get it back.
A PUT creates a new base layer, meaning anything before that point is no longer needed, so these files are now deleted automatically by the cleanup manager. This involved relocating many of the cleanup manager's methods into a shared utility so that they could also be used by the arbitrary data manager; without this, the files would be fetched from the network again as soon as they were deleted.
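The rough shape of that refactor, with hypothetical class and method names: the deletion helpers become static methods in a shared utility that both managers can call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Illustrative only: shared deletion helpers pulled out of the cleanup
// manager so the arbitrary data manager can reuse them.
public final class ArbitraryDataFileUtils {

    private ArbitraryDataFileUtils() {
        // static utility; not instantiable
    }

    // Delete every file under dataDir older than the PUT's base layer,
    // since a PUT makes all earlier layers redundant.
    public static void deleteFilesOlderThan(Path dataDir, long baseTimestamp) throws IOException {
        try (Stream<Path> files = Files.walk(dataDir)) {
            files.filter(Files::isRegularFile)
                 .filter(p -> {
                     try {
                         return Files.getLastModifiedTime(p).toMillis() < baseTimestamp;
                     } catch (IOException e) {
                         return false;
                     }
                 })
                 .forEach(p -> {
                     try {
                         Files.delete(p);
                     } catch (IOException e) {
                         // log and continue; sketch omits real logging
                     }
                 });
        }
    }
}
```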
This deletes redundant copies of data, and also converts complete files to chunks where needed. The idea is that nodes should only hold chunks, since they are currently much more likely to serve a chunk to another peer than a complete file.
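A simplified sketch of the complete-file-to-chunks conversion using java.nio; the chunk size and naming scheme are assumptions, and the real code presumably also records chunk hashes:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileChunker {

    private static final int CHUNK_SIZE = 512 * 1024; // illustrative: 512 KiB

    // Split a complete file into numbered chunk files, then delete the
    // original so the node only holds chunks.
    public static void convertToChunks(Path completeFile) throws IOException {
        byte[] buffer = new byte[CHUNK_SIZE];
        int chunkIndex = 0;

        try (InputStream in = Files.newInputStream(completeFile)) {
            int bytesRead;
            // readNBytes fills the buffer where possible, so every chunk
            // except the last is exactly CHUNK_SIZE bytes.
            while ((bytesRead = in.readNBytes(buffer, 0, CHUNK_SIZE)) > 0) {
                Path chunk = completeFile.resolveSibling(
                        completeFile.getFileName() + "." + chunkIndex++);
                try (OutputStream out = Files.newOutputStream(chunk)) {
                    out.write(buffer, 0, bytesRead);
                }
            }
        }

        Files.delete(completeFile);
    }
}
```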
It doesn't yet clean up files that are unassociated with transactions, nor does it delete anything from the _temp folder.
This improves scalability but isn't sufficient as a long-term solution. TODO: it probably makes sense to add an additional query for recent transactions only, so that they are fetched quickly.
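One possible shape for that TODO, with hypothetical table and column names: a cheap query restricted to a recent time window, run frequently so new transactions are picked up quickly while the full scan stays infrequent.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class RecentTransactionFetcher {

    // Fetch only signatures of transactions created within the last
    // interval; much cheaper than scanning the whole table every pass.
    public static List<byte[]> fetchRecentSignatures(Connection connection, long intervalMillis) throws SQLException {
        List<byte[]> signatures = new ArrayList<>();
        long cutoff = System.currentTimeMillis() - intervalMillis;

        String sql = "SELECT signature FROM ArbitraryTransactions "
                + "WHERE created_when > ? ORDER BY created_when DESC";

        try (PreparedStatement stmt = connection.prepareStatement(sql)) {
            stmt.setLong(1, cutoff);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next())
                    signatures.add(rs.getBytes(1));
            }
        }
        return signatures;
    }
}
```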
This is needed because we want to allow brand new accounts to publish data without a fee, taking a similar approach to CrossChainResource.buildAtMessage(). We already require PoW on all arbitrary transactions, so no additional logic beyond this should be needed.
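A loose sketch of the validation rule this implies: a zero-fee arbitrary transaction is acceptable only if its PoW nonce checks out, so brand new accounts can publish without funds while spam still costs work. The difficulty constant and hash scheme below are illustrative stand-ins, not the node's actual PoW implementation.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FeeValidation {

    // Illustrative difficulty: hash must start with this many zero bytes.
    private static final int POW_DIFFICULTY = 2;

    // Fee-paying transactions keep the existing validation path; zero-fee
    // transactions are only valid with a correct PoW nonce.
    public static boolean isFeeValid(long fee, byte[] txBytes, int nonce) throws NoSuchAlgorithmException {
        if (fee > 0)
            return true;

        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        sha256.update(txBytes);
        sha256.update(ByteBuffer.allocate(4).putInt(nonce).array());
        byte[] hash = sha256.digest();

        for (int i = 0; i < POW_DIFFICULTY; i++)
            if (hash[i] != 0)
                return false;

        return true;
    }
}
```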