This bug often prevented the correct amount of block signatures (and blocks) from being requested from a peer, when trying to sync to it.
It could result in quite serious consequences, as it would trigger orphaning back to the common block without first requesting all of the necessary blocks from the peer's chain. Rather than applying a complete copy of the peer's chain, it could orphan back to the common block and then only apply a few blocks beyond that, leaving the node in an unexpected state, potentially hundreds of blocks behind the peer's current height, which it then has to try and obtain from other peers.
When there are forks present, this could result in it hopping from chain to chain, each time being unable to fully synchronise with the peer. Given that we currently discard our chain if it is deemed that our latest block isn't "recent", it is very important that nodes are brought up to the latest block when synchronising with a peer, to avoid constantly triggering discards.
The severity of this bug increased when there was a large disparity between the peer's latest block and the common block height, and prevented us from being able to increase MAXIMUM_COMMON_DELTA.
As importing a transaction requires blockchain lock, all the network threads
can be used up blocking for that lock, especially if Synchronizer is active.
So we simply discard incoming TRANSACTION messages if we can't immediately
obtain the blockchain lock. Some other peer will probably attempt to
send the transaction soon again anyway.
Plus we swap transaction lists after connection handshake.
Post trigger, account levels will map correctly to share bins, subtracting 1 to account for the 0th element of the shareBinsByLevel array.
Pre-trigger, the legacy mapping will remain in effect.
Post trigger, this change will use all 128 bytes of previous block's signature when
calculating/validating next block's "minter" signature (itself the first 64 bytes of a block signature).
Prior to trigger, current behaviour is to only use first 64 bytes of previous block's
signature, which doesn't encompass transactions signature.
New block sig code should help reduce forking and help improve transactional
security.
Added "newBlockSigHeight" to blockchain.json but initially set to block 999999
pending decision on when to merge, auto-update, go-live, etc.
Symptoms of a CHECKPOINT-related DB deadlock:
On Controller thread:
"Controller" #20 prio=5 os_prio=31 cpu=1577665.56ms elapsed=17666.97s allocated=475G defined_classes=412 tid=0x00007fe99f97b000 nid=0x1644b waiting on condition [0x0000700009a21000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@14.0.2/Native Method)
- parking to wait for <0x0000000602f2a6f8> (a org.hsqldb.lib.CountUpDownLatch$Sync)
[...some more lines...]
[this next line is the best indicator: ]
at org.qortal.repository.hsqldb.HSQLDBRepository.checkpoint(HSQLDBRepository.java:385)
at org.qortal.repository.RepositoryManager.checkpoint(RepositoryManager.java:51)
at org.qortal.controller.Controller.run(Controller.java:544)
Other threads stuck at:
- parking to wait for <0x00000007ff09f0b0> (a org.hsqldb.lib.CountUpDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(java.base@14.0.2/LockSupport.java:211)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@14.0.2/AbstractQueuedSynchronizer.java:714)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@14.0.2/AbstractQueuedSynchronizer.java:1046)
at org.hsqldb.lib.CountUpDownLatch.await(Unknown Source)
at org.hsqldb.Session.executeCompiledStatement(Unknown Source)
Could have affected:
Controller.deleteExpiredTransactions()
Network.getConnectablePeer()
Network.opportunisticMergePeers()
Network.prunePeers()
Symptoms:
2021-02-12 16:46:06 WARN NetworkProcessor:152 - [1556] exception while trying to produce task
java.lang.NullPointerException: null
at org.qortal.repository.hsqldb.HSQLDBRepository.<init>(HSQLDBRepository.java:92) ~[qortal.jar:1.4.1]
at org.qortal.repository.hsqldb.HSQLDBRepositoryFactory.tryRepository(HSQLDBRepositoryFactory.java:97) ~[qortal.jar:1.4.1]
at org.qortal.repository.RepositoryManager.tryRepository(RepositoryManager.java:33) ~[qortal.jar:1.4.1]
at org.qortal.network.Network.getConnectablePeer(Network.java:525) ~[qortal.jar:1.4.1]
This is a serious change as it affects many callers.
Controller.onNetworkTransactionMessager()
-- should be OK to block as it's only one EPC thread
TransactionsResource.processTransaction()
-- should be OK to block as it's only one Jetty thread
AND has its own 30s timeout wrapper anyway
Implementations of AcctTradeBot.progress()
e.g. BitcoinACCTv1TradeBot.progress() & LitecoinACCTv1TradeBot.progress()
TradeBot.updatePresence()
-- these are called via BlockMinter/Synchronizer
when blockchain lock is already held, so will are unaffected
AcctTradeBot.createTrade() and AcctTradeBot.startResponse()
-- these are called via API
and previously would only perform non-blocking blockchainLock.try()
but now perform blocking blockchainLock.lock()
thus preventing NO_BLOCKCHAIN_LOCK when creating/responding to trades
especially after a (wasted) MESSAGE PoW compute
but with potential downside that API response might be delayed
e.g. by a very slow sync round
Future work could look into removing the need for blockchain lock
when calling Transaction.importAsUnconfirmed().
This table was previously defined using the TEMPORARY keyword as the rows were used as a cached/index to speed up AT trimming SQL statements.
However, the table definition was lacking the "ON COMMIT PRESERVE ROWS" clause, and possibly the "GLOBAL" keyword, which caused the table contents to be emptied, immediately after being filled, when <tt>repository.saveChanges()</tt> was called (i.e. "COMMIT").
This caused latest AT states to be trimmed in error.
AtRepositoryTests.testGetLatestATStatePostTrimming also fixed so that it fails with the previous, broken code.
This table was previously defined using the TEMPORARY keyword as the rows were used as a cached/index to speed up AT trimming SQL statements.
However, the table definition was lacking the "ON COMMIT PRESERVE ROWS" clause, and possibly the "GLOBAL" keyword, which caused the table contents to be emptied, immediately after being filled, when <tt>repository.saveChanges()</tt> was called (i.e. "COMMIT").
This caused latest AT states to be trimmed in error.
AtRepositoryTests.testGetLatestATStatePostTrimming also fixed so that it fails with the previous, broken code.
According to Bitcoin source, CheckFinalTx() in validation.cpp ~line 223,
we need to make sure median blocktime has passed P2SH refund transaction's
nLockTime.
Previously we were erroneously checking that median blocktime was in the past.
This should fix issues where refunding P2SH results in a "non-final" error
from the ElectrumX server network.