Commit Graph

779 Commits

Author SHA1 Message Date
CalDescent
35fd1d8455 Base58 encode signatures in recently added logs. 2021-03-21 14:12:04 +00:00
CalDescent
be21771e49 Use SYNC_BATCH_SIZE instead of MAXIMUM_BLOCK_SIGNATURES_PER_REQUEST. 2021-03-21 13:58:42 +00:00
CalDescent
f1422af95b Added retry mechanisms in Synchronizer.syncToPeerChain()
Until now, we required a perfect success rate when syncing with a peer via Synchronizer.syncToPeerChain(). Blocks were requested individually, but the node would give up and lose all progress if a single request failed. In practice, this happened very regularly, and it was difficult to succeed when there were a large number of blocks (e.g. 20+) that needed to be requested.

This commit adds two retry mechanisms, causing each of the two request types (block sigs and blocks) to retry 3 times before giving up, potentially avoiding a lot of wasted work. The number of retries is configurable in the MAXIMUM_RETRIES constant, which we could move to settings at some point if this feature proves useful.

The original issue seemed to result in a few side effects:

1. Nodes would spend a large amount of time requesting blocks from peers, only to throw it all away afterwards. This potentially added to network congestion, as nodes were using unnecessary network time to unproductively serve peers.

2. A large number of sync attempts were failing, particularly when a fork emerged with a significant number of divergent blocks (20+). This issue reduced the ability for nodes to sync to the correct chain while they still had time to do so. With every block that passed, it became made it more and more difficult to switch to the correct chain. Eventually, the correct chain would become TOO_DIVERGENT at which point there is no way to automatically switch without manual intervention. I hope that this retry mechanism will increase the chances of nodes automatically moving onto the right chain quickly, avoiding the need for a user to intervene.

3. The POST /admin/forcesync API was unlikely to succeed when the peer's chain had started to diverge from the user's chain. This should increase the success rate.

Also included in this commit is a MAXIMUM_BLOCK_SIGNATURES_PER_REQUEST constant. This limits the number of block sigs requested in each batch (default 200). Without this, we are unable to increase MAXIMUM_COMMON_DELTA because it can try and request thousands of block sigs at once, which unsurprisingly doesn't succeed.
2021-03-21 09:41:36 +00:00
CalDescent
f92f4dc1e2 Fixed some log entries in Controller.syncToPeerChain() which were incorrectly reporting our height instead of the height of block(s) being requested from the peer. Now reporting the height of the block (or block sigs) being retrieved, which should make it easier to interpret the logs. 2021-03-20 16:18:25 +00:00
catbref
019cfdc1db Minor comment re-org 2021-03-20 11:45:11 +00:00
CalDescent
e694a51cdd Fix for "numberSignaturesRequired" calculation error in Synchronizer.syncToPeerChain()
This bug often prevented the correct amount of block signatures (and blocks) from being requested from a peer, when trying to sync to it.

It could result in quite serious consequences, as it would trigger orphaning back to the common block without first requesting all of the necessary blocks from the peer's chain. Rather than applying a complete copy of the peer's chain, it could orphan back to the common block and then only apply a few blocks beyond that, leaving the node in an unexpected state, potentially hundreds of blocks behind the peer's current height, which it then has to try and obtain from other peers.

When there are forks present, this could result in it hopping from chain to chain, each time being unable to fully synchronise with the peer. Given that we currently discard our chain if it is deemed that our latest block isn't "recent", it is very important that nodes are brought up to the latest block when synchronising with a peer, to avoid constantly triggering discards.

The severity of this bug increased when there was a large disparity between the peer's latest block and the common block height, and prevented us from being able to increase MAXIMUM_COMMON_DELTA.
2021-03-20 10:33:23 +00:00
catbref
ec7d4f4498 Changed "too busy" logging from debug to trace 2021-03-13 18:30:43 +00:00
catbref
d635de44a8 Added TODO in HSQLDBRepository about deadlock log spam 2021-03-13 18:29:31 +00:00
catbref
bce66bf57f Move HSQLDBRepositoryFactory.POOL_SIZE into Settings as "repositoryConnectionPoolSize" 2021-03-13 18:14:11 +00:00
catbref
0fc5153f9b Merge 'trade-bot-timeout-fix' into master 2021-03-13 17:13:40 +00:00
catbref
0398c2fae1 Try to avoid clogging up network threads by discarding incoming TRANSACTION messages if we're too busy
As importing a transaction requires blockchain lock, all the network threads
can be used up blocking for that lock, especially if Synchronizer is active.

So we simply discard incoming TRANSACTION messages if we can't immediately
obtain the blockchain lock. Some other peer will probably attempt to
send the transaction soon again anyway.

Plus we swap transaction lists after connection handshake.
2021-03-13 17:03:38 +00:00
CalDescent
5fc495eb6a Fix for possible logic bug introduced in commit 33a8f31. 2021-03-12 22:05:38 +00:00
CalDescent
427fa1816d "blockCacheSize" can now be configured via settings.json. 2021-03-07 10:00:49 +00:00
catbref
414399b2a0 Merge branch 'blocksig' into master 2021-02-27 18:20:13 +00:00
catbref
c592051a80 Speed up BlockMinter by filtering out 'unconfirmable' transaction types like CHAT & PRESENCE 2021-02-27 17:29:19 +00:00
catbref
33a8f311e5 Reduce logging noise from lost trade-bot ATs and self-clean if AT does not exist after 24 hours 2021-02-24 21:00:52 +00:00
catbref
018c3cdcd4 Allow users to delete trade-bot entries in any state if corresponding AT does not exist 2021-02-24 20:46:47 +00:00
catbref
3920933fc7 Add block fetch TRACE logging to Synchronizer 2021-02-20 13:36:54 +00:00
catbref
1fdd7f156c Reduced logging noise from deleteExpiredTransactions but increased detection & logging on "serilization failures" from HSQLDB 2021-02-20 13:25:53 +00:00
catbref
91925cf931 Change block "minter" signature code, to take effect at some future (undecided) height.
Post trigger, this change will use all 128 bytes of previous block's signature when
calculating/validating next block's "minter" signature (itself the first 64 bytes of a block signature).

Prior to trigger, current behaviour is to only use first 64 bytes of previous block's
signature, which doesn't encompass transactions signature.

New block sig code should help reduce forking and help improve transactional
security.

Added "newBlockSigHeight" to blockchain.json but initially set to block 999999
pending decision on when to merge, auto-update, go-live, etc.
2021-02-20 12:08:51 +00:00
catbref
3453f0efaf Rework HSQLDB CHECKPOINTing to defer until there are no ongoing SQL transactions, in order to prevent DB deadlocks.
Symptoms of a CHECKPOINT-related DB deadlock:

On Controller thread:
"Controller" #20 prio=5 os_prio=31 cpu=1577665.56ms elapsed=17666.97s allocated=475G defined_classes=412 tid=0x00007fe99f97b000 nid=0x1644b waiting on condition  [0x0000700009a21000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@14.0.2/Native Method)
	- parking to wait for  <0x0000000602f2a6f8> (a org.hsqldb.lib.CountUpDownLatch$Sync)
[...some more lines...]
[this next line is the best indicator: ]
	at org.qortal.repository.hsqldb.HSQLDBRepository.checkpoint(HSQLDBRepository.java:385)
	at org.qortal.repository.RepositoryManager.checkpoint(RepositoryManager.java:51)
	at org.qortal.controller.Controller.run(Controller.java:544)

Other threads stuck at:
	- parking to wait for  <0x00000007ff09f0b0> (a org.hsqldb.lib.CountUpDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(java.base@14.0.2/LockSupport.java:211)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@14.0.2/AbstractQueuedSynchronizer.java:714)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@14.0.2/AbstractQueuedSynchronizer.java:1046)
	at org.hsqldb.lib.CountUpDownLatch.await(Unknown Source)
	at org.hsqldb.Session.executeCompiledStatement(Unknown Source)
2021-02-13 17:32:09 +00:00
catbref
eb23940996 Fix potential NPE when trying to obtain opportunistic database connection.
Could have affected:
Controller.deleteExpiredTransactions()
Network.getConnectablePeer()
Network.opportunisticMergePeers()
Network.prunePeers()

Symptoms:

2021-02-12 16:46:06 WARN  NetworkProcessor:152 - [1556] exception while trying to produce task
java.lang.NullPointerException: null
        at org.qortal.repository.hsqldb.HSQLDBRepository.<init>(HSQLDBRepository.java:92) ~[qortal.jar:1.4.1]
        at org.qortal.repository.hsqldb.HSQLDBRepositoryFactory.tryRepository(HSQLDBRepositoryFactory.java:97) ~[qortal.jar:1.4.1]
        at org.qortal.repository.RepositoryManager.tryRepository(RepositoryManager.java:33) ~[qortal.jar:1.4.1]
        at org.qortal.network.Network.getConnectablePeer(Network.java:525) ~[qortal.jar:1.4.1]
2021-02-13 11:24:26 +00:00
catbref
20e4a79130 Reduce logging level for deleting older PRESENCE transactions 2021-01-17 16:49:47 +00:00
catbref
d336200d75 Fix for off-by-one bug when ATs look for next transaction. Currently configured to take effect block 275,000 2021-01-17 16:01:36 +00:00
catbref
e5bb3e2f0a Added defensive try-catch around network engine calls (actually ExecuteProduceConsume) 2021-01-17 15:34:50 +00:00
catbref
5b2b2bab46 Two-pronged fix for HSQLDB 'serialization failure' errors when receiving multiple PRESENCE transactions -- reported by marracc 2021-01-17 15:33:59 +00:00
catbref
c17eea3ed9 Added timeout to Peer sendMessage() - same timeout as for awaiting incoming responses 2021-01-17 15:32:39 +00:00
catbref
83f4e2f5bf Added API call to view single trade's detailed info 2021-01-17 15:31:44 +00:00
catbref
c8e7a00c08 Remove unused JDBC statement 2021-01-16 13:20:58 +00:00
catbref
190014cf96 Fix minor NPE during shutdown 2021-01-16 13:20:20 +00:00
catbref
385064e324 Return foreign-chain wallet transactions in newest-timestamp-first order 2021-01-08 17:37:12 +00:00
catbref
6eb9447bb9 Merge branch 'LTCv3-with-presence' into master 2021-01-07 07:45:47 +00:00
catbref
0ee8d7da0f Improve removal of expired PRESENCE txns in websocket cache 2020-12-31 11:26:13 +00:00
catbref
918a331609 Transaction.importAsUnconfirmed() now BLOCKS for blockchain lock, instead of non-blocking try()
This is a serious change as it affects many callers.

Controller.onNetworkTransactionMessager()
-- should be OK to block as it's only one EPC thread

TransactionsResource.processTransaction()
-- should be OK to block as it's only one Jetty thread
AND has its own 30s timeout wrapper anyway

Implementations of AcctTradeBot.progress()
e.g. BitcoinACCTv1TradeBot.progress() & LitecoinACCTv1TradeBot.progress()
TradeBot.updatePresence()
-- these are called via BlockMinter/Synchronizer
when blockchain lock is already held, so will are unaffected

AcctTradeBot.createTrade() and AcctTradeBot.startResponse()
-- these are called via API
and previously would only perform non-blocking blockchainLock.try()
but now perform blocking blockchainLock.lock()
thus preventing NO_BLOCKCHAIN_LOCK when creating/responding to trades
especially after a (wasted) MESSAGE PoW compute
but with potential downside that API response might be delayed
e.g. by a very slow sync round

Future work could look into removing the need for blockchain lock
when calling Transaction.importAsUnconfirmed().
2020-12-30 12:43:41 +00:00
catbref
41453f5bd1 Add primary key index to latest AT state cache for extra speed! 2020-12-29 17:57:20 +00:00
catbref
78f62751e5 TESTNET ONLY: Correct "LitcoinACCTv1" TradeBotStates.acct_name values in DB 2020-12-29 14:16:47 +00:00
catbref
d59c30757c Fix conversion of double in ElectrumX JSON to long 2020-12-29 12:17:55 +00:00
catbref
30d2e4fdac Correct logging message to blockchain-agnostic text 2020-12-29 12:16:40 +00:00
catbref
d43a074cc1 Use *ACCT*.class.getSimpleName() for less error-prone ACCT.NAME 2020-12-29 11:00:43 +00:00
catbref
27783dc6de Add WARNING on start-up if repository is missing latest AT state data 2020-12-29 10:34:25 +00:00
catbref
2a789a9a9b Add WARNING on start-up if repository is missing latest AT state data 2020-12-29 10:15:56 +00:00
catbref
a66dba767e Fix incorrect LatestATStates SQL table definition
This table was previously defined using the TEMPORARY keyword as the rows were used as a cached/index to speed up AT trimming SQL statements.

However, the table definition was lacking the "ON COMMIT PRESERVE ROWS" clause, and possibly the "GLOBAL" keyword, which caused the table contents to be emptied, immediately after being filled, when <tt>repository.saveChanges()</tt> was called (i.e. "COMMIT").

This caused latest AT states to be trimmed in error.

AtRepositoryTests.testGetLatestATStatePostTrimming also fixed so that it fails with the previous, broken code.
2020-12-29 09:19:03 +00:00
catbref
e00579e1a2 Fix incorrect LatestATStates SQL table definition
This table was previously defined using the TEMPORARY keyword as the rows were used as a cached/index to speed up AT trimming SQL statements.

However, the table definition was lacking the "ON COMMIT PRESERVE ROWS" clause, and possibly the "GLOBAL" keyword, which caused the table contents to be emptied, immediately after being filled, when <tt>repository.saveChanges()</tt> was called (i.e. "COMMIT").

This caused latest AT states to be trimmed in error.

AtRepositoryTests.testGetLatestATStatePostTrimming also fixed so that it fails with the previous, broken code.
2020-12-29 09:11:11 +00:00
catbref
99f1a55de2 Improve logging for case where trade offer is locked to someone else 2020-12-29 08:56:26 +00:00
catbref
3ec307a2a1 Don't allow more than one (active) trade-bot entry per trade-offer 2020-12-28 15:55:14 +00:00
catbref
3fdef9ea6d Fix P2SH refund "non-final" error issue
According to Bitcoin source, CheckFinalTx() in validation.cpp ~line 223,
we need to make sure median blocktime has passed P2SH refund transaction's
nLockTime.

Previously we were erroneously checking that median blocktime was in the past.

This should fix issues where refunding P2SH results in a "non-final" error
from the ElectrumX server network.
2020-12-28 15:44:45 +00:00
catbref
332c917c94 Fix ALICE-based PRESENCE transactions.
PRESENCE transactions were previously validated using Bob's trade key (in address form).
But as PRESENCE transactions are already emitted by Alice, her trade key is also used
(if present in trade data by virtue of AT being locked to Alice).

Similarly, Alice's trade-bot won't even try to build PRESENCE transactions if her
trade key isn't publicly visible to other peers, i.e. after AT is locked to Alice.
2020-12-28 12:00:11 +00:00
catbref
35b0ac78b8 Bump ElectrumX transaction cache size from 100 to 200 2020-12-24 16:52:58 +00:00
catbref
047627a6e5 Force bitcoinj keychain lookaheadThreshold to zero so we always generate more keys 2020-12-24 16:48:47 +00:00
catbref
5d6811bd50 Workaround for block 212937 issue 2020-12-23 15:02:14 +00:00