Orphaning a BUY_NAME transaction would not reinstate the
sale price. Sale price could be nullified by (e.g.) orphaning
a SELL_NAME transaction.
Also added test case to cover above and other test-related support,
e.g. test-mode in NTP.
Added "volatile" to more fields, for thread-safety on reads.
Changes to field values are done inside synchronized blocks
so no need for AtomicInteger/AtomicBoolean. (Could be changed
in the future to show intention/readability though).
Added more statistics (tasks produced/consumed).
Limited Network's EPC executor to 10 threads max.
Some asset-related transactions (CREATE_ASSET_ORDER and TRANSFER_ASSET)
also try to fetch asset names at the same time.
If one of these transactions, and a corresponding ISSUE_ASSET transaction,
have been orphaned then it's possible that the asset no longer exists.
Thus the SQL "JOIN" fails during transaction retrieval, causing an error.
Changing the table-join to "LEFT OUTER JOIN" makes the asset name aspect
optional. Repercussions might be nameless assets when fetching transaction
info via API.
Transaction expiry wasn't happening. Use NTP.getTime to check whether
transactions have expired. Also reject expired transactions when trying
to add them to unconfirmed pool.
Sometimes producing a task took way too long, causing massive
spikes in the number of threads and peer disconnections.
This is down to a repository pool exhaustion, so
RepositoryManager.getRepository() would block (for up to 5 minutes).
The key method at fault was Network.getConnectablePeer().
Various fixes:
NetworkProcessor's executor now reaps old threads after only 10 seconds
instead of the usual 60 seconds.
Change logging in Network to help diagnose disconnection and repository
issues.
RepositoryManager now has a tryRepository() call that is non-blocking
and returns null if repository pool is exhausted.
Repository pool size increased from default (10) to 100.
Pruning peers is now opportunistic, using tryRepository(), and returns
early if repository pool is exhausted.
getConnectablePeer() is now opportunistic, using tryRepository(), and
returns null (no peer candidate for connection) if repository pool
is exhausted.
Merging peers is not opportunistic, using tryRepository().
Peer ping interval increased from 8s to 20s.
HSQLDBRepositoryFactory now logs when getConnection() takes over 1000ms.
Added more trace-level logging to ExecuteProduceConsume to
highlight slow produceTask() calls.
Symptoms were on Ubuntu with 8u222:
Exception in thread "Controller" java.awt.AWTError: Assistive Technology not found: org.GNOME.Accessibility.AtkWrapper
at java.awt.Toolkit.loadAssistiveTechnologies(Toolkit.java:807)
at java.awt.Toolkit.getDefaultToolkit(Toolkit.java:886)
at java.awt.SystemTray.isSupported(SystemTray.java:219)
at org.qora.gui.SysTray.<init>(SysTray.java:50)
at org.qora.gui.SysTray.getInstance(SysTray.java:249)
at org.qora.controller.Controller.updateSysTray(Controller.java:480)
at org.qora.controller.Controller.run(Controller.java:368)
(Underlying cause not known)
This would cause Controller thread to silently exit, and so no
synchronization would occur. Node would still have connections
and thus happily generate its own blocks on its on fork.
Maybe other peers would try to sync with this node but would
likely reject this node's chain after a short while.
New NTP class now runs as a simplistic NTP client, repeatedly polling
several NTP servers and maintaining a more accurate time independent
of operating system.
Several occurrences of System.currentTimeMillis() replaced with NTP.getTime()
particularly where block/transaction/networking is involved.
GET /admin/info now includes "currentTimestamp" as reported from NTP.
Added support for block timestamps determined by generator, instead of
supplied by clock. (BlockChain.newBlockTimestampHeight - not yet activated).
Incorrect timestamps will produce a TIMESTAMP_INCORRECT Block.ValidationResult.
Block.calcMinimumTimestamp repurposed as Block.calcTimestamp for above.
Block timestamps are now allowed to be max 2000ms in the future,
was previously max 500ms.
Block generation prohibited until initial NTP sync.
Instead of deleting INVALID unconfirmed transactions in BlockGenerator,
Controller now deletes EXPIRED unconfirmed transactions every so often.
This also fixes persistent expired unconfirmed transactions on nodes
that do not generate blocks, as BlockGenerator.deleteInvalidTransactions()
was never reached.
Abbreviated block sigs added to log entries declaring a new block is generated
in BlockGenerator.
Controller checks for NTP sync much faster during start-up and SysTray's
tooltip text starts as "Synchronizing clock" until NTP sync occurs.
After NTP sync, Controller logs NTP offset every so often (currently every 5 mins).
When considering synchronizing, Controller skips peers that have the same block sig
as last time when synchronization resulted in no action, e.g. INFERIOR_CHAIN,
NOTHING_TO_DO and also OK. OK is included as another sync attempt would result in
NOTHING_TO_DO.
Previously this skipping check only happened after prior INFERIOR_CHAIN.
During inbound peer handshaking, if we receive a peer ID that matches an existing inbound
peer then send peer ID of all zeros, then close connection.
Remote end should detect this and cleanly close connection instead of waiting for handshake timeout.
Randomly generated peer IDs have lowest bit set to avoid all zeros.
Might need further work.
Networking doesn't connect, or accept, until NTP has synced.
Transaction validation can fail with CLOCK_NOT_SYNCED if NTP not synced.
It seems unachievable for nodes to keep their clocks accurate to
within 500ms. It is unclear whether this is due to Windows'
implementation of client NTP or because of terrible network
conditions in China.
So increasing max NTP offset to allow block generation from
500ms to 30s. Correspondingly increasing max peer timestamp
delta from 5s to 30s.
The block consensus algorithm will need to change in the near
future to address clock issues.
Controller batches SysTray updates into one per second.
Block generation only allowed by Controller is clock
known to be accurate. ('not sure' stops generation).
NTP MAX_STDDEV increased from 25ms to 125ms to cater
for poorly connected nodes.
Controller sends peers list over outbound peer connections,
requests peers list from inbound peer connections.
When peer handshake completes, Network & Controller only send
initial messages over outbound peer connections.
This is to fix HEIGHT_V2 messages being processed out-of-order
breaking handshaking, as indicated by log entries like:
2019-07-26 16:16:35 DEBUG Network:702 - Unexpected HEIGHT_V2 message from xx.xxx.xx.xx:pppp, expected PROOF
2019-07-26 16:16:35 DEBUG Network:840 - Handshake completed with peer xx.xxx.xx.xx:pppp
Increased connection failure backoff from 1 minute to 5 minutes,
as handshake timeout is 1 minute and then nodes would immediately reconnect.
Changed default NTP servers from asia to cn.
Controller performs NTP check on startup (and every 5 minutes)
which determines whether block generation is allowed.
System Tray tooltip updated to reflect generating status.
Plus new translations.
Improved GuiTests.
BlockGenerator fetches forging accounts first, and sleeps
if none configured, which is less work than processing peer lists.
Now uses several NTP servers to determine mean offset from
system clock to internet time.
If abs(offset) > 500ms or NTP service not running then
user is 'nagged' via system tray pop-up notification
with instructions on how to fix.
Also improved system tray translations!
SysTray pop-up menu now includes entry for launching https://time.is
so node owners can check their system clocks against internet time.
Windows installs also have additional systray menu entry which
runs ntpcfg.bat script, included in resources.
Also available as download via node-UI servlet,
e.g. http://localhost:9880/downloads/ntpcfg.bat
ntpcfg.bat reconfigures Windows Time Service with many NTP servers,
restarts the service, and also makes sure it auto-starts on boot.
Added DEBUG-level logging when rejecting nodes due to excessive
time difference (during PROOF handshake stage).
Bumped default settings values for minOutboundPeers from 10 to 20.
Bumped default settings values for maxPeers from 30 to 50.
Defer the clearing of hasThreadPending flag until about to produce a task,
inside synchronized block.
This gives a new thread a chance to produce at least once before other threads
decide to spawn new threads.
Previously there could be an excessive number of unncessary threads,
all waiting for their initial attempt to produce a task.
Reworked networking execute-produce-consume threading.
Some networking task were wrongly performed during 'produce' phase,
and some producing was happening in 'consume' phase (also corrected).
Peer connection tasks are rate-limited to 1 per second to reduce CPU thrashing.
Show P2P listen port in logs on startup.
Tests for general purpose ExecuteProduceConsume class to cover both
random task scenario and mass-ping scenario.
Instead of 3 threads per peer:
1. peer main thread
2. peer's unsolicited messages processor
3. peer pinger
We now use a Jetty-style Execute-Produce-Consume server threading model.
For 60 connected peers, we no longer have 180 threads but typically only
the usual ~6 threads.
Also in this commit:
* merging peers locking changed from lock() to tryLock()
* PROOF handshake maximum time difference increased from 2000ms to 5000ms
* Peers still handshaking after 60s are considered stuck and hence disconnected
* We now use NIO SocketChannels instead of raw sockets
Change AutoUpdate to use 'quick' backup.
For HSQLDBRepository.backup, don't perform CHECKPOINT DEFRAG while repository in use
as it never completes.
Similarly, use BACKUP DATABASE ... NOT BLOCKING.
Controller requests 'quick' repository backup every 123 minutes.
On start-up, if repository fails to load, recovery is attempted using
backup (if present).
AutoUpdate also requests 'slow' repository backup just before
calling ApplyUpdate. ('Slow' means perform "CHECKPOINT DEFRAG" first).
Network.shutdown() called Peer.shutdown() on each Peer
while holding synchronization lock on this.connectedPeers.
This would cause a problem during Peer.shutdown() as some
other reachable code would also want synchronized access
to this.connectedPeers. Typical symptoms would be log
entries like:
2019-07-02 11:13:05 DEBUG Peer:512 - Message processor for peer 192.144.182.73:9889 failed to terminate
Eventually Network.shutdown() would exit, releasing synchronization
lock and awaking stuck Peer threads, which could then try to access
repository (now closed) causing further log spam.
Now it uses Network.getConnectedPeers to return duplicated
List<Peer>, minimizing lock time on this.connectedPeers.
Also made Peer main thread logging more informative when a IOException
occurs, as most situations are harmless EOF or connection reset by peer.
Refactored duplicate code into Transaction.setInitialApprovalStatus().
This is make sure transactions HAVE a group-approval status
in Synchronizer before Block.isValid is called.
This wasn't a problem for new, unconfirmed, individual transactions
arriving over the wire due to Transaction.importAsUnconfirmed()
doing the right thing.
Also added a groupApprovalTimestamp to BlockChain feature-triggers
to support legacy chains.
Tested by syncing mainnet from scratch.