Controller batches SysTray updates into one per second.
Block generation only allowed by Controller if clock is
known to be accurate. ('not sure' stops generation).
NTP MAX_STDDEV increased from 25ms to 125ms to cater
for poorly connected nodes.
Controller sends peers list over outbound peer connections,
requests peers list from inbound peer connections.
When peer handshake completes, Network & Controller only send
initial messages over outbound peer connections.
This is to fix HEIGHT_V2 messages being processed out-of-order
breaking handshaking, as indicated by log entries like:
2019-07-26 16:16:35 DEBUG Network:702 - Unexpected HEIGHT_V2 message from xx.xxx.xx.xx:pppp, expected PROOF
2019-07-26 16:16:35 DEBUG Network:840 - Handshake completed with peer xx.xxx.xx.xx:pppp
Increased connection failure backoff from 1 minute to 5 minutes,
as handshake timeout is 1 minute and then nodes would immediately reconnect.
Changed default NTP servers from asia to cn.
Controller performs NTP check on startup (and every 5 minutes)
which determines whether block generation is allowed.
System Tray tooltip updated to reflect generating status.
Plus new translations.
Improved GuiTests.
BlockGenerator fetches forging accounts first, and sleeps
if none configured, which is less work than processing peer lists.
Now uses several NTP servers to determine mean offset from
system clock to internet time.
If abs(offset) > 500ms or NTP service not running then
user is 'nagged' via system tray pop-up notification
with instructions on how to fix.
Also improved system tray translations!
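A minimal sketch of the offset check described above, not the actual
implementation. NtpOffsetCheck and its methods are hypothetical names;
only the 500ms threshold comes from this commit. Treating "no server
replied" as a reason to nag is an assumption of the sketch.

```java
import java.util.List;

// Hypothetical helper, for illustration only.
final class NtpOffsetCheck {
    // Threshold from the commit message: nag if abs(offset) > 500ms
    static final long MAX_OFFSET_MS = 500;

    // Mean of the per-server offsets in milliseconds, or null if no server replied
    static Double meanOffset(List<Long> offsetsMs) {
        if (offsetsMs.isEmpty())
            return null;

        double sum = 0;
        for (long offset : offsetsMs)
            sum += offset;

        return sum / offsetsMs.size();
    }

    // True if the user should be nagged via the system tray
    static boolean shouldNag(List<Long> offsetsMs, boolean ntpServiceRunning) {
        if (!ntpServiceRunning)
            return true;

        Double mean = meanOffset(offsetsMs);

        // Assumption: with no replies we cannot vouch for the clock, so nag
        return mean == null || Math.abs(mean) > MAX_OFFSET_MS;
    }
}
```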
SysTray pop-up menu now includes entry for launching https://time.is
so node owners can check their system clocks against internet time.
Windows installs also have additional systray menu entry which
runs ntpcfg.bat script, included in resources.
Also available as download via node-UI servlet,
e.g. http://localhost:9880/downloads/ntpcfg.bat
ntpcfg.bat reconfigures Windows Time Service with many NTP servers,
restarts the service, and also makes sure it auto-starts on boot.
Added DEBUG-level logging when rejecting nodes due to excessive
time difference (during PROOF handshake stage).
Bumped default settings values for minOutboundPeers from 10 to 20.
Bumped default settings values for maxPeers from 30 to 50.
Defer the clearing of hasThreadPending flag until about to produce a task,
inside synchronized block.
This gives a new thread a chance to produce at least once before other threads
decide to spawn new threads.
Previously there could be an excessive number of unnecessary threads,
all waiting for their initial attempt to produce a task.
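The deferred clearing of hasThreadPending can be sketched roughly as
below. This is a simplified illustration, not the project's
ExecuteProduceConsume class: MiniEpc, its task queue and awaitIdle()
are hypothetical. The point shown is that a newly spawned thread only
clears the flag inside the synchronized block, just before it tries to
produce, so no other thread spawns a further thread in the meantime.

```java
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical, simplified execute-produce-consume loop.
final class MiniEpc implements Runnable {
    private final ExecutorService executor = Executors.newCachedThreadPool();
    private final Queue<Runnable> tasks;
    private boolean hasThreadPending = false; // guarded by 'this'
    private int activeThreads = 0;            // guarded by 'this'

    MiniEpc(Queue<Runnable> tasks) {
        this.tasks = tasks;
    }

    void start() {
        synchronized (this) {
            hasThreadPending = true;
            ++activeThreads;
        }
        executor.execute(this);
    }

    @Override
    public void run() {
        boolean wasPending = true; // every thread here starts as the pending one
        try {
            while (true) {
                Runnable task;
                synchronized (this) {
                    // Deferred: clear the flag only when about to produce
                    if (wasPending) {
                        hasThreadPending = false;
                        wasPending = false;
                    }

                    task = tasks.poll(); // "produce"
                    if (task == null)
                        return; // nothing to do, let this thread exit

                    // Ensure a thread is ready to produce while we consume
                    if (!hasThreadPending) {
                        hasThreadPending = true;
                        ++activeThreads;
                        executor.execute(this);
                    }
                }
                task.run(); // "consume", outside the lock
            }
        } finally {
            synchronized (this) {
                --activeThreads;
                notifyAll();
            }
        }
    }

    // Blocks until all threads have exited, then shuts the pool down
    synchronized void awaitIdle() {
        while (activeThreads > 0) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        executor.shutdown();
    }
}
```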
Reworked networking execute-produce-consume threading.
Some networking tasks were wrongly performed during 'produce' phase,
and some producing was happening in 'consume' phase (also corrected).
Peer connection tasks are rate-limited to 1 per second to reduce CPU thrashing.
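One simple way to get a "1 per second" limit is a timestamp check in
the producer, sketched below. PeerConnectRateLimiter is a hypothetical
name and the caller-supplied clock value is an assumption made to keep
the example testable; the real code may well do this differently.

```java
// Hypothetical rate limiter, for illustration only.
final class PeerConnectRateLimiter {
    private final long minIntervalMs;
    private Long lastAllowedMs = null; // null until the first task is allowed

    PeerConnectRateLimiter(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    // Returns true if a new peer connection task may be produced at 'nowMs'
    synchronized boolean tryAcquire(long nowMs) {
        if (lastAllowedMs != null && nowMs - lastAllowedMs < minIntervalMs)
            return false; // too soon since the last connection task

        lastAllowedMs = nowMs;
        return true;
    }
}
```

In use, the producer would call tryAcquire(System.currentTimeMillis())
and skip producing a connection task when it returns false.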
Show P2P listen port in logs on startup.
Tests for general purpose ExecuteProduceConsume class to cover both
random task scenario and mass-ping scenario.
Instead of 3 threads per peer:
1. peer main thread
2. peer's unsolicited messages processor
3. peer pinger
We now use a Jetty-style Execute-Produce-Consume server threading model.
For 60 connected peers, we no longer have 180 threads but typically only
the usual ~6 threads.
Also in this commit:
* merging peers locking changed from lock() to tryLock()
* PROOF handshake maximum time difference increased from 2000ms to 5000ms
* Peers still handshaking after 60s are considered stuck and hence disconnected
* We now use NIO SocketChannels instead of raw sockets
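The lock() to tryLock() change for merging peers might look something
like this sketch. PeerMerger, mergePeers() and the String peer
addresses are hypothetical; the idea shown is only that a thread which
cannot get the lock skips the merge instead of blocking.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical peer-merging helper, for illustration only.
final class PeerMerger {
    private final ReentrantLock mergeLock = new ReentrantLock();
    private final Set<String> knownPeers = new HashSet<>();

    // Returns false, without blocking, if another thread is already merging
    boolean mergePeers(Collection<String> peerAddresses) {
        if (!mergeLock.tryLock())
            return false;

        try {
            knownPeers.addAll(peerAddresses);
            return true;
        } finally {
            mergeLock.unlock();
        }
    }

    int knownPeerCount() {
        mergeLock.lock();
        try {
            return knownPeers.size();
        } finally {
            mergeLock.unlock();
        }
    }
}
```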
Change AutoUpdate to use 'quick' backup.
For HSQLDBRepository.backup, don't perform CHECKPOINT DEFRAG while repository in use
as it never completes.
Similarly, use BACKUP DATABASE ... NOT BLOCKING.
Controller requests 'quick' repository backup every 123 minutes.
On start-up, if repository fails to load, recovery is attempted using
backup (if present).
AutoUpdate also requests 'slow' repository backup just before
calling ApplyUpdate. ('Slow' means perform "CHECKPOINT DEFRAG" first).
Network.shutdown() called Peer.shutdown() on each Peer
while holding synchronization lock on this.connectedPeers.
This would cause a problem during Peer.shutdown() as some
other reachable code would also want synchronized access
to this.connectedPeers. Typical symptoms would be log
entries like:
2019-07-02 11:13:05 DEBUG Peer:512 - Message processor for peer 192.144.182.73:9889 failed to terminate
Eventually Network.shutdown() would exit, releasing synchronization
lock and awaking stuck Peer threads, which could then try to access
repository (now closed) causing further log spam.
Now it uses Network.getConnectedPeers to return duplicated
List<Peer>, minimizing lock time on this.connectedPeers.
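A rough sketch of the snapshot approach, using a hypothetical
PeerRegistry with String stand-ins for Peer objects. The lock is held
only long enough to copy the list, so per-peer shutdown work that
itself needs the lock is no longer blocked.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Network/connectedPeers relationship.
final class PeerRegistry {
    private final List<String> connectedPeers = new ArrayList<>();

    void add(String peer) {
        synchronized (connectedPeers) {
            connectedPeers.add(peer);
        }
    }

    void remove(String peer) {
        synchronized (connectedPeers) {
            connectedPeers.remove(peer);
        }
    }

    // Returns a copy, so callers iterate without holding the lock
    List<String> getConnectedPeers() {
        synchronized (connectedPeers) {
            return new ArrayList<>(connectedPeers);
        }
    }

    void shutdown() {
        // Iterate over the snapshot; remove() can take the lock freely
        for (String peer : getConnectedPeers())
            remove(peer);
    }
}
```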
Also made Peer main thread logging more informative when an IOException
occurs, as most situations are harmless EOF or connection reset by peer.
Refactored duplicate code into Transaction.setInitialApprovalStatus().
This is to make sure transactions HAVE a group-approval status
in Synchronizer before Block.isValid is called.
This wasn't a problem for new, unconfirmed, individual transactions
arriving over the wire due to Transaction.importAsUnconfirmed()
doing the right thing.
Also added a groupApprovalTimestamp to BlockChain feature-triggers
to support legacy chains.
Tested by syncing mainnet from scratch.
lastHeight/blockSig/blockTimestamp, etc. moved from PeerData/repository
to Peer as they're transient so there's no need to store them in repository.
Repository now keeps track of when/who added peer, e.g. API/INIT/another peer.
API calls DELETE /peers and DELETE /peers/known now disconnect the peer
as well as deleting it from repository.
Connection timestamp now reported by API call GET /peers.
Some repository-updating code removed from Network/Controller as no
longer needed.
Removed obsolete Controller.hasShorterBlockchain predicate.
Previous version was prone to throwing exceptions during shutdown
sequence, especially in Peer/Network-related threads.
Shutdown now tries to wait for Peer/Network threads to cleanly exit.
API call POST /admin/forcesync now tries to grab blockchain lock,
with timeout, and then repeatedly sync with requested peer until
either fully synced or something unexpected happens.
BlockGenerator will now attempt to generate a new block if none of its
peers have a recent block either (in case of network stall). BlockGenerator
still needs a minimum number of peers before generating though.
Reduce BlockGenerator workload and use of blockchain lock if it
can't generate a block.
Reduce Synchronizer logging output.
Unify calculating timestamp threshold for 'recent' block into Controller.
Don't disconnect peers that fail to send a requested transaction,
as they may no longer have it. e.g. transaction might have expired
or become invalid.
For some other cases, e.g. we have transaction already, move on to
requesting the next transaction instead of giving up on the list.
NOTE: Downloaded update JARs are now expected to have been XORed with 0x5A!
This is to help prevent Windows Firewall from blocking update downloads
based on deep packet inspection.
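XORing every byte with a fixed value is its own inverse, so the same
routine both obfuscates and restores a JAR. A minimal sketch follows;
XorCodec is a hypothetical name and only the 0x5A value comes from this
commit.

```java
// Hypothetical helper, for illustration only.
final class XorCodec {
    // Value from the commit message
    static final byte XOR_VALUE = 0x5A;

    // Returns a new array with every byte XORed with XOR_VALUE.
    // Applying this twice yields the original bytes.
    static byte[] xor(byte[] input) {
        byte[] output = new byte[input.length];

        for (int i = 0; i < input.length; i++)
            output[i] = (byte) (input[i] ^ XOR_VALUE);

        return output;
    }
}
```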
Download read timeout reduced from 5s to 3s.
Download locations reordered so github entries are at the top as they have
better CDNs.
ApplyUpdate now assumes null response from GET /admin/stop means node
is not running.
ApplyUpdate now checks replacement JAR actually exists before attempting
to overwrite previous version.
ApplyUpdate now tries to use Windows EXE launcher in preference to raw
java command line. (This should improve Windows installer behaviour
in detecting running process and possibly firewall implications too).