424 Commits

Author SHA1 Message Date
catbref
63b262a76e NTP and performance changes + fixes.
New NTP class now runs as a simplistic NTP client, repeatedly polling
several NTP servers and maintaining a more accurate time independent
of operating system.

Several occurrences of System.currentTimeMillis() replaced with NTP.getTime(),
particularly where block/transaction/networking is involved.

GET /admin/info now includes "currentTimestamp" as reported from NTP.

Added support for block timestamps determined by generator, instead of
supplied by clock. (BlockChain.newBlockTimestampHeight - not yet activated).
Incorrect timestamps will produce a TIMESTAMP_INCORRECT Block.ValidationResult.

Block.calcMinimumTimestamp repurposed as Block.calcTimestamp for above.

Block timestamps are now allowed to be max 2000ms in the future,
was previously max 500ms.
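
A minimal sketch of the future-timestamp tolerance described above, assuming the check compares a block's timestamp against the node's current (NTP-derived) time and that the 2000ms bound is inclusive. The class, constant, and method names are hypothetical; only the 2000ms figure comes from the commit message.

```java
// Hypothetical helper, not the project's actual block validation code.
final class FutureTimestampCheck {
    // From the commit message: blocks may be at most 2000ms in the future.
    static final long MAX_FUTURE_MS = 2000L;

    // 'now' is passed in so the caller can supply NTP-derived time
    // rather than the operating system clock.
    // Assumption: a block exactly 2000ms ahead is still acceptable.
    static boolean isNotTooFarInFuture(long blockTimestamp, long now) {
        return blockTimestamp <= now + MAX_FUTURE_MS;
    }
}
```

With now = 10000, a block stamped 12000 passes and one stamped 12001 does not.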

Block generation prohibited until initial NTP sync.

Instead of deleting INVALID unconfirmed transactions in BlockGenerator,
Controller now deletes EXPIRED unconfirmed transactions every so often.
This also fixes persistent expired unconfirmed transactions on nodes
that do not generate blocks, as BlockGenerator.deleteInvalidTransactions()
was never reached.

Abbreviated block sigs added to log entries declaring a new block is generated
in BlockGenerator.

Controller checks for NTP sync much faster during start-up and SysTray's
tooltip text starts as "Synchronizing clock" until NTP sync occurs.
After NTP sync, Controller logs NTP offset every so often (currently every 5 mins).

When considering synchronizing, Controller skips peers that have the same block sig
as last time when synchronization resulted in no action, e.g. INFERIOR_CHAIN,
NOTHING_TO_DO and also OK. OK is included as another sync attempt would result in
NOTHING_TO_DO.
Previously this skipping check only happened after prior INFERIOR_CHAIN.
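
The skip rule above can be sketched roughly as follows. This is an illustration only: the class and method names are hypothetical, and NO_REPLY is an invented stand-in for any sync outcome not named in the commit message.

```java
import java.util.Arrays;

// Hypothetical sketch of the "skip peers whose block sig hasn't changed" rule.
final class PeerSkipSketch {
    // OK, NOTHING_TO_DO and INFERIOR_CHAIN appear in the commit message;
    // NO_REPLY is a made-up placeholder for every other outcome.
    enum SyncResult { OK, NOTHING_TO_DO, INFERIOR_CHAIN, NO_REPLY }

    // True for results after which retrying against the same block sig is pointless.
    static boolean makesRetryPointless(SyncResult result) {
        return result == SyncResult.OK
                || result == SyncResult.NOTHING_TO_DO
                || result == SyncResult.INFERIOR_CHAIN;
    }

    // Skip the peer only if its latest block signature is unchanged since the
    // last sync attempt AND that attempt ended in one of the results above.
    static boolean shouldSkip(SyncResult lastResult, byte[] sigAtLastSync, byte[] currentSig) {
        if (lastResult == null || sigAtLastSync == null)
            return false; // never synced with this peer before
        return makesRetryPointless(lastResult) && Arrays.equals(sigAtLastSync, currentSig);
    }
}
```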

During inbound peer handshaking, if we receive a peer ID that matches an existing inbound
peer then send peer ID of all zeros, then close connection.
Remote end should detect this and cleanly close connection instead of waiting for handshake timeout.
Randomly generated peer IDs have lowest bit set to avoid all zeros.
Might need further work.
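
One way the "lowest bit set" trick could look, as a sketch under assumptions: the ID length (128 bytes) and the choice of the last byte as the one holding the "lowest bit" are guesses, and all names are hypothetical.

```java
import java.security.SecureRandom;

// Hypothetical sketch: random peer IDs that can never collide with the
// all-zeros "duplicate peer, please disconnect" marker.
final class PeerIdSketch {
    static final int PEER_ID_LENGTH = 128; // assumption, not taken from the source

    static byte[] randomPeerId(SecureRandom random) {
        byte[] id = new byte[PEER_ID_LENGTH];
        random.nextBytes(id);
        // Force the lowest bit of the last byte to 1 so the ID is never all zeros.
        // Assumption: "lowest bit" refers to the final byte.
        id[PEER_ID_LENGTH - 1] |= 0x01;
        return id;
    }

    // The receiving end can use this to detect the all-zeros marker.
    static boolean isAllZeros(byte[] id) {
        for (byte b : id)
            if (b != 0)
                return false;
        return true;
    }
}
```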

Networking doesn't connect, or accept, until NTP has synced.

Transaction validation can fail with CLOCK_NOT_SYNCED if NTP not synced.
2019-08-13 09:47:44 +01:00
catbref
05e491f65b Relax max clock offsets for block gen and peer connections.
It seems unachievable for nodes to keep their clocks accurate to
within 500ms. It is unclear whether this is due to Windows'
implementation of client NTP or because of terrible network
conditions in China.

So increasing max NTP offset to allow block generation from
500ms to 30s. Correspondingly increasing max peer timestamp
delta from 5s to 30s.

The block consensus algorithm will need to change in the near
future to address clock issues.
2019-08-13 09:18:30 +01:00
catbref
63036f3592 Increased NTP check interval from 5 minutes to 10 minutes.
Controller batches SysTray updates into one per second.

Block generation only allowed by Controller if clock is
known to be accurate. ('Not sure' stops generation.)

NTP MAX_STDDEV increased from 25ms to 125ms to cater
for poorly connected nodes.

Controller sends peers list over outbound peer connections,
requests peers list from inbound peer connections.

When peer handshake completes, Network & Controller only send
initial messages over outbound peer connections.
This is to fix HEIGHT_V2 messages being processed out-of-order
breaking handshaking, as indicated by log entries like:

2019-07-26 16:16:35 DEBUG Network:702 - Unexpected HEIGHT_V2 message from xx.xxx.xx.xx:pppp, expected PROOF
2019-07-26 16:16:35 DEBUG Network:840 - Handshake completed with peer xx.xxx.xx.xx:pppp

Increased connection failure backoff from 1 minute to 5 minutes,
as handshake timeout is 1 minute and then nodes would immediately reconnect.

Changed default NTP servers from asia to cn.
2019-08-13 09:17:38 +01:00
catbref
33f3d35784 Fix DB backups not happening when no previous backup exists. 2019-08-13 09:15:30 +01:00
catbref
671dc5995a Don't allow block generation unless system clock is accurate.
Controller performs NTP check on startup (and every 5 minutes)
which determines whether block generation is allowed.

System Tray tooltip updated to reflect generating status.
Plus new translations.

Improved GuiTests.

BlockGenerator fetches forging accounts first, and sleeps
if none configured, which is less work than processing peer lists.
2019-08-13 09:15:21 +01:00
catbref
73e53120a9 Improved detection of inaccurate system clock & nagging.
Now uses several NTP servers to determine mean offset from
system clock to internet time.

If abs(offset) > 500ms or NTP service not running then
user is 'nagged' via system tray pop-up notification
with instructions on how to fix.
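
A rough sketch of the nag decision described above, assuming offsets have already been collected (in milliseconds) from several NTP servers. The names are hypothetical, and how the running state of the time service is detected is left to the caller.

```java
// Hypothetical sketch of the "nag if clock is off" decision.
final class ClockNagSketch {
    // From the commit message: nag when abs(offset) > 500ms.
    static final long MAX_OFFSET_MS = 500L;

    // Mean of the per-server offsets between system clock and internet time.
    static double meanOffsetMs(long[] offsetsMs) {
        if (offsetsMs.length == 0)
            throw new IllegalArgumentException("no NTP samples");
        long sum = 0;
        for (long offset : offsetsMs)
            sum += offset;
        return (double) sum / offsetsMs.length;
    }

    // Nag if the time service isn't running, or the mean offset is too large.
    static boolean shouldNag(long[] offsetsMs, boolean timeServiceRunning) {
        if (!timeServiceRunning)
            return true;
        return Math.abs(meanOffsetMs(offsetsMs)) > MAX_OFFSET_MS;
    }
}
```

Offsets of 400ms and 600ms average to exactly 500ms, which does not exceed the threshold, so no nag is shown in that case.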

Also improved system tray translations!
2019-08-13 09:14:07 +01:00
catbref
0c17f9cff6 More useful Synchronizer logging + sync report tool.
Synchronizer logging now includes abbreviated block signature.
2019-08-13 09:08:57 +01:00
catbref
7042dd819f Include NTP checking/reconfigure tools + bump version to 1.3.1
SysTray pop-up menu now includes entry for launching https://time.is
so node owners can check their system clocks against internet time.

Windows installs also have additional systray menu entry which
runs ntpcfg.bat script, included in resources.
Also available as download via node-UI servlet,
e.g. http://localhost:9880/downloads/ntpcfg.bat

ntpcfg.bat reconfigures Windows Time Service with many NTP servers,
restarts the service, and also makes sure it auto-starts on boot.

Added DEBUG-level logging when rejecting nodes due to excessive
time difference (during PROOF handshake stage).

Bumped default settings values for minOutboundPeers from 10 to 20.
Bumped default settings values for maxPeers from 30 to 50.
2019-08-13 09:03:29 +01:00
catbref
9ee12f3e45 Reduce execute-produce-consume excessive thread spawning.
Defer the clearing of hasThreadPending flag until about to produce a task,
inside synchronized block.

This gives a new thread a chance to produce at least once before other threads
decide to spawn new threads.

Previously there could be an excessive number of unnecessary threads,
all waiting for their initial attempt to produce a task.
2019-08-13 08:41:17 +01:00
catbref
b038e10ee7 Prevent multiple system tray icons
Added "synchronized" to SysTray.getInstance.
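
For illustration, a lazily initialised singleton with a synchronized accessor looks roughly like this. The class here is a stand-in named SysTraySketch; the real SysTray class is not shown in this log.

```java
// Hypothetical stand-in for a lazily created, single-instance class.
final class SysTraySketch {
    private static SysTraySketch instance;

    private SysTraySketch() {
        // A real implementation would create the tray icon here.
    }

    // 'synchronized' ensures two threads calling this at the same time
    // cannot both see 'instance == null' and create two instances.
    static synchronized SysTraySketch getInstance() {
        if (instance == null)
            instance = new SysTraySketch();
        return instance;
    }
}
```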

Also log launching of system tray icon.
2019-08-13 08:41:05 +01:00
catbref
964e0a02ca Fixes/improvements to networking
Reworked networking execute-produce-consume threading.
Some networking tasks were wrongly performed during 'produce' phase,
and some producing was happening in 'consume' phase (also corrected).

Peer connection tasks are rate-limited to 1 per second to reduce CPU thrashing.
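
A minimal sketch of a "once per interval" gate that could enforce such a limit. This is not the project's code: the class name is hypothetical, and the current time is passed in explicitly to keep the example deterministic.

```java
// Hypothetical gate allowing at most one task per interval.
final class OncePerIntervalSketch {
    private final long minIntervalMs;
    private long lastRunMs;
    private boolean hasRun = false;

    OncePerIntervalSketch(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    // Returns true if the caller may run a task now, false if it is too soon.
    synchronized boolean tryAcquire(long nowMs) {
        if (hasRun && nowMs - lastRunMs < minIntervalMs)
            return false;
        hasRun = true;
        lastRunMs = nowMs;
        return true;
    }
}
```

A producer would call tryAcquire(System.currentTimeMillis()) with an interval of 1000 and only hand out a new peer connection task when it returns true.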

Show P2P listen port in logs on startup.

Tests for general purpose ExecuteProduceConsume class to cover both
random task scenario and mass-ping scenario.
2019-08-13 08:40:50 +01:00
catbref
f8b496ff3c Convert SysTray to use JPopupMenu for Unicode support
Correct Systray_zh.properties to ISO 8859-1 instead of UTF-8.

Added SysTray test to GuiTests
2019-08-13 08:30:40 +01:00
catbref
6942c02700 Add ZH translations for SysTray pop-up menu.
Reduce log spam when SysTray can't open Node UI in browser,
e.g. no browser installed, or no association for URLs.
2019-08-13 08:29:16 +01:00
catbref
0d85a60c54 New network threading model
Instead of 3 threads per peer:

1. peer main thread
2. peer's unsolicited messages processor
3. peer pinger

We now use a Jetty-style Execute-Produce-Consume server threading model.

For 60 connected peers, we no longer have 180 threads but typically only
the usual ~6 threads.

Also in this commit:

* merging peers locking changed from lock() to tryLock()

* PROOF handshake maximum time difference increased from 2000ms to 5000ms

* Peers still handshaking after 60s are considered stuck and hence disconnected

* We now use NIO SocketChannels instead of raw sockets
2019-08-13 08:28:46 +01:00
catbref
67c245bb9d Don't attempt to synchronize with peers that we know have inferior chain 2019-08-13 08:28:08 +01:00
catbref
82910b6524 Add periodic system-tray pop-up if Windows Time service not running 2019-08-13 08:27:43 +01:00
catbref
9c1ca8de04 Fix HSQLDB backups.
Change AutoUpdate to use 'quick' backup.

For HSQLDBRepository.backup, don't perform CHECKPOINT DEFRAG while repository in use
as it never completes.

Similarly, use BACKUP DATABASE ... NOT BLOCKING.
2019-08-13 08:27:07 +01:00
catbref
8109214087 Reduce misbehaviour cooloff from 60min to 10min 2019-08-13 08:25:06 +01:00
catbref
b9737372d9 Use default testnet/mainnet listen port as appropriate when peer addresses are given without port 2019-08-13 08:22:46 +01:00
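A sketch of how a port-less peer address might fall back to a default port. Names are hypothetical, and the parsing is deliberately simple: it assumes "host" or "host:port" and does not handle unbracketed IPv6 literals.

```java
import java.net.InetSocketAddress;

// Hypothetical parser for peer addresses given as "host" or "host:port".
final class PeerAddressSketch {
    // defaultPort is chosen by the caller, e.g. the testnet or mainnet P2P port.
    static InetSocketAddress parse(String address, int defaultPort) {
        int colon = address.lastIndexOf(':');
        if (colon < 0)
            // No port given: fall back to the network's default listen port.
            return InetSocketAddress.createUnresolved(address, defaultPort);

        String host = address.substring(0, colon);
        int port = Integer.parseInt(address.substring(colon + 1));
        return InetSocketAddress.createUnresolved(host, port);
    }
}
```

For a testnet node the caller would pass 9989 as defaultPort, the testnet P2P port named in commit a154a7c073.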
catbref
a154a7c073 Change default testnet ports to 9989 (P2P), 9988 (API) and 9980 (node UI) 2019-08-06 11:20:08 +01:00
catbref
21e64d0c8b Increase logging for ApplyUpdate to help debug issues 2019-08-06 11:18:15 +01:00
catbref
0da21356c7 Repository backup/recovery
Controller requests 'quick' repository backup every 123 minutes.

On start-up, if repository fails to load, recovery is attempted using
backup (if present).

AutoUpdate also requests 'slow' repository backup just before
calling ApplyUpdate. ('Slow' means perform "CHECKPOINT DEFRAG" first).
2019-08-06 11:17:49 +01:00
catbref
fdf35bba74 Commented-out commons-net in pom.xml as no longer needed 2019-08-06 11:17:08 +01:00
catbref
7eb5cd55ff Faster shutdown for Peers doing PROOF part of handshake 2019-08-06 11:16:45 +01:00
catbref
2f2d9a664d Replaced all NTP.getTime with System.currentTimeMillis. Clocks handled by O/S 2019-08-06 11:12:29 +01:00
catbref
4d265b8acb Don't log API exceptions or errors if Settings.isApiLoggingEnabled is false 2019-08-06 11:08:15 +01:00
catbref
a3e6c24a89 Minor performance improvement with lambda-based logging after initial profiling run 2019-08-06 11:07:55 +01:00
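To illustrate why lambda-based logging helps, here is a sketch using java.util.logging as a stand-in, since the logging framework is not named in this log. The message is built only if the level is enabled, so expensive string construction is skipped otherwise. All names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demonstration of deferred (lambda-based) log message building.
final class LazyLoggingSketch {
    private static final Logger LOGGER = Logger.getLogger(LazyLoggingSketch.class.getName());

    // Counts how often the expensive message was actually built.
    static int expensiveCalls = 0;

    static String expensiveDescription() {
        expensiveCalls++;
        return "state@" + System.nanoTime();
    }

    static void setLevel(Level level) {
        LOGGER.setLevel(level);
    }

    static void logState() {
        // The Supplier is only invoked if FINE is loggable for this logger.
        LOGGER.fine(() -> "Peer state: " + expensiveDescription());
    }
}
```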
catbref
7b51b1e88d Improve Network.shutdown() and logging in Peer.
Network.shutdown() called Peer.shutdown() on each Peer
while holding synchronization lock on this.connectedPeers.

This would cause a problem during Peer.shutdown() as some
other reachable code would also want synchronized access
to this.connectedPeers. Typical symptoms would be log
entries like:

2019-07-02 11:13:05 DEBUG Peer:512 - Message processor for peer 192.144.182.73:9889 failed to terminate

Eventually Network.shutdown() would exit, releasing synchronization
lock and awaking stuck Peer threads, which could then try to access
repository (now closed) causing further log spam.

Now it uses Network.getConnectedPeers to return duplicated
List<Peer>, minimizing lock time on this.connectedPeers.
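
The copy-then-iterate approach can be sketched as below. This is an illustration with hypothetical class and interface names, not the actual Network or Peer code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: hold the lock only long enough to copy the peer list.
final class NetworkShutdownSketch {
    interface Peer {
        void shutdown();
    }

    private final List<Peer> connectedPeers = new ArrayList<>();

    void addPeer(Peer peer) {
        synchronized (connectedPeers) {
            connectedPeers.add(peer);
        }
    }

    // Returns a snapshot; callers can iterate it without holding the lock.
    List<Peer> getConnectedPeers() {
        synchronized (connectedPeers) {
            return new ArrayList<>(connectedPeers);
        }
    }

    void shutdown() {
        // The lock is NOT held while peers shut down, so peer threads that
        // need synchronized access to connectedPeers are not blocked.
        for (Peer peer : getConnectedPeers())
            peer.shutdown();
    }
}
```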

Also made Peer main thread logging more informative when an IOException
occurs, as most situations are harmless EOF or connection reset by peer.
2019-08-06 11:07:21 +01:00
catbref
192a072796 Unify setting Transaction's initial group-approval status.
Refactored duplicate code into Transaction.setInitialApprovalStatus().

This is to make sure transactions HAVE a group-approval status
in Synchronizer before Block.isValid is called.

This wasn't a problem for new, unconfirmed, individual transactions
arriving over the wire due to Transaction.importAsUnconfirmed()
doing the right thing.

Also added a groupApprovalTimestamp to BlockChain feature-triggers
to support legacy chains.

Tested by syncing mainnet from scratch.
2019-08-06 11:05:36 +01:00
catbref
ad827ae01d Fix Controller's processing of HEIGHT_V2 to update all peers with same ID 2019-08-06 10:59:13 +01:00
catbref
bdfbea3a53 Remove OSGi/felix from pom.xml 2019-08-06 10:57:29 +01:00
catbref
976ea97af1 Improve pruning of old peers 2019-08-06 10:57:02 +01:00
catbref
9435e9576a Move getRewardByHeight from Block to BlockChain 2019-08-06 10:53:13 +01:00
catbref
f45cedb6ff Save lastConnected for outbound peers on completed handshake 2019-08-06 09:49:44 +01:00
catbref
840f52ff90 Added API call GET /blocks/timestamp/{timestamp} and increase timeout on POST /admin/forcesync from 5s to 30s 2019-08-06 09:47:38 +01:00
catbref
2621e04025 Change Peer ping interval to below that of SOCKET_TIMEOUT, used by blocking read()s 2019-08-06 09:47:12 +01:00
catbref
e47b4dceb2 StringBuilder, and other, optimizations for repository-related classes.
Add optional "excludeZero" to API call GET /assets/balances

Added tests to call most API calls to check no exceptions are thrown.
2019-08-06 09:46:31 +01:00
catbref
48eae0cb38 Minor change to test account assertion message. 2019-08-06 09:41:52 +01:00
catbref
99ffd62a6e Change to how "best block" is determined. 2019-08-02 15:29:48 +01:00
catbref
c559c16a4a Add extra isInterrupted() quick-exits to Controller during TRANSACTION_SIGNATURES processing. 2019-08-02 15:11:04 +01:00
catbref
fc82bcaf49 Tidying up peer info.
lastHeight/blockSig/blockTimestamp, etc. moved from PeerData/repository
to Peer as it's transient so no need to store in repository.

Repository now keeps track of when/who added peer, e.g. API/INIT/another peer.

API calls DELETE /peers and DELETE /peers/known also disconnect peer
as well as deleting from repository.

Connection timestamp now reported by API call GET /peers

Some repository-updating code removed from Network/Controller as no
longer needed.

Removed obsolete Controller.hasShorterBlockchain predicate.
2019-08-02 14:56:19 +01:00
catbref
8b135eb447 Improve startup & shutdown. Improve API call POST /admin/forcesync.
Previous version was prone to throwing exceptions during shutdown
sequence, especially in Peer/Network-related threads.

Shutdown now tries to wait for Peer/Network threads to cleanly exit.

API call POST /admin/forcesync now tries to grab blockchain lock,
with timeout, and then repeatedly sync with requested peer until
either fully synced or something unexpected happens.
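
The forcesync flow above might be sketched like this, assuming that OK means "progress was made, more may remain" and NOTHING_TO_DO means "fully synced". The class name, the extra enum values NO_BLOCKCHAIN_LOCK, INTERRUPTED and NO_REPLY, and the Supplier-based sync step are all hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch of "grab lock with timeout, then sync repeatedly".
final class ForceSyncSketch {
    // OK and NOTHING_TO_DO appear elsewhere in this log; the rest are invented.
    enum SyncResult { OK, NOTHING_TO_DO, NO_BLOCKCHAIN_LOCK, INTERRUPTED, NO_REPLY }

    static SyncResult forceSync(ReentrantLock blockchainLock, long timeoutMs,
                                Supplier<SyncResult> syncOnce) {
        try {
            // Give up if the blockchain lock can't be obtained in time.
            if (!blockchainLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS))
                return SyncResult.NO_BLOCKCHAIN_LOCK;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return SyncResult.INTERRUPTED;
        }

        try {
            SyncResult result;
            do {
                result = syncOnce.get();
            } while (result == SyncResult.OK); // keep going while progress is made
            return result; // NOTHING_TO_DO, or something unexpected
        } finally {
            blockchainLock.unlock();
        }
    }
}
```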
2019-08-02 14:54:22 +01:00
catbref
84f5935d38 Improve shutdown of Network.
Detect interrupt during peer connect so we can return without trying
to set up new peer.

Do join() after interrupt() in Network.shutdown.
2019-08-02 14:52:46 +01:00
catbref
c68e0eb6ea Change auto-update download locations & logging message 2019-08-02 14:50:17 +01:00
catbref
5b70f0004d Disregard misbehaved peers before counting to see if we can synchronize.
Also added simple system-tray tooltip update to show number of peers and current height.
2019-08-02 14:47:40 +01:00
catbref
915eebb8e5 BlockChain.isTestNet now BlockChain.isTestChain.
Added Settings.isTestNet.

Disabled ArbitraryDataManager for now.
2019-08-02 14:42:10 +01:00
catbref
1d81c4db6b BlockGen / Synchronizer improvements
BlockGenerator will now attempt to generate a new block if none of its
peers have a recent block either (in case of network stall). BlockGenerator
still needs a minimum number of peers before generating though.

Reduce BlockGenerator workload and use of blockchain lock if it
can't generate a block.

Reduce Synchronizer logging output.

Unify calculating timestamp threshold for 'recent' block into Controller.
2019-08-02 14:33:41 +01:00
catbref
5acc92ef26 Improve TRANSACTION_SIGNATURES handling in Controller.
Don't disconnect peers that fail to send a requested transaction,
as they may no longer have it. e.g. transaction might have expired
or become invalid.

For some other cases, e.g. we have transaction already, move on to
requesting the next transaction instead of giving up on the list.
2019-08-02 14:16:18 +01:00
catbref
b48f671774 In AutoUpdate, pass download buffer to SHA256 digester BEFORE deXORing. 2019-08-02 14:15:56 +01:00
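The ordering fix can be illustrated with a sketch like the following, assuming a single-byte XOR obfuscation. The XOR value and all names are hypothetical; the point is only that the digest sees the bytes as downloaded, before they are de-XORed.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: hash the downloaded bytes first, then de-XOR them.
final class XorDownloadSketch {
    static final byte XOR_VALUE = 0x5a; // made-up value for illustration

    static MessageDigest newSha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Processes one downloaded chunk in place.
    static void processChunk(byte[] buffer, int length, MessageDigest sha256) {
        // Hash the bytes exactly as downloaded (still XORed)...
        sha256.update(buffer, 0, length);
        // ...and only then recover the original content.
        for (int i = 0; i < length; i++)
            buffer[i] ^= XOR_VALUE;
    }
}
```

With this ordering, the expected hash is that of the obfuscated file as published, not of the decoded content.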
catbref
8727780b77 Added XorUpdate utility to help prepare auto-updates. 2019-08-02 14:15:38 +01:00