Commit Graph

385 Commits

Author SHA1 Message Date
catbref
2889d04633 Fix incorrect orphaning of BUY_NAME transactions.
Orphaning a BUY_NAME transaction would not reinstate the
sale price. Sale price could be nullified by (e.g.) orphaning
a SELL_NAME transaction.

Also added test case to cover above and other test-related support,
e.g. test-mode in NTP.
2019-08-16 11:48:30 +01:00
catbref
f83dc26ae0 ExecuteProduceConsume and networking improvements
Added "volatile" to more fields, for thread-safety on reads.
Changes to field values are done inside synchronized blocks
so no need for AtomicInteger/AtomicBoolean. (Could be changed
in the future to show intention/readability though).

Added more statistics (tasks produced/consumed).

Limited Network's EPC executor to 10 threads max.
2019-08-16 08:28:26 +01:00
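The volatile-reads, synchronized-writes pattern described in f83dc26ae0 can be sketched roughly as follows. This is a minimal illustration, not the project's code: the class name TaskStats, its field names and the runDemo helper are all hypothetical.

```java
/** Hypothetical statistics holder: volatile for lock-free reads, synchronized for writes. */
class TaskStats {
    // volatile makes the latest written value visible to readers that do not take the lock
    private volatile int tasksProduced = 0;
    private volatile int tasksConsumed = 0;

    // Increments are read-modify-write, so writers still need mutual exclusion
    synchronized void onProduced() { tasksProduced++; }
    synchronized void onConsumed() { tasksConsumed++; }

    // Readers do not synchronize; volatile alone is enough for a plain read
    int getTasksProduced() { return tasksProduced; }
    int getTasksConsumed() { return tasksConsumed; }

    /** Runs several writer threads and returns the resulting statistics. */
    static TaskStats runDemo(int threads, int perThread) {
        TaskStats stats = new TaskStats();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    stats.onProduced();
                    stats.onConsumed();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        TaskStats stats = runDemo(4, 1000);
        System.out.println(stats.getTasksProduced() + " produced, " + stats.getTasksConsumed() + " consumed");
    }
}
```

An AtomicInteger would express the same intent with less ceremony, which is the readability trade-off the commit message mentions.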
catbref
c597f11c37 Fix fetching asset-related transactions from DB where asset might not exist.
Some asset-related transactions (CREATE_ASSET_ORDER and TRANSFER_ASSET)
also try to fetch asset names at the same time.

If one of these transactions, and a corresponding ISSUE_ASSET transaction,
have been orphaned then it's possible that the asset no longer exists.

Thus the SQL "JOIN" fails during transaction retrieval, causing an error.

Changing the table-join to "LEFT OUTER JOIN" makes the asset name aspect
optional. Repercussions might be nameless assets when fetching transaction
info via API.
2019-08-16 08:19:12 +01:00
catbref
3b3888ae0d Fix asset ordering for old-pricing orders 2019-08-16 08:18:34 +01:00
catbref
6abc3f4d39 Networking improvements
Fix issue where sometimes the channelSelector.select(1000) would
block processing of queued messages.

Improve support with older v1 peers.
2019-08-14 15:15:20 +01:00
catbref
ea3528015a Add support for v1-protocol BLOCK message 2019-08-14 15:14:47 +01:00
catbref
84e812484b Fix GenesisBlock ISSUE_ASSET transactions for v1 chains 2019-08-14 15:14:28 +01:00
catbref
e631e69fa1 Use bindAddress from Settings for UI, API and P2P. 2019-08-14 15:11:20 +01:00
catbref
8a0d93f304 Fix up after epic git history rebuild due to LFS issue 2019-08-13 10:27:28 +01:00
catbref
fa0b7615a6 Fixing peer disconnections due to slow processing & Transaction expiry
Transaction expiry wasn't happening. Use NTP.getTime to check whether
transactions have expired. Also reject expired transactions when trying
to add them to unconfirmed pool.

Sometimes producing a task took way too long, causing massive
spikes in the number of threads and peer disconnections.

This was down to repository pool exhaustion, which made
RepositoryManager.getRepository() block (for up to 5 minutes).

The key method at fault was Network.getConnectablePeer().

Various fixes:

NetworkProcessor's executor now reaps old threads after only 10 seconds
instead of the usual 60 seconds.

Change logging in Network to help diagnose disconnection and repository
issues.

RepositoryManager now has a tryRepository() call that is non-blocking
and returns null if repository pool is exhausted.

Repository pool size increased from default (10) to 100.

Pruning peers is now opportunistic, using tryRepository(), and returns
early if repository pool is exhausted.

getConnectablePeer() is now opportunistic, using tryRepository(), and
returns null (no peer candidate for connection) if repository pool
is exhausted.

Merging peers is now opportunistic, using tryRepository().

Peer ping interval increased from 8s to 20s.

HSQLDBRepositoryFactory now logs when getConnection() takes over 1000ms.

Added more trace-level logging to ExecuteProduceConsume to
highlight slow produceTask() calls.
2019-08-13 09:54:35 +01:00
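A non-blocking tryRepository() of the kind described in fa0b7615a6 might look like the sketch below. It assumes a pool guarded by a counting semaphore; the class RepositoryPool, the nested Handle type and the pool size used in main are illustrative assumptions, and the real RepositoryManager may be built quite differently.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical bounded pool with both a blocking and a non-blocking acquire. */
class RepositoryPool {
    private final Semaphore permits;

    RepositoryPool(int size) {
        this.permits = new Semaphore(size);
    }

    /** Blocking acquire: waits until a pooled slot is free. */
    Handle getRepository() throws InterruptedException {
        permits.acquire();
        return new Handle(this);
    }

    /** Non-blocking acquire: returns null immediately if the pool is exhausted. */
    Handle tryRepository() {
        return permits.tryAcquire() ? new Handle(this) : null;
    }

    /** Stand-in for a repository session; closing it returns the slot to the pool. */
    static class Handle implements AutoCloseable {
        private final RepositoryPool pool;
        private boolean closed = false;

        Handle(RepositoryPool pool) {
            this.pool = pool;
        }

        @Override
        public synchronized void close() {
            if (!closed) {
                closed = true;
                pool.permits.release();
            }
        }
    }

    public static void main(String[] args) {
        RepositoryPool pool = new RepositoryPool(1);
        Handle first = pool.tryRepository();
        Handle second = pool.tryRepository(); // pool exhausted, so this is null
        System.out.println("second acquired: " + (second != null));
        first.close();
    }
}
```

An opportunistic caller checks for null and simply returns early, which is how the commit describes peer pruning and getConnectablePeer() behaving.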
catbref
1094db288e Catch SystemTray.isSupported errors (and act as if no support).
Symptoms were on Ubuntu with 8u222:

Exception in thread "Controller" java.awt.AWTError: Assistive Technology not found: org.GNOME.Accessibility.AtkWrapper
        at java.awt.Toolkit.loadAssistiveTechnologies(Toolkit.java:807)
        at java.awt.Toolkit.getDefaultToolkit(Toolkit.java:886)
        at java.awt.SystemTray.isSupported(SystemTray.java:219)
        at org.qora.gui.SysTray.<init>(SysTray.java:50)
        at org.qora.gui.SysTray.getInstance(SysTray.java:249)
        at org.qora.controller.Controller.updateSysTray(Controller.java:480)
        at org.qora.controller.Controller.run(Controller.java:368)

(Underlying cause not known)

This would cause Controller thread to silently exit, and so no
synchronization would occur. Node would still have connections
and thus happily generate its own blocks on its own fork.

Maybe other peers would try to sync with this node but would
likely reject this node's chain after a short while.
2019-08-13 09:54:25 +01:00
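The defensive check from 1094db288e can be sketched as a wrapper that treats any failure of the support probe as "no support". To keep the sketch free of AWT dependencies the probe is passed in as a BooleanSupplier; in a desktop application it would presumably be java.awt.SystemTray::isSupported. The class and method names are hypothetical, and catching Error this broadly is an assumption (a narrower catch of java.awt.AWTError may be preferable).

```java
import java.util.function.BooleanSupplier;

/** Hypothetical guard around a system-tray support probe. */
class TraySupport {
    /**
     * Returns the probe's answer, or false if the probe itself fails.
     * java.awt.AWTError extends Error, so a catch of Exception alone would miss it.
     */
    static boolean isTraySupported(BooleanSupplier probe) {
        try {
            return probe.getAsBoolean();
        } catch (Error | RuntimeException e) {
            // Act as if there is no tray support, so the calling thread keeps running
            System.err.println("Tray support check failed: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulates the failure from the stack trace without needing a desktop environment
        boolean supported = isTraySupported(() -> {
            throw new Error("Assistive Technology not found (simulated)");
        });
        System.out.println("supported: " + supported);
    }
}
```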
catbref
63b262a76e NTP and performance changes + fixes.
New NTP class now runs as a simplistic NTP client, repeatedly polling
several NTP servers and maintaining a more accurate time independent
of operating system.

Several occurrences of System.currentTimeMillis() replaced with NTP.getTime()
particularly where block/transaction/networking is involved.

GET /admin/info now includes "currentTimestamp" as reported from NTP.

Added support for block timestamps determined by generator, instead of
supplied by clock. (BlockChain.newBlockTimestampHeight - not yet activated).
Incorrect timestamps will produce a TIMESTAMP_INCORRECT Block.ValidationResult.

Block.calcMinimumTimestamp repurposed as Block.calcTimestamp for above.

Block timestamps are now allowed to be max 2000ms in the future,
was previously max 500ms.

Block generation prohibited until initial NTP sync.

Instead of deleting INVALID unconfirmed transactions in BlockGenerator,
Controller now deletes EXPIRED unconfirmed transactions every so often.
This also fixes persistent expired unconfirmed transactions on nodes
that do not generate blocks, as BlockGenerator.deleteInvalidTransactions()
was never reached.

Abbreviated block sigs added to log entries declaring a new block is generated
in BlockGenerator.

Controller checks for NTP sync much faster during start-up and SysTray's
tooltip text starts as "Synchronizing clock" until NTP sync occurs.
After NTP sync, Controller logs NTP offset every so often (currently every 5 mins).

When considering synchronizing, Controller skips peers that have the same block sig
as last time when synchronization resulted in no action, e.g. INFERIOR_CHAIN,
NOTHING_TO_DO and also OK. OK is included as another sync attempt would result in
NOTHING_TO_DO.
Previously this skipping check only happened after prior INFERIOR_CHAIN.

During inbound peer handshaking, if we receive a peer ID that matches an existing inbound
peer then send peer ID of all zeros, then close connection.
Remote end should detect this and cleanly close connection instead of waiting for handshake timeout.
Randomly generated peer IDs have lowest bit set to avoid all zeros.
Might need further work.

Networking doesn't connect, or accept, until NTP has synced.

Transaction validation can fail with CLOCK_NOT_SYNCED if NTP not synced.
2019-08-13 09:47:44 +01:00
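The peer ID convention mentioned in 63b262a76e (random IDs always have their lowest bit set, so an all-zero ID can serve as a "duplicate connection" signal) can be sketched as below. The ID length of 128 bytes and the choice of the last byte as the one carrying the "lowest bit" are assumptions made for illustration only.

```java
import java.security.SecureRandom;

/** Hypothetical peer ID helper. */
class PeerIds {
    // Assumed length; the commit message does not state the real one
    static final int PEER_ID_LENGTH = 128;

    /** Generates a random ID that can never be all zeros. */
    static byte[] generatePeerId(SecureRandom random) {
        byte[] id = new byte[PEER_ID_LENGTH];
        random.nextBytes(id);
        // Force the lowest bit of the final byte on (assumed bit position)
        id[id.length - 1] |= 0x01;
        return id;
    }

    /** True if the ID is the reserved all-zero value. */
    static boolean isAllZeros(byte[] id) {
        for (byte b : id) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] id = generatePeerId(new SecureRandom());
        System.out.println("all zeros: " + isAllZeros(id));
    }
}
```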
catbref
05e491f65b Relax max clock offsets for block gen and peer connections.
It seems unachievable for nodes to keep their clocks accurate to
within 500ms. It is unclear whether this is due to Windows'
implementation of client NTP or because of terrible network
conditions in China.

So increasing max NTP offset to allow block generation from
500ms to 30s. Correspondingly increasing max peer timestamp
delta from 5s to 30s.

The block consensus algorithm will need to change in the near
future to address clock issues.
2019-08-13 09:18:30 +01:00
catbref
63036f3592 Increased NTP check interval from 5 minutes to 10 minutes.
Controller batches SysTray updates into one per second.

Block generation only allowed by Controller if clock is
known to be accurate. ('not sure' stops generation).

NTP MAX_STDDEV increased from 25ms to 125ms to cater
for poorly connected nodes.

Controller sends peers list over outbound peer connections,
requests peers list from inbound peer connections.

When peer handshake completes, Network & Controller only send
initial messages over outbound peer connections.
This is to fix HEIGHT_V2 messages being processed out-of-order
breaking handshaking, as indicated by log entries like:

2019-07-26 16:16:35 DEBUG Network:702 - Unexpected HEIGHT_V2 message from xx.xxx.xx.xx:pppp, expected PROOF
2019-07-26 16:16:35 DEBUG Network:840 - Handshake completed with peer xx.xxx.xx.xx:pppp

Increased connection failure backoff from 1 minute to 5 minutes,
as handshake timeout is 1 minute and then nodes would immediately reconnect.

Changed default NTP servers from asia to cn.
2019-08-13 09:17:38 +01:00
catbref
33f3d35784 Fix DB backups not happening when no previous backup exists. 2019-08-13 09:15:30 +01:00
catbref
671dc5995a Don't allow block generation unless system clock is accurate.
Controller performs NTP check on startup (and every 5 minutes)
which determines whether block generation is allowed.

System Tray tooltip updated to reflect generating status.
Plus new translations.

Improved GuiTests.

BlockGenerator fetches forging accounts first, and sleeps
if none configured, which is less work than processing peer lists.
2019-08-13 09:15:21 +01:00
catbref
73e53120a9 Improved detection of inaccurate system clock & nagging.
Now uses several NTP servers to determine mean offset from
system clock to internet time.

If abs(offset) > 500ms or NTP service not running then
user is 'nagged' via system tray pop-up notification
with instructions on how to fix.

Also improved system tray translations!
2019-08-13 09:14:07 +01:00
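The nagging rule from 73e53120a9 (mean offset across several NTP servers, nag if abs(offset) > 500ms or the NTP service is not running) can be sketched as follows. The offsets are taken as already-measured inputs; how they are obtained from the servers is outside this sketch, and the class and method names are hypothetical.

```java
/** Hypothetical clock-accuracy check based on per-server NTP offsets in milliseconds. */
class ClockCheck {
    // Threshold taken from the commit message
    static final long MAX_OFFSET_MS = 500;

    /** Arithmetic mean of the measured offsets. */
    static double meanOffset(long[] offsetsMs) {
        if (offsetsMs.length == 0) {
            throw new IllegalArgumentException("no offsets measured");
        }
        long sum = 0;
        for (long offset : offsetsMs) {
            sum += offset;
        }
        return (double) sum / offsetsMs.length;
    }

    /** Nag if the NTP service is not running or the mean offset is too large. */
    static boolean shouldNag(long[] offsetsMs, boolean ntpServiceRunning) {
        return !ntpServiceRunning || Math.abs(meanOffset(offsetsMs)) > MAX_OFFSET_MS;
    }

    public static void main(String[] args) {
        long[] offsets = {100, -50, 200};
        System.out.println("mean offset: " + meanOffset(offsets) + "ms, nag: " + shouldNag(offsets, true));
    }
}
```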
catbref
0c17f9cff6 More useful Synchronizer logging + sync report tool.
Synchronizer logging now includes abbreviated block signature.
2019-08-13 09:08:57 +01:00
catbref
7042dd819f Include NTP checking/reconfigure tools + bump version to 1.3.1
SysTray pop-up menu now includes entry for launching https://time.is
so node owners can check their system clocks against internet time.

Windows installs also have additional systray menu entry which
runs ntpcfg.bat script, included in resources.
Also available as download via node-UI servlet,
e.g. http://localhost:9880/downloads/ntpcfg.bat

ntpcfg.bat reconfigures Windows Time Service with many NTP servers,
restarts the service, and also makes sure it auto-starts on boot.

Added DEBUG-level logging when rejecting nodes due to excessive
time difference (during PROOF handshake stage).

Bumped default settings values for minOutboundPeers from 10 to 20.
Bumped default settings values for maxPeers from 30 to 50.
2019-08-13 09:03:29 +01:00
catbref
9ee12f3e45 Reduce execute-produce-consume excessive thread spawning.
Defer the clearing of hasThreadPending flag until about to produce a task,
inside synchronized block.

This gives a new thread a chance to produce at least once before other threads
decide to spawn new threads.

Previously there could be an excessive number of unnecessary threads,
all waiting for their initial attempt to produce a task.
2019-08-13 08:41:17 +01:00
catbref
b038e10ee7 Prevent multiple system tray icons
Added "synchronized" to SysTray.getInstance.

Also log launching of system tray icon.
2019-08-13 08:41:05 +01:00
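The fix in b038e10ee7 amounts to making lazy singleton creation atomic. A minimal sketch, using a hypothetical TrayIconHolder class in place of the real SysTray:

```java
/** Hypothetical lazily-created singleton; stands in for a system tray wrapper. */
class TrayIconHolder {
    private static TrayIconHolder instance;
    private static int constructions = 0;

    private TrayIconHolder() {
        // Only ever called from inside the synchronized getInstance()
        constructions++;
    }

    // Without "synchronized", two threads could both see instance == null
    // and each construct their own object, i.e. two tray icons
    static synchronized TrayIconHolder getInstance() {
        if (instance == null) {
            instance = new TrayIconHolder();
        }
        return instance;
    }

    static synchronized int getConstructions() {
        return constructions;
    }

    /** Calls getInstance() from several threads and reports how many objects were built. */
    static int raceDemo(int threads) {
        Thread[] callers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            callers[i] = new Thread(TrayIconHolder::getInstance);
            callers[i].start();
        }
        for (Thread caller : callers) {
            try {
                caller.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return getConstructions();
    }

    public static void main(String[] args) {
        System.out.println("constructions: " + raceDemo(8));
    }
}
```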
catbref
964e0a02ca Fixes/improvements to networking
Reworked networking execute-produce-consume threading.
Some networking tasks were wrongly performed during 'produce' phase,
and some producing was happening in 'consume' phase (also corrected).

Peer connection tasks are rate-limited to 1 per second to reduce CPU thrashing.

Show P2P listen port in logs on startup.

Tests for general purpose ExecuteProduceConsume class to cover both
random task scenario and mass-ping scenario.
2019-08-13 08:40:50 +01:00
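The "1 per second" limit on peer connection tasks from 964e0a02ca could be implemented with a simple timestamp gate, sketched below. The class name and the choice to pass the current time in explicitly (which keeps the sketch deterministic) are assumptions, not the project's actual design.

```java
/** Hypothetical gate that admits at most one action per interval. */
class ConnectRateLimiter {
    private final long minIntervalMs;
    private Long lastAttemptMs = null; // null until the first admitted attempt

    ConnectRateLimiter(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    /** Returns true, and records the attempt, only if enough time has passed. */
    synchronized boolean tryAcquire(long nowMs) {
        if (lastAttemptMs != null && nowMs - lastAttemptMs < minIntervalMs) {
            return false;
        }
        lastAttemptMs = nowMs;
        return true;
    }

    public static void main(String[] args) {
        ConnectRateLimiter limiter = new ConnectRateLimiter(1000);
        long now = System.currentTimeMillis();
        System.out.println(limiter.tryAcquire(now));       // first attempt is admitted
        System.out.println(limiter.tryAcquire(now + 10));  // too soon, so refused
    }
}
```

A producer would call tryAcquire() before emitting a connection task and simply produce nothing when it returns false.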
catbref
f8b496ff3c Convert SysTray to use JPopupMenu for Unicode support
Correct Systray_zh.properties to ISO 8859-1 instead of UTF-8.

Added SysTray test to GuiTests
2019-08-13 08:30:40 +01:00
catbref
6942c02700 Add ZH translations for SysTray pop-up menu.
Reduce log spam when SysTray can't open Node UI in browser,
e.g. no browser installed, or no association for URLs.
2019-08-13 08:29:16 +01:00
catbref
0d85a60c54 New network threading model
Instead of 3 threads per peer:

1. peer main thread
2. peer's unsolicited messages processor
3. peer pinger

We now use a Jetty-style Execute-Produce-Consume server threading model.

For 60 connected peers, we no longer have 180 threads but typically only
the usual ~6 threads.

Also in this commit:

* merging peers locking changed from lock() to tryLock()

* PROOF handshake maximum time difference increased from 2000ms to 5000ms

* Peers still handshaking after 60s are considered stuck and hence disconnected

* We now use NIO SocketChannels instead of raw sockets
2019-08-13 08:28:46 +01:00
catbref
67c245bb9d Don't attempt to synchronize with peers that we know have inferior chain 2019-08-13 08:28:08 +01:00
catbref
82910b6524 Add periodic system-tray pop-up if Windows Time service not running 2019-08-13 08:27:43 +01:00
catbref
9c1ca8de04 Fix HSQLDB backups.
Change AutoUpdate to use 'quick' backup.

For HSQLDBRepository.backup, don't perform CHECKPOINT DEFRAG while repository in use
as it never completes.

Similarly, use BACKUP DATABASE ... NOT BLOCKING.
2019-08-13 08:27:07 +01:00
catbref
8109214087 Reduce misbehaviour cooloff from 60min to 10min 2019-08-13 08:25:06 +01:00
catbref
b9737372d9 Use default testnet/mainnet listen port as appropriate when peer addresses are given without port 2019-08-13 08:22:46 +01:00
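A rough sketch of the defaulting described in b9737372d9. The testnet P2P port 9989 comes from commit a154a7c073; the mainnet port 9889 is an assumption inferred from a peer address in a log line elsewhere in this history. The parser is deliberately naive and assumes hostnames or IPv4 addresses only (no IPv6 literals).

```java
/** Hypothetical helper that appends the network's default P2P port when none is given. */
class PeerAddressParser {
    static final int MAINNET_P2P_PORT = 9889; // assumed, inferred from a logged peer address
    static final int TESTNET_P2P_PORT = 9989; // from commit a154a7c073

    static String withDefaultPort(String address, boolean isTestNet) {
        // Naive check: any colon is treated as a host:port separator (no IPv6 support)
        if (address.indexOf(':') >= 0) {
            return address;
        }
        int port = isTestNet ? TESTNET_P2P_PORT : MAINNET_P2P_PORT;
        return address + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(withDefaultPort("node.example", false));
        System.out.println(withDefaultPort("node.example", true));
        System.out.println(withDefaultPort("node.example:12345", true));
    }
}
```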
catbref
a154a7c073 Change default testnet ports to 9989 (P2P), 9988 (API) and 9980 (node UI) 2019-08-06 11:20:08 +01:00
catbref
21e64d0c8b Increase logging for ApplyUpdate to help debug issues 2019-08-06 11:18:15 +01:00
catbref
0da21356c7 Repository backup/recovery
Controller requests 'quick' repository backup every 123 minutes.

On start-up, if repository fails to load, recovery is attempted using
backup (if present).

AutoUpdate also requests 'slow' repository backup just before
calling ApplyUpdate. ('Slow' means perform "CHECKPOINT DEFRAG" first).
2019-08-06 11:17:49 +01:00
catbref
fdf35bba74 Commented-out commons-net in pom.xml as no longer needed 2019-08-06 11:17:08 +01:00
catbref
7eb5cd55ff Faster shutdown for Peers doing PROOF part of handshake 2019-08-06 11:16:45 +01:00
catbref
2f2d9a664d Replaced all NTP.getTime with System.currentTimeMillis. Clocks handled by O/S 2019-08-06 11:12:29 +01:00
catbref
4d265b8acb Don't log API exceptions or errors if Settings.isApiLoggingEnabled is false 2019-08-06 11:08:15 +01:00
catbref
a3e6c24a89 Minor performance improvement with lambda-based logging after initial profiling run 2019-08-06 11:07:55 +01:00
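The benefit of lambda-based logging mentioned in a3e6c24a89 is that the message is only built when the level is enabled. The sketch below demonstrates the idea with java.util.logging, chosen here only because it ships with the JDK; the commit does not say which logging API the project uses, and the class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical comparison of eager and lazy log message construction. */
class LazyLoggingDemo {
    private static final Logger LOGGER = Logger.getLogger(LazyLoggingDemo.class.getName());
    static final AtomicInteger expensiveCalls = new AtomicInteger();

    static {
        // FINE is below INFO, so fine() messages are discarded
        LOGGER.setLevel(Level.INFO);
    }

    /** Stand-in for a costly description, e.g. formatting a large peer list. */
    static String expensiveDescription() {
        expensiveCalls.incrementAndGet();
        return "state snapshot";
    }

    /** The string is built before fine() is even called, then thrown away. */
    static void logEagerly() {
        LOGGER.fine("State: " + expensiveDescription());
    }

    /** The supplier is only invoked if FINE is enabled, so no work is done here. */
    static void logLazily() {
        LOGGER.fine(() -> "State: " + expensiveDescription());
    }

    public static void main(String[] args) {
        logEagerly();
        logLazily();
        System.out.println("expensive calls: " + expensiveCalls.get());
    }
}
```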
catbref
7b51b1e88d Improve Network.shutdown() and logging in Peer.
Network.shutdown() called Peer.shutdown() on each Peer
while holding synchronization lock on this.connectedPeers.

This would cause a problem during Peer.shutdown() as some
other reachable code would also want synchronized access
to this.connectedPeers. Typical symptoms would be log
entries like:

2019-07-02 11:13:05 DEBUG Peer:512 - Message processor for peer 192.144.182.73:9889 failed to terminate

Eventually Network.shutdown() would exit, releasing synchronization
lock and awaking stuck Peer threads, which could then try to access
repository (now closed) causing further log spam.

Now it uses Network.getConnectedPeers to return duplicated
List<Peer>, minimizing lock time on this.connectedPeers.

Also made Peer main thread logging more informative when an IOException
occurs, as most situations are harmless EOF or connection reset by peer.
2019-08-06 11:07:21 +01:00
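The locking change in 7b51b1e88d (copy the peer list under the lock, then work on the copy) can be sketched as follows. Peers are represented by plain strings and the class name PeerRegistry is hypothetical; the point is only that shutdown no longer holds the lock while each peer is shut down.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical registry showing "snapshot under lock, iterate outside lock". */
class PeerRegistry {
    private final List<String> connectedPeers = new ArrayList<>();

    void add(String peer) {
        synchronized (connectedPeers) {
            connectedPeers.add(peer);
        }
    }

    void remove(String peer) {
        synchronized (connectedPeers) {
            connectedPeers.remove(peer);
        }
    }

    /** Returns a duplicate list, so the lock is held only for the duration of the copy. */
    List<String> getConnectedPeers() {
        synchronized (connectedPeers) {
            return new ArrayList<>(connectedPeers);
        }
    }

    /** Iterates over the snapshot; per-peer work may itself need the lock (here: remove). */
    void shutdown() {
        for (String peer : getConnectedPeers()) {
            remove(peer);
        }
    }

    public static void main(String[] args) {
        PeerRegistry registry = new PeerRegistry();
        registry.add("peer-a");
        registry.add("peer-b");
        registry.shutdown();
        System.out.println("remaining: " + registry.getConnectedPeers().size());
    }
}
```

In this single-threaded demo, iterating the live list while removing from it would fail with a ConcurrentModificationException; with real peer threads the equivalent symptom is the stall described in the commit message.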
catbref
192a072796 Unify setting Transaction's initial group-approval status.
Refactored duplicate code into Transaction.setInitialApprovalStatus().

This is to make sure transactions HAVE a group-approval status
in Synchronizer before Block.isValid is called.

This wasn't a problem for new, unconfirmed, individual transactions
arriving over the wire due to Transaction.importAsUnconfirmed()
doing the right thing.

Also added a groupApprovalTimestamp to BlockChain feature-triggers
to support legacy chains.

Tested by syncing mainnet from scratch.
2019-08-06 11:05:36 +01:00
catbref
ad827ae01d Fix Controller's processing of HEIGHT_V2 to update all peers with same ID 2019-08-06 10:59:13 +01:00
catbref
bdfbea3a53 Remove OSGi/felix from pom.xml 2019-08-06 10:57:29 +01:00
catbref
976ea97af1 Improve pruning of old peers 2019-08-06 10:57:02 +01:00
catbref
9435e9576a Move getRewardByHeight from Block to BlockChain 2019-08-06 10:53:13 +01:00
catbref
f45cedb6ff Save lastConnected for outbound peers on completed handshake 2019-08-06 09:49:44 +01:00
catbref
840f52ff90 Added API call GET /blocks/timestamp/{timestamp} and increase timeout on POST /admin/forcesync from 5s to 30s 2019-08-06 09:47:38 +01:00
catbref
2621e04025 Change Peer ping interval to below that of SOCKET_TIMEOUT, used by blocking read()s 2019-08-06 09:47:12 +01:00
catbref
e47b4dceb2 StringBuilder, and other, optimizations for repository-related classes.
Add optional "excludeZero" to API call GET /assets/balances

Added tests to call most API calls to check no exceptions are thrown.
2019-08-06 09:46:31 +01:00
catbref
48eae0cb38 Minor change to test account assertion message. 2019-08-06 09:41:52 +01:00
catbref
99ffd62a6e Change to how "best block" is determined. 2019-08-02 15:29:48 +01:00