API call GET /addresses/online reports online accounts,
including both addresses relating to the proxy-forge public key.
New PeerChainTipData class to replace the broken "peer data lock"
that was supposed to make sure peer's last height/blockSig/timestamp
were all in sync. Now peer's chain tip data is a single object
reference that can be replaced in one go.
Removed pointless API calls /blocks/time and /blocks/{generatingbalance}.
Various changes, mostly in Block class, to do with switching to BlockTimingByHeight
from old min/max block time.
New block 'weight' based on number of online accounts
and 'distance' of perturbed generator public key from 'ideal' public key
(for that particular block's height).
New sub-chain 'weight' based on accumulating block weights,
currently by shifting previous accumulator left by 8 bits then
adding next block's weight.
More validation of BlockChain config. Helpful for debugging, probably
not very useful to end-users.
BlockGenerator now uses unified Peer predicates from Controller, like:
Controller.hasMisbehaved, Controller.hasNoRecentBlock, etc.
Controller now keeps a list of chain-tip signatures that are for inferior
chains, so it doesn't try to synchronize with peers with inferior chains.
(This list is wiped when node's blockchain changes/block is generated).
Controller now asks Gui to display error box if it can't parse Settings.
Controller.potentiallySynchronize() does more filtering of potential peers
before calling actuallySynchronize(). (Mostly moved from Synchronizer,
so now we expect actuallySynchronize() to do something rather than bail
out because it doesn't like the peer after all).
If synchronization discovers that peer has an inferior chain,
then Controller notifies that peer of our superior chain, to help keep
the network in sync.
Renamed OnlineAccount to OnlineAccountData, as it is in package org.qora.data
after all...
Synchronizer reworked to request block summaries so it can judge which chain
is better, and hence whether to sync with peer or abort.
Slight optimization of Peer.readChannel() to exit earlier if no more network
messages can be extracted from buffer.
More tests.
Improved documentation and logging.
Block.MAX_BLOCK_BYTES now BlockChain.getMaxBlockSize()
Network.MAXIMUM_MESSAGE_SIZE now Network.getMaxMessageSize() as
it depends on block size (above).
Block data now includes number of online accounts, as encoded online account indexes can't be
validated by ConciseSet it seems.
Corresponding changes to repository, transformer, block validation, data object, block summaries...
Block timestamps are now calculated using parent block data and generator's public key,
instead of old qora1 generating balance code.
Generators are valid to forge if they have forging flag enabled. This will probably change
to an account-level check in the near future.
Added trimming of old online accounts signatures from blocks.
Tidied up SysTray/BlockGenerator generation enabled/possible flag.
Although we perform online accounts tasks (currently) every 10 seconds,
only broadcast our online accounts every 60 seconds.
In Controller.main(), if args are present then use first as a filename to
settings JSON file (overriding the default filename).
Still to do: change Block/BlockChain/Synchronizer to prefer blocks with more online accounts,
failing that use generator nearest 'ideal', etc.
Safety commit in case of data loss!
Lots of changes to do with "online accounts", including:
* networking
+ GET_ONLINE_ACCOUNTS * ONLINE_ACCOUNTS messages & handling
+ Changes to serialization of block data to include online accounts info
* block-related
+ Adding online accounts info when generating blocks
+ Validating online accounts info in block data
+ Repository changes to store online accounts info
* Controller
+ Managing in-memory cache of online accounts
+ Updating/broadcasting our online accounts
* BlockChain config
Added "account levels", so new code/changes required in the usual places, like:
* transaction data object
* repository
* transaction transformer
* transaction processing
Orphaning a BUY_NAME transaction would not reinstate the
sale price. Sale price could be nullified by (e.g.) orphaning
a SELL_NAME transaction.
Also added test case to cover above and other test-related support,
e.g. test-mode in NTP.
Added "volatile" to more fields, for thread-safety on reads.
Changes to field values are done inside synchronized blocks
so no need for AtomicInteger/AtomicBoolean. (Could be changed
in the future to show intention/readability though).
Added more statistics (tasks produced/consumed).
Limited Network's EPC executor to 10 threads max.
Some asset-related transactions (CREATE_ASSET_ORDER and TRANSFER_ASSET)
also try to fetch asset names at the same time.
If one of these transactions, and a corresponding ISSUE_ASSET transaction,
have been orphaned then it's possible that the asset no longer exists.
Thus the SQL "JOIN" fails during transaction retrieval, causing an error.
Changing the table-join to "LEFT OUTER JOIN" makes the asset name aspect
optional. Repercussions might be nameless assets when fetching transaction
info via API.
Transaction expiry wasn't happening. Use NTP.getTime to check whether
transactions have expired. Also reject expired transactions when trying
to add them to unconfirmed pool.
Sometimes producing a task took way too long, causing massive
spikes in the number of threads and peer disconnections.
This is down to a repository pool exhaustion, so
RepositoryManager.getRepository() would block (for up to 5 minutes).
The key method at fault was Network.getConnectablePeer().
Various fixes:
NetworkProcessor's executor now reaps old threads after only 10 seconds
instead of the usual 60 seconds.
Change logging in Network to help diagnose disconnection and repository
issues.
RepositoryManager now has a tryRepository() call that is non-blocking
and returns null if repository pool is exhausted.
Repository pool size increased from default (10) to 100.
Pruning peers is now opportunistic, using tryRepository(), and returns
early if repository pool is exhausted.
getConnectablePeer() is now opportunistic, using tryRepository(), and
returns null (no peer candidate for connection) if repository pool
is exhausted.
Merging peers is not opportunistic, using tryRepository().
Peer ping interval increased from 8s to 20s.
HSQLDBRepositoryFactory now logs when getConnection() takes over 1000ms.
Added more trace-level logging to ExecuteProduceConsume to
highlight slow produceTask() calls.
Symptoms were on Ubuntu with 8u222:
Exception in thread "Controller" java.awt.AWTError: Assistive Technology not found: org.GNOME.Accessibility.AtkWrapper
at java.awt.Toolkit.loadAssistiveTechnologies(Toolkit.java:807)
at java.awt.Toolkit.getDefaultToolkit(Toolkit.java:886)
at java.awt.SystemTray.isSupported(SystemTray.java:219)
at org.qora.gui.SysTray.<init>(SysTray.java:50)
at org.qora.gui.SysTray.getInstance(SysTray.java:249)
at org.qora.controller.Controller.updateSysTray(Controller.java:480)
at org.qora.controller.Controller.run(Controller.java:368)
(Underlying cause not known)
This would cause Controller thread to silently exit, and so no
synchronization would occur. Node would still have connections
and thus happily generate its own blocks on its on fork.
Maybe other peers would try to sync with this node but would
likely reject this node's chain after a short while.
New NTP class now runs as a simplistic NTP client, repeatedly polling
several NTP servers and maintaining a more accurate time independent
of operating system.
Several occurrences of System.currentTimeMillis() replaced with NTP.getTime()
particularly where block/transaction/networking is involved.
GET /admin/info now includes "currentTimestamp" as reported from NTP.
Added support for block timestamps determined by generator, instead of
supplied by clock. (BlockChain.newBlockTimestampHeight - not yet activated).
Incorrect timestamps will produce a TIMESTAMP_INCORRECT Block.ValidationResult.
Block.calcMinimumTimestamp repurposed as Block.calcTimestamp for above.
Block timestamps are now allowed to be max 2000ms in the future,
was previously max 500ms.
Block generation prohibited until initial NTP sync.
Instead of deleting INVALID unconfirmed transactions in BlockGenerator,
Controller now deletes EXPIRED unconfirmed transactions every so often.
This also fixes persistent expired unconfirmed transactions on nodes
that do not generate blocks, as BlockGenerator.deleteInvalidTransactions()
was never reached.
Abbreviated block sigs added to log entries declaring a new block is generated
in BlockGenerator.
Controller checks for NTP sync much faster during start-up and SysTray's
tooltip text starts as "Synchronizing clock" until NTP sync occurs.
After NTP sync, Controller logs NTP offset every so often (currently every 5 mins).
When considering synchronizing, Controller skips peers that have the same block sig
as last time when synchronization resulted in no action, e.g. INFERIOR_CHAIN,
NOTHING_TO_DO and also OK. OK is included as another sync attempt would result in
NOTHING_TO_DO.
Previously this skipping check only happened after prior INFERIOR_CHAIN.
During inbound peer handshaking, if we receive a peer ID that matches an existing inbound
peer then send peer ID of all zeros, then close connection.
Remote end should detect this and cleanly close connection instead of waiting for handshake timeout.
Randomly generated peer IDs have lowest bit set to avoid all zeros.
Might need further work.
Networking doesn't connect, or accept, until NTP has synced.
Transaction validation can fail with CLOCK_NOT_SYNCED if NTP not synced.
It seems unachievable for nodes to keep their clocks accurate to
within 500ms. It is unclear whether this is due to Windows'
implementation of client NTP or because of terrible network
conditions in China.
So increasing max NTP offset to allow block generation from
500ms to 30s. Correspondingly increasing max peer timestamp
delta from 5s to 30s.
The block consensus algorithm will need to change in the near
future to address clock issues.
Controller batches SysTray updates into one per second.
Block generation only allowed by Controller is clock
known to be accurate. ('not sure' stops generation).
NTP MAX_STDDEV increased from 25ms to 125ms to cater
for poorly connected nodes.
Controller sends peers list over outbound peer connections,
requests peers list from inbound peer connections.
When peer handshake completes, Network & Controller only send
initial messages over outbound peer connections.
This is to fix HEIGHT_V2 messages being processed out-of-order
breaking handshaking, as indicated by log entries like:
2019-07-26 16:16:35 DEBUG Network:702 - Unexpected HEIGHT_V2 message from xx.xxx.xx.xx:pppp, expected PROOF
2019-07-26 16:16:35 DEBUG Network:840 - Handshake completed with peer xx.xxx.xx.xx:pppp
Increased connection failure backoff from 1 minute to 5 minutes,
as handshake timeout is 1 minute and then nodes would immediately reconnect.
Changed default NTP servers from asia to cn.
Controller performs NTP check on startup (and every 5 minutes)
which determines whether block generation is allowed.
System Tray tooltip updated to reflect generating status.
Plus new translations.
Improved GuiTests.
BlockGenerator fetches forging accounts first, and sleeps
if none configured, which is less work than processing peer lists.
Now uses several NTP servers to determine mean offset from
system clock to internet time.
If abs(offset) > 500ms or NTP service not running then
user is 'nagged' via system tray pop-up notification
with instructions on how to fix.
Also improved system tray translations!
SysTray pop-up menu now includes entry for launching https://time.is
so node owners can check their system clocks against internet time.
Windows installs also have additional systray menu entry which
runs ntpcfg.bat script, included in resources.
Also available as download via node-UI servlet,
e.g. http://localhost:9880/downloads/ntpcfg.bat
ntpcfg.bat reconfigures Windows Time Service with many NTP servers,
restarts the service, and also makes sure it auto-starts on boot.
Added DEBUG-level logging when rejecting nodes due to excessive
time difference (during PROOF handshake stage).
Bumped default settings values for minOutboundPeers from 10 to 20.
Bumped default settings values for maxPeers from 30 to 50.
Defer the clearing of hasThreadPending flag until about to produce a task,
inside synchronized block.
This gives a new thread a chance to produce at least once before other threads
decide to spawn new threads.
Previously there could be an excessive number of unncessary threads,
all waiting for their initial attempt to produce a task.
Reworked networking execute-produce-consume threading.
Some networking task were wrongly performed during 'produce' phase,
and some producing was happening in 'consume' phase (also corrected).
Peer connection tasks are rate-limited to 1 per second to reduce CPU thrashing.
Show P2P listen port in logs on startup.
Tests for general purpose ExecuteProduceConsume class to cover both
random task scenario and mass-ping scenario.
Instead of 3 threads per peer:
1. peer main thread
2. peer's unsolicited messages processor
3. peer pinger
We now use a Jetty-style Execute-Produce-Consume server threading model.
For 60 connected peers, we no longer have 180 threads but typically only
the usual ~6 threads.
Also in this commit:
* merging peers locking changed from lock() to tryLock()
* PROOF handshake maximum time difference increased from 2000ms to 5000ms
* Peers still handshaking after 60s are considered stuck and hence disconnected
* We now use NIO SocketChannels instead of raw sockets
Change AutoUpdate to use 'quick' backup.
For HSQLDBRepository.backup, don't perform CHECKPOINT DEFRAG while repository in use
as it never completes.
Similarly, use BACKUP DATABASE ... NOT BLOCKING.
Controller requests 'quick' repository backup every 123 minutes.
On start-up, if repository fails to load, recovery is attempted using
backup (if present).
AutoUpdate also requests 'slow' repository backup just before
calling ApplyUpdate. ('Slow' means perform "CHECKPOINT DEFRAG" first).
Network.shutdown() called Peer.shutdown() on each Peer
while holding synchronization lock on this.connectedPeers.
This would cause a problem during Peer.shutdown() as some
other reachable code would also want synchronized access
to this.connectedPeers. Typical symptoms would be log
entries like:
2019-07-02 11:13:05 DEBUG Peer:512 - Message processor for peer 192.144.182.73:9889 failed to terminate
Eventually Network.shutdown() would exit, releasing synchronization
lock and awaking stuck Peer threads, which could then try to access
repository (now closed) causing further log spam.
Now it uses Network.getConnectedPeers to return duplicated
List<Peer>, minimizing lock time on this.connectedPeers.
Also made Peer main thread logging more informative when a IOException
occurs, as most situations are harmless EOF or connection reset by peer.
Refactored duplicate code into Transaction.setInitialApprovalStatus().
This is make sure transactions HAVE a group-approval status
in Synchronizer before Block.isValid is called.
This wasn't a problem for new, unconfirmed, individual transactions
arriving over the wire due to Transaction.importAsUnconfirmed()
doing the right thing.
Also added a groupApprovalTimestamp to BlockChain feature-triggers
to support legacy chains.
Tested by syncing mainnet from scratch.
lastHeight/blockSig/blockTimestamp, etc. moved from PeerData/repository
to Peer as it's transient so no need to store in repository.
Repository now keeps track of when/who added peer, e.g. API/INIT/another peer.
API calls DELETE /peers and DELETE /peers/known also disconnect peer
as well as deleting from repository.
Connection timestamp now reported by API call GET /peers
Some repository-updating code removed from Network/Controller as no
longer needed.
Removed obsolete Controller.hasShorterBlockchain predicate.
Previous version was prone to throwing exceptions during shutdown
sequence, especially in Peer/Network-related threads.
Shutdown now tries to wait for Peer/Network threads to cleanly exit.
API call POST /admin/forcesync now tries to grab blockchain lock,
with timeout, and then repeatedly sync with requested peer until
either fully synced or something unexpected happens.
BlockGenerator will now attempt to generate a new block if none of its
peers have a recent block either (in case of network stall). BlockGenerator
still needs a minimum number of peers before generating though.
Reduce BlockGenerator workload and use of blockchain lock if it
can't generate a block.
Reduce Synchronizer logging output.
Unify calculating timestamp threshold for 'recent' block into Controller.
Don't disconnect peers that fail to send a requested transaction,
as they may no longer have it. e.g. transaction might have expired
or become invalid.
For some other cases, e.g. we have transaction already, move on to
requesting the next transaction instead of giving up on the list.
NOTE: Downloaded update JARs are now expected to have been XORed with 0x5A!
This is to help prevent Windows Firewall from blocking update downloads
based on deep packet inspection.
Download read timeout reduced from 5s to 3s.
Download locations reordered so github entries are at the top as they have
better CDNs.
ApplyUpdate now assumes null response from GET /admin/stop means node
is not running.
ApplyUpdate now checks replacement JAR actually exists before attempting
to overwrite previous version.
ApplyUpdate now tries to use Windows EXE launcher in preference to raw
java command line. (This should improve Windows installer behaviour
in detecting running process and possibly firewall implications too).
Included unit test to cover change.
Modified test blockchain config "test-chain-v2.json" to add maxProxyRelationships.
Added comments to some proxy-related methods in AccountRepository class.
Added transaction [de]serialization test, along with corresponding random transaction generators.
Minor typo fix in Transaction.
Minor clarification in MessageTransactionTransformer.
Added debugging to Account.
Arbitrary transactions now [de]serialize data-type (raw/hash) for v4+ transactions.
Data type also stored in repository. Very small (<=255 byte) data payloads are also stored directly in HSQLDB.
Added ArbitraryDataManager which looks for hash-only data payloads and possibly requests raw data over network
depending on 'policy' (which currently is "fetch everything").
Added networking support for finding, and transferring, arbitrary data payloads.
Minor optimization to message ID generation in Peer.
Minor optimization in Serialization.serializeSizedString()
Bumped version
Controller no longer uses block height to determine whether to sync
but now uses peer's latest block's timestamp and signature.
Also BlockGenerator checks whether it's generating in isolation using
the same peer info (latest block timestamp and signature).
Added API call POST /admin/forcesync peer-address to help get wayward
nodes back on track.
Unified code around, and calling, Transaction.importAsUnconfirmed().
Tidied code around somelock.tryLock() to be more readable.
Controller (post-sync) now broadcasts new chaintip info if
our latest block's signature has changed, not simply the height.
Network.broadcast() only sends out via outbound peer if node has
more than one connection from the same peer. So Controller would
only update one of the peer records with chaintip info.
Controller now updates all connected peers with the ID when it
receives a HEIGHT or HEIGHT_V2 message.
Added node1 thru node7.mcfamily.io to default peers in Network.
Network ignores first "listen port" entry when receiving peers
list from an outbound-connection peer as it already knows
by virtue of having connected to it!
More network message debug logging (hopefully never to be seen).
[some old code left in, but commented out, for a while]
Controller now sets (volatile) requestSync flag when a peer sends new height info.
This allows much quicker response to new blocks which might hopefully improve synchronization
compared with the old periodic sync method.
"Unsolicited" network messages are now added to a BlockingQueue,
and a separate unsolicited message processing thread (one per peer)
deals with this messages in turn.
This allows "reply" network messages to propagate up to the
threads that are waiting for them, preventing deadlocks and
peer disconnections due to lost pings.
Controller tries to do as much new transaction processing
outside of the blockchain lock as possible, and only
broadcasts new transaction's signature if we successfully
import transaction to our unconfirmed pile.
Synchronizer.findSignaturesFromCommonBlock now returns null
to indicate some sort of connection issue (no cool-off)
and an empty list to indicate NO COMMON BLOCK.
That method also tries to work back to genesis block
instead of giving up too early if test block height
becomes negative.
Network.createConnection additionally filters out
candidates if their addresses resolve to the same
IP+port as an existing connection. So now it won't
connect to localhost:1234 if it has an existing
connection with 127.0.0.1:1234.
Network.broadcast only considers unique peers,
i.e. prefers outbound connection if a peer has
corresponding inbound connection.
Added Thread.currentThread().setName() where possible.
Notably: network messages passed up to Controller are now processed in their
own thread, as opposed to peer's thread.
So each message processor in Controller needs to thread-safe.
V2 network protocol asks for unconfirmed transactions, can send round lists
of transaction signatures and ask for individual transactions, to save
bandwidth/processing.