# JFR Analysis — snap-140558.jfr **Host:** masked info **JVM:** OpenJDK 17.0.19, G1GC, max heap 3.64 GB **Recording:** 2026-05-31 11:59:35 UTC, duration **384 seconds**, 4 chunks, 22 MB --- ## 1. CPU Load During Recording | Metric | Observed range | |---|---| | Machine total CPU | 55 – 96% | | JVM user CPU | 20 – 50% | | JVM system CPU | 5 – 15% | | System load avg (at snapshot) | **6.11 on 4 cores** (152% overloaded) | The machine is CPU-saturated throughout the recording. JVM user CPU is consistently high and system CPU (kernel/IO) adds another significant fraction. --- ## 2. GC — The Primary Problem ### 2.1 Overview | Metric | Value | |---|---| | Total GC events | **485** in 384 s (~1.26 GC/sec) | | Young GC events | 248 | | Old (concurrent) GC events | 237 | | Total stop-the-world pause time | **30,514 ms** (8% of runtime) | | Average pause | **62.9 ms** | | **Maximum pause** | **5,930 ms** (5.93 seconds) | 8% of application time is spent paused for GC. The 5.93 s max pause is catastrophic for a peer-to-peer node — it drops connections and misses block propagation windows. ### 2.2 GC Cause Breakdown | Cause | Count | % | |---|---|---| | **G1 Humongous Allocation** | **474** | **97.7%** | | G1 Preventive Collection | 9 | 1.9% | | G1 Evacuation Pause | 2 | 0.4% | **97.7% of all GC events are triggered by humongous object allocations.** This is the smoking gun. ### 2.3 What Is a Humongous Allocation? G1GC divides the heap into equal-sized regions. Any single allocation ≥ 50% of the region size is a "humongous object" and is handled specially: - It is allocated directly in Old Gen (bypasses Eden entirely). - G1 triggers a Young GC immediately to reclaim Old Gen space when humongous objects accumulate. - Humongous objects fragment the heap and are expensive to free. For a 3.64 GB max heap, G1 uses **2 MB regions** → humongous threshold = **1 MB**. --- ## 3. Root Cause: `Peer.readChannel()` Allocates 2 MB Buffers ### 3.1 The allocation `src/main/java/org/qortal/network/Peer.java`, line 783: ```java if (this.byteBuffer == null) { this.byteBuffer = ByteBuffer.allocate(Network.getInstance().getMaxMessageSize()); } ``` `getMaxMessageSize()` is computed once in `Network` as: ```java maxMessageSize = 4 + 1 + 4 + BlockChain.getInstance().getMaxBlockSize(); ``` `maxBlockSize` is set to **2,097,152** bytes in `blockchain.json`: ```json "maxBlockSize": 2097152 ``` So every `ByteBuffer.allocate` call here allocates **2,097,161 bytes ≈ 2.000009 MB** — just above the 2 MB region boundary, and **double** the 1 MB humongous threshold. ### 3.2 Why this triggers constant GCs - The buffer starts as `null` (line 720) and is freed (set to `null`) each time the buffer is empty after a complete message is consumed (line 847). - Every peer connection that reads even a single byte triggers a fresh 2 MB humongous allocation. - With 242+ live threads and active peer networking, this allocation fires continuously. - The JFR `ObjectAllocationSample` records **15,985 samples** of `byte[]` originating from `Peer.readChannel()` line 783, with a sampled weight of **2.0 MB each** — the single largest allocation source in the entire recording (40.7% of all allocation samples are `byte[]`). ### 3.3 Impact chain ``` Peer connects / sends data → byteBuffer == null → allocate 2,097,161 bytes → humongous object → lands in Old Gen → Old Gen fills → G1 Young GC fires (cause: "G1 Humongous Allocation") → STW pause 63 ms avg, up to 5,930 ms max → application stalls, connections time out ``` --- ## 4. CPU Hotspots ### 4.1 Java execution samples (8,366 total) | Thread | Samples | % | |---|---|---| | `Arbitrary Data` group (all threads) | ~4,172 | **49.9%** | | `Synchronizer` | 995 | 11.9% | | `pool-26-thread-1` | 562 | 6.7% | | `AT States trimmer` / `pool-*` | 486 | 5.8% | | `BlockMinter` | 412 | 4.9% | | `pool-16-thread-1` | 382 | 4.6% | | `Foreign Fee Manager-1` | 277 | 3.3% | ### 4.2 Native method samples (17,519 total) | Thread | Samples | % | Note | |---|---|---|---| | `RMI TCP Connection` threads | 2,956 | 16.9% | JMX/diagnostic traffic | | `HttpClient-1-SelectorManager` | 1,478 | 8.4% | | | `qtp406511188-272/273/274` | ~4,431 | 25.3% | Jetty HTTP threads (SSL) | | `grpc-nio-worker-ELG-1-2/3/4` | ~4,431 | 25.3% | gRPC NIO workers | | `Network-IO` | 1,320 | 7.5% | | | `NetworkData-IO` | 1,247 | 7.1% | | | `Arbitrary Data` threads | 570 | 3.3% | | The Jetty + gRPC NIO workers dominate native samples, indicating heavy network I/O in kernel space (epoll/select syscalls). ### 4.3 Arbitrary Data subsystem — what it is actually doing The execution samples for all Arbitrary Data threads point to the same HSQLDB call chain: ``` QuerySpecification.buildResult() → RangeVariable$RangeIteratorMain.next() → IndexAVL$IndexRowIterator.getNextRow() → IndexAVL.next() → NodeAVLDisk.getLeft/getRight/getParent() → RowStoreAVLDisk.get() → DataFileCache.getFromFile() ← disk read under WriteLock → DataFileCache.get() ``` This is a **full AVL index scan on a disk-backed HSQLDB table**, repeated continuously by multiple threads (`Arbitrary Data Manager`, `Arbitrary Index Cache Timer Task`, `Arbitrary Data Cleanup Manager`, `Arbitrary Data Cache Manager`). The `ReentrantReadWriteLock$WriteLock.lock()` on `DataFileCache` causes cross-thread contention on every disk read. Additionally, `Cache.cleanUp()` / `Cache.preparePut()` appear in samples, meaning the HSQLDB row cache is under constant eviction pressure — it cannot hold the working set in memory. ### 4.4 Foreign Fee Manager — EdDSA in a hot path `Foreign Fee Manager-1` (277 Java execution samples + 12 native samples) is burning CPU on: ``` Crypto.verify() → Ed25519.implVerify() → Ed25519.scalarMultStraus128Var() → Wnaf.getSignedVar() ``` Ed25519 signature verification is expensive on aarch64 without hardware acceleration. If `processForeignFeesImportQueue()` is verifying signatures in a tight scheduler loop, it should batch or throttle. --- ## 5. Object Allocation Profile ### 5.1 Allocation sample type breakdown (85,503 total samples) | Type | Samples | % | Primary source | |---|---|---|---| | `byte[]` | **34,837** | **40.7%** | `Peer.readChannel()` (2 MB buffers) + HSQLDB binary reads | | `NodeAVLDisk` | 6,278 | 7.3% | HSQLDB index node reads | | `Peer$PreSerializedMessageWrapper` | 5,013 | 5.9% | Outbound peer message serialization | | `AQS$ConditionNode` | 4,505 | 5.3% | Lock condition queue churn | | `Object[]` | 3,974 | 4.6% | General | | `int[]` | 3,554 | 4.2% | HSQLDB / general | | `RowAVLDisk` | 1,890 | 2.2% | HSQLDB row reads | | `BinaryData` | 1,687 | 2.0% | HSQLDB binary column data | | `Long` | 1,510 | 1.8% | Autoboxing | ### 5.2 Promotion to Old Gen (PromoteObjectInNewPLAB) - **66,987** total promote events; **60,064 (89.7%) tenured = true**. - Nearly 90% of promoted objects survive to Old Gen, confirming the working set is too large for Young Gen to absorb. - `plabSize = 1.0 MB` appears 4,989 times — large 1 MB PLAB chunks being allocated in Old Gen for `byte[]`, `BinaryData`, and `NodeAVLDisk` promotions. ### 5.3 Old Object Sample (741 events — long-lived objects) All 741 OldObjectSample events record `object = [` (arrays), consistent with the 2 MB `byte[]` buffers accumulating in Old Gen between concurrent GC cycles. --- ## 6. Memory State | Pool | Used | Committed | Max | |---|---|---|---| | G1 Eden Space | 64 MB | 360 MB | dynamic | | G1 Old Gen | 1.19 GB | 1.54 GB | 3.64 GB | | G1 Survivor Space | 44 MB | 44 MB | dynamic | | Metaspace | 76 MB | — | unlimited | | CodeHeap (all) | 28.8 MB | — | — | - **Old Gen grew from 897 MB → 1.19 GB** (+300 MB) in ~3 minutes of the live observation window. - G1 Eden has 360 MB committed but only 64 MB used — G1 is not utilising Eden efficiently because most large allocations skip it entirely (humongous path). - Host free physical memory: **74.7 MB of 7.28 GB**. JVM committed virtual memory (7.52 GB) exceeds total RAM — the OS is paging. --- ## 7. Thread State Snapshot | State | Count | |---|---| | RUNNABLE | 20 | | WAITING | 124 | | TIMED_WAITING | 98 | | Live total | 242 (peak 270, 860 started) | - No deadlocks. - No blocked threads. - The high WAITING + TIMED_WAITING count with non-trivial CPU burn (seen in the JMX sample) is explained by threads that loop through work, block briefly on a lock or park, and immediately resume — appearing "waiting" in a point-in-time snapshot but consuming real CPU over time. --- ## 8. Summary of Findings | # | Finding | Severity | |---|---|---| | 1 | `Peer.readChannel()` allocates a 2 MB `ByteBuffer` per message cycle; humongous threshold is 1 MB → 97.7% of GCs triggered by this | **Critical** | | 2 | Max GC pause of **5.93 seconds**; 8% of runtime in STW pauses | **Critical** | | 3 | Old Gen growing ~100 MB/min; Full GC imminent if growth continues | **High** | | 4 | Physical RAM exhausted (74 MB free); JVM committed memory exceeds RAM → OS paging | **High** | | 5 | Arbitrary Data subsystem runs continuous full AVL index scans on disk-backed HSQLDB | **High** | | 6 | HSQLDB `DataFileCache` under constant `WriteLock` contention across 4+ threads | **Medium** | | 7 | `Foreign Fee Manager` verifies Ed25519 signatures in a tight scheduler loop | **Medium** | | 8 | 89.7% of Young Gen promotions go directly to Old Gen — Young Gen is not filtering effectively | **Medium** | --- ## 9. Recommended Fixes ### Fix 1 — Eliminate humongous `ByteBuffer` allocations (Critical) **Option A (JVM flag — immediate mitigation, no code change):** Add `-XX:G1HeapRegionSize=4m` to the JVM launch flags. This doubles the region size, moving the humongous threshold from 1 MB to 2 MB. The 2,097,161-byte buffers drop below the threshold and allocate normally in Eden. **Option B (code fix — correct solution):** Do not set `byteBuffer = null` after each message. Keep the buffer allocated for the lifetime of the connection and `clear()` / `compact()` it instead. This means one 2 MB allocation per peer connection (not per message), and since it stays in Old Gen only once, it stops triggering repeated GC cycles. **Option C (deeper fix):** Allocate a small initial buffer and grow it only when a large message is detected from the header length. Most messages are small; only block messages need 2 MB. ### Fix 2 — Reduce HSQLDB full-scan frequency (High) The Arbitrary Data threads run full `IndexAVL` scans on every tick. Determine whether the queries driving `Arbitrary Data Manager`, `Arbitrary Index Cache Timer Task`, and `Arbitrary Data Cleanup Manager` have appropriate indexes, and whether their scheduling interval can be increased. ### Fix 3 — Throttle Ed25519 verification in Foreign Fee Manager (Medium) Batch signature verification or add a rate limit / back-pressure to `processForeignFeesImportQueue()` so it does not monopolize CPU on each scheduler tick. --- ## 10. Implementation Status | # | Finding | Severity | Status | Commit | |---|---|---|---|---| | 1 | `Peer.readChannel()` 2 MB humongous `ByteBuffer` per message cycle | **Critical** | ✅ Fixed | `b0895f02` | | 2 | Max GC pause 5.93 s; 8% of runtime in STW pauses | **Critical** | ✅ Resolved by #1 | — | | 3 | Old Gen growing ~100 MB/min; Full GC imminent | **High** | ✅ Resolved by #1 | — | | 4 | Physical RAM exhausted; JVM committed memory exceeds RAM → OS paging | **High** | ✅ Fixed | `4134edb6` | | 5 | Arbitrary Data subsystem runs continuous full AVL index scans on disk-backed HSQLDB | **High** | ✅ Fixed | `e3de6944` | | 6 | HSQLDB `DataFileCache` under constant `WriteLock` contention across 4+ threads | **Medium** | ✅ Resolved by #5 | — | | 7 | `Foreign Fee Manager` verifies Ed25519 signatures in a tight scheduler loop | **Medium** | ✅ Fixed | `2ef1e3f8` | | 8 | 89.7% of Young Gen promotions go directly to Old Gen | **Medium** | ✅ Resolved by #1 | — |