Group Audio Calls
This document explains how group audio calls currently work in Qortal Desktop, with emphasis on the audio transport path and the exact way audio bytes are sent.
High-Level Model
Group calls are decentralized and role-based.
- Up to 10 participants: one root-forwarder.
- 11 to 50 participants: one root-forwarder plus per-cluster forwarders.
- A standby-forwarder exists for failover.
The renderer decides topology and captures/plays audio. The Electron main process owns Reticulum transport state. A Python bridge process handles Reticulum link I/O.
At a high level:
Mic -> capture worklet -> Opus encoder -> encrypted group-audio packet
-> renderer sendAudio IPC -> Electron GroupCallManager
-> ReticulumBridge fd3 binary IPC -> Python presence_bridge.py
-> RNS link packet -> remote Python bridge -> Electron bridge
-> renderer decrypt/decode/jitter/playout
Main Pieces
Renderer:
- src/hooks/useGroupVoiceCall.ts
- src/lib/group-call/audioPacketCodec.ts
- public/worklets/capture-processor.js
- public/worklets/group-playout-processor.js
Electron main:
- electron/src/setup.ts
- electron/src/group-call.ts
- electron/src/reticulum-bridge.ts
- electron/src/reticulum-audio-ipc.ts
Python bridge:
- electron/resources/presence_bridge.py
Security And Session Model
Each call has a room media key.
- The initiator/root generates the media key.
- The key is distributed per recipient with window.groupCall.sendKey().
- Audio packets are encrypted with nacl.secretbox using that room key.
- Forwarders normally route opaque encrypted audio bytes. They do not need to decrypt and re-encrypt just to forward.
Authority is deterministic and generation-scoped:
- The room root is the lowest election rank from sha256(address + ':' + roomId).
- Same-epoch topology disagreements must converge to that same hash-based winner, not raw string order.
- For a given (roomId, mediaSessionGeneration), there must be at most one authoritative room key.
- Only the designated root may originate a new authoritative room key for that generation.
- If a peer loses root while it still has the old key, that key may remain briefly for decrypt continuity, but outbound encrypt/send must stop until the new authoritative key arrives.
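The election rule above can be sketched as follows. This is a minimal illustration, not the hook's actual code: electionRank and electRoot are hypothetical names, and the sketch assumes "lowest rank" means the lexicographically smallest hex SHA-256 digest. It uses Node's crypto module for brevity; the renderer may compute the hash differently.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hex SHA-256 of address + ':' + roomId, as described above.
function electionRank(address: string, roomId: string): string {
  return createHash("sha256").update(`${address}:${roomId}`, "utf8").digest("hex");
}

// Assumption: the lowest rank is the lexicographically smallest hex digest.
// The result depends only on the set of addresses, never on their order.
function electRoot(addresses: string[], roomId: string): string | undefined {
  let winner: string | undefined;
  let winnerRank = "";
  for (const address of addresses) {
    const rank = electionRank(address, roomId);
    if (winner === undefined || rank < winnerRank) {
      winner = address;
      winnerRank = rank;
    }
  }
  return winner;
}
```

Because every peer hashes the same inputs, two peers that see the same participant set pick the same root without exchanging extra messages.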
The hook also tracks:
- callSessionId
- mediaSessionGeneration
- keyCommitment
Those are used to keep audio/key state aligned during rejoins, recovery, and key rotation.
Audio Capture And Encoding
The renderer capture pipeline lives in useGroupVoiceCall.ts.
- Microphone audio is read into an AudioContext.
- capture-processor.js frames mic audio into 960-sample chunks and computes VAD.
- The worklet posts { frame, vad } back to the main thread.
- The main thread converts Float32 PCM to Int16.
- A WebCodecs AudioEncoder encodes the frame as Opus.
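The Float32 to Int16 step can be sketched as below. float32ToInt16 is a hypothetical name, and the clamping and scaling shown are one common convention; the hook may scale or round differently.

```typescript
// Convert one captured frame of Float32 PCM in [-1, 1] to signed 16-bit PCM.
// Out-of-range samples are clamped so loud input cannot wrap around.
function float32ToInt16(frame: Float32Array): Int16Array {
  const out = new Int16Array(frame.length);
  for (let i = 0; i < frame.length; i++) {
    const s = Math.max(-1, Math.min(1, frame[i]));
    // Negative samples scale to -32768, positive samples to 32767.
    out[i] = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
  }
  return out;
}
```

For a 960-sample frame this yields 1920 bytes of PCM, which is what the encoder consumes per 20 ms.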
Current encoder behavior:
- Codec: opus
- Sample rate: 48000
- Channels: 1
- Frame duration: 20ms (960 samples)
- Bitrate: controlled by OPUS_BITRATE
- Expected packet loss: controlled by OPUS_EXPECTED_PACKETLOSS_PERCENT
- In-band FEC is requested when supported by the Electron build
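As a rough sketch, the settings above map onto a WebCodecs encoder configuration like the one below. The two numeric constants are placeholders, since the real OPUS_BITRATE and OPUS_EXPECTED_PACKETLOSS_PERCENT values are not stated here, and buildOpusEncoderConfig is a hypothetical helper. The interface is a local mirror of the WebCodecs shape so the snippet does not depend on DOM typings.

```typescript
// Placeholder values only: the real constants live in the hook and are not given in this document.
const OPUS_BITRATE_ASSUMED = 24_000;
const OPUS_EXPECTED_PACKETLOSS_PERCENT_ASSUMED = 10;

// Local mirror of the fields used from AudioEncoderConfig and its Opus-specific options.
interface OpusEncoderConfigSketch {
  codec: string;
  sampleRate: number;
  numberOfChannels: number;
  bitrate: number;
  opus: { frameDuration: number; packetlossperc: number; useinbandfec: boolean };
}

function buildOpusEncoderConfig(fecSupported: boolean): OpusEncoderConfigSketch {
  return {
    codec: "opus",
    sampleRate: 48000,
    numberOfChannels: 1,
    bitrate: OPUS_BITRATE_ASSUMED,
    opus: {
      frameDuration: 20_000, // WebCodecs expresses this in microseconds (20 ms)
      packetlossperc: OPUS_EXPECTED_PACKETLOSS_PERCENT_ASSUMED,
      useinbandfec: fecSupported, // only requested when the build supports it
    },
  };
}

// 48000 samples/s * 0.020 s = 960 samples per frame.
function samplesPerFrame(cfg: OpusEncoderConfigSketch): number {
  return (cfg.sampleRate * cfg.opus.frameDuration) / 1_000_000;
}
```

In the renderer, an object of this shape would be passed to AudioEncoder.configure(), ideally after checking it with AudioEncoder.isConfigSupported().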
What We Actually Send
The Opus frame is not sent raw. It is wrapped in the group-call audio codec first.
The current primary packet format is v2:
nonce[24] || secretbox(inner)
Where inner is:
version | sourceAddrLen | sourceAddr | vad | seq | timestampMs | opusFrame
There is also support for:
- v3: one encrypted packet carrying multiple Opus frames
- v1: legacy decode fallback
This codec is implemented in src/lib/group-call/audioPacketCodec.ts.
Important fields inside the encrypted payload:
- sourceAddr: Qortal address of the original speaker
- vad: whether the sender was speaking for this frame
- seq: 16-bit audio sequence
- timestampMs: relative call timestamp
- opusFrame: encoded voice payload
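To make the inner layout concrete, here is a sketch of packing and unpacking those fields. Only the field order and the 16-bit seq come from this document. The other widths (1-byte version, 1-byte sourceAddrLen, 1-byte vad, 32-bit timestampMs) and the big-endian byte order are assumptions, and packInner and unpackInner are hypothetical names. The real codec in audioPacketCodec.ts may differ, and the nonce[24] || secretbox(inner) wrapping is deliberately left out.

```typescript
interface InnerAudioFields {
  version: number;
  sourceAddr: string;
  vad: boolean;
  seq: number;
  timestampMs: number;
  opusFrame: Uint8Array;
}

// Layout (assumed widths): version u8 | sourceAddrLen u8 | sourceAddr | vad u8 | seq u16 | timestampMs u32 | opusFrame
function packInner(f: InnerAudioFields): Uint8Array {
  const addr = new TextEncoder().encode(f.sourceAddr);
  if (addr.length > 255) throw new Error("sourceAddr does not fit a 1-byte length");
  const out = new Uint8Array(2 + addr.length + 1 + 2 + 4 + f.opusFrame.length);
  const view = new DataView(out.buffer);
  let o = 0;
  out[o++] = f.version;
  out[o++] = addr.length;
  out.set(addr, o);
  o += addr.length;
  out[o++] = f.vad ? 1 : 0;
  view.setUint16(o, f.seq & 0xffff); // seq wraps at 16 bits
  o += 2;
  view.setUint32(o, f.timestampMs >>> 0);
  o += 4;
  out.set(f.opusFrame, o);
  return out;
}

function unpackInner(buf: Uint8Array): InnerAudioFields {
  if (buf.length < 2) throw new Error("truncated inner payload");
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  let o = 0;
  const version = buf[o++];
  const addrLen = buf[o++];
  if (buf.length < 2 + addrLen + 7) throw new Error("truncated inner payload");
  const sourceAddr = new TextDecoder().decode(buf.subarray(o, o + addrLen));
  o += addrLen;
  const vad = buf[o++] !== 0;
  const seq = view.getUint16(o);
  o += 2;
  const timestampMs = view.getUint32(o);
  o += 4;
  // Everything after the fixed fields is the Opus frame.
  return { version, sourceAddr, vad, seq, timestampMs, opusFrame: buf.slice(o) };
}
```

Keeping sourceAddr inside the encrypted payload is what lets forwarders relay the bytes unchanged while receivers still attribute audio to the original speaker.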
Renderer Send Path
When the encoder outputs a chunk:
- sendEncodedFrame() runs in useGroupVoiceCall.ts.
- It drops immediately if:
  - no room key is installed
  - mic is muted
  - key distribution is still pending
  - the session key for the active media generation is missing
  - this peer is waiting for an authoritative replacement key after losing root
- It increments the local audio sequence.
- It encrypts the Opus frame with encodeAudioPacketV2().
- It calls dispatchEncodedPacket().
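The drop conditions and the sequence increment can be summarized in a small sketch. The state field names, the DropReason labels, and both function names are hypothetical; only the conditions themselves and the 16-bit sequence width come from this document.

```typescript
interface SendGateState {
  hasRoomKey: boolean;
  micMuted: boolean;
  keyDistributionPending: boolean;
  hasSessionKeyForActiveGeneration: boolean;
  awaitingReplacementKeyAfterRootLoss: boolean;
}

type DropReason =
  | "no-room-key"
  | "mic-muted"
  | "key-distribution-pending"
  | "missing-session-key"
  | "awaiting-replacement-key";

// Returns the reason an encoded frame should be dropped, or null if it may be sent.
function outboundDropReason(s: SendGateState): DropReason | null {
  if (!s.hasRoomKey) return "no-room-key";
  if (s.micMuted) return "mic-muted";
  if (s.keyDistributionPending) return "key-distribution-pending";
  if (!s.hasSessionKeyForActiveGeneration) return "missing-session-key";
  if (s.awaitingReplacementKeyAfterRootLoss) return "awaiting-replacement-key";
  return null;
}

// The audio sequence is 16 bits wide, so it wraps from 65535 back to 0.
function nextAudioSeq(seq: number): number {
  return (seq + 1) & 0xffff;
}
```

Checking every condition before touching the sequence counter means dropped frames do not create artificial sequence gaps at receivers.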
dispatchEncodedPacket() decides who to send to:
- If this node is the root-forwarder, it sends to each downstream cluster member it currently serves.
- Otherwise, it sends to its assigned forwarder.
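A minimal sketch of that decision, with hypothetical names (DispatchContext, resolveSendTargets) and a simplified role type that ignores cluster and standby forwarders:

```typescript
interface DispatchContext {
  role: "root-forwarder" | "member"; // simplified: other roles are omitted in this sketch
  downstream: string[];              // addresses this node currently serves
  assignedForwarder: string | null;  // upstream forwarder for non-root nodes
}

// Returns the list of addresses one encrypted packet should be sent to.
function resolveSendTargets(ctx: DispatchContext): string[] {
  if (ctx.role === "root-forwarder") return ctx.downstream.slice();
  return ctx.assignedForwarder === null ? [] : [ctx.assignedForwarder];
}
```

The same encrypted bytes go to every target, so the fan-out cost is transport only, not additional encoding or encryption.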
Today sendPacketToPeer() routes through Reticulum, and that path calls:
window.groupCall.sendAudio(roomId, address, payload)
Renderer To Electron IPC
The preload layer exposes:
window.groupCall.sendAudio(roomId, toAddress, data)
That maps to Electron IPC handler gcall:sendAudio in electron/src/setup.ts.
The IPC handler:
- normalizes the payload to a Buffer
- rejects oversized sends above 12,288 bytes
- calls GroupCallManager.sendAudio()
- returns success/error plus send-path diagnostics
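The normalize-and-reject step might look like the sketch below. It uses Uint8Array as a stand-in for Node's Buffer so the snippet stays environment-neutral. normalizeAudioPayload, the constant name, and the accepted input shapes are assumptions; only the 12,288-byte limit is taken from this document.

```typescript
const MAX_GC_AUDIO_BYTES = 12_288;

type IncomingPayload = ArrayBuffer | Uint8Array | number[];
type NormalizeResult = { ok: true; bytes: Uint8Array } | { ok: false; error: string };

// Coerce whatever crossed the IPC boundary into bytes, then enforce the size cap.
function normalizeAudioPayload(data: IncomingPayload): NormalizeResult {
  let bytes: Uint8Array;
  if (Array.isArray(data)) {
    bytes = Uint8Array.from(data);
  } else if (data instanceof Uint8Array) {
    bytes = data;
  } else {
    bytes = new Uint8Array(data);
  }
  if (bytes.length > MAX_GC_AUDIO_BYTES) {
    return { ok: false, error: `payload too large: ${bytes.length} bytes` };
  }
  return { ok: true, bytes };
}
```

Rejecting oversized payloads at the IPC boundary keeps a misbehaving renderer from pushing large buffers into the main-process queues.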
Electron Send Path
GroupCallManager.sendAudio() in electron/src/group-call.ts is the main-process entry point for group audio sends.
It does the following:
- Validates the payload with isValidGcAudioBuffer().
- If the target address is local, it short-circuits and emits gcall:audio locally.
- Resolves or creates per-peer Reticulum audio state.
- Queues the frame in that peer's pending queue.
- Schedules a fair flush across peers.
- If the audio link is already established, it tries to flush immediately.
Important behavior here:
- Pending audio is stored per destination peer, not in one shared unsorted queue.
- The manager drops stale and overloaded pending frames before they pile up indefinitely.
- Flush is fair/round-robin so one busy downstream leg cannot dominate the send path.
- Diagnostics record whether pressure came from pending overflow, stale dropping, link-unready state, or later bridge pressure.
- Packet mode now keeps audio links alive as a parallel safety net.
- If a peer's packet path never resolves, the manager can downgrade that peer to link transport instead of letting the call stay silent.
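The per-peer queueing, stale dropping, and round-robin flush can be illustrated with a small sketch. The class name, the default limits (8 frames, 200 ms), and the one-frame-per-peer-per-round policy are all assumptions for illustration; the real GroupCallManager may use different bounds and scheduling.

```typescript
interface PendingFrame {
  payload: Uint8Array;
  enqueuedAtMs: number;
}

class FairPendingQueues {
  private queues = new Map<string, PendingFrame[]>();
  private order: string[] = [];
  private cursor = 0;
  private maxPerPeer: number;
  private maxAgeMs: number;
  staleDrops = 0;
  queuePressureDrops = 0;

  constructor(maxPerPeer = 8, maxAgeMs = 200) {
    this.maxPerPeer = maxPerPeer;
    this.maxAgeMs = maxAgeMs;
  }

  enqueue(peer: string, payload: Uint8Array, nowMs: number): void {
    let q = this.queues.get(peer);
    if (!q) {
      q = [];
      this.queues.set(peer, q);
      this.order.push(peer);
    }
    q.push({ payload, enqueuedAtMs: nowMs });
    // Overload: discard the oldest frames so fresh speech wins over backlog.
    while (q.length > this.maxPerPeer) {
      q.shift();
      this.queuePressureDrops++;
    }
  }

  // One pass over all peers: at most one fresh frame per peer, and the
  // starting peer rotates each round so no single leg is always served first.
  flushRound(nowMs: number): Array<{ peer: string; payload: Uint8Array }> {
    const out: Array<{ peer: string; payload: Uint8Array }> = [];
    const n = this.order.length;
    for (let i = 0; i < n; i++) {
      const peer = this.order[(this.cursor + i) % n];
      const q = this.queues.get(peer) ?? [];
      while (q.length > 0 && nowMs - q[0].enqueuedAtMs > this.maxAgeMs) {
        q.shift();
        this.staleDrops++;
      }
      const next = q.shift();
      if (next) out.push({ peer, payload: next.payload });
    }
    if (n > 0) this.cursor = (this.cursor + 1) % n;
    return out;
  }
}
```

Counting overflow drops and stale drops separately mirrors the way the diagnostics distinguish queue pressure from backlog cleanup.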
Electron Bridge IPC Format
ReticulumBridge.enqueueGroupAudio() moves frames into a binary IPC format defined in electron/src/reticulum-audio-ipc.ts.
Electron -> Python uses extra stdio fd 3.
Python -> Electron uses extra stdio fd 4.
The binary message format starts with:
magic: "QAUD"
version: 1
bodyLen: uint32
The body contains one or more frames:
frame_count
linkIdLen | linkId
roomIdLen | roomId
peerPresenceHashLen | peerPresenceHash
peerCallHashLen | peerCallHash
payloadLen | payload
For outbound Electron -> Python audio:
- linkId identifies the Reticulum audio link
- roomId identifies the group call
- payload is the encrypted group audio packet from the renderer
- peerPresenceHash and peerCallHash are empty on the outbound side
The bridge batches multiple frames into one QAUD message to reduce overhead, but it now also applies fairness and pressure control.
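A sketch of a QAUD encoder and decoder follows, to show how the header and length-prefixed fields fit together. The field order matches the description above, but the integer widths (1-byte version, 16-bit frame_count and string lengths, 32-bit payloadLen) and big-endian byte order are assumptions; the authoritative definition is in electron/src/reticulum-audio-ipc.ts. encodeQaud and decodeQaud are hypothetical names.

```typescript
interface QaudFrame {
  linkId: string;
  roomId: string;
  peerPresenceHash: string;
  peerCallHash: string;
  payload: Uint8Array;
}

const QAUD_MAGIC = [0x51, 0x41, 0x55, 0x44]; // ASCII "QAUD"

function encodeQaud(frames: QaudFrame[]): Uint8Array {
  const enc = new TextEncoder();
  const u16 = (n: number): Uint8Array => { const b = new Uint8Array(2); new DataView(b.buffer).setUint16(0, n); return b; };
  const u32 = (n: number): Uint8Array => { const b = new Uint8Array(4); new DataView(b.buffer).setUint32(0, n); return b; };
  const parts: Uint8Array[] = [u16(frames.length)];
  for (const f of frames) {
    for (const s of [f.linkId, f.roomId, f.peerPresenceHash, f.peerCallHash]) {
      const bytes = enc.encode(s);
      parts.push(u16(bytes.length), bytes);
    }
    parts.push(u32(f.payload.length), f.payload);
  }
  const bodyLen = parts.reduce((n, p) => n + p.length, 0);
  const out = new Uint8Array(9 + bodyLen); // magic(4) + version(1) + bodyLen(4) + body
  out.set(QAUD_MAGIC, 0);
  out[4] = 1;
  new DataView(out.buffer).setUint32(5, bodyLen);
  let o = 9;
  for (const p of parts) { out.set(p, o); o += p.length; }
  return out;
}

function decodeQaud(msg: Uint8Array): QaudFrame[] {
  if (msg.length < 11 || QAUD_MAGIC.some((b, i) => msg[i] !== b) || msg[4] !== 1) {
    throw new Error("bad QAUD header");
  }
  const view = new DataView(msg.buffer, msg.byteOffset, msg.byteLength);
  if (view.getUint32(5) !== msg.length - 9) throw new Error("bodyLen mismatch");
  const dec = new TextDecoder();
  let o = 9;
  const count = view.getUint16(o);
  o += 2;
  // DataView reads throw a RangeError if a length prefix runs past the buffer.
  const readStr = (): string => {
    const n = view.getUint16(o);
    o += 2;
    const s = dec.decode(msg.subarray(o, o + n));
    o += n;
    return s;
  };
  const frames: QaudFrame[] = [];
  for (let i = 0; i < count; i++) {
    const linkId = readStr();
    const roomId = readStr();
    const peerPresenceHash = readStr();
    const peerCallHash = readStr();
    const n = view.getUint32(o);
    o += 4;
    frames.push({ linkId, roomId, peerPresenceHash, peerCallHash, payload: msg.slice(o, o + n) });
    o += n;
  }
  return frames;
}
```

The explicit bodyLen lets a stream reader on fd 3 or fd 4 know how many bytes to wait for before parsing a batch.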
Python Bridge Send Path
In presence_bridge.py:
- _audio_in_reader_loop() reads QAUD batches from fd 3.
- It parses the batch and pushes decoded items into _audio_decoded_queue.
- _rns_executor_loop() drains _audio_decoded_queue.
- _process_audio_batch() builds the Reticulum wire payload and calls RNS.Packet.send().
The actual Reticulum payload is JSON, produced by make_group_audio_wire():
{
"t": "<group-audio-wire-type>",
"R": "<roomId>",
"d": "<base64 encrypted audio packet>",
"r": "<sender call destination hash>"
}
Important detail:
- d is the base64 encoding of the already-encrypted group audio packet from the renderer.
- The Reticulum layer is transporting that encrypted payload; it is not the layer that defines the call's media encryption.
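The wire shape can be mirrored in TypeScript for illustration. The real make_group_audio_wire() is Python and lives in presence_bridge.py. The function names below are hypothetical, and the value of t is left as the same placeholder used above because the actual type string is not given here.

```typescript
const GROUP_AUDIO_WIRE_TYPE = "<group-audio-wire-type>"; // placeholder, real value not stated here

function bytesToBase64(bytes: Uint8Array): string {
  let bin = "";
  for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]);
  return btoa(bin);
}

function base64ToBytes(b64: string): Uint8Array {
  const bin = atob(b64);
  const out = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) out[i] = bin.charCodeAt(i);
  return out;
}

// Build the JSON wire: the encrypted packet is carried opaquely in "d".
function makeGroupAudioWireSketch(roomId: string, encryptedPacket: Uint8Array, senderCallHash: string): string {
  return JSON.stringify({
    t: GROUP_AUDIO_WIRE_TYPE,
    R: roomId,
    d: bytesToBase64(encryptedPacket),
    r: senderCallHash,
  });
}

// Reverse step on the receiving side: recover the raw encrypted bytes from "d".
function parseGroupAudioWireSketch(json: string): { roomId: string; packet: Uint8Array; senderCallHash: string } {
  const w = JSON.parse(json) as { t: string; R: string; d: string; r: string };
  if (w.t !== GROUP_AUDIO_WIRE_TYPE) throw new Error("not a group audio wire");
  return { roomId: w.R, packet: base64ToBytes(w.d), senderCallHash: w.r };
}
```

Base64 expands the payload by a factor of 4/3 (for example, a 120-byte packet becomes 160 characters), which is part of the per-packet cost of carrying audio in a JSON wire.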
Reticulum Link Layer
Group audio is sent over dedicated Reticulum audio links, separate from the higher-level JSON control message flow.
The main process opens and tracks these audio links by peer:
- open link
- wait for establishment
- enqueue audio frames against the link id
- reopen when a link becomes unready or closes
When packet media is enabled, the manager still keeps the link path available. Packet transport remains the preferred fast path, but a peer can fall back to the audio link when packet path warmup or send diagnostics show repeated unresolved-path timeouts.
This is why group audio send diagnostics can distinguish:
- pending-queue pressure in Electron
- bridge queue pressure
- Python-side decoded queue buildup
- actual Reticulum packet send failures
- link-not-ready conditions
Receive Path
On the receiving side, the reverse happens:
- Reticulum delivers an audio packet to the Python bridge.
- on_audio_link_packet() parses the Reticulum JSON wire.
- It base64-decodes d back to raw encrypted group-audio bytes.
- It wraps that in QAUD binary format and writes it to fd 4.
- ReticulumBridge in Electron decodes the QAUD message.
- Electron emits group-audio-packet.
- GroupCallManager maps the audio link back to a Qortal address and emits gcall:audio.
- The renderer receives that packet and runs decrypt + decode + jitter + playout.
Renderer Receive, Decrypt, And Playback
After the renderer gets gcall:audio:
- It decrypts the packet with the room media key.
- It decodes v3, v2, or legacy v1 audio packet format.
- It extracts one or more Opus frames plus source metadata.
- It tracks per-source sequence gaps and timing.
- It places Opus frames into a per-source jitter buffer.
- gcall-jitter-scheduler drains jitter buffers on a steady audio clock.
- AudioDecoder turns Opus back into PCM.
- group-playout-processor.js performs adaptive playout from a PCM ring buffer.
- Audio is mixed into the output graph.
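Because seq is only 16 bits wide, per-source gap tracking has to account for wraparound. A minimal sketch of one way to do that is shown below; seqGap and lostBetween are hypothetical names, and the renderer's actual gap logic may differ.

```typescript
// Signed distance from prev to next in 16-bit modular space.
// Positive means next is newer; negative means it arrived late or out of order.
function seqGap(prev: number, next: number): number {
  const diff = (next - prev) & 0xffff;
  return diff >= 0x8000 ? diff - 0x10000 : diff;
}

// Number of frames missing between two consecutively received sequence numbers.
function lostBetween(prev: number, next: number): number {
  const gap = seqGap(prev, next);
  return gap > 1 ? gap - 1 : 0;
}
```

At 20 ms per frame the sequence wraps roughly every 22 minutes (65536 frames), so a long call will cross the wrap point and a naive subtraction would report a huge false gap.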
The adaptive playout path can raise target latency when a source is starving, but it keeps that separate from transport diagnostics so sender overload and receiver compensation are both visible.
How Forwarding Works
Forwarders route encrypted group audio packets based on topology.
- Non-root participants normally send one encrypted packet upstream to their assigned forwarder.
- The root forwarder fans that same encrypted payload out to its downstream recipients.
- Because the payload already contains the original sourceAddr, the receiver still knows who actually spoke.
This means forwarding is mostly about transport fanout, not about re-encoding audio.
Diagnostics We Export
The send path exposes diagnostics back to the renderer on every sendAudio() result.
Useful fields include:
- pendingFrames
- bridgeQueuedFrames
- decodedQueueDepth
- binaryOutQueueDepth
- queuePressureDrops
- queuePressureDropsLast5s
- staleDrops
- staleDropsLast5s
- linkUnreadyDrops
- packetSendFailures
These are then folded into group-call metrics and diagnostic exports so we can tell whether a bad call was caused mainly by:
- sender-side overload
- stale backlog cleanup
- transport/link failure
- or receiver starvation
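As a rough illustration of how such counters could be folded into a single label, the sketch below picks whichever send-side category has the largest count. The grouping of fields into categories and the "largest counter wins" rule are assumptions, not the project's actual metric logic. Receiver starvation is not covered because it is observed on the playout side rather than in send diagnostics.

```typescript
interface SendDiagnosticsSketch {
  queuePressureDropsLast5s: number;
  staleDropsLast5s: number;
  linkUnreadyDrops: number;
  packetSendFailures: number;
}

type LikelyCause =
  | "sender-side overload"
  | "stale backlog cleanup"
  | "transport/link failure"
  | "none observed";

// Naive heuristic: the category with the most drops is reported as the likely cause.
// Note that it compares 5-second windows with lifetime counters, which a real
// implementation would probably normalize first.
function likelySendSideCause(d: SendDiagnosticsSketch): LikelyCause {
  const candidates: Array<[LikelyCause, number]> = [
    ["sender-side overload", d.queuePressureDropsLast5s],
    ["stale backlog cleanup", d.staleDropsLast5s],
    ["transport/link failure", d.linkUnreadyDrops + d.packetSendFailures],
  ];
  let best: LikelyCause = "none observed";
  let bestCount = 0;
  for (const [cause, count] of candidates) {
    if (count > bestCount) {
      best = cause;
      bestCount = count;
    }
  }
  return best;
}
```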
Important Current Design Choices
- Audio is encoded once as Opus in the renderer.
- The room key encrypts the actual media payload before transport.
- Reticulum transports that encrypted payload over dedicated audio links.
- Electron and Python maintain separate bounded queues to avoid unlimited catch-up.
- Current work focuses on preferring fresh speech over delayed backlog during overload.
Relevant Files
- src/hooks/useGroupVoiceCall.ts
- src/lib/group-call/audioPacketCodec.ts
- electron/src/preload.ts
- electron/src/setup.ts
- electron/src/group-call.ts
- electron/src/reticulum-bridge.ts
- electron/src/reticulum-audio-ipc.ts
- electron/resources/presence_bridge.py
- public/worklets/capture-processor.js
- public/worklets/group-playout-processor.js