feat(call): in-call soundboard, quality controls, room call-permissions

Element Call is now consumed as our self-built fork
(@lotusguild/element-call-embedded); wire up its previously-dormant
capabilities and document the fork as live.

Soundboard (P5-15): a call-bar button plays user-uploaded audio clips into the
call as a real published track (io.lotus.inject_audio) plus local playback.
Clips are uploadable like emoji/sticker packs, stored in io.lotus.soundboard
account data (synced across devices). Gated by a Settings toggle + volume.

Quality controls (P5-31): per-user mic/screenshare bitrate + screenshare
framerate (Settings -> Calls), applied via io.lotus.set_quality clamped to any
room cap. Room admins set caps and hard call-permissions (allow_screenshare /
allow_camera) in Room Settings -> Voice; the call bar hides blocked buttons.

- New: CallSoundboard, useSoundboard, soundboardClips; RoomQuality,
  useCallQuality, callQuality (+ unit tests).
- Optimistic-write RoomQuality admin UI (no stale-state clobber).
- Docs: mark EC fork live across README/FEATURES/TODO/BUGS/TESTING; add D2
  manual-test steps.

Numeric quality caps are client-cooperative; screenshare/camera permissions are
hard-enforced server-side (see LotusGuild/matrix voice-limit-guard).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 22:34:17 -04:00
parent 02b2ce8109
commit 7c06b27c73
22 changed files with 1259 additions and 120 deletions
+111 -15
@@ -322,14 +322,104 @@ Users can set a custom background color for `@mention` chips that highlight thei
## Voice / Video Call Improvements
> 🔱 **[EC-FORK]** Element Call is embedded as a **pre-built npm bundle** today.
> The plan to fork & self-build it from source for true ownership — and which of
> the items below would move into our EC source — is in
> 🔱 **[EC-FORK] LIVE (2026-06).** Element Call is now our **self-built fork**
> (`@lotusguild/element-call-embedded@0.20.1-lotus.1`, source at
> `LotusGuild/element-call`), served same-origin — no longer the upstream
> pre-built npm bundle. Several in-call behaviors below are now first-class
> source changes rather than DOM/widget hacks. Background, plan, and the Phase-2
> work list are in
> [`HANDOFF_ELEMENT_CALL_FORK.md`](./HANDOFF_ELEMENT_CALL_FORK.md).
### Element Call Upgrade
### Element Call — Self-Built Fork (`0.20.1-lotus.1`)
Upgraded embedded Element Call widget from **0.16.3** to **0.19.4**.
The embedded widget was upgraded **0.16.3 → 0.19.4 → 0.20.1**, then **forked**.
We self-build `LotusGuild/element-call` and publish it to our private Gitea npm
registry as `@lotusguild/element-call-embedded`; cinny consumes that instead of
`@element-hq/element-call-embedded`. The iframe prints
`Element Call embedded-v0.20.1-lotus.1` in its console (vs. `embedded-v0.20.1`
upstream) — the quickest way to confirm a deploy landed the fork.
All custom behavior lives in the fork's `src/lotus/` modules and is **additive
and dormant by default**, gated by URL flags / widget actions the host opts into,
so a stock EC config is byte-for-byte upstream behavior.
**Active (cinny drives them today):**
| # | Feature | Mechanism | Replaces (old hack) |
| --- | --------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- |
| A7 | **Denoise in-source** | ML noise suppression runs inside EC as a LiveKit `TrackProcessor<Audio>` (flag `lotusDenoiseSource=1`); re-applied on every (re)publish | the build-time `getUserMedia` monkeypatch injected into `index.html` (**removed**). Fixes mic-dead-after-reconnect. |
| #2 | **Speaking / mute events** | EC emits `io.lotus.call_state` (throttled); cinny reads speaker + mute state from it (flag `lotusCallState=1`) | scraping EC's DOM for `[data-lk-speaking]` (kept only as fallback) |
| A5 | **Focus participant** | host sends `io.lotus.focus_participant` to pin a tile, coexisting with / overriding the screenshare spotlight | the `.click()`-the-tile DOM hack in `CallControl.ts` (**removed**) |
| #6 | **In-call avatar decorations** | host pushes `io.lotus.decorations` (per-user APNG URLs); the fork renders them on EC's video-tile avatars | previously impossible — decorations only showed on our pre-join lobby roster |
| #5 | **Native transparent background** | flag `lotusTransparent=1` makes EC's surface transparent so the host wallpaper shows through | the injected `background:none !important` CSS |
**Now wired (cinny drives them — ⚠️ awaiting live verification):**
| # | Capability | Widget action | cinny surface |
| ----- | -------------------- | ------------------------------------------------------------------------------------ | ------------------------------------------------------------------- |
| P5-15 | **Audio inject** | `io.lotus.inject_audio` — plays a clip into the call as a separately published track | In-Call Soundboard (uploadable clips) — see below |
| P5-31 | **Quality controls** | `io.lotus.set_quality` — sets audio/screenshare encoding bitrate/framerate | Call Quality Controls (user settings + room-admin caps) — see below |
> Both were dormant capabilities; cinny now drives them (armed via
> `lotusAudioInject=1`). The **only** EC item still open is the P5-31
> **server-side** quality guard (a `voice-limit-guard`-style sidecar reading
> `io.lotus.room_quality`) for hard enforcement across all Matrix clients — the
> client cap is best-effort.
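
A minimal sketch of how a host might opt into the fork's flags when building the widget URL. The flag names come from the tables above; the base URL, the `buildCallUrl` helper, and the choice to carry flags as plain query parameters are assumptions for illustration, not the fork's confirmed wiring.

```typescript
// Flags named in this document; each stays dormant unless the host sets it.
type LotusFlag =
  | "lotusDenoiseSource"
  | "lotusCallState"
  | "lotusTransparent"
  | "lotusAudioInject";

// Hypothetical helper: appends `<flag>=1` for every requested flag.
// With an empty flag list the URL is returned unchanged (stock EC behavior).
function buildCallUrl(base: string, flags: LotusFlag[]): string {
  const url = new URL(base);
  for (const flag of flags) {
    url.searchParams.set(flag, "1");
  }
  return url.toString();
}

// Example: enable call-state events and audio injection only.
const callUrl = buildCallUrl("https://call.example/embed", [
  "lotusCallState",
  "lotusAudioInject",
]);
```

Keeping the flag list explicit at the call site makes it easy to see which fork behaviors a given deploy has opted into.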
### In-Call Soundboard (P5-15)
A soundboard button (🔔) in the call controls bar opens a popout of the user's
clips. Clicking one **injects it into the call as a real published LiveKit
track** (every participant hears it, via the fork's `io.lotus.inject_audio`) and
also plays it locally for the user who clicked it, because LiveKit doesn't loop
your own track back.
- **User-uploadable, like custom emoji/sticker packs.** Clips are stored in the
`io.lotus.soundboard` account data event, so they **sync across all your
devices**. Upload short audio (≤ 1 MB, ≤ 40 clips) from the popout; delete
inline.
- Authenticated media can't be fetched from the widget's realm, so the host
resolves each mxc clip → an authenticated download → a same-session `blob:`
object URL and hands that to the widget.
- Gated by the **Soundboard** toggle (Settings → General → Calls) with a volume
slider. The button is hidden when disabled.
- Files: `utils/soundboardClips.ts`, `hooks/useSoundboard.ts`,
`features/call/CallSoundboard.tsx`, `plugins/call/CallControl.ts#injectAudio`.
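
The mxc → authenticated download → `blob:` URL chain described above could be sketched as follows. This is an illustration under stated assumptions, not the code in `useSoundboard.ts`: `mxcToDownloadUrl`, `resolveClipToBlobUrl`, and the `FetchLike` type are hypothetical names, and the sketch assumes the homeserver exposes the authenticated media endpoint `/_matrix/client/v1/media/download/{serverName}/{mediaId}`.

```typescript
// Minimal fetch shape (assumption) so the sketch can be exercised with a fake.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; blob(): Promise<Blob> }>;

// Convert mxc://<server>/<mediaId> into the authenticated download URL.
function mxcToDownloadUrl(homeserver: string, mxc: string): string {
  const match = /^mxc:\/\/([^/]+)\/([^/?#]+)$/.exec(mxc);
  if (!match) throw new Error(`Not a valid mxc URI: ${mxc}`);
  const [, server, mediaId] = match;
  return (
    `${homeserver}/_matrix/client/v1/media/download/` +
    `${encodeURIComponent(server)}/${encodeURIComponent(mediaId)}`
  );
}

// Download the clip with the user's access token, then wrap the bytes in a
// blob: object URL that can be handed to the widget for this session.
async function resolveClipToBlobUrl(
  fetchFn: FetchLike,
  homeserver: string,
  accessToken: string,
  mxc: string,
): Promise<string> {
  const res = await fetchFn(mxcToDownloadUrl(homeserver, mxc), {
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  if (!res.ok) throw new Error(`Clip download failed with HTTP ${res.status}`);
  return URL.createObjectURL(await res.blob());
}
```

In a browser host the global `fetch` can be passed as `fetchFn`. The object URL should be released with `URL.revokeObjectURL` once the clip is no longer needed.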
### Call Quality Controls (P5-31)
Discord-style encoding controls applied to the local tracks via the fork's
`io.lotus.set_quality` (`RTCRtpSender.setParameters` across all simulcast
encodings, re-applied on every re-publish/reconnect).
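
As a rough sketch of the `RTCRtpSender.setParameters` step, the function below writes the same limits into every encoding of a sender. The structural `SenderLike` types and the `applyEncodingLimits` name are assumptions made so the example is self-contained; the fork's real handler may distribute bitrate across simulcast layers differently.

```typescript
// Structural stand-ins (assumption) for the parts of RTCRtpSender used here.
interface EncodingLike {
  maxBitrate?: number; // bits per second
  maxFramerate?: number; // frames per second
}
interface ParametersLike {
  encodings: EncodingLike[];
}
interface SenderLike {
  getParameters(): ParametersLike;
  setParameters(params: ParametersLike): Promise<void>;
}

interface EncodingLimits {
  maxBitrate?: number;
  maxFramerate?: number;
}

// Read the current parameters, set the limits on every (simulcast) encoding,
// and write them back. Undefined limits leave the existing value untouched.
async function applyEncodingLimits(
  sender: SenderLike,
  limits: EncodingLimits,
): Promise<void> {
  const params = sender.getParameters();
  for (const encoding of params.encodings) {
    if (limits.maxBitrate !== undefined) encoding.maxBitrate = limits.maxBitrate;
    if (limits.maxFramerate !== undefined) encoding.maxFramerate = limits.maxFramerate;
  }
  await sender.setParameters(params);
}
```

Because a re-publish or reconnect creates fresh senders, a caller would need to invoke this again for each new sender, which matches the "re-applied on every re-publish/reconnect" behavior described above.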
- **User settings** (Settings → General → Calls): Microphone Bitrate,
Screenshare Bitrate, Screenshare Framerate (each defaults to **Auto**).
- **Room-admin caps**: admins set a ceiling in Room Settings → General → Voice
(`io.lotus.room_quality` state event); every Lotus client clamps its per-user
quality to `min(user setting, room cap)`.
- Applied by the `useCallQuality` hook on join and whenever settings/caps
change; `utils/callQuality.ts` builds the payload (unit-tested).
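
The `min(user setting, room cap)` rule can be illustrated with a small clamp. This is a sketch, not the contents of `utils/callQuality.ts`: the field names (`micBitrate`, `screenBitrate`, `screenFramerate`) are hypothetical, and it assumes **Auto** is represented as `undefined` and that an Auto user setting falls back to the room cap when one exists.

```typescript
// undefined means "Auto" for a user setting, or "no cap" for a room cap.
type Limit = number | undefined;

// Hypothetical shape shared by user settings, room caps, and the payload.
interface QualityLimits {
  micBitrate?: number; // bits per second
  screenBitrate?: number; // bits per second
  screenFramerate?: number; // frames per second
}

// Clamp one value: with both set, take the smaller; otherwise use whichever
// side is defined (assumption: Auto under a cap resolves to the cap).
function clampLimit(user: Limit, cap: Limit): Limit {
  if (user === undefined) return cap;
  if (cap === undefined) return user;
  return Math.min(user, cap);
}

// Build the effective limits a client would send with io.lotus.set_quality.
function effectiveQuality(user: QualityLimits, cap: QualityLimits): QualityLimits {
  return {
    micBitrate: clampLimit(user.micBitrate, cap.micBitrate),
    screenBitrate: clampLimit(user.screenBitrate, cap.screenBitrate),
    screenFramerate: clampLimit(user.screenFramerate, cap.screenFramerate),
  };
}
```

For example, a user asking for a 4 Mbps screenshare in a room capped at 2 Mbps would publish at 2 Mbps, while a user on Auto in an uncapped room sends no limit at all.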
**Server-enforced call permissions (hard, ALL clients).** The same
`io.lotus.room_quality` event carries a **publish-source policy**
(`allow_screenshare`, `allow_camera`) enforced server-side by
`voice-limit-guard` (matrix repo, LXC 151): it re-signs the LiveKit JWT's
`canPublishSources`, so the SFU refuses screenshare/camera tracks for **every**
Matrix client (Element, FluffyChat, our fork) — not just Lotus. Admins toggle
these in Room Settings → Voice → **Call Permissions**; cinny also hides the
blocked buttons in the call bar. Enforcement is **live**: the JWT re-sign covers
new joins, and a background reconcile loop revokes an **in-progress**
screenshare/camera (via LiveKit `UpdateParticipant`) within ~3 s of an admin
flipping the policy — so it kills active shares mid-call, not just future ones.
- **Why numeric caps aren't server-enforced:** LiveKit is a pure SFU (forwards,
never transcodes) and has no publisher bitrate/fps field anywhere in the JWT
grant, room config, server `limit:`, or admin API; stock Element Call ignores
room metadata for publish quality. Numeric caps are therefore inherently
**cooperative** — our fork honors them, which is the design above. The
publish-source policy is the one genuine hard, cross-client lever, and it's
implemented.
- **Not yet**: screenshare resolution control (needs a `getDisplayMedia` hook in
the fork).
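
To make the publish-source policy concrete, here is a sketch of how `allow_screenshare` / `allow_camera` might map onto the `canPublishSources` list in a LiveKit video grant. The source names (`microphone`, `camera`, `screen_share`, `screen_share_audio`) are LiveKit's grant values; the `publishSourcesFor` function, the `CallPermissions` type, and the default of treating a missing field as allowed are assumptions, and the real `voice-limit-guard` is not necessarily written in TypeScript.

```typescript
// Policy fields carried by io.lotus.room_quality (per this document).
interface CallPermissions {
  allow_screenshare?: boolean;
  allow_camera?: boolean;
}

type PublishSource = "microphone" | "camera" | "screen_share" | "screen_share_audio";

// Hypothetical mapping: the microphone is always allowed; camera and
// screenshare are dropped only when the policy explicitly sets them to false.
function publishSourcesFor(policy: CallPermissions): PublishSource[] {
  const sources: PublishSource[] = ["microphone"];
  if (policy.allow_camera !== false) {
    sources.push("camera");
  }
  if (policy.allow_screenshare !== false) {
    sources.push("screen_share", "screen_share_audio");
  }
  return sources;
}
```

A guard would place the returned list in the token's `canPublishSources` before re-signing, so the SFU itself rejects any track whose source is not listed.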
### Camera Default Off
@@ -431,20 +521,26 @@ A comprehensive mic noise-suppression system in **Settings → General → Calls
- **Support Detection:** UI now detects `AudioWorklet` / `AudioContext` support and disables ML options in unsupported environments.
- **Status Reporting:** The ML shim notifies the host app via `postMessage`. If initialization fails, a system toast alerts the user of the fallback to the raw microphone.
**Open-Source Model Roadmap:**
| Model | Transients (Clicks) | Voice Quality | CPU Usage (WASM) |
| :--- | :--- | :--- | :--- |
| **RNNoise** | Poor | Moderate | < 5% |
| **DTLN** | Good | High | 10-20% |
| **DeepFilterNet 3** | **Excellent** | **Very High** | 25-50%+ |
**Open-Source Models (all now in-source in the EC fork):**
| Model | Transients (Clicks) | Voice Quality | CPU Usage (WASM) | Sample rate |
| :--- | :--- | :--- | :--- | :--- |
| **RNNoise** (default) | Poor | Moderate | < 5% | 48 kHz |
| **Speex** | Poor | Low | < 5% | 48 kHz |
| **DTLN** | Good | High | 10-20% | 16 kHz |
| **DeepFilterNet 3** | **Excellent** | **Very High** | 25-50%+ | 48 kHz |
> **Note:** DeepFilterNet 3 is planned for future inclusion in the desktop build where larger binaries and higher CPU overhead are more acceptable.
> **Update (2026-06):** with the EC fork live, denoise runs **inside** Element
> Call as a LiveKit `TrackProcessor` and **all four models ship in-source**
> (DTLN at 16 kHz, the rest at 48 kHz; the processor degrades to the raw mic
> rather than ever going silent). The model picker selects between them. Real-call
> **audio-quality** comparison across models is still the open verification item
> (RNNoise output is known to be weak) — see `LOTUS_TESTING.md` §D2-1.
### Files
- `build/lotus-denoise.js` — multi-model getUserMedia shim
- `vite.config.js`: `lotusDenoise()` plugin (copies assets for RNNoise, Speex, and NoiseGate)
- `src/app/plugins/call/CallEmbed.ts` — advanced tier → widget URL params
- **EC fork** `src/lotus/lotusDenoise.ts` + `lotusDenoiseProcessor.ts` — in-source LiveKit `TrackProcessor` (RNNoise/Speex 48 kHz, DTLN 16 kHz, DeepFilterNet 48 kHz); activated by `lotusDenoiseSource=1`. (The old build-time `getUserMedia` shim `build/lotus-denoise.js` is **removed**.)
- `vite.config.js`: `lotusDenoise()` plugin (now only **copies model assets** for the fork to load; no longer injects a shim)
- `src/app/plugins/call/CallEmbed.ts` — advanced tier → `lotusDenoiseSource` widget URL param
- `src/app/utils/lotusDenoiseUtils.ts` — support detection and model comparison metadata
- `src/app/features/settings/general/General.tsx` — advanced settings UI + mic meter