feat(call): in-call soundboard, quality controls, room call-permissions

Element Call is now consumed as our self-built fork (@lotusguild/element-call-embedded); wire up its previously-dormant capabilities and document the fork as live.

Soundboard (P5-15): a call-bar button plays user-uploaded audio clips into the call as a real published track (io.lotus.inject_audio) plus local playback. Clips are uploadable like emoji/sticker packs, stored in io.lotus.soundboard account data (synced across devices). Gated by a Settings toggle + volume.

Quality controls (P5-31): per-user mic/screenshare bitrate + screenshare framerate (Settings -> Calls), applied via io.lotus.set_quality clamped to any room cap. Room admins set caps and hard call-permissions (allow_screenshare / allow_camera) in Room Settings -> Voice; the call bar hides blocked buttons.

- New: CallSoundboard, useSoundboard, soundboardClips; RoomQuality, useCallQuality, callQuality (+ unit tests).
- Optimistic-write RoomQuality admin UI (no stale-state clobber).
- Docs: mark EC fork live across README/FEATURES/TODO/BUGS/TESTING; add D2 manual-test steps.

Numeric quality caps are client-cooperative; screenshare/camera permissions are hard-enforced server-side (see LotusGuild/matrix voice-limit-guard).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 22:34:17 -04:00
parent 02b2ce8109
commit 7c06b27c73
22 changed files with 1259 additions and 120 deletions
@@ -322,14 +322,104 @@ Users can set a custom background color for `@mention` chips that highlight thei

 ## Voice / Video Call Improvements

-> 🔱 **[EC-FORK]** Element Call is embedded as a **pre-built npm bundle** today.
-> The plan to fork & self-build it from source for true ownership — and which of
-> the items below would move into our EC source — is in
+> 🔱 **[EC-FORK] LIVE (2026-06).** Element Call is now our **self-built fork**
+> (`@lotusguild/element-call-embedded@0.20.1-lotus.1`, source at
+> `LotusGuild/element-call`), served same-origin — no longer the upstream
+> pre-built npm bundle. Several in-call behaviors below are now first-class
+> source changes rather than DOM/widget hacks. Background, plan, and the Phase-2
+> work list are in
 > [`HANDOFF_ELEMENT_CALL_FORK.md`](./HANDOFF_ELEMENT_CALL_FORK.md).

-### Element Call Upgrade
+### Element Call — Self-Built Fork (`0.20.1-lotus.1`)

-Upgraded embedded Element Call widget from **0.16.3** to **0.19.4**.
+The embedded widget was upgraded **0.16.3 → 0.19.4 → 0.20.1**, then **forked**.
+We self-build `LotusGuild/element-call` and publish it to our private Gitea npm
+registry as `@lotusguild/element-call-embedded`; cinny consumes that instead of
+`@element-hq/element-call-embedded`. The iframe prints
+`Element Call embedded-v0.20.1-lotus.1` in its console (vs. `embedded-v0.20.1`
+upstream) — the quickest way to confirm a deploy landed the fork.
+
+All custom behavior lives in the fork's `src/lotus/` modules and is **additive
+and dormant by default**: it is gated by URL flags and widget actions that the
+host opts into, so a stock EC config reproduces upstream behavior byte-for-byte.
+
+**Active (cinny drives them today):**
+
+| #   | Feature                           | Mechanism                                                                                                                               | Replaces (old hack)                                                                                                 |
+| --- | --------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- |
+| A7  | **Denoise in-source**             | ML noise suppression runs inside EC as a LiveKit `TrackProcessor<Audio>` (flag `lotusDenoiseSource=1`); re-applied on every (re)publish | the build-time `getUserMedia` monkeypatch injected into `index.html` — **removed**. Fixes mic-dead-after-reconnect. |
+| #2  | **Speaking / mute events**        | EC emits `io.lotus.call_state` (throttled); cinny reads speaker + mute state from it (flag `lotusCallState=1`)                          | scraping EC's DOM for `[data-lk-speaking]` (kept only as fallback)                                                  |
+| A5  | **Focus participant**             | host sends `io.lotus.focus_participant` to pin a tile, coexisting with / overriding the screenshare spotlight                           | the `.click()`-the-tile DOM hack in `CallControl.ts` — **removed**                                                  |
+| #6  | **In-call avatar decorations**    | host pushes `io.lotus.decorations` (per-user APNG URLs); the fork renders them on EC's video-tile avatars                               | previously impossible — decorations only showed on our pre-join lobby roster                                        |
+| #5  | **Native transparent background** | flag `lotusTransparent=1` makes EC's surface transparent so the host wallpaper shows through                                            | the injected `background:none !important` CSS                                                                       |
+
+**Now wired (cinny drives them — ⚠️ awaiting live verification):**
+
+| #     | Capability           | Widget action                                                                        | cinny surface                                                       |
+| ----- | -------------------- | ------------------------------------------------------------------------------------ | ------------------------------------------------------------------- |
+| P5-15 | **Audio inject**     | `io.lotus.inject_audio` — plays a clip into the call as a separately published track | In-Call Soundboard (uploadable clips) — see below                   |
+| P5-31 | **Quality controls** | `io.lotus.set_quality` — sets audio/screenshare encoding bitrate/framerate           | Call Quality Controls (user settings + room-admin caps) — see below |
+
+> Both were dormant capabilities; cinny now drives them (armed via
+> `lotusAudioInject=1`). The **only** EC item still open is the P5-31
+> **server-side** quality guard (a `voice-limit-guard`-style sidecar reading
+> `io.lotus.room_quality`) for hard enforcement across all Matrix clients — the
+> client cap is best-effort.
+
+### In-Call Soundboard (P5-15)
+
+A soundboard button (🔔) in the call controls bar opens a popout of the user's
+clips. Clicking one **injects it into the call as a real published LiveKit
+track** (every participant hears it, via the fork's `io.lotus.inject_audio`) and
+also plays it locally for the sender, since LiveKit doesn't loop your own track
+back to you.
+
+- **User-uploadable, like custom emoji/sticker packs.** Clips are stored in the
+  `io.lotus.soundboard` account data event, so they **sync across all your
+  devices**. Upload short audio (≤ 1 MB, ≤ 40 clips) from the popout; delete
+  inline.
+- The widget cannot fetch authenticated media from its own realm, so the host
+  resolves each `mxc://` clip itself: it performs an authenticated download,
+  wraps the result in a same-session `blob:` object URL, and hands that URL to
+  the widget.
+- Gated by the **Soundboard** toggle (Settings → General → Calls) with a volume
+  slider. The button is hidden when disabled.
+- Files: `utils/soundboardClips.ts`, `hooks/useSoundboard.ts`,
+  `features/call/CallSoundboard.tsx`, `plugins/call/CallControl.ts#injectAudio`.
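The mxc-to-`blob:` resolution described above can be sketched roughly as follows. This is a minimal illustration, not the code in `utils/soundboardClips.ts`: the helper names `mxcToDownloadUrl` and `resolveClipToBlobUrl` are hypothetical, and the only fixed point is the Matrix authenticated-media download path (`/_matrix/client/v1/media/download/{serverName}/{mediaId}`).

```typescript
// Hypothetical helper: turn an mxc:// URI into the authenticated download URL.
function mxcToDownloadUrl(homeserver: string, mxc: string): string {
  const match = /^mxc:\/\/([^/]+)\/([A-Za-z0-9_-]+)$/.exec(mxc);
  if (!match) throw new Error(`Not a valid mxc URI: ${mxc}`);
  const [, serverName, mediaId] = match;
  const base = homeserver.replace(/\/+$/, ''); // drop trailing slashes
  return `${base}/_matrix/client/v1/media/download/${serverName}/${mediaId}`;
}

// Hypothetical helper: download the clip with the user's access token and
// wrap it in a blob: URL that lives only for the current browser session.
async function resolveClipToBlobUrl(
  homeserver: string,
  accessToken: string,
  mxc: string,
  fetchFn: typeof fetch = fetch,
): Promise<string> {
  const response = await fetchFn(mxcToDownloadUrl(homeserver, mxc), {
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  if (!response.ok) {
    throw new Error(`Clip download failed with HTTP ${response.status}`);
  }
  const blob = await response.blob();
  return URL.createObjectURL(blob);
}
```

A caller would presumably release each URL with `URL.revokeObjectURL` once the clip is deleted or the call ends, so the blobs do not accumulate for the lifetime of the page.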
+
+### Call Quality Controls (P5-31)
+
+Discord-style encoding controls applied to the local tracks via the fork's
+`io.lotus.set_quality` (`RTCRtpSender.setParameters` across all simulcast
+encodings, re-applied on every re-publish/reconnect).
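As a rough illustration of what applying a cap "across all simulcast encodings" involves, the sketch below writes the same limits onto every encoding of a sender. It is an assumption-laden outline rather than the fork's handler: `applyEncodingCap` and `EncodingCap` are hypothetical names, and the `SenderLike` interface is a structural stand-in exposing the same `getParameters` / `setParameters` pair that a browser `RTCRtpSender` has.

```typescript
// Structural stand-ins for the WebRTC types so the sketch runs outside a browser.
interface EncodingLike {
  maxBitrate?: number; // bits per second
  maxFramerate?: number; // frames per second
}
interface ParametersLike {
  encodings: EncodingLike[];
}
interface SenderLike {
  getParameters(): ParametersLike;
  setParameters(params: ParametersLike): Promise<void>;
}

// Hypothetical cap shape: an undefined field leaves that limit untouched.
interface EncodingCap {
  maxBitrate?: number;
  maxFramerate?: number;
}

async function applyEncodingCap(sender: SenderLike, cap: EncodingCap): Promise<void> {
  // Read the current parameters, edit every simulcast layer, write them back.
  const params = sender.getParameters();
  for (const encoding of params.encodings) {
    if (cap.maxBitrate !== undefined) encoding.maxBitrate = cap.maxBitrate;
    if (cap.maxFramerate !== undefined) encoding.maxFramerate = cap.maxFramerate;
  }
  await sender.setParameters(params);
}
```

Because a re-publish or reconnect creates a fresh sender, a function like this would need to run again each time, which matches the "re-applied" note above.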
+
+- **User settings** (Settings → General → Calls): Microphone Bitrate,
+  Screenshare Bitrate, Screenshare Framerate (each defaults to **Auto**).
+- **Room-admin caps**: admins set a ceiling in Room Settings → General → Voice
+  (`io.lotus.room_quality` state event); every Lotus client clamps its per-user
+  quality to `min(user setting, room cap)`.
+- Applied by the `useCallQuality` hook on join and whenever settings/caps
+  change; `utils/callQuality.ts` builds the payload (unit-tested).
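The `min(user setting, room cap)` rule can be sketched as below. The field names are illustrative and are not the actual `io.lotus.room_quality` or `io.lotus.set_quality` schema; the sketch also assumes that **Auto** is represented as `undefined` and that an Auto user setting falls back to the room cap when one exists.

```typescript
// Illustrative shape only; undefined means "Auto" (user) or "no cap" (room).
interface QualitySettings {
  micBitrate?: number; // bits per second
  screenshareBitrate?: number; // bits per second
  screenshareFramerate?: number; // frames per second
}

function clampValue(user: number | undefined, cap: number | undefined): number | undefined {
  if (cap === undefined) return user; // no room cap: keep the user setting
  if (user === undefined) return cap; // Auto: assumed to inherit the cap
  return Math.min(user, cap);
}

function clampQuality(user: QualitySettings, roomCap: QualitySettings): QualitySettings {
  return {
    micBitrate: clampValue(user.micBitrate, roomCap.micBitrate),
    screenshareBitrate: clampValue(user.screenshareBitrate, roomCap.screenshareBitrate),
    screenshareFramerate: clampValue(user.screenshareFramerate, roomCap.screenshareFramerate),
  };
}

// Example: the user asks for 128 kbps mic audio, the room allows at most 64 kbps.
const effective = clampQuality(
  { micBitrate: 128_000, screenshareFramerate: 30 },
  { micBitrate: 64_000, screenshareBitrate: 2_000_000 },
);
console.log(effective);
```

Keeping the clamp as a pure function makes it easy to unit-test separately from the hook that sends the result to the widget.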
+
+**Server-enforced call permissions (hard, ALL clients).** The same
+`io.lotus.room_quality` event carries a **publish-source policy**
+(`allow_screenshare`, `allow_camera`) enforced server-side by
+`voice-limit-guard` (matrix repo, LXC 151): it re-signs the LiveKit JWT's
+`canPublishSources`, so the SFU refuses screenshare/camera tracks for **every**
+Matrix client (Element, FluffyChat, our fork) — not just Lotus. Admins toggle
+these in Room Settings → Voice → **Call Permissions**; cinny also hides the
+blocked buttons in the call bar. Enforcement is **live**: the JWT re-sign covers
+new joins, and a background reconcile loop revokes an **in-progress**
+screenshare/camera (via LiveKit `UpdateParticipant`) within ~3 s of an admin
+flipping the policy — so it kills active shares mid-call, not just future ones.
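To make the publish-source policy concrete, the sketch below maps the two booleans onto a LiveKit `canPublishSources` list. It is not the `voice-limit-guard` implementation: the function name is hypothetical, and it assumes that a missing flag means "allowed" and that the microphone is always permitted. The source names (`camera`, `microphone`, `screen_share`, `screen_share_audio`) are the values LiveKit accepts in that grant field.

```typescript
type TrackSourceName = 'camera' | 'microphone' | 'screen_share' | 'screen_share_audio';

// The two policy flags named above; other fields of the event are omitted.
interface PublishSourcePolicy {
  allow_screenshare?: boolean;
  allow_camera?: boolean;
}

// Hypothetical helper: compute the sources a participant may publish.
function allowedPublishSources(policy: PublishSourcePolicy): TrackSourceName[] {
  const sources: TrackSourceName[] = ['microphone']; // assumed always allowed
  if (policy.allow_camera !== false) {
    sources.push('camera');
  }
  if (policy.allow_screenshare !== false) {
    // Screenshare video and its accompanying audio are granted together.
    sources.push('screen_share', 'screen_share_audio');
  }
  return sources;
}

console.log(allowedPublishSources({ allow_screenshare: false }));
```

A guard would place a list like this in the token's video grant before re-signing, and the SFU then rejects any track whose source is not in the list.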
+
+- **Why numeric caps aren't server-enforced:** LiveKit is a pure SFU (forwards,
+  never transcodes) and has no publisher bitrate/fps field anywhere in the JWT
+  grant, room config, server `limit:`, or admin API; stock Element Call ignores
+  room metadata for publish quality. Numeric caps are therefore inherently
+  **cooperative** — our fork honors them, which is the design above. The
+  publish-source policy is the one genuine hard, cross-client lever, and it's
+  implemented.
+- **Not yet**: screenshare resolution control (needs a `getDisplayMedia` hook in
+  the fork).

 ### Camera Default Off

@@ -431,20 +521,26 @@ A comprehensive mic noise-suppression system in **Settings → General → Calls
 - **Support Detection:** UI now detects `AudioWorklet` / `AudioContext` support and disables ML options in unsupported environments.
 - **Status Reporting:** The ML shim notifies the host app via `postMessage`. If initialization fails, a system toast alerts the user of the fallback to the raw microphone.

-**Open-Source Model Roadmap:**
-| Model | Transients (Clicks) | Voice Quality | CPU Usage (WASM) |
-| :--- | :--- | :--- | :--- |
-| **RNNoise** | Poor | Moderate | < 5% |
-| **DTLN** | Good | High | 10-20% |
-| **DeepFilterNet 3** | **Excellent** | **Very High** | 25-50%+ |
+**Open-Source Models (all now in-source in the EC fork):**
+| Model | Transients (Clicks) | Voice Quality | CPU Usage (WASM) | Sample rate |
+| :--- | :--- | :--- | :--- | :--- |
+| **RNNoise** (default) | Poor | Moderate | < 5% | 48 kHz |
+| **Speex** | Poor | Low | < 5% | 48 kHz |
+| **DTLN** | Good | High | 10-20% | 16 kHz |
+| **DeepFilterNet 3** | **Excellent** | **Very High** | 25-50%+ | 48 kHz |

-> **Note:** DeepFilterNet 3 is planned for future inclusion in the desktop build where larger binaries and higher CPU overhead are more acceptable.
+> **Update (2026-06):** with the EC fork live, denoise runs **inside** Element
+> Call as a LiveKit `TrackProcessor` and **all four models ship in-source**
+> (DTLN at 16 kHz, the rest at 48 kHz; the processor degrades to the raw mic
+> rather than ever going silent). The model picker selects between them. Real-call
+> **audio-quality** comparison across models is still the open verification item
+> (RNNoise output is known to be weak) — see `LOTUS_TESTING.md` §D2-1.

 ### Files

- `build/lotus-denoise.js` — multi-model getUserMedia shim
- `vite.config.js` — `lotusDenoise()` plugin (copies assets for RNNoise, Speex, and NoiseGate)
- `src/app/plugins/call/CallEmbed.ts` — advanced tier → widget URL params
+- **EC fork** `src/lotus/lotusDenoise.ts` + `lotusDenoiseProcessor.ts` — in-source LiveKit `TrackProcessor` (RNNoise/Speex 48 kHz, DTLN 16 kHz, DeepFilterNet 48 kHz); activated by `lotusDenoiseSource=1`. (The old build-time `getUserMedia` shim `build/lotus-denoise.js` is **removed**.)
+- `vite.config.js` — `lotusDenoise()` plugin (now only **copies model assets** for the fork to load; no longer injects a shim)
+- `src/app/plugins/call/CallEmbed.ts` — advanced tier → `lotusDenoiseSource` widget URL param
 - `src/app/utils/lotusDenoiseUtils.ts` — support detection and model comparison metadata
 - `src/app/features/settings/general/General.tsx` — advanced settings UI + mic meter