feat(call): in-call soundboard, quality controls, room call-permissions
CI / Build & Quality Checks (push) Successful in 10m49s
CI / Trigger Desktop Build (push) Successful in 8s

Element Call is now consumed as our self-built fork
(@lotusguild/element-call-embedded); wire up its previously-dormant
capabilities and document the fork as live.

Soundboard (P5-15): a call-bar button plays user-uploaded audio clips into the
call as a real published track (io.lotus.inject_audio) plus local playback.
Clips are uploadable like emoji/sticker packs, stored in io.lotus.soundboard
account data (synced across devices). Gated by a Settings toggle + volume.

Quality controls (P5-31): per-user mic/screenshare bitrate + screenshare
framerate (Settings -> Calls), applied via io.lotus.set_quality clamped to any
room cap. Room admins set caps and hard call-permissions (allow_screenshare /
allow_camera) in Room Settings -> Voice; the call bar hides blocked buttons.

- New: CallSoundboard, useSoundboard, soundboardClips; RoomQuality,
  useCallQuality, callQuality (+ unit tests).
- Optimistic-write RoomQuality admin UI (no stale-state clobber).
- Docs: mark EC fork live across README/FEATURES/TODO/BUGS/TESTING; add D2
  manual-test steps.

Numeric quality caps are client-cooperative; screenshare/camera permissions are
hard-enforced server-side (see LotusGuild/matrix voice-limit-guard).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
2026-06-30 22:34:17 -04:00
parent 02b2ce8109
commit 7c06b27c73
22 changed files with 1259 additions and 120 deletions
+65 -45
View File
@@ -48,6 +48,9 @@ Built and gate-green; verify per [LOTUS_TESTING.md](./LOTUS_TESTING.md), then th
| Desktop — proactive update notifications (Tauri) | J1 |
| Remind Me Later | K1 |
| Mobile Bookmarks access | E5 |
| In-Call Soundboard (P5-15, uploadable clips → real call inject) | D2-7 |
| Call Quality Controls (P5-31, user + room-admin caps) | D2-8 |
| Call Permissions (P5-31, hard server-side screenshare/camera policy) | D2-9 |
---
@@ -72,32 +75,32 @@ Status: `[ ]` pending · `[~]` in progress · `[x]` completed
### Confirmed facts
| Finding | Impact |
| --------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------ |
| **MSC flags ON:** `msc4140` · `msc3771` · `msc3440.stable` · `msc4133.stable` · `simplified_msc3575` · `msc4222` · `msc3266` · `msc3401_matrix_rtc` | All safe to use now |
| **MSC flags OFF:** `msc4306` (thread subscriptions) · `msc3882` · `msc3912` · `msc4155` | These features are BLOCKED |
| **MSC3266** room summary: flag `msc3266_enabled: true` set but `GET /v1/rooms/{id}/summary` still returns 404 (M_UNRECOGNIZED) | Room Preview BLOCKED — endpoint not implemented in Synapse 1.155 |
| **MSC3892** relation redaction: not in flags | Reaction Redaction feature BLOCKED |
| **MSC4260** report user: `POST /_matrix/client/v3/users/{userId}/report` returns **200** ✅ | **Report User UNBLOCKED** — endpoint live since Synapse 1.133; ready to build |
| **MSC4151** report room: HTTP 405 on GET = endpoint exists (POST only) | Report Room live ✅ |
| `folds AvatarImage` does NOT accept children | Add frame/overlay inside `UserAvatar.tsx` itself — optional `frameName` prop |
| No in-app toast system exists (was) | Built `ToastProvider` + Jotai queue; at `App.tsx:65` |
| `useUnverifiedDeviceCount()` hook exists | `src/app/hooks/useDeviceVerificationStatus.ts:65-106` |
| Voice player: `AudioContent.tsx:44-223` | Playback rate on hidden `<audio>` at line 217 |
| `CallControl.setMicrophone(bool)` at `CallControl.ts:206-212` | For AFK auto-mute |
| `CallControl.toggleSound()` at `CallControl.ts:230-251` | Push-to-deafen — just wire a hotkey to this |
| matrix-js-sdk has NO arbitrary profile field methods | Use `mx.http.authedRequest()` for MSC4133 |
| Sanitizer (`sanitize.ts`) allows table, div, span, a, code, hr | LFG HTML card is safe locally; test on Element/FluffyChat |
| Sanitizer STRIPS `<math>`/MathML tags | Math/LaTeX task must also modify sanitizer |
| Service worker EXISTS at `src/sw.ts` | Quick-reply task: add `notificationclick` handler |
| `knockSupported()` utility exists at `matrix.ts:376-391` | Knock UX: only need "Request to Join" in `RoomIntro.tsx` |
| `KeywordMessages.tsx` already has custom keyword push rules | Full push rule editor: only non-keyword rule types need new UI |
| `getMatrixToRoom()` in `matrix-to.ts` generates invite URLs | Invite link: just add QR code to room settings |
| Cindy CANNOT inject audio into EC call stream | In-call soundboard must be redesigned as local-only |
| Folds uses vanilla-extract in non-TDS, NOT CSS custom properties | Custom accent color: must create new vanilla-extract theme variant dynamically |
| Theme presets need ~50 CSS custom properties each | Significant design work before coding |
| `useCallSpeakers.ts` CSS MutationObserver polling | Visual speaking indicator: TDS ring animation on top of existing data |
| MSC3489/3672 live location: BOTH false on server | Live Location BLOCKED |
| Finding | Impact |
| -------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| **MSC flags ON:** `msc4140` · `msc3771` · `msc3440.stable` · `msc4133.stable` · `simplified_msc3575` · `msc4222` · `msc3266` · `msc3401_matrix_rtc` | All safe to use now |
| **MSC flags OFF:** `msc4306` (thread subscriptions) · `msc3882` · `msc3912` · `msc4155` | These features are BLOCKED |
| **MSC3266** room summary: flag `msc3266_enabled: true` set but `GET /v1/rooms/{id}/summary` still returns 404 (M_UNRECOGNIZED) | Room Preview BLOCKED — endpoint not implemented in Synapse 1.155 |
| **MSC3892** relation redaction: not in flags | Reaction Redaction feature BLOCKED |
| **MSC4260** report user: `POST /_matrix/client/v3/users/{userId}/report` returns **200** | **Report User UNBLOCKED** — endpoint live since Synapse 1.133; ready to build |
| **MSC4151** report room: HTTP 405 on GET = endpoint exists (POST only) | Report Room live ✅ |
| `folds AvatarImage` does NOT accept children | Add frame/overlay inside `UserAvatar.tsx` itself — optional `frameName` prop |
| No in-app toast system exists (was) | Built `ToastProvider` + Jotai queue; at `App.tsx:65` |
| `useUnverifiedDeviceCount()` hook exists | `src/app/hooks/useDeviceVerificationStatus.ts:65-106` |
| Voice player: `AudioContent.tsx:44-223` | Playback rate on hidden `<audio>` at line 217 |
| `CallControl.setMicrophone(bool)` at `CallControl.ts:206-212` | For AFK auto-mute |
| `CallControl.toggleSound()` at `CallControl.ts:230-251` | Push-to-deafen — just wire a hotkey to this |
| matrix-js-sdk has NO arbitrary profile field methods | Use `mx.http.authedRequest()` for MSC4133 |
| Sanitizer (`sanitize.ts`) allows table, div, span, a, code, hr | LFG HTML card is safe locally; test on Element/FluffyChat |
| Sanitizer STRIPS `<math>`/MathML tags | Math/LaTeX task must also modify sanitizer |
| Service worker EXISTS at `src/sw.ts` | Quick-reply task: add `notificationclick` handler |
| `knockSupported()` utility exists at `matrix.ts:376-391` | Knock UX: only need "Request to Join" in `RoomIntro.tsx` |
| `KeywordMessages.tsx` already has custom keyword push rules | Full push rule editor: only non-keyword rule types need new UI |
| `getMatrixToRoom()` in `matrix-to.ts` generates invite URLs | Invite link: just add QR code to room settings |
| ~~Cindy CANNOT inject audio into EC call stream~~ **UNBLOCKED by EC fork**`io.lotus.inject_audio` widget action publishes a clip as a real call track | In-call soundboard CAN now mix into the call (no longer local-only); needs cinny UI to drive the action |
| Folds uses vanilla-extract in non-TDS, NOT CSS custom properties | Custom accent color: must create new vanilla-extract theme variant dynamically |
| Theme presets need ~50 CSS custom properties each | Significant design work before coding |
| `useCallSpeakers.ts` CSS MutationObserver polling | Visual speaking indicator: TDS ring animation on top of existing data |
| MSC3489/3672 live location: BOTH false on server | Live Location BLOCKED |
---
@@ -266,12 +269,17 @@ Features:
---
### [ ] P5-15 · In-Call Soundboard
### [~] P5-15 · In-Call Soundboard — IMPLEMENTED (⚠️ awaiting live verification, D2-7)
**What:** Grid of short audio clips playable into the call audio stream via Web Audio API (AudioBufferSourceNode → MediaStreamDestinationNode → mixed with mic). Built-in clips + user-uploadable custom clips (stored as mxc://). Accessible from call controls bar.
**[AUDIT REQUIRED]** Verify the Element Call integration exposes the mic MediaStream for mixing. This is the highest-risk part of this feature.
**🔱 [EC-FORK]** Owning the EC source (see [`HANDOFF_ELEMENT_CALL_FORK.md`](./HANDOFF_ELEMENT_CALL_FORK.md)) would unblock real audio-injection — a proper soundboard mixed into the call — which is impossible against the prebuilt bundle today.
**Complexity:** High.
**What:** Soundboard button in the call controls bar → popout grid of the user's clips; clicking one plays it **into the call** as a real published track (peers hear it) and locally (presser hears it). Clips are **user-uploadable, just like custom emojis/stickers**.
**🔱 [EC-FORK] Fork side + cinny side DONE.** The fork ships `io.lotus.inject_audio` (`LotusWidgetActions.InjectAudio`, allow-listed in `widget.ts`), armed via the `lotusAudioInject=1` flag; it publishes a clip as a separate LiveKit track — a **real** in-call soundboard mixed into the call, not local-only. cinny now drives it.
**Shipped (cinny):**
- Clips stored in `io.lotus.soundboard` account data → **synced across devices like emoji/sticker packs** (`useSoundboard` hook; `AccountDataEvent.LotusSoundboard`).
- Upload audio (≤1 MB, ≤40 clips) → `mx.uploadContent` → mxc; play resolves mxc → authed download → `blob:` object URL (the widget can't fetch authenticated media itself) → `control.injectAudio(url, volume)` + local playback.
- `CallSoundboard.tsx` popout in the call bar (upload / play / delete), gated on the `soundboardEnabled` setting (Settings → General → Calls, + volume slider).
**Remaining:** a dedicated Settings management page (optional — upload/delete already live in the popout); a small default clip set; live verification (D2-7). Files: `utils/soundboardClips.ts`, `hooks/useSoundboard.ts`, `features/call/CallSoundboard.tsx`, `plugins/call/CallControl.ts#injectAudio`.
**Complexity:** Medium — done.
---
@@ -287,26 +295,38 @@ Features:
### [x] P5-30 · Advanced ML Noise Suppression (Krisp-style)
**What:** High-end background noise cancellation using a pre-trained ML model (RNNoise) running in the browser. Removes dogs, fans, and keyboard clicks from the mic stream.
**Shipped:** 3-tier setting (Off / Browser-native / ML) in Settings → General → Calls. ML tier injects a same-origin pre-init shim into the vendored Element Call `index.html` that monkeypatches `getUserMedia` and routes the captured mic through an RNNoise `AudioWorklet` before LiveKit publishes — no EC fork required. See LOTUS_FEATURES.md → "Noise Suppression (Advanced Multi-Tier)".
**🔱 [EC-FORK]** Once we own the EC source (see [`HANDOFF_ELEMENT_CALL_FORK.md`](./HANDOFF_ELEMENT_CALL_FORK.md)), denoise should become a first-class audio stage **inside** EC instead of an `index.html` getUserMedia monkeypatch — more robust, survives reconnects (fixes the A7 mic-after-reconnect bug), and removes the build-time injection hack.
**Key decision:** LiveKit's Krisp filter is LiveKit-Cloud-only (we self-host the SFU); EC's own RNNoise PR #3892 is unmerged. The shim is the same post-capture pipeline #3892 uses, executed from the realm we control, so it survives EC version bumps.
**AEC note (resolved-as-accepted):** WebAudio capture routing can weaken browser AEC — same tradeoff as EC's upstream feature; mitigated by keeping `echoCancellation`/`autoGainControl` on the raw capture and labeling the tier "beta".
**Shipped:** 3-tier setting (Off / Browser-native / ML) in Settings → General → Calls.
**🔱 [EC-FORK] DONE — moved in-source (2026-06).** ML denoise is now a first-class audio stage **inside** the forked Element Call: a LiveKit `TrackProcessor<Audio>` activated by `lotusDenoiseSource=1` (cinny sets it when ML is selected). The old build-time `getUserMedia`/`index.html` monkeypatch is **removed**. Because EC re-runs the processor on every (re)publish, denoise now **survives reconnects and mic-device switches** — this is the A7 fix (see `LOTUS_BUGS.md` A7, `LOTUS_TESTING.md` §D2-1). The processor degrades to the raw mic rather than going silent.
**Key decision:** LiveKit's Krisp filter is LiveKit-Cloud-only (we self-host the SFU); EC's own RNNoise PR #3892 is unmerged. Owning the fork let us implement the in-source stage directly.
**Model Roadmap (priority order):**
**Models — all in-source in the fork:**
- [ ] **Verify DTLN** (16 kHz narrowband fix) in a real call before investing further — wired but unverified.
- [ ] **DeepFilterNet 3** — best self-hostable upgrade: Rust→WASM, CPU real-time, 48 kHz fullband. Effort: self-host `df_bg.wasm` + DFN3 ONNX model, wire a 48 kHz worklet.
- [ ] **Desktop-only / HW-gated:** FRCRN or NVIDIA Maxine (RTX/Tensor only) — impossible in-browser; would run in Tauri Rust backend + bridge a virtual mic into the webview. Must detect capability and only offer on supported hardware; web falls back to RNNoise.
- [x] **RNNoise** (48 kHz, default) · **Speex** (48 kHz) · **DTLN** (16 kHz) · **DeepFilterNet 3** (48 kHz) — all four wired and selectable.
- [ ] **Open verification:** real-call **audio-quality** comparison across the four models (RNNoise output is known-weak). Track under the denoise quality project, `LOTUS_TESTING.md` §D2-1 / J2.
- [ ] **Desktop-only / HW-gated (future):** FRCRN or NVIDIA Maxine (RTX/Tensor only) — impossible in-browser; would run in the Tauri Rust backend + bridge a virtual mic into the webview. Detect capability; web falls back to RNNoise.
- **Excluded:** Krisp (LiveKit Cloud only); FRCRN/Maxine on web (GPU/server-bound).
---
### [ ] P5-31 · Granular Voice & Screenshare Quality Controls (Discord-style)
### [~] P5-31 · Granular Voice & Screenshare Quality Controls — IMPLEMENTED (⚠️ awaiting live verification, D2-8)
**What:** Let users (or room admins via room settings) adjust audio bitrates (e.g., 64kbps to 512kbps) and screenshare quality (resolution: 720p/1080p/Source, framerate: 15/30/60fps).
**Note:** Requires tight integration with the LiveKit SFU and custom state events for per-room quality caps.
**[AUDIT REQUIRED]** Must verify if current `lk-jwt-service` can be extended with custom bitrate/resolution claims or if a new sidecar (similar to `voice-limit-guard`) is needed for server-side enforcement.
**Complexity:** Extreme.
**What:** Let users (and room admins) adjust audio bitrate and screenshare bitrate/framerate.
**🔱 [EC-FORK] Fork side + client side DONE.** The fork ships `io.lotus.set_quality` (`LotusWidgetActions.SetQuality`) that applies audio/screenshare encoding params (`RTCRtpSender.setParameters`, all simulcast encodings, re-applied on `TrackUnmuted`/republish) inside EC. cinny now drives it.
**Shipped (cinny):**
1. **User settings** (Settings → General → Calls): Microphone Bitrate, Screenshare Bitrate, Screenshare Framerate (`callAudioBitrate` / `screenshareBitrate` / `screenshareFramerate`).
2. **Room-admin caps**: `io.lotus.room_quality` state event (`StateEvent.LotusRoomQuality`) + `RoomQuality.tsx` in Room Settings → General → Voice (mirrors `RoomVoiceLimit`).
3. **Apply logic**: `useCallQuality` (wired in `CallEmbedProvider`'s `CallUtils`) builds `min(user setting, room cap)` and sends `io.lotus.set_quality` on join / when settings change (`utils/callQuality.ts`, unit-tested).
**Server-side enforcement (DONE — matrix repo):** extended `voice-limit-guard.py` (LXC 151) to also read `io.lotus.room_quality` and hard-enforce a **publish-source policy** for ALL clients.
- **Reality (researched, primary-source, LiveKit 1.9.11):** numeric bitrate/fps caps **cannot** be hard-enforced server-side — LiveKit is a pure SFU (forwards, never transcodes); there is NO bitrate/fps field in the JWT grant, `RoomConfiguration`, server `limit:` config, or any admin RPC, and stock Element Call ignores room metadata / custom claims for publish quality. So numeric caps stay **cooperative** (our fork honors them via `min()``set_quality`, already shipped).
- **What IS hard-enforced cross-client:** `VideoGrant.canPublishSources`. The guard holds the LiveKit secret, so when `io.lotus.room_quality` sets `allow_screenshare:false` / `allow_camera:false` it re-signs the issued JWT with a narrowed source list → the SFU refuses those tracks for **every** client (Element, FluffyChat, our fork). Mic always kept. Fail-open; unit-tested (`livekit/test_voice_limit_guard.py`). Admin UI: Room Settings → Voice → **Call Permissions** switches. cinny also hides the blocked buttons.
- **Live (mid-call) enforcement — DONE:** the JWT re-sign covers new joins; for participants **already in the call**, a background reconcile loop in the guard calls LiveKit `UpdateParticipant` every ~3 s to narrow `canPublishSources`, which unpublishes an in-progress screenshare/camera **server-side for all clients** and blocks re-publish (verified LiveKit 1.9.11 auto-unpublishes on permission narrowing). Only removes forbidden sources (never grants), preserves other permission flags, no-ops once compliant. So flipping a room audio-only kills live cameras/screenshares within ~one interval.
- **Not enforceable / deferred:** numeric server enforcement (impossible — see above); screenshare **resolution** control (`set_quality` covers bitrate + framerate; resolution needs a `getDisplayMedia` hook inside the fork).
**Complexity:** DONE — client (cooperative numeric caps) + server (hard publish-source policy). Only the physically-impossible numeric server enforcement is out of scope.
---
@@ -540,7 +560,7 @@ Exhaustive, low-level implementation details for backlog items. Follow these pat
> ⚠️ **[Gemini_Found — CORRECTED]** Gemini originally suggested using LiveKit's `LocalAudioTrack.replaceTrack()` to mix audio into the call stream. This is **not possible** from Lotus Chat's realm: Element Call runs in a **cross-origin iframe** controlled via `matrix-widget-api` (postMessage). LiveKit's JS SDK and its `LocalAudioTrack` live inside EC's sandboxed context — inaccessible from our code. This directly contradicts the confirmed constraint already listed in the Server Capabilities table: _"Cindy CANNOT inject audio into EC call stream — In-call soundboard must be redesigned as local-only."_ The soundboard must be a local-playback-only feature (output through the user's speakers, not mixed into the call audio stream).
>
> 🔱 **[EC-FORK — partial correction]** The "cross-origin" claim above is **outdated**: EC is now **same-origin** / self-hosted (`iframe.sandbox` has `allow-same-origin`; we read `contentDocument`). The _practical_ blocker still holds — LiveKit's `LocalAudioTrack` lives in EC's **module scope** (not on `window`), so it's unreachable from cinny even same-origin. **Owning the EC source** (see [`HANDOFF_ELEMENT_CALL_FORK.md`](./HANDOFF_ELEMENT_CALL_FORK.md)) is the path to a real call-audio-inject API, which would unblock a true in-call soundboard.
> 🔱 **[EC-FORK — RESOLVED]** Both the original claim and the earlier "practical blocker still holds" correction are now **outdated**. EC is same-origin **and** we own the source, so we no longer reach into EC's module scope from cinny — instead the fork **exposes the inject point itself**: the `io.lotus.inject_audio` widget action (`LotusWidgetActions.InjectAudio`) publishes a clip as a separate LiveKit track from inside EC. A **real** in-call soundboard (mixed into the call, not local-only) is therefore unblocked; only the cinny-side UI remains (see P5-15 above). The capability ships dormant today.
---