Compare commits

...

2 Commits

Author SHA1 Message Date
jared 5af024f7e7 docs: consolidate EC test checklist into LOTUS_TESTING.md (§D2)
CI / Build & Quality Checks (push) Successful in 11m28s
CI / Trigger Desktop Build (push) Successful in 20s
Fold the Element Call fork Phase-2 feature tests into the canonical testing
guide as §D2 (denoise reconnect/device-switch/4 models, event-driven
speaker/mute, focus-during-screenshare, in-call decorations, transparency,
+ the dormant #3/#7). Each item keeps a plain / outcome for non-dev
testers, so the standalone ELEMENT_CALL_TEST_CHECKLIST.md is removed — all
in one place.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 17:40:02 -04:00
jared 84ce9843ff docs: log E2EE key-sync issues (KE-1..4) + tester checklist
CI / Build & Quality Checks (push) Failing after 12m1s
CI / Trigger Desktop Build (push) Has been skipped
LOTUS_BUGS.md: new Encryption/E2EE section tagged EXTREME complexity +
planning-session-required for a senior-engineer deep dive — OTK upload
conflict storm (KE-1), Element Call media-key distribution failures causing
audio/video dropouts (KE-2), a timeline decryption error (KE-3), and
MatrixRTC delayed-event timeouts (KE-4). All observed live 2026-06-30; not
caused by the EC fork work. Plus a non-developer ELEMENT_CALL_TEST_CHECKLIST.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 17:37:01 -04:00
2 changed files with 124 additions and 0 deletions
+52
View File
@@ -80,6 +80,58 @@ Items from testing, with their fork-level fix path:
- **N127 — ML denoise shim is never injected in `vite dev`.** The `lotusDenoise` plugin injects only on `closeBundle` (build), so ML noise suppression is silently inactive during local dev. Add a dev-mode injection (`configureServer` / `transformIndexHtml`). Dev-only impact. _Note: this **dissolves entirely** once denoise moves in-source in the fork (A7 fix) — there is then no build-time injection to be missing in dev._
### 🧨 Encryption / E2EE — ⚠️ EXTREME COMPLEXITY · 🧠 PLANNING SESSION REQUIRED · 👤 SENIOR ENGINEER
> **Observed live in prod 2026-06-30** on `chat.lotusguild.org` during a 2-person
> **Element Call** (E2EE enabled). These span **client rust-crypto (via
> `matrix-js-sdk@41.6.0-rc.0`) ↔ Synapse ↔ Element Call's MatrixRTC E2EE** and are
> very likely **interrelated** (see KE-1 → KE-2). Do **not** spot-fix — they need
> a dedicated cross-system planning session with the homeserver owner. Capture
> full client console + a synapse-side trace for the same call before starting.
> **None of these are caused by the EC fork work** (the issues reproduce on the
> old build; the local mic/denoise path is unrelated to key distribution).
- **KE-1 — One-time-key (OTK) upload conflict storm (CRITICAL, root-cause candidate).**
`POST /_matrix/client/v3/keys/upload` returns `400 M_UNKNOWN: One time key
signed_curve25519:AAAAAAAAAGQ already exists. Old key: {…} new key: {…}`
firing **continuously** (many/sec). The client repeatedly tries to publish an
OTK at a key id the server already holds **with a different value**, i.e. the
rust-crypto key store and Synapse have **diverged OTK state**. Impact: floods
the crypto outgoing-request loop and is the prime suspect for the downstream
missing-key failures (no fresh OTKs ⇒ no new Olm sessions ⇒ undecryptable
to-device key events). _Investigate:_ device/key-store reset-or-restore
mismatch, OTK id-counter desync, RC-SDK (`41.6.0-rc.0`) regression, or a
Synapse OTK bug. Repro signature: grep console for `already exists`.
**Extreme — planning session.**
- **KE-2 — Element Call media keys not arriving/decrypting → audio & video cut out (CRITICAL).**
`MissingKey: missing key at index N for participant @user`, `skipping decryption
due to missing key`, `MissingKey: key set not found for @user at index 0`, and
rust-crypto `WARN … Received an unexpected encrypted to-device event …
event_type="io.element.call.encryption_keys"`. EC distributes per-participant
media keys as **encrypted to-device `io.element.call.encryption_keys`** events;
these aren't being received/decrypted in order, so remote LiveKit audio/video
can't be decrypted — **this is the "friend's audio cuts out occasionally"
symptom.** Almost certainly downstream of **KE-1** (broken Olm sessions). Spans
EC's MatrixRTC E2EE + rust-crypto to-device + Synapse. **Extreme — planning
session.**
- **KE-3 — Timeline decryption error: missing `algorithm` field (HIGH).**
`Error decrypting event (… type=m.room.encrypted …): DecryptionError[msg:
missing field 'algorithm' at line 1 column 138 …]`. A malformed/legacy
encrypted event (or a serialization mismatch in the RC SDK) that rust-crypto
can't parse. Lower frequency than KE-1/2 but a distinct decode-path failure —
capture the offending event id (`$SASBBzoqj…` seen) and inspect its raw content.
- **KE-4 — MatrixRTC delayed-event / membership timeouts (MEDIUM-HIGH, reliability).**
`[MembershipManager] Network local timeout error while sending event, immediate
retry … AbortError: Restart delayed event timed out before the HS responded`,
with repeated `org.matrix.msc4157.update_delayed_event`. MSC4140/4157
delayed-event reliability against `matrix.lotusguild.org` — can cause stale/ghost
call membership and missed leave events. May be partly **homeserver
responsiveness**; correlate with synapse latency/load. Include in the same
planning session since it shares the call-reliability + HS-interaction surface.
### Security & Privacy
- **N97 — Access token stored in plaintext `localStorage`** (`state/sessions.ts`), vulnerable to XSS; device ID likewise. Architectural — needs a token-protection / session-storage redesign.
+72
View File
@@ -207,6 +207,78 @@ If any control does nothing, that usually means an EC DOM selector changed — c
---
## D2. Element Call **fork** — Phase 2 feature sweep (👥 2 people) — `0.20.1-lotus.1`
> The whole EC iframe is now our **self-built fork** (`@lotusguild/element-call-embedded@0.20.1-lotus.1`).
> Five features are **active** (the host sets their flags / sends their actions); two ship **dormant**.
> **Confirm you're on the fork first:** EC iframe console prints `Element Call embedded-v0.20.1-lotus.1`
> (the old build prints `embedded-v0.20.1`). If it says the old version, the web deploy hasn't landed —
> the fork features won't be present, so don't test D2 yet.
> For non-dev testers, each item below also states the plain "✅ good if / ❌ tell us if" outcome.
### D2-1. Denoise **in-source** — survives reconnect (fixes A7) ⭐ highest risk (everyone's mic)
Flag: cinny sets `lotusDenoiseSource=1` when ML denoise is selected (the old build-time getUserMedia
shim is **removed**). This is the single change with the widest blast radius — test deliberately.
- [ ] **Audio flows, no silence** with ML denoise on (baseline, also §D line 204).
- [ ] **Reconnect (the A7 fix):** in a call with ML denoise on, kill network ~10 s (devtools → Offline)
so EC shows "Connection lost / Reconnect", then restore. **Mic still works AND still denoised**
afterward, **without** End+rejoin. _(This is the exact bug that was reintroduced then fixed; if it
regresses, mic dies on every reconnect.)_
- [ ] **Mic device switch mid-call** (Settings → change microphone): audio keeps working (same
`restart()` path as reconnect).
- [ ] **Mute → unmute** a few times: audio returns each time.
- [ ] **Each model** if the picker offers them: `rnnoise` (default), `speex`, `dtln`, `deepfilternet`
each loads + denoises, no silence. (All four are in-source now; DTLN runs at 16 kHz, others 48 kHz.)
- [ ] **No double-processing:** audio isn't over-suppressed/artifacted (would mean the old shim is still
injected alongside the in-source engine).
- **Rollback if bad for everyone:** revert the cinny deploy commit (restores the shim + `@element-hq` parity).
### D2-2. Speaking + mute indicators from widget **events** (#2)
Flag: `lotusCallState=1`. cinny now reads speaker/mute state from `io.lotus.call_state` events instead of
scraping EC's DOM (DOM fallback retained). Overlaps **G1**.
- [ ] **Speaking glow** lights the **correct** person when they talk (you, then your friend).
- [ ] **PiP "All muted" / "You muted" badge** points at the right person and updates on mute/unmute.
### D2-3. Focus camera **during a screenshare** (#4 / A5)
Action: cinny sends `io.lotus.focus_participant` (the DOM `.click()` hack is gone). Overlaps **A5 / G2**.
- [ ] Person A screenshares; Person B camera on; **MemberGlance → Focus camera** on B → B's camera is
spotlighted **alongside/over** the shared screen (not ignored).
- [ ] Camera-**off** target = graceful (no error, no kick out of the screenshare).
### D2-4. In-call avatar decorations (#6) — **NEW, beyond A6**
Action: cinny pushes `io.lotus.decorations`. **A6 only covered the lobby roster** and called in-call EC
tiles out of scope — that's now in scope.
- [ ] A participant with a **Profile decoration** joins **camera off** → the decoration ring renders on
their **in-call video-tile avatar** (inside EC, not just the lobby), correctly sized/positioned.
- [ ] Decoration tracks the right person across grid/spotlight layout changes; disappears when they leave.
### D2-5. Native transparent background (#5)
Flag: `lotusTransparent=1` (native, replacing the injected `background:none !important`).
- [ ] Call background looks right — host wallpaper/surface shows through; **no** black box, bad
see-through, or layout breakage (also covered loosely by §D2 "looks right").
### D2-6. Dormant features — confirm they do NOTHING (no regression)
EC ships the capability but cinny has **no UI** to trigger them yet:
- [ ] **Soundboard audio-inject (#3)** and **quality controls (#7)** — there should be no new UI and no
effect. (Nothing to test; noted so a tester doesn't go hunting.)
> If any D2 item fails, grab the **EC iframe console** (right-click the call → inspect the iframe) — a
> widget-action/payload mismatch shows up there as a `io.lotus.*` rejection or a `MissingKey`/transport log.
---
# Backlog of previously-fixed-but-unverified items
> Sections AD above are **this session's** work. Everything below was fixed in earlier waves and is still flagged **⚠️ UNTESTED** in `LOTUS_BUGS.md` / `LOTUS_TODO.md`. They're grouped by what kind of environment you need (mobile, desktop, screen reader, etc.) so you can knock out a whole category at once. None of these are urgent the way AD are; do them as you have the right device handy.