From 84ce9843ffc8f21cce1093225caad73227e80354 Mon Sep 17 00:00:00 2001
From: Jared Vititoe
Date: Tue, 30 Jun 2026 17:37:01 -0400
Subject: [PATCH] docs: log E2EE key-sync issues (KE-1..4) + tester checklist
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

LOTUS_BUGS.md: new Encryption/E2EE section tagged EXTREME complexity +
planning-session-required for a senior-engineer deep dive — OTK upload conflict
storm (KE-1), Element Call media-key distribution failures causing audio/video
dropouts (KE-2), a timeline decryption error (KE-3), and MatrixRTC delayed-event
timeouts (KE-4). All observed live 2026-06-30; not caused by the EC fork work.
Plus a non-developer ELEMENT_CALL_TEST_CHECKLIST.md.

Co-Authored-By: Claude Opus 4.8
---
 ELEMENT_CALL_TEST_CHECKLIST.md | 122 +++++++++++++++++++++++++++++++++
 LOTUS_BUGS.md                  |  52 ++++++++++++++
 2 files changed, 174 insertions(+)
 create mode 100644 ELEMENT_CALL_TEST_CHECKLIST.md

diff --git a/ELEMENT_CALL_TEST_CHECKLIST.md b/ELEMENT_CALL_TEST_CHECKLIST.md
new file mode 100644
index 000000000..b473aa965
--- /dev/null
+++ b/ELEMENT_CALL_TEST_CHECKLIST.md
@@ -0,0 +1,122 @@
+# Voice/Video Call — Testing Checklist 🎧
+
+Thanks for helping test! We just upgraded the voice/video call system. Please run
+through the checks below and tell us what happened.
+
+**What you need:**
+
+- 2 people (you + a friend), each on their own device, in the same call. A few
+  checks need one of you to have a **camera** and to **share your screen**.
+- About 15–20 minutes.
+
+**How to report:** for each item just say ✅ (worked) or ❌ (didn't), and for any
+❌ tell us what you saw. If something looks broken, a screenshot helps a lot.
+
+---
+
+## ⭐ Most important — please do these first
+
+### 1. Your microphone keeps working after a connection hiccup
+
+This is the biggest thing we changed, so test it carefully.
+
+1. Join a call with your friend and talk for a few seconds (make sure they hear you).
+2. Now **turn off your WiFi / internet for about 10 seconds**, then turn it back on.
+   (The call will show a "Connection lost / reconnecting" message — that's expected.)
+3. Once it reconnects, **start talking again.**
+
+- ✅ **Good if:** your friend can still hear you normally after it reconnects, without
+  you having to leave and rejoin the call.
+- ❌ **Tell us if:** your friend can't hear you after reconnecting, or your voice
+  sounds broken/robotic/muffled, until you leave and rejoin.
+
+### 2. Microphone quality / noise removal sounds normal
+
+1. In a call, just talk normally for a bit.
+2. If there's background noise (fan, typing, TV), notice whether it's reduced.
+
+- ✅ **Good if:** your voice is clear and there's no silence, echo, or robotic warble.
+- ❌ **Tell us if:** there are dropouts, echo, an "underwater"/metallic sound, or your
+  mic is silent even though you're talking.
+
+### 3. Switching your microphone mid-call
+
+1. While in a call, open call **Settings** and change your microphone to a
+   different one (e.g. headset ↔ built-in), then back.
+2. Talk after each switch.
+
+- ✅ **Good if:** your friend keeps hearing you after each switch.
+- ❌ **Tell us if:** your audio cuts out or doesn't come back after switching.
+
+### 4. All the call buttons still work
+
+Go down the call control bar and tap each one, checking it actually does the thing:
+
+- [ ] **Mute / unmute mic** (icon changes AND your friend stops/starts hearing you)
+- [ ] **Camera on / off**
+- [ ] **Deafen / sound** toggle (you stop/start hearing others)
+- [ ] **Share screen** start and stop (including the "Share your screen?" prompt)
+- [ ] **Full screen** on and off
+- [ ] **"More" (⋮) menu** → the **Reactions**, **Settings**, and **Grid/Spotlight**
+      options each open the right thing
+- [ ] **Leave / End call** — leaves cleanly
+
+- ❌ **Tell us if:** any button does nothing when you tap it (tell us which one).
+
+---
+
+## 👀 Please also check these
+
+### 5. The "who's talking" highlight points at the right person
+
+1. In a call, have your friend talk, then you talk.
+
+- ✅ **Good if:** the highlight / glow appears around the person who is actually
+  talking (and the right person, not someone else).
+- ❌ **Tell us if:** the wrong person lights up, or nobody lights up when talking.
+
+### 6. Mute badges show on the right person
+
+1. Have your friend mute their mic.
+
+- ✅ **Good if:** any "muted" indicator shows next to the person who is muted.
+- ❌ **Tell us if:** it shows on the wrong person or doesn't update.
+
+### 7. Focus a camera while someone is sharing their screen
+
+_(Needs: one person sharing screen, another with camera on.)_
+
+1. Person A **shares their screen.**
+2. Person B turns their **camera on.**
+3. Use the **"Focus camera"** option (from a participant's menu) on Person B.
+
+- ✅ **Good if:** Person B's camera becomes the highlighted/spotlighted view
+  **alongside or over** the shared screen.
+- ❌ **Tell us if:** nothing happens, or it throws you out of the screen share, or
+  you get an error.
+
+### 8. Avatar decorations show on call tiles
+
+_(Needs: someone in the call has an avatar decoration set in Settings → Profile.)_
+
+1. Have a person with a **profile decoration** join with their **camera off** (so
+   their avatar/picture shows instead of video).
+
+- ✅ **Good if:** their decoration (the frame/ring/effect around their picture)
+  shows on their tile **inside the call**, like it does elsewhere in the app.
+- ❌ **Tell us if:** the decoration is missing, cut off, or in the wrong place.
+
+### 9. The call screen looks right
+
+1. Just look at the overall call screen.
+
+- ✅ **Good if:** backgrounds, colors, and layout look normal — nothing is a weird
+  black box, see-through in a bad way, or overlapping.
+- ❌ **Tell us if:** anything looks visually broken or out of place.
+
+---
+
+## 🙏 Thank you!
+
+If a call ever sounds bad for **everyone** (not just you), let us know right away —
+that's the one we most want to hear about quickly, and we can switch back fast.
diff --git a/LOTUS_BUGS.md b/LOTUS_BUGS.md
index d19d29a73..6daf74e05 100644
--- a/LOTUS_BUGS.md
+++ b/LOTUS_BUGS.md
@@ -80,6 +80,58 @@ Items from testing, with their fork-level fix path:
 
 - **N127 — ML denoise shim is never injected in `vite dev`.** The `lotusDenoise` plugin injects only on `closeBundle` (build), so ML noise suppression is silently inactive during local dev. Add a dev-mode injection (`configureServer` / `transformIndexHtml`). Dev-only impact. _Note: this **dissolves entirely** once denoise moves in-source in the fork (A7 fix) — there is then no build-time injection to be missing in dev._
 
+### 🧨 Encryption / E2EE — ⚠️ EXTREME COMPLEXITY · 🧠 PLANNING SESSION REQUIRED · 👤 SENIOR ENGINEER
+
+> **Observed live in prod 2026-06-30** on `chat.lotusguild.org` during a 2-person
+> **Element Call** (E2EE enabled). These span **client rust-crypto (via
+> `matrix-js-sdk@41.6.0-rc.0`) ↔ Synapse ↔ Element Call's MatrixRTC E2EE** and are
+> very likely **interrelated** (see KE-1 → KE-2). Do **not** spot-fix — they need
+> a dedicated cross-system planning session with the homeserver owner. Capture
+> full client console + a synapse-side trace for the same call before starting.
+> **None of these are caused by the EC fork work** (the issues reproduce on the
+> old build; the local mic/denoise path is unrelated to key distribution).
+
+- **KE-1 — One-time-key (OTK) upload conflict storm (CRITICAL, root-cause candidate).**
+  `POST /_matrix/client/v3/keys/upload` returns `400 M_UNKNOWN: One time key
+signed_curve25519:AAAAAAAAAGQ already exists. Old key: {…} new key: {…}` —
+  firing **continuously** (many/sec). The client repeatedly tries to publish an
+  OTK at a key id the server already holds **with a different value**, i.e. the
+  rust-crypto key store and Synapse have **diverged OTK state**. Impact: floods
+  the crypto outgoing-request loop and is the prime suspect for the downstream
+  missing-key failures (no fresh OTKs ⇒ no new Olm sessions ⇒ undecryptable
+  to-device key events). _Investigate:_ device/key-store reset-or-restore
+  mismatch, OTK id-counter desync, RC-SDK (`41.6.0-rc.0`) regression, or a
+  Synapse OTK bug. Repro signature: grep console for `already exists`.
+  **Extreme — planning session.**
+
+- **KE-2 — Element Call media keys not arriving/decrypting → audio & video cut out (CRITICAL).**
+  `MissingKey: missing key at index N for participant @user`, `skipping decryption
+due to missing key`, `MissingKey: key set not found for @user at index 0`, and
+  rust-crypto `WARN … Received an unexpected encrypted to-device event …
+event_type="io.element.call.encryption_keys"`. EC distributes per-participant
+  media keys as **encrypted to-device `io.element.call.encryption_keys`** events;
+  these aren't being received/decrypted in order, so remote LiveKit audio/video
+  can't be decrypted — **this is the "friend's audio cuts out occasionally"
+  symptom.** Almost certainly downstream of **KE-1** (broken Olm sessions). Spans
+  EC's MatrixRTC E2EE + rust-crypto to-device + Synapse. **Extreme — planning
+  session.**
+
+- **KE-3 — Timeline decryption error: missing `algorithm` field (HIGH).**
+  `Error decrypting event (… type=m.room.encrypted …): DecryptionError[msg:
+missing field 'algorithm' at line 1 column 138 …]`. A malformed/legacy
+  encrypted event (or a serialization mismatch in the RC SDK) that rust-crypto
+  can't parse. Lower frequency than KE-1/2 but a distinct decode-path failure —
+  capture the offending event id (`$SASBBzoqj…` seen) and inspect its raw content.
+
+- **KE-4 — MatrixRTC delayed-event / membership timeouts (MEDIUM-HIGH, reliability).**
+  `[MembershipManager] Network local timeout error while sending event, immediate
+retry … AbortError: Restart delayed event timed out before the HS responded`,
+  with repeated `org.matrix.msc4157.update_delayed_event`. MSC4140/4157
+  delayed-event reliability against `matrix.lotusguild.org` — can cause stale/ghost
+  call membership and missed leave events. May be partly **homeserver
+  responsiveness**; correlate with synapse latency/load. Include in the same
+  planning session since it shares the call-reliability + HS-interaction surface.
+
 ### Security & Privacy
 
 - **N97 — Access token stored in plaintext `localStorage`** (`state/sessions.ts`), vulnerable to XSS; device ID likewise. Architectural — needs a token-protection / session-storage redesign.
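
Reviewer note: the repro signatures quoted in KE-1 through KE-4 could be turned into a quick triage pass over a saved client console log before the planning session. The sketch below is a minimal illustration, not project code. The names `IssueId`, `SIGNATURES`, `classifyLine`, and `countBySignature` are hypothetical, and the patterns are only the log fragments quoted in the entries above, so they may need widening once real logs are inspected.

```typescript
// Illustrative sketch only: issue IDs and matched substrings are taken from the
// KE-1..KE-4 entries; every identifier below is hypothetical, not project code.
type IssueId = "KE-1" | "KE-2" | "KE-3" | "KE-4";

// One pattern per issue, built from the log fragments quoted in the bug entries.
const SIGNATURES: ReadonlyArray<readonly [IssueId, RegExp]> = [
  ["KE-1", /One time key .* already exists/],
  ["KE-2", /MissingKey:|skipping decryption due to missing key|io\.element\.call\.encryption_keys/],
  ["KE-3", /missing field 'algorithm'/],
  ["KE-4", /Restart delayed event timed out|org\.matrix\.msc4157\.update_delayed_event/],
];

// Return the first issue whose signature matches the line, or null if none does.
function classifyLine(line: string): IssueId | null {
  for (const [id, pattern] of SIGNATURES) {
    if (pattern.test(line)) return id;
  }
  return null;
}

// Tally how many lines of a saved console log match each issue signature.
function countBySignature(lines: readonly string[]): Record<IssueId, number> {
  const counts: Record<IssueId, number> = { "KE-1": 0, "KE-2": 0, "KE-3": 0, "KE-4": 0 };
  for (const line of lines) {
    const id = classifyLine(line);
    if (id !== null) counts[id] += 1;
  }
  return counts;
}
```

Feeding the log in timestamp order and comparing when KE-1 matches start relative to KE-2 matches would be one way to check the suspected KE-1 → KE-2 ordering, though match counts alone do not establish causation.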