docs: log E2EE key-sync issues (KE-1..4) + tester checklist
LOTUS_BUGS.md: new Encryption/E2EE section tagged EXTREME complexity + planning-session-required for a senior-engineer deep dive: OTK upload conflict storm (KE-1), Element Call media-key distribution failures causing audio/video dropouts (KE-2), a timeline decryption error (KE-3), and MatrixRTC delayed-event timeouts (KE-4). All observed live 2026-06-30; not caused by the EC fork work. Also adds a non-developer ELEMENT_CALL_TEST_CHECKLIST.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -80,6 +80,58 @@ Items from testing, with their fork-level fix path:
- **N127 — ML denoise shim is never injected in `vite dev`.** The `lotusDenoise` plugin injects only on `closeBundle` (build), so ML noise suppression is silently inactive during local dev. Add a dev-mode injection (`configureServer` / `transformIndexHtml`). Dev-only impact. _Note: this **dissolves entirely** once denoise moves in-source in the fork (A7 fix) — there is then no build-time injection to be missing in dev._
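A minimal sketch of the dev-mode injection N127 asks for, using Vite's `apply: "serve"` option and `transformIndexHtml` hook. The plugin name `lotus-denoise-dev`, the function `lotusDenoiseDev`, and the shim URL are hypothetical, since the real shim path is not given here. The sketch assumes the shim file is already served by the dev server (for example from `public/`). The local interfaces only mirror the small part of Vite's plugin shape used below, so the block type-checks without importing Vite.

```typescript
// Minimal structural types so the sketch type-checks without importing Vite.
interface HtmlTag {
  tag: string;
  attrs?: Record<string, string | boolean>;
  injectTo?: "head" | "body" | "head-prepend" | "body-prepend";
}

interface DevInjectPlugin {
  name: string;
  apply: "serve" | "build";
  transformIndexHtml(html: string): HtmlTag[];
}

// HYPOTHETICAL: the real shim path is not stated in LOTUS_BUGS.md.
const SHIM_URL = "/lotus-denoise-shim.js";

function lotusDenoiseDev(shimUrl: string = SHIM_URL): DevInjectPlugin {
  return {
    name: "lotus-denoise-dev",
    // Restrict the plugin to the dev server; the existing closeBundle
    // injection keeps handling production builds.
    apply: "serve",
    transformIndexHtml() {
      // Ask Vite to add a <script src=...> tag to <head> of index.html.
      return [{ tag: "script", attrs: { src: shimUrl }, injectTo: "head" }];
    },
  };
}
```

If this approach fits, the returned object would be added to the `plugins` array in the Vite config next to the existing `lotusDenoise` plugin.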
### 🧨 Encryption / E2EE — ⚠️ EXTREME COMPLEXITY · 🧠 PLANNING SESSION REQUIRED · 👤 SENIOR ENGINEER

> **Observed live in prod 2026-06-30** on `chat.lotusguild.org` during a 2-person
> **Element Call** (E2EE enabled). These span **client rust-crypto (via
> `matrix-js-sdk@41.6.0-rc.0`) ↔ Synapse ↔ Element Call's MatrixRTC E2EE** and are
> very likely **interrelated** (see KE-1 → KE-2). Do **not** spot-fix; they need
> a dedicated cross-system planning session with the homeserver owner. Capture
> the full client console log and a Synapse-side trace for the same call before
> starting. **None of these are caused by the EC fork work** (the issues reproduce
> on the old build; the local mic/denoise path is unrelated to key distribution).

- **KE-1 — One-time-key (OTK) upload conflict storm (CRITICAL, root-cause candidate).**
  `POST /_matrix/client/v3/keys/upload` returns `400 M_UNKNOWN: One time key
  signed_curve25519:AAAAAAAAAGQ already exists. Old key: {…} new key: {…}`,
  firing **continuously** (many/sec). The client repeatedly tries to publish an
  OTK at a key id the server already holds **with a different value**, i.e. the
  rust-crypto key store and Synapse have **diverged OTK state**. Impact: floods
  the crypto outgoing-request loop and is the prime suspect for the downstream
  missing-key failures (no fresh OTKs ⇒ no new Olm sessions ⇒ undecryptable
  to-device key events). _Investigate:_ device/key-store reset-or-restore
  mismatch, OTK id-counter desync, RC-SDK (`41.6.0-rc.0`) regression, or a
  Synapse OTK bug. Repro signature: grep the client console log for
  `already exists`. **Extreme: planning session.**
- **KE-2 — Element Call media keys not arriving/decrypting → audio & video cut out (CRITICAL).**
  `MissingKey: missing key at index N for participant @user`, `skipping decryption
  due to missing key`, `MissingKey: key set not found for @user at index 0`, and
  rust-crypto `WARN … Received an unexpected encrypted to-device event …
  event_type="io.element.call.encryption_keys"`. EC distributes per-participant
  media keys as **encrypted to-device `io.element.call.encryption_keys`** events;
  these are not being received or decrypted in order, so remote LiveKit
  audio/video can't be decrypted. **This is the "friend's audio cuts out
  occasionally" symptom.** Almost certainly downstream of **KE-1** (broken Olm
  sessions). Spans EC's MatrixRTC E2EE + rust-crypto to-device + Synapse.
  **Extreme: planning session.**
- **KE-3 — Timeline decryption error: missing `algorithm` field (HIGH).**
  `Error decrypting event (… type=m.room.encrypted …): DecryptionError[msg:
  missing field 'algorithm' at line 1 column 138 …]`. A malformed/legacy
  encrypted event (or a serialization mismatch in the RC SDK) that rust-crypto
  can't parse. Lower frequency than KE-1/2, but a distinct decode-path failure.
  Capture the offending event ID (`$SASBBzoqj…` seen) and inspect its raw content.
- **KE-4 — MatrixRTC delayed-event / membership timeouts (MEDIUM-HIGH, reliability).**
  `[MembershipManager] Network local timeout error while sending event, immediate
  retry … AbortError: Restart delayed event timed out before the HS responded`,
  with repeated `org.matrix.msc4157.update_delayed_event`. This is an
  MSC4140/4157 delayed-event reliability problem against `matrix.lotusguild.org`;
  it can cause stale or ghost call membership and missed leave events. May be
  partly **homeserver responsiveness**; correlate with Synapse latency/load.
  Include in the same planning session, since it shares the call-reliability +
  HS-interaction surface.
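
As a triage aid for the planning session, the console signatures quoted in KE-1 through KE-4 can be tallied from a saved client console log. This is a rough sketch, not project tooling: `classifyLine` and `tallyIssues` are hypothetical names, and the patterns are taken only from the log excerpts in the entries above, so they may need adjusting against real captures.

```typescript
// Signatures copied from the log excerpts in the KE-1..KE-4 entries.
type IssueId = "KE-1" | "KE-2" | "KE-3" | "KE-4";

const SIGNATURES: ReadonlyArray<{ id: IssueId; pattern: RegExp }> = [
  // Synapse rejection of a conflicting one-time key upload.
  { id: "KE-1", pattern: /One time key \S+ already exists/ },
  // Element Call media-key failures and the related rust-crypto warning.
  {
    id: "KE-2",
    pattern:
      /MissingKey|skipping decryption due to missing key|io\.element\.call\.encryption_keys/,
  },
  // The quote character around "algorithm" may differ between captures,
  // so any single character is accepted on either side.
  { id: "KE-3", pattern: /missing field .algorithm./ },
  // MatrixRTC delayed-event restart timeouts.
  { id: "KE-4", pattern: /Restart delayed event timed out|update_delayed_event/ },
];

// Returns the first matching issue id, or null for unrelated lines.
function classifyLine(line: string): IssueId | null {
  for (const { id, pattern } of SIGNATURES) {
    if (pattern.test(line)) return id;
  }
  return null;
}

// Counts how many log lines match each issue signature.
function tallyIssues(lines: string[]): Record<IssueId, number> {
  const counts: Record<IssueId, number> = {
    "KE-1": 0,
    "KE-2": 0,
    "KE-3": 0,
    "KE-4": 0,
  };
  for (const line of lines) {
    const id = classifyLine(line);
    if (id !== null) counts[id] += 1;
  }
  return counts;
}
```

Running `tallyIssues` over the console log split into lines gives a per-issue count. Comparing those counts across calls could help check whether KE-2 failures rise and fall with KE-1, as suspected above.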
### Security & Privacy
- **N97 — Access token stored in plaintext `localStorage`** (`state/sessions.ts`), vulnerable to XSS; device ID likewise. Architectural — needs a token-protection / session-storage redesign.