docs(e2ee): investigation update — 41.7.0 delta + web-specific KE-1 root cause
CI / Build & Quality Checks (push) Successful in 10m49s
CI / Trigger Desktop Build (push) Successful in 21s

Code-read + upstream-issue triage this session:
- 41.7.0 / crypto-wasm 18.3.1 does NOT fix KE-1 (no OTK/upload change; #5200
  still open) — the SDK-pin remediation lever is closed.
- Confirmed root cause = rust-crypto store <-> Synapse OTK divergence; the
  leading web trigger is that cinny never requests persistent storage, so the
  IndexedDB crypto store is evictable while the localStorage session survives.
- New buildable preventive mitigation: navigator.storage.persist() on login
  (+ multi-tab guard, 400-loop recovery prompt). Added as §6 with a secondary
  KE-2 to-device-validation hypothesis and capture discriminators.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
2026-07-02 15:14:46 -04:00
parent c82ab5c7f5
commit 81904372bc
2 changed files with 110 additions and 0 deletions
+13
View File
@@ -112,6 +112,19 @@ signed_curve25519:AAAAAAAAAGQ already exists. Old key: {…} new key: {…}` —
mismatch, OTK id-counter desync, RC-SDK (`41.6.0-rc.0`) regression, or a
Synapse OTK bug. Repro signature: grep console for `already exists`.
**Extreme — planning session.**
**Update 2026-07 (investigation §6):** upstream `matrix-rust-sdk#5200` (still
OPEN) confirms the mechanism — on the 400, `mark_request_as_sent()` never fires
so the SDK re-issues the identical upload forever. **`41.7.0` does NOT fix it**
(crypto-wasm 17→18.3.1 has no OTK/upload change; 18.3.x was to-device security
only) — the SDK-pin lever is closed. Root cause = **store↔server OTK
divergence**; the leading **web-specific** trigger is that cinny never calls
**`navigator.storage.persist()`**, so the IndexedDB crypto store is evictable
while the `localStorage` session/device-id survives → device resurrects with a
blank store → re-uploads OTKs the server still holds. **Actionable preventive
fix (buildable now, no call needed):** request persistent storage on login
(+ optional multi-tab guard + 400-loop→recovery-prompt). Healing an already-
diverged device still needs a clean **logout+login** (not just "clear
storage"). See `LOTUS_E2EE_INVESTIGATION.md` §6.
- **KE-2 — Element Call media keys not arriving/decrypting → audio & video cut out (CRITICAL).**
`MissingKey: missing key at index N for participant @user`, `skipping decryption