Lotus Chat — E2EE Investigation Runbook (KE-1 → KE-4)
Scope: evidence-gathering only. Do not apply fixes from this document without a cross-system planning session (client rust-crypto ↔ Synapse ↔ Element Call MatrixRTC).
Symptom source: LOTUS_BUGS.md § "Encryption / E2EE" (KE-1..KE-4), observed live 2026-06-30 on chat.lotusguild.org during a 2-person Element Call.
Client: Lotus Cinny fork, matrix-js-sdk@41.6.0-rc.0, rust-crypto. Server: Synapse 1.155.0 on LXC 151 (10.10.10.29), PostgreSQL 17.9 on LXC 109 (10.10.10.44). Facts below are copy-pasteable against that deployment (paths/IPs from /root/code/matrix/README.md).
0. Deployment facts used by this runbook
From the matrix infra README (/root/code/matrix/README.md):
| Thing | Value |
|---|---|
| Synapse host | LXC 151, 10.10.10.29 (Synapse 1.155.0) |
| Synapse log | /var/log/matrix-synapse/homeserver.log |
| Synapse config | /etc/matrix-synapse/homeserver.yaml (+ conf.d/) |
| Synapse HTTP | 10.10.10.29:8008 |
| PostgreSQL host | LXC 109, 10.10.10.44 (PG 17.9), db synapse |
| synapse-admin UI | http://10.10.10.29:8080 |
| LiveKit / lk-jwt / guard | LXC 151: LiveKit :7880/:7881, guard :8070, lk-jwt :8071 |
| SSH path to Synapse | ssh root@10.10.10.4 then pct enter 151 |
| SSH path to PG | ssh root@10.10.10.4 then pct enter 109 |
Getting a psql shell (run on LXC 109, or from 151 over the network):
# On LXC 109:
sudo -u postgres psql synapse
# From LXC 151 (pg_hba allows 10.10.10.29):
psql "host=10.10.10.44 user=synapse dbname=synapse"
Tailing Synapse during a call (on LXC 151):
tail -F /var/log/matrix-synapse/homeserver.log | tee /tmp/lotus-call-$(date +%s).log
Synapse E2EE/to-device logging is chatty at INFO; if a category is silent,
temporarily raise it in /etc/matrix-synapse/conf.d/log.yaml (or the
log_config file referenced by homeserver.yaml):
loggers:
  synapse.rest.client.keys: { level: DEBUG }
  synapse.handlers.e2e_keys: { level: DEBUG }
  synapse.storage.databases.main.end_to_end_keys: { level: DEBUG }
  synapse.handlers.devicemessage: { level: DEBUG }   # to-device
Then systemctl reload matrix-synapse (reload re-reads log config without a
full restart). Revert to INFO after the capture — DEBUG is very verbose.
1. Per-KE evidence matrix
Client greps assume Chrome/Firefox DevTools console (filter box or, better, "Preserve log" + save-as). The Crypto Diagnostics card (Settings → Developer Tools) auto-captures every signature below into a downloadable JSON — use it as the primary client artifact and DevTools as the raw backup.
KE-1 — OTK upload conflict storm (root-cause candidate)
- Console signature (grep): already exists
  - full: POST /_matrix/client/v3/keys/upload … 400 M_UNKNOWN: One time key signed_curve25519:<id> already exists. Old key: {…} new key: {…}
- Capture client-side:
  - Timestamp (first occurrence + rate, "N/sec"), device id, user id.
  - DevTools → Network → filter keys/upload: for a failing call save the request body (the one_time_keys map; note the exact signed_curve25519:<id>) and the response body (the Old key / new key JSON). This diff is the smoking gun: same key-id, different value ⇒ store vs server divergence.
  - Whether it self-heals or loops forever (KE-1 loops).
- Synapse log grep (LXC 151):

      grep -E "keys/upload|One time key .* already exists|OneTimeKey" \
        /var/log/matrix-synapse/homeserver.log | grep "<user_id>"

- Synapse SQL (LXC 109), what the server thinks it holds:

      -- Current OTK inventory for the device (compare the key_id set, and the
      -- stored key_json value, against the request body the client keeps retrying).
      SELECT algorithm, key_id, ts_added_ms, key_json
      FROM e2e_one_time_keys_json
      WHERE user_id = '@user:matrix.lotusguild.org'
        AND device_id = '<DEVICE_ID>'
      ORDER BY algorithm, key_id;

      -- Server's advertised counts (this is what /sync tells the client it has,
      -- and drives whether the client decides to upload more).
      SELECT algorithm, count(*)
      FROM e2e_one_time_keys_json
      WHERE user_id = '@user:matrix.lotusguild.org'
        AND device_id = '<DEVICE_ID>'
      GROUP BY algorithm;

      -- Fallback key state (used when OTKs are exhausted).
      SELECT algorithm, key_id, used, ts_added_ms
      FROM e2e_fallback_keys_json
      WHERE user_id = '@user:matrix.lotusguild.org'
        AND device_id = '<DEVICE_ID>';

  Table names are Synapse 1.155 (e2e_one_time_keys_json, e2e_fallback_keys_json). If a table name is absent, list with \dt e2e* in psql; if a column is rejected, check the table with \d <table> and drop that column from the SELECT.
- Confirms: if the offending key_id (from the 400) is present in e2e_one_time_keys_json with a different stored value than the client's request body → OTK state has diverged (rust-crypto store vs Synapse). That is the KE-1 root condition.
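The "same key-id, different value" comparison can be done mechanically once both sides are saved as JSON. The following is a minimal sketch, not part of matrix-js-sdk or the Lotus codebase: `findDivergedOtks`, `canonical`, and the `OtkMap` shape are hypothetical names, and it assumes both inputs are keyed by the full `algorithm:key_id` string (the client's `one_time_keys` map on one side, the server-side values, for example the `Old key` JSON from the 400 body, on the other).

```typescript
// Hypothetical helper for offline analysis of a saved keys/upload capture.
// Both maps are assumed to be keyed by "signed_curve25519:<id>".
type OtkMap = Record<string, unknown>;

interface OtkDivergence {
  keyId: string;
  clientValue: unknown;
  serverValue: unknown;
}

// Serialise with sorted object keys so that property order alone
// never shows up as a difference.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonical(v)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Returns every key id that exists on both sides with a different value.
// Key ids the server does not hold are skipped: uploading those would succeed.
function findDivergedOtks(clientRequest: OtkMap, serverStored: OtkMap): OtkDivergence[] {
  const diverged: OtkDivergence[] = [];
  for (const keyId of Object.keys(clientRequest)) {
    if (!(keyId in serverStored)) continue;
    const clientValue = clientRequest[keyId];
    const serverValue = serverStored[keyId];
    if (canonical(clientValue) !== canonical(serverValue)) {
      diverged.push({ keyId, clientValue, serverValue });
    }
  }
  return diverged;
}
```

A non-empty result corresponds to the KE-1 root condition; an empty result means the capture does not show divergence for the key ids that were compared.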
KE-2 — EC media keys not arriving/decrypting (audio/video cutouts)
- Console signature (grep):
  - MissingKey
  - missing key at index (e.g. MissingKey: missing key at index N for participant @user)
  - key set not found
  - io.element.call.encryption_keys (rust-crypto: WARN … Received an unexpected encrypted to-device event … event_type="io.element.call.encryption_keys")
- Capture client-side:
  - Timestamp windows where a participant's audio/video cut out, and the @participant + index N from the message.
  - The io.element.call.encryption_keys warnings (these are the media-key to-device events failing to decrypt) with their timestamps.
  - Own device id + user id (to correlate with the sender's Olm session).
- Synapse log grep (LXC 151), to-device delivery of the media keys:

      grep -E "io.element.call.encryption_keys|m.room.encrypted|/sendToDevice|to_device" \
        /var/log/matrix-synapse/homeserver.log | grep -E "<user_id>|<participant_id>"

- Synapse SQL (LXC 109), undelivered / queued to-device events:

      -- Backlog of to-device messages queued for the affected device. A growing
      -- count here = the HS has the media-key events but the device isn't draining
      -- them via /sync (or they were sent to a stale device id).
      SELECT user_id, device_id, count(*) AS pending
      FROM device_inbox
      WHERE user_id = '@user:matrix.lotusguild.org'
      GROUP BY user_id, device_id;

      -- Cross-check the device id the sender is targeting actually exists / is current.
      SELECT device_id, display_name, last_seen, ts
      FROM devices
      WHERE user_id = '@user:matrix.lotusguild.org';

- Confirms: to-device events present but undecryptable (client shows the io.element.call.encryption_keys "unexpected encrypted" warning) ⇒ there is no valid Olm session to decrypt them, which is the expected downstream of KE-1.
KE-3 — Timeline decryption error: missing algorithm field
- Console signature (grep): DecryptionError
  - full: Error decrypting event (… type=m.room.encrypted …): DecryptionError[msg: missing field 'algorithm' at line 1 column 138 …]
- Capture client-side:
  - The event id ($SASBBzoqj… was one) and the room id.
  - Pull the raw event JSON via DevTools or the Developer Tools account-data/event viewer, or directly: GET https://matrix.lotusguild.org/_matrix/client/v3/rooms/<roomId>/event/<eventId>. Inspect content and confirm whether algorithm (should be m.megolm.v1.aes-sha2) is truly absent vs a serialization mismatch.
- Synapse log grep (LXC 151). Event ids start with $, so single-quote the pattern and match it as a fixed string (-F); inside double quotes the shell would expand it as a variable:

      grep -F '<eventId>' /var/log/matrix-synapse/homeserver.log

- Synapse SQL (LXC 109), the stored event content as the HS holds it:

      SELECT ej.event_id, e.type, e.sender, e.origin_server_ts,
             (ej.json::json -> 'content' -> 'algorithm') AS algorithm
      FROM event_json ej
      JOIN events e USING (event_id)
      WHERE ej.event_id = '$SASBBzoqj...';

- Confirms: if the stored content.algorithm is NULL/absent on the HS → a malformed/legacy event was persisted (sender-side or federation). If it is present on the HS but the client throws → an RC-SDK deserialization bug. This distinction decides whether KE-3 is a data problem or a client problem.
KE-4 — MatrixRTC delayed-event / membership timeouts
- Console signature (grep):
  - update_delayed_event (org.matrix.msc4157.update_delayed_event)
  - delayed event / Restart delayed event timed out
  - full: [MembershipManager] Network local timeout error while sending event, immediate retry … AbortError: Restart delayed event timed out before the HS responded
- Capture client-side:
  - Timestamps of each timeout; whether they correlate with call join/leave or with general sync slowness.
  - DevTools → Network: the …/delayed_events and …/update_delayed_event requests, with their HTTP status and latency (timed-out vs slow-200).
- Synapse log grep (LXC 151):

      grep -E "delayed_event|msc4140|msc4157|update_delayed" \
        /var/log/matrix-synapse/homeserver.log | grep "<user_id>"
      # HS responsiveness in the same window (KE-4 may be pure latency):
      grep -E "Processed request|/sync" /var/log/matrix-synapse/homeserver.log | tail -50

- Server-side corroboration (Grafana, dashboard.lotusguild.org): Synapse p99 response time (excl. /sync), event-processing lag, DB query latency for the call window. High latency here ⇒ KE-4 is (partly) homeserver responsiveness, not a client bug.
- Confirms: timeouts that line up with HS latency spikes → reliability/load; timeouts with a healthy HS → client MembershipManager retry logic.
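Lining timeouts up against latency spikes by eye is error-prone when the capture has dozens of entries. A minimal sketch of that correlation follows, assuming the timeout timestamps come from the console capture and the latency samples are exported from the Grafana panel as (timestamp, p99) pairs. The names `classifyTimeouts` and `LatencySample`, the spike threshold, and the 60-second default window are all assumptions for illustration, not values taken from the deployment.

```typescript
// Hypothetical offline helper: split KE-4 timeouts into those that fall near
// a homeserver latency spike and those that happened while the HS was healthy.
interface LatencySample {
  tsMs: number;   // sample time, epoch milliseconds
  p99Ms: number;  // p99 response time at that sample
}

interface TimeoutSplit {
  duringSpike: number[]; // timeouts within windowMs of a spike
  healthy: number[];     // timeouts with no spike nearby
}

function classifyTimeouts(
  timeoutsMs: number[],
  samples: LatencySample[],
  spikeThresholdMs: number,   // assumption: chosen by the operator
  windowMs: number = 60000,   // "same minute" rule of thumb
): TimeoutSplit {
  const spikes = samples.filter((s) => s.p99Ms >= spikeThresholdMs);
  const result: TimeoutSplit = { duringSpike: [], healthy: [] };
  for (const t of timeoutsMs) {
    const nearSpike = spikes.some((s) => Math.abs(s.tsMs - t) <= windowMs);
    (nearSpike ? result.duringSpike : result.healthy).push(t);
  }
  return result;
}
```

Timeouts that land mostly in `duringSpike` point at homeserver responsiveness; timeouts that land mostly in `healthy` point at the client MembershipManager retry path.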
2. Causality hypothesis
KE-1 OTK upload conflict storm
(rust-crypto store ↔ Synapse OTK state DIVERGED; server rejects re-uploads)
│ no fresh OTKs can be published/claimed
▼
No new Olm (1:1) sessions can be established with this device
│
▼
KE-2 EC media-key to-device events (io.element.call.encryption_keys)
arrive but cannot be decrypted ⇒ MissingKey at index N
⇒ friend's audio/video cuts out
KE-3 (missing algorithm) and KE-4 (delayed-event timeouts) are likely
independent of the KE-1→KE-2 chain: KE-3 is a decode/serialization path,
KE-4 is a MatrixRTC-vs-HS reliability path. Confirm/refute independence with the
decision tree below.
Decision tree — which capture confirms/refutes each link
Q1. Does the KE-1 offending key_id from the 400 response exist in
e2e_one_time_keys_json with a DIFFERENT value than the client request body?
├─ YES → OTK divergence CONFIRMED (KE-1 root). Go to Q2.
└─ NO → Not divergence. Check: are OTK counts at 0 with fallback key `used=true`?
├─ YES → OTK exhaustion, not divergence — different remediation.
└─ NO → Suspect RC-SDK 41.6.0-rc.0 upload-loop regression (see §3).
Q2. During the same call, are io.element.call.encryption_keys to-device events
present in device_inbox / Synapse to-device logs for our device id?
├─ YES + client shows "unexpected encrypted"/MissingKey
│ → KE-1 ⇒ KE-2 LINK CONFIRMED (events delivered, no Olm session to open them).
├─ YES + client decrypts fine, but LiveKit still silent
│ → KE-2 is downstream of LiveKit/SFU, NOT KE-1. Decouple from crypto.
└─ NO (nothing queued/targeted our device)
→ media keys never sent to us: stale device id / membership (see KE-4)
→ KE-2 is a device-targeting problem, weakly linked to KE-1.
Q3. KE-3: is content.algorithm NULL in event_json on the HS?
├─ YES → malformed persisted event (sender/federation). Independent of KE-1.
└─ NO → client-side RC-SDK deserialization bug. Independent of KE-1.
Q4. KE-4: do delayed-event timeouts coincide with Synapse p99 latency spikes
(Grafana) in the same minute?
├─ YES → homeserver responsiveness/load. Independent of KE-1..KE-3.
└─ NO → client MembershipManager retry behavior. Independent.
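The Q1 and Q2 branches are simple enough to encode, which helps when several captures have to be triaged consistently. This is a sketch only: the evidence field names and verdict strings are hypothetical labels for the branches above, and the function adds no logic beyond what the tree states.

```typescript
// Hypothetical encoding of decision-tree branches Q1 and Q2.
interface Q1Evidence {
  keyIdOnServer: boolean;  // offending key_id found in e2e_one_time_keys_json
  valueDiffers: boolean;   // stored value differs from the client request body
  otkCount: number;        // OTK rows currently held for the device
  fallbackUsed: boolean;   // fallback key row has used = true
}

type Q1Verdict = "otk-divergence" | "otk-exhaustion" | "suspect-sdk-upload-loop";

function triageQ1(e: Q1Evidence): Q1Verdict {
  if (e.keyIdOnServer && e.valueDiffers) return "otk-divergence";
  if (e.otkCount === 0 && e.fallbackUsed) return "otk-exhaustion";
  return "suspect-sdk-upload-loop";
}

interface Q2Evidence {
  eventsTargetedAtDevice: boolean; // encryption_keys events queued/logged for our device id
  clientDecrypts: boolean;         // false when the client logs "unexpected encrypted"/MissingKey
}

type Q2Verdict = "ke1-causes-ke2" | "livekit-or-sfu" | "device-targeting";

function triageQ2(e: Q2Evidence): Q2Verdict {
  if (!e.eventsTargetedAtDevice) return "device-targeting";
  return e.clientDecrypts ? "livekit-or-sfu" : "ke1-causes-ke2";
}
```

Recording the verdict strings next to each capture bundle keeps the planning session from re-deriving them from raw logs.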
3. Ranked remediation options (with blast radius)
Ordered least-destructive → most-destructive. Do not run any of these as a "fix" before the planning session — they are listed so evidence collection can be paired with a recovery plan. Confirm the root condition (Q1/Q2) first.
1. Per-device logout + re-login of the affected device (lowest blast radius)
   - What: log the one glitching device out and back in. Forces a fresh device id, fresh device keys, and a clean OTK batch; sidesteps a diverged OTK store without touching other sessions.
   - Blast radius: that device only. Other sessions/devices untouched.
   - Cost: the new device must be re-verified (cross-signing) and will need to restore room keys from key backup to read old encrypted history.
   - Confirms/uses: if KE-1 stops after this, OTK-store divergence (Q1) was the cause.
2. Client crypto-store reset (clearLoginData path) (medium)
   - What: clearLoginData() in src/client/initMatrix.ts (coordinator's file, do not edit) deletes ALL IndexedDB databases (incl. web-sync-store and the rust-crypto store crypto-store), unregisters service workers, clears all Cache Storage, and calls localStorage.clear(), then reloads. clearCacheAndReload() is lighter: it only calls mx.store.deleteAllData() (sync cache) and does not wipe crypto.
   - Blast radius: this browser profile only, but total: you are logged out, lose all cached sync state, drafts, settings, and the local megolm/room-key store.
   - ⚠️ Message-history / backup implication: wiping crypto-store destroys locally-held room keys (megolm inbound sessions). Any history not backed up to server-side Key Backup becomes permanently undecryptable on this device. Before doing this: verify Key Backup is enabled and the recovery key / passphrase is available (Settings → Security), or the user loses readable history. Cross-signing must be re-established too.
   - Use when: the rust-crypto store itself is corrupt/diverged and option 1 didn't clear it.
3. SDK pin change off the RC (medium: codebase change, needs rebuild)
   - Current pin: package.json → "matrix-js-sdk": "41.6.0-rc.0" (a release candidate).
   - Finding (npm / GitHub changelog, checked 2026-07): stable 41.6.0 was released 2026-05-26. Its only changelog line is "Throw sane error on completeLoginOnNewDevice IdP rejection", i.e. no OTK / keys-upload / Olm / to-device fix relative to the RC. Later stable lines exist (41.7.0, 41.8.0; 41.7.0-rc.3 / 41.9.0-rc.0 seen as pre-releases). Nearby crypto-relevant entries: 41.5.0 "Enable encrypted history sharing by default"; 41.4.0 key-backup handling. No changelog entry directly addresses the KE-1 OTK-conflict symptom in the immediate range, so moving RC → 41.6.0 stable is a low-risk hygiene step but is not expected to fix KE-1 by itself. Before pinning, re-read the CHANGELOG for any 41.7.x / 41.8.x OTK/one-time-key/olm entry that post-dates this note.
   - Blast radius: all users after the next cinny-build.sh deploy. Test the rust-crypto IndexedDB schema: a downgrade triggers the IDB_VERSION_CONFLICT path in initMatrix.ts.
4. Synapse-side OTK row surgery (LAST RESORT, highest danger)
   - What: deleting/rewriting rows in e2e_one_time_keys_json (and/or e2e_fallback_keys_json, device_inbox) for the affected device to force the client to re-upload a clean batch.
   - ⚠️ Danger: direct writes to Synapse crypto tables can desync every device of that user, break Olm sessions for everyone who has claimed one of those keys, and are easy to get wrong (wrong key_id, cache not invalidated). Synapse caches OTK counts, so a raw DELETE without a restart can leave the advertised count wrong, worsening the KE-1 loop.
   - Guardrails if ever done (planning session + HS owner only): full pg_dump of synapse first; do it during zero active calls; delete only the exact diverged key_id for the exact device_id; systemctl restart matrix-synapse to flush caches; then log the device out/in (option 1) so it republishes. Never run this speculatively.
4. "Capture session" checklist (run during the next call)
Do these in order. Aim to have client + server capturing the same call.
1. Prep server tail (LXC 151): SSH in, start tail -F /var/log/matrix-synapse/homeserver.log | tee /tmp/lotus-call-$(date +%s).log. (Optionally raise the synapse.rest.client.keys / handlers.e2e_keys / handlers.devicemessage loggers to DEBUG per §0 and systemctl reload matrix-synapse; remember to revert after.)
2. Prep client: open Lotus Chat → Settings → Developer Tools → enable Developer Tools so the Crypto Diagnostics card is visible; note its entry count starts at (or is reset by reload to) 0.
3. Open DevTools (F12) → Console: enable Preserve log; Network tab: enable Preserve log + Record. Note your device id and user id (Settings → Devices / Developer Tools → Copy access token page shows ids).
4. Note wall-clock start time (ISO/UTC) on both machines so logs align.
5. Join the Element Call with the second participant; reproduce the fault (wait for the audio/video cutouts and let the KE-1 storm run ~30–60s).
6. When a fault occurs, note the wall-clock timestamp and which symptom (audio cut / video freeze / etc.); this bounds the log window.
7. Client artifacts: in the Crypto Diagnostics card click Download report (lotus-crypto-diag-<ts>.json); in DevTools Network, save the failing keys/upload request + response (right-click → Save/Copy), and the raw HAR (Network → Save all as HAR) for the call window.
8. Grab the KE-3 event id / KE-2 participant + index from the console (or the diag JSON entries[]) for the SQL lookups.
9. Server artifacts: stop the tail; run the per-KE greps and SQL from §1 against the noted device id / user id / event id, saving output alongside the client JSON. Screenshot the Grafana Synapse latency panels for the window (for KE-4).
10. Bundle & label: put client JSON + HAR + server log slice + SQL output in one folder named with the call's UTC start time. Revert any DEBUG log config (systemctl reload matrix-synapse). Hand off to the planning session; do not apply §3 remediations yet.
5. Client diagnostics helper (this kit)
- src/app/utils/cryptoDiagLog.ts: capture-only console instrumentation.
  - installCryptoDiagLog(): idempotent; wraps console.warn / console.error with pass-through wrappers (originals always called) that ring-buffer (max 200) any line matching the KE signatures. No network, no timers.
  - getCryptoDiagEntries(): snapshot copy of the buffer ({ ts, level, ke, signature, message }, most-recent-last).
  - buildCryptoDiagReport(mx): JSON string with SDK version, device id, user id, sync state, cryptoReady (mx.getCrypto() presence), per-KE counts, and the entry buffer. No tokens/PII beyond those ids; captured log lines are retained verbatim as evidence.
  - Signatures → KE mapping: already exists → KE-1; missing key at index / io.element.call.encryption_keys / MissingKey → KE-2; DecryptionError → KE-3; update_delayed_event / delayed event → KE-4.
- src/app/features/settings/developer/CryptoDiagnostics.tsx: a folds SequenceCard / SettingTile card (mirrors developer-tools/DevelopTools.tsx) showing the live matched-entry count (Badge) and a Download report button (Blob → lotus-crypto-diag-<ts>.json, same download idiom as room-settings/ExportRoomHistory.tsx).
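For readers without the source file at hand, the behaviour described above (signature match, 200-entry ring buffer, pass-through wrappers, idempotent install) can be sketched as follows. This is an illustrative reconstruction from the description only, not the contents of cryptoDiagLog.ts: every identifier (`recordIfMatch`, `wrapConsoleLike`, `snapshotEntries`, `ConsoleLike`) is hypothetical, and the console object is injected so the sketch runs outside a browser.

```typescript
// Illustrative sketch only; names are hypothetical, behaviour follows the
// description of the diagnostics helper (match, buffer, pass through).
type Ke = "KE-1" | "KE-2" | "KE-3" | "KE-4";
type Level = "warn" | "error";

interface DiagEntry {
  ts: number;
  level: Level;
  ke: Ke;
  signature: string;
  message: string;
}

// Signature → KE mapping as listed in this section.
const SIGNATURES: ReadonlyArray<readonly [string, Ke]> = [
  ["already exists", "KE-1"],
  ["missing key at index", "KE-2"],
  ["io.element.call.encryption_keys", "KE-2"],
  ["MissingKey", "KE-2"],
  ["DecryptionError", "KE-3"],
  ["update_delayed_event", "KE-4"],
  ["delayed event", "KE-4"],
];

const MAX_ENTRIES = 200;
const buffer: DiagEntry[] = [];

// Buffers the line if it matches a signature; drops the oldest entry when full.
function recordIfMatch(level: Level, message: string, now: number = Date.now()): DiagEntry | undefined {
  const hit = SIGNATURES.find(([signature]) => message.includes(signature));
  if (hit === undefined) return undefined;
  const entry: DiagEntry = { ts: now, level, ke: hit[1], signature: hit[0], message };
  buffer.push(entry);
  if (buffer.length > MAX_ENTRIES) buffer.shift();
  return entry;
}

// Snapshot copy, most-recent-last.
function snapshotEntries(): DiagEntry[] {
  return buffer.slice();
}

interface ConsoleLike {
  warn: (...args: unknown[]) => void;
  error: (...args: unknown[]) => void;
}

const wrapped = new WeakSet<ConsoleLike>();

// Idempotent: a second call on the same object is a no-op.
function wrapConsoleLike(target: ConsoleLike): void {
  if (wrapped.has(target)) return;
  wrapped.add(target);
  for (const level of ["warn", "error"] as const) {
    const original = target[level].bind(target);
    target[level] = (...args: unknown[]) => {
      try {
        recordIfMatch(level, args.map((a) => String(a)).join(" "));
      } finally {
        original(...args); // the original is always called
      }
    };
  }
}
```

In a browser the real `console` object would be passed in; the injected parameter is a choice made here so the wrapper can be exercised against a fake.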
Recommended mount points (coordinator)
- Install call: call installCryptoDiagLog() as early as possible during boot so it captures crypto errors from first sync, ideally at the top of the client entry module or inside ClientRoot before/around initClient (e.g. src/app/pages/client/ClientRoot.tsx). It is idempotent, side-effect only, and needs no mx, so a module-scope call at app entry is safe. (Do not put it in initMatrix.ts; that file is off-limits.)
- Settings card: render <CryptoDiagnostics /> inside the Developer Tools page. In src/app/features/settings/developer-tools/DevelopTools.tsx, add it to the Box direction="Column" gap="700" list (guarded by the existing developerTools flag), right after the "Access Token" card. It pulls mx from useMatrixClient() itself, so it just needs to be placed in the tree.
6. 2026-07 investigation update — 41.7.0 delta + web-specific root cause
New findings this session (code-read + upstream issue triage). These sharpen KE-1's root cause and close the "just upgrade the SDK" lever.
6.1 The 41.7.0 upgrade does NOT fix KE-1 (lever closed)
We are now on matrix-js-sdk@41.7.0 → @matrix-org/matrix-sdk-crypto-wasm@18.3.1
(was 41.6.0-rc.0 when KE-1/2 were observed). Checked both changelogs:
- 41.7.0's only crypto line is the security bump to crypto-wasm 18.3.1. No OTK / keys-upload / Olm-session change.
- crypto-wasm 17.0 → 18.3.1: no entry for one-time-keys, keys/upload, "already exists", or upload conflicts. The 18.3.x work was to-device security hardening (vodozemac 0.10; sender-spoofing check via sender_device_keys; MSC4147 validation), unrelated to the OTK loop.
- Upstream matrix-rust-sdk#5200 ("OlmMachine constantly tries to upload keys when restoring session") is still OPEN (as of mid-2025). The loop mechanism is confirmed there: on the 400, mark_request_as_sent() never fires, so the keys stay "unshared" and the SDK re-issues the identical failing upload every cycle → the storm.
⇒ Remediation option 3 (SDK pin) is exhausted for KE-1. Do not expect a version bump to help; the fix is store-hygiene, below.
6.2 Confirmed root cause + the web-specific trigger we can act on
Upstream #5200 + #1415 pin the root condition to rust-crypto store ↔
server OTK divergence, from one of:
1. Crypto store reset/restore without deregistering the device server-side: the store forgets OTKs it already published; the server still holds them.
2. Unsafe concurrent access to the crypto store, e.g. the same session open in multiple browser tabs, each running its own OlmMachine against the one IndexedDB crypto store.
3. A store that isn't durably persisted, so a restore can't track what was sent.
Cinny is a web client and hits two of these by construction (verified in code):
- No navigator.storage.persist() anywhere (grep clean). The rust-crypto IndexedDB store is therefore evictable under storage pressure, while the access token + device id live in localStorage (N97), which browsers evict less aggressively. Partial eviction ⇒ the device resurrects with a blank crypto store but the SAME device id ⇒ it re-uploads OTKs the server still holds ⇒ the exact KE-1 "already exists" divergence, with no user action and no visible cause. This is the leading hypothesis for a self-hosted web deployment.
- No multi-tab crypto guard (no navigator.locks / BroadcastChannel leader election in src/). initMatrix.ts calls mx.initRustCrypto() with no single-writer coordination, so 2+ tabs = concurrent store access = trigger #2.
6.3 Concrete PREVENTIVE client mitigations (new — buildable, don't need a call)
Ordered by value/effort. These reduce the recurrence of KE-1; they don't heal an already-diverged device (that still needs remediation option 1: clean logout+login).
1. Request persistent storage on login via navigator.storage.persist() (cheapest, highest value). Idempotent, side-effect only, no behavior change if the browser denies it. Directly prevents the eviction-induced divergence in 6.2. Best placed at app entry alongside the other module-scope calls (NOT in initMatrix.ts, which is off-limits), e.g. a one-liner in ClientRoot / app bootstrap: if (navigator.storage?.persist) navigator.storage.persist(); Optionally surface navigator.storage.persisted() in the Crypto Diagnostics card so a capture records whether the store was evictable.
2. Multi-tab guard (medium). Detect a second tab of the same session (BroadcastChannel or the Web Locks API) and either (a) warn "Lotus is open in another tab, encryption may misbehave", or (b) make secondary tabs read-only for crypto. Prevents trigger #2.
3. Loop detection → recovery prompt (medium). Watch for repeated keys/upload 400 M_UNKNOWN … already exists (the client sees the rejection); after N in a window, stop hammering and surface a "Reset encryption on this device (log out & back in)" prompt instead of looping silently.
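Two of the mitigations above are small enough to sketch: the persistent-storage request and the upload-loop detector. The code below is a hedged illustration, not the shipped implementation. `requestPersistentStorage`, `StorageManagerLike`, `isOtkConflict`, and `UploadLoopDetector` are hypothetical names; the storage object is injected (in the browser it would be `navigator.storage`) so the sketch compiles without DOM typings; and the threshold N and window length are left to the caller because the source does not fix them.

```typescript
// Structural stand-in for the browser StorageManager (assumption: only
// persist() and persisted() are needed here).
interface StorageManagerLike {
  persist?: () => Promise<boolean>;
  persisted?: () => Promise<boolean>;
}

// Resolves true if storage is (or becomes) persistent, false if the API is
// missing, the browser denies the request, or the call throws.
async function requestPersistentStorage(storage: StorageManagerLike | undefined): Promise<boolean> {
  if (!storage || !storage.persist) return false;
  try {
    if (storage.persisted && (await storage.persisted())) return true;
    return await storage.persist();
  } catch {
    return false;
  }
}

// Matches the KE-1 rejection as the client sees it: HTTP 400 whose body
// mentions "already exists".
function isOtkConflict(status: number, body: string): boolean {
  return status === 400 && body.includes("already exists");
}

// Sliding-window counter: trips once maxHits rejections fall inside windowMs.
class UploadLoopDetector {
  private readonly maxHits: number;
  private readonly windowMs: number;
  private hits: number[] = [];

  constructor(maxHits: number, windowMs: number) {
    this.maxHits = maxHits;
    this.windowMs = windowMs;
  }

  // Call for every rejected keys/upload that isOtkConflict() accepts.
  // Returns true when the caller should stop retrying and prompt the user.
  recordRejection(nowMs: number): boolean {
    this.hits.push(nowMs);
    this.hits = this.hits.filter((t) => nowMs - t <= this.windowMs);
    return this.hits.length >= this.maxHits;
  }
}
```

At app entry the first function would be called as `requestPersistentStorage(navigator.storage)`, and its result could feed the Crypto Diagnostics card. Where the detector hooks into the upload path depends on how the SDK surfaces the rejection, which this sketch does not assume.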
6.4 Secondary KE-2 hypothesis to test in the capture
crypto-wasm 18.3.0 tightened Olm to-device validation (sender-spoof check +
MSC4147). It's therefore possible KE-2's WARN … unexpected encrypted to-device event … io.element.call.encryption_keys is partly the new validation
rejecting EC's media-key events, not only the missing-Olm-session downstream of
KE-1. Capture discriminator: if KE-2 still occurs in a call where OTK counts
are healthy and no KE-1 storm is present (Q1 = NO), suspect the to-device
validation path (EC ↔ rust-crypto 18.3.x), not KE-1. If KE-2 only ever co-occurs
with the KE-1 storm, the original KE-1⇒KE-2 chain stands.
6.5 What to do now vs. at capture
- Now (no call needed): ship 6.3.1 (persist()); it's safe and preventive. Consider 6.3.3 (loop detection) as a follow-up.
- At the next glitchy call: run the §4 capture; answer Q1 (divergence?) and 6.4's discriminator. For any currently stuck device, use remediation option 1: a clean logout + login, not just "clear storage". Clearing storage without mx.logout() leaves the server device + its OTKs in place and can re-trigger the divergence.