docs: P3-4 accessibility — features section, TODO/BUGS, LOTUS_TESTING §P

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
2026-07-02 11:45:22 -04:00
parent 6728a1274d
commit 899a14c119
6 changed files with 126 additions and 66 deletions
+29 -24
View File
@@ -17,17 +17,17 @@
From the matrix infra README (`/root/code/matrix/README.md`):
| Thing | Value |
|-------|-------|
| Synapse host | LXC **151**, `10.10.10.29` (Synapse 1.155.0) |
| Synapse log | `/var/log/matrix-synapse/homeserver.log` |
| Synapse config | `/etc/matrix-synapse/homeserver.yaml` (+ `conf.d/`) |
| Synapse HTTP | `10.10.10.29:8008` |
| PostgreSQL host | LXC **109**, `10.10.10.44` (PG 17.9), db `synapse` |
| synapse-admin UI | `http://10.10.10.29:8080` |
| Thing | Value |
| ------------------------ | ------------------------------------------------------------- |
| Synapse host | LXC **151**, `10.10.10.29` (Synapse 1.155.0) |
| Synapse log | `/var/log/matrix-synapse/homeserver.log` |
| Synapse config | `/etc/matrix-synapse/homeserver.yaml` (+ `conf.d/`) |
| Synapse HTTP | `10.10.10.29:8008` |
| PostgreSQL host | LXC **109**, `10.10.10.44` (PG 17.9), db `synapse` |
| synapse-admin UI | `http://10.10.10.29:8080` |
| LiveKit / lk-jwt / guard | LXC 151: LiveKit `:7880/:7881`, guard `:8070`, lk-jwt `:8071` |
| SSH path to Synapse | `ssh root@10.10.10.4` then `pct enter 151` |
| SSH path to PG | `ssh root@10.10.10.4` then `pct enter 109` |
| SSH path to Synapse | `ssh root@10.10.10.4` then `pct enter 151` |
| SSH path to PG | `ssh root@10.10.10.4` then `pct enter 109` |
**Getting a psql shell** (run on LXC 109, or from 151 over the network):
@@ -50,10 +50,10 @@ temporarily raise it in `/etc/matrix-synapse/conf.d/log.yaml` (or the
```yaml
loggers:
synapse.rest.client.keys: { level: DEBUG }
synapse.handlers.e2e_keys: { level: DEBUG }
synapse.rest.client.keys: { level: DEBUG }
synapse.handlers.e2e_keys: { level: DEBUG }
synapse.storage.databases.main.end_to_end_keys: { level: DEBUG }
synapse.handlers.devicemessage: { level: DEBUG } # to-device
synapse.handlers.devicemessage: { level: DEBUG } # to-device
```
Then `systemctl reload matrix-synapse` (reload re-reads log config without a
@@ -86,6 +86,7 @@ use it as the primary client artifact and DevTools as the raw backup.
/var/log/matrix-synapse/homeserver.log | grep "<user_id>"
```
- **Synapse SQL (LXC 109) — what the server thinks it holds:**
```sql
-- Current OTK inventory for the device (compare key_id set against the
-- request body the client keeps retrying).
@@ -106,8 +107,10 @@ use it as the primary client artifact and DevTools as the raw backup.
FROM e2e_fallback_keys_json
WHERE user_id = '@user:matrix.lotusguild.org' AND device_id = '<DEVICE_ID>';
```
> Table names are Synapse 1.155 (`e2e_one_time_keys_json`,
> `e2e_fallback_keys_json`). If a name is absent, list with `\dt e2e*` in psql.
- **Confirms:** if the offending `key_id` (from the 400) is **present** in
`e2e_one_time_keys_json` with a **different** stored value than the client's
request body → OTK state has diverged (rust-crypto store vs Synapse). That is
@@ -132,6 +135,7 @@ use it as the primary client artifact and DevTools as the raw backup.
/var/log/matrix-synapse/homeserver.log | grep -E "<user_id>|<participant_id>"
```
- **Synapse SQL (LXC 109) — undelivered / queued to-device events:**
```sql
-- Backlog of to-device messages queued for the affected device. A growing
-- count here = the HS has the media-key events but the device isn't draining
@@ -145,6 +149,7 @@ use it as the primary client artifact and DevTools as the raw backup.
SELECT device_id, display_name, last_seen, ts
FROM devices WHERE user_id = '@user:matrix.lotusguild.org';
```
- **Confirms:** to-device events present but undecryptable (client shows the
`io.element.call.encryption_keys` "unexpected encrypted" warning) ⇒ there is
**no valid Olm session** to decrypt them — the expected downstream of KE-1.
@@ -265,7 +270,7 @@ Q4. KE-4: do delayed-event timeouts coincide with Synapse p99 latency spikes
> "fix" before the planning session** — they are listed so evidence collection
> can be paired with a recovery plan. Confirm the root condition (Q1/Q2) first.
1. **Per-device logout + re-login of the affected device** *(lowest blast radius)*
1. **Per-device logout + re-login of the affected device** _(lowest blast radius)_
- **What:** log the one glitching device out and back in. Forces a fresh
device id, fresh device keys, and a clean OTK batch — sidesteps a diverged
OTK store without touching other sessions.
@@ -275,7 +280,7 @@ Q4. KE-4: do delayed-event timeouts coincide with Synapse p99 latency spikes
- **Confirms/uses:** if KE-1 stops after this, OTK-store divergence (Q1) was
the cause.
2. **Client crypto-store reset (`clearLoginData` path)** *(medium)*
2. **Client crypto-store reset (`clearLoginData` path)** _(medium)_
- **What:** `clearLoginData()` in `src/client/initMatrix.ts` (coordinator's
file — do not edit) **deletes ALL IndexedDB databases** (incl.
`web-sync-store` and the rust-crypto store `crypto-store`), **unregisters
@@ -294,16 +299,16 @@ Q4. KE-4: do delayed-event timeouts coincide with Synapse p99 latency spikes
- **Use when:** the rust-crypto store itself is corrupt/diverged and option 1
didn't clear it.
3. **SDK pin change off the RC** *(medium — codebase change, needs rebuild)*
3. **SDK pin change off the RC** _(medium — codebase change, needs rebuild)_
- **Current pin:** `package.json` → `"matrix-js-sdk": "41.6.0-rc.0"` (a
release candidate).
- **Finding (npm / GitHub changelog, checked 2026-07):** stable **`41.6.0`**
was released **2026-05-26**. Its only changelog line is *"Throw sane error
on completeLoginOnNewDevice IdP rejection"* — **no OTK / keys-upload / Olm /
was released **2026-05-26**. Its only changelog line is _"Throw sane error
on completeLoginOnNewDevice IdP rejection"_ — **no OTK / keys-upload / Olm /
to-device fix** relative to the RC. Later stable lines exist
(`41.7.0`, `41.8.0`; `41.7.0-rc.3` / `41.9.0-rc.0` seen as pre-releases).
Nearby crypto-relevant entries: `41.5.0` *"Enable encrypted history sharing
by default"*; `41.4.0` key-backup handling. **No changelog entry directly
Nearby crypto-relevant entries: `41.5.0` _"Enable encrypted history sharing
by default"_; `41.4.0` key-backup handling. **No changelog entry directly
addresses the KE-1 OTK-conflict symptom** in the immediate range — so
moving RC→`41.6.0` stable is a low-risk hygiene step but is **not expected
to fix KE-1 by itself**. Before pinning, re-read the CHANGELOG for any
@@ -312,7 +317,7 @@ Q4. KE-4: do delayed-event timeouts coincide with Synapse p99 latency spikes
rust-crypto IndexedDB schema — a downgrade triggers the `IDB_VERSION_CONFLICT`
path in `initMatrix.ts`.
4. **Synapse-side OTK row surgery** *(LAST RESORT — highest danger)*
4. **Synapse-side OTK row surgery** _(LAST RESORT — highest danger)_
- **What:** deleting/rewriting rows in `e2e_one_time_keys_json` (and/or
`e2e_fallback_keys_json`, `device_inbox`) for the affected device to force
the client to re-upload a clean batch.
@@ -324,7 +329,7 @@ Q4. KE-4: do delayed-event timeouts coincide with Synapse p99 latency spikes
- **Guardrails if ever done (planning session + HS owner only):** full
`pg_dump` of `synapse` first; do it during **zero active calls**; delete only
the exact diverged `key_id` for the exact `device_id`; `systemctl restart
matrix-synapse` to flush caches; then log the device out/in (option 1) so it
matrix-synapse` to flush caches; then log the device out/in (option 1) so it
republishes. **Never** run this speculatively.
---
@@ -337,7 +342,7 @@ Do these **in order**. Aim to have client + server capturing the **same call**.
`tail -F /var/log/matrix-synapse/homeserver.log | tee /tmp/lotus-call-$(date +%s).log`.
(Optionally raise the `synapse.rest.client.keys` / `handlers.e2e_keys` /
`handlers.devicemessage` loggers to DEBUG per §0 and `systemctl reload
matrix-synapse` — remember to revert after.)
matrix-synapse` — remember to revert after.)
2. **Prep client:** open Lotus Chat → Settings → Developer Tools → **enable
Developer Tools** so the **Crypto Diagnostics** card is visible; note its
entry count starts at (or reset by reload to) 0.
@@ -373,7 +378,7 @@ Do these **in order**. Aim to have client + server capturing the **same call**.
with pass-through wrappers (originals always called) that ring-buffer (max
**200**) any line matching the KE signatures. No network, no timers.
- `getCryptoDiagEntries()` — snapshot copy of the buffer (`{ ts, level, ke,
signature, message }`, most-recent-last).
signature, message }`, most-recent-last).
- `buildCryptoDiagReport(mx)` — JSON string: SDK version, device id, user id,
sync state, `cryptoReady` (`mx.getCrypto()` presence), per-KE counts, and the
entry buffer. No tokens/PII beyond those ids; captured log lines are retained