docs: P3-4 accessibility — features section, TODO/BUGS, LOTUS_TESTING §P
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -17,17 +17,17 @@

From the matrix infra README (`/root/code/matrix/README.md`):

| Thing | Value |
| ------------------------ | ------------------------------------------------------------- |
| Synapse host | LXC **151**, `10.10.10.29` (Synapse 1.155.0) |
| Synapse log | `/var/log/matrix-synapse/homeserver.log` |
| Synapse config | `/etc/matrix-synapse/homeserver.yaml` (+ `conf.d/`) |
| Synapse HTTP | `10.10.10.29:8008` |
| PostgreSQL host | LXC **109**, `10.10.10.44` (PG 17.9), db `synapse` |
| synapse-admin UI | `http://10.10.10.29:8080` |
| LiveKit / lk-jwt / guard | LXC 151: LiveKit `:7880/:7881`, guard `:8070`, lk-jwt `:8071` |
| SSH path to Synapse | `ssh root@10.10.10.4` then `pct enter 151` |
| SSH path to PG | `ssh root@10.10.10.4` then `pct enter 109` |

**Getting a psql shell** (run on LXC 109, or from 151 over the network):

@@ -50,10 +50,10 @@ temporarily raise it in `/etc/matrix-synapse/conf.d/log.yaml` (or the

```yaml
loggers:
  synapse.rest.client.keys: { level: DEBUG }
  synapse.handlers.e2e_keys: { level: DEBUG }
  synapse.storage.databases.main.end_to_end_keys: { level: DEBUG }
  synapse.handlers.devicemessage: { level: DEBUG } # to-device
```

Then `systemctl reload matrix-synapse` (reload re-reads log config without a
@@ -86,6 +86,7 @@ use it as the primary client artifact and DevTools as the raw backup.
/var/log/matrix-synapse/homeserver.log | grep "<user_id>"
```
- **Synapse SQL (LXC 109) — what the server thinks it holds:**

```sql
-- Current OTK inventory for the device (compare key_id set against the
-- request body the client keeps retrying).
@@ -106,8 +107,10 @@ use it as the primary client artifact and DevTools as the raw backup.
FROM e2e_fallback_keys_json
WHERE user_id = '@user:matrix.lotusguild.org' AND device_id = '<DEVICE_ID>';
```

> Table names are as of Synapse 1.155 (`e2e_one_time_keys_json`,
> `e2e_fallback_keys_json`). If a name is absent, list the candidates with `\dt e2e*` in psql.

- **Confirms:** if the offending `key_id` (from the 400) is **present** in
`e2e_one_time_keys_json` with a **different** stored value than the client's
request body → OTK state has diverged (rust-crypto store vs Synapse). That is
@@ -132,6 +135,7 @@ use it as the primary client artifact and DevTools as the raw backup.
/var/log/matrix-synapse/homeserver.log | grep -E "<user_id>|<participant_id>"
```
- **Synapse SQL (LXC 109) — undelivered / queued to-device events:**

```sql
-- Backlog of to-device messages queued for the affected device. A growing
-- count here = the HS has the media-key events but the device isn't draining
@@ -145,6 +149,7 @@ use it as the primary client artifact and DevTools as the raw backup.
SELECT device_id, display_name, last_seen, ts
FROM devices WHERE user_id = '@user:matrix.lotusguild.org';
```

- **Confirms:** to-device events present but undecryptable (client shows the
`io.element.call.encryption_keys` "unexpected encrypted" warning) ⇒ there is
**no valid Olm session** to decrypt them — the expected downstream of KE-1.
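
The backlog query returns one count per run. The "growing count" signal only shows up across repeated runs during the same call. Below is a minimal TypeScript sketch for classifying a handful of hand-recorded readings. It assumes you copy each (time, count) pair out of psql yourself. `BacklogSample`, `classifyBacklog` and the trend labels are hypothetical names for illustration, not part of Lotus Chat or Synapse.

```typescript
// Hypothetical type: one manual reading of the to-device backlog query.
interface BacklogSample {
  takenAt: number; // ms since epoch when the query was run
  queued: number; // backlog count returned for the affected device
}

type BacklogTrend = "growing" | "draining" | "flat" | "mixed" | "insufficient-data";

// Classify successive readings; input order does not matter.
// "growing": the count rose at least once and never fell, i.e. the
// homeserver keeps queueing to-device events the device is not draining.
function classifyBacklog(samples: BacklogSample[]): BacklogTrend {
  if (samples.length < 2) return "insufficient-data";
  const sorted = [...samples].sort((a, b) => a.takenAt - b.takenAt);
  let rose = false;
  let fell = false;
  for (let i = 1; i < sorted.length; i++) {
    const delta = sorted[i].queued - sorted[i - 1].queued;
    if (delta > 0) rose = true;
    if (delta < 0) fell = true;
  }
  if (rose && !fell) return "growing";
  if (fell && !rose) return "draining";
  if (!rose && !fell) return "flat";
  return "mixed"; // rose and fell: inconclusive, take more readings
}
```

For example, readings of 3, 7 and 12 taken a minute apart classify as `"growing"`. That is the pattern the query comment describes: the homeserver holds the events, but the device is not draining them.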
@@ -265,7 +270,7 @@ Q4. KE-4: do delayed-event timeouts coincide with Synapse p99 latency spikes
> "fix" before the planning session** — they are listed so evidence collection
> can be paired with a recovery plan. Confirm the root condition (Q1/Q2) first.

1. **Per-device logout + re-login of the affected device** _(lowest blast radius)_
- **What:** log the one glitching device out and back in. Forces a fresh
device id, fresh device keys, and a clean OTK batch — sidesteps a diverged
OTK store without touching other sessions.
@@ -275,7 +280,7 @@ Q4. KE-4: do delayed-event timeouts coincide with Synapse p99 latency spikes
- **Confirms/uses:** if KE-1 stops after this, OTK-store divergence (Q1) was
the cause.

2. **Client crypto-store reset (`clearLoginData` path)** _(medium)_
- **What:** `clearLoginData()` in `src/client/initMatrix.ts` (coordinator's
file — do not edit) **deletes ALL IndexedDB databases** (incl.
`web-sync-store` and the rust-crypto store `crypto-store`), **unregisters
@@ -294,16 +299,16 @@ Q4. KE-4: do delayed-event timeouts coincide with Synapse p99 latency spikes
- **Use when:** the rust-crypto store itself is corrupt/diverged and option 1
didn't clear it.

3. **SDK pin change off the RC** _(medium — codebase change, needs rebuild)_
- **Current pin:** `package.json` → `"matrix-js-sdk": "41.6.0-rc.0"` (a
release candidate).
- **Finding (npm / GitHub changelog, checked 2026-07):** stable **`41.6.0`**
was released **2026-05-26**. Its only changelog line is _"Throw sane error
on completeLoginOnNewDevice IdP rejection"_ — **no OTK / keys-upload / Olm /
to-device fix** relative to the RC. Later stable lines exist
(`41.7.0`, `41.8.0`; `41.7.0-rc.3` / `41.9.0-rc.0` seen as pre-releases).
Nearby crypto-relevant entries: `41.5.0` _"Enable encrypted history sharing
by default"_; `41.4.0` key-backup handling. **No changelog entry directly
addresses the KE-1 OTK-conflict symptom** in the immediate range — so
moving RC→`41.6.0` stable is a low-risk hygiene step but is **not expected
to fix KE-1 by itself**. Before pinning, re-read the CHANGELOG for any
@@ -312,7 +317,7 @@ Q4. KE-4: do delayed-event timeouts coincide with Synapse p99 latency spikes
rust-crypto IndexedDB schema — a downgrade triggers the `IDB_VERSION_CONFLICT`
path in `initMatrix.ts`.

4. **Synapse-side OTK row surgery** _(LAST RESORT — highest danger)_
- **What:** deleting/rewriting rows in `e2e_one_time_keys_json` (and/or
`e2e_fallback_keys_json`, `device_inbox`) for the affected device to force
the client to re-upload a clean batch.
@@ -324,7 +329,7 @@ Q4. KE-4: do delayed-event timeouts coincide with Synapse p99 latency spikes
- **Guardrails if ever done (planning session + HS owner only):** full
`pg_dump` of `synapse` first; do it during **zero active calls**; delete only
the exact diverged `key_id` for the exact `device_id`; `systemctl restart
matrix-synapse` to flush caches; then log the device out/in (option 1) so it
republishes. **Never** run this speculatively.
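
Option 3 hinges on whether a proposed pin sorts above or below the current one, including how a release candidate orders against its stable release. Below is a minimal TypeScript sketch of SemVer 2.0.0 precedence. It ignores build metadata, does not handle range syntax, and assumes pins are exact versions like the current `41.6.0-rc.0`. `compareVersions` and `isDowngrade` are hypothetical helpers, not part of the Lotus Chat codebase or npm's API.

```typescript
// Parsed form of an exact MAJOR.MINOR.PATCH[-PRERELEASE][+BUILD] version.
interface ParsedVersion {
  core: [number, number, number];
  pre: string[]; // prerelease identifiers; empty for a stable release
}

const VERSION_RE =
  /^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?$/;

function parseVersion(v: string): ParsedVersion {
  const m = VERSION_RE.exec(v);
  if (!m) throw new Error(`not an exact semver version: ${v}`);
  return {
    core: [Number(m[1]), Number(m[2]), Number(m[3])],
    pre: m[4] ? m[4].split(".") : [], // build metadata (+...) is dropped
  };
}

// Numeric identifiers compare numerically and sort before alphanumeric ones;
// alphanumeric identifiers compare in ASCII order.
function compareIdentifiers(a: string, b: string): number {
  const numA = /^\d+$/.test(a);
  const numB = /^\d+$/.test(b);
  if (numA && numB) return Math.sign(Number(a) - Number(b));
  if (numA) return -1;
  if (numB) return 1;
  return a < b ? -1 : a > b ? 1 : 0;
}

// Negative if a sorts before b, 0 if equal precedence, positive if after.
function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa.core[i] !== pb.core[i]) return Math.sign(pa.core[i] - pb.core[i]);
  }
  // Same core: a stable release outranks any of its prereleases.
  if (pa.pre.length === 0 || pb.pre.length === 0) {
    return Math.sign(pb.pre.length - pa.pre.length);
  }
  const n = Math.min(pa.pre.length, pb.pre.length);
  for (let i = 0; i < n; i++) {
    const c = compareIdentifiers(pa.pre[i], pb.pre[i]);
    if (c !== 0) return c;
  }
  // All shared identifiers equal: the longer prerelease list sorts later.
  return Math.sign(pa.pre.length - pb.pre.length);
}

// Hypothetical guard: true if moving from `current` to `proposed` goes
// backwards, the downgrade case that can hit IDB_VERSION_CONFLICT.
function isDowngrade(current: string, proposed: string): boolean {
  return compareVersions(proposed, current) < 0;
}
```

Under these rules `41.6.0-rc.0` sorts before `41.6.0`, so RC→stable is an upgrade. A move from an already-installed `41.7.0` back to `41.6.0` is flagged as a downgrade, which is the case option 3 ties to `IDB_VERSION_CONFLICT`.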

---
@@ -337,7 +342,7 @@ Do these **in order**. Aim to have client + server capturing the **same call**.
`tail -F /var/log/matrix-synapse/homeserver.log | tee /tmp/lotus-call-$(date +%s).log`.
(Optionally raise the `synapse.rest.client.keys` / `handlers.e2e_keys` /
`handlers.devicemessage` loggers to DEBUG per §0 and `systemctl reload
matrix-synapse` — remember to revert after.)
2. **Prep client:** open Lotus Chat → Settings → Developer Tools → **enable
Developer Tools** so the **Crypto Diagnostics** card is visible; confirm its
entry count starts at 0 (reload the page to reset it if it does not).
@@ -373,7 +378,7 @@ Do these **in order**. Aim to have client + server capturing the **same call**.
with pass-through wrappers (originals always called) that ring-buffer (max
**200**) any line matching the KE signatures. No network, no timers.
- `getCryptoDiagEntries()` — snapshot copy of the buffer (`{ ts, level, ke,
signature, message }`, most-recent-last).
- `buildCryptoDiagReport(mx)` — JSON string: SDK version, device id, user id,
sync state, `cryptoReady` (`mx.getCrypto()` presence), per-KE counts, and the
entry buffer. No tokens/PII beyond those ids; captured log lines are retained