feat(crypto) + docs: request persistent storage; consolidate docs to 3

- index.tsx: request navigator.storage.persist() for logged-in sessions so the
  browser can't evict the IndexedDB rust-crypto store (eviction while the
  localStorage session survives resurrects the device with a blank store → the
  KE-1 "one time key already exists" upload storm). Guarded, checks persisted()
  first, best-effort.
- Docs: remove HANDOFF_ELEMENT_CALL_FORK.md, LOTUS_E2EE_INVESTIGATION.md, and
  LOTUS_BUGS.md. Port their live content into the three kept docs — verification
  backlog → LOTUS_TESTING; open bugs + E2EE (KE-1..4) + an Element Call fork
  operational reference (publish steps + io.lotus action catalog) → LOTUS_TODO.
  Fix all dangling references (README, code comments, cross-doc links). Full
  history of the removed docs remains in git.
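
The guarded request described in the first bullet could look roughly like the sketch below. It is an illustration, not the code that landed in `index.tsx`: the function name `requestPersistentStorage`, the `PersistOutcome` labels, and the injected `storage` parameter are assumptions made so the snippet runs outside a browser. Only `persisted()` and `persist()` come from the standard `StorageManager` API.

```typescript
// Minimal subset of the browser StorageManager API used by this sketch.
interface PersistableStorage {
  persisted(): Promise<boolean>;
  persist(): Promise<boolean>;
}

// Hypothetical outcome labels, useful for logging or diagnostics.
type PersistOutcome = 'unsupported' | 'already-persisted' | 'granted' | 'denied' | 'error';

// Best-effort: never throws, so a failure cannot block login or startup.
async function requestPersistentStorage(storage?: PersistableStorage): Promise<PersistOutcome> {
  // Guard: older browsers and non-secure contexts may not expose the API.
  if (!storage || typeof storage.persisted !== 'function' || typeof storage.persist !== 'function') {
    return 'unsupported';
  }
  try {
    // Check first so an already-persistent origin is not asked again.
    if (await storage.persisted()) return 'already-persisted';
    return (await storage.persist()) ? 'granted' : 'denied';
  } catch {
    return 'error';
  }
}
```

In a browser the call site would pass `navigator.storage` once a logged-in session exists and ignore the result apart from logging. A `denied` outcome leaves the store evictable, so this reduces the risk of eviction rather than removing it.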

Gates: tsc/eslint/prettier clean, build OK, 665 tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 15:28:09 -04:00
parent 81904372bc
commit 049472e25f
10 changed files with 205 additions and 1392 deletions
+136 -3
@@ -35,7 +35,7 @@ Completed features are documented in [LOTUS_FEATURES.md](./LOTUS_FEATURES.md).
## ✅ Done — Awaiting Verification
Built and gate-green; verify per [LOTUS_TESTING.md](./LOTUS_TESTING.md), then they graduate to LOTUS_FEATURES.md. (Bug-side fixes awaiting verification live in LOTUS_BUGS.md.)
Built and gate-green; verify per [LOTUS_TESTING.md](./LOTUS_TESTING.md), then they graduate to LOTUS_FEATURES.md. (Open bugs + the verification backlog now live in this file and LOTUS_TESTING.md.)
| Feature | Test guide |
| :-------------------------------------------------------------------------------- | :---------------- |
@@ -217,7 +217,7 @@ Features:
### [~] P4-8 · Encrypted Message Search Indexing & Caching — IMPLEMENTED (2026-07), opt-in
**Shipped:** `src/app/utils/searchCache.ts` — raw-IndexedDB per-room index (`lotus-search-cache`) of decrypted search rows + coverage markers, merged into local search (in-memory-wins dedupe). **Opt-in, default OFF** (stores plaintext at rest) with a privacy note, Clear button, and logout wipe. Awaiting live QA (LOTUS_BUGS AW / P4-8 row).
**Shipped:** `src/app/utils/searchCache.ts` — raw-IndexedDB per-room index (`lotus-search-cache`) of decrypted search rows + coverage markers, merged into local search (in-memory-wins dedupe). **Opt-in, default OFF** (stores plaintext at rest) with a privacy note, Clear button, and logout wipe. Awaiting live QA (LOTUS_TESTING outstanding-verification backlog).
### [~] P4-1 · Thread Notification Mode Per-Thread — IMPLEMENTED (2026-07), ⚠️ AWAITING LIVE QA
@@ -316,7 +316,7 @@ Features:
**What:** High-end background noise cancellation using a pre-trained ML model (RNNoise) running in the browser. Removes dogs, fans, and keyboard clicks from the mic stream.
**Shipped:** 3-tier setting (Off / Browser-native / ML) in Settings → General → Calls.
**🔱 [EC-FORK] DONE — moved in-source (2026-06).** ML denoise is now a first-class audio stage **inside** the forked Element Call: a LiveKit `TrackProcessor<Audio>` activated by `lotusDenoiseSource=1` (cinny sets it when ML is selected). The old build-time `getUserMedia`/`index.html` monkeypatch is **removed**. Because EC re-runs the processor on every (re)publish, denoise now **survives reconnects and mic-device switches** — this is the A7 fix (see `LOTUS_BUGS.md` A7, `LOTUS_TESTING.md` §D2-1). The processor degrades to the raw mic rather than going silent.
**🔱 [EC-FORK] DONE — moved in-source (2026-06).** ML denoise is now a first-class audio stage **inside** the forked Element Call: a LiveKit `TrackProcessor<Audio>` activated by `lotusDenoiseSource=1` (cinny sets it when ML is selected). The old build-time `getUserMedia`/`index.html` monkeypatch is **removed**. Because EC re-runs the processor on every (re)publish, denoise now **survives reconnects and mic-device switches** — this is the A7 fix (see `LOTUS_TESTING.md` §D2-1). The processor degrades to the raw mic rather than going silent.
**Key decision:** LiveKit's Krisp filter is LiveKit-Cloud-only (we self-host the SFU); EC's own RNNoise PR #3892 is unmerged. Owning the fork let us implement the in-source stage directly.
**Models — all in-source in the fork:**
@@ -826,3 +826,136 @@ edit → commit → git push origin lotus
- **Synapse (Matrix):** LXC 151 on `compute-storage-01` — `pct exec 151 -- bash`
- **Config:** `/etc/matrix-synapse/homeserver.yaml`
- **Version check:** `curl -s https://matrix.lotusguild.org/_matrix/client/versions`
---
## Element Call fork — operational reference
_Ported from the retired `HANDOFF_ELEMENT_CALL_FORK.md` (2026-07; full history in git). The fork lives at `LotusGuild/element-call` (branch `lotus`, forked from upstream tag `v0.20.1`); cinny consumes it as the npm package `@lotusguild/element-call-embedded`, whose built bundle is copied into `public/element-call/`._
**Publish a new fork version (manual; needs the Gitea npm token):**
1. In the fork, bump `embedded/web/package.json` version (current unpublished: `0.20.1-lotus.2`).
2. Build: `pnpm run build:embedded` (Node 24, pnpm 10.33.0; output → repo `dist/`, staged into `embedded/web/dist`).
3. `cd embedded/web && npm version <tag> --no-git-tag-version && npm publish` to the Gitea registry (`code.lotusguild.org`). Publicly readable; only publishing needs the token.
4. In cinny: bump the `@lotusguild/element-call-embedded` pin (`package.json`, currently `0.20.1-lotus.1`) → the new version, `npm install`, build.
**`io.lotus.*` widget actions (fork ↔ cinny host):**
| Action | Direction | Purpose | Fork module |
| :-- | :-- | :-- | :-- |
| `io.lotus.call_state` | EC→host | speaker/mute/camera state stream (URL `lotusCallState=1`) | `lotusCallState.ts` |
| `io.lotus.focus_participant` | host→EC | spotlight a participant (works during screenshare) | `lotusFocus.ts` |
| `io.lotus.inject_audio` | host→EC | soundboard clip mixed into the call (URL `lotusAudioInject=1`) | `lotusAudioInject.ts` |
| `io.lotus.set_quality` | host→EC | audio/screenshare bitrate/fps caps | `lotusQuality.ts` |
| `io.lotus.decorations` | host→EC | in-call avatar decorations | `lotusDecorations.ts` |
| `io.lotus.set_deafen` | host→EC | deafen / screenshare-audio-mute at the LiveKit source (P6-2) | `lotusDeafen.ts` |
Also flag-gated (URL params): `lotusTransparent`/`lotusTheme` (theme), `lotusDenoiseSource=1` (in-source ML denoise). New toWidget actions must be added to the enum + `LOTUS_TO_WIDGET_ACTIONS` in `src/lotus/lotusActions.ts` and must only be sent after call-join (a send before join runs into a 10s timeout). **P6-2 phase 2 pending:** after publishing lotus.2, bump the cinny pin + delete the `CallControl.ts` `<audio>.muted` fallback.
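
One way to enforce the "only send after call-join" rule on the host side is a small gate that queues host→EC actions until the join is confirmed. The sketch below is hypothetical: `JoinGatedSender`, `markJoined`, and the queue-then-flush behaviour are assumptions for illustration, and a const object stands in for the fork's enum. Only the action names and `LOTUS_TO_WIDGET_ACTIONS` are taken from the catalog above.

```typescript
// Action names from the catalog; the real definitions live in src/lotus/lotusActions.ts.
const LotusAction = {
  CallState: 'io.lotus.call_state',
  FocusParticipant: 'io.lotus.focus_participant',
  InjectAudio: 'io.lotus.inject_audio',
  SetQuality: 'io.lotus.set_quality',
  Decorations: 'io.lotus.decorations',
  SetDeafen: 'io.lotus.set_deafen',
} as const;

type LotusActionName = (typeof LotusAction)[keyof typeof LotusAction];

// host→EC actions only; io.lotus.call_state flows EC→host and is excluded.
const LOTUS_TO_WIDGET_ACTIONS: ReadonlySet<LotusActionName> = new Set<LotusActionName>([
  LotusAction.FocusParticipant,
  LotusAction.InjectAudio,
  LotusAction.SetQuality,
  LotusAction.Decorations,
  LotusAction.SetDeafen,
]);

type SendToWidget = (action: LotusActionName, data: unknown) => void;

// Hypothetical helper: holds actions back until the call has been joined.
class JoinGatedSender {
  private joined = false;
  private readonly pending: Array<[LotusActionName, unknown]> = [];
  private readonly send: SendToWidget;

  constructor(send: SendToWidget) {
    this.send = send;
  }

  post(action: LotusActionName, data: unknown): void {
    if (!LOTUS_TO_WIDGET_ACTIONS.has(action)) {
      throw new Error(`${action} is not a host→EC action`);
    }
    if (!this.joined) {
      // Sending now would only time out, so remember it for later.
      this.pending.push([action, data]);
      return;
    }
    this.send(action, data);
  }

  // Call once the widget reports that the call has been joined.
  markJoined(): void {
    this.joined = true;
    for (const [action, data] of this.pending.splice(0)) this.send(action, data);
  }
}
```

Queuing is only one possible policy. For state-like actions such as `io.lotus.set_quality`, keeping just the latest value per action may be preferable to replaying every queued call.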
---
## 🔴 Open — Actionable
### 🧨 Encryption / E2EE — ⚠️ EXTREME COMPLEXITY · 🧠 PLANNING SESSION REQUIRED · 👤 SENIOR ENGINEER
> 🧰 **Investigation kit ready (2026-07):** `LOTUS_E2EE_INVESTIGATION.md` (git history)
> has the per-KE capture runbook (console signatures, synapse-side queries, the
> KE-1→KE-2 causality decision tree, ranked remediations), and the client now
> ships a **Crypto Diagnostics** capture helper (Settings) — run it during the
> next affected call and download the report before starting any fix.
> **Observed live in prod 2026-06-30** on `chat.lotusguild.org` during a 2-person
> **Element Call** (E2EE enabled). These span **client rust-crypto (via
> `matrix-js-sdk@41.6.0-rc.0`) ↔ Synapse ↔ Element Call's MatrixRTC E2EE** and are
> very likely **interrelated** (see KE-1 → KE-2). Do **not** spot-fix — they need
> a dedicated cross-system planning session with the homeserver owner. Capture
> full client console + a synapse-side trace for the same call before starting.
> **None of these are caused by the EC fork work** (the issues reproduce on the
> old build; the local mic/denoise path is unrelated to key distribution).
- **KE-1 — One-time-key (OTK) upload conflict storm (CRITICAL, root-cause candidate).**
`POST /_matrix/client/v3/keys/upload` returns `400 M_UNKNOWN: One time key
signed_curve25519:AAAAAAAAAGQ already exists. Old key: {…} new key: {…}` —
firing **continuously** (many/sec). The client repeatedly tries to publish an
OTK at a key id the server already holds **with a different value**, i.e. the
rust-crypto key store and Synapse have **diverged OTK state**. Impact: floods
the crypto outgoing-request loop and is the prime suspect for the downstream
missing-key failures (no fresh OTKs ⇒ no new Olm sessions ⇒ undecryptable
to-device key events). _Investigate:_ device/key-store reset-or-restore
mismatch, OTK id-counter desync, RC-SDK (`41.6.0-rc.0`) regression, or a
Synapse OTK bug. Repro signature: grep console for `already exists`.
**Extreme — planning session.**
**Update 2026-07 (investigation §6):** upstream `matrix-rust-sdk#5200` (still
OPEN) confirms the mechanism — on the 400, `mark_request_as_sent()` never fires
so the SDK re-issues the identical upload forever. **`41.7.0` does NOT fix it**
(crypto-wasm 17→18.3.1 has no OTK/upload change; 18.3.x was to-device security
only) — the SDK-pin lever is closed. Root cause = **store↔server OTK
divergence**; the leading **web-specific** trigger is that cinny never calls
**`navigator.storage.persist()`**, so the IndexedDB crypto store is evictable
while the `localStorage` session/device-id survives → device resurrects with a
blank store → re-uploads OTKs the server still holds. **Actionable preventive
fix (buildable now, no call needed):** request persistent storage on login
(+ optional multi-tab guard + 400-loop→recovery-prompt). Healing an already-
diverged device still needs a clean **logout+login** (not just "clear
storage"). Full runbook (synapse SQL, capture checklist, §6 diagnosis) is in git history at `LOTUS_E2EE_INVESTIGATION.md` (removed 2026-07).
- **KE-2 — Element Call media keys not arriving/decrypting → audio & video cut out (CRITICAL).**
`MissingKey: missing key at index N for participant @user`, `skipping decryption
due to missing key`, `MissingKey: key set not found for @user at index 0`, and
rust-crypto `WARN … Received an unexpected encrypted to-device event …
event_type="io.element.call.encryption_keys"`. EC distributes per-participant
media keys as **encrypted to-device `io.element.call.encryption_keys`** events;
these aren't being received/decrypted in order, so remote LiveKit audio/video
can't be decrypted — **this is the "friend's audio cuts out occasionally"
symptom.** Almost certainly downstream of **KE-1** (broken Olm sessions). Spans
EC's MatrixRTC E2EE + rust-crypto to-device + Synapse. **Extreme — planning
session.**
- **KE-3 — Timeline decryption error: missing `algorithm` field (HIGH).**
`Error decrypting event (… type=m.room.encrypted …): DecryptionError[msg:
missing field 'algorithm' at line 1 column 138 …]`. A malformed/legacy
encrypted event (or a serialization mismatch in the RC SDK) that rust-crypto
can't parse. Lower frequency than KE-1/2 but a distinct decode-path failure —
capture the offending event id (`$SASBBzoqj…` seen) and inspect its raw content.
- **KE-4 — MatrixRTC delayed-event / membership timeouts (MEDIUM-HIGH, reliability).**
`[MembershipManager] Network local timeout error while sending event, immediate
retry … AbortError: Restart delayed event timed out before the HS responded`,
with repeated `org.matrix.msc4157.update_delayed_event`. MSC4140/4157
delayed-event reliability against `matrix.lotusguild.org` — can cause stale/ghost
call membership and missed leave events. May be partly **homeserver
responsiveness**; correlate with synapse latency/load. Include in the same
planning session since it shares the call-reliability + HS-interaction surface.
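
For KE-1, the "400-loop→recovery-prompt" idea could be sketched as a small detector that watches failed `keys/upload` responses and fires once when the conflict repeats within a short window. Everything here is an assumption for illustration: the `UploadFailure` shape, the `OtkConflictDetector` name, and the threshold/window values are hypothetical, and how failures are observed (SDK hook, fetch wrapper, log tap) is deliberately left open. Only the status, errcode, and message pattern come from the error quoted in KE-1.

```typescript
// Hypothetical shape of a failed /keys/upload response as seen by the client.
interface UploadFailure {
  status: number;
  errcode: string;
  error: string;
  atMs: number; // timestamp of the failure in milliseconds
}

// Matches the KE-1 signature: 400 M_UNKNOWN "One time key <id> already exists".
function isOtkConflict(f: UploadFailure): boolean {
  return (
    f.status === 400 &&
    f.errcode === 'M_UNKNOWN' &&
    /One time key \S+ already exists/.test(f.error)
  );
}

class OtkConflictDetector {
  private hits: number[] = [];
  private fired = false;
  private readonly threshold: number;
  private readonly windowMs: number;
  private readonly onStorm: () => void;

  constructor(threshold: number, windowMs: number, onStorm: () => void) {
    this.threshold = threshold;
    this.windowMs = windowMs;
    this.onStorm = onStorm;
  }

  record(f: UploadFailure): void {
    if (!isOtkConflict(f)) return;
    this.hits.push(f.atMs);
    // Keep only conflicts that fall inside the sliding window.
    this.hits = this.hits.filter((t) => f.atMs - t <= this.windowMs);
    if (!this.fired && this.hits.length >= this.threshold) {
      this.fired = true; // prompt once, not once per retry
      this.onStorm();
    }
  }
}
```

A host might construct it as `new OtkConflictDetector(5, 10_000, showRecoveryPrompt)`, where `showRecoveryPrompt` is a placeholder for whatever UI recommends the clean logout+login. The detector only surfaces the loop. It does not stop the SDK from re-issuing the upload.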
### Security & Privacy
- **N97 — Access token stored in plaintext `localStorage`** (`state/sessions.ts`), vulnerable to XSS; device ID likewise. Architectural — needs a token-protection / session-storage redesign.
- ~~**Session writes are non-atomic and not cross-tab synced**~~ — **done (2026-07):** atomic single-key `cinny_session_v1` blob (legacy-key migration + dual-write) + `subscribeSessionChanges`/`useSessionSync` cross-tab reload. (The plaintext-token concern in N97 above is the remaining, separate architectural item.)
- **Persisted PII without encryption:** user status message + expiry (`settings/account/Profile.tsx`), unsent composer drafts (`room/RoomInput.tsx`). Leak risk on shared devices.
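
The atomic single-key session write and the cross-tab reload mentioned above can be illustrated with a short sketch. It is not the code in `state/sessions.ts`: the `SessionBlob` fields, `writeSession`, `readSession`, and `onSessionChanged` are hypothetical names, and the storage and event target are injected so the snippet runs outside a browser. Only the `cinny_session_v1` key and the standard `storage` event are taken as given.

```typescript
// Minimal Storage subset so the sketch does not depend on window.localStorage.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical session shape; the real fields may differ.
interface SessionBlob {
  baseUrl: string;
  userId: string;
  deviceId: string;
  accessToken: string;
}

const SESSION_KEY = 'cinny_session_v1';

// One setItem call, so another tab never observes a half-written session.
function writeSession(store: KeyValueStore, session: SessionBlob): void {
  store.setItem(SESSION_KEY, JSON.stringify(session));
}

// Returns undefined for a missing, corrupt, or incomplete blob.
function readSession(store: KeyValueStore): SessionBlob | undefined {
  const raw = store.getItem(SESSION_KEY);
  if (raw === null) return undefined;
  try {
    const parsed = JSON.parse(raw) as Partial<SessionBlob>;
    const complete =
      typeof parsed.baseUrl === 'string' &&
      typeof parsed.userId === 'string' &&
      typeof parsed.deviceId === 'string' &&
      typeof parsed.accessToken === 'string';
    return complete ? (parsed as SessionBlob) : undefined;
  } catch {
    return undefined;
  }
}

// 'storage' events fire in the other tabs of the same origin, not the writer.
function onSessionChanged(target: EventTarget, onChange: () => void): () => void {
  const listener = (evt: Event): void => {
    const key = (evt as Event & { key?: string | null }).key;
    // key === null means the whole storage area was cleared.
    if (key === SESSION_KEY || key === null) onChange();
  };
  target.addEventListener('storage', listener);
  return () => target.removeEventListener('storage', listener);
}
```

In a browser, `store` would be `localStorage` and `target` would be `window`. The blob is still plaintext, so this addresses atomicity and tab sync only, not the N97 token-exposure concern.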
### PWA / Offline / Notifications
- **N107 — SW has no `push` handler** — Web Push delivery is entirely non-functional. Needs a `push` listener + a Matrix push-gateway integration.
- **No app-asset caching strategy** (`src/sw.ts`) — no offline capability.
- ~~**`manifest: false`** may block PWA install~~ — **verified OK (2026-06):** `index.html` links `/manifest.json`, which exists in `public/` and is copied to `dist/`; VitePWA intentionally doesn't generate one. Not a bug.
### Dependencies & Build
- ~~**`matrix-js-sdk` pinned to a Release Candidate**~~ — **done (2026-07):** moved to `41.7.0` stable (crypto-wasm 18.3.1 security bump). Deep-audit dep triage: all 16 npm advisories are dev-only/unreachable/dead-dep — zero shipped exposure; dead `dompurify` removed. `@atlaskit`/build-tool pins remain review-worthy but low priority.
- **Build-time overhead:** `lotusDenoise` does heavy sequential `fs` work in `closeBundle`; `viteStaticCopy` config is complex with redundant renames — could be streamlined.
### Code Hygiene / DevEx
- **Automated test suite — 561+ tests across 65+ modules, a hard CI gate.** `npm test` runs Node's built-in runner via `tsx` (not vitest — Vite 8 is ahead of vitest's range) and **blocks the build job on failure**. Broad pure-logic coverage: utils (common, regex, sanitize/XSS, time, matrix, matrix-uia, mimeTypes, sort, accentColor, findAndReplace, AsyncSearch, ASCIILexicalTable, keyboard, room, matrix-crypto, featureCheck, syntaxHighlight, imageCompression, user-agent, callSounds), state (settings, sessions, recentSearches, upload, typingMembers, lists, room-list, toast, scheduledMessages, backupRestore, callEmbed/callPreferences, spaceRooms, …), plugins (matrix-to, call/utils, via-servers, bad-words, recent-emoji, custom-emoji, markdown block/inline/utils), OIDC (cs-api, useParsedLoginFlows, oidcState), lotus/avatarDecorations, message-search, search filters. Prevention work has caught + fixed **4 real bugs** (`findAndReplace` infinite-loop; `getSettings` crash-on-load when storage is blocked; `isMacOS` never matching modern Macs; `isMLDenoiseSupported` throwing `ReferenceError` instead of returning false on browsers lacking the `AudioWorkletNode` binding). **Next:** component/integration tests (the untestable-under-tsx DOM/React surface).
- **Extensive `as any` casts** across `src/` — gradual typing cleanup.
- **`types/matrix/` mirrors SDK types** instead of importing them — drift risk.
- ~~**Hardcoded CDN URL** should move to an env var~~ — **done:** `avatarDecorations.ts` already honors a `VITE_DECORATION_CDN` env override (lines 14-16); the in-repo literal is only the default. Nothing left.
- **`patch-folds.mjs` edits `node_modules` directly** — consider `patch-package`.
- **Infra docs:** `contrib/nginx` lacks security headers (HSTS/CSP) + uses rewrites over `try_files`; `contrib/caddy` has a placeholder path. CI/CD (`prod-deploy.yml`): sequential deploy, aggressive 1-min Netlify timeout, `package-manager-cache: false`.
- **README:** keep the fork-sync version + logo path current. (`CONTRIBUTING.md` is intentionally left as upstream Cinny's — not a Lotus concern.)
- **Architecture notes (low priority):** deep `features/` + `hooks/` nesting, many small coupled hooks, possible dead CSS/components, `SpacingVariant` / `DropTarget` recipe simplification.
- **Git workflow (forward-looking):** keep commits scoped — past monolithic "fix all bugs" commits and inconsistent prefixes hurt `git bisect`.
### Big Projects
- ~~**#5 — Seasonal themes & chat-background redesign.**~~ **DONE (2026-06/07):** 11 seasonal/holiday overlays shipped and later toned down + given a settings preview grid; all 19 chat backgrounds redesigned (Carbon + Aurora kept per user preference), one design sprint each, GPU-friendly CSS with `prefers-reduced-motion` + pause toggle. Remaining polish rides normal bug flow, not a "big project."