
Author SHA1 Message Date
jared 049472e25f feat(crypto) + docs: request persistent storage; consolidate docs to 3
CI / Build & Quality Checks (push) Successful in 10m54s
CI / Trigger Desktop Build (push) Successful in 12s
- index.tsx: request navigator.storage.persist() for logged-in sessions so the
  browser can't evict the IndexedDB rust-crypto store (eviction while the
  localStorage session survives resurrects the device with a blank store → the
  KE-1 "one time key already exists" upload storm). Guarded, checks persisted()
  first, best-effort.
- Docs: remove HANDOFF_ELEMENT_CALL_FORK.md, LOTUS_E2EE_INVESTIGATION.md, and
  LOTUS_BUGS.md. Port their live content into the three kept docs — verification
  backlog → LOTUS_TESTING; open bugs + E2EE (KE-1..4) + an Element Call fork
  operational reference (publish steps + io.lotus action catalog) → LOTUS_TODO.
  Fix all dangling references (README, code comments, cross-doc links). Full
  history of the removed docs remains in git.
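
The `index.tsx` change is described only as guarded, checking `persisted()` first, and best-effort. The block below is a minimal sketch of that shape. `ensurePersistentStorage` and `PersistApi` are hypothetical names, not the committed code. The storage object is passed in (normally `navigator.storage`) so the logic can run outside a browser:

```typescript
// The two StorageManager methods used here; `navigator.storage` has this shape.
// Both are optional so engines without them are handled gracefully.
interface PersistApi {
  persisted?: () => Promise<boolean>;
  persist?: () => Promise<boolean>;
}

// Hypothetical helper: resolves true if origin storage is (now) persistent.
// It never rejects, so a failure cannot block app start-up.
async function ensurePersistentStorage(
  storage: PersistApi | undefined,
  loggedIn: boolean,
): Promise<boolean> {
  // Only logged-in sessions have an IndexedDB crypto store worth protecting.
  if (!loggedIn || !storage?.persist || !storage.persisted) return false;
  try {
    // Already persistent: skip the request entirely.
    if (await storage.persisted()) return true;
    // The browser may grant or deny this (heuristics or a permission prompt).
    return await storage.persist();
  } catch {
    return false; // best-effort: swallow and carry on
  }
}
```

Usage would be a fire-and-forget `void ensurePersistentStorage(navigator.storage, isLoggedIn);` during start-up.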

Gates: tsc/eslint/prettier clean, build OK, 665 tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 15:28:09 -04:00
10 changed files with 205 additions and 1392 deletions
@@ -1,686 +0,0 @@
# HANDOFF — Forking & Self-Building Element Call ("Lotus Call")
> **Audience:** a fresh Claude/engineer session with **no prior context** on this
> project. Read this top-to-bottom before touching anything. This document is the
> single source of truth for the Element Call (EC) fork initiative.
>
> **Status:** **PHASES 0–2 IMPLEMENTED (build-verified, not yet live-tested)**
> (2026-06-30). The fork exists, builds, is published, and cinny consumes it
> (Phase 0/1). **All 7 Phase-2 EC features are implemented on the fork's `lotus`
> branch**, each additive + flag-gated, build+typecheck-clean, per-feature
> reviewed (+ a holistic multi-agent review), and pushed. **None are live-tested
> yet** — every one needs the `LOTUS_TESTING.md` §D sweep, and the **cinny host
> side must be wired** (set flags / send actions / handle call_state) — see §12.
> See **§9** Phase 0/1 results, **§10** cutover, **§11** Phase-2 seams, **§12**
> Phase-2 status + cinny integration checklist. Created 2026-06 from `LotusGuild/cinny`.
---
## 9. Phase 0 Results (verified 2026-06-29)
**Decisions taken with the user:** scope = Phase 0 recon; consumption model =
**private npm package** (§5 option 1). Recommended registry = **Gitea's built-in
npm registry** (`code.lotusguild.org`) — zero new infra.
### 9.1 Version → tag → commit mapping (LOCKED)
| Source | Value |
| :--------------------------------------------------- | :----------------------------------------- |
| cinny `package.json` pin | `@element-hq/element-call-embedded@0.20.1` |
| Bundle self-report (`VITE_APP_VERSION`/`appVersion`) | `embedded-v0.20.1` |
| npm registry `gitHead` for 0.20.1 | `2d74c48151d9edc01c65a22a91478aac81bf24d0` |
| GitHub tag `v0.20.1` → commit                        | `2d74c48…` (**same commit**)               |
**Fork from upstream tag `v0.20.1` (commit `2d74c48`).** The embedded package
version equals the element-call release tag; repo `package.json` version is
`0.0.0` and the real version is stamped at publish time from the tag.
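
Because `package.json` stays at `0.0.0`, the published version and the bundle's self-reported `appVersion` both come from the release tag. The block below is a minimal sketch of that mapping. It assumes tags of the form `v<semver>`, including prereleases such as `v0.20.1-lotus.2`, and assumes `VITE_APP_VERSION` is the tag prefixed with `embedded-`, as the `embedded-v0.20.1` self-report suggests. Both helpers are hypothetical, not the fork's CI:

```typescript
// Release tags look like "v0.20.1" or, assumed here, "v0.20.1-lotus.2".
const RELEASE_TAG = /^v(\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?)$/;

// "v0.20.1" -> "0.20.1": the value stamped into embedded/web/package.json.
function npmVersionFromTag(tag: string): string {
  const match = RELEASE_TAG.exec(tag);
  if (!match) throw new Error(`not a release tag: ${tag}`);
  return match[1];
}

// "v0.20.1" -> "embedded-v0.20.1": the bundle's VITE_APP_VERSION / appVersion.
function appVersionFromTag(tag: string): string {
  npmVersionFromTag(tag); // validate the tag first
  return `embedded-${tag}`;
}
```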
### 9.2 The shipped npm dist is a CLEAN upstream build
No `lotus`/`denoise`/`rnnoise` strings anywhere in
`node_modules/@element-hq/element-call-embedded/dist`. **All Lotus customization
(denoise shim) is injected at cinny build time, not baked into the package** — so
swapping the source does not disturb cinny's denoise injection layer. The
ringtone/reaction assets (`baduntss`, `cat`, `clap`, `call_declined`, …) are
upstream EC's own, not ours.
### 9.3 Build toolchain & mechanism
- **Node `24`** (`.node-version`), **pnpm `10.33.0`** (`packageManager` field,
via corepack).
- Build: **`pnpm run build:embedded`** = `vite build --config
vite-embedded.config.ts` with `NODE_OPTIONS=--max-old-space-size=16384`.
- Output dir is **repo-root `dist/`**; CI stages it into **`embedded/web/dist`**
(the `embedded/web/` dir holds the publish template: `package.json`, README,
both LICENSE files).
- Publish workflow upstream = `.github/workflows/publish-embedded-packages.yaml`:
builds → `npm version <tag> --no-git-tag-version` → `npm publish --provenance
--access public` to npmjs as `@element-hq/element-call-embedded`. (Also
Android/Maven + iOS/SwiftPM — irrelevant; we are web-only.)
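
The build writes to repo-root `dist/`, and the publish step expects that output inside the `embedded/web/` template. The block below sketches the staging step with Node's `fs` APIs. `stageEmbeddedDist` is a hypothetical helper. The `index.html` check is an assumed sanity test that the build actually ran:

```typescript
import { cpSync, existsSync, mkdirSync, rmSync } from "node:fs";
import { dirname, join } from "node:path";

// Hypothetical helper: copy repo-root dist/ into the publish template's
// embedded/web/dist, replacing whatever an earlier build left there.
function stageEmbeddedDist(repoRoot: string): string {
  const src = join(repoRoot, "dist");
  const dest = join(repoRoot, "embedded", "web", "dist");
  // Assumed sanity check: refuse to stage if the build did not run.
  if (!existsSync(join(src, "index.html"))) {
    throw new Error(`no build output in ${src}; run pnpm run build:embedded first`);
  }
  rmSync(dest, { recursive: true, force: true }); // drop stale chunks
  mkdirSync(dirname(dest), { recursive: true });
  cpSync(src, dest, { recursive: true });
  return dest;
}
```

Run it after `pnpm run build:embedded`. Then run `npm version <tag> --no-git-tag-version` and `npm publish` from `embedded/web`. Removing the old `dist` first matters because content-hashed chunk names change between builds, and leftovers would otherwise be published too.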
### 9.4 Build reproduction — PARITY CONFIRMED
Cloned `element-call@v0.20.1` to `/root/code/element-call` (shallow), built with
isolated Node 24 / pnpm 10.33.0 (system Node 20 / cinny untouched). Result vs the
shipped npm dist:
- **137 of 147 files byte-identical** (same Vite content-hash): all CSS, fonts,
wasm, audio, JSON locale files, and `IndexedDBWorker`.
- **Only 5 JS chunks differ** (`index`, `pako.esm`, `polyfill-force`,
`rust-crypto`, `spa`) — **cause isolated to the version define**: our local
build baked `` appVersion:`dev` `` (because `VITE_APP_VERSION` was unset) vs the
npm build's `` appVersion:`embedded-v0.20.1` ``. `index.html` is identical modulo
the hashed asset filenames. **Benign** — our CI sets the version from the git
tag, so a tagged CI build will match.
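
A comparison like "137 of 147 files byte-identical" comes from hashing every file in both trees and bucketing the results. The block below sketches such a comparison using Node's `crypto` and `fs` modules. `hashTree` and `diffTrees` are hypothetical helpers, not a script from the fork:

```typescript
import { createHash } from "node:crypto";
import { readdirSync, readFileSync } from "node:fs";
import { join, relative, sep } from "node:path";

// Map each file under `root` to the SHA-256 of its bytes, keyed by "/"-separated relative path.
function hashTree(root: string): Map<string, string> {
  const hashes = new Map<string, string>();
  const walk = (dir: string): void => {
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      const full = join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile()) {
        const rel = relative(root, full).split(sep).join("/");
        hashes.set(rel, createHash("sha256").update(readFileSync(full)).digest("hex"));
      }
    }
  };
  walk(root);
  return hashes;
}

interface TreeDiff {
  identical: string[];
  differ: string[];
  onlyLeft: string[];
  onlyRight: string[];
}

// Bucket files by whether they exist in both trees and, if so, match byte for byte.
function diffTrees(left: string, right: string): TreeDiff {
  const a = hashTree(left);
  const b = hashTree(right);
  const diff: TreeDiff = { identical: [], differ: [], onlyLeft: [], onlyRight: [] };
  for (const [path, hash] of a) {
    const other = b.get(path);
    if (other === undefined) diff.onlyLeft.push(path);
    else if (other === hash) diff.identical.push(path);
    else diff.differ.push(path);
  }
  for (const path of b.keys()) if (!a.has(path)) diff.onlyRight.push(path);
  return diff;
}
```

Vite puts a content hash in chunk filenames. A chunk whose bytes changed therefore normally shows up under `onlyLeft`/`onlyRight` with a different name, not under `differ`. Pairing such chunks means stripping the hash segment from the name first.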
### 9.5 Fork CI (drafted)
`.gitea/workflows/ci.yml` is staged in the clone (models cinny's
`.gitea/workflows/ci.yml` + upstream's publish flow). Linux-only (`ubuntu-latest`)
— the Windows worker is for cinny-desktop/Tauri, not the EC web bundle. Build job
on PR/push to `lotus`; publish job on `v*` tag → `@lotusguild/element-call-embedded`
to the Gitea npm registry (needs `secrets.GITEA_NPM_TOKEN`).
### 9.6 Phase 1 — DONE (2026-06-29)
1. ✅ **Fork repo live:** `code.lotusguild.org/LotusGuild/element-call` (public,
AGPL), default branch `lotus`, full history (7018 commits) + tag `v0.20.1`.
Branch `lotus` = `v0.20.1` + 2-file diff (CI workflow + embedded package
rename).
2. ✅ **Package published:** `@lotusguild/element-call-embedded@0.20.1` on the
Gitea npm registry (published manually from the version-faithful build while
the admin token was available). **Publicly readable** (unauth `npm install`
works → devs/CI need no token to consume; only publishing needs one).
3. ✅ **cinny wired & built clean** (Node 24): `.npmrc` scope line +
`package.json` dep + `vite.config.js` `viteStaticCopy` src. `npm install`
swapped the package (resolved from Gitea), `npm run build` succeeded,
`dist/public/element-call/` populated, bundle reports `appVersion:
embedded-v0.20.1`, **denoise shim injected + all denoise assets copied**
(injection layer unchanged). **These cinny edits are staged in the working
tree, NOT committed/pushed** — pushing triggers CI → desktop → deploy, so it's
gated on the §D live test (see §10).
### 9.8 Reproducibility note (important)
A from-source rebuild is **NOT byte-identical** to upstream's npm tarball.
137/147 files match exactly (CSS, fonts, wasm, audio, worker); the 5 JS chunks
(`index`, `pako.esm`, `polyfill-force`, `rust-crypto`, `spa`) differ because the
rolldown/oxc **minifier mangles export names differently** across build
environments (and the version-define is one input). This is normal and benign —
the code is functionally equivalent. **Do not chase byte-parity; the §D live call
test is the real parity gate.**
### 9.9 Remaining follow-ups (not blocking the cutover)
- **CI publishing:** `.gitea/workflows/ci.yml` publishes on a `v*` tag but needs
(a) a Gitea Actions runner for `LotusGuild/element-call`, and (b) a **durable**
`GITEA_NPM_TOKEN` repo secret with package read/write (the admin token used for
the manual publish is being deleted, so it was deliberately NOT baked in). Until
then, publishing is manual (`npm version <tag>` in `embedded/web` →
`npm publish`).
- Decide rebase cadence vs upstream (0.20.2 / 0.20.3 already out — see §9.1).
### 9.7 Ready-to-apply artifacts (staged 2026-06-29)
**Fork side — already committed** on branch `lotus` in `/root/code/element-call`
(remote `lotus` = `code.lotusguild.org/LotusGuild/element-call.git`, push deferred
until the repo exists). Minimal 2-file diff vs tag `v0.20.1`:
`.gitea/workflows/ci.yml` (new) + `embedded/web/package.json` (rename to
`@lotusguild/element-call-embedded`). Push with:
`git push -u lotus lotus && git push lotus v0.20.1` (and tag `v0.20.1` on our side
to trigger the first publish, or push our own `v0.20.1` tag).
**cinny side — NOT yet applied** (applying before the package is published breaks
`npm ci`). Exactly 3 edits + a lockfile regen:
1. `.npmrc` — append the scoped-registry line:
```ini
@lotusguild:registry=https://code.lotusguild.org/api/packages/LotusGuild/npm/
```
(CI/auth: `//code.lotusguild.org/api/packages/LotusGuild/npm/:_authToken=${GITEA_NPM_TOKEN}`
— inject via env in CI, do not commit a plaintext token.)
2. `package.json:104` —
`"@element-hq/element-call-embedded": "0.20.1"` →
`"@lotusguild/element-call-embedded": "0.20.1"`.
3. `vite.config.js:25` — `viteStaticCopy` src:
`node_modules/@element-hq/element-call-embedded/dist` →
`node_modules/@lotusguild/element-call-embedded/dist`.
**`stripBase: 4` stays unchanged** — `node_modules/@lotusguild/element-call-embedded/dist`
is still exactly 4 leading segments. (Update the comment's path reference too.)
4. `package-lock.json` — regenerated by `npm install`, not hand-edited (drops the
`registry.npmjs.org/@element-hq/...` resolved URL for the Gitea one).
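
The block below is a toy model of what `stripBase: 4` has to achieve. It is not vite-plugin-static-copy's implementation. It only illustrates why the count stays the same when the scope changes from `@element-hq` to `@lotusguild`: both paths have the same four leading segments before the files under `dist/`. `stripLeadingSegments` is a hypothetical helper, and `assets/index.js` is an example file name:

```typescript
// Toy model: drop the first `n` "/"-separated segments of a matched path.
function stripLeadingSegments(path: string, n: number): string {
  const parts = path.split("/").filter((part) => part.length > 0);
  if (parts.length <= n) throw new Error(`"${path}" has only ${parts.length} segments`);
  return parts.slice(n).join("/");
}

const TARGET = "public/element-call";
for (const scope of ["@element-hq", "@lotusguild"]) {
  // Same segment count under either scope, so the same n works for both.
  const matched = `node_modules/${scope}/element-call-embedded/dist/assets/index.js`;
  console.log(`${TARGET}/${stripLeadingSegments(matched, 4)}`); // public/element-call/assets/index.js
}
```

In this model, a count of 3 would keep `dist/` and nest every file one level too deep under `public/element-call/`.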
The denoise injection (`lotusDenoise()` in `vite.config.js`) is **unchanged** — it
keys off `dist/public/element-call/index.html`, which our fork's bundle still
produces identically (verified: `index.html` byte-identical modulo asset hashes).
---
## 0. TL;DR / The Goal
We embed **Element Call** (the Matrix group-VoIP/video app) inside Lotus Chat to
power voice/video channels. Today we consume Element's **pre-compiled npm
bundle** and can only steer it from the outside (a limited widget API + fragile
same-origin DOM hacks). Several in-call problems are **unfixable from outside**
because they live in EC's compiled JS.
**We want true ownership: fork `element-hq/element-call`, build it from source
ourselves, host our build, and replace the npm bundle with our fork.** Then
every in-call behavior becomes editable code.
**This requires standing up a brand-new repo and build pipeline for our EC fork.**
---
## 1. Why fork? (What we cannot fix today)
These came out of live testing and are documented in `LOTUS_BUGS.md` →
"Known Element Call iframe limitations":
| Issue | What's wrong | Why outside-fixes fail |
| :----------------------------------------------------- | :------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **A6** — avatar decorations in-call | Our profile-decoration overlays don't appear on in-call video tiles | The video grid is rendered by EC's React app inside the iframe. We can only inject overlay DOM (fragile) — we can't make it a first-class part of the tile. |
| **A5** — focus camera / fullscreen during screenshare | Can't reliably spotlight a participant's camera while someone screenshares | EC's **layout logic** (screenshare priority, spotlight) is compiled JS we don't control. We currently DOM-click tiles as a hack. |
| **A7** — mic dead after EC's "Reconnect" | After EC's own mid-call reconnect, the local mic isn't re-published | EC's reconnect/track-republish path is internal. (Partly entangled with our denoise shim — see §6.) |
| Native theming | EC's UI doesn't match Lotus design; we inject CSS hacks | Real theming needs source-level component/token changes. |
| Decorations, custom controls, custom layouts, branding | all blocked | all require source access |
**Bottom line:** the iframe is **same-origin** (we self-host it), so we can read
and even write its DOM — but we **do not own its source**, so we can't change its
**behavior/logic**, only poke at its rendered output. Forking removes that wall.
---
## 2. How EC is integrated TODAY (the current architecture)
Understand this fully before changing it — the fork must slot into the same
integration seams.
### 2.1 Where the EC bundle comes from
- npm package: **`@element-hq/element-call-embedded`**, pinned to **`0.20.1`** in
`cinny/package.json` (line ~104).
- It ships a **pre-built `dist/`**. At cinny build time,
`vite-plugin-static-copy` copies that `dist/` flat into
**`public/element-call/`** (see `cinny/vite.config.js`, the `copyFiles`
target with `rename: { stripBase: 4 }` — note the stripBase gotcha documented
there; getting this wrong 404s the widget).
- It is **NOT committed** to git (`git ls-files public/element-call` → 0). It's a
build artifact materialized from `node_modules`.
### 2.2 How EC is loaded & controlled
- The widget iframe `src` is **same-origin**:
`${BASE_URL}/public/element-call/index.html?<params>` (see
`cinny/src/app/plugins/call/CallEmbed.ts`, `getWidget()` /
`getIframe()`). Sandbox: `allow-forms allow-scripts allow-same-origin
allow-popups allow-modals allow-downloads`; `allow="microphone; camera;
display-capture; autoplay; clipboard-write;"`.
- **Control surface #1 — the official widget API** (`matrix-widget-api`):
`ClientWidgetApi` + a custom `CallWidgetDriver`. This is the robust,
version-stable channel (theme change, hangup, capabilities, timeline events).
Files: `plugins/call/CallEmbed.ts`, `plugins/call/CallWidgetDriver.ts`,
`plugins/call/utils.ts` (capabilities), `plugins/call/CallControl.ts`.
- **Control surface #2 — same-origin DOM poking** (fragile, version-coupled):
reading `iframe.contentDocument` to detect speakers/mute state and
`.click()`-ing tiles to focus a camera. Files:
`hooks/useCallSpeakers.ts` (reads `[data-muted]`, `[data-video-fit]`),
`plugins/call/CallControl.ts` (`focusCameraParticipant` — tile selectors).
**These selectors break on every EC version bump.** A fork lets us replace
these hacks with real APIs/props.
- **Control surface #3 — URL params + build-time injection** for our denoise
shim (see §6).
### 2.3 Full file inventory (everything that touches EC in cinny)
Plugin / core:
- `src/app/plugins/call/CallEmbed.ts` — iframe creation, widget API wiring, theme sync, hangup, load watchdog/self-heal, denoise URL params.
- `src/app/plugins/call/CallControl.ts` — control state + **DOM-poking** (`focusCameraParticipant`, spotlight).
- `src/app/plugins/call/CallControl.tsx` _(call-status variant)_ and `features/call-status/CallControl.tsx`.
- `src/app/plugins/call/CallWidgetDriver.ts` — widget driver (capabilities, event relay).
- `src/app/plugins/call/utils.ts` — widget capabilities set.
- `src/app/plugins/call/hooks.ts`, `index.ts` — plugin exports/hooks.
- `src/app/state/callEmbed.ts` — jotai atoms for the active embed.
React / UI:
- `src/app/components/CallEmbedProvider.tsx` — the big one: incoming-call ring/banner, RTCNotification + **RTCDecline** listeners, PiP, mute badges, fullscreen, ringtones.
- `src/app/features/call/CallView.tsx` — prescreen lobby vs joined (the iframe placement target), load-error recovery UI.
- `src/app/features/call/CallControls.tsx` — in-call control bar (mic/cam/deafen/screenshare/fullscreen/more/PiP).
- `src/app/features/call/CallMemberCard.tsx` — **lobby** participant roster (this is where `AvatarDecoration` works today; in-call grid is EC's).
- `src/app/features/call/PrescreenControls.tsx` — join controls.
- `src/app/features/call-status/*` — `CallStatus.tsx`, `MemberGlance.tsx` (the "Focus camera" menu lives here), `LiveChip.tsx`.
- `src/app/features/room-nav/RoomNavItem.tsx`, `features/room/Room.tsx`, `features/room/RoomViewHeader.tsx`, `pages/client/space/Space.tsx`, `pages/CallStatusRenderer.tsx`, `pages/Router.tsx` — call entry points / status surfacing.
Hooks:
- `src/app/hooks/useCallEmbed.ts`, `useCall.ts`, `useCallSpeakers.ts` (DOM-poking), `useCallJoinLeaveSounds.ts`, `useAfkAutoMute.ts`.
Build:
- `cinny/vite.config.js` — `copyFiles` (EC dist copy) + `lotusDenoise()` plugin (denoise asset copy + index.html shim injection, in `closeBundle`).
Utils:
- `src/app/utils/ringtones.ts`, `utils/denoisePipeline.ts`, `utils/lotusDenoiseUtils.ts`.
---
## 3. Hosting / infra context (the OTHER repo)
There are **two repos**:
1. **`LotusGuild/cinny`** (`/root/code/cinny`) — this Lotus Chat fork. Consumes EC.
2. **`LotusGuild/matrix`** (`/root/code/matrix`) — the **infra/homeserver** repo.
Subdirs: `livekit/` (the SFU EC talks to), `deploy/`, `draupnir/`,
`hookshot/`, `landing/`, `matrixbot/`, `systemd/`. Gitea remote
`code.lotusguild.org/LotusGuild/matrix`, branch `main`.
EC needs a **LiveKit SFU** + the **livekit-jwt-service**; those live in
`matrix/livekit/`. A self-hosted EC build must be configured to point at our
homeserver (`matrix.lotusguild.org` / synapse) and our LiveKit. EC's runtime
`config.json` (homeserver, livekit URL, feature flags) is part of what we'll own
once we build it ourselves.
Deployment today: `chat.lotusguild.org` (the cinny web build, which embeds EC at
`/public/element-call/`). cinny-desktop (`LotusGuild/cinny-desktop`, a Tauri
wrapper, bumped by cinny CI) embeds the same.
---
## 4. The plan (proposed — confirm with the user before executing)
### Decision: **YES, create a new repo.** `LotusGuild/element-call`
Rationale: EC is a large standalone app (React + LiveKit client SDK + matrixRTC +
its own Vite build + heavy deps). Keep it out of cinny so cinny's build stays
clean — cinny keeps consuming a **built EC `dist/`**, exactly as today, just
sourced from **our fork** instead of npm.
### Phase 0 — Recon (no code)
- Fork `github.com/element-hq/element-call` → `LotusGuild/element-call` on Gitea.
- Pin to the upstream tag matching **0.20.1** (`element-call-embedded` 0.20.1's
corresponding `element-call` release) so behavior matches what's shipping now.
Verify the embedded-package version ↔ element-call repo tag mapping.
- Read EC's own build docs: it builds the "embedded" widget bundle (the thing
currently published as `@element-hq/element-call-embedded`). Reproduce that
build locally and confirm the output matches `public/element-call/` today.
- **License:** element-call is **AGPL-3.0**, same as Lotus Chat — compatible.
Our fork must remain AGPL and publish source.
### Phase 1 — Reproduce current behavior from our fork (parity, no features)
- Build our fork's embedded bundle; wire cinny to consume it instead of the npm
package (see §5 for the consumption options). Smoke-test: a call works exactly
as today (web + desktop), denoise shim still injects, widget API + theme still
work. **No behavior change yet** — this de-risks the swap.
### Phase 2 — Replace the outside hacks with source-level features
Tackle the §1 issues in EC's source:
- **A6:** render avatar decorations as part of the video-tile component
(read decoration data we pass in via widget data / URL param / a small bridge).
- **A5:** fix focus/spotlight + screenshare-coexistence in EC's layout code;
expose a clean widget action so cinny can trigger it (kill the DOM `.click()`).
- **A7:** fix mic re-publish on reconnect; reconcile with our denoise shim (§6) —
ideally move denoise INTO the fork as a real audio-processing step instead of a
`getUserMedia` monkeypatch.
- Native Lotus theming/branding at the source (kill the injected-CSS hacks).
- Then retire the DOM-poking in `useCallSpeakers.ts` / `CallControl.ts` in favor
of real widget messages.
### Phase 3 — Maintenance posture
- Decide rebase cadence vs. upstream element-call releases. Keep customizations
isolated (feature flags / minimal-diff patches) to ease rebasing.
- CI in the new repo builds + publishes the embedded dist as a versioned
artifact; cinny CI consumes a pinned version.
---
## 5. How cinny should consume the fork (pick one — decide with user)
1. **Private npm package** (mirror the current model): our fork's CI publishes
`@lotusguild/element-call-embedded` to a registry; cinny depends on it and
`viteStaticCopy` keeps working almost unchanged. _Cleanest swap; needs a
registry._
2. **Git submodule + build in cinny CI:** add the fork as a submodule, build it
during cinny's build, copy its `dist/` to `public/element-call/`. _No
registry; heavier cinny CI._
3. **CI artifact copy:** fork CI uploads a `dist` tarball; cinny CI downloads a
pinned version at build. _Decoupled; needs artifact plumbing._
**Recommendation: Option 1** — it changes the least in cinny (just swap the
package name in `package.json` + the `viteStaticCopy` src path) and preserves the
clean cinny/EC separation.
---
## 6. The denoise shim — critical interaction (don't break this)
Lotus ships ML noise suppression by **injecting a same-origin pre-init shim into
EC's `index.html` at build time** (cinny `vite.config.js` → `lotusDenoise()`,
`closeBundle`). The shim monkeypatches `getUserMedia` **before EC captures the
mic** and routes audio through RNNoise/Speex/DTLN AudioWorklets, then EC/LiveKit
publishes the processed track. It's activated via URL params
(`lotusDenoise=ml&lotusModel=…&lotusGate=…`) set in `CallEmbed.ts`.
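
The host shim and any in-fork denoise stage must never run together. §12 notes that the in-source flag is deliberately `lotusDenoiseSource=1` rather than `lotusDenoise=ml`, because reusing the host-shim flag would double-process the audio. The block below is a hedged sketch of a host-side URL builder that enforces this. `addDenoiseParams`, `DenoiseMode`, and the option names are hypothetical, and real model/gate values are left to the caller:

```typescript
type DenoiseMode = "off" | "host-shim" | "in-source";

interface DenoiseOptions {
  mode: DenoiseMode;
  model: string; // passed through as-is; real model ids are not listed here
  gate?: string; // passed through as-is
}

// Every denoise-related key either path may use.
const DENOISE_KEYS = [
  "lotusDenoise", "lotusDenoiseSource", "lotusModel",
  "lotusGate", "lotusGateThreshold", "lotusDenoiseBase",
];

// Hypothetical builder: enables at most one denoise pipeline on the EC URL.
function addDenoiseParams(params: URLSearchParams, opts: DenoiseOptions): URLSearchParams {
  // Clear first so a stale flag from a previous URL cannot enable both paths.
  for (const key of DENOISE_KEYS) params.delete(key);
  if (opts.mode === "off") return params;
  if (opts.mode === "host-shim") params.set("lotusDenoise", "ml"); // index.html shim
  else params.set("lotusDenoiseSource", "1"); // in-fork processing stage
  params.set("lotusModel", opts.model);
  if (opts.gate !== undefined) params.set("lotusGate", opts.gate);
  return params;
}
```

The in-source extras (`lotusGateThreshold`, `lotusDenoiseBase`) are cleared but not set here. A real builder would add them the same way.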
- Assets copied to `public/element-call/denoise/` at build (sapphi RNNoise/Speex/
gate worklets + `@workadventure/noise-suppression` DTLN tree).
- Related: `utils/denoisePipeline.ts`, `utils/lotusDenoiseUtils.ts`,
`settings/general/DenoiseTester.tsx`, `VoiceMessageRecorder.tsx`.
- **Known issues:** denoise quality is still poor (tracked separately); and the
mic-after-reconnect bug (A7) is suspected to involve the shim's getUserMedia
patch handing back a stale processed stream when EC re-acquires the mic.
**Once we own the fork, the right move is to make denoise a first-class
audio-processing stage inside EC** (not an index.html monkeypatch) — more robust,
survives reconnects, and removes the build-time injection hack. Until then, the
fork's `index.html` must remain injectable the same way, or the shim must be
re-homed into the fork.
---
## 7. Doc-accuracy notes / corrections for the new session
- `LOTUS_TODO.md` (~line 533) calls EC a **"cross-origin iframe"** — **outdated.**
EC is **same-origin** today (self-hosted under our domain;
`iframe.sandbox` includes `allow-same-origin`; we read `contentDocument`), and
**as of 2026-06-29 we own the fork's source** (`@lotusguild/element-call-embedded`).
The _practical_ point it made still holds _until we ship the audio-inject API_:
**LiveKit's `LocalAudioTrack` lives in EC's module scope**, not on `window`, so
cinny can't reach it even same-origin — which is why the in-call soundboard had
to be local-playback-only. **The fork removes this wall:** EC can expose a real
`io.lotus.inject_audio` widget action (Phase 2) that mixes into the published
track from inside its own module scope.
- `LOTUS_FEATURES.md` documents the EC upgrade history (0.16.3 → 0.19.4 →
0.20.1), the dark-mode CSS injection, and AFK auto-mute — all relevant prior
art for what the fork must preserve.
- `LOTUS_TESTING.md` §D is the **EC regression sweep** to re-run after the fork
swap (Phase 1 parity check).
---
## 8. First actions for the new session
1. Read this file, then skim §2.3's files in `cinny` to internalize the seams.
2. Confirm with the user: new repo name, consumption model (§5), rebase cadence.
3. Phase 0: fork element-call, map 0.20.1 ↔ element-call tag, reproduce the
embedded build locally, diff against `public/element-call/`.
4. Phase 1: wire cinny to the fork, run `LOTUS_TESTING.md` §D parity sweep.
5. Only then start Phase 2 features (A5/A6/A7, theming, denoise-in-source).
**Cross-references:** `LOTUS_BUGS.md` (EC limitations + verify queue),
`LOTUS_TODO.md` (denoise/soundboard constraints), `LOTUS_FEATURES.md` (EC history),
`LOTUS_TESTING.md` §D (regression sweep). Infra: `/root/code/matrix` (`livekit/`,
`deploy/`).
---
## 10. Live cutover — the remaining steps (Phase 1 finish)
The fork is published and cinny builds against it locally (§9.6). What's left to
go live:
1. **Run `LOTUS_TESTING.md` §D** against a local cinny build (`npm run build` is
already proven; serve `dist/` or `npm run dev`). Verify a real call: join,
mic/cam, screenshare, theme sync, denoise on, widget hangup — web first.
2. **Commit the cinny edits** (currently staged, uncommitted in the working tree):
`.npmrc`, `package.json`, `package-lock.json`, `vite.config.js`. Suggested
message: `chore(call): consume self-built @lotusguild/element-call-embedded`.
3. **Push to `lotus`** → cinny CI builds, then `trigger-desktop` bumps
cinny-desktop → Tauri release. Re-run §D on **cinny-desktop** (the path where
the old `stripBase` bug bit — verify the widget loads, not a 404).
4. Only then start **Phase 2** (A5/A6/A7, theming, denoise-in-source).
---
## 11. Phase 2 — implementation seams (mapped 2026-06-29)
The exact integration points for each Phase 2 item, found by reading the EC fork
and cinny source. **All of these are media-path / in-call features that cannot be
functionally verified without a live Matrix + LiveKit call** — implement each as
a minimal, **feature-flagged, additive** diff (no behavior change unless cinny
opts in), build-verify the fork (`pnpm build:embedded`, ~15s) AND cinny
(`npm run build`), then gate shipping on `LOTUS_TESTING.md` §D.
**Shared widget channel (the backbone for #2/#3/#4/#7):**
- EC→cinny: `widget.api.transport.send("io.lotus.<x>", data)` (see
`element-call/src/widget.ts`).
- cinny→EC actions: add the action name to the `lazyActions` allow-list in
`widget.ts` (the array at ~L101) and handle it in EC; cinny sends via
`this.call.transport.send(...)`.
- cinny receives EC→cinny actions via the existing `listenAction(type, cb)`
helper in `plugins/call/CallEmbed.ts:626` (auto-replies `{}` so the transport
doesn't time out — same pattern as `io.element.device_mute`).
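
The `listenAction` helper is described only as subscribing to an EC→cinny action and auto-replying `{}`. The block below sketches that pattern. It assumes matrix-widget-api's convention: `ClientWidgetApi` emits an `action:<name>` event for a custom action, and the handler calls `preventDefault()` and then replies through `transport.reply`. The interfaces are minimal stand-ins so the sketch runs without the library. This is not the code at `CallEmbed.ts:626`:

```typescript
// Minimal stand-ins modeled on matrix-widget-api's request/event shapes.
interface WidgetRequest { action: string; requestId: string; data: unknown; }
interface ActionEvent { detail: WidgetRequest; preventDefault(): void; }
type ActionHandler = (ev: ActionEvent) => void;
interface WidgetApiLike {
  on(event: string, handler: ActionHandler): void;
  off(event: string, handler: ActionHandler): void;
  transport: { reply(request: WidgetRequest, response: object): void };
}

// Subscribe to one custom EC->host action. Replies {} at once so the widget's
// send() resolves instead of timing out, then hands the payload to `cb`.
// Returns an unsubscribe function.
function listenAction(api: WidgetApiLike, type: string, cb: (data: unknown) => void): () => void {
  const handler: ActionHandler = (ev) => {
    ev.preventDefault(); // mark as handled so no error reply is sent
    api.transport.reply(ev.detail, {});
    cb(ev.detail.data);
  };
  api.on(`action:${type}`, handler);
  return () => api.off(`action:${type}`, handler);
}
```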
**#2 mute/speaker events** — Source: subscribe to `vm.userMedia$`
(`CallViewModel`), per member `speaking$` + `audioEnabled$`
(`state/media/UserMediaViewModel.ts:47-48`); aggregate and
`transport.send("io.lotus.call_state", {participants:[{id,speaking,audioEnabled}]})`.
Mount in `room/InCallView.tsx` via `useEffect` guarded by `widget !== null`.
cinny: `listenAction("io.lotus.call_state")` in `CallEmbed.ts`, feed
`hooks/useCallSpeakers.ts` → delete its `contentDocument` `[data-muted]` /
`[data-video-fit]` scrape. _Additive, low risk._
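
On the EC side, #2 reduces to turning per-member `speaking$`/`audioEnabled$` values into the `io.lotus.call_state` payload. The block below is a framework-free sketch of the aggregation step. It assumes the observables have already been combined into plain objects. `buildCallState` and `dedupedSender` are hypothetical, and this sketch does not settle whether `id` is a Matrix user ID or a LiveKit identity:

```typescript
interface ParticipantState {
  id: string;
  speaking: boolean;
  audioEnabled: boolean;
}
interface CallStatePayload { participants: ParticipantState[]; }

// Copy only the payload fields and sort by id, so equal states serialize equally.
function buildCallState(members: Iterable<ParticipantState>): CallStatePayload {
  const participants = Array.from(members, ({ id, speaking, audioEnabled }) => ({ id, speaking, audioEnabled }));
  participants.sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  return { participants };
}

// Wrap a send function so it only fires when the payload actually changed.
function dedupedSender(send: (p: CallStatePayload) => void): (p: CallStatePayload) => void {
  let last: string | undefined;
  return (p) => {
    const key = JSON.stringify(p);
    if (key === last) return;
    last = key;
    send(p);
  };
}
```

In `InCallView.tsx` the wrapped function would call `transport.send("io.lotus.call_state", ...)`. Sorting makes reordered but otherwise identical snapshots compare equal, so the host only hears about real changes.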
**#4 spotlight/focus** — EC: add `io.lotus.focus_participant` to the `lazyActions`
list (`widget.ts`), drive `vm`'s spotlight (`spotlightSpeaker$` /
`spotlight$` in `CallViewModel.ts:898/1001`) to pin a given identity, coexisting
with `hasRemoteScreenShares$` (L1008). cinny: replace
`CallControl.ts` `focusCameraParticipant` `.click()` walk with
`transport.send("io.lotus.focus_participant", {userId})`. _Additive, low risk._
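
On the cinny side, #4 replaces the tile `.click()` walk with a single widget message. The block below is a minimal sketch. It assumes a transport with a `send(action, data)` method that returns a promise, matching how `this.call.transport.send(...)` is used above. `focusParticipant`, `TransportLike`, and the user-ID check are hypothetical:

```typescript
interface TransportLike {
  send(action: string, data: object): Promise<unknown>;
}

// Loose "@localpart:server" shape check; a hypothetical guard, not full spec validation.
const USER_ID_RE = /^@[^:\s]+:\S+$/;

// Ask EC to spotlight `userId`, or clear the override with null.
// Resolves false for a malformed id or when the widget rejects the action.
async function focusParticipant(transport: TransportLike, userId: string | null): Promise<boolean> {
  if (userId !== null && !USER_ID_RE.test(userId)) return false;
  try {
    await transport.send("io.lotus.focus_participant", { userId });
    return true;
  } catch {
    return false;
  }
}
```

Resolving `false` on rejection mirrors how §12.4 treats an action the pinned bundle does not handle: the error reply is caught, and the call becomes a no-op.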
**#3 audio-inject** — EC: add `io.lotus.inject_audio` action; mix an
`AudioBufferSourceNode` into the published mic track. The local publish path is
`state/CallViewModel/localMember/Publisher.ts` + `LocalMember.ts` (LiveKit
`localParticipant`); create a `MediaStreamAudioDestinationNode`, mix mic + clip,
`replaceTrack`. cinny soundboard calls the action instead of local-only playback.
_Medium; touches publish path → live-test carefully._
**#1 denoise-in-source** — replace the cinny `lotusDenoise()` `getUserMedia`
monkeypatch with a real processing stage in EC's mic capture
(`Publisher.ts`/`LocalMember.ts`; note EC has a `TrackProcessorContext` +
`BlurBackgroundTransformer` precedent in `livekit/`). EC re-runs it on every
(re)publish → fixes A7. Remove `vite.config.js` `lotusDenoise()` + URL params in
`CallEmbed.ts`; move `denoise/` assets into the fork. _Highest value, highest
risk — most live testing._
**#5 theming** — add a Lotus/TDS theme in EC's theme system (`src/useTheme.ts` +
EC theme tokens / CSS); driven by the existing `setTheme()` channel cinny already
calls (`CallEmbed.ts:277`). Bake transparent background. Delete cinny's
`applyStyles()` injection + `background:none !important`. _Medium._
**#6 in-call decorations** — render the decoration APNG in EC's tile component
(`tile/GridTile.tsx`); pass slugs via widget member data. cinny already has the
decoration data + `AvatarDecoration` (lobby `CallMemberCard.tsx`). _Medium-Large._
**#7 quality controls** — set audio `maxBitrate` via
`RTCRtpSender.setParameters` and screenshare `getDisplayMedia` constraints in
EC's publish path (`Publisher.ts`); configurable via `config.json` / a widget
message. Keep the server `voice-limit-guard` as enforcement. _Medium._
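
For #7, the payload fields come from the `io.lotus.set_quality` row in §12's table, and the audio cap goes through `RTCRtpSender.setParameters`. The block below is a hedged sketch of both steps. First it validates the untrusted payload. Then it caps `maxBitrate` on every encoding of an audio sender using the standard get-modify-set sequence. `sanitizeQuality`, `capAudioSender`, and the `SenderLike` shape are hypothetical. The screenshare caps, which the text applies through `getDisplayMedia` constraints, are not shown:

```typescript
// Fields accepted by io.lotus.set_quality (all optional).
interface QualityCaps {
  audioMaxBitrate?: number;
  screenshareMaxBitrate?: number;
  screenshareMaxFramerate?: number;
}

const QUALITY_KEYS = ["audioMaxBitrate", "screenshareMaxBitrate", "screenshareMaxFramerate"] as const;

// Keep only positive, finite numbers; anything else in the payload is dropped.
function sanitizeQuality(raw: unknown): QualityCaps {
  const src = (typeof raw === "object" && raw !== null ? raw : {}) as Record<string, unknown>;
  const caps: QualityCaps = {};
  for (const key of QUALITY_KEYS) {
    const v = src[key];
    if (typeof v === "number" && Number.isFinite(v) && v > 0) caps[key] = v;
  }
  return caps;
}

// Minimal RTCRtpSender shape: only the encoding field used here is typed.
interface EncodingLike { maxBitrate?: number; }
interface SenderLike {
  getParameters(): { encodings: EncodingLike[] };
  setParameters(params: { encodings: EncodingLike[] }): Promise<void>;
}

// Standard get-modify-set: edit the object getParameters() returned and pass
// that same object back. Resolves false if no encodings exist yet.
async function capAudioSender(sender: SenderLike, maxBitrate: number): Promise<boolean> {
  const params = sender.getParameters();
  if (!params.encodings || params.encodings.length === 0) return false;
  for (const enc of params.encodings) enc.maxBitrate = maxBitrate;
  await sender.setParameters(params);
  return true;
}
```

Passing back the object that `getParameters()` returned matters. The WebRTC spec rejects `setParameters()` calls whose parameters do not come from the latest `getParameters()`. The server-side `voice-limit-guard` remains the real enforcement.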
**Rollback:** revert the 4 cinny files (restores `@element-hq/...@0.20.1` from
npmjs). The fork repo/package can stay; nothing else depends on it until pushed.
### Local repro/build environment (this session, 2026-06-29)
- Upstream cloned + our `lotus` branch at `/root/code/element-call` (remote
`lotus` → Gitea; origin → github upstream, now un-shallowed/full history).
- Isolated **Node 24.18.0** lives in the session scratchpad (system Node is 20);
cinny's `.node-version` is `24.13.1`, so use Node 24 to build cinny too.
- Build the embedded bundle: in `/root/code/element-call`, with Node 24 + pnpm
10.33.0 on PATH, `VITE_APP_VERSION=embedded-v0.20.1 pnpm run build:embedded`
→ output in `dist/`; stage to `embedded/web/dist` before publishing.
---
## 12. Phase 2 — IMPLEMENTED on the fork (2026-06-30)
All 7 EC features are on the `lotus` branch of `LotusGuild/element-call`, each
**additive + feature-flagged** (a vanilla call with no `lotus*` params / no Lotus
actions behaves exactly like upstream), build + `tsc` clean, per-feature reviewed
(fixes applied) and holistically reviewed. **Not yet live-tested** — all need the
`LOTUS_TESTING.md` §D sweep.
Fork modules live under `element-call/src/lotus/*`; mounts are `useEffect`s in
`src/room/InCallView.tsx`. Custom widget actions are in `src/lotus/lotusActions.ts`
(toWidget ones allow-listed in `src/widget.ts`).
| # | Feature | Enable via | EC module |
| :--- | :--- | :--- | :--- |
| 2 | Speaker/mute/camera state → host | URL `lotusCallState=1` | `lotusCallState.ts` (sends `io.lotus.call_state`) |
| 4 | Focus/spotlight a participant (works during screenshare) | action `io.lotus.focus_participant {userId \| null}` | `lotusFocus.ts` + `CallViewModel` spotlight override |
| 3 | Soundboard audio-inject (heard by peers) | URL `lotusAudioInject=1` + action `io.lotus.inject_audio {url,volume?}` | `lotusAudioInject.ts` |
| 7 | Audio/screenshare quality caps | action `io.lotus.set_quality {audioMaxBitrate?,screenshareMaxBitrate?,screenshareMaxFramerate?}` | `lotusQuality.ts` |
| 5 | Transparent bg + Lotus theme | URL `lotusTransparent=1` / `lotusTheme=1` | `useTheme.ts` + `index.css` |
| 6 | In-call avatar decorations | action `io.lotus.decorations {decorations:{userId:url}}` | `lotusDecorations.ts` + `MediaView.tsx` |
| 1 | ML denoise in-source (fixes A7) | URL **`lotusDenoiseSource=1`** (+`lotusModel`, `lotusGate`, `lotusGateThreshold`, `lotusDenoiseBase`); deliberately NOT the existing `lotusDenoise=ml` (that drives the host shim; reusing it would double-process) | `lotusDenoise.ts` + `lotusDenoiseProcessor.ts` |
| P6-2 | Deafen / screenshare-audio-mute at the LiveKit source | action `io.lotus.set_deafen {deafened,screenshareAudioMuted}`: sets remote `RemoteParticipant.setVolume(0/1)` per source (Microphone + ScreenShareAudio) and persists to late joiners via `RoomEvent.ParticipantConnected` | `lotusDeafen.ts` |
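
Every URL flag in the table is opt-in, and that is what keeps a vanilla call identical to upstream. The block below sketches how the flags might be read. It assumes they arrive in the query string and count as on only when the value is exactly `1`. `readLotusFlags` and `LotusFlags` are hypothetical, and EC's own URL-parameter handling may differ (for example, it may also read the hash fragment):

```typescript
interface LotusFlags {
  callState: boolean;
  audioInject: boolean;
  transparent: boolean;
  theme: boolean;
  denoiseSource: boolean;
}

// Absent or non-"1" values leave the feature off, i.e. upstream behaviour.
function readLotusFlags(search: string): LotusFlags {
  const params = new URLSearchParams(search);
  const on = (name: string): boolean => params.get(name) === "1";
  return {
    callState: on("lotusCallState"),
    audioInject: on("lotusAudioInject"),
    transparent: on("lotusTransparent"),
    theme: on("lotusTheme"),
    denoiseSource: on("lotusDenoiseSource"),
  };
}
```

Action-gated features (`focus_participant`, `set_quality`, `decorations`, `set_deafen`) need no flag. Without a flag, they stay dormant until cinny sends the action. `inject_audio` is the exception: it also requires `lotusAudioInject=1`.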
### 12.4 P6-2 — deafen action (retires cinny's iframe-DOM `.muted` hack)
`io.lotus.set_deafen` (fork commit, folded into unpublished **`0.20.1-lotus.2`**) replaces
cinny's `CallControl.setSound`/`applyScreenshareAudioMuted` DOM `<audio>.muted` poking —
which broke silently on EC re-render / late tracks. **Two-phase rollout:**
1. **DONE (this batch):** fork action implemented; cinny's `CallControl` now ALSO sends
`io.lotus.set_deafen` (gated on join via `forceState`) alongside the retained DOM hack.
Against the current pinned bundle (`lotus.1`, no handler) the action is immediately
error-replied and swallowed by `.catch` — inert, no timeout.
2. **TODO — needs YOU to publish, then me:** publish the fork (`0.20.1-lotus.2`) to npm →
I bump cinny's pin `0.20.1-lotus.1` → `lotus.2`, `npm install`, then DELETE the DOM
`.muted` code from `CallControl.ts` (the hack is fully retired only here).
**Known divergence to confirm:** deafen silences remote Microphone + ScreenShareAudio, but
NOT injected/soundboard audio (`Track.Source.Unknown` — livekit-client's `setVolume` type
only accepts Microphone|ScreenShareAudio). So a deafened user still hears host-triggered
soundboard clips. Defensible (short, host-gated); confirm it's the desired UX.
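
Below is a minimal sketch of the per-source logic that `io.lotus.set_deafen` describes. It is not the fork's `lotusDeafen.ts`. It assumes a hypothetical `VolumeTarget` shape that mirrors livekit-client's `RemoteParticipant.setVolume(volume, source)`, and `DeafenController` and `volumesFor` are illustrative names. The sketch shows why late joiners stay silenced: the last requested state is stored and re-applied from a `RoomEvent.ParticipantConnected` listener.

```typescript
// String values mirror livekit-client's Track.Source for the two sources that
// RemoteParticipant.setVolume accepts. Injected audio (Source.Unknown) is absent.
type RemoteAudioSource = "microphone" | "screen_share_audio";

// Hypothetical stand-in for a livekit-client RemoteParticipant.
interface VolumeTarget {
  identity: string;
  setVolume(volume: number, source: RemoteAudioSource): void;
}

interface DeafenState {
  deafened: boolean;
  screenshareAudioMuted: boolean;
}

// Deafen silences both sources; screenshare-audio-mute silences only the second.
function volumesFor(state: DeafenState): Record<RemoteAudioSource, number> {
  return {
    microphone: state.deafened ? 0 : 1,
    screen_share_audio: state.deafened || state.screenshareAudioMuted ? 0 : 1,
  };
}

// Hypothetical controller: remembers the last requested state so it can be
// re-applied to participants who connect later.
class DeafenController {
  private state: DeafenState = { deafened: false, screenshareAudioMuted: false };

  // Call from the io.lotus.set_deafen handler with every current remote participant.
  setState(next: DeafenState, participants: Iterable<VolumeTarget>): void {
    this.state = { ...next };
    for (const p of participants) this.applyTo(p);
  }

  // Call from a RoomEvent.ParticipantConnected listener.
  onParticipantConnected(p: VolumeTarget): void {
    this.applyTo(p);
  }

  private applyTo(p: VolumeTarget): void {
    const v = volumesFor(this.state);
    p.setVolume(v.microphone, "microphone");
    p.setVolume(v.screen_share_audio, "screen_share_audio");
  }
}
```

Because the soundboard's `Track.Source.Unknown` is not a `RemoteAudioSource`, a deafened user still hears injected clips, which matches the divergence noted above.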
**Security hardening applied** (holistic audit): `lotusDenoiseBase` forced
same-origin before `audioWorklet.addModule` (was an arbitrary-code-load vector
via a crafted link); audio-inject gated behind `lotusAudioInject=1`; decoration
roster capped. Only `https`/`blob` URLs accepted for inject/decoration assets.
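
The two URL rules above can be sketched as small guards. Both function names are hypothetical, and the optional `allowedHttpsOrigins` list anticipates the privacy follow-up of pinning assets to the homeserver media origin. The sketch relies only on the WHATWG `URL` parser: https assets may be restricted to an allow-list, `blob:` is accepted, and a denoise base that resolves cross-origin is rejected before it could reach `audioWorklet.addModule`.

```typescript
// Hypothetical guard for inject/decoration asset URLs: https: or blob: only.
// When allowedHttpsOrigins is given, an https URL must also match one of them.
function isAllowedAssetUrl(raw: string, allowedHttpsOrigins?: readonly string[]): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not an absolute URL
  }
  if (url.protocol === "blob:") return true;
  if (url.protocol !== "https:") return false;
  return !allowedHttpsOrigins || allowedHttpsOrigins.includes(url.origin);
}

// Hypothetical guard for lotusDenoiseBase: resolve against the page origin and
// return the absolute base only if it stays same-origin, else null.
function sameOriginBase(rawBase: string, pageOrigin: string): string | null {
  try {
    const url = new URL(rawBase, pageOrigin);
    return url.origin === pageOrigin ? url.href : null;
  } catch {
    return null;
  }
}
```

Protocol-relative input such as `//evil.example/x` resolves to a different origin, so it is rejected too.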
### 12.1 cinny host integration checklist (REQUIRED to light these up)
> ✅ **STATUS (2026-06): COMPLETE.** All items below are shipped. call_state,
> focus_participant, decorations, and transparent background are active; the
> in-source denoise cutover is done (flag `lotusDenoiseSource=1`, **all four**
> models in-source); and the two formerly-dormant capabilities now have cinny
> UI — **soundboard** (`io.lotus.inject_audio`, P5-15) and **quality controls +
> room permissions** (`io.lotus.set_quality` + `io.lotus.room_quality`, P5-31,
> with server-side enforcement in `LotusGuild/matrix`). See `LOTUS_FEATURES.md`
> → "Element Call — Self-Built Fork". The checklist is kept below as the record
> of what was wired. (One open denoise item tracked separately: the "Series
> Suppression" native-NS toggle is not wired to the real call path.)
The EC side is additive and dormant until cinny opts in. Host work (in
`src/app/plugins/call/CallEmbed.ts` unless noted) — **done**:
> ⚠️ **CRITICAL TIMING (protocol audit F1):** only send `io.lotus.*` **toWidget**
> actions (#4 focus, #6 decorations, #7 quality, #3 audio-inject) **after** the call
> is joined (`CallEmbed.onCallJoined` / `this.joined`). Those actions are
> allow-listed at EC app-init (so `preventDefault` suppresses the auto-error)
> but their handlers only mount with `InCallView` (post-join). Sending earlier
> leaves the host's `transport.send` pending until the **10s timeout**. Queue and
> flush on join, or no-op before join.
>
> Also: **F3 (RESOLVED)** — all four models (`rnnoise`/`speex`/`dtln`/
> `deepfilternet`) are now implemented in-source in `lotusDenoiseProcessor.ts`;
> the picker offers all four. **F4** — cinny no longer forwards a native-NS flag
> in the `ml` branch (the "Series Suppression" toggle is currently a no-op in
> real calls — open item). **F7** — no widget _capability_ changes needed;
> custom actions bypass capability checks.
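
The "queue and flush on join" option from F1 might look like the following sketch. `ActionTransport` is a minimal subset shaped like matrix-widget-api's `ITransport.send(action, data)`. `JoinGatedSender`, `markJoined` and `markLeft` are hypothetical names, and wiring `markJoined` to `CallEmbed.onCallJoined` is an assumption.

```typescript
// Minimal subset of a widget transport (same shape as ITransport.send).
interface ActionTransport {
  send(action: string, data: object): Promise<unknown>;
}

// Hypothetical wrapper: buffers io.lotus.* toWidget actions until the call is
// joined, so no request sits unanswered until the 10s transport timeout.
class JoinGatedSender {
  private joined = false;
  private pending: Array<{ action: string; data: object }> = [];
  private readonly transport: ActionTransport;

  constructor(transport: ActionTransport) {
    this.transport = transport;
  }

  send(action: string, data: object): void {
    if (!this.joined) {
      this.pending.push({ action, data }); // hold until the join signal
      return;
    }
    this.dispatch(action, data);
  }

  // Wire to the host's join signal; flushes queued actions in FIFO order.
  markJoined(): void {
    this.joined = true;
    const queued = this.pending;
    this.pending = [];
    for (const { action, data } of queued) this.dispatch(action, data);
  }

  // On hang-up, drop anything still queued so the next call starts clean.
  markLeft(): void {
    this.joined = false;
    this.pending = [];
  }

  private dispatch(action: string, data: object): void {
    // Error replies (e.g. a fork build without the handler) are swallowed.
    this.transport.send(action, data).catch(() => undefined);
  }
}
```

The queue is plain FIFO because every `io.lotus.inject_audio` clip matters. For state-like actions (decorations, quality), a host could keep only the latest payload per action instead.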
1. **Set the URL flags** on the widget iframe params (the `URLSearchParams` in
`CallEmbed`): `lotusCallState=1`, `lotusTransparent=1`/`lotusTheme=1`,
`lotusAudioInject=1` as desired. (Denoise sets `lotusDenoiseSource=1` + `lotusModel`/`lotusGate`/`lotusGateThreshold` in the `ml` tier.)
2. **Ack `io.lotus.call_state`**: add `listenAction('io.lotus.call_state', …)` —
without a reply the fork's sends time out every 250ms. Feed the payload into
`useCallSpeakers` and RETIRE its `contentDocument` DOM scrape.
3. **Send actions** via `this.call.transport.send(...)`:
`io.lotus.focus_participant` (replace `CallControl.focusCameraParticipant`'s
`.click()`), `io.lotus.inject_audio` (from the soundboard), `io.lotus.set_quality`
(from quality settings), `io.lotus.decorations` (push the MSC4133 decoration
map; resolve mxc→https first).
4. **#1 denoise cutover**: once verified, STOP injecting the `lotusDenoise()`
shim in `cinny/vite.config.js` and remove the `index.html` injection — the
fork now does denoise in-source. Keep shipping the `denoise/` assets (the
fork loads `./denoise/…` at runtime) until those move into the fork build.
5. Re-run `LOTUS_TESTING.md` §D for each feature; only then ship.
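
Step 1 can be sketched as a helper that adds only the opted-in flags to the widget's existing `URLSearchParams`. The flag names come from the checklist. The `LotusWidgetFlags` shape, the `"1"` encoding for `lotusGate`, and the numeric format of `lotusGateThreshold` are assumptions.

```typescript
// Hypothetical option shape; only the URL flag names come from the checklist.
interface LotusWidgetFlags {
  callState?: boolean;
  transparent?: boolean;
  theme?: boolean;
  audioInject?: boolean;
  denoise?: {
    model: "rnnoise" | "speex" | "dtln" | "deepfilternet";
    gate?: boolean;
    gateThreshold?: number; // value format is an assumption
  };
}

// Adds the opted-in Lotus flags to the widget's URLSearchParams. Absent or
// false options leave the params untouched, so the fork stays dormant.
function applyLotusFlags(params: URLSearchParams, flags: LotusWidgetFlags): URLSearchParams {
  const setFlag = (name: string, on: boolean | undefined): void => {
    if (on) params.set(name, "1");
  };
  setFlag("lotusCallState", flags.callState);
  setFlag("lotusTransparent", flags.transparent);
  setFlag("lotusTheme", flags.theme);
  setFlag("lotusAudioInject", flags.audioInject);
  if (flags.denoise) {
    params.set("lotusDenoiseSource", "1");
    params.set("lotusModel", flags.denoise.model);
    setFlag("lotusGate", flags.denoise.gate); // "1" encoding assumed
    if (flags.denoise.gateThreshold !== undefined) {
      params.set("lotusGateThreshold", String(flags.denoise.gateThreshold));
    }
  }
  return params;
}
```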
### 12.2 Holistic multi-agent review — outstanding follow-ups (non-blocking)
Four aspect-agents reviewed the whole fork. Criticals were fixed in-branch (the
denoise restart-silence/A7 bug; the `lotusDenoiseBase` code-load vector;
audio-inject opt-in gate; #6 rendering in the wrong component; #7 simulcast cap).
Remaining, deliberately deferred:
- **Denoise H2 (double-processing):** if cinny is set to `lotusDenoise=ml` while
ALSO still injecting its build-time `getUserMedia` shim, audio is denoised
twice. The #1 cutover MUST remove the cinny-side injection (it currently has
none injected into the iframe — keep it that way). Hard requirement, not code.
- **Denoise M1 (perf):** in-source uses non-SIMD `rnnoise.wasm`; the reference
preferred SIMD with detection. Perf-only; add SIMD detection later.
- **dtln/deepfilternet (F3): RESOLVED** — all four models
(rnnoise/speex/dtln/deepfilternet) are now implemented in
`lotusDenoiseProcessor.ts` (faithful port of cinny's `build/lotus-denoise.js`
pipeline). This also fixed a real bug (the gate worklet name was `noiseGate`;
correct is the hyphenated `noise-gate`) and added per-model sample rates
(DTLN 16 kHz, others 48 kHz), context `resume()`, and SIMD wasm selection.
Still needs live §D testing per model, and depends on cinny shipping the
DTLN (`denoise/workadventure/`) + DeepFilterNet (`denoise/deepfilternet/`)
asset trees (it already does).
- **Rebase-fragility (build agent MED):** the `CallViewModel` spotlight override
edits hot upstream lines (renamed `spotlightSpeaker$`→`autoSpotlightSpeaker$`).
For cheaper future rebases, refactor it into a `src/lotus/lotusSpotlight.ts`
wrapper that takes the upstream stream and returns the overridden one, leaving
upstream's definition byte-identical (a single import + two token swaps).
- **Denoise asset coupling (build agent HIGH):** the fork loads `./denoise/*`
shipped by cinny, not by the fork build (documented in the processor). Add an
integration smoke-check that `GET …/element-call/denoise/rnnoise.wasm` == 200,
and pin the `@sapphi-red/web-noise-suppressor` version both repos expect.
- **Unconditional effect registration (build agent LOW):** focus/audio-inject/
quality/decorations register widget handlers on every embedded call (true
no-ops for a non-Lotus host). Intentional; gate behind a coarse `lotus=1` flag
if strict zero-footprint is desired.
- **Privacy (security agent):** decoration/inject URLs accept any `https`; ideally
restrict to the homeserver media origin host-side. Call-state exposes
userId/deviceId/speaking to the (trusted, same-origin) host — documented.
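
For the SIMD follow-up (M1, and the "SIMD wasm selection" mentioned under F3), here is one common detection technique. It validates a tiny module that uses a SIMD opcode; engines without wasm SIMD reject it. The byte sequence is the widely used `i8x16.popcnt` probe. The `rnnoise_simd.wasm` file name is an assumption about the asset tree, not something this document specifies.

```typescript
// Smallest module using SIMD: one function returning
// i8x16.popcnt(i8x16.splat(i32.const 0)). Non-SIMD engines fail validation.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0, // "\0asm", version 1
  1, 5, 1, 96, 0, 1, 123, // type section: () -> v128
  3, 2, 1, 0, // function section: one function of type 0
  10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11, // code: i32.const 0; i8x16.splat; i8x16.popcnt; end
]);

function wasmSimdSupported(): boolean {
  try {
    return typeof WebAssembly === "object" && WebAssembly.validate(SIMD_PROBE);
  } catch {
    return false;
  }
}

// Hypothetical file names: pick the SIMD build when the engine supports it.
function rnnoiseWasmFile(simd: boolean = wasmSimdSupported()): string {
  return simd ? "rnnoise_simd.wasm" : "rnnoise.wasm";
}
```

`WebAssembly.validate` is synchronous and compiles nothing, so the check is cheap enough to run before choosing which asset to fetch.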
**Nothing here blocks the §D live test — but every feature still needs it.**
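
The integration smoke-check suggested under "Denoise asset coupling" could be sketched as follows. `checkDenoiseAssets` and the injectable `HeadFetch` type are hypothetical. In a browser or Node 18+, the global `fetch` fits `HeadFetch`, and `baseUrl` must end in `/` so relative file names resolve inside the denoise directory.

```typescript
// Injectable so the check can be unit-tested; the global fetch satisfies it.
type HeadFetch = (url: string, init: { method: string }) => Promise<{ status: number }>;

// Hypothetical post-deploy check: every denoise asset the fork loads at runtime
// must be served by the cinny deployment with HTTP 200. Returns the failures.
async function checkDenoiseAssets(
  baseUrl: string,
  files: readonly string[],
  fetchImpl: HeadFetch,
): Promise<string[]> {
  const failures: string[] = [];
  for (const file of files) {
    const url = new URL(file, baseUrl).href;
    try {
      const res = await fetchImpl(url, { method: "HEAD" });
      if (res.status !== 200) failures.push(`${url} -> ${res.status}`);
    } catch (err) {
      failures.push(`${url} -> ${String(err)}`); // network error, DNS, TLS, ...
    }
  }
  return failures;
}
```

A CI step could call it with the deployed `…/element-call/denoise/` base and fail the job on a non-empty result.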
### 12.3 Safe rollout when prod is the only test environment
Every Phase-2 feature is now **dormant by default** — with the flags cinny sets
today, the fork behaves identically to the parity build (`#1` was decoupled onto
`lotusDenoiseSource=1` so it no longer collides with the host's `lotusDenoise=ml`
shim). This enables a low-risk incremental rollout even without a staging env:
1. **Ship dormant first.** Publish the `lotus` branch (e.g. `0.20.1-lotus.1`),
bump cinny's pin, deploy. With no Lotus flags set / no Lotus actions sent,
this is upstream-equivalent (only inert, holistically-reviewed code runs).
"Testing" here = confirm a normal call still works.
2. **Enable ONE feature at a time**, each independently revertable:
- URL-flag features (#2 `lotusCallState`, #5 `lotusTransparent`/`lotusTheme`,
#1 `lotusDenoiseSource`): add the flag in `CallEmbed.getWidget`, deploy,
test that one feature, roll back just that flag if needed.
- Action features (#3,#4,#6,#7): wire the host send + (for #2) the
`listenAction` ack, gated on join (§12.1 F1).
3. **#1 denoise cutover is a coordinated 2-step** (do together): set
`lotusDenoiseSource=1` AND remove the `lotusDenoise()` shim injection +
`lotusDenoise=ml` param in cinny — otherwise audio is denoised twice.
Roll back = revert both.
4. Baseline is always upstream-equivalent, so any single feature can be disabled
by flipping its flag/send off without touching the rest.
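
Step 3's coordination rule can be enforced by a preflight check before deploying. `denoiseConflicts` is a hypothetical helper. Whether the build-time shim is still injected is a host fact passed in by the caller, because the widget params cannot show it.

```typescript
// Hypothetical preflight: returns the reasons audio would be denoised twice.
// An empty array means the denoise configuration is consistent.
function denoiseConflicts(params: URLSearchParams, hostShimInjected: boolean): string[] {
  const problems: string[] = [];
  if (params.get("lotusDenoiseSource") !== "1") return problems; // in-source denoise off
  if (params.get("lotusDenoise") === "ml") {
    problems.push("lotusDenoise=ml is still set alongside lotusDenoiseSource=1");
  }
  if (hostShimInjected) {
    problems.push("the lotusDenoise() getUserMedia shim is still injected");
  }
  return problems;
}
```

Failing the build on a non-empty result keeps the two halves of the cutover from shipping separately.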
**Blocker to step 1:** publishing the `lotus` branch needs a Gitea npm token
(the admin token used for the `0.20.1` parity publish was deleted). Either
provide a token for a manual `npm publish`, or stand up the Gitea Actions runner
plus a `GITEA_NPM_TOKEN` secret so a `v0.20.1-lotus.1` tag auto-publishes.
-188
@@ -1,188 +0,0 @@
# Lotus Chat — Open Bugs & Technical Debt
**Only OPEN and awaiting-verification items live here.** Resolved findings
(fixed-and-verified, false-positives, won't-fix) have been removed to keep this
actionable — the full history is in git. Items fixed in code but not yet
verified in a real environment are in **Needs Verification** below and have
step-by-step checks in [`LOTUS_TESTING.md`](./LOTUS_TESTING.md).
> Design rules for any fix here: follow the **Native-Cinny Law** and **TDS
> Design Law** in [`LOTUS_TODO.md`](./LOTUS_TODO.md).
---
## ⚠️ Needs Verification — fixed in code, awaiting live testing
Implemented and gate-green; confirm each per `LOTUS_TESTING.md`, then delete the row.
| ID | Item | File / area | Test |
| :--- | :-------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| #2 | Chat-background animation flicker (`contain:paint`) | `lotus/chatBackground.ts` | F1 |
| #4 | Ringtone re-fixes: classic loudness + caller decline notice (A2 ✓ live) | `CallEmbedProvider.tsx`, `ringtones.ts` | A1,A3,A4 |
| #6 | Background vs. seasonal theme mutual exclusion | `state/settings.ts`, `General.tsx` | F2 |
| #7 | Composer toolbar touch targets (≥44px) | `room/RoomInput.tsx` | E1 |
| #8 | Room Settings horizontal overflow (mobile) | `components/page/style.css.ts` | E2 |
| #9 | Modal fullscreen on mobile (`useModalStyle`) | 22+ modal files | E3 |
| #10 | Composer not hidden by keyboard (`100dvh`) | `src/index.css` | E4 |
| #12 | PiP "All muted" badge re-fixed (was firing on any single mute) | `hooks/useCallSpeakers.ts` | G1 |
| N96 | Call-recovery overlay single "Back" button | `call/CallView.tsx` | A7 |
| N95 | AFK-monitor mic released on mute (OS indicator clears) | `hooks/useAfkAutoMute.ts` | L1 |
| N108 | Maskable PWA icons (Android adaptive) | `public/manifest.json` + `res/android/maskable-*` | L2 |
| EC | EC iframe load watchdog + self-heal + recovery UI | `plugins/call/CallEmbed.ts`, `CallView.tsx` | A7 |
| N105 | Notification clicks work after tab close (SW `notificationclick` + `showNotification`) | `sw.ts`, `utils/dom.ts`, `ClientNonUIFeatures.tsx` | get a msg notif, close the tab, click it → app focuses/opens + routes to the room |
| Gal | MediaGallery lazy-decrypt (true virtualization deferred) | `room/MediaGallery.tsx` | H1 |
| a11y | aria-labels: edit-history / reaction / thread / reply | `message/*` (`FallbackContent`, `Reaction`, `Reply`) | I |
| P3-8 | Thread Panel (side drawer, chips, threaded receipts, thread composer) | `features/room/thread/*`, `RoomTimeline/RoomInput` | 6-step checklist in LOTUS_TODO §P3-8 |
| P4-4 | KaTeX math (`$…$`, `$$…$$`, data-mx-maths; lazy chunk) | `utils/mathParse.ts`, `components/math/` | send `$x^2$`, `$$\int f$$`, `$5 and $10` (stays text), math inside code block (stays text) |
| P4-8 | Encrypted-search cache (opt-in toggle, clear button, logout wipe) | `utils/searchCache.ts`, message-search | enable in search panel → search → reload → coverage persists; logout wipes |
| N97a | Session blob migration + cross-tab logout sync | `state/sessions.ts`, `useSessionSync` | login on old build → new build migrates; logout in tab A → tab B drops to auth |
| P4-1 | Slack-style thread notifications (participating default, All/Mentions/Mute, badge math) | `utils/threadNotifications.ts`, `ClientNonUIFeatures`, `roomToUnread` | 6-step checklist in LOTUS_TODO §P4-1 |
| AW-1 | Scheduled-message cancel no longer ghost-sends (error row on failure) | `ScheduledMessagesTray.tsx` | schedule → cancel with network cut → item stays + error; retry works |
| AW-2 | Emoji lazy-load (search/autocomplete/recents fill in; board opens fast) | `plugins/emoji.ts` + consumers | first emoji-board open of a session: grid+search populate; reactions still label |
| AW-3 | SW precache (repeat-visit near-instant; deploys still picked up immediately) | `sw.ts`, `vite.config.js` | load app twice (2nd = cached assets); deploy → reload picks new version |
| AW-4 | Desktop CSP tighten + Escape/panel fixes + thread Jump to Latest | `tauri.conf.json`, Room/ThreadPanel | desktop: boots, avatars/media load, VT323 font renders, location maps embed, calls connect, deep links work |
| P3-4 | Accessibility compliance pass (collapsed-msg SR sender, form/overlay labels, typing announce, focus-return, `?` help, jsx-a11y CI gate) | `message/*`, `RoomViewTyping`, `features/shortcuts/*`, `eslint.config.mjs` | LOTUS_TESTING §P — axe-core + VoiceOver/NVDA on the golden path |
| P6-1 | Desktop Linux parity (no-sleep in calls, launcher badge), autostart toggle, tray Do-Not-Disturb | `native/power.rs`, `lib.rs`, `useTauriDnd`, `General.tsx` | Linux desktop: no display sleep during a call; tray DND silences notifications; launch-on-login persists; Unity badge (Ubuntu); DND toggle polarity |
| P6-2 | EC deafen/screenshare-audio-mute via `io.lotus.set_deafen` (retires the `<audio>.muted` iframe hack) | fork `lotusDeafen.ts`, cinny `CallControl.ts` | AFTER publish+pin-bump: deafen silences remote audio + survives a reconnect / new screenshare / late joiner (the cases the DOM hack failed); screenshare-audio-mute toggles independently |
| P6-3 | Forward-to-multiple-rooms (multi-select + partial-failure summary) + live bookmark previews (edits/redactions, snapshot fallback) | `ForwardMessageDialog.tsx`+`forwardContent.ts`, `BookmarksPanel.tsx` | forward one msg to 3 rooms (incl. 1 you cannot post to = partial summary); bookmark then edit shows edited; redact shows deleted; leave room shows snapshot |
| P6-4 | HSTS + Permissions-Policy on prod nginx (+ contrib examples) | `matrix/cinny/nginx.conf`, `contrib/nginx`, `contrib/caddy` | after `nginx -s reload`: `curl -sI https://chat.lotusguild.org` shows HSTS + Permissions-Policy; a call (cam/mic/screenshare) + location share still work |
**Verified working in live testing (2026-06):** A2, B1-B4, C1, C3, D (mic/camera/deafen/screenshare/fullscreen/more-menu/PiP). Denoise quality in D is still poor; it is tracked under the denoise project, not as a regression.
---
## 🧩 Element Call source-level items — now actionable via the fork
> 🔱 **[EC-FORK]** **UPDATE 2026-06-30: Phase 2 IMPLEMENTED.** We own and
> self-build Element Call (`LotusGuild/element-call` →
> `@lotusguild/element-call-embedded@0.20.1-lotus.1`, cinny wired). A5/A6/A7
> below are **fixed in the fork** — they are now ⚠️ awaiting **live
> verification** (`LOTUS_TESTING.md` §D2), not open work. See
> [`HANDOFF_ELEMENT_CALL_FORK.md`](./HANDOFF_ELEMENT_CALL_FORK.md) §10. Delete each
> row once verified live.
The in-call participant grid is rendered **inside EC's app** — now editable source
(previously a prebuilt npm bundle we could only style around). Status of the items
from testing:
- **A5 — "Focus camera": ⚠️ FIXED in fork, awaiting verify (D2-3).** cinny now
sends an `io.lotus.focus_participant` widget action that pins a participant in
EC's layout (coexisting with / overriding the screenshare spotlight); the old
`.click()`-the-tile DOM hack in `CallControl.ts` is deleted.
- **A6 — avatar decorations in-call: ⚠️ FIXED in fork, awaiting verify (D2-4).**
cinny pushes `io.lotus.decorations` (per-user APNG URLs) and the fork renders
them on EC's participant video-tile avatars — not just our pre-join lobby roster.
- **A7 — mic dead after EC's "Reconnect": ⚠️ FIXED in fork, awaiting verify
(D2-1).** Denoise moved into EC's mic-capture/publish pipeline as a first-class
LiveKit `TrackProcessor` (flag `lotusDenoiseSource=1`); EC re-runs it on every
(re)publish, so reconnects keep denoise alive natively. The build-time
`getUserMedia`/`index.html` injection (the root cause) is removed. **Highest
blast radius — everyone's mic; verify D2-1 carefully.**
---
## 🔴 Open — Actionable
### 🧨 Encryption / E2EE — ⚠️ EXTREME COMPLEXITY · 🧠 PLANNING SESSION REQUIRED · 👤 SENIOR ENGINEER
> 🧰 **Investigation kit ready (2026-07):** [`LOTUS_E2EE_INVESTIGATION.md`](./LOTUS_E2EE_INVESTIGATION.md)
> has the per-KE capture runbook (console signatures, synapse-side queries, the
> KE-1→KE-2 causality decision tree, ranked remediations), and the client now
> ships a **Crypto Diagnostics** capture helper (Settings) — run it during the
> next affected call and download the report before starting any fix.
> **Observed live in prod 2026-06-30** on `chat.lotusguild.org` during a 2-person
> **Element Call** (E2EE enabled). These span **client rust-crypto (via
> `matrix-js-sdk@41.6.0-rc.0`) ↔ Synapse ↔ Element Call's MatrixRTC E2EE** and are
> very likely **interrelated** (see KE-1 → KE-2). Do **not** spot-fix — they need
> a dedicated cross-system planning session with the homeserver owner. Capture
> full client console + a synapse-side trace for the same call before starting.
> **None of these are caused by the EC fork work** (the issues reproduce on the
> old build; the local mic/denoise path is unrelated to key distribution).
- **KE-1 — One-time-key (OTK) upload conflict storm (CRITICAL, root-cause candidate).**
`POST /_matrix/client/v3/keys/upload` returns `400 M_UNKNOWN: One time key
signed_curve25519:AAAAAAAAAGQ already exists. Old key: {…} new key: {…}`
firing **continuously** (many/sec). The client repeatedly tries to publish an
OTK at a key id the server already holds **with a different value**, i.e. the
rust-crypto key store and Synapse have **diverged OTK state**. Impact: floods
the crypto outgoing-request loop and is the prime suspect for the downstream
missing-key failures (no fresh OTKs ⇒ no new Olm sessions ⇒ undecryptable
to-device key events). _Investigate:_ device/key-store reset-or-restore
mismatch, OTK id-counter desync, RC-SDK (`41.6.0-rc.0`) regression, or a
Synapse OTK bug. Repro signature: grep console for `already exists`.
**Extreme — planning session.**
**Update 2026-07 (investigation §6):** upstream `matrix-rust-sdk#5200` (still
OPEN) confirms the mechanism — on the 400, `mark_request_as_sent()` never fires
so the SDK re-issues the identical upload forever. **`41.7.0` does NOT fix it**
(crypto-wasm 17→18.3.1 has no OTK/upload change; 18.3.x was to-device security
only) — the SDK-pin lever is closed. Root cause = **store↔server OTK
divergence**; the leading **web-specific** trigger is that cinny never calls
**`navigator.storage.persist()`**, so the IndexedDB crypto store is evictable
while the `localStorage` session/device-id survives → device resurrects with a
blank store → re-uploads OTKs the server still holds. **Actionable preventive
fix (buildable now, no call needed):** request persistent storage on login
(+ optional multi-tab guard + 400-loop→recovery-prompt). Healing an already-
diverged device still needs a clean **logout+login** (not just "clear
storage"). See `LOTUS_E2EE_INVESTIGATION.md` §6.
- **KE-2 — Element Call media keys not arriving/decrypting → audio & video cut out (CRITICAL).**
`MissingKey: missing key at index N for participant @user`, `skipping decryption
due to missing key`, `MissingKey: key set not found for @user at index 0`, and
rust-crypto `WARN … Received an unexpected encrypted to-device event …
event_type="io.element.call.encryption_keys"`. EC distributes per-participant
media keys as **encrypted to-device `io.element.call.encryption_keys`** events;
these aren't being received/decrypted in order, so remote LiveKit audio/video
can't be decrypted — **this is the "friend's audio cuts out occasionally"
symptom.** Almost certainly downstream of **KE-1** (broken Olm sessions). Spans
EC's MatrixRTC E2EE + rust-crypto to-device + Synapse. **Extreme — planning
session.**
- **KE-3 — Timeline decryption error: missing `algorithm` field (HIGH).**
`Error decrypting event (… type=m.room.encrypted …): DecryptionError[msg:
missing field 'algorithm' at line 1 column 138 …]`. A malformed/legacy
encrypted event (or a serialization mismatch in the RC SDK) that rust-crypto
can't parse. Lower frequency than KE-1/2 but a distinct decode-path failure —
capture the offending event id (`$SASBBzoqj…` seen) and inspect its raw content.
- **KE-4 — MatrixRTC delayed-event / membership timeouts (MEDIUM-HIGH, reliability).**
`[MembershipManager] Network local timeout error while sending event, immediate
retry … AbortError: Restart delayed event timed out before the HS responded`,
with repeated `org.matrix.msc4157.update_delayed_event`. MSC4140/4157
delayed-event reliability against `matrix.lotusguild.org` — can cause stale/ghost
call membership and missed leave events. May be partly **homeserver
responsiveness**; correlate with synapse latency/load. Include in the same
planning session since it shares the call-reliability + HS-interaction surface.
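
The KE-1 preventive fix (request persistent storage on login) can be sketched as a guarded, best-effort call. `PersistCapable` is the subset of the real `StorageManager` API (`navigator.storage.persisted()` / `persist()`) used here. `requestPersistentStorage` and its outcome labels are hypothetical. It checks `persisted()` first and never throws, so it cannot block login.

```typescript
// Subset of the StorageManager API (navigator.storage) used here.
interface PersistCapable {
  persisted(): Promise<boolean>;
  persist(): Promise<boolean>;
}

type PersistOutcome = "already-persisted" | "granted" | "denied" | "unsupported" | "error";

// Best-effort: asks the browser not to evict origin storage (including the
// IndexedDB rust-crypto store). Never rejects.
async function requestPersistentStorage(storage: PersistCapable | undefined): Promise<PersistOutcome> {
  if (!storage || typeof storage.persist !== "function" || typeof storage.persisted !== "function") {
    return "unsupported";
  }
  try {
    if (await storage.persisted()) return "already-persisted"; // skip a redundant request
    return (await storage.persist()) ? "granted" : "denied";
  } catch {
    return "error";
  }
}
```

In the app this would run after a successful login, e.g. `void requestPersistentStorage(navigator.storage)`. Browsers decide differently (Firefox prompts the user; Chromium grants or denies heuristically), so a `"denied"` result is worth logging but not surfacing as an error. The request only prevents future eviction. An already-diverged device still needs the logout+login described above.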
### Security & Privacy
- **N97 — Access token stored in plaintext `localStorage`** (`state/sessions.ts`), vulnerable to XSS; device ID likewise. Architectural — needs a token-protection / session-storage redesign.
- ~~**Session writes are non-atomic and not cross-tab synced**~~ — **done (2026-07):** atomic single-key `cinny_session_v1` blob (legacy-key migration + dual-write) + `subscribeSessionChanges`/`useSessionSync` cross-tab reload. (The plaintext-token concern in N97 above is the remaining, separate architectural item.)
- **Persisted PII without encryption:** user status message + expiry (`settings/account/Profile.tsx`), unsent composer drafts (`room/RoomInput.tsx`). Leak risk on shared devices.
### PWA / Offline / Notifications
- **N107 — SW has no `push` handler** — Web Push delivery is entirely non-functional. Needs a `push` listener + a Matrix push-gateway integration.
- **No app-asset caching strategy** (`src/sw.ts`) — no offline capability.
- ~~**`manifest: false`** may block PWA install~~ — **verified OK (2026-06):** `index.html` links `/manifest.json`, which exists in `public/` and is copied to `dist/`; VitePWA intentionally doesn't generate one. Not a bug.
### Dependencies & Build
- ~~**`matrix-js-sdk` pinned to a Release Candidate**~~ — **done (2026-07):** moved to `41.7.0` stable (crypto-wasm 18.3.1 security bump). Deep-audit dep triage: all 16 npm advisories are dev-only/unreachable/dead-dep — zero shipped exposure; dead `dompurify` removed. `@atlaskit`/build-tool pins remain review-worthy but low priority.
- **Build-time overhead:** `lotusDenoise` does heavy sequential `fs` work in `closeBundle`; `viteStaticCopy` config is complex with redundant renames — could be streamlined.
### Code Hygiene / DevEx
- **Automated test suite — 561+ tests across 65+ modules, a hard CI gate.** `npm test` runs Node's built-in runner via `tsx` (not vitest — Vite 8 is ahead of vitest's range) and **blocks the build job on failure**. Broad pure-logic coverage: utils (common, regex, sanitize/XSS, time, matrix, matrix-uia, mimeTypes, sort, accentColor, findAndReplace, AsyncSearch, ASCIILexicalTable, keyboard, room, matrix-crypto, featureCheck, syntaxHighlight, imageCompression, user-agent, callSounds), state (settings, sessions, recentSearches, upload, typingMembers, lists, room-list, toast, scheduledMessages, backupRestore, callEmbed/callPreferences, spaceRooms, …), plugins (matrix-to, call/utils, via-servers, bad-words, recent-emoji, custom-emoji, markdown block/inline/utils), OIDC (cs-api, useParsedLoginFlows, oidcState), lotus/avatarDecorations, message-search, search filters. Prevention work has caught + fixed **4 real bugs** (`findAndReplace` infinite-loop; `getSettings` crash-on-load when storage is blocked; `isMacOS` never matching modern Macs; `isMLDenoiseSupported` throwing `ReferenceError` instead of returning false on browsers lacking the `AudioWorkletNode` binding). **Next:** component/integration tests (the untestable-under-tsx DOM/React surface).
- **Extensive `as any` casts** across `src/` — gradual typing cleanup.
- **`types/matrix/` mirrors SDK types** instead of importing them — drift risk.
- ~~**Hardcoded CDN URL** should move to an env var~~ — **done:** `avatarDecorations.ts` already honors a `VITE_DECORATION_CDN` env override (lines 14-16); the in-repo literal is only the default. Nothing left.
- **`patch-folds.mjs` edits `node_modules` directly** — consider `patch-package`.
- **Infra docs:** `contrib/nginx` lacks security headers (HSTS/CSP) + uses rewrites over `try_files`; `contrib/caddy` has a placeholder path. CI/CD (`prod-deploy.yml`): sequential deploy, aggressive 1-min Netlify timeout, `package-manager-cache: false`.
- **README:** keep the fork-sync version + logo path current. (`CONTRIBUTING.md` is intentionally left as upstream Cinny's — not a Lotus concern.)
- **Architecture notes (low priority):** deep `features/` + `hooks/` nesting, many small coupled hooks, possible dead CSS/components, `SpacingVariant` / `DropTarget` recipe simplification.
- **Git workflow (forward-looking):** keep commits scoped — past monolithic "fix all bugs" commits and inconsistent prefixes hurt `git bisect`.
### Big Projects
- ~~**#5 — Seasonal themes & chat-background redesign.**~~ **DONE (2026-06/07):** 11 seasonal/holiday overlays shipped and later toned down + given a settings preview grid; all 19 chat backgrounds redesigned (Carbon + Aurora kept per user preference), one design sprint each, GPU-friendly CSS with `prefers-reduced-motion` + pause toggle. Remaining polish rides normal bug flow, not a "big project."
-504
@@ -1,504 +0,0 @@
# Lotus Chat — E2EE Investigation Runbook (KE-1 → KE-4)
> **Scope:** evidence-gathering only. Do **not** apply fixes from this document
> without a cross-system planning session (client rust-crypto ↔ Synapse ↔
> Element Call MatrixRTC). Symptom source: `LOTUS_BUGS.md` §"Encryption / E2EE"
> (KE-1..KE-4), observed live 2026-06-30 on `chat.lotusguild.org` during a
> 2-person Element Call.
>
> **Client:** Lotus Cinny fork, `matrix-js-sdk@41.6.0-rc.0`, rust-crypto.
> **Server:** Synapse `1.155.0` on **LXC 151** (`10.10.10.29`), PostgreSQL 17.9
> on **LXC 109** (`10.10.10.44`). Facts below are copy-pasteable against that
> deployment (paths/IPs from `/root/code/matrix/README.md`).
---
## 0. Deployment facts used by this runbook
From the matrix infra README (`/root/code/matrix/README.md`):
| Thing | Value |
| ------------------------ | ------------------------------------------------------------- |
| Synapse host | LXC **151**, `10.10.10.29` (Synapse 1.155.0) |
| Synapse log | `/var/log/matrix-synapse/homeserver.log` |
| Synapse config | `/etc/matrix-synapse/homeserver.yaml` (+ `conf.d/`) |
| Synapse HTTP | `10.10.10.29:8008` |
| PostgreSQL host | LXC **109**, `10.10.10.44` (PG 17.9), db `synapse` |
| synapse-admin UI | `http://10.10.10.29:8080` |
| LiveKit / lk-jwt / guard | LXC 151: LiveKit `:7880/:7881`, guard `:8070`, lk-jwt `:8071` |
| SSH path to Synapse | `ssh root@10.10.10.4` then `pct enter 151` |
| SSH path to PG | `ssh root@10.10.10.4` then `pct enter 109` |
**Getting a psql shell** (run on LXC 109, or from 151 over the network):
```bash
# On LXC 109:
sudo -u postgres psql synapse
# From LXC 151 (pg_hba allows 10.10.10.29):
psql "host=10.10.10.44 user=synapse dbname=synapse"
```
**Tailing Synapse during a call** (on LXC 151):
```bash
tail -F /var/log/matrix-synapse/homeserver.log | tee /tmp/lotus-call-$(date +%s).log
```
Synapse E2EE/to-device logging is chatty at `INFO`; if a category is silent,
temporarily raise it in `/etc/matrix-synapse/conf.d/log.yaml` (or the
`log_config` file referenced by `homeserver.yaml`):
```yaml
loggers:
synapse.rest.client.keys: { level: DEBUG }
synapse.handlers.e2e_keys: { level: DEBUG }
synapse.storage.databases.main.end_to_end_keys: { level: DEBUG }
synapse.handlers.devicemessage: { level: DEBUG } # to-device
```
Then `systemctl reload matrix-synapse` (reload re-reads log config without a
full restart). **Revert to `INFO` after the capture** — DEBUG is very verbose.
---
## 1. Per-KE evidence matrix
Client greps assume Chrome/Firefox DevTools console (filter box or, better,
"Preserve log" + save-as). The **Crypto Diagnostics** card (Settings →
Developer Tools) auto-captures every signature below into a downloadable JSON —
use it as the primary client artifact and DevTools as the raw backup.
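
When triaging a raw DevTools export instead of the diagnostics JSON, the signatures listed below can be bucketed per KE. The following classifier is hypothetical and does not describe how the Crypto Diagnostics card works internally. Its patterns are taken from the per-KE console signatures in this section, and the first matching rule wins.

```typescript
type KeId = "KE-1" | "KE-2" | "KE-3" | "KE-4";

// Patterns derived from the console signatures in the evidence matrix.
const SIGNATURES: ReadonlyArray<[KeId, RegExp]> = [
  ["KE-1", /One time key \S+ already exists/],
  ["KE-2", /MissingKey|missing key at index|key set not found|io\.element\.call\.encryption_keys/],
  ["KE-3", /DecryptionError.*missing field 'algorithm'/],
  ["KE-4", /MembershipManager|update_delayed_event|delayed event timed out/],
];

function classifyConsoleLine(line: string): KeId | null {
  for (const [id, re] of SIGNATURES) {
    if (re.test(line)) return id;
  }
  return null;
}
```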
### KE-1 — OTK upload conflict storm (root-cause candidate)
- **Console signature (grep):**
- `already exists`
- full: `POST /_matrix/client/v3/keys/upload … 400 M_UNKNOWN: One time key signed_curve25519:<id> already exists. Old key: {…} new key: {…}`
- **Capture client-side:**
- Timestamp (first occurrence + rate — "N/sec"), **device id**, **user id**.
- DevTools → **Network** → filter `keys/upload`: for a failing call save the
**request body** (the `one_time_keys` map — note the exact `signed_curve25519:<id>`)
and the **response body** (the `Old key` / `new key` JSON). This diff is the
smoking gun: same key-id, different value ⇒ store vs server divergence.
- Whether it self-heals or loops forever (KE-1 loops).
- **Synapse log grep (LXC 151):**
```bash
grep -E "keys/upload|One time key .* already exists|OneTimeKey" \
/var/log/matrix-synapse/homeserver.log | grep "<user_id>"
```
- **Synapse SQL (LXC 109) — what the server thinks it holds:**
```sql
-- Current OTK inventory for the device (compare key_id set against the
-- request body the client keeps retrying).
SELECT algorithm, key_id, key_json, ts_added_ms
FROM e2e_one_time_keys_json
WHERE user_id = '@user:matrix.lotusguild.org'
AND device_id = '<DEVICE_ID>'
ORDER BY algorithm, key_id;
-- Server's advertised counts (this is what /sync tells the client it has,
-- and drives whether the client decides to upload more).
SELECT algorithm, count(*) FROM e2e_one_time_keys_json
WHERE user_id = '@user:matrix.lotusguild.org' AND device_id = '<DEVICE_ID>'
GROUP BY algorithm;
-- Fallback key state (used when OTKs are exhausted).
SELECT algorithm, key_id, used, key_json
FROM e2e_fallback_keys_json
WHERE user_id = '@user:matrix.lotusguild.org' AND device_id = '<DEVICE_ID>';
```
> Table names are Synapse 1.155 (`e2e_one_time_keys_json`,
> `e2e_fallback_keys_json`). If a name is absent, list with `\dt e2e*` in psql.
- **Confirms:** if the offending `key_id` (from the 400) is **present** in
`e2e_one_time_keys_json` with a **different** stored value than the client's
request body → OTK state has diverged (rust-crypto store vs Synapse). That is
the KE-1 root condition.
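
To tell a self-healing blip from the KE-1 loop, and to record the "N/sec" rate, a sliding-window counter over the `already exists` 400s is enough. `OtkConflictLoopDetector` is hypothetical, and so are the window and threshold values a caller would pick. The same signal could drive the optional 400-loop → recovery-prompt idea from the bug entry.

```typescript
// Hypothetical detector: counts OTK "already exists" 400s from /keys/upload in
// a sliding time window and reports when the count reaches a threshold.
class OtkConflictLoopDetector {
  private readonly windowMs: number;
  private readonly threshold: number;
  private hits: number[] = [];

  constructor(windowMs: number, threshold: number) {
    this.windowMs = windowMs;
    this.threshold = threshold;
  }

  // Record one conflict at nowMs; true means "looping", not a one-off.
  record(nowMs: number): boolean {
    this.hits.push(nowMs);
    const cutoff = nowMs - this.windowMs;
    this.hits = this.hits.filter((t) => t > cutoff); // drop hits outside the window
    return this.hits.length >= this.threshold;
  }

  // Conflicts per second over the current window, for the capture notes.
  ratePerSecond(): number {
    return (this.hits.length * 1000) / this.windowMs;
  }
}
```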
### KE-2 — EC media keys not arriving/decrypting (audio/video cutouts)
- **Console signature (grep):**
- `MissingKey`
- `missing key at index` (e.g. `MissingKey: missing key at index N for participant @user`)
- `key set not found`
- `io.element.call.encryption_keys` (rust-crypto: `WARN … Received an unexpected encrypted to-device event … event_type="io.element.call.encryption_keys"`)
- **Capture client-side:**
- Timestamp windows where a participant's audio/video cut out, and the
`@participant` + `index N` from the message.
- The `io.element.call.encryption_keys` warnings (these are the media-key
to-device events failing to decrypt) with their timestamps.
- Own device id + user id (to correlate with the sender's Olm session).
- **Synapse log grep (LXC 151) — to-device delivery of the media keys:**
```bash
grep -E "io.element.call.encryption_keys|m.room.encrypted|/sendToDevice|to_device" \
/var/log/matrix-synapse/homeserver.log | grep -E "<user_id>|<participant_id>"
```
- **Synapse SQL (LXC 109) — undelivered / queued to-device events:**
```sql
-- Backlog of to-device messages queued for the affected device. A growing
-- count here = the HS has the media-key events but the device isn't draining
-- them via /sync (or they were sent to a stale device id).
SELECT user_id, device_id, count(*) AS pending
FROM device_inbox
WHERE user_id = '@user:matrix.lotusguild.org'
GROUP BY user_id, device_id;
-- Cross-check the device id the sender is targeting actually exists / is current.
SELECT device_id, display_name, last_seen, ts
FROM devices WHERE user_id = '@user:matrix.lotusguild.org';
```
- **Confirms:** to-device events present but undecryptable (client shows the
`io.element.call.encryption_keys` "unexpected encrypted" warning) ⇒ there is
**no valid Olm session** to decrypt them — the expected downstream of KE-1.
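When a capture contains many KE-2 lines, grouping them by participant shows whether the gaps cluster on one sender. The sketch below assumes the console text keeps the `missing key at index N for participant @user` shape quoted above. `summarizeMissingKeys` is a hypothetical helper.

```typescript
// Hypothetical KE-2 triage helper: which media-key indices were missing, per sender.
// Assumes lines shaped like "MissingKey: missing key at index 3 for participant @bob:hs".
const MISSING_KEY_RE = /missing key at index (\d+) for participant (\S+)/;

function summarizeMissingKeys(lines: readonly string[]): Map<string, number[]> {
  const byParticipant = new Map<string, Set<number>>();
  for (const line of lines) {
    const match = MISSING_KEY_RE.exec(line);
    if (!match) continue; // not a KE-2 MissingKey line
    const participant = match[2];
    const indices = byParticipant.get(participant) ?? new Set<number>();
    indices.add(Number(match[1]));
    byParticipant.set(participant, indices);
  }
  // Sorted, de-duplicated indices, so repeated retries collapse to one entry.
  const summary = new Map<string, number[]>();
  byParticipant.forEach((indices, participant) => {
    summary.set(participant, Array.from(indices).sort((a, b) => a - b));
  });
  return summary;
}
```

As input you can use the `message` fields of the diagnostics report's `entries[]` where `ke` is `"KE-2"`.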
### KE-3 — Timeline decryption error: missing `algorithm` field
- **Console signature (grep):**
- `DecryptionError`
- full: `Error decrypting event (… type=m.room.encrypted …): DecryptionError[msg: missing field 'algorithm' at line 1 column 138 …]`
- **Capture client-side:**
- The **event id** (`$SASBBzoqj…` was one) and the **room id**.
- Pull the raw event JSON from the browser DevTools Network tab, from the in-app
Developer Tools account-data/event viewer, or directly:
```
GET https://matrix.lotusguild.org/_matrix/client/v3/rooms/<roomId>/event/<eventId>
```
Inspect `content` — confirm whether `algorithm` (should be
`m.megolm.v1.aes-sha2`) is truly absent vs a serialization mismatch.
- **Synapse log grep (LXC 151):**
```bash
grep -E "<eventId>" /var/log/matrix-synapse/homeserver.log
```
- **Synapse SQL (LXC 109) — the stored event content as the HS holds it:**
```sql
SELECT ej.event_id, e.type, e.sender, e.origin_server_ts,
(ej.json::json -> 'content' -> 'algorithm') AS algorithm
FROM event_json ej
JOIN events e USING (event_id)
WHERE ej.event_id = '$SASBBzoqj...';
```
- **Confirms:** if the stored `content.algorithm` is **NULL/absent** on the HS →
a malformed/legacy event was persisted (sender-side or federation). If it is
**present** on the HS but the client throws → an RC-SDK deserialization bug.
This distinction decides whether KE-3 is a data problem or a client problem.
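The data-problem vs. client-problem split can be written as a small classifier. Its input is the raw event JSON fetched above, or the `event_json` row. This is a sketch: `classifyKe3`, `storedAlgorithm`, and the verdict labels are hypothetical names.

```typescript
// Hypothetical KE-3 classifier mirroring decision-tree Q3.
type Ke3Verdict = "data-problem" | "client-problem" | "not-reproduced";

// Returns content.algorithm if it is a string, otherwise undefined.
function storedAlgorithm(rawEvent: unknown): string | undefined {
  if (typeof rawEvent !== "object" || rawEvent === null) return undefined;
  const content = (rawEvent as { content?: unknown }).content;
  if (typeof content !== "object" || content === null) return undefined;
  const algorithm = (content as { algorithm?: unknown }).algorithm;
  return typeof algorithm === "string" ? algorithm : undefined;
}

function classifyKe3(rawEvent: unknown, clientThrewDecryptionError: boolean): Ke3Verdict {
  // Absent on the server copy: a malformed event was persisted (sender/federation).
  if (storedAlgorithm(rawEvent) === undefined) return "data-problem";
  // Present on the server but the client still throws: deserialization bug.
  return clientThrewDecryptionError ? "client-problem" : "not-reproduced";
}
```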
### KE-4 — MatrixRTC delayed-event / membership timeouts
- **Console signature (grep):**
- `update_delayed_event` (`org.matrix.msc4157.update_delayed_event`)
- `delayed event` / `Restart delayed event timed out`
- full: `[MembershipManager] Network local timeout error while sending event, immediate retry … AbortError: Restart delayed event timed out before the HS responded`
- **Capture client-side:**
- Timestamps of each timeout; whether they correlate with call join/leave or
with general sync slowness.
- DevTools → Network: the `…/delayed_events…` / `update_delayed_event`
requests — their **HTTP status and latency** (timed-out vs slow-200).
- **Synapse log grep (LXC 151):**
```bash
grep -E "delayed_event|msc4140|msc4157|update_delayed" \
/var/log/matrix-synapse/homeserver.log | grep "<user_id>"
# HS responsiveness in the same window (KE-4 may be pure latency):
grep -E "Processed request|/sync" /var/log/matrix-synapse/homeserver.log | tail -50
```
- **Server-side corroboration (Grafana, `dashboard.lotusguild.org`):** Synapse
p99 response time (excl. `/sync`), event-processing lag, DB query latency for
the call window. High latency here ⇒ KE-4 is (partly) homeserver
responsiveness, not a client bug.
- **Confirms:** timeouts that line up with HS latency spikes → reliability/load;
timeouts with a healthy HS → client MembershipManager retry logic.
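The "same minute" check can be made mechanical. Several parts of the sketch below are assumptions: the inputs (timeout timestamps taken from the console, spike minutes read off Grafana), the 0.5 cut-off, and the name `correlateKe4`.

```typescript
// Hypothetical Q4 check: do delayed-event timeouts share a minute with HS latency spikes?
type Ke4Verdict = "hs-latency" | "client-retry" | "no-data";

interface Ke4Correlation {
  matched: number; // timeouts that fell in a spike minute
  total: number;
  verdict: Ke4Verdict;
}

const MINUTE_MS = 60_000;
const minuteOf = (epochMs: number): number => Math.floor(epochMs / MINUTE_MS);

function correlateKe4(
  timeoutsMs: readonly number[],
  spikeMinutesMs: readonly number[],
  threshold = 0.5, // assumed cut-off: at least half the timeouts coincide
): Ke4Correlation {
  if (timeoutsMs.length === 0) return { matched: 0, total: 0, verdict: "no-data" };
  const spikeMinutes = new Set(spikeMinutesMs.map(minuteOf));
  const matched = timeoutsMs.filter((t) => spikeMinutes.has(minuteOf(t))).length;
  const verdict: Ke4Verdict = matched / timeoutsMs.length >= threshold ? "hs-latency" : "client-retry";
  return { matched, total: timeoutsMs.length, verdict };
}
```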
---
## 2. Causality hypothesis
```
KE-1  OTK upload conflict storm
      (rust-crypto store ↔ Synapse OTK state DIVERGED; server rejects re-uploads)
          │  no fresh OTKs can be published/claimed
          ▼
      No new Olm (1:1) sessions can be established with this device
          │
          ▼
KE-2  EC media-key to-device events (io.element.call.encryption_keys)
      arrive but cannot be decrypted ⇒ MissingKey at index N
      ⇒ friend's audio/video cuts out
```
KE-3 (missing `algorithm`) and KE-4 (delayed-event timeouts) are **likely
independent** of the KE-1→KE-2 chain: KE-3 is a decode/serialization path,
KE-4 is a MatrixRTC-vs-HS reliability path. Confirm/refute independence with the
decision tree below.
### Decision tree — which capture confirms/refutes each link
```
Q1. Does the KE-1 offending key_id from the 400 response exist in
    e2e_one_time_keys_json with a DIFFERENT value than the client request body?
    ├─ YES → OTK divergence CONFIRMED (KE-1 root). Go to Q2.
    └─ NO  → Not divergence. Check: are OTK counts at 0 with fallback key `used=true`?
             ├─ YES → OTK exhaustion, not divergence; different remediation.
             └─ NO  → Suspect RC-SDK 41.6.0-rc.0 upload-loop regression (see §3).

Q2. During the same call, are io.element.call.encryption_keys to-device events
    present in device_inbox / Synapse to-device logs for our device id?
    ├─ YES + client shows "unexpected encrypted"/MissingKey
    │    → KE-1 ⇒ KE-2 LINK CONFIRMED (events delivered, no Olm session to open them).
    ├─ YES + client decrypts fine, but LiveKit still silent
    │    → KE-2 is downstream of LiveKit/SFU, NOT KE-1. Decouple from crypto.
    └─ NO (nothing queued/targeted our device)
         → media keys never sent to us: stale device id / membership (see KE-4)
         → KE-2 is a device-targeting problem, weakly linked to KE-1.

Q3. KE-3: is content.algorithm NULL in event_json on the HS?
    ├─ YES → malformed persisted event (sender/federation). Independent of KE-1.
    └─ NO  → client-side RC-SDK deserialization bug. Independent of KE-1.

Q4. KE-4: do delayed-event timeouts coincide with Synapse p99 latency spikes
    (Grafana) in the same minute?
    ├─ YES → homeserver responsiveness/load. Independent of KE-1..KE-3.
    └─ NO  → client MembershipManager retry behavior. Independent.
```
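To label capture bundles consistently, Q1 and Q2 can be encoded as plain functions over the yes/no answers. The field names in this sketch are hypothetical; each maps onto one of the SQL/log checks in §1. Q2 assumes a cutout was actually observed during the call (the "LiveKit still silent" branch).

```typescript
// Hypothetical encoding of decision-tree Q1/Q2 for labelling a capture bundle.
interface Q1Evidence {
  keyIdOnServer: boolean; // offending key_id present in e2e_one_time_keys_json
  storedValueDiffers: boolean; // stored key differs from the rejected upload body
  otkCountZero: boolean; // advertised OTK count is 0
  fallbackUsed: boolean; // e2e_fallback_keys_json.used = true
}
type Q1Answer = "otk-divergence" | "otk-exhaustion" | "suspect-sdk-upload-loop";

function answerQ1(e: Q1Evidence): Q1Answer {
  if (e.keyIdOnServer && e.storedValueDiffers) return "otk-divergence"; // KE-1 root, go to Q2
  if (e.otkCountZero && e.fallbackUsed) return "otk-exhaustion"; // different remediation
  return "suspect-sdk-upload-loop";
}

interface Q2Evidence {
  mediaKeysQueuedForDevice: boolean; // seen in device_inbox / to-device logs
  clientFailedToDecrypt: boolean; // "unexpected encrypted" or MissingKey in console
}
type Q2Answer = "ke1-causes-ke2" | "downstream-of-livekit" | "device-targeting";

// Assumes a cutout was observed during the call.
function answerQ2(e: Q2Evidence): Q2Answer {
  if (!e.mediaKeysQueuedForDevice) return "device-targeting"; // keys never sent to us
  return e.clientFailedToDecrypt ? "ke1-causes-ke2" : "downstream-of-livekit";
}
```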
---
## 3. Ranked remediation options (with blast radius)
> Ordered least-destructive → most-destructive. **Do not run any of these as a
> "fix" before the planning session** — they are listed so evidence collection
> can be paired with a recovery plan. Confirm the root condition (Q1/Q2) first.
1. **Per-device logout + re-login of the affected device** _(lowest blast radius)_
- **What:** log the one glitching device out and back in. Forces a fresh
device id, fresh device keys, and a clean OTK batch — sidesteps a diverged
OTK store without touching other sessions.
- **Blast radius:** that device only. Other sessions/devices untouched.
- **Cost:** the new device must be re-verified (cross-signing) and will need
to restore room keys from **key backup** to read old encrypted history.
- **Confirms/uses:** if KE-1 stops after this, OTK-store divergence (Q1) was
the cause.
2. **Client crypto-store reset (`clearLoginData` path)** _(medium)_
- **What:** `clearLoginData()` in `src/client/initMatrix.ts` (coordinator's
file — do not edit) **deletes ALL IndexedDB databases** (incl.
`web-sync-store` and the rust-crypto store `crypto-store`), **unregisters
service workers**, **clears all Cache Storage**, and **`localStorage.clear()`**,
then reloads. `clearCacheAndReload()` is lighter — it only calls
`mx.store.deleteAllData()` (sync cache) and does **not** wipe crypto.
- **Blast radius:** this browser profile only, but total: you are logged out,
lose all cached sync state, drafts, settings, and **the local
megolm/room-key store**.
- **⚠️ Message-history / backup implication:** wiping `crypto-store` destroys
locally-held **room keys (megolm inbound sessions)**. Any history **not
backed up to server-side Key Backup** becomes **permanently undecryptable
on this device**. Before doing this: verify Key Backup is enabled and the
recovery key / passphrase is available (Settings → Security), or the user
loses readable history. Cross-signing must be re-established too.
- **Use when:** the rust-crypto store itself is corrupt/diverged and option 1
didn't clear it.
3. **SDK pin change off the RC** _(medium — codebase change, needs rebuild)_
- **Current pin:** `package.json` → `"matrix-js-sdk": "41.6.0-rc.0"` (a
release candidate).
- **Finding (npm / GitHub changelog, checked 2026-07):** stable **`41.6.0`**
was released **2026-05-26**. Its only changelog line is _"Throw sane error
on completeLoginOnNewDevice IdP rejection"_ — **no OTK / keys-upload / Olm /
to-device fix** relative to the RC. Later stable lines exist
(`41.7.0`, `41.8.0`; `41.7.0-rc.3` / `41.9.0-rc.0` seen as pre-releases).
Nearby crypto-relevant entries: `41.5.0` _"Enable encrypted history sharing
by default"_; `41.4.0` key-backup handling. **No changelog entry directly
addresses the KE-1 OTK-conflict symptom** in the immediate range — so
moving RC→`41.6.0` stable is a low-risk hygiene step but is **not expected
to fix KE-1 by itself**. Before pinning, re-read the CHANGELOG for any
`41.7.x`/`41.8.x` OTK/one-time-key/olm entry that post-dates this note.
- **Blast radius:** all users after the next `cinny-build.sh` deploy. Check
rust-crypto IndexedDB schema compatibility first: a downgrade triggers the
`IDB_VERSION_CONFLICT` path in `initMatrix.ts`.
4. **Synapse-side OTK row surgery** _(LAST RESORT — highest danger)_
- **What:** deleting/rewriting rows in `e2e_one_time_keys_json` (and/or
`e2e_fallback_keys_json`, `device_inbox`) for the affected device to force
the client to re-upload a clean batch.
- **⚠️ Danger:** direct writes to Synapse crypto tables can **desync every
device of that user**, break Olm sessions **for everyone who has claimed one
of those keys**, and are easy to get wrong (wrong `key_id`, cache not
invalidated). Synapse caches OTK counts — a raw DELETE without a restart can
leave the advertised count wrong, **worsening** the KE-1 loop.
- **Guardrails if ever done (planning session + HS owner only):** full
`pg_dump` of `synapse` first; do it during **zero active calls**; delete only
the exact diverged `key_id` for the exact `device_id`; `systemctl restart
matrix-synapse` to flush caches; then log the device out/in (option 1) so it
republishes. **Never** run this speculatively.
---
## 4. "Capture session" checklist (run during the next call)
Do these **in order**. Aim to have client + server capturing the **same call**.
1. **Prep server tail (LXC 151):** SSH in, start
`tail -F /var/log/matrix-synapse/homeserver.log | tee /tmp/lotus-call-$(date +%s).log`.
(Optionally raise the `synapse.rest.client.keys` / `handlers.e2e_keys` /
`handlers.devicemessage` loggers to DEBUG per §0 and `systemctl reload
matrix-synapse` — remember to revert after.)
2. **Prep client:** open Lotus Chat → Settings → Developer Tools → **enable
Developer Tools** so the **Crypto Diagnostics** card is visible; note its
entry count starts at (or reset by reload to) 0.
3. **Open DevTools** (F12) → Console: enable **Preserve log**; Network tab:
enable **Preserve log** + **Record**. Note your **device id** and **user id**
(both are shown under Settings → Devices and on the Developer Tools page that holds Copy access token).
4. **Note wall-clock start time** (ISO/UTC) on both machines so logs align.
5. **Join the Element Call** with the second participant; reproduce the fault
(wait for the audio/video cutouts and let the KE-1 storm run ~30–60 s).
6. **When a fault occurs, note the wall-clock timestamp** and which symptom
(audio cut / video freeze / etc.) — this bounds the log window.
7. **Client artifacts:** in the Crypto Diagnostics card click **Download report**
(`lotus-crypto-diag-<ts>.json`); in DevTools Network, save the failing
`keys/upload` request+response (right-click → Save/Copy), and the raw HAR
(Network → Save all as HAR) for the call window.
8. **Grab KE-3 event id / KE-2 participant+index** from the console (or the
diag JSON `entries[]`) for the SQL lookups.
9. **Server artifacts:** stop the tail; run the per-KE greps and SQL from §1
against the noted device id / user id / event id, saving output alongside the
client JSON. Screenshot the Grafana Synapse latency panels for the window
(for KE-4).
10. **Bundle & label:** put client JSON + HAR + server log slice + SQL output in
one folder named with the call's UTC start time. Revert any DEBUG log config
(`systemctl reload matrix-synapse`). Hand off to the planning session — **do
not apply §3 remediations yet.**
---
## 5. Client diagnostics helper (this kit)
- **`src/app/utils/cryptoDiagLog.ts`** — capture-only console instrumentation.
- `installCryptoDiagLog()` — idempotent; wraps `console.warn`/`console.error`
with pass-through wrappers (originals always called) that ring-buffer (max
**200**) any line matching the KE signatures. No network, no timers.
- `getCryptoDiagEntries()` — snapshot copy of the buffer (`{ ts, level, ke,
signature, message }`, most-recent-last).
- `buildCryptoDiagReport(mx)` — JSON string: SDK version, device id, user id,
sync state, `cryptoReady` (`mx.getCrypto()` presence), per-KE counts, and the
entry buffer. No tokens/PII beyond those ids; captured log lines are retained
verbatim as evidence.
- **Signatures → KE mapping:** `already exists`→KE-1; `missing key at index` /
`io.element.call.encryption_keys` / `MissingKey`→KE-2; `DecryptionError`→KE-3;
`update_delayed_event` / `delayed event`→KE-4.
- **`src/app/features/settings/developer/CryptoDiagnostics.tsx`** — a card built from
the `folds` `SequenceCard`/`SettingTile` components (mirrors `developer-tools/DevelopTools.tsx`)
showing the live matched-entry count (Badge) and a **Download report** button
(Blob → `lotus-crypto-diag-<ts>.json`, same download idiom as
`room-settings/ExportRoomHistory.tsx`).
### Recommended mount points (coordinator)
- **Install call:** call `installCryptoDiagLog()` **as early as possible during
boot** so it captures crypto errors from first sync — ideally at the top of
the client entry module or inside `ClientRoot` before/around `initClient`
(e.g. `src/app/pages/client/ClientRoot.tsx`). It is idempotent, side-effect
only, and needs no `mx`, so a module-scope call at app entry is safe. (Do
**not** put it in `initMatrix.ts` — that file is off-limits.)
- **Settings card:** render `<CryptoDiagnostics />` inside the Developer Tools
page — in `src/app/features/settings/developer-tools/DevelopTools.tsx`, add it
to the `Box direction="Column" gap="700"` list (guarded by the existing
`developerTools` flag), right after the "Access Token" card. It pulls `mx`
from `useMatrixClient()` itself, so it just needs to be placed in the tree.
---
## 6. 2026-07 investigation update — 41.7.0 delta + web-specific root cause
New findings this session (code-read + upstream issue triage). These **sharpen
KE-1's root cause and close the "just upgrade the SDK" lever**.
### 6.1 The 41.7.0 upgrade does NOT fix KE-1 (lever closed)
We are now on **`matrix-js-sdk@41.7.0`** → **`@matrix-org/matrix-sdk-crypto-wasm@18.3.1`**
(was `41.6.0-rc.0` when KE-1/2 were observed). Checked both changelogs:
- 41.7.0's only crypto line is the **security bump to crypto-wasm 18.3.1**. No
OTK / keys-upload / Olm-session change.
- crypto-wasm 17.0 → 18.3.1: **no entry** for one-time-keys, keys/upload,
"already exists", or upload conflicts. The 18.3.x work was **to-device
security hardening** (vodozemac 0.10; sender-spoofing check via
`sender_device_keys`; MSC4147 validation) — unrelated to the OTK loop.
- Upstream **`matrix-rust-sdk#5200`** ("OlmMachine constantly tries to upload
keys when restoring session") is **still OPEN** (as of mid-2025). The loop
mechanism is confirmed there: on the 400, `mark_request_as_sent()` never
fires, so the keys stay "unshared" and the SDK re-issues the identical failing
upload every cycle → the storm.
⇒ **Remediation option 3 (SDK pin) is exhausted for KE-1.** Do not expect a
version bump to help; the fix is store-hygiene, below.
### 6.2 Confirmed root cause + the web-specific trigger we can act on
Upstream `#5200` + `#1415` pin the root condition to **rust-crypto store ↔
server OTK divergence**, from one of:
1. **Crypto store reset/restore without deregistering the device server-side**
— the store forgets OTKs it already published; the server still holds them.
2. **Unsafe concurrent access to the crypto store** — e.g. the **same session
open in multiple browser tabs**, each running its own OlmMachine against the
one IndexedDB crypto store.
3. A store that isn't durably persisted, so a restore can't track what was sent.
**Cinny is a web client and hits two of these by construction (verified in code):**
- **No `navigator.storage.persist()` anywhere** (`grep` clean). The rust-crypto
IndexedDB store is therefore **evictable under storage pressure** — while the
**access token + device id live in `localStorage`** (N97), which browsers evict
_less_ aggressively. Partial eviction ⇒ the device **resurrects with a blank
crypto store but the SAME device id** ⇒ it re-uploads OTKs the server still
holds ⇒ the **exact KE-1 "already exists" divergence**, with **no user action**
and no visible cause. This is the leading hypothesis for a self-hosted web
deployment.
- **No multi-tab crypto guard** (no `navigator.locks` / `BroadcastChannel`
leader election in `src/`). `initMatrix.ts` calls `mx.initRustCrypto()` with no
single-writer coordination, so 2+ tabs = concurrent store access = trigger #2.
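A detection-only guard for the missing multi-tab coordination needs little more than a broadcast channel and a ping/pong exchange. In the sketch below, the channel is injected through a small hypothetical `TabChannel` interface. The app could pass a thin `BroadcastChannel` adapter, and tests can pass an in-memory fake. The sketch only detects other tabs: it does not implement Web Locks leader election or read-only secondary tabs.

```typescript
// Hypothetical detection-only tab guard: warns, does not coordinate crypto access.
interface TabChannel {
  postMessage(message: unknown): void;
  onmessage: ((event: { data: unknown }) => void) | null;
}

interface TabMessage {
  kind: "ping" | "pong";
  tabId: string;
}

function isTabMessage(data: unknown): data is TabMessage {
  if (typeof data !== "object" || data === null) return false;
  const msg = data as Partial<TabMessage>;
  return (msg.kind === "ping" || msg.kind === "pong") && typeof msg.tabId === "string";
}

function watchForOtherTabs(
  channel: TabChannel,
  tabId: string,
  onOtherTab: (otherTabId: string) => void,
): void {
  channel.onmessage = (event) => {
    const msg = event.data;
    if (!isTabMessage(msg) || msg.tabId === tabId) return;
    if (msg.kind === "ping") {
      // Answer a newcomer's ping so it learns about this tab too.
      const pong: TabMessage = { kind: "pong", tabId };
      channel.postMessage(pong);
    }
    onOtherTab(msg.tabId);
  };
  // Announce this tab; any already-open tab replies with a pong.
  const ping: TabMessage = { kind: "ping", tabId };
  channel.postMessage(ping);
}
```

In the app, `onOtherTab` would raise the "Lotus is open in another tab" warning. A per-tab id could come from `crypto.randomUUID()`.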
### 6.3 Concrete PREVENTIVE client mitigations (new — buildable, don't need a call)
Ordered by value/effort. These reduce the _recurrence_ of KE-1; they don't heal
an already-diverged device (that still needs remediation option 1: clean
logout+login).
1. **Request persistent storage on login — `navigator.storage.persist()`**
_(cheapest, highest value)_. Idempotent, side-effect only, no behavior change
if the browser denies it. Directly prevents the eviction-induced divergence in
6.2. Best placed at app entry alongside the other module-scope calls (NOT in
`initMatrix.ts`, which is off-limits) — e.g. a one-liner in `ClientRoot`/app
bootstrap: `if (navigator.storage?.persist) navigator.storage.persist();`
Optionally surface `navigator.storage.persisted()` in the Crypto Diagnostics
card so a capture records whether the store was evictable.
2. **Multi-tab guard** _(medium)_. Detect a second tab of the same session
(BroadcastChannel or the Web Locks API) and either (a) warn "Lotus is open in
another tab — encryption may misbehave", or (b) make secondary tabs read-only
for crypto. Prevents trigger #2.
3. **Loop detection → recovery prompt** _(medium)_. Watch for repeated
`keys/upload` 400 `M_UNKNOWN … already exists` (the client sees the rejection);
after N in a window, stop hammering and surface a "Reset encryption on this
device (log out & back in)" prompt instead of looping silently.
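Here is mitigation 1 in sketch form. It checks `persisted()` first, requests persistence only if needed, and swallows failures so the call stays best-effort. The `StorageManager` subset is declared locally as a hypothetical `PersistCapableStorage` so the sketch does not depend on DOM typings. The outcome labels are illustrative, and the call that actually ships in the client entry may differ.

```typescript
// Hypothetical helper: best-effort request for durable (non-evictable) storage.
// Structural subset of the DOM StorageManager, declared here to avoid DOM typings.
interface PersistCapableStorage {
  persisted(): Promise<boolean>;
  persist(): Promise<boolean>;
}

type PersistOutcome = "unsupported" | "already-persisted" | "granted" | "denied" | "error";

function browserStorage(): PersistCapableStorage | undefined {
  // navigator.storage exists in browsers (secure contexts); undefined elsewhere.
  const nav = (globalThis as unknown as { navigator?: { storage?: PersistCapableStorage } }).navigator;
  return nav?.storage;
}

async function requestPersistentStorage(
  storage: PersistCapableStorage | undefined = browserStorage(),
): Promise<PersistOutcome> {
  if (!storage || typeof storage.persist !== "function") return "unsupported";
  try {
    // Skip the request (and any permission prompt) if the origin is already persistent.
    if (await storage.persisted()) return "already-persisted";
    return (await storage.persist()) ? "granted" : "denied";
  } catch {
    return "error"; // never let this break boot
  }
}
```

Call it once after login, for example `void requestPersistentStorage().then((o) => console.info("storage persist:", o));`. The returned outcome is also the value the Crypto Diagnostics card could record.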
### 6.4 Secondary KE-2 hypothesis to test in the capture
crypto-wasm **18.3.0 tightened Olm to-device validation** (sender-spoof check +
MSC4147). It's therefore possible KE-2's `WARN … unexpected encrypted to-device
event … io.element.call.encryption_keys` is **partly** the new validation
rejecting EC's media-key events, not _only_ the missing-Olm-session downstream of
KE-1. **Capture discriminator:** if KE-2 still occurs in a call where OTK counts
are healthy and no KE-1 storm is present (Q1 = NO), suspect the to-device
validation path (EC ↔ rust-crypto 18.3.x), not KE-1. If KE-2 only ever co-occurs
with the KE-1 storm, the original KE-1⇒KE-2 chain stands.
### 6.5 What to do now vs. at capture
- **Now (no call needed):** ship 6.3.1 (`persist()`) — it's safe and preventive.
Consider 6.3.3 (loop detection) as a follow-up.
- **At the next glitchy call:** run the §4 capture; answer Q1 (divergence?) and
6.4's discriminator. For any _currently_ stuck device, remediation option 1
(clean **logout + login**, not just "clear storage" — clearing storage without
`mx.logout()` leaves the server device + its OTKs and can re-trigger the
divergence).
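The loop-detection follow-up (mitigation 3 in 6.3) reduces to two steps: recognise the rejection, then count it in a sliding window. In the sketch below, the `M_UNKNOWN` + `already exists` match follows the signature quoted in 6.3. The threshold, the window length, and the names `isOtkConflictResponse` / `UploadConflictWatch` are hypothetical.

```typescript
// Hypothetical KE-1 loop detector: decides when to stop retrying and prompt the user.
interface MatrixErrorBody {
  errcode?: string;
  error?: string;
}

function isOtkConflictResponse(status: number, body: MatrixErrorBody): boolean {
  return status === 400 && body.errcode === "M_UNKNOWN" && (body.error ?? "").includes("already exists");
}

class UploadConflictWatch {
  private readonly maxHits: number;
  private readonly windowMs: number;
  private hits: number[] = [];

  constructor(maxHits = 5, windowMs = 60_000) {
    this.maxHits = maxHits;
    this.windowMs = windowMs;
  }

  // Record one rejected upload at `nowMs`; true once `maxHits` land inside the window.
  record(nowMs: number): boolean {
    this.hits.push(nowMs);
    const cutoff = nowMs - this.windowMs;
    while (this.hits.length > 0 && this.hits[0] <= cutoff) this.hits.shift();
    return this.hits.length >= this.maxHits;
  }
}
```

When `record` returns true, the client would stop re-issuing the upload and show the "Reset encryption on this device" prompt. Per 6.1, the SDK re-issues the request on its own, so where to observe the 400 is left open here.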
+2 -2
@@ -330,7 +330,7 @@ Users can set a custom background color for `@mention` chips that highlight thei
> pre-built npm bundle. Several in-call behaviors below are now first-class
> source changes rather than DOM/widget hacks. Background, plan, and the Phase-2
> work list are in
> [`HANDOFF_ELEMENT_CALL_FORK.md`](./HANDOFF_ELEMENT_CALL_FORK.md). > the Element Call fork reference in [`LOTUS_TODO.md`](./LOTUS_TODO.md).
### Element Call — Self-Built Fork (`0.20.1-lotus.1`)
@@ -1235,7 +1235,7 @@ The session persists as ONE atomic `cinny_session_v1` JSON write (previously ~10
### Crypto Diagnostics (E2EE investigation kit)
**Settings → Developer Tools → Crypto Diagnostics**: a capture-only ring buffer (max 200) hooks `console.warn/error` for E2EE failure signatures (OTK upload conflicts, missing call media keys, decryption errors, delayed-event timeouts) and downloads a JSON report — the evidence input for the KE-1→4 investigation. Companion runbook: [`LOTUS_E2EE_INVESTIGATION.md`](./LOTUS_E2EE_INVESTIGATION.md). `utils/cryptoDiagLog.ts`, `features/settings/developer/CryptoDiagnostics.tsx`. **Settings → Developer Tools → Crypto Diagnostics**: a capture-only ring buffer (max 200) hooks `console.warn/error` for E2EE failure signatures (OTK upload conflicts, missing call media keys, decryption errors, delayed-event timeouts) and downloads a JSON report — the evidence input for the KE-1→4 investigation. Companion diagnosis: the Encryption / E2EE section of [`LOTUS_TODO.md`](./LOTUS_TODO.md). `utils/cryptoDiagLog.ts`, `features/settings/developer/CryptoDiagnostics.tsx`.
---
+47 -3
@@ -328,7 +328,7 @@ _(Requires the guard deployed on LXC 151 — auto-deploys on a `matrix` repo pus
# Backlog of previously-fixed-but-unverified items
> Sections A–D above are **this session's** work. Everything below was fixed in earlier waves and is still flagged **⚠️ UNTESTED** in `LOTUS_BUGS.md` / `LOTUS_TODO.md`. They're grouped by what kind of environment you need (mobile, desktop, screen reader, etc.) so you can knock out a whole category at once. None of these are urgent the way A–D are; do them as you have the right device handy. > Sections A–D above are **this session's** work. Everything below was fixed in earlier waves and is still flagged **⚠️ UNTESTED** (see the outstanding-verification backlog below / `LOTUS_TODO.md`). They're grouped by what kind of environment you need (mobile, desktop, screen reader, etc.) so you can knock out a whole category at once. None of these are urgent the way A–D are; do them as you have the right device handy.
## E. Mobile / responsive (needs a real phone, or devtools device emulation)
@@ -575,7 +575,7 @@ Log into **matrix.lotusguild.org** (password) and **matrix.org**.
## O. July 2026 batch — threads, notifications, math, search cache, audit wave
Everything landed after the OIDC work. These mirror the checklists in `LOTUS_TODO.md` (§P3-8, §P4-1) and the Needs-Verification rows in `LOTUS_BUGS.md` (P3-8/P4-1/P4-4/P4-8/N97a/AW-1…4). **⚠️ Threads change the main timeline** — thread replies no longer render inline; that's intended (see O1). Everything landed after the OIDC work. These mirror the checklists in `LOTUS_TODO.md` (§P3-8, §P4-1) and the outstanding-verification backlog below (P3-8/P4-1/P4-4/P4-8/N97a/AW-1…4). **⚠️ Threads change the main timeline** — thread replies no longer render inline; that's intended (see O1).
### O1. Thread Panel (P3-8) — 👥 2 people help for live replies
@@ -626,7 +626,7 @@ The webview CSP was tightened and the full native module set now compiles. Smoke
### O8. E2EE / call-key cluster (KE-1→4) — 👥 2 people, during a real call
We shipped the diagnostics kit + a **Crypto Diagnostics** card (**Settings → Developer Tools**). During your next call that glitches (audio cutouts, "Unable to decrypt"), open it and **Download report**, and note whether the symptoms even still occur now that we're on **matrix-js-sdk 41.7.0** (crypto-wasm 18.3.1). Send me the report + `LOTUS_E2EE_INVESTIGATION.md` is the runbook. We shipped the diagnostics kit + a **Crypto Diagnostics** card (**Settings → Developer Tools**). During your next call that glitches (audio cutouts, "Unable to decrypt"), open it and **Download report**, and note whether the symptoms even still occur now that we're on **matrix-js-sdk 41.7.0** (crypto-wasm 18.3.1). Send me the report; the KE-1..4 diagnosis + capture guidance is in `LOTUS_TODO.md` (Encryption / E2EE), with the full original runbook in git history.
---
@@ -670,3 +670,47 @@ Run the axe DevTools extension (or Lighthouse → Accessibility) on a room view,
4. **A4** (in-call banner) + **A3** (ringtone) — newest call logic, hardest to reproduce.
5. **D** (EC control sweep) — guards against the fork breaking calls.
6. Everything else.
---
## Outstanding verification backlog
_Ported from the retired `LOTUS_BUGS.md` (2026-07). Compact index of shipped-but-not-live-tested items; the detailed steps are in the lettered sections above._
Implemented and gate-green; confirm each using the section or steps in its Test column, then delete the row.
| ID | Item | File / area | Test |
| :--- | :-------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| #2 | Chat-background animation flicker (`contain:paint`) | `lotus/chatBackground.ts` | F1 |
| #4 | Ringtone re-fixes: classic loudness + caller decline notice (A2 ✓ live) | `CallEmbedProvider.tsx`, `ringtones.ts` | A1,A3,A4 |
| #6 | Background vs. seasonal theme mutual exclusion | `state/settings.ts`, `General.tsx` | F2 |
| #7 | Composer toolbar touch targets (≥44px) | `room/RoomInput.tsx` | E1 |
| #8 | Room Settings horizontal overflow (mobile) | `components/page/style.css.ts` | E2 |
| #9 | Modal fullscreen on mobile (`useModalStyle`) | 22+ modal files | E3 |
| #10 | Composer not hidden by keyboard (`100dvh`) | `src/index.css` | E4 |
| #12 | PiP "All muted" badge re-fixed (was firing on any single mute) | `hooks/useCallSpeakers.ts` | G1 |
| N96 | Call-recovery overlay single "Back" button | `call/CallView.tsx` | A7 |
| N95 | AFK-monitor mic released on mute (OS indicator clears) | `hooks/useAfkAutoMute.ts` | L1 |
| N108 | Maskable PWA icons (Android adaptive) | `public/manifest.json` + `res/android/maskable-*` | L2 |
| EC | EC iframe load watchdog + self-heal + recovery UI | `plugins/call/CallEmbed.ts`, `CallView.tsx` | A7 |
| N105 | Notification clicks work after tab close (SW `notificationclick` + `showNotification`) | `sw.ts`, `utils/dom.ts`, `ClientNonUIFeatures.tsx` | get a msg notif, close the tab, click it → app focuses/opens + routes to the room |
| Gal | MediaGallery lazy-decrypt (true virtualization deferred) | `room/MediaGallery.tsx` | H1 |
| a11y | aria-labels: edit-history / reaction / thread / reply | `message/*` (`FallbackContent`, `Reaction`, `Reply`) | I |
| P3-8 | Thread Panel (side drawer, chips, threaded receipts, thread composer) | `features/room/thread/*`, `RoomTimeline/RoomInput` | 6-step checklist in LOTUS_TODO §P3-8 |
| P4-4 | KaTeX math (`$…$`, `$$…$$`, data-mx-maths; lazy chunk) | `utils/mathParse.ts`, `components/math/` | send `$x^2$`, `$$\int f$$`, `$5 and $10` (stays text), math inside code block (stays text) |
| P4-8 | Encrypted-search cache (opt-in toggle, clear button, logout wipe) | `utils/searchCache.ts`, message-search | enable in search panel → search → reload → coverage persists; logout wipes |
| N97a | Session blob migration + cross-tab logout sync | `state/sessions.ts`, `useSessionSync` | login on old build → new build migrates; logout in tab A → tab B drops to auth |
| P4-1 | Slack-style thread notifications (participating default, All/Mentions/Mute, badge math) | `utils/threadNotifications.ts`, `ClientNonUIFeatures`, `roomToUnread` | 6-step checklist in LOTUS_TODO §P4-1 |
| AW-1 | Scheduled-message cancel no longer ghost-sends (error row on failure) | `ScheduledMessagesTray.tsx` | schedule → cancel with network cut → item stays + error; retry works |
| AW-2 | Emoji lazy-load (search/autocomplete/recents fill in; board opens fast) | `plugins/emoji.ts` + consumers | first emoji-board open of a session: grid+search populate; reactions still label |
| AW-3 | SW precache (repeat-visit near-instant; deploys still picked up immediately) | `sw.ts`, `vite.config.js` | load app twice (2nd = cached assets); deploy → reload picks new version |
| AW-4 | Desktop CSP tighten + Escape/panel fixes + thread Jump to Latest | `tauri.conf.json`, Room/ThreadPanel | desktop: boots, avatars/media load, VT323 font renders, location maps embed, calls connect, deep links work |
| P3-4 | Accessibility compliance pass (collapsed-msg SR sender, form/overlay labels, typing announce, focus-return, `?` help, jsx-a11y CI gate) | `message/*`, `RoomViewTyping`, `features/shortcuts/*`, `eslint.config.mjs` | LOTUS_TESTING §P — axe-core + VoiceOver/NVDA on the golden path |
| P6-1 | Desktop Linux parity (no-sleep in calls, launcher badge), autostart toggle, tray Do-Not-Disturb | `native/power.rs`, `lib.rs`, `useTauriDnd`, `General.tsx` | Linux desktop: no display sleep during a call; tray DND silences notifications; launch-on-login persists; Unity badge (Ubuntu); DND toggle polarity |
| P6-2 | EC deafen/screenshare-audio-mute via `io.lotus.set_deafen` (retires the `<audio>.muted` iframe hack) | fork `lotusDeafen.ts`, cinny `CallControl.ts` | AFTER publish+pin-bump: deafen silences remote audio + survives a reconnect / new screenshare / late joiner (the cases the DOM hack failed); screenshare-audio-mute toggles independently |
| P6-3 | Forward-to-multiple-rooms (multi-select + partial-failure summary) + live bookmark previews (edits/redactions, snapshot fallback) | `ForwardMessageDialog.tsx`+`forwardContent.ts`, `BookmarksPanel.tsx` | forward one msg to 3 rooms (incl. 1 you cannot post to = partial summary); bookmark then edit shows edited; redact shows deleted; leave room shows snapshot |
| P6-4 | HSTS + Permissions-Policy on prod nginx (+ contrib examples) | `matrix/cinny/nginx.conf`, `contrib/nginx`, `contrib/caddy` | after `nginx -s reload`: `curl -sI https://chat.lotusguild.org` shows HSTS + Permissions-Policy; a call (cam/mic/screenshare) + location share still work |
**Verified working in live testing (2026-06):** A2, B1–B4, C1, C3, D (mic/camera/deafen/screenshare/fullscreen/more-menu/PiP). Denoise quality in D is still poor — tracked under the denoise project, not a regression.
---
+136 -3
@@ -35,7 +35,7 @@ Completed features are documented in [LOTUS_FEATURES.md](./LOTUS_FEATURES.md).
## ✅ Done — Awaiting Verification
Built and gate-green; verify per [LOTUS_TESTING.md](./LOTUS_TESTING.md), then they graduate to LOTUS_FEATURES.md. (Bug-side fixes awaiting verification live in LOTUS_BUGS.md.) Built and gate-green; verify per [LOTUS_TESTING.md](./LOTUS_TESTING.md), then they graduate to LOTUS_FEATURES.md. (Open bugs + the verification backlog now live in this file and LOTUS_TESTING.md.)
| Feature | Test guide |
| :-------------------------------------------------------------------------------- | :---------------- |
@@ -217,7 +217,7 @@ Features:
### [~] P4-8 · Encrypted Message Search Indexing & Caching — IMPLEMENTED (2026-07), opt-in
-**Shipped:** `src/app/utils/searchCache.ts` — raw-IndexedDB per-room index (`lotus-search-cache`) of decrypted search rows + coverage markers, merged into local search (in-memory-wins dedupe). **Opt-in, default OFF** (stores plaintext at rest) with a privacy note, Clear button, and logout wipe. Awaiting live QA (LOTUS_BUGS AW / P4-8 row).
+**Shipped:** `src/app/utils/searchCache.ts` — raw-IndexedDB per-room index (`lotus-search-cache`) of decrypted search rows + coverage markers, merged into local search (in-memory-wins dedupe). **Opt-in, default OFF** (stores plaintext at rest) with a privacy note, Clear button, and logout wipe. Awaiting live QA (LOTUS_TESTING outstanding-verification backlog).
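The P4-8 "in-memory-wins dedupe" can be pictured as a keyed merge. Below is a minimal sketch that assumes a hypothetical `SearchRow` shape keyed by `eventId` and a newest-first result order. The real `searchCache.ts` schema and sort may differ.

```typescript
// Hypothetical row shape; the real searchCache.ts schema may differ.
interface SearchRow {
  eventId: string;
  roomId: string;
  ts: number; // origin_server_ts in ms
  body: string;
}

// Merge rows read back from the IndexedDB cache with the live in-memory
// results. On an eventId collision the in-memory row wins, because it
// reflects the current timeline (e.g. an edit the cache has not stored yet).
function mergeSearchRows(inMemory: SearchRow[], cached: SearchRow[]): SearchRow[] {
  const byEventId = new Map<string, SearchRow>();
  for (const row of cached) byEventId.set(row.eventId, row);
  // Inserted second, so these overwrite any cached duplicate.
  for (const row of inMemory) byEventId.set(row.eventId, row);
  // Newest first (an assumption about the desired result order).
  return [...byEventId.values()].sort((a, b) => b.ts - a.ts);
}

const merged = mergeSearchRows(
  [{ eventId: '$e1', roomId: '!r', ts: 10, body: 'edited text' }],
  [
    { eventId: '$e1', roomId: '!r', ts: 10, body: 'original text' },
    { eventId: '$e2', roomId: '!r', ts: 5, body: 'older hit' },
  ],
);
console.log(merged.map((r) => r.body)); // [ 'edited text', 'older hit' ]
```

Building the merge on a `Map` makes "who wins" depend only on insertion order, so the dedupe rule stays visible in a single line.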
### [~] P4-1 · Thread Notification Mode Per-Thread — IMPLEMENTED (2026-07), ⚠️ AWAITING LIVE QA
@@ -316,7 +316,7 @@ Features:
**What:** High-end background noise cancellation using a pre-trained ML model (RNNoise) running in the browser. Removes dogs, fans, and keyboard clicks from the mic stream.
**Shipped:** 3-tier setting (Off / Browser-native / ML) in Settings → General → Calls.
-**🔱 [EC-FORK] DONE — moved in-source (2026-06).** ML denoise is now a first-class audio stage **inside** the forked Element Call: a LiveKit `TrackProcessor<Audio>` activated by `lotusDenoiseSource=1` (cinny sets it when ML is selected). The old build-time `getUserMedia`/`index.html` monkeypatch is **removed**. Because EC re-runs the processor on every (re)publish, denoise now **survives reconnects and mic-device switches** — this is the A7 fix (see `LOTUS_BUGS.md` A7, `LOTUS_TESTING.md` §D2-1). The processor degrades to the raw mic rather than going silent.
+**🔱 [EC-FORK] DONE — moved in-source (2026-06).** ML denoise is now a first-class audio stage **inside** the forked Element Call: a LiveKit `TrackProcessor<Audio>` activated by `lotusDenoiseSource=1` (cinny sets it when ML is selected). The old build-time `getUserMedia`/`index.html` monkeypatch is **removed**. Because EC re-runs the processor on every (re)publish, denoise now **survives reconnects and mic-device switches** — this is the A7 fix (see `LOTUS_TESTING.md` §D2-1). The processor degrades to the raw mic rather than going silent.
**Key decision:** LiveKit's Krisp filter is LiveKit-Cloud-only (we self-host the SFU); EC's own RNNoise PR #3892 is unmerged. Owning the fork let us implement the in-source stage directly.
**Models — all in-source in the fork:**
@@ -826,3 +826,136 @@ edit → commit → git push origin lotus
- **Synapse (Matrix):** LXC 151 on `compute-storage-01` — `pct exec 151 -- bash`
- **Config:** `/etc/matrix-synapse/homeserver.yaml`
- **Version check:** `curl -s https://matrix.lotusguild.org/_matrix/client/versions`
---
## Element Call fork — operational reference
_Ported from the retired `HANDOFF_ELEMENT_CALL_FORK.md` (2026-07; full history in git). The fork lives at `LotusGuild/element-call` (branch `lotus`, forked from upstream tag `v0.20.1`); cinny consumes it as the npm package `@lotusguild/element-call-embedded`, whose built bundle is copied into `public/element-call/`._
**Publish a new fork version (manual; needs the Gitea npm token):**
1. In the fork, bump `embedded/web/package.json` version (current unpublished: `0.20.1-lotus.2`).
2. Build: `pnpm run build:embedded` (Node 24, pnpm 10.33.0; output → repo `dist/`, staged into `embedded/web/dist`).
3. `cd embedded/web && npm version <tag> --no-git-tag-version && npm publish` to the Gitea registry (`code.lotusguild.org`). Publicly readable; only publishing needs the token.
4. In cinny: bump the `@lotusguild/element-call-embedded` pin (`package.json`, currently `0.20.1-lotus.1`) → the new version, `npm install`, build.
**`io.lotus.*` widget actions (fork ↔ cinny host):**
| Action | Direction | Purpose | Fork module |
| :-- | :-- | :-- | :-- |
| `io.lotus.call_state` | EC→host | speaker/mute/camera state stream (URL `lotusCallState=1`) | `lotusCallState.ts` |
| `io.lotus.focus_participant` | host→EC | spotlight a participant (works during screenshare) | `lotusFocus.ts` |
| `io.lotus.inject_audio` | host→EC | soundboard clip mixed into the call (URL `lotusAudioInject=1`) | `lotusAudioInject.ts` |
| `io.lotus.set_quality` | host→EC | audio/screenshare bitrate/fps caps | `lotusQuality.ts` |
| `io.lotus.decorations` | host→EC | in-call avatar decorations | `lotusDecorations.ts` |
| `io.lotus.set_deafen` | host→EC | deafen / screenshare-audio-mute at the LiveKit source (P6-2) | `lotusDeafen.ts` |
Also flag-gated (URL params): `lotusTransparent`/`lotusTheme` (theme), `lotusDenoiseSource=1` (in-source ML denoise). New toWidget actions must be added to the enum + `LOTUS_TO_WIDGET_ACTIONS` in `src/lotus/lotusActions.ts` and only SENT after call-join (else a 10s timeout). **P6-2 phase 2 pending:** after publishing lotus.2, bump the cinny pin + delete the `CallControl.ts` `<audio>.muted` fallback.
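One way to honour the "only send after call-join" rule is to buffer host→EC requests until the join callback fires. Cinny's `CallControl` gates `io.lotus.set_deafen` on a `joined` flag in a similar way. The sketch below is minimal and assumes a generic `send` transport. `JoinGatedSender`, `SendFn` and the payload shapes are hypothetical and are not cinny's API. The sketch keeps only the latest payload per action. That suits state-setting actions like `set_deafen` and `set_quality`, but it does not suit one-shot actions such as `io.lotus.inject_audio`.

```typescript
// Hypothetical names throughout; only the action strings come from the table above.
type StatefulLotusAction = 'io.lotus.set_deafen' | 'io.lotus.set_quality';

type Payload = Record<string, unknown>;
type SendFn = (action: StatefulLotusAction, data: Payload) => void;

class JoinGatedSender {
  private joined = false;
  // Latest requested payload per action while not yet joined.
  private readonly pending = new Map<StatefulLotusAction, Payload>();
  private readonly send: SendFn;

  constructor(send: SendFn) {
    this.send = send;
  }

  request(action: StatefulLotusAction, data: Payload): void {
    if (this.joined) {
      this.send(action, data);
    } else {
      this.pending.set(action, data); // a newer value replaces an older one
    }
  }

  // Call from the call-joined callback, once the fork's widget handler is mounted.
  onJoined(): void {
    this.joined = true;
    for (const [action, data] of this.pending) this.send(action, data);
    this.pending.clear();
  }

  // Call on hang-up so the next call re-applies the gate.
  onLeft(): void {
    this.joined = false;
  }
}

const sender = new JoinGatedSender((action, data) => console.log(action, data));
sender.request('io.lotus.set_deafen', { deafened: true }); // buffered
sender.request('io.lotus.set_deafen', { deafened: false }); // replaces the first
sender.onJoined(); // logs: io.lotus.set_deafen { deafened: false }
```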
---
## 🔴 Open — Actionable
### 🧨 Encryption / E2EE — ⚠️ EXTREME COMPLEXITY · 🧠 PLANNING SESSION REQUIRED · 👤 SENIOR ENGINEER
> 🧰 **Investigation kit ready (2026-07):** `LOTUS_E2EE_INVESTIGATION.md` (git history)
> has the per-KE capture runbook (console signatures, synapse-side queries, the
> KE-1→KE-2 causality decision tree, ranked remediations), and the client now
> ships a **Crypto Diagnostics** capture helper (Settings) — run it during the
> next affected call and download the report before starting any fix.
> **Observed live in prod 2026-06-30** on `chat.lotusguild.org` during a 2-person
> **Element Call** (E2EE enabled). These span **client rust-crypto (via
> `matrix-js-sdk@41.6.0-rc.0`) ↔ Synapse ↔ Element Call's MatrixRTC E2EE** and are
> very likely **interrelated** (see KE-1 → KE-2). Do **not** spot-fix — they need
> a dedicated cross-system planning session with the homeserver owner. Capture
> full client console + a synapse-side trace for the same call before starting.
> **None of these are caused by the EC fork work** (the issues reproduce on the
> old build; the local mic/denoise path is unrelated to key distribution).
- **KE-1 — One-time-key (OTK) upload conflict storm (CRITICAL, root-cause candidate).**
`POST /_matrix/client/v3/keys/upload` returns `400 M_UNKNOWN: One time key
signed_curve25519:AAAAAAAAAGQ already exists. Old key: {…} new key: {…}` —
firing **continuously** (many/sec). The client repeatedly tries to publish an
OTK at a key id the server already holds **with a different value**, i.e. the
rust-crypto key store and Synapse have **diverged OTK state**. Impact: floods
the crypto outgoing-request loop and is the prime suspect for the downstream
missing-key failures (no fresh OTKs ⇒ no new Olm sessions ⇒ undecryptable
to-device key events). _Investigate:_ device/key-store reset-or-restore
mismatch, OTK id-counter desync, RC-SDK (`41.6.0-rc.0`) regression, or a
Synapse OTK bug. Repro signature: grep console for `already exists`.
**Extreme — planning session.**
**Update 2026-07 (investigation §6):** upstream `matrix-rust-sdk#5200` (still
OPEN) confirms the mechanism — on the 400, `mark_request_as_sent()` never fires
so the SDK re-issues the identical upload forever. **`41.7.0` does NOT fix it**
(crypto-wasm 17→18.3.1 has no OTK/upload change; 18.3.x was to-device security
only) — the SDK-pin lever is closed. Root cause = **store↔server OTK
divergence**; the leading **web-specific** trigger is that cinny never calls
**`navigator.storage.persist()`**, so the IndexedDB crypto store is evictable
while the `localStorage` session/device-id survives → device resurrects with a
blank store → re-uploads OTKs the server still holds. **Actionable preventive
fix (buildable now, no call needed):** request persistent storage on login
(+ optional multi-tab guard + 400-loop→recovery-prompt). Healing an already-
diverged device still needs a clean **logout+login** (not just "clear
storage"). Full runbook (synapse SQL, capture checklist, §6 diagnosis) is in git history at `LOTUS_E2EE_INVESTIGATION.md` (removed 2026-07).
- **KE-2 — Element Call media keys not arriving/decrypting → audio & video cut out (CRITICAL).**
`MissingKey: missing key at index N for participant @user`, `skipping decryption
due to missing key`, `MissingKey: key set not found for @user at index 0`, and
rust-crypto `WARN … Received an unexpected encrypted to-device event …
event_type="io.element.call.encryption_keys"`. EC distributes per-participant
media keys as **encrypted to-device `io.element.call.encryption_keys`** events;
these aren't being received/decrypted in order, so remote LiveKit audio/video
can't be decrypted — **this is the "friend's audio cuts out occasionally"
symptom.** Almost certainly downstream of **KE-1** (broken Olm sessions). Spans
EC's MatrixRTC E2EE + rust-crypto to-device + Synapse. **Extreme — planning
session.**
- **KE-3 — Timeline decryption error: missing `algorithm` field (HIGH).**
`Error decrypting event (… type=m.room.encrypted …): DecryptionError[msg:
missing field 'algorithm' at line 1 column 138 …]`. A malformed/legacy
encrypted event (or a serialization mismatch in the RC SDK) that rust-crypto
can't parse. Lower frequency than KE-1/2 but a distinct decode-path failure —
capture the offending event id (`$SASBBzoqj…` seen) and inspect its raw content.
- **KE-4 — MatrixRTC delayed-event / membership timeouts (MEDIUM-HIGH, reliability).**
`[MembershipManager] Network local timeout error while sending event, immediate
retry … AbortError: Restart delayed event timed out before the HS responded`,
with repeated `org.matrix.msc4157.update_delayed_event`. MSC4140/4157
delayed-event reliability against `matrix.lotusguild.org` — can cause stale/ghost
call membership and missed leave events. May be partly **homeserver
responsiveness**; correlate with synapse latency/load. Include in the same
planning session since it shares the call-reliability + HS-interaction surface.
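KE-1's optional "400-loop → recovery-prompt" mitigation could start as a sliding-window counter. This minimal sketch assumes the client can observe failed `/keys/upload` responses, for example through the console-signature capture or a fetch wrapper. `OtkLoopDetector`, the threshold and the window are hypothetical. The detector only recognises the loop. Healing a diverged device still needs the logout+login described in KE-1.

```typescript
// Shape of a failed /keys/upload response as seen by the hypothetical observer.
interface UploadFailure {
  status: number;
  errcode?: string;
  error?: string;
}

// Matches the KE-1 signature: 400 M_UNKNOWN "One time key <id> already exists".
function isOtkConflict(f: UploadFailure): boolean {
  return (
    f.status === 400 &&
    f.errcode === 'M_UNKNOWN' &&
    /One time key \S+ already exists/.test(f.error ?? '')
  );
}

class OtkLoopDetector {
  private readonly hits: number[] = [];
  private tripped = false;
  private readonly onLoop: () => void;
  private readonly threshold: number;
  private readonly windowMs: number;

  constructor(onLoop: () => void, threshold = 5, windowMs = 10_000) {
    this.onLoop = onLoop;
    this.threshold = threshold;
    this.windowMs = windowMs;
  }

  // Feed each failed upload; `now` is injectable so the logic is testable.
  record(failure: UploadFailure, now: number = Date.now()): void {
    if (this.tripped || !isOtkConflict(failure)) return;
    this.hits.push(now);
    // Forget conflicts that fell out of the sliding window.
    while (this.hits.length > 0 && now - this.hits[0] > this.windowMs) {
      this.hits.shift();
    }
    if (this.hits.length >= this.threshold) {
      this.tripped = true; // fire once; the SDK keeps retrying regardless
      this.onLoop();
    }
  }
}

const detector = new OtkLoopDetector(() => console.log('prompt: log out and back in'), 3, 1_000);
const conflict: UploadFailure = {
  status: 400,
  errcode: 'M_UNKNOWN',
  error: 'One time key signed_curve25519:AAAAAAAAAGQ already exists. Old key: {} new key: {}',
};
for (const t of [0, 100, 200]) detector.record(conflict, t); // the third hit triggers the prompt
```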
### Security & Privacy
- **N97 — Access token stored in plaintext `localStorage`** (`state/sessions.ts`), vulnerable to XSS; device ID likewise. Architectural — needs a token-protection / session-storage redesign.
- ~~**Session writes are non-atomic and not cross-tab synced**~~ — **done (2026-07):** atomic single-key `cinny_session_v1` blob (legacy-key migration + dual-write) + `subscribeSessionChanges`/`useSessionSync` cross-tab reload. (The plaintext-token concern in N97 above is the remaining, separate architectural item.)
- **Persisted PII without encryption:** user status message + expiry (`settings/account/Profile.tsx`), unsent composer drafts (`room/RoomInput.tsx`). Leak risk on shared devices.
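The session-write fix listed above (atomic single-key blob, legacy-key migration, dual-write) comes down to storing the whole session under one key. A single `setItem` then either lands completely or not at all. The sketch below is minimal and works against a `Storage`-like interface. The `Session` fields and the `legacy_*` key names are hypothetical; only `cinny_session_v1` comes from the text. Like today's code, it keeps the token in plaintext, so it does not address N97.

```typescript
// Storage-like subset so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface Session {
  baseUrl: string;
  userId: string;
  deviceId: string;
  accessToken: string;
}

const SESSION_KEY = 'cinny_session_v1';
// Hypothetical legacy per-field keys; the real names live in state/sessions.ts.
const LEGACY_KEYS: Record<keyof Session, string> = {
  baseUrl: 'legacy_base_url',
  userId: 'legacy_user_id',
  deviceId: 'legacy_device_id',
  accessToken: 'legacy_access_token',
};

function writeSession(store: KeyValueStore, session: Session): void {
  // One setItem writes the whole session, so readers never see a half-update.
  store.setItem(SESSION_KEY, JSON.stringify(session));
  // Dual-write the legacy keys (not atomic) so an older build in another tab still reads it.
  for (const field of Object.keys(LEGACY_KEYS) as Array<keyof Session>) {
    store.setItem(LEGACY_KEYS[field], session[field]);
  }
}

function readSession(store: KeyValueStore): Session | undefined {
  const raw = store.getItem(SESSION_KEY);
  if (raw !== null) {
    try {
      return JSON.parse(raw) as Session; // no shape validation in this sketch
    } catch {
      // Corrupt blob: fall back to the legacy keys below.
    }
  }
  const baseUrl = store.getItem(LEGACY_KEYS.baseUrl);
  const userId = store.getItem(LEGACY_KEYS.userId);
  const deviceId = store.getItem(LEGACY_KEYS.deviceId);
  const accessToken = store.getItem(LEGACY_KEYS.accessToken);
  if (!baseUrl || !userId || !deviceId || !accessToken) return undefined;
  const migrated: Session = { baseUrl, userId, deviceId, accessToken };
  store.setItem(SESSION_KEY, JSON.stringify(migrated)); // one-time migration
  return migrated;
}
```

For the cross-tab half, other tabs can react to the standard `storage` event for `SESSION_KEY`. Browsers fire that event in every other same-origin tab when `localStorage` changes.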
### PWA / Offline / Notifications
- **N107 — SW has no `push` handler** — Web Push delivery is entirely non-functional. Needs a `push` listener + a Matrix push-gateway integration.
- **No app-asset caching strategy** (`src/sw.ts`) — no offline capability.
- ~~**`manifest: false`** may block PWA install~~ — **verified OK (2026-06):** `index.html` links `/manifest.json`, which exists in `public/` and is copied to `dist/`; VitePWA intentionally doesn't generate one. Not a bug.
### Dependencies & Build
- ~~**`matrix-js-sdk` pinned to a Release Candidate**~~ — **done (2026-07):** moved to `41.7.0` stable (crypto-wasm 18.3.1 security bump). Deep-audit dep triage: all 16 npm advisories are dev-only/unreachable/dead-dep — zero shipped exposure; dead `dompurify` removed. `@atlaskit`/build-tool pins remain review-worthy but low priority.
- **Build-time overhead:** `lotusDenoise` does heavy sequential `fs` work in `closeBundle`; `viteStaticCopy` config is complex with redundant renames — could be streamlined.
### Code Hygiene / DevEx
- **Automated test suite — 561+ tests across 65+ modules, a hard CI gate.** `npm test` runs Node's built-in runner via `tsx` (not vitest — Vite 8 is ahead of vitest's range) and **blocks the build job on failure**. Broad pure-logic coverage: utils (common, regex, sanitize/XSS, time, matrix, matrix-uia, mimeTypes, sort, accentColor, findAndReplace, AsyncSearch, ASCIILexicalTable, keyboard, room, matrix-crypto, featureCheck, syntaxHighlight, imageCompression, user-agent, callSounds), state (settings, sessions, recentSearches, upload, typingMembers, lists, room-list, toast, scheduledMessages, backupRestore, callEmbed/callPreferences, spaceRooms, …), plugins (matrix-to, call/utils, via-servers, bad-words, recent-emoji, custom-emoji, markdown block/inline/utils), OIDC (cs-api, useParsedLoginFlows, oidcState), lotus/avatarDecorations, message-search, search filters. Prevention work has caught + fixed **4 real bugs** (`findAndReplace` infinite-loop; `getSettings` crash-on-load when storage is blocked; `isMacOS` never matching modern Macs; `isMLDenoiseSupported` throwing `ReferenceError` instead of returning false on browsers lacking the `AudioWorkletNode` binding). **Next:** component/integration tests (the untestable-under-tsx DOM/React surface).
- **Extensive `as any` casts** across `src/` — gradual typing cleanup.
- **`types/matrix/` mirrors SDK types** instead of importing them — drift risk.
- ~~**Hardcoded CDN URL** should move to an env var~~ — **done:** `avatarDecorations.ts` already honors a `VITE_DECORATION_CDN` env override (lines 14-16); the in-repo literal is only the default. Nothing left.
- **`patch-folds.mjs` edits `node_modules` directly** — consider `patch-package`.
- **Infra docs:** `contrib/nginx` lacks security headers (HSTS/CSP) + uses rewrites over `try_files`; `contrib/caddy` has a placeholder path. CI/CD (`prod-deploy.yml`): sequential deploy, aggressive 1-min Netlify timeout, `package-manager-cache: false`.
- **README:** keep the fork-sync version + logo path current. (`CONTRIBUTING.md` is intentionally left as upstream Cinny's — not a Lotus concern.)
- **Architecture notes (low priority):** deep `features/` + `hooks/` nesting, many small coupled hooks, possible dead CSS/components, `SpacingVariant` / `DropTarget` recipe simplification.
- **Git workflow (forward-looking):** keep commits scoped — past monolithic "fix all bugs" commits and inconsistent prefixes hurt `git bisect`.
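The test-suite entry above makes two points that fit in one example. First, tests run on Node's built-in `node:test` runner. Second, the `isMLDenoiseSupported` fix follows the usual pattern of probing a global by name instead of referencing it directly. Referencing an undeclared identifier throws `ReferenceError`, while a lookup on `globalThis` just yields `undefined`. This is a minimal sketch. `canUseMLDenoise` is a hypothetical stand-in, and its WebAssembly requirement is an assumption, not the real function's logic.

```typescript
import { test } from 'node:test';
import { strict as assert } from 'node:assert';

// Look a global up by name. Unlike writing `AudioWorkletNode` directly,
// this cannot throw ReferenceError when the binding does not exist.
function hasGlobal(name: string): boolean {
  return typeof (globalThis as unknown as Record<string, unknown>)[name] !== 'undefined';
}

// Hypothetical stand-in for isMLDenoiseSupported: the ML path needs an
// AudioWorklet and (assumed here) WebAssembly for the RNNoise model.
function canUseMLDenoise(): boolean {
  return hasGlobal('AudioWorkletNode') && hasGlobal('WebAssembly');
}

test('returns false instead of throwing when AudioWorkletNode is missing', () => {
  // Node has WebAssembly but no AudioWorkletNode, like an unsupported browser.
  assert.equal(hasGlobal('WebAssembly'), true);
  assert.equal(canUseMLDenoise(), false);
});
```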
### Big Projects
- ~~**#5 — Seasonal themes & chat-background redesign.**~~ **DONE (2026-06/07):** 11 seasonal/holiday overlays shipped and later toned down + given a settings preview grid; all 19 chat backgrounds redesigned (Carbon + Aurora kept per user preference), one design sprint each, GPU-friendly CSS with `prefers-reduced-motion` + pause toggle. Remaining polish rides normal bug flow, not a "big project."
+4 -4
@@ -180,10 +180,10 @@ avatar decorations on EC video tiles, and a native transparent background.
(`io.lotus.inject_audio` → in-call soundboard) and quality controls
(`io.lotus.set_quality`).
-The full plan and integration map is in
-**[`HANDOFF_ELEMENT_CALL_FORK.md`](HANDOFF_ELEMENT_CALL_FORK.md)**; infra/hosting +
-build-pipeline notes live in the `LotusGuild/matrix` repo README. Search the docs
-for the **`[EC-FORK]`** tag to find every related note.
+The fork's `io.lotus.*` action catalog + the publish procedure are in
+**[`LOTUS_TODO.md`](LOTUS_TODO.md)** ("Element Call fork — operational reference");
+infra/hosting + build-pipeline notes live in the `LotusGuild/matrix` repo README.
+Search the docs for the **`[EC-FORK]`** tag to find every related note.
### Build
+1 -1
@@ -34,7 +34,7 @@ export class CallControl extends EventEmitter implements CallControlState {
// P6-2: mirrors CallEmbed.joined. Set true from forceState(), which CallEmbed
// invokes only from onCallJoined(). Gates io.lotus.set_deafen so we never send
// before the fork's widget handler mounts (pre-join sends pend to a 10s
-// timeout — HANDOFF_ELEMENT_CALL_FORK.md §12.1 F1).
+// timeout — io.lotus toWidget actions must only be sent after call-join).
private joined = false;
private get document(): Document | undefined {
+1 -1
@@ -5,7 +5,7 @@ import pkg from '../../../package.json';
//
// Installs pass-through wrappers around `console.warn` / `console.error` that
// ring-buffer any log line matching the KE-1..KE-4 bug-cluster signatures
-// (see LOTUS_E2EE_INVESTIGATION.md). It NEVER swallows a log call — the
+// (E2EE KE-1..4 capture; see LOTUS_TODO.md). It NEVER swallows a log call — the
// original console method is always invoked — and it performs NO network I/O.
// The report metadata is limited to SDK version / device id / user id / sync
// state; the captured log lines themselves are intentional evidence and may
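The diagnostics module in the hunk above wraps `console.warn`/`console.error`, passes every call through unchanged, and keeps matching lines in a bounded buffer. This is a minimal sketch of that pattern. The regexes are taken from the KE-1..KE-4 console lines quoted in LOTUS_TODO.md. `installKeCapture`, the tags and the capacity are hypothetical, not the module's real API.

```typescript
type Level = 'warn' | 'error';

interface CapturedLine {
  tag: string; // which KE signature matched
  level: Level;
  at: number; // Date.now() when captured
  line: string;
}

// Signatures taken from the KE-1..KE-4 console lines quoted in LOTUS_TODO.md.
const KE_SIGNATURES: ReadonlyArray<readonly [string, RegExp]> = [
  ['KE-1', /One time key \S+ already exists/],
  ['KE-2', /MissingKey:|io\.element\.call\.encryption_keys/],
  ['KE-3', /missing field 'algorithm'/],
  ['KE-4', /Restart delayed event timed out|update_delayed_event/],
];

function toText(arg: unknown): string {
  if (typeof arg === 'string') return arg;
  if (arg instanceof Error) return `${arg.name}: ${arg.message}`;
  try {
    return JSON.stringify(arg) ?? String(arg);
  } catch {
    return String(arg); // e.g. circular objects
  }
}

function installKeCapture(capacity = 200): { entries: () => CapturedLine[]; uninstall: () => void } {
  const buffer: CapturedLine[] = [];
  const originals = { warn: console.warn, error: console.error };

  const wrap =
    (level: Level) =>
    (...args: unknown[]): void => {
      originals[level].apply(console, args); // always pass the call through
      const line = args.map(toText).join(' ');
      const hit = KE_SIGNATURES.find(([, re]) => re.test(line));
      if (!hit) return;
      buffer.push({ tag: hit[0], level, at: Date.now(), line });
      if (buffer.length > capacity) buffer.shift(); // ring buffer: drop the oldest entry
    };

  console.warn = wrap('warn');
  console.error = wrap('error');
  // No network I/O: callers read the buffer and decide what to export.
  return {
    entries: () => buffer.slice(),
    uninstall: () => {
      console.warn = originals.warn;
      console.error = originals.error;
    },
  };
}

const capture = installKeCapture();
console.error('POST /keys/upload 400: One time key signed_curve25519:AAAAAAAAAGQ already exists');
console.warn('some unrelated warning'); // printed as usual, not captured
console.log(capture.entries().map((e) => e.tag)); // [ 'KE-1' ]
capture.uninstall();
```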
+14
@@ -44,6 +44,20 @@ if ('serviceWorker' in navigator) {
});
}
// Request persistent storage so the browser can't evict the IndexedDB
// rust-crypto store under storage pressure. Eviction (while the localStorage
// session/device-id survives) resurrects the device with a blank crypto store,
// which then re-uploads OTKs the server already holds → the "one time key
// already exists" upload storm and E2EE breakage. Only ask for sessions worth
// protecting (skip anonymous/landing visitors to avoid a needless Firefox
// prompt); check persisted() first so we don't re-prompt. Best-effort.
if (navigator.storage?.persist && getFallbackSession()) {
navigator.storage
.persisted()
.then((already) => (already ? undefined : navigator.storage.persist()))
.catch(() => undefined);
}
// Reload once if a lazy-loaded chunk is missing (stale deployment)
window.addEventListener('vite:preloadError', () => {
if (!sessionStorage.getItem('chunk-reload-attempted')) {