Files
cinny/LOTUS_BUGS.md
T
jared eafa353364 feat(decorations): allow VITE_DECORATION_CDN override; close N127
- avatarDecorations: resolve the decoration CDN base from VITE_DECORATION_CDN at
  runtime, falling back to the DECORATION_CDN literal (kept intact so the sync
  script + tests still parse it). Lets a deploy repoint the CDN without a code
  edit. Guarded for the tsx test runner (import.meta.env undefined there).
- LOTUS_BUGS: close N127 — the denoise dev-injection gap dissolved with the A7
  cutover (no getUserMedia shim is injected anymore; denoise is in-source in the
  EC fork), so there is nothing to inject in dev.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 19:18:09 -04:00

167 lines
15 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Lotus Chat — Open Bugs & Technical Debt
**Only OPEN and awaiting-verification items live here.** Resolved findings
(fixed-and-verified, false-positives, won't-fix) have been removed to keep this
actionable — the full history is in git. Items fixed in code but not yet
verified in a real environment are in **Needs Verification** below and have
step-by-step checks in [`LOTUS_TESTING.md`](./LOTUS_TESTING.md).
> Design rules for any fix here: follow the **Native-Cinny Law** and **TDS
> Design Law** in [`LOTUS_TODO.md`](./LOTUS_TODO.md).
---
## ⚠️ Needs Verification — fixed in code, awaiting live testing
Implemented and gate-green; confirm each per `LOTUS_TESTING.md`, then delete the row.
| ID | Item | File / area | Test |
| :--- | :------------------------------------------------------------------------------------- | :--------------------------------------------------- | :-------------------------------------------------------------------------------- |
| #2 | Chat-background animation flicker (`contain:paint`) | `lotus/chatBackground.ts` | F1 |
| #4 | Ringtone re-fixes: classic loudness + caller decline notice (A2 ✓ live) | `CallEmbedProvider.tsx`, `ringtones.ts` | A1,A3,A4 |
| #6 | Background vs. seasonal theme mutual exclusion | `state/settings.ts`, `General.tsx` | F2 |
| #7 | Composer toolbar touch targets (≥44px) | `room/RoomInput.tsx` | E1 |
| #8 | Room Settings horizontal overflow (mobile) | `components/page/style.css.ts` | E2 |
| #9 | Modal fullscreen on mobile (`useModalStyle`) | 22+ modal files | E3 |
| #10 | Composer not hidden by keyboard (`100dvh`) | `src/index.css` | E4 |
| #12 | PiP "All muted" badge re-fixed (was firing on any single mute) | `hooks/useCallSpeakers.ts` | G1 |
| N96 | Call-recovery overlay single "Back" button | `call/CallView.tsx` | A7 |
| N95 | AFK-monitor mic released on mute (OS indicator clears) | `hooks/useAfkAutoMute.ts` | L1 |
| N108 | Maskable PWA icons (Android adaptive) | `public/manifest.json` + `res/android/maskable-*` | L2 |
| EC | EC iframe load watchdog + self-heal + recovery UI | `plugins/call/CallEmbed.ts`, `CallView.tsx` | A7 |
| N105 | Notification clicks work after tab close (SW `notificationclick` + `showNotification`) | `sw.ts`, `utils/dom.ts`, `ClientNonUIFeatures.tsx` | get a msg notif, close the tab, click it → app focuses/opens + routes to the room |
| Gal | MediaGallery lazy-decrypt (true virtualization deferred) | `room/MediaGallery.tsx` | H1 |
| a11y | aria-labels: edit-history / reaction / thread / reply | `message/*` (`FallbackContent`, `Reaction`, `Reply`) | I |
**Verified working in live testing (2026-06):** A2, B1B4, C1, C3, D (mic/camera/deafen/screenshare/fullscreen/more-menu/PiP). Denoise quality in D is still poor — tracked under the denoise project, not a regression.
---
## 🧩 Element Call source-level items — now actionable via the fork
> 🔱 **[EC-FORK]** **UPDATE 2026-06-29: the fork is live.** We now own and
> self-build Element Call (`LotusGuild/element-call` →
> `@lotusguild/element-call-embedded`, Phase 1 done & cinny wired). A5/A6/A7
> below are **no longer "won't fix"** — they are ordinary source changes. See
> [`HANDOFF_ELEMENT_CALL_FORK.md`](./HANDOFF_ELEMENT_CALL_FORK.md) §10 + the Phase
> 2 work list. (The iframe is **same-origin** / self-hosted; the old blocker was
> that we didn't own EC's compiled source — which we now do.)
The in-call participant grid is rendered **inside EC's app**. Previously a
pre-built npm bundle we could only style/place around; now editable source.
Items from testing, with their fork-level fix path:
- **A5 — "Focus camera":** EC supports native tile-pinning. Our bottom-bar "Focus
camera" is a programmatic wrapper that **`.click()`s the tile** today
(`CallControl.ts` `focusCameraParticipant`), and during a screenshare EC
spotlights the shared screen so a camera pin may not override it. **Fork fix:**
add an `io.lotus.focus_participant` widget action that pins a participant in
EC's layout (coexisting with / overriding the screenshare spotlight); cinny
sends it via the widget API and the DOM-click hack is deleted. _Status: Open —
Actionable (Phase 2)._
- **A6 — avatar decorations in-call:** decorations render on **our** pre-join
lobby roster (`CallMemberCard`) but not on EC's in-call video tiles. **Fork
fix:** render the decoration APNG inside EC's participant-tile component, fed
decoration slugs via widget member data. _Status: Open — Actionable (Phase 2)._
- **A7 — mic dead after EC's "Reconnect":** the mid-call "Connection lost /
Reconnect" screen is **EC's own** (our load watchdog only covers an initial
hung load). After EC reconnects, the mic isn't re-published through our denoise
`getUserMedia` shim until a clean End+rejoin. **Fork fix:** move denoise into
EC's mic-capture/publish pipeline as a first-class audio stage — EC re-runs it
on every (re)publish, so reconnects keep denoise alive natively, and the
build-time `index.html` injection is removed. _Status: Open — Actionable
(Phase 2); root cause is the `getUserMedia` monkeypatch, not EC itself._
---
## 🔴 Open — Actionable
### Calls / Audio
- ~~**N127 — ML denoise shim is never injected in `vite dev`.**~~ **RESOLVED (dissolved by the A7 denoise cutover).** `vite.config.js` no longer injects a getUserMedia shim at all — the forked Element Call runs ML denoise in-source as a LiveKit `TrackProcessor` (activated by `lotusDenoiseSource=1`), so there is no build-time injection that could be missing in dev. Nothing to fix.
### 🧨 Encryption / E2EE — ⚠️ EXTREME COMPLEXITY · 🧠 PLANNING SESSION REQUIRED · 👤 SENIOR ENGINEER
> **Observed live in prod 2026-06-30** on `chat.lotusguild.org` during a 2-person
> **Element Call** (E2EE enabled). These span **client rust-crypto (via
> `matrix-js-sdk@41.6.0-rc.0`) ↔ Synapse ↔ Element Call's MatrixRTC E2EE** and are
> very likely **interrelated** (see KE-1 → KE-2). Do **not** spot-fix — they need
> a dedicated cross-system planning session with the homeserver owner. Capture
> full client console + a synapse-side trace for the same call before starting.
> **None of these are caused by the EC fork work** (the issues reproduce on the
> old build; the local mic/denoise path is unrelated to key distribution).
- **KE-1 — One-time-key (OTK) upload conflict storm (CRITICAL, root-cause candidate).**
`POST /_matrix/client/v3/keys/upload` returns `400 M_UNKNOWN: One time key
signed_curve25519:AAAAAAAAAGQ already exists. Old key: {…} new key: {…}`
firing **continuously** (many/sec). The client repeatedly tries to publish an
OTK at a key id the server already holds **with a different value**, i.e. the
rust-crypto key store and Synapse have **diverged OTK state**. Impact: floods
the crypto outgoing-request loop and is the prime suspect for the downstream
missing-key failures (no fresh OTKs ⇒ no new Olm sessions ⇒ undecryptable
to-device key events). _Investigate:_ device/key-store reset-or-restore
mismatch, OTK id-counter desync, RC-SDK (`41.6.0-rc.0`) regression, or a
Synapse OTK bug. Repro signature: grep console for `already exists`.
**Extreme — planning session.**
- **KE-2 — Element Call media keys not arriving/decrypting → audio & video cut out (CRITICAL).**
`MissingKey: missing key at index N for participant @user`, `skipping decryption
due to missing key`, `MissingKey: key set not found for @user at index 0`, and
rust-crypto `WARN … Received an unexpected encrypted to-device event …
event_type="io.element.call.encryption_keys"`. EC distributes per-participant
media keys as **encrypted to-device `io.element.call.encryption_keys`** events;
these aren't being received/decrypted in order, so remote LiveKit audio/video
can't be decrypted — **this is the "friend's audio cuts out occasionally"
symptom.** Almost certainly downstream of **KE-1** (broken Olm sessions). Spans
EC's MatrixRTC E2EE + rust-crypto to-device + Synapse. **Extreme — planning
session.**
- **KE-3 — Timeline decryption error: missing `algorithm` field (HIGH).**
`Error decrypting event (… type=m.room.encrypted …): DecryptionError[msg:
missing field 'algorithm' at line 1 column 138 …]`. A malformed/legacy
encrypted event (or a serialization mismatch in the RC SDK) that rust-crypto
can't parse. Lower frequency than KE-1/2 but a distinct decode-path failure —
capture the offending event id (`$SASBBzoqj…` seen) and inspect its raw content.
- **KE-4 — MatrixRTC delayed-event / membership timeouts (MEDIUM-HIGH, reliability).**
`[MembershipManager] Network local timeout error while sending event, immediate
retry … AbortError: Restart delayed event timed out before the HS responded`,
with repeated `org.matrix.msc4157.update_delayed_event`. MSC4140/4157
delayed-event reliability against `matrix.lotusguild.org` — can cause stale/ghost
call membership and missed leave events. May be partly **homeserver
responsiveness**; correlate with synapse latency/load. Include in the same
planning session since it shares the call-reliability + HS-interaction surface.
### Security & Privacy
- **N97 — Access token stored in plaintext `localStorage`** (`state/sessions.ts`), vulnerable to XSS; device ID likewise. Architectural — needs a token-protection / session-storage redesign.
- **Session writes are non-atomic and not cross-tab synced** (`state/sessions.ts`) — risks inconsistent state / races across tabs.
- **Persisted PII without encryption:** user status message + expiry (`settings/account/Profile.tsx`), unsent composer drafts (`room/RoomInput.tsx`). Leak risk on shared devices.
### PWA / Offline / Notifications
- **N107 — SW has no `push` handler** — Web Push delivery is entirely non-functional. Needs a `push` listener + a Matrix push-gateway integration.
- **No app-asset caching strategy** (`src/sw.ts`) — no offline capability.
- ~~**`manifest: false`** may block PWA install~~ — **verified OK (2026-06):** `index.html` links `/manifest.json`, which exists in `public/` and is copied to `dist/`; VitePWA intentionally doesn't generate one. Not a bug.
### Dependencies & Build
- **`matrix-js-sdk` pinned to a Release Candidate** (`41.6.0-rc.0`); `@atlaskit` and build tools (`vite`, `typescript`, `eslint`) on unstable/experimental pins — review for stable versions; RC SDK is a tree-shaking/bundle-size risk.
- **Build-time overhead:** `lotusDenoise` does heavy sequential `fs` work in `closeBundle`; `viteStaticCopy` config is complex with redundant renames — could be streamlined.
### Code Hygiene / DevEx
- **Automated test suite — 545 tests across 62 modules, a hard CI gate.** `npm test` runs Node's built-in runner via `tsx` (not vitest — Vite 8 is ahead of vitest's range) and **blocks the build job on failure**. Broad pure-logic coverage: utils (common, regex, sanitize/XSS, time, matrix, matrix-uia, mimeTypes, sort, accentColor, findAndReplace, AsyncSearch, ASCIILexicalTable, keyboard, room, matrix-crypto, featureCheck, syntaxHighlight, imageCompression, user-agent, callSounds), state (settings, sessions, recentSearches, upload, typingMembers, lists, room-list, toast, scheduledMessages, backupRestore, callEmbed/callPreferences, spaceRooms, …), plugins (matrix-to, call/utils, via-servers, bad-words, recent-emoji, custom-emoji, markdown block/inline/utils), OIDC (cs-api, useParsedLoginFlows, oidcState), lotus/avatarDecorations, message-search, search filters. Prevention work has caught + fixed **4 real bugs** (`findAndReplace` infinite-loop; `getSettings` crash-on-load when storage is blocked; `isMacOS` never matching modern Macs; `isMLDenoiseSupported` throwing `ReferenceError` instead of returning false on browsers lacking the `AudioWorkletNode` binding). **Next:** component/integration tests (the untestable-under-tsx DOM/React surface).
- **Extensive `as any` casts** across `src/` — gradual typing cleanup.
- **`types/matrix/` mirrors SDK types** instead of importing them — drift risk.
- **Hardcoded CDN URL** should move to an env var (the decoration CDN is now single-sourced in `avatarDecorations.ts`, but the literal is still in-repo).
- **`patch-folds.mjs` edits `node_modules` directly** — consider `patch-package`.
- **Infra docs:** `contrib/nginx` lacks security headers (HSTS/CSP) + uses rewrites over `try_files`; `contrib/caddy` has a placeholder path. CI/CD (`prod-deploy.yml`): sequential deploy, aggressive 1-min Netlify timeout, `package-manager-cache: false`.
- **README:** keep the fork-sync version + logo path current. (`CONTRIBUTING.md` is intentionally left as upstream Cinny's — not a Lotus concern.)
- **Architecture notes (low priority):** deep `features/` + `hooks/` nesting, many small coupled hooks, possible dead CSS/components, `SpacingVariant` / `DropTarget` recipe simplification.
- **Git workflow (forward-looking):** keep commits scoped — past monolithic "fix all bugs" commits and inconsistent prefixes hurt `git bisect`.
### Big Projects
- **#5 — Seasonal themes & chat-background redesign.** Current backgrounds are basic CSS; goal is high-fidelity, research-backed, GPU-accelerated designs (layered `oklch`, `backdrop-filter`, `contain:paint`) with WCAG-AA overlay contrast. Treat each as its own design sprint.