# Lotus Matrix Infrastructure
[![Lint](https://code.lotusguild.org/LotusGuild/matrix/actions/workflows/lint.yml/badge.svg)](https://code.lotusguild.org/LotusGuild/matrix/actions?workflow=lint.yml)
Matrix server infrastructure for the Lotus Guild homeserver (`matrix.lotusguild.org`).
**Repo**: https://code.lotusguild.org/LotusGuild/matrix
---
## Repo Structure
```
matrix/
├── hookshot/ # Hookshot JS transformation functions (one file per webhook)
│ ├── deploy.sh # Deploys all .js files to Matrix room state via API
│ ├── proxmox.js
│ ├── grafana.js
│ ├── uptime-kuma.js
│ └── ... # One .js per webhook service
├── cinny/
│ ├── config.json # Cinny homeserver config (deployed to /var/www/html/config.json)
│ ├── upstream-check.sh # Daily script: checks if cinnyapp/cinny main has new commits, pings Matrix
│ └── lotus-build.sh # Merge + build script: fetches upstream/main, merges, builds, deploys
├── landing/
│ └── index.html # matrix.lotusguild.org landing page
├── draupnir/
│ └── production.yaml # Draupnir config (access token is redacted — see rotation docs below)
├── deploy/ # Auto-deployment infrastructure
│ ├── lxc151-hookshot.sh # Deploy script for LXC 151 (matrix/hookshot/livekit)
│ ├── lxc106-cinny.sh # Deploy script for LXC 106 (cinny)
│ ├── lxc139-landing.sh # Deploy script for LXC 139 (landing page)
│ ├── lxc110-draupnir.sh # Deploy script for LXC 110 (draupnir)
│ ├── livekit-graceful-restart.sh # Waits for zero active calls before restarting livekit
│ ├── hooks-lxc151.json # webhook binary config for LXC 151
│ ├── hooks-lxc106.json # webhook binary config for LXC 106
│ ├── hooks-lxc139.json # webhook binary config for LXC 139
│ └── hooks-lxc110.json # webhook binary config for LXC 110
└── systemd/
  ├── livekit-server.service # LiveKit systemd unit (with HA migration fix)
  ├── livekit-graceful-restart.service # oneshot — checks pending restart flag
  ├── livekit-graceful-restart.timer # Runs every 5 min
  ├── draupnir.service
  └── cinny-upstream-check.cron # Installed to /etc/cron.d/ on LXC 106 — runs daily at noon
```
---
## Infrastructure
| Service | IP | LXC | RAM | Disk | Versions |
|---------|----|-----|-----|------|----------|
| Synapse | 10.10.10.29 | 151 | 8GB | 50GB | Synapse 1.155.0, LiveKit 1.9.11, hookshot 7.3.2, coturn latest |
| PostgreSQL 17 | 10.10.10.44 | 109 | 6GB | 30GB | PostgreSQL 17.9 |
| Cinny Web | 10.10.10.6 | 106 | 2GB | 8GB | Debian 12, nginx, Node 24, Lotus Cinny fork (custom, tracks `cinnyapp/cinny` main) |
| Draupnir | 10.10.10.24 | 110 | 1GB | 10GB | Draupnir v2.9.0, Node.js v22 |
| Prometheus | 10.10.10.48 | 118 | — | — | Prometheus — scrapes all Matrix services |
| Grafana | 10.10.10.49 | 107 | — | — | Grafana 12.4.0 — dashboard.lotusguild.org |
| NPM | 10.10.10.27 | 139 | — | — | Nginx Proxy Manager + matrix landing page |
| Authelia | 10.10.10.36 | 167 | — | — | SSO/OIDC provider |
| LLDAP | 10.10.10.39 | 147 | — | — | LDAP user directory |
| Uptime Kuma | 10.10.10.25 | 101 | — | — | Uptime monitoring (micro1 node) |
**Key paths on Synapse LXC (151):**
- Synapse config: `/etc/matrix-synapse/homeserver.yaml`
- Synapse conf.d: `/etc/matrix-synapse/conf.d/` (metrics.yaml, report_stats.yaml, server_name.yaml)
- coturn config: `/etc/turnserver.conf`
- LiveKit config: `/etc/livekit/config.yaml`
- LiveKit service: `livekit-server.service`
- lk-jwt-service: `lk-jwt-service.service` (now binds `:8071` via drop-in `/etc/systemd/system/lk-jwt-service.service.d/override.conf`; serves JWT tokens for MatrixRTC at `/sfu/get` and legacy `/get_token`)
- voice-limit-guard: `voice-limit-guard.service` (binds `:8070`, fronts lk-jwt-service — enforces hard per-room voice participant limits **and publish permissions (screenshare/camera via JWT re-signing)** for ALL clients; script `/opt/voice-limit-guard/voice-limit-guard.py`) — see [Voice Channel Limits & Call Permissions](#voice-channel-limits--call-permissions)
- Hookshot: `/opt/hookshot/`, service: `matrix-hookshot.service`
- Hookshot config: `/opt/hookshot/config.yml`
- Hookshot registration: `/etc/matrix-synapse/hookshot-registration.yaml`
- Bot: `/opt/matrixbot/`, service: `matrixbot.service`
- Repo clone (auto-deploy): `/opt/matrix-config/`
- Deploy env: `/etc/matrix-deploy.env` (MATRIX_TOKEN, MATRIX_SERVER, MATRIX_ROOM)
- Deploy log: `/var/log/matrix-deploy.log`
**Key paths on Draupnir LXC (110):**
- Install path: `/opt/draupnir/`
- Config: `/opt/draupnir/config/production.yaml`
- Data/SQLite DBs: `/data/storage/`
- Service: `draupnir.service`
- Management room: `#management:matrix.lotusguild.org` (`!mEvR5fe3jMmzwd-FwNygD72OY_yu8H3UP_N-57oK7MI`)
- Bot account: `@draupnir:matrix.lotusguild.org` (power level 100 in all protected rooms and the Lotus Guild space)
- Subscribed ban lists: `#community-moderation-effort-bl:neko.dev`, `#matrix-org-coc-bl:matrix.org`
- Rebuild: `NODE_OPTIONS="--max_old_space_size=6144" npm run build`
- Healthz endpoint: `http://10.10.10.24:8081/healthz` (200 = healthy, 418 = disconnected)
- Abuse reporting endpoint: `POST http://10.10.10.24:8080/_matrix/draupnir/1/report/{roomId}/{eventId}`
- Audit DBs: `/data/storage/user-restriction-audit-log.db`, `/data/storage/room-audit-log.db`
**Key paths on PostgreSQL LXC (109):**
- PostgreSQL config: `/etc/postgresql/17/main/postgresql.conf`
- Tuning conf.d: `/etc/postgresql/17/main/conf.d/synapse_tuning.conf`
- HBA config: `/etc/postgresql/17/main/pg_hba.conf`
- Data directory: `/var/lib/postgresql/17/main`
**Key paths on Cinny LXC (106):**
- Lotus fork source: `/opt/lotus-cinny/` (fork of `cinnyapp/cinny` main, custom Lotus Guild branch)
- Upstream remote: `https://github.com/cinnyapp/cinny.git` (added as `upstream`)
- Built files: `/var/www/html/`
- Cinny config: `/var/www/html/config.json`
- Config backup (survives rebuilds): `/opt/lotus-cinny/.cinny-config.json`
- Monitor env: `/etc/cinny-monitor.env` (MATRIX_TOKEN, MATRIX_SERVER, MATRIX_ROOM, MATRIX_PING_USER — not in git)
- Upstream check script: `/usr/local/bin/cinny-upstream-check.sh`
- Build/deploy script: `/usr/local/bin/cinny-build.sh` (triggered by webhook or manual run)
- Cron: `/etc/cron.d/cinny-upstream-check` (runs at noon daily — checks only, does not auto-build)
- Monitor state: `/var/lib/cinny-monitor/last-upstream-commit`
- Monitor log: `/var/log/cinny-monitor.log`
- Build log: `/var/log/cinny-build.log`
- Nginx site config: `/etc/nginx/sites-available/cinny`
---
## Auto-Deployment
Pushes to `main` on `LotusGuild/matrix` automatically deploy to the relevant LXC(s) via Gitea webhooks. All 4 LXCs are fully independent — each runs its own webhook listener and deploys only its own files. No cross-LXC SSH dependencies.
### How It Works
1. Push to `LotusGuild/matrix` on Gitea
2. Gitea fires webhooks to all 4 LXCs simultaneously (HMAC-SHA256 validated)
3. Each LXC runs `/usr/local/bin/matrix-deploy.sh` via the `webhook` binary
4. The script runs `git fetch` followed by `git reset --hard origin/main`, checks which files changed, and deploys only the relevant ones
5. Logs to `/var/log/matrix-deploy.log` on each LXC
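
The HMAC-SHA256 validation in step 2 can be sketched as follows. This is a minimal illustration, not the `webhook` binary's own code. It assumes the signature arrives as a hex digest of the raw request body (the form Gitea sends in its `X-Gitea-Signature` header); the function name and the sample secret are hypothetical.

```python
import hashlib
import hmac


def signature_is_valid(secret: str, body: bytes, received_hex: str) -> bool:
    """Return True when received_hex is the HMAC-SHA256 of body under secret."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so a mismatch leaks no timing hints.
    return hmac.compare_digest(expected, received_hex.lower())


if __name__ == "__main__":
    secret = "example-secret"  # hypothetical; real secrets live in /etc/webhook/hooks.json
    body = b'{"ref": "refs/heads/main"}'
    good = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    print(signature_is_valid(secret, body, good))
    print(signature_is_valid(secret, body, "0" * 64))
```

The signature must be computed over the raw bytes of the request body. Re-serializing the parsed JSON before hashing would change the bytes and make valid requests fail.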
### Per-LXC Webhook Endpoints
| LXC | Service | IP | Port | Deploys When Changed |
|-----|---------|----|----|----------------------|
| 151 | matrix/hookshot | 10.10.10.29 | **9500** | `hookshot/*.js`, `systemd/livekit-server.service`, `livekit/voice-limit-guard.py`, `systemd/voice-limit-guard.service`, `matrixbot/*` |
| 106 | cinny | 10.10.10.6 | 9000 | `cinny/config.json`, `cinny/upstream-check.sh`, `cinny/lotus-build.sh`, `deploy/hooks-lxc106.json`, `systemd/cinny-upstream-check.cron` |
| 139 | landing/NPM | 10.10.10.27 | 9000 | `landing/index.html` |
| 110 | draupnir | 10.10.10.24 | 9000 | `draupnir/production.yaml` |
> LXC 151 uses port **9500** because ports 9000-9004 are occupied by Synapse and Hookshot.
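
The "Deploys When Changed" column amounts to glob matching over the list of changed paths. The sketch below shows one way to express it; it is not the deploy scripts' actual logic (those are shell). The `DEPLOY_PATTERNS` name and the idea of feeding it the output of something like `git diff --name-only` are assumptions.

```python
from fnmatch import fnmatch

# Patterns copied from the table above, keyed by LXC id.
DEPLOY_PATTERNS = {
    151: [
        "hookshot/*.js",
        "systemd/livekit-server.service",
        "livekit/voice-limit-guard.py",
        "systemd/voice-limit-guard.service",
        "matrixbot/*",
    ],
    106: [
        "cinny/config.json",
        "cinny/upstream-check.sh",
        "cinny/lotus-build.sh",
        "deploy/hooks-lxc106.json",
        "systemd/cinny-upstream-check.cron",
    ],
    139: ["landing/index.html"],
    110: ["draupnir/production.yaml"],
}


def relevant_changes(lxc, changed_paths):
    """Return the changed paths that this LXC should act on."""
    patterns = DEPLOY_PATTERNS[lxc]
    # Note: fnmatch's "*" also matches "/", so "matrixbot/*" covers nested files.
    return [p for p in changed_paths if any(fnmatch(p, pat) for pat in patterns)]


if __name__ == "__main__":
    changed = ["hookshot/proxmox.js", "landing/index.html", "README.md"]
    for lxc in DEPLOY_PATTERNS:
        print(lxc, relevant_changes(lxc, changed))
```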
### What Each Deploy Does
**LXC 151 — hookshot/livekit:**
- `hookshot/*.js` changed → runs `hookshot/deploy.sh` (pushes transform functions to Matrix room state via API, requires `MATRIX_TOKEN` in `/etc/matrix-deploy.env`)
- `systemd/livekit-server.service` changed → copies file, `daemon-reload`, sets `/run/livekit-restart-pending` flag (actual restart deferred — see Livekit Graceful Restart below)
- `livekit/voice-limit-guard.py` / `systemd/voice-limit-guard.service` changed → `py_compile`-validates, installs to `/opt/voice-limit-guard/`, `daemon-reload` (if unit changed), and restarts `voice-limit-guard` (restart only affects joins in a ~1s window; established calls talk directly to livekit-server, so no call is dropped)
**LXC 106 — cinny:**
- `cinny/config.json` → copies to `/var/www/html/config.json`
- `cinny/upstream-check.sh` → copies to `/usr/local/bin/cinny-upstream-check.sh`, `chmod +x`
- `cinny/lotus-build.sh` → copies to `/usr/local/bin/cinny-build.sh`, `chmod +x`
- `deploy/hooks-lxc106.json` → copies to `/etc/webhook/hooks.json`, restarts `webhook` service
- `systemd/cinny-upstream-check.cron` → copies to `/etc/cron.d/cinny-upstream-check`, `chmod 644`
**LXC 139 — landing page:**
- `landing/index.html` → copies to `/var/www/matrix-landing/index.html`, `nginx -s reload`
**LXC 110 — draupnir:**
- `draupnir/production.yaml` → extracts live `accessToken` from existing config, overwrites from repo, restores token via `sed`, restarts `draupnir.service`
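
The Draupnir token-preservation step (extract, overwrite, restore) can be illustrated in Python. The real deploy script does this with `sed`; the sketch below only mirrors the idea, assumes `accessToken` sits on a single line of the YAML, and uses a hypothetical function name.

```python
import re

# Matches a single-line "accessToken: <value>" entry at any indentation.
TOKEN_RE = re.compile(r"^([ \t]*accessToken:[ \t]*)(\S.*)$", re.MULTILINE)


def merge_keeping_token(live_config: str, repo_config: str) -> str:
    """Return repo_config with its accessToken value replaced by the live one."""
    live = TOKEN_RE.search(live_config)
    if live is None:
        raise ValueError("live config has no accessToken line")
    token = live.group(2).strip()
    # A function replacement keeps backslashes in the token from being
    # interpreted as regex group references.
    merged, count = TOKEN_RE.subn(lambda m: m.group(1) + token, repo_config, count=1)
    if count != 1:
        raise ValueError("repo config has no accessToken line to restore into")
    return merged


if __name__ == "__main__":
    live = "accessToken: syt_example_live_token\nprotectAllJoinedRooms: true\n"
    repo = "accessToken: REDACTED\nprotectAllJoinedRooms: false\n"
    print(merge_keeping_token(live, repo))
```

Raising when either file lacks the line is a deliberate choice in this sketch: writing a config that still says `REDACTED` would take the bot offline on restart.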
### Installed Components (per LXC)
- `webhook` binary (Debian package `webhook` v2.8.0) listening on respective port
- `/etc/webhook/hooks.json` — unique HMAC-SHA256 secret per LXC
- `/usr/local/bin/matrix-deploy.sh` — deploy script from this repo
- `/etc/systemd/system/webhook.service` — enabled and running
- `/opt/matrix-config/` — clone of this repo
- `/var/log/matrix-deploy.log` — deploy log
**LXC 151 additionally:**
- `/etc/matrix-deploy.env` (`MATRIX_TOKEN`, `MATRIX_SERVER`, `MATRIX_ROOM`; not in git)
- `/usr/local/bin/livekit-graceful-restart.sh`
- `/etc/systemd/system/livekit-graceful-restart.service` + `.timer`
**LXC 106 additionally:**
- `/etc/cinny-monitor.env` (`MATRIX_TOKEN`, `MATRIX_SERVER`, `MATRIX_ROOM`, `MATRIX_PING_USER`; not in git)
- `/var/lib/cinny-monitor/last-upstream-commit` — state file (tracks last-seen upstream SHA)
- `/opt/lotus-cinny/` — git clone of `code.lotusguild.org/LotusGuild/cinny` with `upstream` remote (`cinnyapp/cinny`)
- `/root/.git-credentials` — Gitea token `lxc106-lotus-cinny` (write:repository scope, revocable via Gitea UI)
- `/var/lib/cinny-monitor/last-upstream-tag` — last seen stable release tag (e.g. `v4.11.1`)
### Livekit Graceful Restart
Killing livekit-server while a call is active drops everyone. Instead:
1. Deploy to LXC 151 copies the new `livekit-server.service` and sets a `/run/livekit-restart-pending` flag
2. `livekit-graceful-restart.timer` runs every 5 minutes
3. The timer script counts established TCP connections on port 7881 (`ss -tn state established`)
4. If zero connections → restarts livekit-server and clears the flag
5. If connections exist → logs and exits, retries in 5 minutes
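
The restart decision in steps 3 to 5 can be sketched as a pure function over `ss` output. The actual timer script is shell; this Python version is only an illustration, the function names are hypothetical, and the sample output is made up to show the shape of the data.

```python
def count_connections_on_port(ss_output: str, port: int = 7881) -> int:
    """Count rows of `ss -tn` output whose local address ends with :port."""
    suffix = f":{port}"
    count = 0
    for line in ss_output.splitlines():
        # The first whitespace-separated field containing ":" is the local
        # address; queue sizes and state names never contain a colon.
        addresses = [field for field in line.split() if ":" in field]
        if addresses and addresses[0].endswith(suffix):
            count += 1
    return count


def should_restart(flag_present: bool, ss_output: str) -> bool:
    """Restart only when a restart is pending and nobody is connected."""
    return flag_present and count_connections_on_port(ss_output) == 0


# Illustrative sample only; real output comes from `ss -tn state established`.
SAMPLE = """Recv-Q Send-Q Local Address:Port Peer Address:Port
0      0      10.10.10.29:7881   203.0.113.5:51234
0      0      10.10.10.29:8008   10.10.10.27:40000
"""

if __name__ == "__main__":
    print(count_connections_on_port(SAMPLE))
    print(should_restart(True, SAMPLE))
```

Only the local address is inspected, so an outbound connection from this host to some remote port 7881 is not mistaken for an active call.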
---
## Voice Channel Limits & Call Permissions
Per-room voice **participant caps** and **publish permissions** (screenshare / camera) are enforced **server-side for every client** (Element, FluffyChat, Lotus Chat, …), not just our own web client. Both are enforced by the same `voice-limit-guard` sidecar (`livekit/voice-limit-guard.py`), which fronts lk-jwt-service at token issue.
**How it works**
Every Matrix client must fetch a LiveKit JWT from lk-jwt-service before it can join a call. `voice-limit-guard` (a small fail-open Python sidecar) sits in front of that service:
- lk-jwt-service was moved off `:8070` to `:8071` (systemd drop-in). The guard now owns `:8070`, so NPM's existing `/sfu/get` + `/get_token` proxy targets are unchanged.
- On each token request the guard reads the room's Lotus policy from Synapse admin state (one `/state` fetch, cached 10 s): `io.lotus.voice_limit` → `max_users`, and `io.lotus.room_quality` → `allow_screenshare` / `allow_camera`. The room id is taken from the **endpoint's own field** (`/get_token` → `room_id`, `/sfu/get` → `room`) exactly as lk-jwt-service reads it, so a client that sends both keys cannot have one room's policy applied to a token minted for a different room.
- **Participant limit** — it forwards to lk-jwt-service, and if a token is issued decodes the JWT to get the LiveKit alias (`video.room`) + requester (`sub`), then asks LiveKit `ListParticipants` how many **distinct Matrix users** are in the room. requester already present (rejoin) → allow · distinct users ≥ limit → **403** · otherwise → allow.
- **Publish permissions (screenshare / camera)** — LiveKit is a pure SFU and **cannot cap a publisher's bitrate/framerate** (no such field exists in the grant/config/API — that stays a Lotus-client-cooperative setting). But the JWT's `video.canPublishSources` **is** SFU-enforced for every client. Since the guard holds the LiveKit signing secret, when a room forbids a source it **decodes the issued token, drops `screen_share`/`screen_share_audio` (and/or `camera`) from `canPublishSources`, and re-signs it** (HS256, same key). Microphone is always kept. The SFU then rejects those tracks for **all** clients — nothing to opt into.
- **Live (mid-call) enforcement** — the JWT re-sign covers anyone *joining* after a policy change. For people **already in the call**, a background **reconcile loop** (every `GUARD_RECONCILE_INTERVAL`, default 3 s) calls LiveKit `UpdateParticipant` to narrow their `canPublishSources`, which **unpublishes an in-progress screenshare/camera server-side for all clients** and blocks re-publish (confirmed LiveKit 1.9.11 behavior: reducing `can_publish_sources` removes the offending live track). So flipping a room to audio-only kills existing cameras/screenshares within ~one interval. The loop learns each LiveKit room's Matrix id from tokens it issues, only ever **removes** forbidden sources (never grants), preserves every other permission flag (full-replace safety), and no-ops once compliant. Disable with `GUARD_RECONCILE=0`.
- **Fail-open:** any error (admin API down, bad/absent token, LiveKit unreachable, unparseable room id, unexpected JWT shape) returns the upstream response **unchanged**, so calls keep working even if enforcement is degraded. The limit check and the source-policy re-sign are **independent** (a LiveKit-admin outage during the limit count can't skip the source restriction, and vice-versa). Before re-signing, the guard **verifies its own secret actually signed the token** — on a `LIVEKIT_SECRET` mismatch it skips the restriction and passes the original token through (so a secret drift can never emit a token the SFU rejects). A room with no policy set takes a zero-overhead fast path (token untouched).
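
The participant-limit decision and the source narrowing described above reduce to two small pure functions. The sketch below is illustrative and is not taken from `voice-limit-guard.py`. The function names, the `ALL_SOURCES` default used when a token carries no `canPublishSources`, and the treatment of a missing or non-positive `max_users` as "no limit" are all assumptions.

```python
from typing import Iterable, List, Optional

SCREEN_SOURCES = {"screen_share", "screen_share_audio"}
# Assumption: an absent source list means "everything allowed", so narrowing
# has to start from the full set.
ALL_SOURCES = ["camera", "microphone", "screen_share", "screen_share_audio"]


def allow_join(requester: str, present_users: Iterable[str], max_users: Optional[int]) -> bool:
    """True to pass the token through, False to answer 403."""
    if not max_users or max_users <= 0:
        return True  # assumption: no usable limit means unlimited
    distinct = set(present_users)  # several devices of one user count once
    if requester in distinct:
        return True  # rejoin
    return len(distinct) < max_users


def narrow_sources(
    current: Optional[Iterable[str]],
    allow_screenshare: bool = True,
    allow_camera: bool = True,
) -> List[str]:
    """Drop forbidden sources; never add any, and never drop the microphone."""
    forbidden = set()
    if not allow_screenshare:
        forbidden |= SCREEN_SOURCES
    if not allow_camera:
        forbidden.add("camera")
    sources = list(current) if current else list(ALL_SOURCES)
    return [s for s in sources if s not in forbidden]


if __name__ == "__main__":
    print(allow_join("@c:example.org", ["@a:example.org", "@b:example.org"], 2))
    print(narrow_sources(None, allow_screenshare=False))
```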
> **Security note:** `LIVEKIT_KEY`/`LIVEKIT_SECRET` are currently hardcoded in `systemd/voice-limit-guard.service` (pre-existing). Since this secret now also signs re-issued join tokens, it should be moved into `/etc/matrix-deploy.env` (already an `EnvironmentFile` on LXC 151) and the exposed value rotated. Not changed automatically to avoid a deploy breaking before the env file carries it.
Pure logic (limit decision, source narrowing, JWT re-sign/verify roundtrip, tamper detection) is unit-tested in `livekit/test_voice_limit_guard.py` (`python3 -m unittest livekit.test_voice_limit_guard`).
**Setting policy:** room admins use Lotus Chat → Room Settings → General → **Voice** (Call Permissions switches + Quality Caps). Any tool that can send room state works too:
```bash
# max 5 participants; send {} to remove the limit
curl -X PUT -H "Authorization: Bearer <admin_token>" -H "Content-Type: application/json" \
"https://matrix.lotusguild.org/_matrix/client/v3/rooms/<roomId>/state/io.lotus.voice_limit/" \
-d '{"max_users": 5}'
# forbid screenshare + make it audio-only (hard, all clients); numeric caps are
# Lotus-client-cooperative hints in the same event
curl -X PUT -H "Authorization: Bearer <admin_token>" -H "Content-Type: application/json" \
"https://matrix.lotusguild.org/_matrix/client/v3/rooms/<roomId>/state/io.lotus.room_quality/" \
-d '{"allow_screenshare": false, "allow_camera": false, "audio_max_kbps": 32}'
```
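
The same state writes can be scripted. The sketch below builds (but does not send) the PUT request with Python's standard library; `build_state_request` is a hypothetical helper, and the room id and token in the usage example are placeholders. One detail that is easy to miss when moving from `curl` to code: the room id starts with `!` and must be percent-encoded in the URL path.

```python
import json
import urllib.parse
import urllib.request

HOMESERVER = "https://matrix.lotusguild.org"


def build_state_request(room_id, event_type, content, token, state_key=""):
    """Build a PUT request for a room state event (empty state key by default)."""
    path = "/_matrix/client/v3/rooms/{}/state/{}/{}".format(
        urllib.parse.quote(room_id, safe=""),
        urllib.parse.quote(event_type, safe=""),
        urllib.parse.quote(state_key, safe=""),
    )
    return urllib.request.Request(
        HOMESERVER + path,
        data=json.dumps(content).encode(),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder room id and token; substitute real values before sending.
    req = build_state_request("!room:example.org", "io.lotus.voice_limit", {"max_users": 5}, "<admin_token>")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```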
**Config:** the guard reads `MATRIX_TOKEN` (server-admin) from `/etc/matrix-deploy.env`; LiveKit key/secret + ports are set in `systemd/voice-limit-guard.service`.
**Deploy:** auto-deploys on push (LXC 151 handler `py_compile`-validates then restarts the guard). Manual (re)deploy / first-time setup:
```bash
# On LXC 151
install -D -m644 /opt/matrix-config/livekit/voice-limit-guard.py /opt/voice-limit-guard/voice-limit-guard.py
install -m644 /opt/matrix-config/systemd/voice-limit-guard.service /etc/systemd/system/voice-limit-guard.service
# one-time: rebind lk-jwt-service to :8071
mkdir -p /etc/systemd/system/lk-jwt-service.service.d
printf '[Service]\nEnvironment=LIVEKIT_JWT_BIND=:8071\n' > /etc/systemd/system/lk-jwt-service.service.d/override.conf
systemctl daemon-reload && systemctl restart lk-jwt-service && systemctl enable --now voice-limit-guard
```
**To fully revert** (back to lk-jwt-service directly on `:8070`): `systemctl disable --now voice-limit-guard`, remove the drop-in, `daemon-reload`, `systemctl restart lk-jwt-service`.
---
## Access Token Rotation
The `MATRIX_TOKEN` in `/etc/matrix-deploy.env` on LXC 151 is a user access token for Jared's account, used to push hookshot transforms to Matrix room state (requires power level ≥ 50 in Spam and Stuff).
The token in `draupnir/production.yaml` in this repo is **intentionally redacted** (`accessToken: REDACTED`). The deploy script on LXC 110 extracts the live token from the running config before overwriting from the repo, then restores it.
**To rotate the hookshot deploy token (LXC 151):**
1. Generate a new token via Synapse admin API or Cinny → Settings → Security → Manage Sessions
2. SSH to LXC 151 (via `ssh root@10.10.10.4` then `pct enter 151`): `nano /etc/matrix-deploy.env`
3. Replace `MATRIX_TOKEN=<old>` with new token
4. Test: `MATRIX_TOKEN=<new> MATRIX_SERVER=https://matrix.lotusguild.org bash /opt/matrix-config/hookshot/deploy.sh`
**To rotate the Draupnir token:**
1. Generate new token for `@draupnir:matrix.lotusguild.org`
2. On LXC 110: `nano /opt/draupnir/config/production.yaml` → update `accessToken`
3. `systemctl restart draupnir`
4. Do **not** commit the token to git — the repo version stays redacted
---
## Port Maps
**Router → 10.10.10.29 (forwarded):**
- TCP+UDP 3478 — TURN/STUN
- TCP+UDP 5349 — TURNS/TLS
- TCP 7881 — LiveKit ICE TCP fallback
- TCP+UDP 49152-65535 — TURN relay range
**Internal port map (LXC 151):**
| Port | Service | Bind |
|------|---------|------|
| 8008 | Synapse HTTP | 0.0.0.0 |
| 9000 | Synapse metrics | 127.0.0.1 + 10.10.10.29 |
| 9001 | Hookshot widgets | 0.0.0.0 |
| 9002 | Hookshot bridge (appservice) | 127.0.0.1 |
| 9003 | Hookshot webhooks | 0.0.0.0 |
| 9004 | Hookshot metrics | 0.0.0.0 |
| 9100 | node_exporter | 0.0.0.0 |
| 9101 | matrix-admin exporter | 0.0.0.0 |
| 9500 | webhook (auto-deploy) | 0.0.0.0 |
| 6789 | LiveKit metrics | 0.0.0.0 |
| 7880 | LiveKit HTTP | 0.0.0.0 |
| 7881 | LiveKit RTC TCP | 0.0.0.0 |
| 8070 | voice-limit-guard (fronts lk-jwt-service) | 0.0.0.0 |
| 8071 | lk-jwt-service (behind guard) | 0.0.0.0 |
| 8080 | synapse-admin (nginx) | 0.0.0.0 |
| 3478 | coturn STUN/TURN | 0.0.0.0 |
| 5349 | coturn TURNS/TLS | 0.0.0.0 |
**Internal port map (LXC 110 — Draupnir):**
| Port | Service | Bind |
|------|---------|------|
| 8080 | Draupnir web (abuse reporting) | 0.0.0.0 |
| 8081 | Draupnir healthz | 0.0.0.0 |
| 9000 | webhook (auto-deploy) | 0.0.0.0 |
| 9100 | node_exporter | 0.0.0.0 |
| 9256 | process_exporter | 0.0.0.0 |
**Internal port map (LXC 109 — PostgreSQL):**
| Port | Service | Bind |
|------|---------|------|
| 5432 | PostgreSQL | 0.0.0.0 (hba-restricted to 10.10.10.29) |
| 9100 | node_exporter | 0.0.0.0 |
| 9187 | postgres_exporter | 0.0.0.0 |
---
## Rooms (all v12)
| Room | Room ID | Join Rule |
|------|---------|-----------|
| The Lotus Guild (Space) | `!-1ZBnAH-JiCOV8MGSKN77zDGTuI3pgSdy8Unu_DrDyc` | public |
| General | `!wfokQ1-pE896scu_AOcCBA2s3L4qFo-PTBAFTd0WMI0` | public |
| Commands | `!ou56mVZQ8ZB7AhDYPmBV5_BR28WMZ4x5zwZkPCqjq1s` | restricted (Space members) |
| Memes | `!GK6v5cLEEnowIooQJv5jECfISUjADjt8aKhWv9VbG5U` | restricted (Space members) |
| Music | `!ktQu0gavhjpCMkgxk8SYdb6mnJRY-u7mY7_KfksV0SU` | restricted (Space members) |
| Voice Room | `!ARbRFSPNp2U0MslWTBGoTT3gbmJJ25dPRL6enQntvPo` | restricted (Space members) |
| Management | `!mEvR5fe3jMmzwd-FwNygD72OY_yu8H3UP_N-57oK7MI` | invite |
| Cool Kids | `!R7DT3QZHG9P8QQvX6zsZYxjkKgmUucxDz_n31qNrC94` | invite |
| Spam and Stuff | `!GttT4QYd1wlGlkHU3qTmq_P3gbyYKKeSSN6R7TPcJHg` | invite, **no E2EE** (hookshot) |
**Power level roles (Cinny tags):**
- 100: Owner (jared, draupnir, lotusbot)
- 50: The Nerdy Council / Panel of Geeks (enhuynh, lonely)
- 0: Member
---
## Webhook Integrations (matrix-hookshot 7.3.2)
Generic webhooks bridged into **Spam and Stuff**.
Each service gets its own virtual user (`@hookshot_<service>`) with a unique avatar.
Webhook URL format: `https://matrix.lotusguild.org/webhook/<uuid>`
| Service | Webhook UUID | Notes |
|---------|-------------|-------|
| Grafana | `df4a1302-2d62-4a01-b858-fb56f4d3781a` | Unified alerting contact point |
| Proxmox | `9b3eafe5-7689-4011-addd-c466e524661d` | Notification system (8.1+), Discord embed format |
| Sonarr | `aeffc311-0686-42cb-9eeb-6757140c072e` | All event types |
| Radarr | `34913454-c1ac-4cda-82ea-924d4a9e60eb` | All event types |
| Readarr | `e57ab4f3-56e6-4dc4-8b30-2f4fd4bbeb0b` | All event types |
| Lidarr | `66ac6fdd-69f6-4f47-bb00-b7f6d84d7c1c` | All event types |
| Uptime Kuma | `1a02e890-bb25-42f1-99fe-bba6a19f1811` | Status change notifications |
| Seerr | `555185af-90a1-42ff-aed5-c344e11955cf` | Request/approval events |
| Owncast (Livestream) | `9993e911-c68b-4271-a178-c2d65ca88499` | STREAM_STARTED / STREAM_STOPPED |
| Bazarr | `470fb267-3436-4dd3-a70c-e6e8db1721be` | Subtitle events (Apprise JSON notifier) |
| Tinker-Tickets | `6e306faf-8eea-4ba5-83ef-bf8f421f929e` | Custom transformation code |
**Hookshot notes:**
- Spam and Stuff is intentionally **unencrypted** — hookshot bridges cannot join E2EE rooms
- JS transformation functions use hookshot v2 API: `result = { version: "v2", plain, html, msgtype }`
- The `result` variable must be assigned without `var`/`let`/`const` (QuickJS IIFE sandbox)
- NPM proxies `https://matrix.lotusguild.org/webhook/*` → `http://10.10.10.29:9003`
- NPM proxies `/sfu/get` and `/get_token` → `http://10.10.10.29:8070` (voice-limit-guard, which fronts lk-jwt-service). Both paths are in `/data/nginx/proxy_host/49.conf` on LXC 139. **NPM will overwrite these if proxy host 49 is re-saved via the UI; re-add both location blocks after any NPM save**
- Proxmox sends Discord embed format: `data.embeds[0].{title,description,fields}` — NOT flat fields
- Transform functions are stored as Matrix room state (`uk.half-shot.matrix-hookshot.generic.hook`) and deployed via `hookshot/deploy.sh`
**Deploying hookshot transforms manually:**
```bash
# On LXC 151 or from any machine with access
export MATRIX_TOKEN=<jared_token>
export MATRIX_SERVER=https://matrix.lotusguild.org
export MATRIX_ROOM='!GttT4QYd1wlGlkHU3qTmq_P3gbyYKKeSSN6R7TPcJHg'
bash /opt/matrix-config/hookshot/deploy.sh # deploy all
bash /opt/matrix-config/hookshot/deploy.sh proxmox.js # deploy one
```
---
## Moderation (Draupnir v2.9.0)
Draupnir runs on LXC 110, manages moderation across all protected rooms (including the Lotus Guild space) via `#management:matrix.lotusguild.org`.
**Subscribed ban lists:**
- `#community-moderation-effort-bl:neko.dev` — 12,599 banned users, 245 servers, 59 rooms
- `#matrix-org-coc-bl:matrix.org` — 4,589 banned users, 220 servers, 2 rooms
**Common commands (send in management room):**
```
!draupnir status — current status + protected rooms
!draupnir ban @user:server * "reason" — ban from all protected rooms
!draupnir redact @user:server — redact their recent messages
!draupnir rooms add !roomid:server — add a room to protection
!draupnir watch <alias> --no-confirm — subscribe to a ban list
```
### Abuse Reporting
When a Matrix client user clicks "Report" on a message, Synapse receives a `POST /_matrix/client/v3/rooms/{roomId}/report/{eventId}` request and stores the report internally. To forward these to the Draupnir management room, a Synapse Python module must be installed on LXC 151.
**Draupnir web server** is enabled (port 8080). The endpoint is:
```
POST http://10.10.10.24:8080/_matrix/draupnir/1/report/{roomId}/{eventId}
```
**To complete Synapse integration (one-time, on LXC 151):**
1. Install the module: `pip install matrix-synapse-draupnir-abuse-reports` (or equivalent — check Draupnir releases)
2. Add to `/etc/matrix-synapse/homeserver.yaml`:
```yaml
modules:
  - module: "draupnir.abuse_reports.AbuseReportEndpoint"
    config:
      draupnir_endpoint: "http://10.10.10.24:8080"
```
3. `systemctl restart matrix-synapse`
> Until the Synapse module is installed, abuse reports are stored in Synapse's DB but do NOT appear in the management room. The Draupnir web server is running and ready to receive forwarded reports.
---
## Lotus Cinny (chat.lotusguild.org)
`chat.lotusguild.org` serves a custom Lotus Guild fork of the official `cinnyapp/cinny` main branch. The fork lives at `code.lotusguild.org/LotusGuild/cinny` and tracks upstream via a `git remote add upstream https://github.com/cinnyapp/cinny.git` workflow.
**Upstream monitoring (daily at noon):**
- `cinny-upstream-check.sh` hits the GitHub API and compares the latest `cinnyapp/cinny` main commit against the stored SHA in `/var/lib/cinny-monitor/last-upstream-commit`
- If new commits exist, sends a Matrix message to Spam and Stuff with an `@jared:matrix.lotusguild.org` ping and a link to the commit
- Does **not** auto-build — you review the diff and decide when to merge
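
The compare-against-stored-SHA logic can be sketched as below. The real check is the shell script `cinny-upstream-check.sh`; this Python version is an illustration only. The function names are hypothetical, and seeding the state file silently on the first run (no notification) is an assumption about the desired behavior.

```python
import json
import urllib.request
from pathlib import Path
from typing import Optional

API_URL = "https://api.github.com/repos/cinnyapp/cinny/commits/main"


def fetch_latest_sha(url: str = API_URL) -> str:
    """Ask the GitHub API for the commit currently at the tip of main."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["sha"]


def check_for_update(latest_sha: str, state_file: Path) -> Optional[str]:
    """Record latest_sha; return the previous SHA if upstream moved, else None."""
    previous = state_file.read_text().strip() if state_file.exists() else ""
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(latest_sha + "\n")
    if previous and previous != latest_sha:
        return previous
    return None  # first run (assumed silent) or nothing new


def compare_url(old_sha: str, new_sha: str) -> str:
    """Link to the diff a reviewer would read before merging."""
    return f"https://github.com/cinnyapp/cinny/compare/{old_sha}...{new_sha}"


if __name__ == "__main__":
    state = Path("/var/lib/cinny-monitor/last-upstream-commit")
    latest = fetch_latest_sha()
    old = check_for_update(latest, state)
    if old:
        print("New upstream commits:", compare_url(old, latest))
```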
**Merge + build workflow:**
1. Receive upstream notification in Matrix
2. Review the diff: `https://github.com/cinnyapp/cinny/compare/<old>...<new>`
3. Send `!cinny-update` in any Matrix room — LotusBot POSTs to the cinny-build webhook on LXC 106
4. `cinny-build.sh` runs: `git fetch upstream && git merge upstream/main`, `npm ci`, `npm run build`, deploys to `/var/www/html/`
5. Build result (success or conflict) is posted back to Matrix
**Manual build (SSH):**
```bash
# On LXC 106
/usr/local/bin/cinny-build.sh
```
**Merge conflict recovery:**
```bash
# On LXC 106
cd /opt/lotus-cinny
git merge upstream/main # resolve conflicts in editor
git add -A && git merge --continue
/usr/local/bin/cinny-build.sh
```
**LXC 106 one-time setup** (after forking `cinnyapp/cinny` to `code.lotusguild.org/LotusGuild/cinny`):
```bash
# On LXC 106
git clone https://code.lotusguild.org/LotusGuild/cinny.git /opt/lotus-cinny
cd /opt/lotus-cinny
git remote add upstream https://github.com/cinnyapp/cinny.git
git fetch upstream
# Create env file (fill in a valid Matrix token)
cat > /etc/cinny-monitor.env << 'EOF'
MATRIX_TOKEN=<jared_or_bot_token>
MATRIX_SERVER=https://matrix.lotusguild.org
MATRIX_ROOM=!GttT4QYd1wlGlkHU3qTmq_P3gbyYKKeSSN6R7TPcJHg
MATRIX_PING_USER=@jared:matrix.lotusguild.org
EOF
chmod 600 /etc/cinny-monitor.env
```
**Cinny-build webhook token** (for LotusBot `!cinny-update`): stored in `deploy/hooks-lxc106.json` (`cinny-build` hook, header `X-Build-Token`). LotusBot must POST to `http://10.10.10.6:9000/hooks/cinny-build` with this header.
**Why 8GB RAM:** Vite's build process needs ~6GB Node heap (`--max_old_space_size=6144`) for the rendering-chunks phase. Previously at 4GB — OOM killed during render.
### 🔱 Element Call fork — "Lotus Call" (true ownership) — LIVE
We **self-build** Element Call from a fork (`LotusGuild/element-call`) and publish
it to our private Gitea npm registry as `@lotusguild/element-call-embedded`
(`0.20.1-lotus.1`); cinny consumes that instead of the upstream
`@element-hq/element-call-embedded` bundle. In-call behavior is now editable
source, not just widget-API + DOM steering. This is AGPL (same license).
**Shipped via the fork:** in-source denoise (a LiveKit `TrackProcessor` that
survives reconnects), in-call speaking/mute events, focus-a-participant during
screenshare, avatar decorations on EC video tiles, native transparent background.
**Built but dormant (need cinny UI):** call-audio injection
(`io.lotus.inject_audio`, unblocks a real in-call soundboard) and quality controls
(`io.lotus.set_quality`).
Infra notes for THIS repo:
- EC talks to our **LiveKit SFU** (`livekit/`, LXC 151) + `lk-jwt-service`; the
fork's runtime `config.json` points at `matrix.lotusguild.org` + our LiveKit.
The cinny EC `config.json` lives in `cinny/config.json` here.
- **Build/deploy:** the fork builds in the cinny pipeline (its `dist/` is bundled
into the cinny build that LXC 106 serves) — no separate EC LXC. A future quality
controls feature (P5-31) would add a `voice-limit-guard`-style sidecar on LXC 151.
**Full handoff & step-by-step plan:** `LotusGuild/cinny` →
[`HANDOFF_ELEMENT_CALL_FORK.md`](https://code.lotusguild.org/LotusGuild/cinny/src/branch/lotus/HANDOFF_ELEMENT_CALL_FORK.md).
### Custom Features
All custom code lives in `src/app/` on the `lotus` branch of `code.lotusguild.org/LotusGuild/cinny`. Changes survive upstream merges as long as they don't conflict with the same files upstream touched.
| Feature | Files | Notes |
|---------|-------|-------|
| **Element Call embed** | `src/app/plugins/call/`, `src/app/hooks/useCallEmbed.ts`, `src/app/components/CallEmbedProvider.tsx` | 🔱 **[EC-FORK] LIVE** — self-built fork `@lotusguild/element-call-embedded@0.20.1-lotus.1` (source `LotusGuild/element-call`), bundled into the cinny build, served same-origin. Steered via `matrix-widget-api` + custom `io.lotus.*` actions (call_state, focus_participant, decorations, inject_audio, set_quality) — DOM-poking retained only as fallback. See `LotusGuild/cinny` → `HANDOFF_ELEMENT_CALL_FORK.md` |
| **DM calls** | `src/app/features/room/Room.tsx`, `src/app/features/room/RoomViewHeader.tsx` | Phone button in DM room header; `useCallStart(true)` passes `intent: StartedByUser`; Room.tsx switches to CallView layout when DM has active call |
| **Picture-in-picture call** | `src/app/components/CallEmbedProvider.tsx` | When navigating away from the call room, the embed shrinks to a 280×158px PiP in the bottom-right. Click navigates back. Implemented via `useEffect` imperatively overriding styles on `callEmbedRef.current` — cannot use a wrapper div because `useCallEmbedPlacementSync` writes `top/left/width/height` directly onto that element |
| **Screenshare fullscreen** | `src/app/features/call/CallControls.tsx`, `src/app/features/call/Controls.tsx` | When screensharing, a fullscreen button appears in call controls. Calls `callEmbedRef.current?.requestFullscreen()` on the Cinny call container. EC naturally spotlights the screenshare — the old 600ms grid-revert code was removed (it caused fullscreen to show avatars instead of the screen) |
| **PiP screenshare focus** | `src/app/components/CallEmbedProvider.tsx`, `src/app/plugins/call/CallControl.ts` | When the floating PiP window is active and screenshare is detected (no cameras present), auto-enables EC spotlight view so the screenshare fills the PiP rather than showing avatar tiles |
| **Screenshare audio mute** | `src/app/features/call/Controls.tsx`, `src/app/features/call/CallControls.tsx`, `src/app/plugins/call/CallControl.ts` | Dedicated button to independently mute/unmute audio from screenshares without muting microphone audio. Targets `audio[data-lk-source="screen_share_audio"]` LiveKit elements. Persists across deafen/undeafen cycles |
| **In-call soundboard (P5-15)** | `src/app/features/call/CallSoundboard.tsx`, `src/app/hooks/useSoundboard.ts`, `src/app/utils/soundboardClips.ts`, `CallControl.ts#injectAudio` | Call-bar popout of user-uploaded clips. Playing one sends the fork's `io.lotus.inject_audio` (armed via `lotusAudioInject=1`) so it publishes as a real LiveKit track heard by all, plus local playback. Clips are uploadable like emoji/sticker packs — stored in `io.lotus.soundboard` account data (synced across devices); host resolves mxc → authed download → `blob:` URL for the widget. Gated by `soundboardEnabled` setting |
| **Call quality controls (P5-31)** | `src/app/utils/callQuality.ts`, `src/app/hooks/useCallQuality.ts`, `CallControl.ts#setQuality` | Per-user mic/screenshare bitrate + screenshare framerate (Settings → Calls), applied via the fork's `io.lotus.set_quality`, clamped to any room cap (`min(user, room)`). **Client-cooperative** (numeric caps aren't SFU-enforceable). Unit-tested |
| **Room call permissions (P5-31)** | `src/app/features/common-settings/general/RoomQuality.tsx`, `types/matrix/room.ts` (`LotusRoomQuality`), `CallControls.tsx` | Admin switches in Room Settings → Voice write `io.lotus.room_quality` `allow_screenshare`/`allow_camera`; the call bar hides blocked buttons. **Hard-enforced server-side for all clients** by `voice-limit-guard` (this repo) — see [Voice Channel Limits & Call Permissions](#voice-channel-limits--call-permissions) |
| **Custom status message** | `src/app/features/settings/account/Profile.tsx`, `src/app/features/room/MembersDrawer.tsx`, `src/app/components/user-profile/UserHero.tsx`, `src/app/components/user-profile/UserRoomProfile.tsx`, `src/app/hooks/useUserPresence.ts` | Discord-style free-form status text. Set via Settings → Account → "Status Message" with an emoji picker (lazy-loaded `EmojiBoard`). Saved via `mx.setPresence({ status_msg })`. Displayed below the username in the members drawer and user profile popout. Syncs live via Matrix presence events |
| **PTT (Push-to-Talk)** | `src/app/features/call/CallControls.tsx`, `src/app/state/settings.ts` | Hold-to-talk key (default: Space, configurable). Mutes mic on join; holds mic open while key is held. Badge shows `PTT — Hold SPACE` / `● Live`. Listens on both main window and EC iframe `contentWindow` for key events |
| **PTT badge theming** | `src/app/features/call/CallControls.tsx` | Plain folds `Chip` by default; neon terminal style (`#00FF88`/`#FF6B00`, JetBrains Mono) when `lotusTerminal` setting is on |
| **GIF picker** | `src/app/components/GifPicker.tsx`, `src/app/features/room/RoomInput.tsx` | Giphy JS/React SDK (`@giphy/react-components`, `@giphy/js-fetch-api`, `styled-components`). API key in `config.json` → `gifApiKey`. GIF button appears next to Send only when `gifApiKey` is set. Sends GIF as `m.image` (fetches blob → `mx.uploadContent` → `mx.sendMessage`). `FocusTrap` handles click-outside / Escape to close |
| **GIF picker terminal theme** | `src/app/components/GifPicker.tsx` | When `lotusTerminal` is on: dark navy background (`#060c14`), orange dim border, 4px radius, `// GIF_SEARCH` header, injected `<style>` overrides Giphy SDK SearchBar input (dark bg, orange border/focus ring, JetBrains Mono), custom orange scrollbar |
| **Terminal Design System toggle** | `src/app/state/settings.ts`, `src/app/features/settings/` | `lotusTerminal` boolean setting. When enabled: PTT badge, GIF picker, and voice message recorder use LotusGuild Terminal Design System aesthetics (green #00FF88 / orange #FF6B00, JetBrains Mono) |
| **Presence status badges** | `src/app/features/room/MembersDrawer.tsx`, `src/app/features/common-settings/members/Members.tsx`, `src/app/hooks/useUserPresence.ts`, `src/app/components/presence/` | Online/busy/away colored dot badges shown next to verification shields for every member in the room members drawer and settings members panel. Uses `useUserPresence(userId)` hook + `PresenceBadge` component. Members.tsx wraps the hook in a `MemberPresenceBadge` child component to satisfy React hook rules inside `.map()` |
| **Discord-style presence tracking** | `src/app/hooks/usePresenceUpdater.ts`, `src/app/pages/client/ClientNonUIFeatures.tsx` | Broadcasts `online` on startup, `unavailable` after 10 min idle or tab hidden, `offline` on page close (fetch+keepalive). Activity throttled to 1 event/sec. `hidePresence` setting broadcasts offline and disables all tracking |
| **Per-member device sessions panel** | `src/app/components/user-profile/UserRoomProfile.tsx`, `src/app/hooks/useOtherUserDevices.ts` | Collapsible "Sessions" card in user profile popout. Lists all devices with colored shield icons (green=verified, yellow=unverified, loading/error states). Per-device "Verify" button initiates cross-signing SAS emoji verification. Updates live via `CryptoEvent.DevicesUpdated`. Only shown when cross-signing is active |
| **Privacy settings** | `src/app/features/settings/general/General.tsx`, `src/app/state/settings.ts` | Dedicated Privacy section in General settings. `hideActivity` suppresses typing indicators and read receipts. `hidePresence` appears offline to everyone |
| **Encrypted room search** | `src/app/features/message-search/useLocalMessageSearch.ts`, `src/app/features/message-search/MessageSearch.tsx` | Searches locally cached decrypted events in E2EE rooms alongside server-side search. Per-room "Load more" buttons paginate 100 msgs at a time; shows oldest cached date and X/Y coverage counter. Sender-aware (respects `from:@user` filter) |
| **Message search: sender filter** | `src/app/features/message-search/SearchInput.tsx`, `src/app/features/message-search/SearchFilters.tsx` | Type `from:@user` in the search box for live autocomplete of known users (homeserver-biased ranking). Selected senders shown as removable chips. Works for both server search and local encrypted search |
| **Message search: date range** | `src/app/features/message-search/SearchFilters.tsx`, `src/app/features/message-search/useMessageSearch.ts` | From/To date pickers in the filter bar. Passed as `from_ts`/`to_ts` epoch ms to Matrix `/search` |
| **Document title unread count** | `src/app/pages/client/ClientNonUIFeatures.tsx` | Tab title updates to `(N) Lotus Chat` for mentions, `· Lotus Chat` for unreads, `Lotus Chat` when clear |
| **Message draft persistence** | `src/app/features/room/RoomInput.tsx` | Unsent messages survive page reload via `localStorage` (`draft-msg-<roomId>`). Jotai in-memory atom remains the primary store; localStorage used as fallback on reload. Cleared on send |
| **PiP position persistence + snap** | `src/app/components/CallEmbedProvider.tsx` | PiP position saved to `localStorage` on drag end; restored on next PiP enter (clamped to viewport). Double-click snaps to nearest corner with 180ms CSS transition |
| **Threads (P3-8 + P4-1)** | `src/app/features/room/thread/`, `state/room/thread.ts`, `utils/threadNotifications.ts`, `hooks/useRoomsListener.ts` | Full m.thread support: side panel (own composer, per-thread drafts), "N replies" unread chips, threaded receipts; SDK `threadSupport` on, markAsRead unthreaded; replies no longer render inline. **Slack-style notifications**: default = participating-only, per-thread All/Mentions/Mute in `io.lotus.thread_notifications` account data; muted threads subtracted from room badges client-side |
| **KaTeX math + encrypted-search cache + session hardening + crypto diagnostics** | `utils/{mathParse,searchCache,cryptoDiagLog}.ts`, `state/sessions.ts`, `LOTUS_E2EE_INVESTIGATION.md` | July 2026 batch: `$…$`/`$$…$$` + `data-mx-maths` via lazy KaTeX; opt-in IndexedDB search index for E2EE rooms (wiped on logout); atomic `cinny_session_v1` blob + cross-tab logout sync; KE-1→4 diagnostics capture card in Developer Tools |
| **Desktop app (Tauri)** | `cinny-desktop` → `src-tauri/src/native/*.rs`, `src-tauri/src/lib.rs`; cinny `src/app/hooks/useTauri*.ts`, `src/app/components/TauriDesktopFeatures.tsx` | Tauri v2 native shell: rich WinRT toast notifications (click → open room, inline quick reply), Windows Focus Assist → DND sync, taskbar Jump List of recent rooms, taskbar thumbnail + volume-flyout call controls (mute/deafen/end), no-sleep during calls, network-change awareness (`mx.retryImmediately`), opt-in TDS window chrome, recursive folder drag-drop, auto-update toast. Windows-native pieces compile in CI (Gitea `windows` runner + GitHub `windows-latest`); detail in cinny `LOTUS_FEATURES.md` → Desktop App Features |
| **LiveKit codec config** | `/etc/livekit/config.yaml` (LXC 151) | `enabled_codecs`: VP8, H264, VP9, Opus, RED for better quality and redundancy |
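The `min(user, room)` clamp described in the **Call quality controls (P5-31)** row can be illustrated with a short sketch. It is written in Python purely for illustration (the client code itself is TypeScript). The function name, the use of `None` for "no value", and the kbps figures are assumptions, not the fork's actual implementation.

```python
from typing import Optional


def effective_cap(user_value: Optional[int], room_cap: Optional[int]) -> Optional[int]:
    """Return the value a cooperative client would apply.

    None means "no preference" for the user or "no cap" for the room
    (both conventions are assumptions made for this sketch).
    """
    if user_value is None:
        return room_cap
    if room_cap is None:
        return user_value
    # The room cap always wins when it is lower than the user's choice.
    return min(user_value, room_cap)


# Hypothetical screenshare bitrates in kbps.
print(effective_cap(3000, 1500))
print(effective_cap(1000, 1500))
```

Because the clamp runs in the client, a modified client could ignore it, which is why the table calls these caps client-cooperative.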
**Key config values (`/opt/lotus-cinny/config.json`, root — vite copies this to dist):**
```json
{
"defaultHomeserver": 0,
"homeserverList": ["matrix.lotusguild.org"],
"allowCustomHomeservers": false,
"gifApiKey": "AqqDuQwZNjYttz7Mn6ME4JH1bJIuZ5CO"
}
```
> Note: The root `/opt/lotus-cinny/config.json` is what matters — vite copies it to `dist/`. `public/config.json` is not used.
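A quick sanity check before a build can catch a malformed config. The sketch below is an assumption about what is worth checking, not part of the repo: it treats `defaultHomeserver` as an index into `homeserverList` and flags a missing `gifApiKey`, since the GIF button only appears when that key is set. The function name and the placeholder key are hypothetical.

```python
import json


def check_cinny_config(raw: str) -> list:
    """Return a list of human-readable problems found in a config.json string."""
    cfg = json.loads(raw)
    problems = []
    servers = cfg.get("homeserverList", [])
    index = cfg.get("defaultHomeserver", 0)
    if not servers:
        problems.append("homeserverList is empty")
    elif not (isinstance(index, int) and 0 <= index < len(servers)):
        problems.append("defaultHomeserver does not index into homeserverList")
    if not cfg.get("gifApiKey"):
        problems.append("gifApiKey is unset, so the GIF button stays hidden")
    return problems


sample = json.dumps({
    "defaultHomeserver": 0,
    "homeserverList": ["matrix.lotusguild.org"],
    "allowCustomHomeservers": False,
    "gifApiKey": "PLACEHOLDER_KEY",  # hypothetical value, not a real key
})
print(check_cinny_config(sample))
```

To check the deployed file, read `/opt/lotus-cinny/config.json` and pass its contents to `check_cinny_config`.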
---
## Known Issues
### LiveKit Port Conflict After HA Migration
LXC 151 can migrate between Proxmox nodes via HA. After a migration, a stale `livekit-server` process carried over from the source node can still be holding port 7881 when the service starts on the destination. Fixed in `livekit-server.service` via:
```ini
ExecStartPre=-/bin/bash -c 'pkill -x livekit-server; sleep 1'
KillMode=control-group
```
### coturn TLS Reset Errors
Periodic `TLS/TCP socket error: Connection reset by peer` in coturn logs. Normal — clients probe TURN and drop once they establish a direct P2P path.
### BBR Congestion Control
`net.ipv4.tcp_congestion_control = bbr` must be set on the Proxmox host, not inside an unprivileged LXC. All other sysctl tuning (TCP/UDP buffers, fin_timeout) is applied inside LXC 151.
---
## Server Checklist
### Quality of Life
- [x] **Upgrade Synapse to v1.155.0** — Done 2026-06-18. LXC 151 was already on Debian 13 Trixie; no OS migration needed.
- [x] Migrate from SQLite to PostgreSQL
- [x] TURN/STUN server (coturn) for reliable voice/video
- [x] URL previews
- [x] Upload size limit 200MB
- [x] Full-text message search (PostgreSQL backend)
- [x] Media retention policy (remote: 1yr, local: 3yr)
- [x] Sliding sync (native Synapse)
- [x] LiveKit for Element Call video rooms
- [x] Default room version v12, all rooms upgraded
- [x] Landing page with client recommendations
- [x] Synapse metrics endpoint (port 9000, Prometheus-compatible)
- [x] Lotus Cinny fork — custom fork of `cinnyapp/cinny` main, daily upstream check + Matrix notification
- [x] Auto-deployment via Gitea webhooks (all 4 LXCs)
- [ ] Push notifications gateway (Sygnal) — needs Apple/Google developer credentials
- [ ] Lotus Cinny custom branding — Lotus Guild theme (colours, title, favicon, PWA name)
### Performance Tuning
- [x] PostgreSQL `shared_buffers` → 1500MB, `effective_cache_size`, `work_mem`, checkpoint tuning
- [x] PostgreSQL `pg_stat_statements` extension installed
- [x] PostgreSQL autovacuum tuned per-table (5 high-churn tables), `autovacuum_max_workers` → 5
- [x] Synapse `event_cache_size` → 30K, per-cache factors tuned
- [x] sysctl TCP/UDP buffer alignment on LXC 151 (`/etc/sysctl.d/99-matrix-tuning.conf`)
- [x] LiveKit: `empty_timeout: 300`, `departure_timeout: 20`, `max_participants: 50`
- [x] LiveKit ICE port range expanded to 50000-51000
- [x] LiveKit TURN TTL reduced to 1h
- [x] LiveKit VP9/AV1 codecs enabled
- [x] TCP retransmit timeout lowered (`tcp_retries2=5`, `tcp_syn_retries=4`, `tcp_keepalive_probes=3`) — stalled outbound federation connections now fail in ~15-30s instead of ~15 min
- [x] Unreachable routes added for servers with asymmetric connectivity (can reach us but we can't reach their federation port) — prevents 90s TCP hangs from being added to lag; defined in `/etc/network/interfaces` post-up hooks and survive reboots (bark.lgbt ×2, parodia.dev, chat.ohaa.xyz, matrix.k8ekat.dev)
- [x] Stuck `device_lists_remote_resync` entries cleared for dead-server users (@dalite:bark.lgbt, @arndot:matrix.goch.social) — device list resync was firing every 30s
- [ ] BBR congestion control — must be applied on Proxmox host
### Auth & SSO
- [x] Token-based registration
- [x] SSO/OIDC via Authelia
- [x] `allow_existing_users: true` for linking accounts to SSO
- [x] Password auth alongside SSO
- [x] Terms of Service / consent enforcement — `require_at_registration: false`, `block_events_error` set; new users cannot send messages until they explicitly accept via `/_matrix/consent`; Synapse sends a Server Notice DM with the consent URL on first blocked send
### Webhooks & Integrations
- [x] matrix-hookshot 7.3.2 — 11 active webhook services
- [x] Per-service JS transformation functions (stored in git, auto-deployed)
- [x] Per-service virtual user avatars
- [x] NPM reverse proxy for `/webhook` path
### Room Structure
- [x] The Lotus Guild space with all core rooms
- [x] Correct power levels and join rules per room
- [x] Custom room avatars
- [x] Voice room visible to space members (`suggested: true`)
### Hardening
- [x] Rate limiting
- [x] E2EE on all rooms (except Spam and Stuff — intentional for hookshot)
- [x] coturn internal peer deny rules (blocks relay to RFC1918; `allowed-peer-ip` scoped to 10.10.10.29 only — LiveKit host)
- [x] coturn TCP relay disabled (`no-tcp-relay=true`) — UDP only, reduces internal network SSRF risk
- [x] coturn hardening: `stale-nonce=600`, `user-quota=100`, `total-quota=1000`, strong cipher list
- [x] `rc_joins` and `rc_invites` rate limits explicitly set in homeserver.yaml
- [x] `pg_hba.conf` locked down — remote access restricted to Synapse LXC only
- [x] Federation open with key verification
- [x] fail2ban on Synapse login endpoint (5 retries / 24h ban)
- [x] Synapse metrics port 9000 restricted to `127.0.0.1` + `10.10.10.29`
- [x] coturn cert auto-renewal — daily sync cron on compute-storage-01
- [x] `/.well-known/matrix/client` and `/server` live on lotusguild.org
- [x] `suppress_key_server_warning: true`
- [x] Automated database + media backups
- [x] Federation bad-actor blocking via Draupnir ban lists (17,000+ entries)
- [x] Webhook HMAC-SHA256 validation on all auto-deploy endpoints
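For the webhook HMAC-SHA256 item above, the check itself is small. This is a minimal sketch of that kind of validation, not the configuration used by the deploy hooks in this repo. It assumes the sender provides a hex-encoded HMAC-SHA256 of the raw request body (Gitea documents an `X-Gitea-Signature` header of this form); the function name and sample values are hypothetical.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, body: bytes, received_hex: str) -> bool:
    """Compare the received hex signature with one computed over the raw body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = received_hex.strip().lower()
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature were correct.
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))


secret = b"example-shared-secret"  # hypothetical secret
body = b'{"ref": "refs/heads/main"}'
good = hmac.new(secret, body, hashlib.sha256).hexdigest()
print(signature_is_valid(secret, body, good))
print(signature_is_valid(secret, body, "00" * 32))
```

The signature must be computed over the exact bytes received. Re-serialising the JSON before hashing would change the digest.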
### Monitoring
- [x] Grafana dashboard — `dashboard.lotusguild.org/d/matrix-synapse-dashboard` (140+ panels, Draupnir section added)
- [x] Prometheus scraping all Matrix services (Synapse, Hookshot, LiveKit, node_exporter, postgres, Draupnir)
- [x] 15 active alert rules across matrix-folder and infra-folder (includes Draupnir Down)
- [x] Uptime Kuma monitors: Synapse, LiveKit, PostgreSQL, Cinny, coturn, lk-jwt-service, Hookshot
- [x] Draupnir: node_exporter (9100), process_exporter (9256), healthz probe via blackbox (8081)
### Admin
- [x] Synapse admin API dashboard (synapse-admin at http://10.10.10.29:8080)
- [x] Draupnir moderation bot — LXC 110, v2.9.0, all rooms + space, 2 ban lists
- [ ] Lotus Cinny custom branding — fork live at code.lotusguild.org/LotusGuild/cinny
---
## Monitoring & Observability
### Prometheus Scrape Jobs
| Job | Target | Metrics |
|-----|--------|---------|
| `synapse` | `10.10.10.29:9000` | Full Synapse internals |
| `matrix-admin` | `10.10.10.29:9101` | DAU, MAU, room/user/media totals |
| `livekit` | `10.10.10.29:6789` | Rooms, participants, packets, latency |
| `hookshot` | `10.10.10.29:9004` | Connections, API calls/failures, Node.js runtime |
| `matrix-node` | `10.10.10.29:9100` | CPU, RAM, network, load average, disk |
| `postgres` | `10.10.10.44:9187` | pg_stat_database, connections, WAL, block I/O |
| `postgres-node` | `10.10.10.44:9100` | CPU, RAM, network, load average, disk |
| `draupnir-node` | `10.10.10.24:9100` | CPU, RAM, network, load average, disk |
| `draupnir-process` | `10.10.10.24:9256` | Process CPU/memory/threads/uptime (process_exporter) |
| `draupnir-healthz` | `10.10.10.24:8081/healthz` → `127.0.0.1:9115` | `probe_success` (1=healthy, 0=disconnected) via blackbox exporter |
> **Disk I/O:** All servers use Ceph-backed storage. Per-device disk I/O metrics are meaningless — use Network I/O panels to see actual storage traffic.
### Alert Rules
**Matrix folder:**
| Alert | Fires when | Severity |
|-------|-----------|----------|
| Synapse Down | `up{job="synapse"}` < 1 for 2m | critical |
| PostgreSQL Down | `pg_up` < 1 for 2m | critical |
| LiveKit Down | `up{job="livekit"}` < 1 for 2m | critical |
| Hookshot Down | `up{job="hookshot"}` < 1 for 2m | critical |
| Draupnir Down | `up{job="draupnir-node"}` < 0.5 for 2m | critical |
| PG Connection Saturation | connections > 80% of max for 5m | warning |
| Federation Queue Backing Up | pending PDUs > 100 for 10m | warning |
| Synapse High Memory | RSS > 2000MB for 10m | warning |
| Synapse High Response Time | p99 latency (excl. /sync) > 10s for 5m | warning |
| Synapse Event Processing Lag | any processor > 300s behind for 15m | warning |
| Synapse DB Query Latency High | p99 query time > 1s for 5m | warning |
**Infrastructure folder:**
| Alert | Fires when | Severity |
|-------|-----------|----------|
| Service Exporter Down | any `up == 0` for 3m | critical |
| Node High CPU Usage | CPU > 90% for 10m | warning |
| Node High Memory Usage | RAM > 90% for 10m | warning |
| Node Disk Space Low | available < 15% (excl. tmpfs/overlay) for 10m | warning |
> **`/sync` long-poll:** The Matrix `/sync` endpoint is a long-poll (clients hold it open ≤30s). It is excluded from the High Response Time alert to prevent false positives.
> **Synapse Event Processing Lag** alert fires when `synapse_event_processing_lag > 300s` for 15 consecutive minutes (threshold raised from 120s/5m to reduce noise from normal federation backoff cycling).
>
> Root cause: several federated servers (bark.lgbt, parodia.dev, etc.) have asymmetric connectivity — they can reach us but we cannot reach their federation ports. Each inbound transaction they send resets our backoff to 0, triggering a new outbound connection attempt that hangs for ~90s (TCP `User timeout`). This causes the lag metric to spike. Mitigations in place:
> 1. `tcp_retries2=5` in `/etc/sysctl.d/99-matrix-tuning.conf` — TCP hangs now fail in ~15-30s
> 2. `ip route add unreachable <ip>` in `/etc/network/interfaces` post-up — outbound connections to these servers fail in 0ms (ICMP unreachable)
> 3. Alert threshold raised to 300s/15m — only fires for genuine outages, not normal 10-min backoff cycles
>
> To find new offending servers: `grep "User timeout\|ConnectingCancell" /var/log/matrix-synapse/homeserver.log | grep -oP "\[([^\]]+)\]" | sort | uniq -c | sort -rn | head -20`
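The same tally as the `grep` pipeline above can be done in Python when the result needs further processing. This is a sketch only: the function name is hypothetical and the sample log lines are invented to exercise the regexes, so they may not match Synapse's real log format. Unlike `grep -o`, the counts are keyed without the surrounding brackets.

```python
import re
from collections import Counter

# Same two patterns as the grep pipeline.
TIMEOUT = re.compile(r"User timeout|ConnectingCancell")
BRACKETED = re.compile(r"\[([^\]]+)\]")


def tally_bracketed(lines, top=20):
    """Count bracketed tokens on lines that mention a timeout, most frequent first."""
    counts = Counter()
    for line in lines:
        if TIMEOUT.search(line):
            counts.update(BRACKETED.findall(line))
    return counts.most_common(top)


# Invented sample lines for illustration.
sample_lines = [
    "WARN [bark.lgbt] User timeout caused connection failure",
    "WARN [bark.lgbt] ConnectingCancelledError",
    "WARN [example.org] User timeout caused connection failure",
    "INFO [other.example] request completed",
]
print(tally_bracketed(sample_lines))
```

To run it against the real log, open `/var/log/matrix-synapse/homeserver.log` and pass the file object as `lines`.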
---
## LotusBot
LotusBot (`@lotusbot:matrix.lotusguild.org`) is a Matrix bot running on LXC 151 at `/opt/matrixbot/`.
All commands use the `!` prefix. Run `!help` in any room for the full list.
### AI / Fun
| Command | Description |
|---------|-------------|
| `!ask <question>` | Ask the AI anything |
| `!fortune` | Get a fortune cookie |
| `!8ball <question>` | Magic 8-ball (yes/no/maybe, funny style). `--debug` shows raw AI output |
| `!roast @user` | Roast someone |
| `!story <prompt>` | Generate a short story |
| `!debate <topic>` | AI argues both sides of a topic |
### Games
| Command | Description |
|---------|-------------|
| `!wordle` | Daily Wordle-style word game |
| `!trivia [category]` | Trivia question (gaming/tech/movies/music/science/anime/etc.) |
| `!rps <rock\|paper\|scissors>` | Rock Paper Scissors |
| `!poll <question> \| option1 \| option2...` | Create a reaction poll |
| `!hangman [--hard] [--extended]` | Hangman — `--hard` uses long words, `--extended` adds more body parts |
| `!guess <letter or word>` | Guess a letter or the full word in hangman |
| `!scramble` | Unscramble the word before time runs out |
| `!wyr` | Would You Rather — two AI-generated options, vote with reactions |
| `!riddle` | AI generates a riddle — try to solve it! |
| `!numguess` | Number Guess — bot picks 1–100 |
| `!ng <number>` | Guess in an active number game (temperature hints included) |
| `!wordchain` | Word Chain — each word must start with the last letter of the previous |
| `!wc <word>` | Add a word to the chain |
| `!endwc` | End the word chain and see the final score |
| `!acronym` | AI picks an acronym — submit the funniest expansion with `!ac` then vote |
| `!ac <expansion>` | Submit an acronym expansion |
| `!20q` | 20 Questions — AI thinks of something, you ask yes/no questions |
| `!q <question>` | Ask a yes/no question in 20Q |
| `!answer <guess>` | Guess the answer in 20Q |
| `!nhie` | Never Have I Ever — react 🙋 (have) or 🙅 (never) |
| `!hottake` | AI generates a hot take — react 🔥 (agree) or 💧 (disagree) |
| `!ttt @user` | Tic-Tac-Toe — challenge someone |
| `!move <1-9>` | Make a move in Tic-Tac-Toe |
| `!blackjack` | Play Blackjack against the dealer |
| `!hit` | Draw another card in Blackjack |
| `!stand` | Stand — dealer plays out |
| `!triviaduel @user` | Trivia Duel — first-to-3 battle |
| `!da <A/B/C/D or answer>` | Answer in a Trivia Duel |
### Random
| Command | Description |
|---------|-------------|
| `!flip` | Flip a coin |
| `!roll [NdN]` | Roll dice (e.g. `!roll 2d6`) |
| `!random <min> <max>` | Random number in range |
| `!champion` | Pick a random champion |
| `!agent [role]` | Pick a random Valorant agent |
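The `NdN` notation accepted by `!roll` means "roll N dice with N sides each", so `2d6` is two six-sided dice. The sketch below shows one way to parse and roll such a spec. It is not LotusBot's actual code, and the function name and the limits on dice count and sides are assumptions.

```python
import random
import re
from typing import List, Optional

DICE = re.compile(r"^(\d+)d(\d+)$", re.IGNORECASE)


def roll(spec: str, rng: Optional[random.Random] = None) -> List[int]:
    """Parse a spec such as '2d6' and return one result per die."""
    match = DICE.match(spec.strip())
    if not match:
        raise ValueError(f"expected NdN, got {spec!r}")
    count, sides = int(match.group(1)), int(match.group(2))
    # Hypothetical limits to keep the bot's replies short.
    if not (1 <= count <= 100 and 1 <= sides <= 1000):
        raise ValueError("dice count or sides out of range")
    rng = rng or random.Random()
    return [rng.randint(1, sides) for _ in range(count)]


results = roll("2d6")
print(results, "total:", sum(results))
```

Passing a seeded `random.Random` as `rng` makes the rolls reproducible, which is useful in tests.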
### Server
| Command | Description |
|---------|-------------|
| `!minecraft` | Check Minecraft server status |
| `!ping` | Check bot latency |
| `!health` | Bot health + uptime stats |
---
## Tech Stack
| Component | Technology | Version |
|-----------|-----------|---------|
| Homeserver | Synapse | 1.155.0 |
| Database | PostgreSQL | 17.9 |
| TURN | coturn | latest |
| Video/voice calls | LiveKit SFU | 1.9.11 |
| LiveKit JWT | lk-jwt-service | latest |
| Moderation | Draupnir | 2.9.0 |
| SSO | Authelia (OIDC) + LLDAP | — |
| Webhook bridge | matrix-hookshot | 7.3.2 |
| Reverse proxy | Nginx Proxy Manager | — |
| Web client | Lotus Cinny (fork of `cinnyapp/cinny` main) | custom |
| Element Call embed | `@lotusguild/element-call-embedded` (self-built fork of `element-hq/element-call`) | 0.20.1-lotus.1 |
| GIF picker | Giphy JS/React SDK (`@giphy/react-components`) | — |
| Auto-deploy | adnanh/webhook | 2.8.0 |
| Bot language | Python 3 | 3.x |
| Bot library | matrix-nio (E2EE) | latest |
| Bot dependencies | matrix-nio[e2ee], aiohttp, python-dotenv, mcrcon | — |