diff --git a/README.md b/README.md
index 959bba5..30800c6 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,8 @@
-# Lotus Matrix Bot & Server Roadmap
+# Lotus Matrix Infrastructure
-Matrix bot and server infrastructure for the Lotus Guild homeserver (`matrix.lotusguild.org`).
+Matrix server infrastructure for the Lotus Guild homeserver (`matrix.lotusguild.org`).
-**Repo**: https://code.lotusguild.org/LotusGuild/matrixBot
+**Repo**: https://code.lotusguild.org/LotusGuild/matrix
 ## Status: Phase 7 — Moderation & Client Customisation
@@ -25,17 +25,54 @@ Matrix bot and server infrastructure for the Lotus Guild homeserver (`matrix.lot
 ---
+## Repo Structure
+
+```
+matrix/
+├── hookshot/                    # Hookshot JS transformation functions (one file per webhook)
+│   ├── deploy.sh                # Deploys all .js files to Matrix room state via API
+│   ├── proxmox.js
+│   ├── grafana.js
+│   ├── uptime-kuma.js
+│   └── ...                      # One .js per webhook service
+├── cinny/
+│   ├── config.json              # Cinny homeserver config (deployed to /var/www/html/config.json)
+│   └── dev-update.sh            # Nightly build script for Cinny dev branch
+├── landing/
+│   └── index.html               # matrix.lotusguild.org landing page
+├── draupnir/
+│   └── production.yaml          # Draupnir config (access token is redacted — see rotation docs below)
+├── deploy/                      # Auto-deployment infrastructure
+│   ├── lxc151-hookshot.sh       # Deploy script for LXC 151 (matrix/hookshot/livekit)
+│   ├── lxc106-cinny.sh          # Deploy script for LXC 106 (cinny)
+│   ├── lxc139-landing.sh        # Deploy script for LXC 139 (landing page)
+│   ├── lxc110-draupnir.sh       # Deploy script for LXC 110 (draupnir)
+│   ├── livekit-graceful-restart.sh  # Waits for zero active calls before restarting livekit
+│   ├── hooks-lxc151.json        # webhook binary config for LXC 151
+│   ├── hooks-lxc106.json        # webhook binary config for LXC 106
+│   ├── hooks-lxc139.json        # webhook binary config for LXC 139
+│   └── hooks-lxc110.json        # webhook binary config for LXC 110
+└── systemd/
+    ├── livekit-server.service            # LiveKit systemd unit (with HA migration fix)
+    ├── livekit-graceful-restart.service  # oneshot — checks pending restart flag
+    ├── livekit-graceful-restart.timer    # Runs every 5 min
+    ├── draupnir.service
+    └── cinny-dev-update.cron             # Installed to /etc/cron.d/ on LXC 106
+```
+
+---
+
 ## Infrastructure
 | Service | IP | LXC | RAM | vCPUs | Disk | Versions |
 |---------|----|-----|-----|-------|------|----------|
 | Synapse | 10.10.10.29 | 151 | 8GB | 4 (Ryzen 9 7900) | 50GB | Synapse 1.149.0, LiveKit 1.9.11, hookshot 7.3.2, coturn latest |
 | PostgreSQL 17 | 10.10.10.44 | 109 | 6GB | 3 (Ryzen 9 7900) | 30GB | PostgreSQL 17.9 |
-| Cinny Web | 10.10.10.6 | 106 | 256MB | 1 | 8GB | Debian 13, nginx, Node 24, Cinny 4.10.5 |
+| Cinny Web | 10.10.10.6 | 106 | 2GB | 1 | 8GB | Debian 12, nginx, Node 24, Cinny `dev` branch (nightly build) |
 | Draupnir | 10.10.10.24 | 110 | 1GB | 2 (Ryzen 9 7900) | 10GB | Draupnir v2.9.0, Node.js v22 |
 | Prometheus | 10.10.10.48 | 118 | — | — | — | Prometheus — scrapes all Matrix services |
 | Grafana | 10.10.10.49 | 107 | — | — | — | Grafana 12.4.0 — dashboard.lotusguild.org |
-| NPM | 10.10.10.27 | 139 | — | — | — | Nginx Proxy Manager |
+| NPM | 10.10.10.27 | 139 | — | — | — | Nginx Proxy Manager + matrix landing page |
 | Authelia | 10.10.10.36 | 167 | — | — | — | SSO/OIDC provider |
 | LLDAP | 10.10.10.39 | 147 | — | — | — | LDAP user directory |
 | Uptime Kuma | 10.10.10.25 | 101 | — | — | — | Uptime monitoring (micro1 node) |
@@ -51,7 +88,9 @@ Matrix bot and server infrastructure for the Lotus Guild homeserver (`matrix.lot
 - Hookshot config: `/opt/hookshot/config.yml`
 - Hookshot registration: `/etc/matrix-synapse/hookshot-registration.yaml`
 - Bot: `/opt/matrixbot/`, service: `matrixbot.service`
-- Landing page: `/var/www/matrix-landing/index.html` (on NPM LXC 139)
+- Repo clone (auto-deploy): `/opt/matrix-config/`
+- Deploy env: `/etc/matrix-deploy.env` (MATRIX_TOKEN, MATRIX_SERVER, MATRIX_ROOM)
+- Deploy log: `/var/log/matrix-deploy.log`
 **Key paths on Draupnir LXC (110):**
 - Install path: `/opt/draupnir/`
@@ -70,12 +109,98 @@ Matrix bot and server infrastructure for the Lotus Guild homeserver (`matrix.lot
 - Data directory: `/var/lib/postgresql/17/main`
 **Key paths on Cinny LXC (106):**
-- Source: `/opt/cinny/` (branch: `add-joined-call-controls`)
+- Source: `/opt/cinny-dev/` (branch: `dev`, auto-updated nightly at 3am)
 - Built files: `/var/www/html/`
 - Cinny config: `/var/www/html/config.json`
-- Config backup (survives rebuilds): `/opt/cinny-config.json`
+- Config backup (survives rebuilds): `/opt/cinny-dev/.cinny-config.json`
+- Dev update script: `/usr/local/bin/cinny-dev-update.sh`
+- Cron: `/etc/cron.d/cinny-dev-update` (runs at 3:00am daily)
 - Nginx site config: `/etc/nginx/sites-available/cinny`
-- Rebuild script: `/usr/local/bin/cinny-update`
+
+---
+
+## Auto-Deployment
+
+Pushes to `main` on `LotusGuild/matrix` automatically deploy to the relevant LXC(s) via Gitea webhooks. All 4 LXCs are fully independent — each runs its own webhook listener and deploys only its own files. No cross-LXC SSH dependencies.
+
+### How It Works
+
+1. Push to `LotusGuild/matrix` on Gitea
+2. Gitea fires webhooks to all 4 LXCs simultaneously (HMAC-SHA256 validated)
+3. Each LXC runs `/usr/local/bin/matrix-deploy.sh` via the `webhook` binary
+4. Script does `git fetch + reset --hard origin/main`, checks which files changed, deploys only relevant ones
+5. Logs to `/var/log/matrix-deploy.log` on each LXC
+
+### Per-LXC Webhook Endpoints
+
+| LXC | Service | IP | Port | Deploys When Changed |
+|-----|---------|----|----|----------------------|
+| 151 | matrix/hookshot | 10.10.10.29 | **9500** | `hookshot/*.js`, `systemd/livekit-server.service` |
+| 106 | cinny | 10.10.10.6 | 9000 | `cinny/config.json`, `cinny/dev-update.sh` |
+| 139 | landing/NPM | 10.10.10.27 | 9000 | `landing/index.html` |
+| 110 | draupnir | 10.10.10.24 | 9000 | `draupnir/production.yaml` |
+
+> LXC 151 uses port **9500** because ports 9000–9004 are occupied by Synapse and Hookshot.
+
+### What Each Deploy Does
+
+**LXC 151 — hookshot/livekit:**
+- `hookshot/*.js` changed → runs `hookshot/deploy.sh` (pushes transform functions to Matrix room state via API, requires `MATRIX_TOKEN` in `/etc/matrix-deploy.env`)
+- `systemd/livekit-server.service` changed → copies file, `daemon-reload`, sets `/run/livekit-restart-pending` flag (actual restart deferred — see Livekit Graceful Restart below)
+
+**LXC 106 — cinny:**
+- `cinny/config.json` → copies to `/var/www/html/config.json`
+- `cinny/dev-update.sh` → copies to `/usr/local/bin/cinny-dev-update.sh`, `chmod +x`
+
+**LXC 139 — landing page:**
+- `landing/index.html` → copies to `/var/www/matrix-landing/index.html`, `nginx -s reload`
+
+**LXC 110 — draupnir:**
+- `draupnir/production.yaml` → extracts live `accessToken` from existing config, overwrites from repo, restores token via `sed`, restarts `draupnir.service`
+
+### Installed Components (per LXC)
+
+- `webhook` binary (Debian package `webhook` v2.8.0) listening on respective port
+- `/etc/webhook/hooks.json` — unique HMAC-SHA256 secret per LXC
+- `/usr/local/bin/matrix-deploy.sh` — deploy script from this repo
+- `/etc/systemd/system/webhook.service` — enabled and running
+- `/opt/matrix-config/` — clone of this repo
+- `/var/log/matrix-deploy.log` — deploy log
+
+**LXC 151 additionally:**
+- `/etc/matrix-deploy.env` — `MATRIX_TOKEN`, `MATRIX_SERVER`, `MATRIX_ROOM` (not in git)
+- `/usr/local/bin/livekit-graceful-restart.sh`
+- `/etc/systemd/system/livekit-graceful-restart.service` + `.timer`
+
+### Livekit Graceful Restart
+
+Killing livekit-server while a call is active drops everyone. Instead:
+
+1. Deploy to LXC 151 copies the new `livekit-server.service` and sets a `/run/livekit-restart-pending` flag
+2. `livekit-graceful-restart.timer` runs every 5 minutes
+3. The timer script counts established TCP connections on port 7881 (`ss -tn state established`)
+4. If zero connections → restarts livekit-server and clears the flag
+5. If connections exist → logs and exits, retries in 5 minutes
+
+---
+
+## Access Token Rotation
+
+The `MATRIX_TOKEN` in `/etc/matrix-deploy.env` on LXC 151 is a Jared user token used to push hookshot transforms to Matrix room state (requires power level ≥ 50 in Spam and Stuff).
+
+The token in `draupnir/production.yaml` in this repo is **intentionally redacted** (`accessToken: REDACTED`). The deploy script on LXC 110 extracts the live token from the running config before overwriting from the repo, then restores it.
+
+**To rotate the hookshot deploy token (LXC 151):**
+1. Generate a new token via Synapse admin API or Cinny → Settings → Security → Manage Sessions
+2. SSH to LXC 151 (via `ssh root@10.10.10.4` then `pct enter 151`): `nano /etc/matrix-deploy.env`
+3. Replace `MATRIX_TOKEN=` with new token
+4. Test: `MATRIX_TOKEN= MATRIX_SERVER=https://matrix.lotusguild.org bash /opt/matrix-config/hookshot/deploy.sh`
+
+**To rotate the Draupnir token:**
+1. Generate new token for `@draupnir:matrix.lotusguild.org`
+2. On LXC 110: `nano /opt/draupnir/config/production.yaml` → update `accessToken`
+3. `systemctl restart draupnir`
+4. Do **not** commit the token to git — the repo version stays redacted
 ---
@@ -98,6 +223,7 @@ Matrix bot and server infrastructure for the Lotus Guild homeserver (`matrix.lot
 | 9004 | Hookshot metrics | 0.0.0.0 |
 | 9100 | node_exporter | 0.0.0.0 |
 | 9101 | matrix-admin exporter | 0.0.0.0 |
+| 9500 | webhook (auto-deploy) | 0.0.0.0 |
 | 6789 | LiveKit metrics | 0.0.0.0 |
 | 7880 | LiveKit HTTP | 0.0.0.0 |
 | 7881 | LiveKit RTC TCP | 0.0.0.0 |
@@ -123,6 +249,7 @@ Matrix bot and server infrastructure for the Lotus Guild homeserver (`matrix.lot
 | General | `!wfokQ1-pE896scu_AOcCBA2s3L4qFo-PTBAFTd0WMI0` | public |
 | Commands | `!ou56mVZQ8ZB7AhDYPmBV5_BR28WMZ4x5zwZkPCqjq1s` | restricted (Space members) |
 | Memes | `!GK6v5cLEEnowIooQJv5jECfISUjADjt8aKhWv9VbG5U` | restricted (Space members) |
+| Voice Room | `!ARbRFSPNp2U0MslWTBGoTT3gbmJJ25dPRL6enQntvPo` | restricted (Space members) |
 | Management | `!mEvR5fe3jMmzwd-FwNygD72OY_yu8H3UP_N-57oK7MI` | invite |
 | Cool Kids | `!R7DT3QZHG9P8QQvX6zsZYxjkKgmUucxDz_n31qNrC94` | invite |
 | Spam and Stuff | `!GttT4QYd1wlGlkHU3qTmq_P3gbyYKKeSSN6R7TPcJHg` | invite, **no E2EE** (hookshot) |
@@ -145,7 +272,7 @@ Webhook URL format: `https://matrix.lotusguild.org/webhook/`
 | Service | Webhook UUID | Notes |
 |---------|-------------|-------|
 | Grafana | `df4a1302-2d62-4a01-b858-fb56f4d3781a` | Unified alerting contact point |
-| Proxmox | `9b3eafe5-7689-4011-addd-c466e524661d` | Notification system (8.1+) |
+| Proxmox | `9b3eafe5-7689-4011-addd-c466e524661d` | Notification system (8.1+), Discord embed format |
 | Sonarr | `aeffc311-0686-42cb-9eeb-6757140c072e` | All event types |
 | Radarr | `34913454-c1ac-4cda-82ea-924d4a9e60eb` | All event types |
 | Readarr | `e57ab4f3-56e6-4dc4-8b30-2f4fd4bbeb0b` | All event types |
@@ -161,6 +288,18 @@ Webhook URL format: `https://matrix.lotusguild.org/webhook/`
 - JS transformation functions use hookshot v2 API: `result = { version: "v2", plain, html, msgtype }`
 - The `result` variable must be assigned without `var`/`let`/`const` (QuickJS IIFE sandbox)
 - NPM proxies `https://matrix.lotusguild.org/webhook/*` → `http://10.10.10.29:9003`
+- Proxmox sends Discord embed format: `data.embeds[0].{title,description,fields}` — NOT flat fields
+- Transform functions are stored as Matrix room state (`uk.half-shot.matrix-hookshot.generic.hook`) and deployed via `hookshot/deploy.sh`
+
+**Deploying hookshot transforms manually:**
+```bash
+# On LXC 151 or from any machine with access
+export MATRIX_TOKEN=
+export MATRIX_SERVER=https://matrix.lotusguild.org
+export MATRIX_ROOM='!GttT4QYd1wlGlkHU3qTmq_P3gbyYKKeSSN6R7TPcJHg'
+bash /opt/matrix-config/hookshot/deploy.sh # deploy all
+bash /opt/matrix-config/hookshot/deploy.sh proxmox.js # deploy one
+```
 ---
@@ -183,12 +322,44 @@ Draupnir runs on LXC 110, manages moderation across all 9 protected rooms via `#
 ---
+## Cinny Dev Branch (chat.lotusguild.org)
+
+`chat.lotusguild.org` tracks the Cinny `dev` branch to test the latest beta features.
+
+**Nightly build process (`cinny-dev-update.sh`):**
+1. `git fetch origin dev` — checks for new commits; exits early if nothing changed
+2. Builds in `/opt/cinny-dev/` using Node 24 with `NODE_OPTIONS=--max_old_space_size=896`
+3. Validates `dist/index.html` exists before touching the live web root
+4. Copies `dist/` to `/var/www/html/`, restores `config.json` from `/opt/cinny-dev/.cinny-config.json`
+5. Runs at 3:00am daily via `/etc/cron.d/cinny-dev-update`
+
+**Manual rebuild:**
+```bash
+# On LXC 106
+/usr/local/bin/cinny-dev-update.sh
+```
+
+**Why 2GB RAM:** Vite's build process OOM-killed at 1GB. 896MB Node heap + OS overhead requires at least 1.5GB; 2GB gives headroom.
+
+---
+
 ## Known Issues
+### LiveKit Port Conflict After HA Migration
+
+LXC 151 can migrate between Proxmox nodes via HA. After migration, the old livekit-server process on the source node can leave a stale entry holding port 7881 on the destination. Fixed in `livekit-server.service` via:
+
+```ini
+ExecStartPre=-/bin/bash -c 'pkill -x livekit-server; sleep 1'
+KillMode=control-group
+```
+
 ### coturn TLS Reset Errors
+
 Periodic `TLS/TCP socket error: Connection reset by peer` in coturn logs. Normal — clients probe TURN and drop once they establish a direct P2P path.
 ### BBR Congestion Control
+
 `net.ipv4.tcp_congestion_control = bbr` must be set on the Proxmox host, not inside an unprivileged LXC. All other sysctl tuning (TCP/UDP buffers, fin_timeout) is applied inside LXC 151.
 ---
@@ -207,7 +378,8 @@ Periodic `TLS/TCP socket error: Connection reset by peer` in coturn logs. Normal
 - [x] Default room version v12, all rooms upgraded
 - [x] Landing page with client recommendations
 - [x] Synapse metrics endpoint (port 9000, Prometheus-compatible)
-- [x] Custom Cinny client LXC 106 — Cinny 4.10.5, `add-joined-call-controls` branch, weekly auto-update cron
+- [x] Cinny `dev` branch — nightly auto-build, tracks latest beta features
+- [x] Auto-deployment via Gitea webhooks (all 4 LXCs)
 - [ ] Push notifications gateway (Sygnal) — needs Apple/Google developer credentials
 - [ ] Cinny custom branding — Lotus Guild theme (colours, title, favicon, PWA name)
@@ -231,7 +403,7 @@ Periodic `TLS/TCP socket error: Connection reset by peer` in coturn logs. Normal
 ### Webhooks & Integrations
 - [x] matrix-hookshot 7.3.2 — 11 active webhook services
-- [x] Per-service JS transformation functions
+- [x] Per-service JS transformation functions (stored in git, auto-deployed)
 - [x] Per-service virtual user avatars
 - [x] NPM reverse proxy for `/webhook` path
@@ -239,6 +411,7 @@ Periodic `TLS/TCP socket error: Connection reset by peer` in coturn logs. Normal
 - [x] The Lotus Guild space with all core rooms
 - [x] Correct power levels and join rules per room
 - [x] Custom room avatars
+- [x] Voice room visible to space members (`suggested: true`)
 ### Hardening
 - [x] Rate limiting
@@ -254,6 +427,7 @@ Periodic `TLS/TCP socket error: Connection reset by peer` in coturn logs. Normal
 - [x] `suppress_key_server_warning: true`
 - [x] Automated database + media backups
 - [x] Federation bad-actor blocking via Draupnir ban lists (17,000+ entries)
+- [x] Webhook HMAC-SHA256 validation on all auto-deploy endpoints
 ### Monitoring
 - [x] Grafana dashboard — `dashboard.lotusguild.org/d/matrix-synapse-dashboard` (140+ panels)
@@ -314,55 +488,10 @@ Periodic `TLS/TCP socket error: Connection reset by peer` in coturn logs. Normal
 ---
-## Bot Checklist
-
-### Core
-- [x] matrix-nio async client with E2EE
-- [x] Device trust (auto-trust all devices)
-- [x] Graceful shutdown (SIGTERM/SIGINT)
-- [x] Initial sync token (ignores old messages on startup)
-- [x] Auto-accept room invites
-- [x] Deployed as systemd service (`matrixbot.service`) on LXC 151
-
-### Commands
-- [x] `!help` — list commands
-- [x] `!ping` — latency check
-- [x] `!8ball ` — magic 8-ball
-- [x] `!fortune` — fortune cookie
-- [x] `!flip` — coin flip
-- [x] `!roll ` — dice roller
-- [x] `!random ` — random number
-- [x] `!rps ` — rock paper scissors
-- [x] `!poll ` — poll with reactions
-- [x] `!trivia` — trivia game (reactions, 30s reveal)
-- [x] `!champion [lane]` — random LoL champion
-- [x] `!agent [role]` — random Valorant agent
-- [x] `!wordle` — full Wordle game (daily, hard mode, stats, share)
-- [x] `!minecraft ` — RCON whitelist add
-- [x] `!ask ` — Ollama LLM (lotusllm, 2min cooldown)
-- [x] `!health` — bot uptime + service status
-
-### Welcome System
-- [x] Watches Space joins and DMs new members automatically
-- [x] React-to-join: react with ✅ in DM → bot invites to General, Commands, Memes
-- [x] Welcome event ID persisted to `welcome_state.json`
-
-### Wordle
-- [x] Daily puzzles with two-pass letter evaluation
-- [x] Hard mode with constraint validation
-- [x] Stats persistence (`wordle_stats.json`)
-- [x] Cinny-compatible rendering (inline `` tiles)
-- [x] DM-based gameplay, `!wordle share` posts result to public room
-- [x] Virtual keyboard display
-
----
-
 ## Tech Stack
 | Component | Technology | Version |
 |-----------|-----------|---------|
-| Bot language | Python 3 | 3.x |
-| Bot library | matrix-nio (E2EE) | latest |
 | Homeserver | Synapse | 1.149.0 |
 | Database | PostgreSQL | 17.9 |
 | TURN | coturn | latest |
@@ -372,22 +501,8 @@ Periodic `TLS/TCP socket error: Connection reset by peer` in coturn logs. Normal
 | SSO | Authelia (OIDC) + LLDAP | — |
 | Webhook bridge | matrix-hookshot | 7.3.2 |
 | Reverse proxy | Nginx Proxy Manager | — |
-| Web client | Cinny (`add-joined-call-controls` branch) | 4.10.5 |
+| Web client | Cinny (`dev` branch, nightly build) | dev |
+| Auto-deploy | adnanh/webhook | 2.8.0 |
+| Bot language | Python 3 | 3.x |
+| Bot library | matrix-nio (E2EE) | latest |
 | Bot dependencies | matrix-nio[e2ee], aiohttp, python-dotenv, mcrcon | — |
-
-## Bot Files
-
-```
-matrixBot/
-├── bot.py # Entry point, client setup, event loop
-├── callbacks.py # Message + reaction event handlers
-├── commands.py # All command implementations
-├── config.py # Environment config + validation
-├── utils.py # send_text, send_html, send_reaction, get_or_create_dm
-├── welcome.py # Welcome message + react-to-join logic
-├── wordle.py # Full Wordle game engine
-├── wordlist_answers.py # Wordle answer word list
-├── wordlist_valid.py # Wordle valid guess word list
-├── .env.example # Environment variable template
-└── requirements.txt # Python dependencies
-```