# Lotus Matrix Bot & Server Roadmap

Matrix bot and server infrastructure for the Lotus Guild homeserver (`matrix.lotusguild.org`).

**Repo**: https://code.lotusguild.org/LotusGuild/matrixBot

## Status: Phase 5 — Optimization, Voice Quality & Custom Client

---

## Priority Order

1. ~~PostgreSQL migration~~
2. ~~TURN server~~
3. ~~Room structure + space setup~~
4. ~~Matrix bot (core + commands)~~
5. ~~LiveKit / Element Call~~
6. ~~SSO / OIDC (Authelia)~~
7. ~~Webhook integrations (hookshot)~~
8. ~~Voice stability & quality tuning~~
9. ~~Custom Cinny client (chat.lotusguild.org)~~
10. Custom emoji packs (partially finished)
11. Cinny custom branding (Lotus Guild theme)
12. Draupnir moderation bot
13. Push notifications (Sygnal)

---

## Infrastructure

| Service | IP | LXC | RAM | vCPUs | Disk | Versions |
|---------|----|-----|-----|-------|------|----------|
| Synapse | 10.10.10.29 | 151 | 8GB | 4 (Ryzen 9 7900) | 50GB (21% used) | Synapse 1.148.0, LiveKit 1.9.11, hookshot 7.3.2, coturn latest |
| PostgreSQL 17 | 10.10.10.44 | 109 | 6GB | 3 (Ryzen 9 7900) | 30GB (5% used) | PostgreSQL 17.9 |
| Cinny Web | 10.10.10.6 | 106 | 256MB runtime | 1 | 8GB (27% used) | Debian 13, nginx, Node 24, Cinny 4.10.5 |
| NPM | 10.10.10.27 | 139 | — | — | — | Nginx Proxy Manager |
| Authelia | 10.10.10.36 | 167 | — | — | — | SSO/OIDC provider |
| LLDAP | 10.10.10.39 | 147 | — | — | — | LDAP user directory |
| Uptime Kuma | 10.10.10.25 | 101 | — | — | — | Uptime monitoring (micro1 node) |

> **Note:** PostgreSQL container IP is `10.10.10.44`, not `.2` — update any stale references.
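A quick way to act on the note above is to grep the config trees for the old address. A hedged sketch (the sample file below stands in for real configs; on LXC 151 point grep at `/etc/matrix-synapse/` and `/opt/hookshot/config.yml`):

```shell
# Scan a config file for the stale PostgreSQL IP (10.10.10.2); anything
# matching should be updated to 10.10.10.44. Demonstrated on a sample file.
printf 'db_host: 10.10.10.44\n' > /tmp/sample.yaml
if grep -qE '10\.10\.10\.2([^0-9]|$)' /tmp/sample.yaml; then
  result="stale reference found"
else
  result="config clean"
fi
echo "$result"
```

The trailing `([^0-9]|$)` keeps `10.10.10.2` from matching inside `10.10.10.29` or `10.10.10.25`.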
**Key paths on Synapse/matrix LXC (151):**

- Synapse config: `/etc/matrix-synapse/homeserver.yaml`
- Synapse conf.d: `/etc/matrix-synapse/conf.d/` (metrics.yaml, report_stats.yaml, server_name.yaml)
- coturn config: `/etc/turnserver.conf`
- LiveKit config: `/etc/livekit/config.yaml`
- LiveKit service: `livekit-server.service`
- lk-jwt-service: `lk-jwt-service.service` (binds `:8070`, serves JWT tokens for MatrixRTC)
- Hookshot: `/opt/hookshot/`, service: `matrix-hookshot.service`
- Hookshot config: `/opt/hookshot/config.yml`
- Hookshot registration: `/etc/matrix-synapse/hookshot-registration.yaml`
- Landing page: `/var/www/matrix-landing/index.html` (on NPM LXC 139)
- Bot: `/opt/matrixbot/`, service: `matrixbot.service`

**Key paths on PostgreSQL LXC (109):**

- PostgreSQL config: `/etc/postgresql/17/main/postgresql.conf`
- PostgreSQL conf.d: `/etc/postgresql/17/main/conf.d/`
- HBA config: `/etc/postgresql/17/main/pg_hba.conf`
- Data directory: `/var/lib/postgresql/17/main`

**Running services on LXC 151:**

| Service | PID status | Memory | Notes |
|---------|-----------|--------|-------|
| matrix-synapse | active, 2+ days | 231MB peak 312MB | No workers, single process |
| livekit-server | active, 2+ days | 22MB peak 58MB | v1.9.11, node IP = 162.192.14.139 |
| lk-jwt-service | active, 2+ days | 2.7MB | Binds :8070, LIVEKIT_URL=wss://matrix.lotusguild.org |
| matrix-hookshot | active, 2+ days | 76MB peak 172MB | Actively receiving webhooks |
| matrixbot | active, 2+ days | 26MB peak 59MB | Some E2EE key errors (see known issues) |
| coturn | active, 2+ days | 13MB | Periodic TCP reset errors (normal) |

**Currently open port forwarding (router → 10.10.10.29):**

- TCP+UDP 3478 (TURN/STUN signaling)
- TCP+UDP 5349 (TURNS/TLS)
- TCP 7881 (LiveKit ICE TCP fallback)
- TCP+UDP 49152-65535 (TURN relay range)
- LiveKit WebRTC media: 50100-50500 (subset of above, only 400 ports — see improvements)

**Internal port map (LXC 151):**

| Port | Service | Bind |
|------|---------|------|
| 8008 | Synapse HTTP | 0.0.0.0 + ::1 |
| 9000 | Synapse metrics (Prometheus) | 127.0.0.1 + 10.10.10.29 (locked down from 0.0.0.0 — see audit) |
| 9001 | Hookshot widgets + metrics | 127.0.0.1 |
| 9002 | Hookshot bridge | 127.0.0.1 |
| 9003 | Hookshot webhooks | 0.0.0.0 |
| 7880 | LiveKit HTTP | 0.0.0.0 |
| 7881 | LiveKit RTC TCP | 0.0.0.0 |
| 8070 | lk-jwt-service | 0.0.0.0 |
| 8080 | synapse-admin (nginx) | 0.0.0.0 |
| 3478 | coturn STUN/TURN | 0.0.0.0 |
| 5349 | coturn TURNS/TLS | 0.0.0.0 |

---

## Rooms (all v12)

| Room | Room ID | Join Rule |
|------|---------|-----------|
| The Lotus Guild (Space) | `!-1ZBnAH-JiCOV8MGSKN77zDGTuI3pgSdy8Unu_DrDyc` | public |
| General | `!wfokQ1-pE896scu_AOcCBA2s3L4qFo-PTBAFTd0WMI0` | public |
| Commands | `!ou56mVZQ8ZB7AhDYPmBV5_BR28WMZ4x5zwZkPCqjq1s` | restricted (Space members) |
| Memes | `!GK6v5cLEEnowIooQJv5jECfISUjADjt8aKhWv9VbG5U` | restricted (Space members) |
| Management | `!mEvR5fe3jMmzwd-FwNygD72OY_yu8H3UP_N-57oK7MI` | invite |
| Cool Kids | `!R7DT3QZHG9P8QQvX6zsZYxjkKgmUucxDz_n31qNrC94` | invite |
| Spam and Stuff | `!GttT4QYd1wlGlkHU3qTmq_P3gbyYKKeSSN6R7TPcJHg` | invite, **no E2EE** (hookshot) |

**Power level roles (Cinny tags):**

- 100: Owner (jared)
- 50: The Nerdy Council (enhuynh, lonely)
- 48: Panel of Geeks
- 35: Cool Kids
- 0: Member

---

## Webhook Integrations (matrix-hookshot 7.3.2)

Generic webhooks bridged into **Spam and Stuff** via [matrix-hookshot](https://github.com/matrix-org/matrix-hookshot). Each service gets its own virtual user (`@hookshot_`) with a unique avatar.
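Since each integration is plain HTTP, a webhook can be smoke-tested with curl. A sketch with a placeholder UUID (hookshot's generic webhooks accept a JSON body whose `text` field becomes the message):

```shell
# Build a test request for a hookshot generic webhook (placeholder UUID).
WEBHOOK_ID="00000000-0000-0000-0000-000000000000"
URL="https://matrix.lotusguild.org/webhook/$WEBHOOK_ID"
PAYLOAD='{"text": "test message from curl"}'
echo "$URL"
# Live test (posts into Spam and Stuff via NPM -> 10.10.10.29:9003):
# curl -fsS -X POST -H 'Content-Type: application/json' -d "$PAYLOAD" "$URL"
```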
Webhook URL format: `https://matrix.lotusguild.org/webhook/`

| Service | Webhook UUID | Notes |
|---------|-------------|-------|
| Grafana | `df4a1302-2d62-4a01-b858-fb56f4d3781a` | Unified alerting contact point |
| Proxmox | `9b3eafe5-7689-4011-addd-c466e524661d` | Notification system (8.1+) |
| Sonarr | `aeffc311-0686-42cb-9eeb-6757140c072e` | All event types |
| Radarr | `34913454-c1ac-4cda-82ea-924d4a9e60eb` | All event types |
| Readarr | `e57ab4f3-56e6-4dc4-8b30-2f4fd4bbeb0b` | All event types |
| Lidarr | `66ac6fdd-69f6-4f47-bb00-b7f6d84d7c1c` | All event types |
| Uptime Kuma | `1a02e890-bb25-42f1-99fe-bba6a19f1811` | Status change notifications |
| Seerr | `555185af-90a1-42ff-aed5-c344e11955cf` | Request/approval events |
| Owncast (Livestream) | `9993e911-c68b-4271-a178-c2d65ca88499` | STREAM_STARTED / STREAM_STOPPED (hookshot display name: "Livestream") |
| Bazarr | `470fb267-3436-4dd3-a70c-e6e8db1721be` | Subtitle events (Apprise JSON notifier) |
| Tinker-Tickets | `6e306faf-8eea-4ba5-83ef-bf8f421f929e` | Custom transformation code |

**Hookshot notes:**

- Spam and Stuff is intentionally **unencrypted** — hookshot bridges cannot join E2EE rooms
- Webhook tokens stored in Synapse PostgreSQL `room_account_data` for `@hookshot`
- JS transformation functions use the hookshot v2 API: set `result = { version: "v2", plain, html, msgtype }`
- The `result` variable must be assigned without `var`/`let`/`const` (needs implicit global scope in the QuickJS IIFE sandbox)
- NPM proxies `https://matrix.lotusguild.org/webhook/*` → `http://10.10.10.29:9003`
- Virtual user avatars: set via the appservice token (`as_token` in hookshot-registration.yaml) impersonating each user
- Hookshot bridge port (9002) binds `127.0.0.1` only; webhook ingest (9003) binds `0.0.0.0` (NPM-proxied)

---

## Known Issues

### coturn TLS Reset Errors

Periodic `TLS/TCP socket error: Connection reset by peer` entries in coturn logs from external IPs.
This is normal — clients probe TURN and drop the connection once they establish a direct P2P path. Not an issue.

### BBR Congestion Control — Host-Level Only

`net.ipv4.tcp_congestion_control = bbr` and `net.core.default_qdisc = fq` cannot be set from inside an unprivileged LXC container — they affect the host kernel's network namespace. These must be applied on the Proxmox host itself to take effect for all containers.

All other sysctl tuning (TCP/UDP buffers, fin_timeout) applied successfully inside LXC 151.

---

## Optimizations & Improvements

### 1. LiveKit / Voice Quality ✅ Applied

Noise suppression and volume normalization are **client-side only** (browser/Element X handles this via WebRTC's built-in audio processing). The server cannot enforce these.

Applied server-side improvements:

- **ICE port range expanded:** 50100-50500 (400 ports) → **50000-51000 (1001 ports)** = ~500 concurrent WebRTC streams
- **TURN TTL reduced:** 86400s (24h) → **3600s (1h)** — stale allocations expire faster
- **Room defaults added:** `empty_timeout: 300`, `departure_timeout: 20`, `max_participants: 50`

**Client-side audio advice for users:**

- **Element Web/Desktop:** Settings → Voice & Video → enable "Noise Suppression" and "Echo Cancellation"
- **Element X (mobile):** automatic via WebRTC stack
- **Cinny (chat.lotusguild.org):** voice via embedded Element Call widget — browser WebRTC noise suppression is active automatically

### 2. PostgreSQL Tuning (LXC 109) ✅ Applied

`/etc/postgresql/17/main/conf.d/synapse_tuning.conf` written and active. `pg_stat_statements` extension created in the `synapse` database.
Config applied:

```ini
# Memory — shared_buffers = 25% RAM, effective_cache_size = 75% RAM
shared_buffers = 1500MB
effective_cache_size = 4500MB
work_mem = 32MB                  # Per sort/hash operation (safe at low connection count)
maintenance_work_mem = 256MB     # VACUUM, CREATE INDEX
wal_buffers = 64MB               # WAL write buffer

# Checkpointing
checkpoint_completion_target = 0.9   # Spread checkpoint I/O (default 0.5 is aggressive)
max_wal_size = 2GB

# Storage (Ceph RBD block device = SSD-equivalent random I/O)
random_page_cost = 1.1               # Default 4.0 assumes spinning disk
effective_io_concurrency = 200       # For SSDs/Ceph

# Parallel queries (3 vCPUs)
max_worker_processes = 3
max_parallel_workers_per_gather = 1
max_parallel_workers = 2

# Monitoring
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
```

Restarted `postgresql@17-main`. Expected impact: Synapse query latency should stay low as the DB grows; the entire current 120MB database fits in shared_buffers.

### 3. PostgreSQL Security — pg_hba.conf (LXC 109) ✅ Applied

Removed the two open rules (`0.0.0.0/24 md5` and `0.0.0.0/0 md5`). Remote access is now restricted to the Synapse LXC only:

```
host    synapse    synapse_user    10.10.10.29/32    scram-sha-256
```

All other remote connections are rejected. Local Unix socket and loopback remain functional for admin access.

### 4. Synapse Cache Tuning (LXC 151) ✅ Applied

`event_cache_size` bumped 15K → 30K. `_get_state_group_for_events: 3.0` added to `per_cache_factors` (heavily hit during E2EE key sharing). Synapse restarted cleanly.

```yaml
event_cache_size: 30K
caches:
  global_factor: 2.0
  per_cache_factors:
    get_users_in_room: 3.0
    get_current_state_ids: 3.0
    _get_state_group_for_events: 3.0
```

### 5. Network / sysctl Tuning (LXC 151) ✅ Applied

`/etc/sysctl.d/99-matrix-tuning.conf` written and active. TCP/UDP buffers aligned and fin_timeout reduced.
```ini
# Align TCP buffers with core maximums
net.ipv4.tcp_rmem = 4096 131072 26214400
net.ipv4.tcp_wmem = 4096 65536 26214400

# UDP buffer sizing for WebRTC media streams
net.core.rmem_max = 26214400
net.core.wmem_max = 26214400
net.ipv4.udp_rmem_min = 65536
net.ipv4.udp_wmem_min = 65536

# Reduce latency for short-lived TURN connections
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 30
```

> **BBR note:** `tcp_congestion_control = bbr` and `default_qdisc = fq` require host-level sysctl — they cannot be set inside an unprivileged LXC. Apply on the Proxmox host to benefit all containers:
> ```bash
> modprobe tcp_bbr   # ensure the BBR module is loaded (persist via /etc/modules-load.d/)
> echo "net.ipv4.tcp_congestion_control = bbr" >> /etc/sysctl.d/99-bbr.conf
> echo "net.core.default_qdisc = fq" >> /etc/sysctl.d/99-bbr.conf
> sysctl --system
> ```

### 6. Synapse Federation Hardening

The server is effectively a private server for friends. Restricting federation prevents abuse and reduces load. Add to `homeserver.yaml`:

```yaml
# Allow federation only with specific trusted servers
federation_domain_whitelist:
  - matrix.org            # Keep for bridging if needed
  - matrix.lotusguild.org
# To go fully closed instead (recommended for friends-only): Synapse has no
# `federation_enabled` option, so remove the `federation` resource from the
# listener config and drop the SRV/.well-known delegation.
```

### 7. Bot E2EE Key Fix (LXC 151) ✅ Applied

`nio_store/` cleared and bot restarted cleanly. Megolm session errors resolved.

---

## Custom Cinny Client (chat.lotusguild.org)

Cinny v4 is the preferred client — clean UI, Cinny-style rendering already used by the bot's Wordle tiles. We build from source to get voice support and full branding control.
### Why Cinny over Element Web

- Much cleaner aesthetics, already the de facto client for guild members
- Element Web's noise suppression (Krisp) is only on `app.element.io` — a custom build loses it
- Cinny's `add-joined-call-controls` branch uses `@element-hq/element-call-embedded`, which talks to the **existing** MatrixRTC → lk-jwt-service → LiveKit stack with zero new infrastructure
- Static build (nginx serving ~5MB of files) — nearly zero runtime resource cost

### Voice support status (as of March 2026)

The official `add-joined-call-controls` branch (maintained by `ajbura`, last commit March 8 2026) embeds Element Call as a widget via `@element-hq/element-call-embedded: 0.16.3`. This uses the same MatrixRTC protocol that lk-jwt-service already handles. Two direct LiveKit integration PRs (#2703, #2704) were proposed but closed without merge, so the embedded Element Call approach is the official path.

Since lk-jwt-service is already running on LXC 151 and configured for `wss://matrix.lotusguild.org`, voice calls should work out of the box once the Cinny build is deployed.

### LXC Setup

**Create the LXC** (run on the host):

```bash
# ProxmoxVE Debian 13 community script
bash -c "$(curl -fsSL https://raw.githubusercontent.com/community-scripts/ProxmoxVE/main/ct/debian.sh)"
```

Recommended settings: 2GB RAM, 1-2 vCPUs, 20GB disk, Debian 13, static IP on VLAN 10 (e.g. `10.10.10.XX`).
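If you prefer setting resources from the CLI instead of the script's prompts, `pct` on the Proxmox host can do it. A sketch with placeholder values (CTID and gateway are assumptions, substitute your own):

```shell
# Placeholder values: substitute the real CTID, IP, and gateway.
CTID=199
CMD="pct set $CTID --memory 2048 --cores 2"
echo "$CMD"
# Network on VLAN 10 (gateway address is an assumption):
# pct set $CTID --net0 name=eth0,bridge=vmbr0,tag=10,ip=10.10.10.XX/24,gw=10.10.10.1
```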
**Inside the new LXC:**

```bash
# Install nginx + git + nvm dependencies
apt update && apt install -y nginx git curl

# Install Node.js 24 via nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
source ~/.bashrc
nvm install 24
nvm use 24

# Clone Cinny and switch to the voice-support branch
git clone https://github.com/cinnyapp/cinny.git /opt/cinny
cd /opt/cinny
git checkout add-joined-call-controls

# Install dependencies and build
npm ci
NODE_OPTIONS=--max_old_space_size=4096 npm run build
# Output: /opt/cinny/dist/

# Deploy to nginx root
cp -r /opt/cinny/dist/* /var/www/html/
```

**Configure Cinny** — edit `/var/www/html/config.json`:

```json
{
  "defaultHomeserver": 0,
  "homeserverList": ["matrix.lotusguild.org"],
  "allowCustomHomeservers": false,
  "featuredCommunities": {
    "openAsDefault": false,
    "spaces": [],
    "rooms": [],
    "servers": []
  },
  "hashRouter": {
    "enabled": false,
    "basename": "/"
  }
}
```

**Nginx config** — `/etc/nginx/sites-available/cinny` (matches the official `docker-nginx.conf`):

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name chat.lotusguild.org;
    root /var/www/html;
    index index.html;

    location / {
        rewrite ^/config.json$ /config.json break;
        rewrite ^/manifest.json$ /manifest.json break;
        rewrite ^/sw.js$ /sw.js break;
        rewrite ^/pdf.worker.min.js$ /pdf.worker.min.js break;
        rewrite ^/public/(.*)$ /public/$1 break;
        rewrite ^/assets/(.*)$ /assets/$1 break;
        rewrite ^(.+)$ /index.html break;
    }
}
```

```bash
ln -s /etc/nginx/sites-available/cinny /etc/nginx/sites-enabled/
nginx -t && systemctl reload nginx
```

Then in **NPM**: add a proxy host for `chat.lotusguild.org` → `http://10.10.10.XX:80` with SSL.
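Once nginx is serving the build, a quick sanity check is that `config.json` comes back with the guild homeserver pinned. Sketched offline here; on the LXC, replace the inline string with the output of `curl -fsS http://127.0.0.1/config.json`:

```shell
# Offline stand-in for: curl -fsS http://127.0.0.1/config.json
config='{"defaultHomeserver":0,"homeserverList":["matrix.lotusguild.org"]}'
case "$config" in
  *matrix.lotusguild.org*) check="homeserver pinned" ;;
  *) check="config missing homeserver" ;;
esac
echo "$check"
```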
### Rebuilding after updates

```bash
cd /opt/cinny
git pull
npm ci
NODE_OPTIONS=--max_old_space_size=4096 npm run build
cp -r dist/* /var/www/html/

# Preserve your config.json — it gets overwritten by the copy above, so:
# Option: keep config.json outside dist and symlink/copy it in after each build
```

### Key paths (Cinny LXC 106 — 10.10.10.6)

- Source: `/opt/cinny/` (branch: `add-joined-call-controls`)
- Built files: `/var/www/html/`
- Cinny config: `/var/www/html/config.json`
- Config backup (survives rebuilds): `/opt/cinny-config.json`
- Nginx site config: `/etc/nginx/sites-available/cinny`
- Rebuild script: `/usr/local/bin/cinny-update`

---

## Server Checklist

### Quality of Life

- [x] Migrate from SQLite to PostgreSQL
- [x] TURN/STUN server (coturn) for reliable voice/video
- [x] URL previews
- [x] Upload size limit 200MB
- [x] Full-text message search (PostgreSQL backend)
- [x] Media retention policy (remote: 1yr, local: 3yr)
- [x] Sliding sync (native Synapse)
- [x] LiveKit for Element Call video rooms
- [x] Default room version v12, all rooms upgraded
- [x] Landing page with client recommendations (Cinny, Commet, Element, Element X mobile)
- [x] Synapse metrics endpoint (port 9000, Prometheus-compatible)
- [ ] Push notifications gateway (Sygnal) for mobile clients
- [x] LiveKit port range expanded to 50000-51000 for voice call capacity
- [x] Custom Cinny client LXC 106 (10.10.10.6) — Debian 13, Cinny 4.10.5 built from `add-joined-call-controls`, nginx serving, HA enabled
- [x] NPM proxy entry for `chat.lotusguild.org` → 10.10.10.6:80, SSL via Cloudflare DNS challenge, HTTPS forced, HTTP/2 + HSTS enabled
- [x] Cinny weekly auto-update cron (`/etc/cron.d/cinny-update`, Sundays 3am, logs to `/var/log/cinny-update.log`)
- [ ] Cinny custom branding — Lotus Guild theme (colors, title, favicon, PWA name)

### Performance Tuning

- [x] PostgreSQL `shared_buffers` → 1500MB, `effective_cache_size`, `work_mem`, checkpoint tuning applied
- [x] PostgreSQL `pg_stat_statements` extension installed in `synapse` database
- [x] PostgreSQL autovacuum tuned per-table (`state_groups_state`, `events`, `receipts_linearized`, `receipts_graph`, `device_lists_stream`, `presence_stream`), `autovacuum_max_workers` → 5
- [x] Synapse `event_cache_size` → 30K, `_get_state_group_for_events` cache factor added
- [x] sysctl TCP/UDP buffer alignment applied to LXC 151 (`/etc/sysctl.d/99-matrix-tuning.conf`)
- [x] LiveKit room `empty_timeout: 300`, `departure_timeout: 20`, `max_participants: 50`
- [x] LiveKit ICE port range expanded to 50000-51000
- [x] LiveKit TURN TTL reduced from 24h to 1h
- [x] LiveKit VP9/AV1 codecs enabled (`video_codecs: [VP8, H264, VP9, AV1]`)
- [ ] BBR congestion control — must be applied on Proxmox host, not inside LXC (see Known Issues)

### Auth & SSO

- [x] Token-based registration
- [x] SSO/OIDC via Authelia
- [x] `allow_existing_users: true` for linking accounts to SSO
- [x] Password auth alongside SSO

### Webhooks & Integrations

- [x] matrix-hookshot 7.3.2 installed and running
- [x] Generic webhook bridge for 11 active services (Grafana, Proxmox, Sonarr, Radarr, Readarr, Lidarr, Uptime Kuma, Seerr, Owncast/Livestream, Bazarr, Tinker-Tickets)
- [x] Per-service JS transformation functions — all rewritten to handle full event payloads (all event types, health alerts, app updates, release groups, download clients)
- [x] Per-service virtual user avatars
- [x] NPM reverse proxy for `/webhook` path
- [x] Tinker-Tickets custom transformation code

### Room Structure

- [x] The Lotus Guild space
- [x] All core rooms with correct power levels and join rules
- [x] Spam and Stuff room for service notifications (hookshot)
- [x] Custom room avatars

### Hardening

- [x] Rate limiting
- [x] E2EE on all rooms (except Spam and Stuff — intentional for hookshot)
- [x] coturn internal peer deny rules (blocks relay to RFC1918 except allowed subnet)
- [x] `pg_hba.conf` locked down — remote access restricted to Synapse LXC (10.10.10.29) only
- [x] Federation enabled with key verification (open for invite-only growth to friends/family/coworkers)
- [x] fail2ban on Synapse login endpoint (5 retries / 24h ban, LXC 151)
- [x] Synapse metrics port 9000 restricted to `127.0.0.1` + `10.10.10.29` (was `0.0.0.0`)
- [x] coturn cert auto-renewal — daily sync cron on compute-storage-01 copies NPM cert → coturn
- [x] `/.well-known/matrix/client` and `/server` live on lotusguild.org (NPM advanced config)
- [x] `suppress_key_server_warning: true` in homeserver.yaml
- [ ] Federation allow/deny lists for known bad actors
- [ ] Regular Synapse updates
- [x] Automated database + media backups

### Monitoring

- [x] Synapse metrics endpoint (port 9000, Prometheus-compatible)
- [x] Uptime Kuma monitors added: Synapse HTTP, LiveKit TCP, PostgreSQL TCP, Cinny Web, coturn TCP 3478, lk-jwt-service, Hookshot
- [ ] Uptime Kuma: coturn UDP STUN monitoring (requires push/heartbeat — no native UDP type in Kuma)
- [ ] Grafana dashboard for Synapse Prometheus metrics — Grafana at 10.10.10.49 (LXC 107), Prometheus scraping 10.10.10.29:9000 confirmed. Import dashboard ID `18618` from grafana.com

### Admin

- [x] Synapse admin API dashboard (synapse-admin at http://10.10.10.29:8080)
- [x] Power levels per room
- [ ] Draupnir moderation bot (new LXC or alongside existing bot)
- [ ] Cinny custom branding (Lotus Guild theme — colors, title, favicon, PWA name)

---

## Improvement Audit (March 2026)

Comprehensive audit of the current infrastructure against official documentation and security best practices. Applied March 9 2026.
### Priority Summary

| Issue | Severity | Status |
|-------|----------|--------|
| coturn TLS cert expires May 12 — no auto-renewal | **CRITICAL** | ✅ Fixed — daily sync cron on compute-storage-01 copies NPM-renewed cert to coturn |
| Synapse metrics port 9000 bound to `0.0.0.0` | **HIGH** | ✅ Fixed — now binds `127.0.0.1` + `10.10.10.29` (Prometheus still works, internet blocked) |
| `/.well-known/matrix/client` returns 404 | MEDIUM | ✅ Fixed — NPM lotusguild.org proxy host updated, live at `https://lotusguild.org/.well-known/matrix/client` |
| `suppress_key_server_warning` not set | MEDIUM | ✅ Fixed — added to homeserver.yaml |
| No fail2ban on `/_matrix/client/.*/login` | MEDIUM | ✅ Fixed — fail2ban installed, matrix-synapse jail active (5 retries / 24h ban) |
| No media purge cron (retention policy set but never triggers) | MEDIUM | ✅ N/A — `media_retention` block already in homeserver.yaml; Synapse runs the purge internally on schedule |
| PostgreSQL autovacuum not tuned per-table | LOW | ✅ Fixed — all high-churn tables tuned, `autovacuum_max_workers` → 5 |
| Hookshot metrics scrape unconfirmed | LOW | ⚠️ Port 9001 responds but `/metrics` returns 404 — hookshot bug or path mismatch; low impact |
| LiveKit VP9/AV1 codec support | LOW | ✅ Applied — `video_codecs: [VP8, H264, VP9, AV1]` added to livekit config |
| Federation allow/deny list not configured | LOW | Pending — Mjolnir/Draupnir on roadmap |
| Sygnal push notifications not deployed | INFO | Deferred |

---

### 1. coturn Cert Auto-Renewal ✅

The coturn cert is managed by NPM (cert ID 91, stored at `/etc/letsencrypt/live/npm-91/` on LXC 139). NPM renews it automatically. A sync script on `compute-storage-01` detects when NPM renews and copies it to coturn.

**Deployed:** `/usr/local/bin/coturn-cert-sync.sh` on compute-storage-01, cron `/etc/cron.d/coturn-cert-sync` (runs 03:30 daily). Script compares cert expiry dates between LXC 139 and LXC 151.
If they differ (NPM renewed), it copies `fullchain.pem` + `privkey.pem` and restarts coturn.

**Additional coturn hardening (while you're in there):**

```
# /etc/turnserver.conf (coturn options use hyphens)
stale-nonce=600        # Nonce expires after 600s (limits replay attacks)
user-quota=100         # Max concurrent allocations per user
total-quota=1000       # Total allocations on server
max-bps=1000000        # 1 Mbps per TURN session
cipher-list="ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-CHACHA20-POLY1305"
```

---

### 2. Synapse Configuration Gaps

**a) Metrics port exposed to 0.0.0.0 (HIGH)**

Port 9000 currently binds `0.0.0.0` — it exposes internal state, user counts, and DB query times externally. Fix in `homeserver.yaml` (keep `10.10.10.29` so Prometheus on the VLAN can still scrape):

```yaml
listeners:
  - port: 9000
    bind_addresses: ['127.0.0.1', '10.10.10.29']  # NOT 0.0.0.0
    type: metrics
    resources: []
```

Grafana at `10.10.10.49` scrapes port 9000 from within the VLAN, so this is safe to lock down.

**b) suppress_key_server_warning (MEDIUM)**

Without it, Synapse logs a trusted-key-server warning on every restart. One line in `homeserver.yaml`:

```yaml
suppress_key_server_warning: true
```

**c) Database connection pooling (LOW — track for growth)**

Current defaults (`cp_min: 5`, `cp_max: 10`) are fine for a single process. When adding workers, increase `cp_max` to 20–30 per worker group. Add explicitly to `homeserver.yaml` to make it visible:

```yaml
database:
  name: psycopg2
  args:
    cp_min: 5
    cp_max: 10
```

---

### 3. Matrix Well-Known 404

`/.well-known/matrix/client` returns 404. This breaks client autodiscovery — users who type `lotusguild.org` instead of `matrix.lotusguild.org` get an error.
Fix in NPM with a custom location block on the `lotusguild.org` proxy host:

```nginx
location /.well-known/matrix/client {
    add_header Content-Type application/json;
    add_header Access-Control-Allow-Origin *;
    return 200 '{"m.homeserver":{"base_url":"https://matrix.lotusguild.org"}}';
}
location /.well-known/matrix/server {
    add_header Content-Type application/json;
    add_header Access-Control-Allow-Origin *;
    return 200 '{"m.server":"matrix.lotusguild.org:443"}';
}
```

---

### 4. fail2ban for Synapse Login

No brute-force protection on `/_matrix/client/*/login`. Easy win.

**`/etc/fail2ban/jail.d/matrix-synapse.conf`:**

```ini
[matrix-synapse]
enabled = true
port = http,https
filter = matrix-synapse
logpath = /var/log/matrix-synapse/homeserver.log
findtime = 600
maxretry = 5
bantime = 86400
```

**`/etc/fail2ban/filter.d/matrix-synapse.conf`:**

```ini
[Definition]
# <HOST> is fail2ban's required capture tag for the client IP
failregex = ^.*Failed (password|SAML) login attempt for user .* from <HOST>.*$
ignoreregex = ^.*"GET /sync.*".*$
```

---

### 5. Synapse Media Purge Cron

Retention policy is configured (remote 1yr, local 3yr), and the audit found that Synapse's `media_retention` settings run the purge internally on schedule, so a cron is not strictly required. The admin-API script below is still useful for ad-hoc purges (e.g. clearing the remote media cache more aggressively than the retention window).

**`/usr/local/bin/purge-synapse-media.sh`** (create on LXC 151):

```bash
#!/bin/bash
ADMIN_TOKEN="syt_your_admin_token"

# Purge remote media (cached from other homeservers) older than 90 days.
# date +%s000 yields milliseconds; 7776000000 ms = 90 days.
CUTOFF_TS=$(($(date +%s000) - 7776000000))

curl -X POST \
  "http://localhost:8008/_synapse/admin/v1/purge_media_cache?before_ts=$CUTOFF_TS" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -s -o /dev/null

echo "$(date): Synapse remote media purge completed" >> /var/log/synapse-purge.log
```

```bash
chmod +x /usr/local/bin/purge-synapse-media.sh
echo "0 4 * * * root /usr/local/bin/purge-synapse-media.sh" > /etc/cron.d/synapse-purge
```

---

### 6. PostgreSQL Autovacuum Per-Table Tuning

The high-churn Synapse tables (`state_groups_state`, `events`, the receipts tables) are not tuned for aggressive autovacuum. As the DB grows, bloat accumulates and queries slow down.

Run on LXC 109 (PostgreSQL):

```sql
-- state_groups_state: biggest bloat source
-- (autovacuum_naptime is server-wide only; it cannot be set per-table)
ALTER TABLE state_groups_state SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_analyze_scale_factor = 0.005,
    autovacuum_vacuum_cost_delay = 5
);

-- events: second priority
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_analyze_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay = 5
);

-- receipts and device/presence streams (Synapse splits receipts into
-- receipts_linearized and receipts_graph; there is no plain "receipts" table)
ALTER TABLE receipts_linearized SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_vacuum_cost_delay = 5);
ALTER TABLE receipts_graph SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_vacuum_cost_delay = 5);
ALTER TABLE device_lists_stream SET (autovacuum_vacuum_scale_factor = 0.02);
ALTER TABLE presence_stream SET (autovacuum_vacuum_scale_factor = 0.02);
```

Also bump `autovacuum_max_workers` from 3 → 5 (a postmaster-level setting: it takes effect only after a restart, not via `pg_reload_conf()`):

```sql
ALTER SYSTEM SET autovacuum_max_workers = 5;
-- then: systemctl restart postgresql@17-main
```

**Monitor vacuum health:**

```sql
SELECT relname, last_autovacuum, n_dead_tup, n_live_tup
FROM pg_stat_user_tables
WHERE relname IN ('events', 'state_groups_state', 'receipts_linearized', 'receipts_graph')
ORDER BY n_dead_tup DESC;
```

---

### 7. Hookshot Metrics + Grafana

**Hookshot metrics** are exposed at `127.0.0.1:9001/metrics`, but it's unconfirmed whether Prometheus at `10.10.10.49` is scraping them. Verify:

```bash
# On LXC 151
curl http://127.0.0.1:9001/metrics | head -20
```

If Prometheus is scraping, add the hookshot dashboard from the repo: `contrib/hookshot-dashboard.json` → import into Grafana.

**Grafana Synapse dashboard** — Prometheus is already scraping Synapse at port 9000. Import the official dashboard:

- Grafana → Dashboards → Import → ID `18618` (Synapse Monitoring)
- Set the Prometheus datasource → done
- Shows room count, message rates, federation lag, cache hit rates, and DB query times in real time

---

### 8. Federation Security

Currently: open federation with key verification (correct for an invite-only friends server). Recommended additions:

**Server-level whitelist in `homeserver.yaml`** (optional, for restricting federation):

```yaml
# Whitelist-only federation
federation_domain_whitelist:
  - matrix.lotusguild.org
  - matrix.org   # Keep if bridging needed
# Synapse has no `federation_enabled: false` option. To close federation
# entirely, remove the `federation` resource from the listener config
# and drop the SRV/.well-known delegation.
```

**Per-room ACLs** for reactive blocking of specific bad servers:

```json
{
  "type": "m.room.server_acl",
  "content": {
    "allow": ["*"],
    "deny": ["spam.example.com"]
  }
}
```

**Mjolnir/Draupnir** (already on the roadmap) handles this automatically with ban-list subscriptions (t2bot spam lists etc.).

---

### 9. Sygnal Push Notifications

Sygnal is the official Matrix push gateway for mobile (Element X on iOS/Android). Without it, notifications don't arrive when the app is backgrounded.

**Requirements:**

- Apple Developer account (APNS cert) for iOS
- Firebase project (FCM API key) for Android
- New LXC or run alongside existing services

**Basic config (`/etc/sygnal/sygnal.yaml`):**

```yaml
server:
  port: 8765
database:
  type: postgresql
  user: sygnal
  password:
  database: sygnal
apps:
  com.element.android:
    type: gcm
    api_key:
  im.riot.x.ios:
    type: apns
    platform: production
    certfile: /etc/sygnal/apns/element-x-cert.pem
    topic: im.riot.x.ios
```

**Synapse integration:** none needed in `homeserver.yaml`. Each client registers its own pusher pointing at the gateway's public `/_matrix/push/v1/notify` URL, so Sygnal just has to be reachable over HTTPS (e.g. via an NPM proxy host).

---

### 10. LiveKit VP9/AV1 + Dynacast (Quality Improvement)

Currently H264 only. Enabling VP9/AV1 unlocks Dynacast (pauses video layers no one is watching), which significantly reduces bandwidth/CPU for low-viewer rooms.
**`/etc/livekit/config.yaml` additions:**

```yaml
video:
  codecs:
    - mime: video/H264
      fmtp: "level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01e"
    - mime: video/VP9
      fmtp: "profile=0"
    - mime: video/AV1
      fmtp: "profile=0"
  dynacast: true
```

Note: Dynacast only works with VP9 or AV1 (SVC-capable codecs). H264 subscribers continue to work normally alongside VP9/AV1 subscribers.

---

### 11. Synapse Workers (Future Scaling Reference)

The current single process handles ~100–300 concurrent users before the Python GIL becomes the bottleneck. Not needed now, but documented for when usage grows.

**Stage 1 trigger:** Synapse CPU >80% consistently, or >200 concurrent users.

**First workers to add:**

```yaml
# /etc/matrix-synapse/workers/client-reader-1.yaml
worker_app: synapse.app.client_reader
worker_name: client-reader-1
worker_listeners:
  - type: http
    port: 8011
    resources: [{names: [client]}]
```

Add `federation_sender` next (off-loads outgoing federation from the main process). Then `event_creator` for write-heavy loads. Redis is required at Stage 2 (500+ users) for inter-worker coordination.
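For reference, the `federation_sender` mentioned above follows the same per-app layout as the `client_reader` example. A sketch, not a tested config; it needs no client-facing listener since it only pushes outbound federation traffic:

```yaml
# /etc/matrix-synapse/workers/federation-sender-1.yaml (hypothetical filename)
worker_app: synapse.app.federation_sender
worker_name: federation-sender-1
# No HTTP `client` resource required: outbound federation only.
```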
---

## Bot Checklist

### Core

- [x] matrix-nio async client with E2EE
- [x] Device trust (auto-trust all devices)
- [x] Graceful shutdown (SIGTERM/SIGINT)
- [x] Initial sync token (ignores old messages on startup)
- [x] Auto-accept room invites
- [x] Deployed as systemd service (`matrixbot.service`) on LXC 151
- [x] Fix E2EE key errors — full store + credentials wipe, fresh device registration (`BBRZSEUECZ`); stale devices removed via admin API

### Commands

- [x] `!help` — list commands
- [x] `!ping` — latency check
- [x] `!8ball <question>` — magic 8-ball
- [x] `!fortune` — fortune cookie
- [x] `!flip` — coin flip
- [x] `!roll <dice>` — dice roller
- [x] `!random <range>` — random number
- [x] `!rps <choice>` — rock paper scissors
- [x] `!poll <question>` — poll with reactions
- [x] `!trivia` — trivia game (reactions, 30s reveal)
- [x] `!champion [lane]` — random LoL champion
- [x] `!agent [role]` — random Valorant agent
- [x] `!wordle` — full Wordle game (daily, hard mode, stats, share)
- [x] `!minecraft <username>` — RCON whitelist add
- [x] `!ask <question>` — Ollama LLM (lotusllm, 2min cooldown)
- [x] `!health` — bot uptime + service status

### Welcome System

- [x] Watches Space joins and DMs new members automatically
- [x] React-to-join: react with ✅ in DM → bot invites to General, Commands, Memes
- [x] Welcome event ID persisted to `welcome_state.json`

### Wordle

- [x] Daily puzzles with two-pass letter evaluation
- [x] Hard mode with constraint validation
- [x] Stats persistence (`wordle_stats.json`)
- [x] Cinny-compatible rendering (inline HTML tiles)
- [x] DM-based gameplay, `!wordle share` posts result to public room
- [x] Virtual keyboard display

---

## Tech Stack

| Component | Technology | Version |
|-----------|-----------|---------|
| Bot language | Python 3 | 3.x |
| Bot library | matrix-nio (E2EE) | latest |
| Homeserver | Synapse | 1.148.0 |
| Database | PostgreSQL | 17.9 |
| TURN | coturn | latest |
| Video/voice calls | LiveKit SFU | 1.9.11 |
| LiveKit JWT | lk-jwt-service | latest |
| SSO | Authelia (OIDC) + LLDAP | — |
| Webhook bridge | matrix-hookshot | 7.3.2 |
| Reverse proxy | Nginx Proxy Manager | — |
| Web client | Cinny (custom build, `add-joined-call-controls` branch) | 4.10.5+ |
| Bot dependencies | matrix-nio[e2ee], aiohttp, python-dotenv, mcrcon | — |

## Bot Files

```
matrixBot/
├── bot.py                 # Entry point, client setup, event loop
├── callbacks.py           # Message + reaction event handlers
├── commands.py            # All command implementations
├── config.py              # Environment config + validation
├── utils.py               # send_text, send_html, send_reaction, get_or_create_dm
├── welcome.py             # Welcome message + react-to-join logic
├── wordle.py              # Full Wordle game engine
├── wordlist_answers.py    # Wordle answer word list
├── wordlist_valid.py      # Wordle valid guess word list
├── .env.example           # Environment variable template
└── requirements.txt       # Python dependencies
```