matrix/README.md
2026-03-10 20:37:52 -04:00

Lotus Matrix Bot & Server Roadmap

Matrix bot and server infrastructure for the Lotus Guild homeserver (matrix.lotusguild.org).

Repo: https://code.lotusguild.org/LotusGuild/matrixBot

Status: Phase 7 — Moderation & Client Customisation


Priority Order

  1. PostgreSQL migration
  2. TURN server
  3. Room structure + space setup
  4. Matrix bot (core + commands)
  5. LiveKit / Element Call
  6. SSO / OIDC (Authelia)
  7. Webhook integrations (hookshot)
  8. Voice stability & quality tuning
  9. Custom Cinny client (chat.lotusguild.org)
  10. Custom emoji packs (partially finished)
  11. Cinny custom branding (Lotus Guild theme)
  12. Draupnir moderation bot
  13. Push notifications (Sygnal)

Infrastructure

| Service | IP | LXC | RAM | vCPUs | Disk | Versions |
| --- | --- | --- | --- | --- | --- | --- |
| Synapse | 10.10.10.29 | 151 | 8GB | 4 (Ryzen 9 7900) | 50GB | Synapse 1.149.0, LiveKit 1.9.11, hookshot 7.3.2, coturn latest |
| PostgreSQL 17 | 10.10.10.44 | 109 | 6GB | 3 (Ryzen 9 7900) | 30GB | PostgreSQL 17.9 |
| Cinny Web | 10.10.10.6 | 106 | 256MB | 1 | 8GB | Debian 13, nginx, Node 24, Cinny 4.10.5 |
| Draupnir | 10.10.10.24 | 110 | 1GB | 2 (Ryzen 9 7900) | 10GB | Draupnir v2.9.0, Node.js v22 |
| Prometheus | 10.10.10.48 | 118 | | | | Prometheus (scrapes all Matrix services) |
| Grafana | 10.10.10.49 | 107 | | | | Grafana 12.4.0 (dashboard.lotusguild.org) |
| NPM | 10.10.10.27 | 139 | | | | Nginx Proxy Manager |
| Authelia | 10.10.10.36 | 167 | | | | SSO/OIDC provider |
| LLDAP | 10.10.10.39 | 147 | | | | LDAP user directory |
| Uptime Kuma | 10.10.10.25 | 101 | | | | Uptime monitoring (micro1 node) |

Key paths on Synapse LXC (151):

  • Synapse config: /etc/matrix-synapse/homeserver.yaml
  • Synapse conf.d: /etc/matrix-synapse/conf.d/ (metrics.yaml, report_stats.yaml, server_name.yaml)
  • coturn config: /etc/turnserver.conf
  • LiveKit config: /etc/livekit/config.yaml
  • LiveKit service: livekit-server.service
  • lk-jwt-service: lk-jwt-service.service (binds :8070, serves JWT tokens for MatrixRTC)
  • Hookshot: /opt/hookshot/, service: matrix-hookshot.service
  • Hookshot config: /opt/hookshot/config.yml
  • Hookshot registration: /etc/matrix-synapse/hookshot-registration.yaml
  • Bot: /opt/matrixbot/, service: matrixbot.service
  • Landing page: /var/www/matrix-landing/index.html (on NPM LXC 139)

Key paths on Draupnir LXC (110):

  • Install path: /opt/draupnir/
  • Config: /opt/draupnir/config/production.yaml
  • Data/SQLite DBs: /data/storage/
  • Service: draupnir.service
  • Management room: #management:matrix.lotusguild.org (!mEvR5fe3jMmzwd-FwNygD72OY_yu8H3UP_N-57oK7MI)
  • Bot account: @draupnir:matrix.lotusguild.org (power level 100 in all protected rooms)
  • Subscribed ban lists: #community-moderation-effort-bl:neko.dev, #matrix-org-coc-bl:matrix.org
  • Rebuild: NODE_OPTIONS="--max-old-space-size=768" npx tsc --project tsconfig.json

Key paths on PostgreSQL LXC (109):

  • PostgreSQL config: /etc/postgresql/17/main/postgresql.conf
  • Tuning conf.d: /etc/postgresql/17/main/conf.d/synapse_tuning.conf
  • HBA config: /etc/postgresql/17/main/pg_hba.conf
  • Data directory: /var/lib/postgresql/17/main

Key paths on Cinny LXC (106):

  • Source: /opt/cinny/ (branch: add-joined-call-controls)
  • Built files: /var/www/html/
  • Cinny config: /var/www/html/config.json
  • Config backup (survives rebuilds): /opt/cinny-config.json
  • Nginx site config: /etc/nginx/sites-available/cinny
  • Rebuild script: /usr/local/bin/cinny-update

Port Maps

Router → 10.10.10.29 (forwarded):

  • TCP+UDP 3478 — TURN/STUN
  • TCP+UDP 5349 — TURNS/TLS
  • TCP 7881 — LiveKit ICE TCP fallback
  • TCP+UDP 49152-65535 — TURN relay range

Internal port map (LXC 151):

| Port | Service | Bind |
| --- | --- | --- |
| 8008 | Synapse HTTP | 0.0.0.0 |
| 9000 | Synapse metrics | 127.0.0.1 + 10.10.10.29 |
| 9001 | Hookshot widgets | 0.0.0.0 |
| 9002 | Hookshot bridge (appservice) | 127.0.0.1 |
| 9003 | Hookshot webhooks | 0.0.0.0 |
| 9004 | Hookshot metrics | 0.0.0.0 |
| 9100 | node_exporter | 0.0.0.0 |
| 9101 | matrix-admin exporter | 0.0.0.0 |
| 6789 | LiveKit metrics | 0.0.0.0 |
| 7880 | LiveKit HTTP | 0.0.0.0 |
| 7881 | LiveKit RTC TCP | 0.0.0.0 |
| 8070 | lk-jwt-service | 0.0.0.0 |
| 8080 | synapse-admin (nginx) | 0.0.0.0 |
| 3478 | coturn STUN/TURN | 0.0.0.0 |
| 5349 | coturn TURNS/TLS | 0.0.0.0 |

Internal port map (LXC 109 — PostgreSQL):

| Port | Service | Bind |
| --- | --- | --- |
| 5432 | PostgreSQL | 0.0.0.0 (hba-restricted to 10.10.10.29) |
| 9100 | node_exporter | 0.0.0.0 |
| 9187 | postgres_exporter | 0.0.0.0 |

Rooms (all v12)

| Room | Room ID | Join Rule |
| --- | --- | --- |
| The Lotus Guild (Space) | !-1ZBnAH-JiCOV8MGSKN77zDGTuI3pgSdy8Unu_DrDyc | public |
| General | !wfokQ1-pE896scu_AOcCBA2s3L4qFo-PTBAFTd0WMI0 | public |
| Commands | !ou56mVZQ8ZB7AhDYPmBV5_BR28WMZ4x5zwZkPCqjq1s | restricted (Space members) |
| Memes | !GK6v5cLEEnowIooQJv5jECfISUjADjt8aKhWv9VbG5U | restricted (Space members) |
| Management | !mEvR5fe3jMmzwd-FwNygD72OY_yu8H3UP_N-57oK7MI | invite |
| Cool Kids | !R7DT3QZHG9P8QQvX6zsZYxjkKgmUucxDz_n31qNrC94 | invite |
| Spam and Stuff | !GttT4QYd1wlGlkHU3qTmq_P3gbyYKKeSSN6R7TPcJHg | invite, no E2EE (hookshot) |

Power level roles (Cinny tags):

  • 100: Owner (jared)
  • 50: The Nerdy Council (enhuynh, lonely)
  • 48: Panel of Geeks
  • 35: Cool Kids
  • 0: Member
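
The thresholds above can be read as a lookup from power level to tag. The sketch below is illustrative only: the function name `role_for` is hypothetical, and it assumes a user with an in-between level (say 49) gets the highest tag whose threshold they meet, which this README does not state.

```python
# Thresholds copied from the list above, ordered from highest to lowest.
ROLE_TAGS = [
    (100, "Owner"),
    (50, "The Nerdy Council"),
    (48, "Panel of Geeks"),
    (35, "Cool Kids"),
    (0, "Member"),
]


def role_for(power_level: int) -> str:
    """Return the tag with the highest threshold not above power_level."""
    for threshold, tag in ROLE_TAGS:
        if power_level >= threshold:
            return tag
    # Assumption: negative power levels fall back to the lowest tag.
    return "Member"
```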

Webhook Integrations (matrix-hookshot 7.3.2)

Generic webhooks bridged into Spam and Stuff. Each service gets its own virtual user (@hookshot_<service>) with a unique avatar. Webhook URL format: https://matrix.lotusguild.org/webhook/<uuid>

| Service | Webhook UUID | Notes |
| --- | --- | --- |
| Grafana | df4a1302-2d62-4a01-b858-fb56f4d3781a | Unified alerting contact point |
| Proxmox | 9b3eafe5-7689-4011-addd-c466e524661d | Notification system (8.1+) |
| Sonarr | aeffc311-0686-42cb-9eeb-6757140c072e | All event types |
| Radarr | 34913454-c1ac-4cda-82ea-924d4a9e60eb | All event types |
| Readarr | e57ab4f3-56e6-4dc4-8b30-2f4fd4bbeb0b | All event types |
| Lidarr | 66ac6fdd-69f6-4f47-bb00-b7f6d84d7c1c | All event types |
| Uptime Kuma | 1a02e890-bb25-42f1-99fe-bba6a19f1811 | Status change notifications |
| Seerr | 555185af-90a1-42ff-aed5-c344e11955cf | Request/approval events |
| Owncast (Livestream) | 9993e911-c68b-4271-a178-c2d65ca88499 | STREAM_STARTED / STREAM_STOPPED |
| Bazarr | 470fb267-3436-4dd3-a70c-e6e8db1721be | Subtitle events (Apprise JSON notifier) |
| Tinker-Tickets | 6e306faf-8eea-4ba5-83ef-bf8f421f929e | Custom transformation code |

Hookshot notes:

  • Spam and Stuff is intentionally unencrypted — hookshot bridges cannot join E2EE rooms
  • JS transformation functions use hookshot v2 API: result = { version: "v2", plain, html, msgtype }
  • The result variable must be assigned without var/let/const (QuickJS IIFE sandbox)
  • NPM proxies https://matrix.lotusguild.org/webhook/* → http://10.10.10.29:9003
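
For testing a webhook end to end, a small client that POSTs JSON to the URL format above may help. This is a minimal sketch, not part of the repo: the helper names are hypothetical, and the payload shape (a `text` key) is an assumption, since each service's transformation function decides which fields it reads.

```python
import json
import urllib.request

# URL format taken from this README: https://matrix.lotusguild.org/webhook/<uuid>
WEBHOOK_BASE = "https://matrix.lotusguild.org/webhook"


def build_webhook_request(webhook_uuid: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one webhook."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{WEBHOOK_BASE}/{webhook_uuid}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_webhook(webhook_uuid: str, payload: dict, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    request = build_webhook_request(webhook_uuid, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `send_webhook("<uuid>", {"text": "test message"})` with a real UUID from the table should produce a message in Spam and Stuff if the proxy and bridge are healthy.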

Moderation (Draupnir v2.9.0)

Draupnir runs on LXC 110 and manages moderation across all 9 protected rooms via #management:matrix.lotusguild.org.

Subscribed ban lists:

  • #community-moderation-effort-bl:neko.dev — 12,599 banned users, 245 servers, 59 rooms
  • #matrix-org-coc-bl:matrix.org — 4,589 banned users, 220 servers, 2 rooms

Common commands (send in management room):

!draupnir status                          — current status + protected rooms
!draupnir ban @user:server * "reason"     — ban from all protected rooms
!draupnir redact @user:server             — redact their recent messages
!draupnir rooms add !roomid:server        — add a room to protection
!draupnir watch <alias> --no-confirm      — subscribe to a ban list

Known Issues

coturn TLS Reset Errors

coturn logs periodically show TLS/TCP socket error: Connection reset by peer. This is expected: clients probe TURN and drop the connection once they establish a direct P2P path.

BBR Congestion Control

net.ipv4.tcp_congestion_control = bbr must be set on the Proxmox host, not inside an unprivileged LXC. All other sysctl tuning (TCP/UDP buffers, fin_timeout) is applied inside LXC 151.
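
To confirm which congestion control algorithm is actually active, the value can be read back from procfs. This is a minimal sketch with hypothetical helper names; it assumes the standard Linux path for this sysctl and only reads the value, it does not change it.

```python
from pathlib import Path

# Standard procfs location for net.ipv4.tcp_congestion_control on Linux.
SYSCTL_PATH = Path("/proc/sys/net/ipv4/tcp_congestion_control")


def current_congestion_control(path: Path = SYSCTL_PATH) -> str:
    """Return the active TCP congestion control algorithm name."""
    return path.read_text(encoding="ascii").strip()


def bbr_enabled(path: Path = SYSCTL_PATH) -> bool:
    """True when the active algorithm is bbr."""
    return current_congestion_control(path) == "bbr"
```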


Server Checklist

Quality of Life

  • Migrate from SQLite to PostgreSQL
  • TURN/STUN server (coturn) for reliable voice/video
  • URL previews
  • Upload size limit 200MB
  • Full-text message search (PostgreSQL backend)
  • Media retention policy (remote: 1yr, local: 3yr)
  • Sliding sync (native Synapse)
  • LiveKit for Element Call video rooms
  • Default room version v12, all rooms upgraded
  • Landing page with client recommendations
  • Synapse metrics endpoint (port 9000, Prometheus-compatible)
  • Custom Cinny client LXC 106 — Cinny 4.10.5, add-joined-call-controls branch, weekly auto-update cron
  • Push notifications gateway (Sygnal) — needs Apple/Google developer credentials
  • Cinny custom branding — Lotus Guild theme (colours, title, favicon, PWA name)

Performance Tuning

  • PostgreSQL shared_buffers → 1500MB, effective_cache_size, work_mem, checkpoint tuning
  • PostgreSQL pg_stat_statements extension installed
  • PostgreSQL autovacuum tuned per-table (5 high-churn tables), autovacuum_max_workers → 5
  • Synapse event_cache_size → 30K, per-cache factors tuned
  • sysctl TCP/UDP buffer alignment on LXC 151 (/etc/sysctl.d/99-matrix-tuning.conf)
  • LiveKit: empty_timeout: 300, departure_timeout: 20, max_participants: 50
  • LiveKit ICE port range expanded to 50000-51000
  • LiveKit TURN TTL reduced to 1h
  • LiveKit VP9/AV1 codecs enabled
  • BBR congestion control — must be applied on Proxmox host

Auth & SSO

  • Token-based registration
  • SSO/OIDC via Authelia
  • allow_existing_users: true for linking accounts to SSO
  • Password auth alongside SSO

Webhooks & Integrations

  • matrix-hookshot 7.3.2 — 11 active webhook services
  • Per-service JS transformation functions
  • Per-service virtual user avatars
  • NPM reverse proxy for /webhook path

Room Structure

  • The Lotus Guild space with all core rooms
  • Correct power levels and join rules per room
  • Custom room avatars

Hardening

  • Rate limiting
  • E2EE on all rooms (except Spam and Stuff — intentional for hookshot)
  • coturn internal peer deny rules (blocks relay to RFC1918 except allowed subnet)
  • coturn hardening: stale-nonce=600, user-quota=100, total-quota=1000, strong cipher list
  • pg_hba.conf locked down — remote access restricted to Synapse LXC only
  • Federation open with key verification
  • fail2ban on Synapse login endpoint (5 retries / 24h ban)
  • Synapse metrics port 9000 restricted to 127.0.0.1 + 10.10.10.29
  • coturn cert auto-renewal — daily sync cron on compute-storage-01
  • /.well-known/matrix/client and /server live on lotusguild.org
  • suppress_key_server_warning: true
  • Automated database + media backups
  • Federation bad-actor blocking via Draupnir ban lists (17,000+ entries)

Monitoring

  • Grafana dashboard — dashboard.lotusguild.org/d/matrix-synapse-dashboard (140+ panels)
  • Prometheus scraping all Matrix services (Synapse, Hookshot, LiveKit, node_exporter, postgres)
  • 14 active alert rules across matrix-folder and infra-folder
  • Uptime Kuma monitors: Synapse, LiveKit, PostgreSQL, Cinny, coturn, lk-jwt-service, Hookshot

Admin

  • Synapse admin API dashboard (synapse-admin at http://10.10.10.29:8080)
  • Draupnir moderation bot — LXC 110, v2.9.0, 9 protected rooms, 2 ban lists
  • Cinny custom branding

Monitoring & Observability

Prometheus Scrape Jobs

| Job | Target | Metrics |
| --- | --- | --- |
| synapse | 10.10.10.29:9000 | Full Synapse internals |
| matrix-admin | 10.10.10.29:9101 | DAU, MAU, room/user/media totals |
| livekit | 10.10.10.29:6789 | Rooms, participants, packets, latency |
| hookshot | 10.10.10.29:9004 | Connections, API calls/failures, Node.js runtime |
| matrix-node | 10.10.10.29:9100 | CPU, RAM, network, load average, disk |
| postgres | 10.10.10.44:9187 | pg_stat_database, connections, WAL, block I/O |
| postgres-node | 10.10.10.44:9100 | CPU, RAM, network, load average, disk |

Disk I/O: All servers use Ceph-backed storage. Per-device disk I/O metrics are meaningless — use Network I/O panels to see actual storage traffic.
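
When a scrape job shows as down, a quick TCP reachability check from the Prometheus host can separate network problems from exporter problems. The sketch below is an illustration with hypothetical function names; it only tests that the port accepts connections and does not fetch or parse metrics.

```python
import socket

# Targets copied from the scrape job table above.
SCRAPE_TARGETS = {
    "synapse": ("10.10.10.29", 9000),
    "matrix-admin": ("10.10.10.29", 9101),
    "livekit": ("10.10.10.29", 6789),
    "hookshot": ("10.10.10.29", 9004),
    "matrix-node": ("10.10.10.29", 9100),
    "postgres": ("10.10.10.44", 9187),
    "postgres-node": ("10.10.10.44", 9100),
}


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_targets(targets=SCRAPE_TARGETS, timeout: float = 2.0) -> dict:
    """Map each job name to whether its target port accepts connections."""
    return {
        job: is_reachable(host, port, timeout)
        for job, (host, port) in targets.items()
    }
```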

Alert Rules

Matrix folder:

| Alert | Fires when | Severity |
| --- | --- | --- |
| Synapse Down | up{job="synapse"} < 1 for 2m | critical |
| PostgreSQL Down | pg_up < 1 for 2m | critical |
| LiveKit Down | up{job="livekit"} < 1 for 2m | critical |
| Hookshot Down | up{job="hookshot"} < 1 for 2m | critical |
| PG Connection Saturation | connections > 80% of max for 5m | warning |
| Federation Queue Backing Up | pending PDUs > 100 for 10m | warning |
| Synapse High Memory | RSS > 2000MB for 10m | warning |
| Synapse High Response Time | p99 latency (excl. /sync) > 10s for 5m | warning |
| Synapse Event Processing Lag | any processor > 30s behind for 5m | warning |
| Synapse DB Query Latency High | p99 query time > 1s for 5m | warning |

Infrastructure folder:

| Alert | Fires when | Severity |
| --- | --- | --- |
| Service Exporter Down | any up == 0 for 3m | critical |
| Node High CPU Usage | CPU > 90% for 10m | warning |
| Node High Memory Usage | RAM > 90% for 10m | warning |
| Node Disk Space Low | available < 15% (excl. tmpfs/overlay) for 10m | warning |

/sync long-poll: The Matrix /sync endpoint is a long-poll (clients hold it open ≤30s). It is excluded from the High Response Time alert to prevent false positives.

Synapse Event Processing Lag can fire transiently after a Synapse restart while processors drain their backlog. Self-resolves in 10 to 20 minutes.
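
The "for 2m" and "for 5m" qualifiers in the alert tables mean the condition must hold continuously for that long before the alert fires, which is why short blips do not page anyone. The sketch below is a simplified model of that behaviour, not the alerting engine's actual implementation; the function name and the sample format are assumptions.

```python
from typing import Callable, Iterable, Tuple


def fires(
    samples: Iterable[Tuple[float, float]],
    condition: Callable[[float], bool],
    hold_seconds: float,
) -> bool:
    """Return True if condition holds continuously for at least hold_seconds.

    samples are (timestamp_seconds, value) pairs in ascending time order.
    """
    breach_start = None
    for timestamp, value in samples:
        if condition(value):
            if breach_start is None:
                breach_start = timestamp  # first sample of a new breach
            if timestamp - breach_start >= hold_seconds:
                return True
        else:
            breach_start = None  # any healthy sample resets the timer
    return False
```

For example, `fires(samples, lambda up: up < 1, 120)` models the Synapse Down rule: a target that recovers within two minutes never fires.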


Bot Checklist

Core

  • matrix-nio async client with E2EE
  • Device trust (auto-trust all devices)
  • Graceful shutdown (SIGTERM/SIGINT)
  • Initial sync token (ignores old messages on startup)
  • Auto-accept room invites
  • Deployed as systemd service (matrixbot.service) on LXC 151
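
Graceful shutdown under systemd usually means reacting to SIGTERM (and SIGINT for manual runs) by cancelling the sync loop and cleaning up. The following is a minimal asyncio sketch of that pattern, not the code in bot.py: `run_until_signalled` and `start_worker` are hypothetical names, and the real bot would also close its Matrix client session during cleanup.

```python
import asyncio
import signal
from typing import Awaitable, Callable


async def run_until_signalled(start_worker: Callable[[], Awaitable[None]]) -> None:
    """Run the worker coroutine until it ends or SIGTERM/SIGINT arrives."""
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    handled = (signal.SIGTERM, signal.SIGINT)

    # add_signal_handler is only available on Unix event loops.
    for sig in handled:
        loop.add_signal_handler(sig, stop.set)

    worker = asyncio.ensure_future(start_worker())
    stop_waiter = asyncio.ensure_future(stop.wait())
    try:
        # Wake up when either the worker finishes or a signal sets the event.
        await asyncio.wait({worker, stop_waiter}, return_when=asyncio.FIRST_COMPLETED)
    finally:
        for task in (worker, stop_waiter):
            task.cancel()
        # Worker errors are collected here, not re-raised, in this sketch.
        await asyncio.gather(worker, stop_waiter, return_exceptions=True)
        for sig in handled:
            loop.remove_signal_handler(sig)
```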

Commands

  • !help — list commands
  • !ping — latency check
  • !8ball <question> — magic 8-ball
  • !fortune — fortune cookie
  • !flip — coin flip
  • !roll <NdS> — dice roller
  • !random <min> <max> — random number
  • !rps <choice> — rock paper scissors
  • !poll <question> — poll with reactions
  • !trivia — trivia game (reactions, 30s reveal)
  • !champion [lane] — random LoL champion
  • !agent [role] — random Valorant agent
  • !wordle — full Wordle game (daily, hard mode, stats, share)
  • !minecraft <username> — RCON whitelist add
  • !ask <question> — Ollama LLM (lotusllm, 2min cooldown)
  • !health — bot uptime + service status
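
As an illustration of how a command such as !roll <NdS> can be handled, here is a small parser and roller. It is a sketch under stated assumptions, not the implementation in commands.py: the function names and the limits `MAX_DICE` and `MAX_SIDES` are hypothetical.

```python
import random
import re
from typing import List, Optional, Tuple

DICE_PATTERN = re.compile(r"^(\d+)d(\d+)$", re.IGNORECASE)
MAX_DICE = 100    # hypothetical cap to keep replies short
MAX_SIDES = 1000  # hypothetical cap


def parse_dice(spec: str) -> Tuple[int, int]:
    """Parse 'NdS' (for example '2d6') into (count, sides)."""
    match = DICE_PATTERN.match(spec.strip())
    if match is None:
        raise ValueError(f"expected NdS, got {spec!r}")
    count, sides = int(match.group(1)), int(match.group(2))
    if not (1 <= count <= MAX_DICE and 2 <= sides <= MAX_SIDES):
        raise ValueError(f"dice out of range: {spec!r}")
    return count, sides


def roll(spec: str, rng: Optional[random.Random] = None) -> List[int]:
    """Roll the dice described by spec and return each individual result."""
    if rng is None:
        rng = random.Random()
    count, sides = parse_dice(spec)
    return [rng.randint(1, sides) for _ in range(count)]
```

Passing a seeded `random.Random` makes the roller deterministic, which is convenient for unit tests.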

Welcome System

  • Watches Space joins and DMs new members automatically
  • React-to-join: react to the welcome message in the DM → bot invites to General, Commands, Memes
  • Welcome event ID persisted to welcome_state.json
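
Persisting the welcome event ID lets the bot recognise reactions to a welcome message after a restart. A minimal sketch of JSON-backed state follows; only the file name welcome_state.json comes from this README, while the function names and the stored key layout are assumptions.

```python
import json
import os
from pathlib import Path

STATE_FILE = Path("welcome_state.json")  # file name from this README


def load_state(path: Path = STATE_FILE) -> dict:
    """Load persisted state, or return an empty dict on first run."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}


def save_state(state: dict, path: Path = STATE_FILE) -> None:
    """Write state via a temp file so a crash cannot leave a half-written file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    os.replace(tmp, path)  # atomic rename on the same filesystem
```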

Wordle

  • Daily puzzles with two-pass letter evaluation
  • Hard mode with constraint validation
  • Stats persistence (wordle_stats.json)
  • Cinny-compatible rendering (inline <span> tiles)
  • DM-based gameplay, !wordle share posts result to public room
  • Virtual keyboard display
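
Two-pass evaluation matters for guesses with repeated letters: exact matches are marked first, and only the answer letters left over can produce a "present" mark. The sketch below shows the general technique, not the code in wordle.py; the function name and the status strings are assumptions.

```python
from collections import Counter
from typing import List


def evaluate_guess(guess: str, answer: str) -> List[str]:
    """Return 'correct', 'present' or 'absent' for each letter of guess."""
    guess, answer = guess.lower(), answer.lower()
    if len(guess) != len(answer):
        raise ValueError("guess and answer must have the same length")

    result = ["absent"] * len(guess)
    remaining = Counter()  # answer letters not consumed by exact matches

    # Pass 1: mark exact matches and count the unmatched answer letters.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "correct"
        else:
            remaining[a] += 1

    # Pass 2: mark misplaced letters while unmatched copies remain.
    for i, g in enumerate(guess):
        if result[i] != "correct" and remaining[g] > 0:
            result[i] = "present"
            remaining[g] -= 1

    return result
```

With answer "apple" and guess "allee", only the first "l" is marked present because the answer contains a single "l", and the first "e" is absent because the only "e" is already matched in the last position.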

Tech Stack

| Component | Technology | Version |
| --- | --- | --- |
| Bot language | Python 3 | 3.x |
| Bot library | matrix-nio (E2EE) | latest |
| Homeserver | Synapse | 1.149.0 |
| Database | PostgreSQL | 17.9 |
| TURN | coturn | latest |
| Video/voice calls | LiveKit SFU | 1.9.11 |
| LiveKit JWT | lk-jwt-service | latest |
| Moderation | Draupnir | 2.9.0 |
| SSO | Authelia (OIDC) + LLDAP | |
| Webhook bridge | matrix-hookshot | 7.3.2 |
| Reverse proxy | Nginx Proxy Manager | |
| Web client | Cinny (add-joined-call-controls branch) | 4.10.5 |
| Bot dependencies | matrix-nio[e2ee], aiohttp, python-dotenv, mcrcon | |

Bot Files

matrixBot/
├── bot.py              # Entry point, client setup, event loop
├── callbacks.py        # Message + reaction event handlers
├── commands.py         # All command implementations
├── config.py           # Environment config + validation
├── utils.py            # send_text, send_html, send_reaction, get_or_create_dm
├── welcome.py          # Welcome message + react-to-join logic
├── wordle.py           # Full Wordle game engine
├── wordlist_answers.py # Wordle answer word list
├── wordlist_valid.py   # Wordle valid guess word list
├── .env.example        # Environment variable template
└── requirements.txt    # Python dependencies
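
As a rough idea of what "environment config + validation" in config.py can look like, the sketch below fails fast when required variables are missing. The variable names, the `BotConfig` class and `load_config` are hypothetical and are not taken from .env.example; in the real bot, python-dotenv would presumably load the .env file before this step.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical variable names; check .env.example for the real ones.
REQUIRED_VARS = ("MATRIX_HOMESERVER", "MATRIX_USER_ID", "MATRIX_PASSWORD")


@dataclass(frozen=True)
class BotConfig:
    homeserver: str
    user_id: str
    password: str


def load_config(env: Optional[Mapping[str, str]] = None) -> BotConfig:
    """Build a BotConfig from the environment, raising if anything is missing."""
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name, "").strip()]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return BotConfig(
        homeserver=source["MATRIX_HOMESERVER"],
        user_id=source["MATRIX_USER_ID"],
        password=source["MATRIX_PASSWORD"],
    )
```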