# Lotus Matrix Bot & Server Roadmap
Matrix bot and server infrastructure for the Lotus Guild homeserver (`matrix.lotusguild.org`).
**Repo**: https://code.lotusguild.org/LotusGuild/matrixBot
## Status: Phase 6 — Monitoring, Observability & Hardening
---
## Priority Order
1. ~~PostgreSQL migration~~
2. ~~TURN server~~
3. ~~Room structure + space setup~~
4. ~~Matrix bot (core + commands)~~
5. ~~LiveKit / Element Call~~
6. ~~SSO / OIDC (Authelia)~~
7. ~~Webhook integrations (hookshot)~~
8. ~~Voice stability & quality tuning~~
9. ~~Custom Cinny client (chat.lotusguild.org)~~
10. Custom emoji packs (partially finished)
11. Cinny custom branding (Lotus Guild theme)
12. Draupnir moderation bot
13. Push notifications (Sygnal)
---
## Infrastructure
| Service | IP | LXC | RAM | vCPUs | Disk | Versions |
|---------|----|-----|-----|-------|------|----------|
| Synapse | 10.10.10.29 | 151 | 8GB | 4 (Ryzen 9 7900) | 50GB (21% used) | Synapse 1.148.0, LiveKit 1.9.11, hookshot 7.3.2, coturn latest |
| PostgreSQL 17 | 10.10.10.44 | 109 | 6GB | 3 (Ryzen 9 7900) | 30GB (5% used) | PostgreSQL 17.9 |
| Cinny Web | 10.10.10.6 | 106 | 256MB runtime | 1 | 8GB (27% used) | Debian 13, nginx, Node 24, Cinny 4.10.5 |
| NPM | 10.10.10.27 | 139 | — | — | — | Nginx Proxy Manager |
| Authelia | 10.10.10.36 | 167 | — | — | — | SSO/OIDC provider |
| LLDAP | 10.10.10.39 | 147 | — | — | — | LDAP user directory |
| Uptime Kuma | 10.10.10.25 | 101 | — | — | — | Uptime monitoring (micro1 node) |
| Prometheus | 10.10.10.48 | 118 | — | — | — | Prometheus — scrapes all Matrix services |
| Grafana | 10.10.10.49 | 107 | — | — | — | Grafana 12.4.0 — dashboard.lotusguild.org |
> **Note:** PostgreSQL container IP is `10.10.10.44`, not `.2` — update any stale references.
**Key paths on Synapse/matrix LXC (151):**
- Synapse config: `/etc/matrix-synapse/homeserver.yaml`
- Synapse conf.d: `/etc/matrix-synapse/conf.d/` (metrics.yaml, report_stats.yaml, server_name.yaml)
- coturn config: `/etc/turnserver.conf`
- LiveKit config: `/etc/livekit/config.yaml`
- LiveKit service: `livekit-server.service`
- lk-jwt-service: `lk-jwt-service.service` (binds `:8070`, serves JWT tokens for MatrixRTC)
- Hookshot: `/opt/hookshot/`, service: `matrix-hookshot.service`
- Hookshot config: `/opt/hookshot/config.yml`
- Hookshot registration: `/etc/matrix-synapse/hookshot-registration.yaml`
- Landing page: `/var/www/matrix-landing/index.html` (on NPM LXC 139)
- Bot: `/opt/matrixbot/`, service: `matrixbot.service`
**Key paths on PostgreSQL LXC (109):**
- PostgreSQL config: `/etc/postgresql/17/main/postgresql.conf`
- PostgreSQL conf.d: `/etc/postgresql/17/main/conf.d/`
- HBA config: `/etc/postgresql/17/main/pg_hba.conf`
- Data directory: `/var/lib/postgresql/17/main`
**Running services on LXC 151:**
| Service | Status | Memory | Notes |
|---------|--------|--------|-------|
| matrix-synapse | active, 2+ days | 231MB (peak 312MB) | No workers, single process |
| livekit-server | active, 2+ days | 22MB (peak 58MB) | v1.9.11, node IP = 162.192.14.139 |
| lk-jwt-service | active, 2+ days | 2.7MB | Binds :8070, LIVEKIT_URL=wss://matrix.lotusguild.org |
| matrix-hookshot | active, 2+ days | 76MB (peak 172MB) | Actively receiving webhooks |
| matrixbot | active, 2+ days | 26MB (peak 59MB) | Some E2EE key errors (see known issues) |
| coturn | active, 2+ days | 13MB | Periodic TCP reset errors (normal) |
**Currently open port forwards (router → 10.10.10.29):**
- TCP+UDP 3478 (TURN/STUN signaling)
- TCP+UDP 5349 (TURNS/TLS)
- TCP 7881 (LiveKit ICE TCP fallback)
- TCP+UDP 49152-65535 (TURN relay range)
- LiveKit WebRTC media: 50100-50500 (subset of above, only 400 ports — see improvements)
**Internal port map (LXC 151):**
| Port | Service | Bind |
|------|---------|------|
| 8008 | Synapse HTTP | 0.0.0.0 + ::1 |
| 9000 | Synapse metrics (Prometheus) | 127.0.0.1 + 10.10.10.29 |
| 9001 | Hookshot widgets | 0.0.0.0 |
| 9002 | Hookshot bridge (appservice) | 127.0.0.1 |
| 9003 | Hookshot webhooks | 0.0.0.0 |
| 9004 | Hookshot metrics (Prometheus) | 0.0.0.0 |
| 9100 | node_exporter (Prometheus) | 0.0.0.0 |
| 9101 | matrix-admin exporter | 0.0.0.0 |
| 6789 | LiveKit metrics (Prometheus) | 0.0.0.0 |
| 7880 | LiveKit HTTP | 0.0.0.0 |
| 7881 | LiveKit RTC TCP | 0.0.0.0 |
| 8070 | lk-jwt-service | 0.0.0.0 |
| 8080 | synapse-admin (nginx) | 0.0.0.0 |
| 3478 | coturn STUN/TURN | 0.0.0.0 |
| 5349 | coturn TURNS/TLS | 0.0.0.0 |
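
As a rough way to spot-check the port map above, the following sketch tries a TCP connection to a few of the listed listeners. It is only an illustration: the helper names (`port_is_open`, `check_ports`) and the selection of ports are assumptions, not part of the bot or any existing tooling, and a successful connect only shows that something is listening, not that the service is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# A hypothetical subset of TCP listeners taken from the LXC 151 port map.
LXC_151_TCP_PORTS = {
    8008: "Synapse HTTP",
    7880: "LiveKit HTTP",
    8070: "lk-jwt-service",
    9003: "Hookshot webhooks",
}


def check_ports(host: str, ports: dict[int, str]) -> dict[str, bool]:
    """Map each service name to whether its port accepted a connection."""
    return {name: port_is_open(host, port) for port, name in ports.items()}
```

Calling `check_ports("10.10.10.29", LXC_151_TCP_PORTS)` from another host on the LAN would be one way to use it. Ports bound only to `127.0.0.1` (such as 9002) are expected to fail from anywhere except LXC 151 itself.
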
**Internal port map (LXC 109 — PostgreSQL):**
| Port | Service | Bind |
|------|---------|------|
| 5432 | PostgreSQL | 0.0.0.0 (hba-restricted to 10.10.10.29) |
| 9100 | node_exporter (Prometheus) | 0.0.0.0 |
| 9187 | postgres_exporter | 0.0.0.0 |
---
## Rooms (all v12)
| Room | Room ID | Join Rule |
|------|---------|-----------|
| The Lotus Guild (Space) | `!-1ZBnAH-JiCOV8MGSKN77zDGTuI3pgSdy8Unu_DrDyc` | public |
| General | `!wfokQ1-pE896scu_AOcCBA2s3L4qFo-PTBAFTd0WMI0` | public |
| Commands | `!ou56mVZQ8ZB7AhDYPmBV5_BR28WMZ4x5zwZkPCqjq1s` | restricted (Space members) |
| Memes | `!GK6v5cLEEnowIooQJv5jECfISUjADjt8aKhWv9VbG5U` | restricted (Space members) |
| Management | `!mEvR5fe3jMmzwd-FwNygD72OY_yu8H3UP_N-57oK7MI` | invite |
| Cool Kids | `!R7DT3QZHG9P8QQvX6zsZYxjkKgmUucxDz_n31qNrC94` | invite |
| Spam and Stuff | `!GttT4QYd1wlGlkHU3qTmq_P3gbyYKKeSSN6R7TPcJHg` | invite, **no E2EE** (hookshot) |
**Power level roles (Cinny tags):**
- 100: Owner (jared)
- 50: The Nerdy Council (enhuynh, lonely)
- 48: Panel of Geeks
- 35: Cool Kids
- 0: Member
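
The mapping above can be sketched as a small lookup, for example if the bot ever needs to show a role name for a user's power level. This assumes a user gets the tag of the highest threshold they meet, which may not match how Cinny assigns tags for in-between values; `POWER_LEVEL_ROLES` and `role_for_power_level` are hypothetical names.

```python
# Thresholds copied from the list above, highest first.
POWER_LEVEL_ROLES = [
    (100, "Owner"),
    (50, "The Nerdy Council"),
    (48, "Panel of Geeks"),
    (35, "Cool Kids"),
    (0, "Member"),
]


def role_for_power_level(level: int) -> str:
    """Return the role whose threshold is the highest one not above level."""
    for threshold, role in POWER_LEVEL_ROLES:
        if level >= threshold:
            return role
    # Assumption: negative power levels are treated as plain members.
    return "Member"
```

For instance, `role_for_power_level(49)` falls between two thresholds and resolves to "Panel of Geeks" under this rule.
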
---
## Webhook Integrations (matrix-hookshot 7.3.2)
Generic webhooks bridged into **Spam and Stuff** via [matrix-hookshot](https://github.com/matrix-org/matrix-hookshot).
Each service gets its own virtual user (`@hookshot_<service>`) with a unique avatar.
Webhook URL format: `https://matrix.lotusguild.org/webhook/<uuid>`
| Service | Webhook UUID | Notes |
|---------|-------------|-------|
| Grafana | `df4a1302-2d62-4a01-b858-fb56f4d3781a` | Unified alerting contact point |
| Proxmox | `9b3eafe5-7689-4011-addd-c466e524661d` | Notification system (8.1+) |
| Sonarr | `aeffc311-0686-42cb-9eeb-6757140c072e` | All event types |
| Radarr | `34913454-c1ac-4cda-82ea-924d4a9e60eb` | All event types |
| Readarr | `e57ab4f3-56e6-4dc4-8b30-2f4fd4bbeb0b` | All event types |
| Lidarr | `66ac6fdd-69f6-4f47-bb00-b7f6d84d7c1c` | All event types |
| Uptime Kuma | `1a02e890-bb25-42f1-99fe-bba6a19f1811` | Status change notifications |
| Seerr | `555185af-90a1-42ff-aed5-c344e11955cf` | Request/approval events |
| Owncast (Livestream) | `9993e911-c68b-4271-a178-c2d65ca88499` | STREAM_STARTED / STREAM_STOPPED (hookshot display name: "Livestream") |
| Bazarr | `470fb267-3436-4dd3-a70c-e6e8db1721be` | Subtitle events (Apprise JSON notifier) |
| Tinker-Tickets | `6e306faf-8eea-4ba5-83ef-bf8f421f929e` | Custom transformation code |
**Hookshot notes:**
- Spam and Stuff is intentionally **unencrypted**: hookshot bridges cannot join E2EE rooms
- Webhook tokens stored in Synapse PostgreSQL `room_account_data` for `@hookshot`
- JS transformation functions use hookshot v2 API: set `result = { version: "v2", plain, html, msgtype }`
- The `result` variable must be assigned without `var`/`let`/`const` (needs implicit global scope in the QuickJS IIFE sandbox)
- NPM proxies `https://matrix.lotusguild.org/webhook/*` → `http://10.10.10.29:9003`
- Virtual user avatars: set via appservice token (`as_token` in hookshot-registration.yaml) impersonating each user
- Hookshot bridge port (9002) binds `127.0.0.1` only; webhook ingest (9003) binds `0.0.0.0` (NPM-proxied)
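
To smoke-test a webhook connection end to end, a request can be built against the NPM-proxied path. This is a minimal sketch, not the project's tooling: the `/webhook/<uuid>` URL shape is inferred from the proxy rule above, and the `text` payload field and the all-zero UUID are placeholders chosen for illustration. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Base path taken from the NPM proxy rule in the notes above.
WEBHOOK_BASE = "https://matrix.lotusguild.org/webhook"


def build_webhook_request(webhook_id: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for one webhook connection."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{WEBHOOK_BASE}/{webhook_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder UUID and payload; substitute a real connection UUID from the table.
req = build_webhook_request(
    "00000000-0000-0000-0000-000000000000",
    {"text": "test message"},
)
print(req.get_method(), req.full_url)
# To actually deliver it: urllib.request.urlopen(req, timeout=10)
```

Whether the message renders as expected depends on the transformation function attached to that connection, so the payload should mimic what the real service sends.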
---
## Known Issues
### coturn TLS Reset Errors
Periodic `TLS/TCP socket error: Connection reset by peer` in coturn logs from external IPs. This is normal — clients probe TURN and drop the connection once they establish a direct P2P path. Not an issue.
### BBR Congestion Control — Host-Level Only
`net.ipv4.tcp_congestion_control = bbr` and `net.core.default_qdisc = fq` cannot be set from inside an unprivileged LXC container — they affect the host kernel's network namespace. These must be applied on the Proxmox host itself to take effect for all containers. All other sysctl tuning (TCP/UDP buffers, fin_timeout) applied successfully inside LXC 151.
---
## Optimizations & Improvements
### 1. LiveKit / Voice Quality ✅ Applied
Noise suppression and volume normalization are **client-side only** (the browser or Element X handles them via WebRTC's built-in audio processing), so the server cannot enforce them. Applied server-side improvements:
- **ICE port range expanded:** 50100-50500 (401 ports) → **50000-51000 (1001 ports)** = ~500 concurrent WebRTC streams
- **TURN TTL reduced:** 86400s (24h) → **3600s (1h)**, so stale allocations expire faster
- **Room defaults added:** `empty_timeout: 300`, `departure_timeout: 20`, `max_participants: 50`
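
The stream capacity figure for the ICE range is simple arithmetic on an inclusive port range. A rough sketch, assuming about two UDP ports per stream (a planning figure implied by the ~500 estimate, not a LiveKit guarantee):

```python
def port_count(start: int, end: int) -> int:
    """Number of ports in an inclusive range."""
    if end < start:
        raise ValueError("end must be >= start")
    return end - start + 1


def estimated_streams(start: int, end: int, ports_per_stream: int = 2) -> int:
    """Rough capacity estimate; ports_per_stream is an assumed planning figure."""
    return port_count(start, end) // ports_per_stream


print(port_count(50000, 51000))         # 1001
print(estimated_streams(50000, 51000))  # 500
```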
**Client-side audio advice for users:**
- **Element Web/Desktop:** Settings → Voice & Video → enable "Noise Suppression" and "Echo Cancellation"
- **Element X (mobile):** automatic via WebRTC stack
- **Cinny (chat.lotusguild.org):** voice via embedded Element Call widget — browser WebRTC noise suppression is active automatically
### 2. PostgreSQL Tuning (LXC 109) ✅ Applied
`/etc/postgresql/17/main/conf.d/synapse_tuning.conf` written and active. `pg_stat_statements` extension created in the `synapse` database. Config applied:
```ini
# Memory — shared_buffers = 25% RAM, effective_cache_size = 75% RAM
shared_buffers = 1500MB
effective_cache_size = 4500MB
work_mem = 32MB # Per sort/hash operation (safe at low connection count)
maintenance_work_mem = 256MB # VACUUM, CREATE INDEX
wal_buffers = 64MB # WAL write buffer
# Checkpointing
checkpoint_completion_target = 0.9 # Spread checkpoint I/O (already the default since PostgreSQL 14; set explicitly)
max_wal_size = 2GB
# Storage (Ceph RBD block device = SSD-equivalent random I/O)
random_page_cost = 1.1 # Default 4.0 assumes spinning disk
effective_io_concurrency = 200 # For SSDs/Ceph
# Parallel queries (3 vCPUs)
max_worker_processes = 3
max_parallel_workers_per_gather = 1
max_parallel_workers = 2
# Monitoring
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
```
Restarted `postgresql@17-main`. Expected impact: lower Synapse query latency, with headroom as the DB grows, since the entire current 120MB database fits in `shared_buffers`.
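
The memory values follow the 25% / 75% rule of thumb noted in the config comments. A small sketch of that calculation, assuming a container with roughly 6000MB of RAM (an illustrative figure chosen because it reproduces the values above; the actual allocation is not stated in this section):

```python
def suggest_memory_settings(total_ram_mb: int) -> dict[str, str]:
    """Apply the 25% / 75% rule of thumb from the tuning comments."""
    shared_buffers = total_ram_mb * 25 // 100
    effective_cache_size = total_ram_mb * 75 // 100
    return {
        "shared_buffers": f"{shared_buffers}MB",
        "effective_cache_size": f"{effective_cache_size}MB",
    }


# 6000 is an assumed RAM size in MB, not a measured value.
print(suggest_memory_settings(6000))
```

If the LXC's memory allocation changes, rerunning the calculation gives a starting point; the other settings (`work_mem`, `maintenance_work_mem`) still need judgment based on connection count.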
### 3. PostgreSQL Security — pg_hba.conf (LXC 109) ✅ Applied
Removed the two open rules (`0.0.0.0/24 md5` and `0.0.0.0/0 md5`). Remote access is now restricted to the Synapse LXC only:
```
host synapse synapse_user 10.10.10.29/32 scram-sha-256
```
All other remote connections are rejected. Local Unix socket and loopback remain functional for admin access.
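
To see why the `/32` rule is so much tighter than the removed ones, the address match can be reproduced with Python's `ipaddress` module. This sketch covers only the address field; real `pg_hba.conf` matching also checks the database, user, and connection type, and `203.0.113.7` is just a documentation-range example address.

```python
import ipaddress


def rule_matches(client_ip: str, cidr: str) -> bool:
    """True if client_ip falls inside the CIDR block of a pg_hba-style rule."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(cidr)


# The remaining rule admits exactly one host.
print(rule_matches("10.10.10.29", "10.10.10.29/32"))  # True
print(rule_matches("10.10.10.30", "10.10.10.29/32"))  # False
# The removed 0.0.0.0/0 rule admitted any IPv4 address.
print(rule_matches("203.0.113.7", "0.0.0.0/0"))       # True
```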
### 4. Synapse Cache Tuning (LXC 151) ✅ Applied
`event_cache_size` bumped 15K → 30K. `_get_state_group_for_events: 3.0` added to `per_cache_factors` (heavily hit during E2EE key sharing). Synapse restarted cleanly.
```yaml
event_cache_size: 30K
caches:
  global_factor: 2.0
  per_cache_factors:
    get_users_in_room: 3.0
    get_current_state_ids: 3.0
    _get_state_group_for_events: 3.0
```
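
As a reading aid for the `caches` block: a per-cache factor replaces the global factor for that cache rather than multiplying it. The helper below is a hypothetical illustration of that lookup, not Synapse code, and the fallback of 0.5 reflects Synapse's documented default global factor as best understood here.

```python
def effective_cache_factor(cache_name: str, caches_cfg: dict) -> float:
    """Return the factor that applies to one cache under a `caches` config dict."""
    per_cache = caches_cfg.get("per_cache_factors", {})
    if cache_name in per_cache:
        # A per-cache entry overrides the global factor for this cache.
        return float(per_cache[cache_name])
    return float(caches_cfg.get("global_factor", 0.5))


caches = {
    "global_factor": 2.0,
    "per_cache_factors": {
        "get_users_in_room": 3.0,
        "get_current_state_ids": 3.0,
        "_get_state_group_for_events": 3.0,
    },
}
print(effective_cache_factor("_get_state_group_for_events", caches))  # 3.0
print(effective_cache_factor("some_other_cache", caches))             # 2.0
```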
### 5. Network / sysctl Tuning (LXC 151) ✅ Applied
`/etc/sysctl.d/99-matrix-tuning.conf` written and active. TCP/UDP buffers aligned and fin_timeout reduced.
```ini
# Align TCP buffers with core maximums
net.ipv4.tcp_rmem = 4096 131072 26214400
net.ipv4.tcp_wmem = 4096 65536 26214400
# UDP buffer sizing for WebRTC media streams
net.core.rmem_max = 26214400
net.core.wmem_max = 26214400
net.ipv4.udp_rmem_min = 65536
net.ipv4.udp_wmem_min = 65536
# Reduce latency for short-lived TURN connections
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 30
```
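
To confirm the values are live inside the container, each key can be read back from `/proc/sys`. A minimal sketch, assuming the usual Linux mapping of dotted sysctl keys to paths under `/proc/sys`; the `EXPECTED` subset is copied from the config above, and the helper names are illustrative.

```python
from pathlib import Path


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key to its file under /proc/sys."""
    return Path(root, *key.split("."))


def read_sysctl(key: str, root: str = "/proc/sys"):
    """Return the current value with whitespace normalized, or None if unreadable."""
    try:
        return " ".join(sysctl_path(key, root).read_text().split())
    except OSError:
        return None


# Subset of the values from 99-matrix-tuning.conf.
EXPECTED = {
    "net.ipv4.tcp_fin_timeout": "10",
    "net.core.rmem_max": "26214400",
    "net.ipv4.tcp_rmem": "4096 131072 26214400",
}


def mismatches(expected: dict, root: str = "/proc/sys") -> dict:
    """Return {key: (actual, wanted)} for every key that differs or is unreadable."""
    result = {}
    for key, wanted in expected.items():
        actual = read_sysctl(key, root)
        if actual != wanted:
            result[key] = (actual, wanted)
    return result


print(mismatches(EXPECTED))
```

An empty dict means every checked key matches; a value of `None` for the actual side usually means the key is not exposed inside the container.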
> **BBR note:** `tcp_congestion_control = bbr` and `default_qdisc = fq` require host-level sysctl — cannot be set inside an unprivileged LXC. Apply on the Proxmox host to benefit all containers:
> ```bash
> echo "net.ipv4.tcp_congestion_control = bbr" >> /etc/sysctl.d/99-bbr.conf
> echo "net.core.default_qdisc = fq" >> /etc/sysctl.d/99-bbr.conf
> sysctl --system
> ```
### 6. Synapse Federation Hardening
The server is effectively a private server for friends. Restricting federation prevents abuse and reduces load. Add to `homeserver.yaml`:
```yaml
# Allow federation only with specific trusted servers (or disable entirely)
federation_domain_whitelist:
  - matrix.org # Keep for bridging if needed
  - matrix.lotusguild.org
# OR to go fully closed (recommended for friends-only), use an empty whitelist:
# federation_domain_whitelist: []
```
### 7. Bot E2EE Key Fix (LXC 151) ✅ Applied
`nio_store/` cleared and bot restarted cleanly. Megolm session errors resolved.
---
## Custom Cinny Client (chat.lotusguild.org)
Cinny v4 is the preferred client — clean UI, Cinny-style rendering already used by the bot's Wordle tiles. We build from source to get voice support and full branding control.
### Why Cinny over Element Web
- Much cleaner aesthetics, already the de-facto client for guild members
- Element Web voice suppression (Krisp) is only on `app.element.io` — a custom build loses it
- Cinny `add-joined-call-controls` branch uses `@element-hq/element-call-embedded` which talks to the **existing** MatrixRTC → lk-jwt-service → LiveKit stack with zero new infrastructure
- Static build (nginx serving ~5MB of files) — nearly zero runtime resource cost
### Voice support status (as of March 2026)
The official `add-joined-call-controls` branch (maintained by `ajbura`, last commit March 8 2026) embeds Element Call as a widget via `@element-hq/element-call-embedded: 0.16.3`. This uses the same MatrixRTC protocol that lk-jwt-service already handles. Two direct LiveKit integration PRs (#2703, #2704) were proposed but closed without merge, so the embedded Element Call approach is the official path.
Since lk-jwt-service is already running on LXC 151 and configured for `wss://matrix.lotusguild.org` , voice calls will work out of the box once the Cinny build is deployed.
### LXC Setup
**Create the LXC** (run on the host):
```bash
# ProxmoxVE Debian 13 community script
bash -c "$(curl -fsSL https://raw.githubusercontent.com/community-scripts/ProxmoxVE/main/ct/debian.sh)"
```
Recommended settings: 2GB RAM, 1-2 vCPUs, 20GB disk, Debian 13, static IP on VLAN 10 (e.g. `10.10.10.XX`).
**Inside the new LXC:**
```bash
# Install nginx + git + nvm dependencies
apt update && apt install -y nginx git curl
# Install Node.js 24 via nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
source ~/.bashrc
nvm install 24
nvm use 24
# Clone Cinny and switch to voice-support branch
git clone https://github.com/cinnyapp/cinny.git /opt/cinny
cd /opt/cinny
git checkout add-joined-call-controls
# Install dependencies and build
npm ci
NODE_OPTIONS=--max_old_space_size=4096 npm run build
# Output: /opt/cinny/dist/
# Deploy to nginx root
cp -r /opt/cinny/dist/* /var/www/html/
```
**Configure Cinny**: edit `/var/www/html/config.json`:
```json
{
  "defaultHomeserver": 0,
  "homeserverList": ["matrix.lotusguild.org"],
  "allowCustomHomeservers": false,
  "featuredCommunities": {
    "openAsDefault": false,
    "spaces": [],
    "rooms": [],
    "servers": []
  },
  "hashRouter": {
    "enabled": false,
    "basename": "/"
  }
}
```
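
Because a rebuild can overwrite `config.json`, a quick sanity check after each deploy is cheap insurance. The validator below is a hypothetical helper, not part of Cinny; it only checks the fields this deployment relies on, and treating a missing `allowCustomHomeservers` as "allowed" is an assumption about the client's default.

```python
import json


def validate_cinny_config(text: str) -> list[str]:
    """Return a list of problems found in a Cinny config.json document."""
    cfg = json.loads(text)
    problems = []
    servers = cfg.get("homeserverList", [])
    index = cfg.get("defaultHomeserver", 0)
    if not servers:
        problems.append("homeserverList is empty")
    elif not isinstance(index, int) or not 0 <= index < len(servers):
        problems.append("defaultHomeserver is not a valid index into homeserverList")
    # This deployment expects users to be pinned to the guild homeserver.
    if cfg.get("allowCustomHomeservers", True):
        problems.append("allowCustomHomeservers is not false")
    return problems


sample = (
    '{"defaultHomeserver": 0, '
    '"homeserverList": ["matrix.lotusguild.org"], '
    '"allowCustomHomeservers": false}'
)
print(validate_cinny_config(sample))  # []
```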
**Nginx config**: `/etc/nginx/sites-available/cinny` (matches the official `docker-nginx.conf`):
```nginx
server {
    listen 80;
    listen [::]:80;
    server_name chat.lotusguild.org;
    root /var/www/html;
    index index.html;
    location / {
        rewrite ^/config.json$ /config.json break;
        rewrite ^/manifest.json$ /manifest.json break;
        rewrite ^/sw.js$ /sw.js break;
        rewrite ^/pdf.worker.min.js$ /pdf.worker.min.js break;
        rewrite ^/public/(.*)$ /public/$1 break;
        rewrite ^/assets/(.*)$ /assets/$1 break;
        rewrite ^(.+)$ /index.html break;
    }
}
```
```bash
ln -s /etc/nginx/sites-available/cinny /etc/nginx/sites-enabled/
nginx -t && systemctl reload nginx
```
Then in **NPM**: add a proxy host for `chat.lotusguild.org` → `http://10.10.10.XX:80` with SSL.
### Rebuilding after updates
```bash
cd /opt/cinny
git pull
npm ci
NODE_OPTIONS=--max_old_space_size=4096 npm run build
cp -r dist/* /var/www/html/
# The copy above overwrites config.json, so restore the saved copy afterwards
# (kept outside dist/; path listed under "Key paths")
cp /opt/cinny-config.json /var/www/html/config.json
```
### Key paths (Cinny LXC 106 — 10.10.10.6)
- Source: `/opt/cinny/` (branch: `add-joined-call-controls`)
- Built files: `/var/www/html/`
- Cinny config: `/var/www/html/config.json`
- Config backup (survives rebuilds): `/opt/cinny-config.json`
- Nginx site config: `/etc/nginx/sites-available/cinny`
- Rebuild script: `/usr/local/bin/cinny-update`
---
## Server Checklist
### Quality of Life
- [x] Migrate from SQLite to PostgreSQL
- [x] TURN/STUN server (coturn) for reliable voice/video
- [x] URL previews
- [x] Upload size limit 200MB
- [x] Full-text message search (PostgreSQL backend)
- [x] Media retention policy (remote: 1yr, local: 3yr)
- [x] Sliding sync (native Synapse)
- [x] LiveKit for Element Call video rooms
- [x] Default room version v12, all rooms upgraded
- [x] Landing page with client recommendations (Cinny, Commet, Element, Element X mobile)
- [x] Synapse metrics endpoint (port 9000, Prometheus-compatible)
- [ ] Push notifications gateway (Sygnal) for mobile clients
- [x] LiveKit port range expanded to 50000-51000 for voice call capacity
- [x] Custom Cinny client LXC 106 (10.10.10.6): Debian 13, Cinny 4.10.5 built from `add-joined-call-controls`, nginx serving, HA enabled
- [x] NPM proxy entry for `chat.lotusguild.org` → 10.10.10.6:80, SSL via Cloudflare DNS challenge, HTTPS forced, HTTP/2 + HSTS enabled
- [x] Cinny weekly auto-update cron (`/etc/cron.d/cinny-update`, Sundays 3am, logs to `/var/log/cinny-update.log`)
- [ ] Cinny custom branding — Lotus Guild theme (colors, title, favicon, PWA name)
### Performance Tuning
- [x] PostgreSQL `shared_buffers` → 1500MB, `effective_cache_size`, `work_mem`, checkpoint tuning applied
- [x] PostgreSQL `pg_stat_statements` extension installed in `synapse` database
- [x] PostgreSQL autovacuum tuned per-table (`state_groups_state`, `events`, `receipts_linearized`, `receipts_graph`, `device_lists_stream`, `presence_stream`), `autovacuum_max_workers` → 5
- [x] Synapse `event_cache_size` → 30K, `_get_state_group_for_events` cache factor added
- [x] sysctl TCP/UDP buffer alignment applied to LXC 151 (`/etc/sysctl.d/99-matrix-tuning.conf`)
- [x] LiveKit room `empty_timeout: 300`, `departure_timeout: 20`, `max_participants: 50`
- [x] LiveKit ICE port range expanded to 50000-51000
- [x] LiveKit TURN TTL reduced from 24h to 1h
- [x] LiveKit VP9/AV1 codecs enabled (`video_codecs: [VP8, H264, VP9, AV1]`)
- [ ] BBR congestion control — must be applied on Proxmox host, not inside LXC (see Known Issues)
### Auth & SSO
- [x] Token-based registration
- [x] SSO/OIDC via Authelia
- [x] `allow_existing_users: true` for linking accounts to SSO
- [x] Password auth alongside SSO
### Webhooks & Integrations
- [x] matrix-hookshot 7.3.2 installed and running
- [x] Generic webhook bridge for 11 active services (Grafana, Proxmox, Sonarr, Radarr, Readarr, Lidarr, Uptime Kuma, Seerr, Owncast/Livestream, Bazarr, Tinker-Tickets)
- [x] Per-service JS transformation functions — all rewritten to handle full event payloads (all event types, health alerts, app updates, release groups, download clients)
- [x] Per-service virtual user avatars
- [x] NPM reverse proxy for `/webhook` path
- [x] Tinker Tickets custom transformation code
### Room Structure
- [x] The Lotus Guild space
- [x] All core rooms with correct power levels and join rules
- [x] Spam and Stuff room for service notifications (hookshot)
- [x] Custom room avatars
### Hardening
- [x] Rate limiting
- [x] E2EE on all rooms (except Spam and Stuff — intentional for hookshot)
- [x] coturn internal peer deny rules (blocks relay to RFC1918 except allowed subnet)
- [x] `pg_hba.conf` locked down — remote access restricted to Synapse LXC (10.10.10.29) only
- [x] Federation enabled with key verification (open for invite-only growth to friends/family/coworkers)
- [x] fail2ban on Synapse login endpoint (5 retries / 24h ban, LXC 151)
- [x] Synapse metrics port 9000 restricted to `127.0.0.1` + `10.10.10.29` (was `0.0.0.0`)
- [x] coturn cert auto-renewal — daily sync cron on compute-storage-01 copies NPM cert → coturn
- [x] `/.well-known/matrix/client` and `/server` live on lotusguild.org (NPM advanced config)
- [x] `suppress_key_server_warning: true` in homeserver.yaml
- [ ] Federation allow/deny lists for known bad actors
- [ ] Regular Synapse updates
- [x] Automated database + media backups
### Monitoring
- [x] Synapse metrics endpoint (port 9000, Prometheus-compatible)
- [x] Uptime Kuma monitors added: Synapse HTTP, LiveKit TCP, PostgreSQL TCP, Cinny Web, coturn TCP 3478, lk-jwt-service, Hookshot
- [ ] Uptime Kuma: coturn UDP STUN monitoring (requires push/heartbeat — no native UDP type in Kuma)
- [x] Grafana dashboard — custom Synapse dashboard at `dashboard.lotusguild.org/d/matrix-synapse-dashboard/matrix-synapse` (140+ panels, see Monitoring section below)
- [x] Prometheus scraping all Matrix services: Synapse, Hookshot, LiveKit, matrix-node, postgres-node, matrix-admin, postgres, postgres-exporter
- [x] node_exporter installed on LXC 151 (Matrix) and LXC 109 (PostgreSQL)
- [x] LiveKit Prometheus metrics enabled (`prometheus_port: 6789`)
- [x] Hookshot metrics enabled (`metrics: { enabled: true }`) on dedicated port 9004
- [x] Grafana alert rules — 9 Matrix/infra alerts active (see Alert Rules section below)
- [x] Duplicate Grafana "Infrastructure" folder merged and deleted
### Admin
- [x] Synapse admin API dashboard (synapse-admin at http://10.10.10.29:8080)
- [x] Power levels per room
- [ ] Draupnir moderation bot (new LXC or alongside existing bot)
- [ ] Cinny custom branding (Lotus Guild theme — colors, title, favicon, PWA name)
- [ ] **Storj node update** — `storj_uptodate=0` on LXC 138 (10.10.10.133), risk of disqualification
---
## Improvement Audit (March 2026)
Comprehensive audit of the current infrastructure against official documentation and security best practices. Applied March 9 2026.
### Priority Summary
| Issue | Severity | Status |
|-------|----------|--------|
| coturn TLS cert expires May 12 — no auto-renewal | **CRITICAL** | ✅ Fixed — daily sync cron on compute-storage-01 copies NPM-renewed cert to coturn |
| Synapse metrics port 9000 bound to `0.0.0.0` | **HIGH** | ✅ Fixed — now binds `127.0.0.1` + `10.10.10.29` (Prometheus still works, internet blocked) |
| `/.well-known/matrix/client` returns 404 | MEDIUM | ✅ Fixed — NPM lotusguild.org proxy host updated, live at `https://lotusguild.org/.well-known/matrix/client` |
| `suppress_key_server_warning` not set | MEDIUM | ✅ Fixed — added to homeserver.yaml |
| No fail2ban on `/_matrix/client/.*/login` | MEDIUM | ✅ Fixed — fail2ban installed, matrix-synapse jail active (5 retries / 24h ban) |
| No media purge cron (retention policy set but never triggers) | MEDIUM | ✅ N/A — `media_retention` block already in homeserver.yaml; Synapse runs the purge internally on schedule |
| PostgreSQL autovacuum not tuned per-table | LOW | ✅ Fixed — all 5 high-churn tables tuned, `autovacuum_max_workers` → 5 |
| Hookshot metrics scrape unconfirmed | LOW | ✅ Fixed — `metrics: { enabled: true }` added to config, metrics split to dedicated port 9004, Prometheus scraping confirmed |
| LiveKit VP9/AV1 codec support | LOW | ✅ Applied — `video_codecs: [VP8, H264, VP9, AV1]` added to livekit config |
| Federation allow/deny list not configured | LOW | Pending — Mjolnir/Draupnir on roadmap |
| Sygnal push notifications not deployed | INFO | Deferred |
---
### 1. coturn Cert Auto-Renewal ✅
The coturn cert is managed by NPM (cert ID 91, stored at `/etc/letsencrypt/live/npm-91/` on LXC 139). NPM renews it automatically. A sync script on `compute-storage-01` detects when NPM renews and copies it to coturn.
**Deployed:** `/usr/local/bin/coturn-cert-sync.sh` on compute-storage-01, cron `/etc/cron.d/coturn-cert-sync` (runs 03:30 daily).
Script compares cert expiry dates between LXC 139 and LXC 151. If they differ (NPM renewed), it copies `fullchain.pem` + `privkey.pem` and restarts coturn.
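The comparison step can be sketched in Python. This is a minimal illustration, not the deployed `coturn-cert-sync.sh`: it assumes both expiry dates are obtained as `notAfter=...` lines (the format printed by `openssl x509 -enddate -noout`), and the function names and sample dates are hypothetical.

```python
from datetime import datetime


def parse_not_after(line: str) -> datetime:
    """Parse a line such as 'notAfter=May 12 00:00:00 2026 GMT'."""
    _, _, value = line.strip().partition("=")
    # openssl pads single-digit days with an extra space; strptime tolerates that.
    return datetime.strptime(value, "%b %d %H:%M:%S %Y %Z")


def needs_sync(source_line: str, target_line: str) -> bool:
    """True when the NPM cert and the coturn cert expire at different times."""
    return parse_not_after(source_line) != parse_not_after(target_line)


if __name__ == "__main__":
    npm_cert = "notAfter=Aug 10 00:00:00 2026 GMT"   # hypothetical renewed cert
    coturn_cert = "notAfter=May 12 00:00:00 2026 GMT"  # hypothetical old copy
    if needs_sync(npm_cert, coturn_cert):
        print("expiry differs: copy fullchain.pem + privkey.pem, restart coturn")
```

Comparing parsed dates rather than raw strings avoids a spurious sync when only whitespace differs between the two outputs.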
**Additional coturn hardening — ✅ Applied March 2026:**
```
# /etc/turnserver.conf
stale-nonce=600 # Nonce expires 600s (prevents replay attacks)
user-quota=100 # Max concurrent relay allocations per user
total-quota=1000 # Total relay allocations server-wide
cipher-list=ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-CHACHA20-POLY1305
```
---
### 2. Synapse Configuration Gaps
**a) Metrics port exposed to 0.0.0.0 (HIGH)**
At audit time port 9000 was bound to `0.0.0.0`, which exposed internal state, user counts, and DB query times externally. Fix in `homeserver.yaml`:
```yaml
metrics_flags:
  some_legacy_unrestricted_resources: false
listeners:
  - port: 9000
    # NOT 0.0.0.0; the LXC address stays reachable for in-VLAN scraping
    bind_addresses: ['127.0.0.1', '10.10.10.29']
    type: metrics
    resources: []
```
Grafana at `10.10.10.49` scrapes port 9000 from within the VLAN so this is safe to lock down.
**b) suppress_key_server_warning (MEDIUM)**
Without this setting, Synapse logs a key-server warning on every restart. One line in `homeserver.yaml`:
```yaml
suppress_key_server_warning: true
```
**c) Database connection pooling (LOW — track for growth)**
Current defaults (`cp_min: 5`, `cp_max: 10`) are fine for single-process. When adding workers, increase `cp_max` to 20–30 per worker group. Add explicitly to `homeserver.yaml` to make it visible:
```yaml
database:
  name: psycopg2
  args:
    cp_min: 5
    cp_max: 10
```
---
### 3. Matrix Well-Known 404
`/.well-known/matrix/client` returns 404. This breaks client autodiscovery — users who type `lotusguild.org` instead of `matrix.lotusguild.org` get an error. Fix in NPM with a custom location block on the `lotusguild.org` proxy host:
```nginx
# default_type sets the Content-Type of the returned body;
# add_header would append a second Content-Type header instead.
location /.well-known/matrix/client {
    default_type application/json;
    add_header Access-Control-Allow-Origin *;
    return 200 '{"m.homeserver":{"base_url":"https://matrix.lotusguild.org"}}';
}
location /.well-known/matrix/server {
    default_type application/json;
    add_header Access-Control-Allow-Origin *;
    return 200 '{"m.server":"matrix.lotusguild.org:443"}';
}
```
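A quick way to sanity-check the two payloads is to parse them the way a client or federating server would. The sketch below is illustrative only: the helper names are hypothetical, IPv6 literals are not handled, and falling back to port 8448 when `m.server` carries no port is a simplification of the full delegation rules (which also consult SRV records).

```python
import json
from urllib.parse import urlparse


def homeserver_base_url(body: str) -> str:
    """Extract and validate m.homeserver.base_url from a client well-known body."""
    doc = json.loads(body)
    base_url = doc["m.homeserver"]["base_url"]
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a usable base_url: {base_url!r}")
    return base_url


def delegated_server(body: str) -> tuple[str, int]:
    """Split the m.server value of a server well-known body into host and port."""
    doc = json.loads(body)
    host, _, port = doc["m.server"].partition(":")
    # Assumption: simplified fallback to 8448 when no port is given.
    return host, int(port) if port else 8448


if __name__ == "__main__":
    print(homeserver_base_url('{"m.homeserver":{"base_url":"https://matrix.lotusguild.org"}}'))
    print(delegated_server('{"m.server":"matrix.lotusguild.org:443"}'))
```

Feeding the helpers the body returned by `curl https://lotusguild.org/.well-known/matrix/client` gives a simple regression check after NPM config changes.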
---
### 4. fail2ban for Synapse Login
No brute-force protection on `/_matrix/client/*/login`. Easy win.
**`/etc/fail2ban/jail.d/matrix-synapse.conf`:**
```ini
[matrix-synapse]
enabled = true
port = http,https
filter = matrix-synapse
logpath = /var/log/matrix-synapse/homeserver.log
backend = systemd
journalmatch = _SYSTEMD_UNIT=matrix-synapse.service + PRIORITY=3
findtime = 600
maxretry = 5
bantime = 86400
```
**`/etc/fail2ban/filter.d/matrix-synapse.conf`:**
```ini
[Definition]
failregex = ^.*Failed (?:password|SAML) login attempt for user .* from <HOST>.*$
            ^.*"POST /.*login.*" 401.*$
ignoreregex = ^.*"GET /sync.*".*$
```
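The jail's `findtime` / `maxretry` / `bantime` values describe a sliding-window policy: five failures inside any 600-second window trigger an 86400-second ban. The Python sketch below models that policy only; it is not how fail2ban is implemented, and the class and method names are hypothetical.

```python
from collections import deque

FINDTIME = 600    # seconds: window in which failures are counted
MAXRETRY = 5      # failures inside the window that trigger a ban
BANTIME = 86400   # seconds: ban duration (24h)


class LoginGuard:
    """Toy model of the jail's sliding-window ban policy."""

    def __init__(self):
        self.failures = {}      # ip -> deque of failure timestamps
        self.banned_until = {}  # ip -> timestamp at which the ban ends

    def is_banned(self, ip, now):
        return self.banned_until.get(ip, 0) > now

    def record_failure(self, ip, now):
        """Record one failed login; return True if the IP is banned afterwards."""
        if self.is_banned(ip, now):
            return True
        window = self.failures.setdefault(ip, deque())
        window.append(now)
        # Drop failures that have fallen out of the findtime window.
        while window and now - window[0] > FINDTIME:
            window.popleft()
        if len(window) >= MAXRETRY:
            self.banned_until[ip] = now + BANTIME
            window.clear()
            return True
        return False


if __name__ == "__main__":
    guard = LoginGuard()
    for t in (0, 100, 200, 300, 400):
        banned = guard.record_failure("203.0.113.7", t)
    print("banned after five quick failures:", banned)
```

Failures spaced far enough apart never accumulate to `maxretry`, which is why slow credential-stuffing needs a longer `findtime` to catch.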
---
### 5. Synapse Media Purge Cron
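Retention policy is configured (remote 1yr, local 3yr). The audit initially assumed that nothing triggers the purge and that media accumulates silently. Per the Priority Summary this turned out to be N/A: the `media_retention` block already in `homeserver.yaml` makes Synapse run the purge internally on schedule. The script below is kept as a reference for purging cached remote media manually through the admin API.
**`/usr/local/bin/purge-synapse-media.sh`** (optional, on LXC 151):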
```bash
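#!/bin/bash
set -euo pipefail
ADMIN_TOKEN="syt_your_admin_token"  # placeholder: replace with a real admin access token
# Purge remote media (cached from other homeservers) older than 90 days.
# `date +%s000` is the current time in milliseconds; 90 days = 7776000000 ms.
CUTOFF_TS=$(($(date +%s000) - 7776000000))
# -f makes curl exit non-zero on HTTP errors, so a failed purge is not logged as completed
curl -f -X POST \
  "http://localhost:8008/_synapse/admin/v1/purge_media_cache?before_ts=$CUTOFF_TS" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -s -o /dev/null
echo "$(date): Synapse remote media purge completed" >> /var/log/synapse-purge.log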
```
```bash
chmod +x /usr/local/bin/purge-synapse-media.sh
echo "0 4 * * * root /usr/local/bin/purge-synapse-media.sh" > /etc/cron.d/synapse-purge
```
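The cutoff arithmetic in the script is easy to get wrong by a factor of 1000, since the admin API expects milliseconds. A small Python sketch of the same calculation, with hypothetical helper names, makes the units explicit:

```python
from urllib.parse import urlencode

MS_PER_DAY = 24 * 60 * 60 * 1000


def purge_cutoff_ms(now_s: int, days: int = 90) -> int:
    """Return the before_ts value (Unix time in ms) for media older than `days`."""
    return now_s * 1000 - days * MS_PER_DAY


def purge_url(base: str, cutoff_ms: int) -> str:
    """Build the purge_media_cache URL used by the shell script."""
    query = urlencode({"before_ts": cutoff_ms})
    return f"{base}/_synapse/admin/v1/purge_media_cache?{query}"


if __name__ == "__main__":
    import time
    cutoff = purge_cutoff_ms(int(time.time()))
    print(purge_url("http://localhost:8008", cutoff))
```

Here `90 * MS_PER_DAY` equals the `7776000000` constant hard-coded in the shell version.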
---
### 6. PostgreSQL Autovacuum Per-Table Tuning
The high-churn Synapse tables (`state_groups_state`, `events`, `receipts`) are not tuned for aggressive autovacuum. As the DB grows, bloat accumulates and queries slow down. Run on LXC 109 (PostgreSQL):
```sql
-- state_groups_state: biggest bloat source
-- (autovacuum_naptime is a server-wide setting, not a per-table storage
--  parameter, so it cannot be set with ALTER TABLE)
ALTER TABLE state_groups_state SET (
  autovacuum_vacuum_scale_factor = 0.01,
  autovacuum_analyze_scale_factor = 0.005,
  autovacuum_vacuum_cost_delay = 5
);
-- events: second priority
ALTER TABLE events SET (
  autovacuum_vacuum_scale_factor = 0.02,
  autovacuum_analyze_scale_factor = 0.01,
  autovacuum_vacuum_cost_delay = 5
);
-- receipts, device_lists_stream, and presence_stream
ALTER TABLE receipts SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_vacuum_cost_delay = 5);
ALTER TABLE device_lists_stream SET (autovacuum_vacuum_scale_factor = 0.02);
ALTER TABLE presence_stream SET (autovacuum_vacuum_scale_factor = 0.02);
```
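Also bump `autovacuum_max_workers` from 3 → 5. On PostgreSQL 17 and earlier this parameter only takes effect after a server restart; a configuration reload alone does not apply it:
```sql
ALTER SYSTEM SET autovacuum_max_workers = 5;
-- Then restart PostgreSQL (e.g. systemctl restart postgresql);
-- SELECT pg_reload_conf() is not sufficient for this setting.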
```
**Monitor vacuum health:**
```sql
SELECT relname, last_autovacuum, n_dead_tup, n_live_tup
FROM pg_stat_user_tables
WHERE relname IN ('events', 'state_groups_state', 'receipts')
ORDER BY n_dead_tup DESC;
```
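To see what the scale factors change in practice: PostgreSQL vacuums a table once its dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples` (stock defaults 50 and 0.2). The sketch below applies that formula to the `n_dead_tup` / `n_live_tup` columns from the query above; the helper names are hypothetical, the row counts are made up, and `n_live_tup` is only an approximation of the planner's `reltuples`.

```python
def autovacuum_trigger(reltuples: int, scale_factor: float, threshold: int = 50) -> int:
    """Dead-tuple count above which autovacuum processes the table."""
    return round(threshold + scale_factor * reltuples)


def due_for_vacuum(n_dead_tup: int, n_live_tup: int, scale_factor: float) -> bool:
    """Approximate check using pg_stat_user_tables counters."""
    return n_dead_tup > autovacuum_trigger(n_live_tup, scale_factor)


if __name__ == "__main__":
    rows = 10_000_000  # hypothetical size of state_groups_state
    print("default (0.2):", autovacuum_trigger(rows, 0.2), "dead tuples")
    print("tuned (0.01): ", autovacuum_trigger(rows, 0.01), "dead tuples")
```

On a ten-million-row table the tuned factor moves the trigger from roughly two million dead tuples to roughly one hundred thousand, which is the point of the per-table settings.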
---
### 7. Hookshot Metrics + Grafana
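**Hookshot metrics** were exposed at `127.0.0.1:9001/metrics` at audit time, and it was unconfirmed whether Prometheus at `10.10.10.49` was scraping them. This has since been resolved (see Priority Summary): metrics moved to dedicated port 9004 and Prometheus scraping is confirmed. To re-verify:
```bash
# On LXC 151 (the Prometheus scrape target is 10.10.10.29:9004)
curl http://10.10.10.29:9004/metrics | head -20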
```
If Prometheus is scraping, add the hookshot dashboard from the repo:
`contrib/hookshot-dashboard.json` → import into Grafana.
**Grafana Synapse dashboard** — Prometheus is already scraping Synapse at port 9000. Import the official dashboard:
- Grafana → Dashboards → Import → ID `18618` (Synapse Monitoring)
- Set Prometheus datasource → done
- Shows room count, message rates, federation lag, cache hit rates, DB query times in real time
---
### 8. Federation Security
Currently: open federation with key verification (correct for invite-only friends server). Recommended additions:
**Server-level allow/deny in `homeserver.yaml`** (optional, for closing federation entirely):
```yaml
# Fully closed (recommended long-term for private guild):
# an empty whitelist means no federation at all
federation_domain_whitelist: []
# OR: whitelist-only federation (use instead of the empty list above)
# federation_domain_whitelist:
#   - matrix.lotusguild.org
#   - matrix.org # Keep if bridging needed
```
**Per-room ACLs** for reactive blocking of specific bad servers:
```json
{
"type": "m.room.server_acl",
"content": {
"allow": ["*"],
"deny": ["spam.example.com"]
}
}
```
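How an `m.room.server_acl` event is evaluated can be sketched as follows. This is a simplified illustration of the matching rules (deny wins over allow, `*` and `?` are the only wildcards, a missing `allow` list permits nobody); it ignores `allow_ip_literals` and port handling, and the function names are hypothetical.

```python
import re


def _glob_to_regex(glob: str) -> re.Pattern:
    """Translate an ACL glob ('*' = any run of chars, '?' = one char) to a regex."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def server_allowed(server_name: str, acl_content: dict) -> bool:
    """Return True if server_name may participate under the given ACL content."""
    deny = acl_content.get("deny", [])
    allow = acl_content.get("allow", [])
    if any(_glob_to_regex(g).fullmatch(server_name) for g in deny):
        return False
    return any(_glob_to_regex(g).fullmatch(server_name) for g in allow)


if __name__ == "__main__":
    acl = {"allow": ["*"], "deny": ["spam.example.com"]}
    for name in ("matrix.org", "spam.example.com"):
        print(name, "->", "allowed" if server_allowed(name, acl) else "denied")
```

Because deny is checked first, a wildcard such as `*.example.com` in `deny` blocks every subdomain even when `allow` is `["*"]`.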
**Mjolnir/Draupnir** (already on roadmap) handles this automatically with ban list subscriptions (t2bot spam lists etc).
---
### 9. Sygnal Push Notifications
Sygnal is the official Matrix push gateway for mobile (Element X on iOS/Android). Without it, notifications don't arrive when the app is backgrounded.
**Requirements:**
- Apple Developer account (APNS cert) for iOS
- Firebase project (FCM API key) for Android
- New LXC or run alongside existing services
**Basic config (`/etc/sygnal/sygnal.yaml`):**
```yaml
server:
port: 8765
database:
type: postgresql
user: sygnal
password: <password>
database: sygnal
apps:
com.element.android:
type: gcm
api_key: <FCM_API_KEY>
im.riot.x.ios:
type: apns
platform: production
certfile: /etc/sygnal/apns/element-x-cert.pem
topic: im.riot.x.ios
```
**Synapse integration:**
```yaml
# homeserver.yaml
push:
push_gateways:
- url: "http://localhost:8765"
```
---
### 10. LiveKit VP9/AV1 + Dynacast (Quality Improvement)
Currently H264 only. Enabling VP9/AV1 unlocks Dynacast (pauses video layers no one is watching) which significantly reduces bandwidth/CPU for low-viewer rooms.
**`/etc/livekit/config.yaml` additions:**
```yaml
video:
codecs:
- mime: video/H264
fmtp: "level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01e"
- mime: video/VP9
fmtp: "profile=0"
- mime: video/AV1
fmtp: "profile=0"
dynacast: true
```
Note: Dynacast only works with VP9 or AV1 (SVC-capable codecs). H264 subscribers continue to work normally alongside VP9/AV1 subscribers.
---
### 11. Synapse Workers (Future Scaling Reference)
Current single-process handles ~100–300 concurrent users before the Python GIL becomes the bottleneck. Not needed now, but documented for when usage grows.
**Stage 1 trigger:** Synapse CPU >80% consistently, or >200 concurrent users.
**First workers to add:**
```yaml
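# /etc/matrix-synapse/workers/client-reader-1.yaml
# synapse.app.client_reader is a legacy alias; current Synapse releases
# run this role through the generic worker app.
worker_app: synapse.app.generic_worker
worker_name: client-reader-1
worker_listeners:
  - type: http
    port: 8011
    resources: [{names: [client]}]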
```
Add `federation_sender` next (off-loads outgoing federation from main process). Then `event_creator` for write-heavy loads. Redis handles inter-worker coordination; current Synapse releases require it as soon as the first worker is added, not only at Stage 2 (500+ users).
---
---
## Monitoring & Observability (March 2026)
### Prometheus Scrape Jobs
All Matrix-related services scraped by Prometheus at `10.10.10.48` (LXC 118):
| Job | Target | Metrics |
|-----|--------|---------|
| `synapse` | `10.10.10.29:9000` | Full Synapse internals (events, federation, caches, DB, HTTP) |
| `matrix-admin` | `10.10.10.29:9101` | DAU, MAU, room/user/media totals |
| `livekit` | `10.10.10.29:6789` | Rooms, participants, packets, forward latency, quality |
| `hookshot` | `10.10.10.29:9004` | Connections by service, API calls/failures, Node.js runtime |
| `matrix-node` | `10.10.10.29:9100` | CPU, RAM, network, disk space, load avg (Matrix LXC host) |
| `postgres` | `10.10.10.44:9187` | pg_stat_database, connections, WAL, block I/O |
| `postgres-node` | `10.10.10.44:9100` | CPU, RAM, network, disk space, load avg (PostgreSQL LXC host) |
| `postgres-exporter-2` | `10.10.10.160:9711` | Secondary postgres exporter |
> **Disk I/O note:** All servers use Ceph-backed storage. Per-device disk I/O metrics are meaningless; use Network I/O panels to see actual storage traffic.
### Grafana Dashboard
**URL:** `https://dashboard.lotusguild.org/d/matrix-synapse-dashboard/matrix-synapse`
140+ panels across 21 sections:
| Section | Key panels |
|---------|-----------|
| Synapse Overview | Up status, users, rooms, DAU/MAU, media, federation peers |
| Synapse Process Health | CPU, memory, FDs, thread pool, GC, Twisted reactor |
| HTTP API Requests | Rate, response codes, p99/p50 latency, in-flight, DB txn time |
| Federation | Outgoing/incoming PDUs, queue depth, staging, known servers |
| Events & Rooms | Event persistence, notifier, sync responses |
| Presence & Push | Presence updates, pushers, state transitions |
| Rate Limiting | Rejections, sleeps, queue wait time p99 |
| Users & Registration | Login rate, registration rate, growth over time |
| Synapse Database Performance | Txn rate/duration, schedule latency, query latency |
| Synapse Caches | Hit rate (top 5), sizes, evictions, response cache |
| Event Processing & Lag | Lag by processor, stream positions, event fetch ongoing |
| State Resolution | Forward extremities, state resolution CPU, state groups |
| App Services (Hookshot) | Events sent, transactions sent vs failed |
| HTTP Push | Push processed vs failed, badge updates |
| Sliding Sync & Slow Endpoints | Sliding sync p99, slowest endpoints, rate limit wait |
| Background Processes | In-flight by name, start rate, CPU, scheduler tasks |
| PostgreSQL Database | Size, connections, transactions, block I/O, WAL, locks |
| LiveKit SFU | Rooms, participants, network, packets out/dropped, forward latency |
| Hookshot | Matrix API calls/failures, active connections, Node.js event loop lag |
| Matrix LXC Host | CPU, RAM, network (incl. Ceph), load average, disk space |
| PostgreSQL LXC Host | CPU, RAM, network (incl. Ceph), load average, disk space |
### Alert Rules
All alerts are Grafana-native (Alerting → Alert Rules). Current active rules:
**Matrix folder (`matrix-folder`):**
| Alert | Fires when | Severity |
|-------|-----------|----------|
| Synapse Down | `up{job="synapse"}` < 1 for 2m | critical |
| PostgreSQL Down | `pg_up` < 1 for 2m | critical |
| LiveKit Down | `up{job="livekit"}` < 1 for 2m | critical |
| Hookshot Down | `up{job="hookshot"}` < 1 for 2m | critical |
| PG Connection Saturation | connections > 80% of max for 5m | warning |
| Federation Queue Backing Up | pending PDUs > 100 for 10m | warning |
| Synapse High Memory | RSS > 2000MB for 10m | warning |
| Synapse High Response Time | p99 latency (excl. /sync) > 10s for 5m | warning |
| Synapse Event Processing Lag | any processor > 30s behind for 5m | warning |
| Synapse DB Query Latency High | p99 query time > 1s for 5m | warning |
**Infrastructure folder (`infra-folder`):**
| Alert | Fires when | Severity |
|-------|-----------|----------|
| Service Exporter Down | any `up == 0` for 3m | critical |
| Node High CPU Usage | CPU > 90% for 10m | warning |
| Node High Memory Usage | RAM > 90% for 10m | warning |
| Node Disk Space Low | available < 15% (excl. tmpfs/overlay) for 10m | warning |
**Prometheus rules (`/etc/prometheus/prometheus_rules.yml`):**
| Alert | Fires when |
|-------|-----------|
| InstanceDown | any `up == 0` for 1m |
| DiskSpaceFree10Percent | available < 10% (excl. tmpfs/overlay) for 5m |
> **`/sync` long-poll note:** The Matrix `/sync` endpoint is a long-poll (clients hold it open ≤30s). It is excluded from the High Response Time alert to prevent false positives. Without exclusion, p99 reads ~10s even when the server is healthy.
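
Why the exclusion matters can be shown with a toy percentile calculation. The sample below is entirely hypothetical (95 fast API calls and 5 idle `/sync` long-polls held for the full 30s), and `percentile` is a plain nearest-rank implementation rather than the histogram estimate Prometheus computes.

```python
def percentile(values, pct: int):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


# Hypothetical request log: (path, latency in seconds)
samples = (
    [("/_matrix/client/v3/rooms", 0.05)] * 95
    + [("/_matrix/client/v3/sync", 30.0)] * 5
)

all_latencies = [lat for _, lat in samples]
without_sync = [lat for path, lat in samples if not path.endswith("/sync")]

if __name__ == "__main__":
    print("p99 including /sync:", percentile(all_latencies, 99))
    print("p99 excluding /sync:", percentile(without_sync, 99))
```

A handful of healthy long-polls is enough to dominate the tail, so the alert would fire on idle clients rather than on slow request handling.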
### Known Alert False Positives / Watch Items
- **Synapse Event Processing Lag** — can fire transiently after Synapse restart while processors catch up on backlog. Self-resolves in 10–20 minutes. If it grows continuously (>10 min) and doesn't plateau, restart Synapse.
- **Node Disk Space Low** — excludes `tmpfs`, `overlay`, `squashfs`, `devtmpfs`, and `/boot`/`/run` mounts. If new filesystem types appear, add them to the `fstype!~` filter in the rule.
---
## Bot Checklist
### Core
- [x] matrix-nio async client with E2EE
- [x] Device trust (auto-trust all devices)
- [x] Graceful shutdown (SIGTERM/SIGINT)
- [x] Initial sync token (ignores old messages on startup)
- [x] Auto-accept room invites
- [x] Deployed as systemd service (`matrixbot.service`) on LXC 151
- [x] Fix E2EE key errors — full store + credentials wipe, fresh device registration (`BBRZSEUECZ`); stale devices removed via admin API
### Commands
- [x] `!help` — list commands
- [x] `!ping` — latency check
- [x] `!8ball <question>` — magic 8-ball
- [x] `!fortune` — fortune cookie
- [x] `!flip` — coin flip
- [x] `!roll <NdS>` — dice roller
- [x] `!random <min> <max>` — random number
- [x] `!rps <choice>` — rock paper scissors
- [x] `!poll <question>` — poll with reactions
- [x] `!trivia` — trivia game (reactions, 30s reveal)
- [x] `!champion [lane]` — random LoL champion
- [x] `!agent [role]` — random Valorant agent
- [x] `!wordle` — full Wordle game (daily, hard mode, stats, share)
- [x] `!minecraft <username>` — RCON whitelist add
- [x] `!ask <question>` — Ollama LLM (lotusllm, 2min cooldown)
- [x] `!health` — bot uptime + service status
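
As an illustration of the `!roll <NdS>` notation in the list above, a dice specification can be parsed and rolled as sketched here. This is a hypothetical implementation, not the bot's actual code; the `max_dice` and `max_sides` limits are assumptions added to keep replies bounded.

```python
import random
import re
from typing import Optional

_DICE = re.compile(r"(\d+)d(\d+)", re.IGNORECASE)


def roll(spec: str, rng: Optional[random.Random] = None,
         max_dice: int = 100, max_sides: int = 1000) -> list:
    """Roll N dice with S sides each, given a spec such as '2d6'."""
    match = _DICE.fullmatch(spec.strip())
    if not match:
        raise ValueError(f"expected NdS, got {spec!r}")
    count, sides = int(match[1]), int(match[2])
    if not (1 <= count <= max_dice and 1 <= sides <= max_sides):
        raise ValueError("dice count or sides out of range")
    rng = rng or random.Random()
    return [rng.randint(1, sides) for _ in range(count)]


if __name__ == "__main__":
    rolls = roll("3d6")
    print(f"rolled {rolls} (total {sum(rolls)})")
```

Passing a seeded `random.Random` makes the function deterministic for tests without touching the global random state.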
### Welcome System
- [x] Watches Space joins and DMs new members automatically
- [x] React-to-join: react with ✅ in DM → bot invites to General, Commands, Memes
- [x] Welcome event ID persisted to `welcome_state.json`
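
One way to persist the welcome event ID so that it survives restarts is sketched below. Only the file name `welcome_state.json` comes from the checklist; the `welcome_event_id` key and both helper functions are hypothetical, and `welcome.py` may store its state differently.

```python
import json
import os
import tempfile
from pathlib import Path

STATE_FILE = Path("welcome_state.json")  # file name from the checklist above


def load_state(path: Path = STATE_FILE) -> dict:
    """Return the saved state, or an empty dict if the file is missing or corrupt."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_state(state: dict, path: Path = STATE_FILE) -> None:
    """Write state atomically: temp file in the same directory, then rename."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if writing or renaming failed.
        os.unlink(tmp_name)
        raise
```

For example, `save_state({"welcome_event_id": event_id})` after sending the welcome message, then `load_state().get("welcome_event_id")` on startup. Writing to a temporary file first means a crash mid-write cannot leave a half-written state file behind.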
### Wordle
- [x] Daily puzzles with two-pass letter evaluation
- [x] Hard mode with constraint validation
- [x] Stats persistence (`wordle_stats.json`)
- [x] Cinny-compatible rendering (inline `<span>` tiles)
- [x] DM-based gameplay, `!wordle share` posts result to public room
- [x] Virtual keyboard display
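
The two-pass letter evaluation and the hard-mode check can be sketched as follows. This is an illustration, not the code in `wordle.py`: the function names and the `correct` / `present` / `absent` labels are assumptions, and the hard-mode rule shown (every revealed hint must be reused) is the commonly used one, which may differ from the bot's exact constraints. The first pass marks exact matches, so a duplicate letter in the guess is only marked `present` while unmatched copies remain in the answer.

```python
from __future__ import annotations

from collections import Counter


def evaluate_guess(guess: str, answer: str) -> list[str]:
    """Score a guess: 'correct', 'present', or 'absent' for each letter."""
    guess, answer = guess.lower(), answer.lower()
    if len(guess) != len(answer):
        raise ValueError("guess and answer must be the same length")

    result = ["absent"] * len(guess)
    remaining = Counter()  # answer letters not consumed by an exact match

    # Pass 1: mark exact matches and count the unmatched answer letters.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "correct"
        else:
            remaining[a] += 1

    # Pass 2: mark misplaced letters while unmatched copies remain.
    for i, g in enumerate(guess):
        if result[i] != "correct" and remaining[g] > 0:
            result[i] = "present"
            remaining[g] -= 1
    return result


def hard_mode_violation(guess: str, prev_guess: str, prev_result: list[str]) -> str | None:
    """Return a message if guess ignores a hint from prev_guess, else None."""
    guess, prev_guess = guess.lower(), prev_guess.lower()
    required = Counter()
    for i, (letter, mark) in enumerate(zip(prev_guess, prev_result)):
        if mark == "correct":
            if guess[i] != letter:
                return f"position {i + 1} must be {letter.upper()}"
            required[letter] += 1
        elif mark == "present":
            required[letter] += 1

    # Every hinted letter must appear at least as often as it was revealed.
    available = Counter(guess)
    for letter, count in required.items():
        if available[letter] < count:
            return f"guess must contain {letter.upper()}"
    return None
```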
---
## Tech Stack
| Component | Technology | Version |
|-----------|-----------|---------|
| Bot language | Python 3 | 3.x |
| Bot library | matrix-nio (E2EE) | latest |
| Homeserver | Synapse | 1.148.0 |
| Database | PostgreSQL | 17.9 |
| TURN | coturn | latest |
| Video/voice calls | LiveKit SFU | 1.9.11 |
| LiveKit JWT | lk-jwt-service | latest |
| SSO | Authelia (OIDC) + LLDAP | — |
| Webhook bridge | matrix-hookshot | 7.3.2 |
| Reverse proxy | Nginx Proxy Manager | — |
| Web client | Cinny (custom build, `add-joined-call-controls` branch) | 4.10.5+ |
| Bot dependencies | matrix-nio[e2ee], aiohttp, python-dotenv, mcrcon | — |
## Bot Files
```
matrixBot/
├── bot.py # Entry point, client setup, event loop
├── callbacks.py # Message + reaction event handlers
├── commands.py # All command implementations
├── config.py # Environment config + validation
├── utils.py # send_text, send_html, send_reaction, get_or_create_dm
├── welcome.py # Welcome message + react-to-join logic
├── wordle.py # Full Wordle game engine
├── wordlist_answers.py # Wordle answer word list
├── wordlist_valid.py # Wordle valid guess word list
├── .env.example # Environment variable template
└── requirements.txt # Python dependencies
```
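
The "environment config + validation" role of `config.py` might look roughly like the sketch below. The variable names (`MATRIX_HOMESERVER`, `MATRIX_USER_ID`, `MATRIX_PASSWORD`), the `BotConfig` class, and `load_config` are all hypothetical; `.env.example` lists the real variables. The sketch reads from a mapping so that it works with or without `python-dotenv` loading a `.env` file first.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical variable names; see .env.example for the real ones.
REQUIRED = ("MATRIX_HOMESERVER", "MATRIX_USER_ID", "MATRIX_PASSWORD")


@dataclass(frozen=True)
class BotConfig:
    homeserver: str
    user_id: str
    password: str


def load_config(env: Mapping[str, str] = os.environ) -> BotConfig:
    """Validate required settings and fail fast with one clear message."""
    missing = [name for name in REQUIRED if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError(
            "missing required environment variables: " + ", ".join(missing)
        )
    homeserver = env["MATRIX_HOMESERVER"].strip()
    if not homeserver.startswith(("http://", "https://")):
        raise RuntimeError("MATRIX_HOMESERVER must be an http(s) URL")
    return BotConfig(homeserver, env["MATRIX_USER_ID"].strip(), env["MATRIX_PASSWORD"])
```

Validating at startup reports every missing variable in a single error, before the client attempts to log in.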