Commit Graph

9 Commits

Author SHA1 Message Date
271c3c4373 Exclude LXC IPs from link stats collection
Add links_exclude_ips to monitor config; collect() skips any Prometheus
instance whose IP is in that list, preventing LXC containers from
appearing on the links/inspector pages as phantom hosts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 20:39:47 -04:00
b80fda7cb2 Fix host filtering: only show/monitor configured hosts; add PBS
- _collect_snapshot() and _process_interfaces() now skip any Prometheus
  instance not explicitly listed in config.json hosts[]. LXC app servers
  (postgresql, matrix, etc.) report node_exporter metrics but are not
  infrastructure hosts Gandalf should display or alert on.
- Add PBS (10.10.10.3) to config hosts[] with prometheus_instance;
  remove from ping_hosts (node_exporter already running on PBS, now
  added to Prometheus scrape config as job pbs-node).
- The _instance_map membership check is now consistent across snapshot,
  alerting, and ethtool SSH collection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 17:17:40 -04:00
fa7512a2c2 feat: terminal aesthetic rewrite + link debug page
- Full dark terminal aesthetic (Pulse/TinkerTickets style):
  - #0a0a0a background, #00ff41 green, #ffb000 amber, #00ffff cyan
  - CRT scanline overlay, phosphor glow, ASCII corner pseudoelements
  - Bracket-notation badges [CRITICAL], monospace font throughout
  - style.css, base.html, index.html, suppressions.html all rewritten

- New Link Debug page (/links, /api/links):
  - Per-host, per-interface cards with speed/duplex/port type/auto-neg
  - Traffic bars (TX cyan, RX green) with rate labels
  - Error/drop counters, carrier change history
  - SFP/DOM optical panel: vendor, temp, voltage, bias, TX/RX power dBm bars
  - RX-TX delta shown; color-coded warn/crit thresholds
  - Auto-refresh every 60s, anchor-jump to #hostname

- LinkStatsCollector in monitor.py:
  - SSHes to each host (one connection, all ifaces batched)
  - Parses ethtool + ethtool -m (SFP DOM) output
  - Merges with Prometheus traffic/error/carrier metrics
  - Stores as link_stats in monitor_state table

- config.json: added ssh section for ethtool collection
- app.js: terminal chip style consistency (uppercase, ● bullet)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-02 12:43:11 -05:00
0c0150f698 Complete rewrite: full-featured network monitoring dashboard
- Two-service architecture: Flask web app (gandalf.service) + background
  polling daemon (gandalf-monitor.service)
- Monitor polls Prometheus node_network_up for physical NIC states on all
  6 hypervisors (added storage-01 at 10.10.10.11:9100)
- UniFi API monitoring for switches, APs, and gateway device status
- Ping reachability for hosts without node_exporter (pbs only now)
- Smart baseline: interfaces first seen as down are never alerted on;
  only UP→DOWN regressions trigger tickets
- Cluster-wide P1 ticket when 3+ hosts have genuine simultaneous
  interface regressions (guards against false positives on startup)
- Tinker Tickets integration with 24-hour hash-based deduplication
- Alert suppression: manual toggle or timed windows (30m/1h/4h/8h)
- Authelia SSO via forward-auth headers, admin group required
- Network topology: Internet → UDM-Pro → Agg Switch (10G DAC) →
  PoE Switch (10G DAC) → Hosts
- MariaDB schema, suppression management UI, host/interface cards

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 23:03:18 -05:00
68beb7b1c4 more dynamic 2025-02-08 00:11:28 -05:00
576352819e identify device by ip in api 2025-02-07 21:53:12 -05:00
84679a49bb integrated with unifi api 2025-02-07 20:55:38 -05:00
112838d33a added troub 2025-02-07 20:31:56 -05:00
7b70f4a980 added functionality 2025-01-04 01:42:16 -05:00