Commit Graph

15 Commits

Author SHA1 Message Date
0ca6b1f744 feat: link health summary, recently resolved panel, event duration
- dashboard: pass recent_resolved (last 24h, limit 10) to index template;
  render "Recently Resolved" section showing type, target, resolved time,
  and calculated duration (first_seen → resolved_at)
- dashboard: event-age spans now also update via setInterval; duration
  shown for resolved events (e.g. "2h 15m")
- links page: link health summary panel shows server iface count,
  error/flap counts, switch port up/down, PoE total draw/capacity bar;
  only shows problematic stats if non-zero; shows "All OK ✔" when clean
- style.css: new classes for summary panel, resolved row/badge

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 21:48:40 -04:00
6b6eaa6227 feat: UI improvements — event ages, error badges, PoE bars, mismatch detection
- events table: add Last Seen column; show relative times ("3h ago") with
  absolute timestamp on hover; update updateEventsTable() in app.js to match
- links.html: add error/drop/flap alert badges to interface and port card headers
- links.html: PoE power bar (draw/max ratio with colour-coded fill) and poe_mode
- links.html: stale data warning banner when link_stats are >2 minutes old
- links.html: improved error handler shows HTTP status instead of generic message
- links.html: fix collapse state persisted to localStorage (was sessionStorage,
  lost on browser restart); fix collapseAll/expandAll to also persist state
- inspector.html: duplex mismatch and speed mismatch warnings in path debug panel
- inspector.html: carrier changes added to server column of path debug
- style.css: new classes — .link-alert-badge, .poe-bar-*, .path-mismatch-alert,
  .error-state; fix .stale-banner to use CSS variables

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 21:46:11 -04:00
14eaa6a8c9 De-hardcode ticket URL and cluster name; improve diagnostic polling UX
app.py:
- Context processor injects config.ticket_api.web_url into all templates
  (falls back to 'http://t.lotusguild.org/ticket/' if not set in config)

templates/base.html:
- Inject GANDALF_CONFIG JS global with ticket_web_url before app.js loads

static/app.js:
- Use GANDALF_CONFIG.ticket_web_url instead of hardcoded domain

templates/index.html:
- Use {{ config.ticket_api.web_url }} Jinja var instead of hardcoded domain

monitor.py:
- CLUSTER_NAME constant kept as default; NetworkMonitor now reads cluster_name
  from config monitor.cluster_name, falling back to the constant
- All CLUSTER_NAME references inside class methods replaced with self.cluster_name

templates/inspector.html:
- pollDiagnostic() .catch() now clears interval and shows error message instead
  of silently ignoring network failures during active polling

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 14:31:57 -04:00
af26407363 Fix setDur implicit event, title XSS, hardcoded pulse URL, suppress error toast
- suppressions.html: setDur() now takes explicit element param instead of relying
  on implicit global event.target (which fails outside direct click handlers)
- suppressions.html: removeSuppression() now shows error toast on failed DELETE
- templates/index.html: escape description in title attribute with |e filter
  to prevent attribute breakout on quotes in description text
- diagnose.py: derive Pulse execution URL from pulse_client.url instead of
  hardcoding http://pulse.lotusguild.org

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 14:36:55 -04:00
0278dad502 feat: inspector page, link debug enhancements, security hardening
- Add /inspector page: visual model-accurate switch chassis diagrams
  (USF5P, USL8A, US24PRO, USPPDUP, USMINI), clickable port blocks
  with color coding (green=up, amber=PoE, cyan=uplink, grey=down),
  detail panel with stats/PoE/LLDP, LLDP-based path debug side-by-side

- Link Debug: port number badges (#N), LLDP neighbor line, PoE class/max,
  collapsible host/switch panels with sessionStorage persistence

- monitor.py: collect LLDP neighbor map + PoE class/max/mode per switch
  port; PulseClient uses requests.Session() for HTTP keep-alive; add
  shlex.quote() around interface names (defense-in-depth)

- Security: suppress buttons use data-* attrs + delegated click handler
  instead of inline onclick with Jinja2 variable interpolation; remove
  | safe filter from user-controlled fields in suppressions.html;
  setDuration() takes explicit el param instead of implicit event global

- db.py: thread-local connection reuse with ping(reconnect=True) to
  avoid a new TCP handshake per query

- .gitignore: add config.json (contains credentials), __pycache__

- README: full rewrite covering architecture, all 4 pages, alert logic,
  config reference, deployment, troubleshooting, security notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 15:39:48 -05:00
fa7512a2c2 feat: terminal aesthetic rewrite + link debug page
- Full dark terminal aesthetic (Pulse/TinkerTickets style):
  - #0a0a0a background, #00ff41 green, #ffb000 amber, #00ffff cyan
  - CRT scanline overlay, phosphor glow, ASCII corner pseudoelements
  - Bracket-notation badges [CRITICAL], monospace font throughout
  - style.css, base.html, index.html, suppressions.html all rewritten

- New Link Debug page (/links, /api/links):
  - Per-host, per-interface cards with speed/duplex/port type/auto-neg
  - Traffic bars (TX cyan, RX green) with rate labels
  - Error/drop counters, carrier change history
  - SFP/DOM optical panel: vendor, temp, voltage, bias, TX/RX power dBm bars
  - RX-TX delta shown; color-coded warn/crit thresholds
  - Auto-refresh every 60s, anchor-jump to #hostname

- LinkStatsCollector in monitor.py:
  - SSHes to each host (one connection, all ifaces batched)
  - Parses ethtool + ethtool -m (SFP DOM) output
  - Merges with Prometheus traffic/error/carrier metrics
  - Stores as link_stats in monitor_state table

- config.json: added ssh section for ethtool collection
- app.js: terminal chip style consistency (uppercase, ● bullet)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-02 12:43:11 -05:00
0c0150f698 Complete rewrite: full-featured network monitoring dashboard
- Two-service architecture: Flask web app (gandalf.service) + background
  polling daemon (gandalf-monitor.service)
- Monitor polls Prometheus node_network_up for physical NIC states on all
  6 hypervisors (added storage-01 at 10.10.10.11:9100)
- UniFi API monitoring for switches, APs, and gateway device status
- Ping reachability for hosts without node_exporter (pbs only now)
- Smart baseline: interfaces first seen as down are never alerted on;
  only UP→DOWN regressions trigger tickets
- Cluster-wide P1 ticket when 3+ hosts have genuine simultaneous
  interface regressions (guards against false positives on startup)
- Tinker Tickets integration with 24-hour hash-based deduplication
- Alert suppression: manual toggle or timed windows (30m/1h/4h/8h)
- Authelia SSO via forward-auth headers, admin group required
- Network topology: Internet → UDM-Pro → Agg Switch (10G DAC) →
  PoE Switch (10G DAC) → Hosts
- MariaDB schema, suppression management UI, host/interface cards

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 23:03:18 -05:00
004c97f492 interface update 2025-02-08 00:32:25 -05:00
610f55710d updated index html 2025-02-07 23:51:13 -05:00
067ce4d316 update html 2025-02-07 23:38:49 -05:00
3fac013088 added diag content 2025-02-07 21:33:55 -05:00
4318dcd0d2 Added interface status 2025-02-07 21:28:54 -05:00
d791312579 Update file structure for Flask 2025-02-07 21:22:43 -05:00
21dfad35bf made everything static 2025-01-04 01:07:18 -05:00
109dff1cd0 test 2025-01-04 00:33:04 -05:00