- events table: add Last Seen column; show relative times ("3h ago") with
absolute timestamp on hover; update updateEventsTable() in app.js to match
- links.html: add error/drop/flap alert badges to interface and port card headers
- links.html: PoE power bar (draw/max ratio with colour-coded fill) and poe_mode
- links.html: stale data warning banner when link_stats are >2 minutes old
- links.html: improved error handler shows HTTP status instead of generic message
- links.html: fix collapse state persisted to localStorage (was sessionStorage,
lost on browser restart); fix collapseAll/expandAll to also persist state
- inspector.html: duplex mismatch and speed mismatch warnings in path debug panel
- inspector.html: carrier changes added to server column of path debug
- style.css: new classes — .link-alert-badge, .poe-bar-*, .path-mismatch-alert,
.error-state; fix .stale-banner to use CSS variables
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
static/app.js:
- Browser tab title updates to show alert count: '(3 CRIT) GANDALF' or '(2 WARN) GANDALF'
- Stale monitoring banner: injected into .main if last_check > 15 min old,
warns operator that the monitor daemon may be down
static/style.css:
- .stale-banner: amber top-border warning strip
app.py:
- /health now checks DB connectivity and monitor freshness (last_check age)
Returns 503 + degraded status if DB unreachable or monitor stale >20min
db.py:
- cleanup_expired_suppressions(): marks time-limited suppressions inactive when
expires_at <= NOW() (was only filtered in SELECTs, never marked inactive)
- purge_old_resolved_events(days=90): deletes old resolved events to prevent
unbounded table growth
monitor.py:
- Calls cleanup_expired_suppressions() and purge_old_resolved_events() each cycle
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
app.py:
- Context processor injects config.ticket_api.web_url into all templates
(falls back to 'http://t.lotusguild.org/ticket/' if not set in config)
templates/base.html:
- Inject GANDALF_CONFIG JS global with ticket_web_url before app.js loads
static/app.js:
- Use GANDALF_CONFIG.ticket_web_url instead of hardcoded domain
templates/index.html:
- Use {{ config.ticket_api.web_url }} Jinja var instead of hardcoded domain
monitor.py:
- CLUSTER_NAME constant kept as default; NetworkMonitor now reads cluster_name
from config monitor.cluster_name, falling back to the constant
- All CLUSTER_NAME references inside class methods replaced with self.cluster_name
templates/inspector.html:
- pollDiagnostic() .catch() now clears interval and shows error message instead
of silently ignoring network failures during active polling
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Two-service architecture: Flask web app (gandalf.service) + background
polling daemon (gandalf-monitor.service)
- Monitor polls Prometheus node_network_up for physical NIC states on all
6 hypervisors (added storage-01 at 10.10.10.11:9100)
- UniFi API monitoring for switches, APs, and gateway device status
- Ping reachability for hosts without node_exporter (pbs only now)
- Smart baseline: interfaces first seen as down are never alerted on;
only UP→DOWN regressions trigger tickets
- Cluster-wide P1 ticket when 3+ hosts have genuine simultaneous
interface regressions (guards against false positives on startup)
- Tinker Tickets integration with 24-hour hash-based deduplication
- Alert suppression: manual toggle or timed windows (30m/1h/4h/8h)
- Authelia SSO via forward-auth headers, admin group required
- Network topology: Internet → UDM-Pro → Agg Switch (10G DAC) →
PoE Switch (10G DAC) → Hosts
- MariaDB schema, suppression management UI, host/interface cards
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>