Topology:
- Correct series layout: UDM-Pro → USW-Agg → Pro 24 PoE (not a fork)
- Remove CSS fork divs, replace with straight vertical connectors
- Labels: WAN · 10G SFP+ (UDM→Agg), 10G trunk (Agg→PoE)
- Remove ISL from legend (no parallel switch pair)
Inspector:
- Fix USW-Agg port blocks appearing narrower than other switches
- SFP ports in rows now use same width (34px) as copper ports;
all-SFP switches like USL8A no longer look undersized
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove SVG fork with preserveAspectRatio="none" (caused line width
distortion and stretched 10G DAC label like a tube TV)
- Replace with pure CSS .topo-fork: stem + horizontal bar + left/right
drops, all absolutely positioned at consistent 2px width
- Use .topo-sw-row with two 50% halves so switch centres land at
exactly 25% and 75% — matching fork drop positions mathematically
- ISL rendered via ::before/::after on .topo-sw-row (switch boxes
with solid bg cover the line at their edges, leaving only the gap)
- Add .topo-sw-drops: two vertical stubs from switch centres to bus rails
- All lines are now exactly 2px, no distortion, no misalignment
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace flat topology with tiered bus-bar layout: Internet → UDM-Pro → SVG fork → USW-Agg + Pro 24 PoE → dual-homed servers
- Show 10G VLAN90 (Ceph) bus from USW-Agg and 1G DHCP management bus from Pro 24 PoE per host
- Add per-host drop wires (solid 10G + dashed 1G) with correct rack positions
- Mark large1 as off-rack (dashed border), ZimaBoards as off-rack mon-01/mon-02
- Add topology legend, inter-switch 10G ISL indicator
- Add recently resolved events section (last 24h) to dashboard
- Add last_seen column and relative timestamps to events table
- Add stale data banner when monitoring data >15 min old
- Improve inspector chassis with port speed labels, LLDP neighbor info, mounting ears, chassis legend
- Add duplex/speed mismatch warnings and carrier changes to path debug panel
- Bump updateTopology() to handle both topo-v2-status-* and topo-status-* classes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All rack servers (and large1 on table) have both a 10G link to USW-Agg
and a 1G management link to Pro 24 PoE. Update topology:
- Move all 6 hosts into single row (including large1)
- Update sublabels to "10G+1G" for all nodes
- large1 dashed-border (off-rack) with "table · 10G+1G"
- Add dashed amber "1G mgmt (PoE)" horizontal band above hosts
to represent the PoE switch management connections
- 10G primary fan-out lines still drop from Agg switch above
- large1 primary line rendered as dashed green (off-rack run)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace linear Internet→UDM→Agg→PoE→all-hosts chain with accurate topology:
- USW-Aggregation and Pro 24 PoE switch shown side-by-side with horizontal
10G SFP+ link between them (not in series)
- 5 compute/storage/monitor nodes fanned out under Agg Switch with 10G labels
and rack unit positions (RU4–12, RU14–17) as sublabels
- large1 shown separately under PoE switch, dashed border = off-rack (table)
- Add device specs as subtitles on all nodes (Dream Machine Pro · RU24, etc.)
- Shorter display names: csg-01 / cs-01 instead of full hostnames
- Live status badges still updated by JS via data-host attributes
- New CSS: .topo-node-sub, .topo-switch-tier, .topo-h-link, .topo-host-tier,
.topo-host-table (dashed), .topo-badge-unknown
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- dashboard: pass recent_resolved (last 24h, limit 10) to index template;
render "Recently Resolved" section showing type, target, resolved time,
and calculated duration (first_seen → resolved_at)
- dashboard: event-age spans now also update via setInterval; duration
shown for resolved events (e.g. "2h 15m")
- links page: link health summary panel shows server iface count,
error/flap counts, switch port up/down, PoE total draw/capacity bar;
only shows problematic stats if non-zero; shows "All OK ✔" when clean
- style.css: new classes for summary panel, resolved row/badge
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- events table: add Last Seen column; show relative times ("3h ago") with
absolute timestamp on hover; update updateEventsTable() in app.js to match
- links.html: add error/drop/flap alert badges to interface and port card headers
- links.html: PoE power bar (draw/max ratio with colour-coded fill) and poe_mode
- links.html: stale data warning banner when link_stats are >2 minutes old
- links.html: improved error handler shows HTTP status instead of generic message
- links.html: fix collapse state persisted to localStorage (was sessionStorage,
lost on browser restart); fix collapseAll/expandAll to also persist state
- inspector.html: duplex mismatch and speed mismatch warnings in path debug panel
- inspector.html: carrier changes added to server column of path debug
- style.css: new classes — .link-alert-badge, .poe-bar-*, .path-mismatch-alert,
.error-state; fix .stale-banner to use CSS variables
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
app.py:
- Context processor injects config.ticket_api.web_url into all templates
(falls back to 'http://t.lotusguild.org/ticket/' if not set in config)
templates/base.html:
- Inject GANDALF_CONFIG JS global with ticket_web_url before app.js loads
static/app.js:
- Use GANDALF_CONFIG.ticket_web_url instead of hardcoded domain
templates/index.html:
- Use {{ config.ticket_api.web_url }} Jinja var instead of hardcoded domain
monitor.py:
- CLUSTER_NAME constant kept as default; NetworkMonitor now reads cluster_name
from config monitor.cluster_name, falling back to the constant
- All CLUSTER_NAME references inside class methods replaced with self.cluster_name
templates/inspector.html:
- pollDiagnostic() .catch() now clears interval and shows error message instead
of silently ignoring network failures during active polling
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- suppressions.html: setDur() now takes explicit element param instead of relying
on implicit global event.target (which fails outside direct click handlers)
- suppressions.html: removeSuppression() now shows error toast on failed DELETE
- templates/index.html: escape description in title attribute with |e filter
to prevent attribute breakout on quotes in description text
- diagnose.py: derive Pulse execution URL from pulse_client.url instead of
hardcoding http://pulse.lotusguild.org
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Two-service architecture: Flask web app (gandalf.service) + background
polling daemon (gandalf-monitor.service)
- Monitor polls Prometheus node_network_up for physical NIC states on all
6 hypervisors (added storage-01 at 10.10.10.11:9100)
- UniFi API monitoring for switches, APs, and gateway device status
- Ping reachability for hosts without node_exporter (pbs only now)
- Smart baseline: interfaces first seen as down are never alerted on;
only UP→DOWN regressions trigger tickets
- Cluster-wide P1 ticket when 3+ hosts have genuine simultaneous
interface regressions (guards against false positives on startup)
- Tinker Tickets integration with 24-hour hash-based deduplication
- Alert suppression: manual toggle or timed windows (30m/1h/4h/8h)
- Authelia SSO via forward-auth headers, admin group required
- Network topology: Internet → UDM-Pro → Agg Switch (10G DAC) →
PoE Switch (10G DAC) → Hosts
- MariaDB schema, suppression management UI, host/interface cards
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>