Commit Graph

30 Commits

Author SHA1 Message Date
jared 9d6583a08a Add LDAP avatar photos, UX polish, and TDS component upgrades
Lint / Python (flake8) (push) Successful in 1m13s
Lint / JS (eslint) (push) Successful in 9s
Security / Python Security (bandit) (push) Failing after 45s
Test / Python Tests (pytest) (push) Successful in 57s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 5s
- Add /api/avatar endpoint querying lldap for user jpegPhoto; disk cache
  with sentinel pattern avoids repeat LDAP hits for users without photos
- Add ldap3 dependency and ldap config block to config.json
- Wire lt-avatar img overlay in base.html with capture-phase error
  fallback (lt-avatar-img-err) to reveal initials when image is absent
- Fix lt-avatar CSS shim: position:relative + absolute inset on img
  (local base.css was missing these; added to style.css)
- Replace all empty-state paragraphs with proper lt-empty-state markup
  (icon + title + body) across index, suppressions, inspector, app.js
- Add lt-spinner--cyan next to refresh button; shows during refreshAll()
- Replace inspector panel-section-title with lt-divider throughout
- Add data-tooltip attributes to SFP DOM metrics, TX/RX/Carrier/Duplex/
  Auto-neg/Error labels in links.html and inspector panel
- Add tooltips to events table column headers (Sev, First Seen, Failures)
- Fix links.html host panel timestamp (was reading sample.updated which
  is always undefined; now uses data.updated)
- Fix UniFi status text casing (Online→ONLINE to match server render)
- Remove dead topo-status-* class manipulation from updateTopology()
- Always render alert-count-badge; toggle display:none when count is 0
- Fix double UniFi get_devices() call in monitor.py run loop
- Fix chip-critical animation (was using green pulse-glow; now red)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 21:09:56 -04:00
jared 29267c9933 Integrate test code improvements using web_template components
Lint / Python (flake8) (push) Successful in 45s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 41s
Test / Python Tests (pytest) (push) Successful in 52s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
lt-alert:
- Replace custom .stale-banner with lt-alert lt-alert--warning in app.js
  and links.html; remove stale-banner CSS, reuse lt-alert margin rule

lt-progress:
- Replace custom .traffic-bar-track/.traffic-bar-fill in links.html with
  lt-progress from base.css; TX uses default (orange), RX uses --cyan,
  both flip to --red when utilisation >85% (trafficBarClass helper)
- Keep traffic layout classes (.traffic-section/.traffic-row etc.) for structure

Suppression type badges:
- Map target_type to distinct badge colors: host→badge-warning (orange),
  interface→badge-info (cyan), unifi_device→badge-purple (new alias using
  --accent-purple from base.css), all→badge-critical (red)
- Applied in both server-rendered table (Jinja2 dict lookup) and
  renderActiveRows() JS

Topology animated down-wire:
- Add data-host attribute to .topo-v2-wire-10g/.topo-v2-wire-1g elements
- updateTopology() toggles .wire-down class on the 10G drop-wire when
  host.status === 'down'
- .wire-down CSS: animated repeating-linear-gradient dashed red line
  via wire-dash-anim @keyframes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 23:37:47 -04:00
jared 03375ef22f Remove all inline event handlers; replace with data-action delegation
Lint / Python (flake8) (push) Successful in 39s
Lint / JS (eslint) (push) Successful in 6s
Security / Python Security (bandit) (push) Successful in 50s
Test / Python Tests (pytest) (push) Successful in 51s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 2s
- inspector.html: onclick on port blocks, close button, run-diagnostic button,
  and diag-toggle sections all converted to data-action attributes; single
  delegated click listener handles all cases + Escape key closes panel
- links.html: onclick on panel title headers, Collapse All, Expand All
  converted to data-action with delegated listener
- suppressions.html: onsubmit/onchange wired via addEventListener at init
- index.html: onsubmit/onchange on suppress modal form wired at init

No behavioural changes — pure event-handling refactor for TDS compliance.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 17:53:48 -04:00
jared c025da85c1 Audit quick wins: null guard, API error toasts, aria-labels on suppress buttons
- app.js: guard events array with || [] before .filter() to prevent crash on null
- app.js: show warning toast when /api/network or /api/status fail (was silent)
- app.js: add aria-label to all dynamically-generated suppress buttons
- index.html: add aria-label to server-rendered suppress buttons

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 17:50:00 -04:00
jared c45dd007d1 Fix field name mismatches, add events filter, in-place suppression refresh
Lint / Python (flake8) (push) Failing after 50s
Lint / JS (eslint) (push) Successful in 7s
Test / Python Tests (pytest) (push) Successful in 51s
Lint / Notify on failure (push) Successful in 2s
Lint / Deploy (push) Has been skipped
Security / Python Security (bandit) (push) Failing after 59s
- links.html: fix all field name bugs (auto_negotiation→autoneg, full_duplex,
  tx/rx_errors/drops_per_sec→_rate, tx/rx_bytes_per_sec→_rate, poe_total_w/poe_max_w
  computed from ports, renderUnifiSwitches uses top-level updated timestamp)
- suppressions.html: in-place DOM refresh after create/remove (no page reload),
  datalist autocomplete for target names, form reset after submit
- inspector.html: ESC key closes detail panel via lt.keys.on
- index.html: events filter bar with search input + severity pills (All/Critical/Warning),
  MutationObserver re-applies filter after dynamic updates
- style.css: g-section-actions, events-filter-bar, sev-pills layout
- app.js/db.py/monitor.py: carry forward prior session fixes (Promise.allSettled,
  daemon_ok, stale connection handling, double Prometheus call, self.cfg fix)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 23:35:02 -04:00
jared 293edd674e Integrate TDS v1.2 lt.* APIs throughout app
Lint / Python (flake8) (push) Failing after 58s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Failing after 44s
Test / Python Tests (pytest) (push) Successful in 1m24s
Lint / Notify on failure (push) Successful in 2s
Lint / Deploy (push) Has been skipped
- app.js: replace raw fetch/escHtml/fmtRelTime with lt.api, lt.escHtml, lt.time.ago; modal open/close via lt.modal; add _toIso() for timestamp normalisation
- index.html: data-action="refresh", data-duration pills, lt.autoRefresh.start, remove local fmtRelTime
- suppressions.html: lt.api.post/delete, data-dur pill delegation
- base.html: user avatar with initials, admin badge, lt.keys.on('r') replaces manual keydown handler
- base.css: add dot-*, chip, row-state aliases so apps can use unprefixed class names

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 23:46:44 -04:00
jared e05f1f6c55 Align base.html and modal with tinker_tickets reference implementation
Lint / Notify on failure (push) Blocked by required conditions
Lint / Python (flake8) (push) Failing after 1m12s
Lint / JS (eslint) (push) Successful in 13s
Security / Python Security (bandit) (push) Failing after 1m14s
Test / Python Tests (pytest) (push) Failing after 24m3s
Lint / Deploy (push) Failing after 12m10s
- Add lt-glitch + data-text to brand title (signature glitch effect)
- Add mobile nav drawer (lt-nav-drawer) and hamburger button (lt-menu-btn)
- Add VT323 font, theme toggle button (lt-theme-btn), footer with hints
- Add command palette overlay + lt.cmdPalette.init() with nav commands
- Add keyboard shortcuts help modal and skip link
- Move base.js to <head> so lt.* is available for inline scripts
- Fix suppress modal: lt-modal-backdrop → lt-modal-overlay (base.css class)
- Fix modal open/close: use .is-open / aria-hidden instead of inline style

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 21:42:04 -04:00
jared e8de40250a Restructure app to use LotusGuild Terminal Design System v1.2
Lint / Python (flake8) (push) Failing after 45s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Failing after 1m22s
Test / Python Tests (pytest) (push) Successful in 51s
Lint / Notify on failure (push) Successful in 2s
Lint / Deploy (push) Has been skipped
Replace custom phosphor-green terminal aesthetic with the lt-* component
system from base.css/base.js. All templates now inherit the LotusGuild
multi-accent Anduril palette via variable aliases in style.css, and use
lt-header, lt-nav, lt-card, lt-table, lt-btn, lt-modal, lt-badge etc.
Custom components (topology, inspector chassis, link debug, SFP panels)
are preserved with color values updated to base.css palette variables.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 21:01:20 -04:00
jared e2b65db2fc Add pagination to event queries, input validation, daily event purge
- get_active_events() now takes limit/offset (default 200) to cap unbounded queries
- count_active_events() added to return total for pagination display
- /api/events supports ?limit=, ?offset=, ?status= query params (max 1000)
- /api/status includes total_active count alongside paginated events list
- index() route passes total_active to template for server-side truncation notice
- Show "Showing X of Y" notice in dashboard when events are truncated
- Suppression POST validates: reason ≤500 chars, target_name/detail ≤255 chars
- _purge_old_jobs_loop runs purge_old_resolved_events(90d) once per day

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 20:32:32 -04:00
jared 2c67944b4b Fix topology chain order and inspector SFP port width
Topology:
- Correct series layout: UDM-Pro → USW-Agg → Pro 24 PoE (not a fork)
- Remove CSS fork divs, replace with straight vertical connectors
- Labels: WAN · 10G SFP+ (UDM→Agg), 10G trunk (Agg→PoE)
- Remove ISL from legend (no parallel switch pair)

Inspector:
- Fix USW-Agg port blocks appearing narrower than other switches
- SFP ports in rows now use same width (34px) as copper ports;
  all-SFP switches like USL8A no longer look undersized

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 22:42:38 -04:00
jared e8314b5ba3 Fix topology diagram: replace SVG fork with CSS, fix line alignment
- Remove SVG fork with preserveAspectRatio="none" (caused line width
  distortion and stretched 10G DAC label like a tube TV)
- Replace with pure CSS .topo-fork: stem + horizontal bar + left/right
  drops, all absolutely positioned at consistent 2px width
- Use .topo-sw-row with two 50% halves so switch centres land at
  exactly 25% and 75% — matching fork drop positions mathematically
- ISL rendered via ::before/::after on .topo-sw-row (switch boxes
  with solid bg cover the line at their edges, leaving only the gap)
- Add .topo-sw-drops: two vertical stubs from switch centres to bus rails
- All lines are now exactly 2px, no distortion, no misalignment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 22:35:02 -04:00
jared 3dce602938 Redesign topology diagram with dual-homed bus layout and improve inspector chassis
- Replace flat topology with tiered bus-bar layout: Internet → UDM-Pro → SVG fork → USW-Agg + Pro 24 PoE → dual-homed servers
- Show 10G VLAN90 (Ceph) bus from USW-Agg and 1G DHCP management bus from Pro 24 PoE per host
- Add per-host drop wires (solid 10G + dashed 1G) with correct rack positions
- Mark large1 as off-rack (dashed border), ZimaBoards as off-rack mon-01/mon-02
- Add topology legend, inter-switch 10G ISL indicator
- Add recently resolved events section (last 24h) to dashboard
- Add last_seen column and relative timestamps to events table
- Add stale data banner when monitoring data >15 min old
- Improve inspector chassis with port speed labels, LLDP neighbor info, mounting ears, chassis legend
- Add duplex/speed mismatch warnings and carrier changes to path debug panel
- Bump updateTopology() to handle both topo-v2-status-* and topo-status-* classes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 22:22:19 -04:00
jared 6eb21055ef fix: topology — reflect VLAN90 Ceph network and DHCP management separation
10G SFP+ ports on USW-Agg are VLAN90 (10.10.90.x/24, static IPs, Ceph storage).
1G ports on Pro 24 PoE are DHCP management. Update topology to show this:
- USW-Agg sublabel shows VLAN90 · 10.10.90.x (cyan)
- Pro 24 PoE sublabel shows DHCP mgmt (cyan)
- Host sublabels changed from "10G+1G" to "VLAN90" for the 10G Agg connection
- 1G management band label updated to "← 1G DHCP mgmt (Pro 24 PoE) →"
- Add .topo-vlan-tag CSS for cyan VLAN annotation on switch nodes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 22:10:17 -04:00
jared f2541eb45c fix: topology — all servers dual-homed 10G+1G, show mgmt band
All rack servers (and large1 on table) have both a 10G link to USW-Agg
and a 1G management link to Pro 24 PoE. Update topology:
- Move all 6 hosts into single row (including large1)
- Update sublabels to "10G+1G" for all nodes
- large1 dashed-border (off-rack) with "table · 10G+1G"
- Add dashed amber "1G mgmt (PoE)" horizontal band above hosts
  to represent the PoE switch management connections
- 10G primary fan-out lines still drop from Agg switch above
- large1 primary line rendered as dashed green (off-rack run)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 22:08:48 -04:00
jared e779b21db4 feat: redesign network topology diagram with accurate rack layout
Replace linear Internet→UDM→Agg→PoE→all-hosts chain with accurate topology:
- USW-Aggregation and Pro 24 PoE switch shown side-by-side with horizontal
  10G SFP+ link between them (not in series)
- 5 compute/storage/monitor nodes fanned out under Agg Switch with 10G labels
  and rack unit positions (RU4–12, RU14–17) as sublabels
- large1 shown separately under PoE switch, dashed border = off-rack (table)
- Add device specs as subtitles on all nodes (Dream Machine Pro · RU24, etc.)
- Shorter display names: csg-01 / cs-01 instead of full hostnames
- Live status badges still updated by JS via data-host attributes
- New CSS: .topo-node-sub, .topo-switch-tier, .topo-h-link, .topo-host-tier,
  .topo-host-table (dashed), .topo-badge-unknown

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 22:06:03 -04:00
jared 0ca6b1f744 feat: link health summary, recently resolved panel, event duration
- dashboard: pass recent_resolved (last 24h, limit 10) to index template;
  render "Recently Resolved" section showing type, target, resolved time,
  and calculated duration (first_seen → resolved_at)
- dashboard: event-age spans now also update via setInterval; duration
  shown for resolved events (e.g. "2h 15m")
- links page: link health summary panel shows server iface count,
  error/flap counts, switch port up/down, PoE total draw/capacity bar;
  only shows problematic stats if non-zero; shows "All OK ✔" when clean
- style.css: new classes for summary panel, resolved row/badge

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 21:48:40 -04:00
jared 6b6eaa6227 feat: UI improvements — event ages, error badges, PoE bars, mismatch detection
- events table: add Last Seen column; show relative times ("3h ago") with
  absolute timestamp on hover; update updateEventsTable() in app.js to match
- links.html: add error/drop/flap alert badges to interface and port card headers
- links.html: PoE power bar (draw/max ratio with colour-coded fill) and poe_mode
- links.html: stale data warning banner when link_stats are >2 minutes old
- links.html: improved error handler shows HTTP status instead of generic message
- links.html: fix collapse state persisted to localStorage (was sessionStorage,
  lost on browser restart); fix collapseAll/expandAll to also persist state
- inspector.html: duplex mismatch and speed mismatch warnings in path debug panel
- inspector.html: carrier changes added to server column of path debug
- style.css: new classes — .link-alert-badge, .poe-bar-*, .path-mismatch-alert,
  .error-state; fix .stale-banner to use CSS variables

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 21:46:11 -04:00
jared 14eaa6a8c9 De-hardcode ticket URL and cluster name; improve diagnostic polling UX
app.py:
- Context processor injects config.ticket_api.web_url into all templates
  (falls back to 'http://t.lotusguild.org/ticket/' if not set in config)

templates/base.html:
- Inject GANDALF_CONFIG JS global with ticket_web_url before app.js loads

static/app.js:
- Use GANDALF_CONFIG.ticket_web_url instead of hardcoded domain

templates/index.html:
- Use {{ config.ticket_api.web_url }} Jinja var instead of hardcoded domain

monitor.py:
- CLUSTER_NAME constant kept as default; NetworkMonitor now reads cluster_name
  from config monitor.cluster_name, falling back to the constant
- All CLUSTER_NAME references inside class methods replaced with self.cluster_name

templates/inspector.html:
- pollDiagnostic() .catch() now clears interval and shows error message instead
  of silently ignoring network failures during active polling

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 14:31:57 -04:00
jared af26407363 Fix setDur implicit event, title XSS, hardcoded pulse URL, suppress error toast
- suppressions.html: setDur() now takes explicit element param instead of relying
  on implicit global event.target (which fails outside direct click handlers)
- suppressions.html: removeSuppression() now shows error toast on failed DELETE
- templates/index.html: escape description in title attribute with |e filter
  to prevent attribute breakout on quotes in description text
- diagnose.py: derive Pulse execution URL from pulse_client.url instead of
  hardcoding http://pulse.lotusguild.org

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 14:36:55 -04:00
jared 0278dad502 feat: inspector page, link debug enhancements, security hardening
- Add /inspector page: visual model-accurate switch chassis diagrams
  (USF5P, USL8A, US24PRO, USPPDUP, USMINI), clickable port blocks
  with color coding (green=up, amber=PoE, cyan=uplink, grey=down),
  detail panel with stats/PoE/LLDP, LLDP-based path debug side-by-side

- Link Debug: port number badges (#N), LLDP neighbor line, PoE class/max,
  collapsible host/switch panels with sessionStorage persistence

- monitor.py: collect LLDP neighbor map + PoE class/max/mode per switch
  port; PulseClient uses requests.Session() for HTTP keep-alive; add
  shlex.quote() around interface names (defense-in-depth)

- Security: suppress buttons use data-* attrs + delegated click handler
  instead of inline onclick with Jinja2 variable interpolation; remove
  | safe filter from user-controlled fields in suppressions.html;
  setDuration() takes explicit el param instead of implicit event global

- db.py: thread-local connection reuse with ping(reconnect=True) to
  avoid a new TCP handshake per query

- .gitignore: add config.json (contains credentials), __pycache__

- README: full rewrite covering architecture, all 4 pages, alert logic,
  config reference, deployment, troubleshooting, security notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 15:39:48 -05:00
jared fa7512a2c2 feat: terminal aesthetic rewrite + link debug page
- Full dark terminal aesthetic (Pulse/TinkerTickets style):
  - #0a0a0a background, #00ff41 green, #ffb000 amber, #00ffff cyan
  - CRT scanline overlay, phosphor glow, ASCII corner pseudoelements
  - Bracket-notation badges [CRITICAL], monospace font throughout
  - style.css, base.html, index.html, suppressions.html all rewritten

- New Link Debug page (/links, /api/links):
  - Per-host, per-interface cards with speed/duplex/port type/auto-neg
  - Traffic bars (TX cyan, RX green) with rate labels
  - Error/drop counters, carrier change history
  - SFP/DOM optical panel: vendor, temp, voltage, bias, TX/RX power dBm bars
  - RX-TX delta shown; color-coded warn/crit thresholds
  - Auto-refresh every 60s, anchor-jump to #hostname

- LinkStatsCollector in monitor.py:
  - SSHes to each host (one connection, all ifaces batched)
  - Parses ethtool + ethtool -m (SFP DOM) output
  - Merges with Prometheus traffic/error/carrier metrics
  - Stores as link_stats in monitor_state table

- config.json: added ssh section for ethtool collection
- app.js: terminal chip style consistency (uppercase, ● bullet)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-02 12:43:11 -05:00
jared 0c0150f698 Complete rewrite: full-featured network monitoring dashboard
- Two-service architecture: Flask web app (gandalf.service) + background
  polling daemon (gandalf-monitor.service)
- Monitor polls Prometheus node_network_up for physical NIC states on all
  6 hypervisors (added storage-01 at 10.10.10.11:9100)
- UniFi API monitoring for switches, APs, and gateway device status
- Ping reachability for hosts without node_exporter (pbs only now)
- Smart baseline: interfaces first seen as down are never alerted on;
  only UP→DOWN regressions trigger tickets
- Cluster-wide P1 ticket when 3+ hosts have genuine simultaneous
  interface regressions (guards against false positives on startup)
- Tinker Tickets integration with 24-hour hash-based deduplication
- Alert suppression: manual toggle or timed windows (30m/1h/4h/8h)
- Authelia SSO via forward-auth headers, admin group required
- Network topology: Internet → UDM-Pro → Agg Switch (10G DAC) →
  PoE Switch (10G DAC) → Hosts
- MariaDB schema, suppression management UI, host/interface cards

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 23:03:18 -05:00
jared 004c97f492 interface update 2025-02-08 00:32:25 -05:00
jared 610f55710d updated index html 2025-02-07 23:51:13 -05:00
jared 067ce4d316 update html 2025-02-07 23:38:49 -05:00
jared 3fac013088 added diag content 2025-02-07 21:33:55 -05:00
jared 4318dcd0d2 Added interface status 2025-02-07 21:28:54 -05:00
jared d791312579 Update file structure for Flask 2025-02-07 21:22:43 -05:00
jared 21dfad35bf made everything static 2025-01-04 01:07:18 -05:00
jared 109dff1cd0 test 2025-01-04 00:33:04 -05:00