72 Commits

Author SHA1 Message Date
f8395dcd24 Fix port_idx type coercion and add logging to silent except blocks
- port_idx now coerced to int() with 400 on invalid type (prevents string/int mismatch)
- api_network and api_links bare except blocks now log errors instead of silently passing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 17:35:41 -04:00
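
Note on f8395dcd24: the port_idx coercion described above might look roughly like the sketch below. The helper name `parse_port_idx`, the logger name, and the error payload shape are assumptions for illustration, not the code in app.py. The helper returns a plain tuple so that a route handler could turn the error half into a 400 response.

```python
import logging

logger = logging.getLogger("gandalf.sketch")  # hypothetical logger name


def parse_port_idx(raw):
    """Coerce a request value to int.

    Returns (port_idx, None) on success, or (None, (body, 400)) when the
    value cannot be interpreted as an integer.
    """
    error = ({"error": "port_idx must be an integer"}, 400)
    # bool is a subclass of int in Python; reject it explicitly.
    if isinstance(raw, bool):
        return None, error
    try:
        return int(raw), None
    except (TypeError, ValueError):
        # Log instead of silently passing, in the spirit of the commit.
        logger.error("invalid port_idx: %r", raw)
        return None, error
```

With this shape, a JSON body carrying `"7"` and one carrying `7` both resolve to the integer 7, which avoids the string/int mismatch the commit mentions.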
0335845101 Security and reliability fixes: input validation, logging, job cleanup
- C5: Validate host_ip (IPv4 check) and iface (allowlist regex) before SSH command builder
- H6: Upgrade Pulse failure logging from debug to error so operators see outages
- M6: Replace per-request O(n) purge with background daemon thread (runs every 2 min)
- M7: Background thread marks jobs stuck in 'running' > 5 min as errored

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 17:30:50 -04:00
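
Note on 0335845101 (C5): a minimal sketch of the two checks, assuming the IPv4 test uses the standard `ipaddress` module and the interface allowlist is a single regular expression. The function names and the exact character set in `IFACE_RE` are hypothetical; the commit does not show the real pattern.

```python
import ipaddress
import re

# Hypothetical allowlist: characters common in Linux interface names,
# at most 15 characters (IFNAMSIZ minus the trailing NUL).
IFACE_RE = re.compile(r"[A-Za-z0-9_.:@-]{1,15}")


def is_valid_ipv4(host_ip):
    """True only for a dotted-quad IPv4 string."""
    # IPv4Address also accepts ints and packed bytes; require a str here.
    if not isinstance(host_ip, str):
        return False
    try:
        ipaddress.IPv4Address(host_ip)
        return True
    except ValueError:  # AddressValueError is a ValueError subclass
        return False


def is_valid_iface(iface):
    """True only if the whole name matches the allowlist pattern."""
    return isinstance(iface, str) and IFACE_RE.fullmatch(iface) is not None
```

Using `fullmatch` rather than `match` matters here: a value such as `eth0; reboot` starts with a valid name but must still be rejected before it reaches the SSH command builder.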
b1dd5f9cad feat: deep link diagnostics via Pulse SSH
Adds comprehensive per-port link troubleshooting triggered from the
Inspector panel when a port has an LLDP-identified server counterpart.

- diagnose.py: DiagnosticsRunner with 15-section SSH command (carrier,
  operstate, sysfs counters, ethtool, ethtool -i/-a/-g/-S/-m, ip link,
  ip addr, ip route, dmesg, lldpctl); parsers for all sections; health
  analyzer with 14 check codes (NO_CARRIER, HALF_DUPLEX, SPEED_MISMATCH,
  SFP_RX_CRITICAL, CARRIER_FLAPPING, CRC_ERRORS_HIGH, LLDP_MISMATCH, etc.)
- monitor.py: PulseClient now tracks last_execution_id so callers can
  link back to the raw Pulse execution URL
- app.py: POST /api/diagnose + GET /api/diagnose/<job_id> with daemon
  thread background execution and 10-minute in-memory job store
- inspector.html: "Run Link Diagnostics" button (shown only when LLDP
  host is resolvable); full results panel: health banner, physical layer,
  SFP/DOM with power bars, NIC error counters, collapsible ethtool -S,
  flow control/ring buffers, driver info, LLDP 2-col validation,
  collapsible dmesg, switch port summary, "View in Pulse" link
- style.css: all .diag-* CSS classes with terminal aesthetic

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 16:03:54 -05:00
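
Note on b1dd5f9cad: the POST-then-poll pattern with a daemon thread and a 10-minute in-memory job store could be sketched as below. All names (`start_job`, `get_job`, `purge_expired`, the job dict fields) are assumptions for illustration; the actual app.py code is not shown in this log.

```python
import threading
import time
import uuid

JOB_TTL_SECONDS = 600  # 10 minutes, per the commit message

_jobs = {}
_jobs_lock = threading.Lock()


def start_job(work, *args):
    """Run work(*args) in a daemon thread and return a job id to poll."""
    job_id = uuid.uuid4().hex
    with _jobs_lock:
        _jobs[job_id] = {"status": "running", "result": None,
                         "created": time.time()}

    def _run():
        try:
            result, status = work(*args), "done"
        except Exception as exc:  # record the failure instead of losing it
            result, status = str(exc), "error"
        with _jobs_lock:
            if job_id in _jobs:  # the job may already have been purged
                _jobs[job_id].update(status=status, result=result)

    threading.Thread(target=_run, daemon=True).start()
    return job_id


def get_job(job_id):
    """Return a copy of the job record, or None if unknown or purged."""
    with _jobs_lock:
        job = _jobs.get(job_id)
        return dict(job) if job else None


def purge_expired(now=None):
    """Drop jobs older than the TTL; returns how many were removed."""
    now = time.time() if now is None else now
    with _jobs_lock:
        expired = [j for j, v in _jobs.items()
                   if now - v["created"] > JOB_TTL_SECONDS]
        for j in expired:
            del _jobs[j]
    return len(expired)
```

In this sketch the POST handler would call `start_job` and return the id, and the GET handler would call `get_job`. Because the store lives in process memory, jobs do not survive a restart and are not shared between worker processes.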
0278dad502 feat: inspector page, link debug enhancements, security hardening
- Add /inspector page: visual model-accurate switch chassis diagrams
  (USF5P, USL8A, US24PRO, USPPDUP, USMINI), clickable port blocks
  with color coding (green=up, amber=PoE, cyan=uplink, grey=down),
  detail panel with stats/PoE/LLDP, LLDP-based path debug side-by-side

- Link Debug: port number badges (#N), LLDP neighbor line, PoE class/max,
  collapsible host/switch panels with sessionStorage persistence

- monitor.py: collect LLDP neighbor map + PoE class/max/mode per switch
  port; PulseClient uses requests.Session() for HTTP keep-alive; add
  shlex.quote() around interface names (defense-in-depth)

- Security: suppress buttons use data-* attrs + delegated click handler
  instead of inline onclick with Jinja2 variable interpolation; remove
  | safe filter from user-controlled fields in suppressions.html;
  setDuration() takes explicit el param instead of implicit event global

- db.py: thread-local connection reuse with ping(reconnect=True) to
  avoid a new TCP handshake per query

- .gitignore: add config.json (contains credentials), __pycache__

- README: full rewrite covering architecture, all 4 pages, alert logic,
  config reference, deployment, troubleshooting, security notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 15:39:48 -05:00
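
Note on 0278dad502: the `shlex.quote()` hardening mentioned for monitor.py can be illustrated with a small sketch. The function name `build_ethtool_command` and the exact command layout are assumptions; only the use of `shlex.quote()` around interface names comes from the commit message.

```python
import shlex


def build_ethtool_command(ifaces):
    """Build one shell command line that runs ethtool for each interface."""
    # shlex.quote leaves safe names untouched and single-quotes anything
    # containing shell metacharacters, so a hostile name stays one argument.
    parts = [f"ethtool {shlex.quote(iface)}" for iface in ifaces]
    return "; ".join(parts)


print(build_ethtool_command(["eth0", "eth0; reboot"]))
# prints: ethtool eth0; ethtool 'eth0; reboot'
```

Quoting is described as defense-in-depth in the commit: it limits the damage if an unexpected interface name reaches the command string, but it does not replace validating the name up front.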
fa7512a2c2 feat: terminal aesthetic rewrite + link debug page
- Full dark terminal aesthetic (Pulse/TinkerTickets style):
  - #0a0a0a background, #00ff41 green, #ffb000 amber, #00ffff cyan
  - CRT scanline overlay, phosphor glow, ASCII corner pseudoelements
  - Bracket-notation badges [CRITICAL], monospace font throughout
  - style.css, base.html, index.html, suppressions.html all rewritten

- New Link Debug page (/links, /api/links):
  - Per-host, per-interface cards with speed/duplex/port type/auto-neg
  - Traffic bars (TX cyan, RX green) with rate labels
  - Error/drop counters, carrier change history
  - SFP/DOM optical panel: vendor, temp, voltage, bias, TX/RX power dBm bars
  - RX-TX delta shown; color-coded warn/crit thresholds
  - Auto-refresh every 60s, anchor-jump to #hostname

- LinkStatsCollector in monitor.py:
  - SSHes to each host (one connection, all ifaces batched)
  - Parses ethtool + ethtool -m (SFP DOM) output
  - Merges with Prometheus traffic/error/carrier metrics
  - Stores as link_stats in monitor_state table

- config.json: added ssh section for ethtool collection
- app.js: terminal chip style consistency (uppercase, ● bullet)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-02 12:43:11 -05:00
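
Note on fa7512a2c2: the ethtool parsing step in LinkStatsCollector is not shown in this log. A minimal sketch of how the speed/duplex/port type/auto-neg fields could be pulled out of plain `ethtool <iface>` output follows; the function name, the output key names, and the sample text are assumptions for illustration.

```python
SAMPLE = """\
Settings for eth0:
	Supported ports: [ FIBRE ]
	Speed: 10000Mb/s
	Duplex: Full
	Auto-negotiation: on
	Port: FIBRE
	Link detected: yes
"""

# ethtool label -> hypothetical key used in the stored link_stats record
WANTED = {
    "Speed": "speed",
    "Duplex": "duplex",
    "Port": "port_type",
    "Auto-negotiation": "auto_neg",
    "Link detected": "link_detected",
}


def parse_ethtool(output):
    """Extract a few link fields from `ethtool <iface>` text output."""
    stats = {}
    for line in output.splitlines():
        # Split on the first colon only, so values keep any later colons.
        key, sep, value = line.strip().partition(":")
        if sep and key in WANTED:
            stats[WANTED[key]] = value.strip()
    return stats


print(parse_ethtool(SAMPLE)["speed"])  # prints: 10000Mb/s
```

Matching on exact labels keeps lines such as "Supported ports" from being mistaken for "Port". Parsing `ethtool -m` (SFP DOM) output would need its own label table.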
4356af1d84 chore: remove deploy test line from README 2026-03-02 12:08:16 -05:00
56f86f6169 chore: test auto-deploy pipeline 2026-03-02 12:05:59 -05:00
4600229207 chore: clean up deploy test line from README 2026-03-02 12:00:46 -05:00
ff1edb5e0f chore: trigger deploy test 2026-03-02 11:58:42 -05:00
67072099ca docs: update README for storage-01 Prometheus migration
- storage-01 now monitored via Prometheus node_exporter (10.10.10.11:9100),
  removed from ping_hosts
- Updated data sources table (6 hosts via Prometheus, pbs only via ping)
- Added storage-01 to monitored hosts table
- Fixed Authelia reload command (restart, not reload)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 23:05:27 -05:00
0c0150f698 Complete rewrite: full-featured network monitoring dashboard
- Two-service architecture: Flask web app (gandalf.service) + background
  polling daemon (gandalf-monitor.service)
- Monitor polls Prometheus node_network_up for physical NIC states on all
  6 hypervisors (added storage-01 at 10.10.10.11:9100)
- UniFi API monitoring for switches, APs, and gateway device status
- Ping reachability for hosts without node_exporter (pbs only now)
- Smart baseline: interfaces first seen as down are never alerted on;
  only UP→DOWN regressions trigger tickets
- Cluster-wide P1 ticket when 3+ hosts have genuine simultaneous
  interface regressions (guards against false positives on startup)
- Tinker Tickets integration with 24-hour hash-based deduplication
- Alert suppression: manual toggle or timed windows (30m/1h/4h/8h)
- Authelia SSO via forward-auth headers, admin group required
- Network topology: Internet → UDM-Pro → Agg Switch (10G DAC) →
  PoE Switch (10G DAC) → Hosts
- MariaDB schema, suppression management UI, host/interface cards

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 23:03:18 -05:00
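
Note on 0c0150f698: the alert logic described above (smart baseline, cluster-wide threshold, 24-hour hash-based deduplication) could be sketched as follows. The data shapes, function names, and the fields fed into the hash are assumptions; the commit message does not specify them.

```python
import hashlib

CLUSTER_THRESHOLD = 3                  # "3+ hosts" per the commit message
DEDUP_WINDOW_SECONDS = 24 * 60 * 60    # 24-hour deduplication window


def find_regressions(baseline, current):
    """Return {host: [iface, ...]} for interfaces that went UP -> DOWN.

    baseline and current map host -> {iface: is_up}. An interface missing
    from the baseline is recorded in its current state and not reported,
    so a NIC first seen down never alerts.
    """
    regressions = {}
    for host, ifaces in current.items():
        known = baseline.setdefault(host, {})
        for iface, is_up in ifaces.items():
            if iface not in known:
                known[iface] = is_up  # first sighting: learn, do not alert
            elif known[iface] and not is_up:
                regressions.setdefault(host, []).append(iface)
    return regressions


def is_cluster_event(regressions):
    """True when enough distinct hosts regress at the same time."""
    return len(regressions) >= CLUSTER_THRESHOLD


def dedup_key(host, iface):
    """Stable hash for one alert; the hashed fields are an assumption."""
    return hashlib.sha256(f"{host}:{iface}".encode()).hexdigest()


def should_open_ticket(seen, key, now):
    """Allow one ticket per key per window; seen maps key -> last open time."""
    last = seen.get(key)
    if last is not None and now - last < DEDUP_WINDOW_SECONDS:
        return False
    seen[key] = now
    return True
```

Because the first poll after startup only populates the baseline, it cannot produce regressions, which is one way to get the startup false-positive guard the commit describes.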
4ed5ecacbb added git ignore 2025-03-01 13:34:25 -05:00
004c97f492 interface update 2025-02-08 00:32:25 -05:00
9f92ac5c1a fixed indent 2025-02-08 00:16:45 -05:00
ea5e86ef33 lots logs 2025-02-08 00:16:06 -05:00
19224d14df added raw back 2025-02-08 00:13:06 -05:00
68beb7b1c4 more dynamic 2025-02-08 00:11:28 -05:00
dc117b276e fix syntax error 2025-02-08 00:04:42 -05:00
b67a5d10c2 dynamic devices 2025-02-08 00:03:01 -05:00
4c90fbb168 interfaces update 2025-02-07 23:57:34 -05:00
da59d50560 update index 2025-02-07 23:54:28 -05:00
610f55710d updated index html 2025-02-07 23:51:13 -05:00
067ce4d316 update html 2025-02-07 23:38:49 -05:00
02d03f4f3f json update 2025-02-07 23:26:41 -05:00
1549f39c2c v2 api 2025-02-07 23:24:36 -05:00
a2c8368439 wrong indentation 2025-02-07 23:20:14 -05:00
75cdef709f Bearer token 2025-02-07 23:19:50 -05:00
0417106e88 Auth order plz 2025-02-07 23:17:12 -05:00
3c4a9651b5 CSRF token 2025-02-07 23:14:36 -05:00
de24b9ef98 bearer token 2025-02-07 23:12:29 -05:00
37022b132f updated stats 2025-02-07 23:09:11 -05:00
5d5aea3cf4 acquire site id 2025-02-07 23:06:49 -05:00
de7b731269 site id to default 2025-02-07 23:01:58 -05:00
ac3eaf4f92 DEBUG BROSKI 2025-02-07 22:54:43 -05:00
c939a37344 fixed endpoint 2025-02-07 22:52:19 -05:00
baf7d23cd0 new header 2025-02-07 22:50:52 -05:00
20bfeda30e raw data 2025-02-07 22:48:55 -05:00
a42c4b6e8c new endpoints again 2025-02-07 22:47:04 -05:00
5298349ac7 legacy endpoints 2025-02-07 22:44:45 -05:00
2dea9ddc8d device and site id 2025-02-07 22:42:47 -05:00
f7990f34c6 initializes the session before attempting to use it 2025-02-07 22:31:07 -05:00
5fdd84b5f7 get site id 2025-02-07 22:29:26 -05:00
de5efc15cb x-api- 2025-02-07 22:27:26 -05:00
d8ede8f264 json instead of html 2025-02-07 22:24:32 -05:00
131da674d3 raw response 2025-02-07 22:23:15 -05:00
9aa9ab08c2 updated json parsing 2025-02-07 22:21:34 -05:00
c080aa1b87 revert 2025-02-07 22:20:20 -05:00
e4d8ee0941 Bearer token 2025-02-07 22:17:27 -05:00
82f07bb0aa attempt new login method 2025-02-07 22:14:57 -05:00
18059a9983 updated auth 2025-02-07 22:09:09 -05:00