Compare commits

..

37 Commits

Author SHA1 Message Date
jared 31747c4bd3 Replace deprecated datetime.utcnow() with datetime.now(timezone.utc)
Lint / Python (flake8) (push) Successful in 1m9s
Lint / JS (eslint) (push) Successful in 11s
Security / Python Security (bandit) (push) Successful in 44s
Test / Python Tests (pytest) (push) Successful in 58s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
datetime.utcnow() is deprecated in Python 3.12 and removed in 3.13.
Replace all four call sites with timezone-aware equivalents so the
codebase is ready for Python 3.12+.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 15:34:41 -04:00
jared faa0707f79 Add ESLint config enforcing no-undef and eqeqeq
Lint / Python (flake8) (push) Successful in 53s
Lint / JS (eslint) (push) Successful in 12s
Security / Python Security (bandit) (push) Successful in 1m44s
Test / Python Tests (pytest) (push) Successful in 59s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
Without a config file, ESLint was running with no-undef disabled, meaning
undefined variable references in static/app.js were silently ignored.
Add .eslintrc.json with no-undef: error and eqeqeq: error so CI actually
catches JS bugs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 15:33:26 -04:00
jared 9c52e4ad1a Fix inspector auto-refresh ignoring 'Off' setting on page load
Lint / Python (flake8) (push) Successful in 41s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 1m0s
Test / Python Tests (pytest) (push) Successful in 50s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 2s
Same ?? / || issue as the previous fix in index.html and links.html.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 13:20:42 -04:00
jared 156ef97667 Fix auto-refresh ignoring 'Off' setting on page load
Lint / Python (flake8) (push) Successful in 40s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 39s
Test / Python Tests (pytest) (push) Successful in 53s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 2s
Using || 30 / || 60 as a fallback treats refreshInterval=0 (Off) as
falsy and replaces it with the default, causing auto-refresh to start
even when the user saved 'Off'. Replace with nullish coalescing (??)
so only null/undefined triggers the default.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 13:19:44 -04:00
jared 2f74266bd9 Fix monitor loop double-sleep on error; add grep -F regression test
Lint / Python (flake8) (push) Successful in 49s
Lint / JS (eslint) (push) Successful in 9s
Security / Python Security (bandit) (push) Successful in 42s
Test / Python Tests (pytest) (push) Successful in 51s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
On exception the monitor slept 30s inside the except block then fell
through to time.sleep(poll_interval), giving a 150s recovery gap instead
of 30s. Adding continue after the error sleep fixes this.

Also adds a regression test asserting dmesg filtering uses grep -F --
so a future refactor cannot silently reintroduce the regex wildcard bug.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 13:16:43 -04:00
jared 222bdb08ab Fix suppression annotation for interface_down not checking host-level rules
Lint / Python (flake8) (push) Successful in 38s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 39s
Test / Python Tests (pytest) (push) Successful in 1m5s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 4s
monitor.py checks both 'interface' and 'host' suppressions for interface_down
events, but _annotate_suppressions only checked 'interface'. A host-level
suppression would silently suppress tickets but not mark the table row as
suppressed in the UI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 13:14:46 -04:00
jared 8dd744b039 Show suppressed badge on host cards during global maintenance windows
Lint / Python (flake8) (push) Successful in 40s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 38s
Test / Python Tests (pytest) (push) Successful in 52s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
Global suppressions (target_type='all') have an empty target_name, so
the selectattr filter never matched them, leaving no visual indicator
when a global maintenance window was active. Pre-compute has_global_sup
before the host loop and OR it into the badge condition.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 13:12:25 -04:00
jared 9e2be150b5 Use grep -F in dmesg filter to prevent interface name treated as regex
Lint / Python (flake8) (push) Successful in 38s
Lint / JS (eslint) (push) Failing after 13s
Security / Python Security (bandit) (push) Successful in 42s
Test / Python Tests (pytest) (push) Successful in 50s
Lint / Notify on failure (push) Successful in 2s
Lint / Deploy (push) Has been skipped
grep {iface} treats dots and other special chars as regex metacharacters.
Switch to grep -F -- {iface} for fixed-string matching and to prevent
a leading dash in the interface name from being parsed as a grep flag.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 11:12:02 -04:00
jared ed5ba5c59e Remove unused is_new parameter from ticket helper methods
After fixing the is_new guard bug, is_new is no longer used inside
_ticket_interface, _ticket_unifi, or _ticket_unreachable. Drop it from
their signatures and call sites.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 11:10:32 -04:00
jared 2be44d8b24 Fix ticket_id never stored when fail_thresh>1; guard sessionStorage JSON.parse
Lint / Python (flake8) (push) Successful in 45s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 43s
Test / Python Tests (pytest) (push) Successful in 51s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
monitor.py: _ticket_interface/_ticket_unifi/_ticket_unreachable all used
`if tid and is_new` to guard db.set_ticket_id(). Since is_new is True only
on the first upsert (consec=1) but tickets are created at consec>=fail_thresh
(default 2), is_new is always False when the ticket is created, so the
ticket link never appeared in the UI. Changed to `if tid:`.

links.html: JSON.parse(sessionStorage.getItem(...)) in togglePanel and
restoreCollapseState had no try-catch. Corrupt/stale session storage would
throw an uncaught SyntaxError. Also wrapped all sessionStorage.setItem
calls in try-catch to defend against storage-full / private-browsing errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 23:45:20 -04:00
jared 2d6dcd782f Cancel in-flight diagnostic poll when user selects a new port
Lint / Python (flake8) (push) Successful in 45s
Lint / JS (eslint) (push) Successful in 10s
Security / Python Security (bandit) (push) Successful in 52s
Test / Python Tests (pytest) (push) Successful in 1m2s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 2s
Previously switching ports while a diagnostic was running left the
setInterval timer active, causing the result to be written into the
old (now detached) DOM elements and never shown to the user.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 23:26:53 -04:00
jared a1a3a52dd8 Fix empty-object false negative in links page no-data check
Lint / Python (flake8) (push) Successful in 51s
Lint / JS (eslint) (push) Successful in 10s
Security / Python Security (bandit) (push) Successful in 46s
Test / Python Tests (pytest) (push) Successful in 1m3s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
The check `!data.hosts && !data.unifi_switches` never caught empty
objects `{}`, which are truthy. Replace with Object.keys length checks
so the friendly "no data yet" banner renders when both collections
are empty.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 23:21:50 -04:00
jared bcc2ad7f5c Use shlex.quote for remote_cmd in build_ssh_command
Lint / Python (flake8) (push) Successful in 1m3s
Lint / JS (eslint) (push) Successful in 10s
Security / Python Security (bandit) (push) Successful in 49s
Test / Python Tests (pytest) (push) Successful in 1m10s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 4s
Matches the pattern already used in monitor.py's _ssh_batch(); prevents
quoting breakage if shlex.quote(iface) emits single-quoted tokens inside
the remote command string.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 23:17:11 -04:00
jared d4f159ee7c fix: escape ticket_id text content in dynamic events table
Lint / Python (flake8) (push) Successful in 44s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 42s
Test / Python Tests (pytest) (push) Successful in 1m7s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
ticket_id was already escaped in the href attribute but the visible
text (#<id>) used the raw value in an innerHTML template literal.
Apply lt.escHtml() for defense-in-depth against a compromised ticket API.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 23:02:09 -04:00
jared 61019418d3 fix: add aria-required to s-reason field in suppressions form
Lint / Python (flake8) (push) Successful in 40s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 57s
Test / Python Tests (pytest) (push) Successful in 1m27s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 6s
The reason input had `required` for browser validation but was missing
`aria-required="true"`, so screen readers did not announce it as required.
Matches the fix already applied to the equivalent field in base.html.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 15:11:05 -04:00
jared 1a53718cc5 fix: SSH shell quoting bug breaks ethtool collection; ticket_id KeyError
Lint / Python (flake8) (push) Successful in 41s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 55s
Test / Python Tests (pytest) (push) Successful in 51s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
monitor.py _ssh_batch(): the remote command was wrapped in double-quotes
(f'root@{ip} "{shell_cmd}"') but shell_cmd itself contains double-quoted
echo sentinels ("___IFACE:eth0___"). When Pulse's shell parses the full
ssh invocation, the nested double-quotes cause mis-parsing — the remote
command is split incorrectly, silently breaking all ethtool/SFP DOM
collection. Fix: use shlex.quote(shell_cmd) so the entire remote command
is single-quoted, leaving inner double-quotes untouched.

TicketClient.create(): data['ticket_id'] raises KeyError if the Tinker
Tickets API returns success=true without a ticket_id field (malformed
response). Use data.get('ticket_id') with an explicit warning log.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 13:41:09 -04:00
jared afaeb64636 fix: UTC timezone suffix missing from all isoformat() timestamp outputs
db.py returned all datetime columns (first_seen, last_seen, resolved_at,
created_at, expires_at) as bare ISO strings like "2026-03-14T14:14:21"
with no timezone marker. Per the ECMAScript spec, new Date() on a
datetime string without timezone treats it as LOCAL time, not UTC.
This made lt.time.ago() and stale-detection wrong for any user whose
browser is not in UTC — event ages and stale warnings would be off by
the client's UTC offset.

monitor.py had the same issue on the network_snapshot 'updated' field.

Fix: append 'Z' to all isoformat() calls (UTC datetimes confirmed by
MySQL server timezone and _now_utc() pattern used throughout codebase).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 13:28:49 -04:00
jared b6ee45a842 fix: inspector.html stale/updated timestamp broken date parsing
Lint / Python (flake8) (push) Successful in 1m8s
Lint / JS (eslint) (push) Successful in 10s
Security / Python Security (bandit) (push) Successful in 50s
Test / Python Tests (pytest) (push) Successful in 52s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
Same bug as was just fixed in links.html: data.updated is stored as
"YYYY-MM-DD HH:MM:SS UTC" by monitor.py, so appending 'Z' produced
"…UTCZ" — an invalid date. The stale-data warning and Updated timestamp
in Inspector were silently showing "Invalid Date" and the stale overlay
never fired. Fixed to use _toIso() (already global via app.js).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 13:25:17 -04:00
jared 9c4dd5df51 fix: admin-only suppression enforcement, links.html broken date parsing
Lint / Python (flake8) (push) Successful in 40s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 44s
Test / Python Tests (pytest) (push) Successful in 49s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
Security: add require_admin decorator; apply to POST/DELETE /api/suppressions
and /suppressions page. Previously any user in allowed_groups could create or
delete suppressions even though the nav restricts the UI to admins.

Bug: links.html "Updated:" timestamp and stale-warning both produced
Invalid Date because the raw "YYYY-MM-DD HH:MM:SS UTC" string was appended
with 'Z' instead of being normalised through _toIso(). Fix both call sites to
use _toIso(), and remove the now-redundant local _toIso redefinition.

Style: use `with open(sentinel, 'w'): pass` consistently (was open().close()
at avatar JPEG validation path).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 13:03:37 -04:00
jared 4e3d0a1f0a fix: aria-required sync, aria-label pills, deduplicate setDuration logic
Lint / Python (flake8) (push) Successful in 39s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 1m3s
Test / Python Tests (pytest) (push) Successful in 1m5s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
- updateSuppressForm() now sets required + aria-required on sup-name/sup-detail
  when target type changes; sup-reason gets static aria-required="true"
- onTypeChange() in suppressions page syncs aria-required on s-name
- s-name in suppressions.html gets initial required/aria-required (default type=host)
- Duration pills in both modal and suppressions page now have descriptive
  aria-label ("30 minutes", "1 hour", etc.) alongside the group aria-label
- setDuration() in app.js accepts optional {expiresId,pillSel,hintId} opts so
  logic lives in one place; suppressions.html setDur() delegates to it
- Post-create form reset uses setDur() instead of manually patching DOM

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 12:58:32 -04:00
jared 49869fd9f7 fix: inspector stale data warning, remove dead supported_modes code
Lint / Python (flake8) (push) Successful in 40s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 39s
Test / Python Tests (pytest) (push) Successful in 55s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 5s
- inspector.html: show orange '⚠ Stale: HH:MM' with tooltip when link_stats data is >15 min old (previously just showed the time with no visual warning)
- style.css: add .g-stale-warn helper class (orange, bold) for the stale indicator
- diagnose.py: remove supported_modes accumulation from parse_ethtool() — field was collected but never consumed by analyze() or displayed anywhere

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 12:05:08 -04:00
jared c68e797f31 fix: diagnostic toggle hint, link_stats schema, pagination UX, rate-limit feedback
Lint / Python (flake8) (push) Successful in 46s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 41s
Test / Python Tests (pytest) (push) Successful in 49s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
- inspector.html: collapsible section hint text now toggles between [expand]/[collapse] when clicked
- inspector.html: timeout and connection-loss during diagnostic poll now show a Retry button instead of a dead end
- inspector.html: 429 rate-limit response shows a clear human-readable message instead of generic error
- app.py: empty link_stats fallback now includes unifi_switches:{} for schema consistency with real data shape
- index.html: pagination overflow notice now says "export all as JSON" (opens in new tab) instead of misleadingly linking to raw API as navigation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 12:01:56 -04:00
jared fc2be88915 fix: escape poe_class in inspector panel for consistency
Lint / Python (flake8) (push) Successful in 1m49s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 42s
Test / Python Tests (pytest) (push) Successful in 1m35s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 5s
d.poe_mode was already wrapped in escHtml(); apply same to d.poe_class.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 11:56:11 -04:00
jared cd0b725f3e fix: LLDP port label bug, suppression SQL dead code, avatar path hardening
Lint / Python (flake8) (push) Successful in 1m13s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 42s
Test / Python Tests (pytest) (push) Successful in 50s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
- inspector.html: fix LLDP neighbor label in port blocks — port.lldp_table never exists; data is at port.lldp (dict with system_name/chassis_id); both port block renderers corrected
- db.py: remove dead 'target_detail IS NULL' branch in suppression check — target_detail is always stored as '' not NULL; query simplified to target_detail=''
- app.py: resolve cache_dir/cache_file/sentinel to absolute paths; guard against path escape before use
- app.py: wrap sentinel os.path.getmtime() in try/except OSError to handle TOCTOU deletion race

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 09:31:25 -04:00
jared 77c74098a3 fix: flake8 E701 in avatar handler; update SSH test to match accept-new
Lint / Python (flake8) (push) Successful in 55s
Lint / JS (eslint) (push) Successful in 11s
Security / Python Security (bandit) (push) Successful in 1m15s
Test / Python Tests (pytest) (push) Successful in 59s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
- app.py: split 'with open(sentinel): pass' onto two lines (flake8 E701)
- tests/test_diagnose.py: rename test and assert StrictHostKeyChecking=accept-new (not =no which was fixed earlier)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 09:23:06 -04:00
jared aa52047016 fix: cache_ttl config validation; ticket_web_url via tojson in base.html
Lint / Python (flake8) (push) Failing after 44s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 42s
Test / Python Tests (pytest) (push) Failing after 1m13s
Lint / Notify on failure (push) Successful in 4s
Lint / Deploy (push) Has been skipped
- app.py: wrap int(cache_ttl) in try/except so a misconfigured non-integer value falls back to 3600 instead of raising ValueError
- base.html: use Jinja2 tojson filter for ticket_web_url to ensure proper JS string escaping regardless of URL contents

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 09:05:53 -04:00
jared e166e3fcb4 fix: LDAP conn leak, health timing info, security headers, link_stats size guard
Lint / Python (flake8) (push) Failing after 51s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 1m2s
Test / Python Tests (pytest) (push) Failing after 1m21s
Lint / Notify on failure (push) Successful in 2s
Lint / Deploy (push) Has been skipped
- app.py: move conn.unbind() into finally block in api_avatar() so connection is always closed even if conn.search() throws
- app.py: remove elapsed-time strings from /health response (unauthenticated endpoint no longer leaks monitor timing)
- app.py: add after_request hook setting X-Content-Type-Options, X-Frame-Options, Referrer-Policy on all responses
- app.py: add 10 MB size guard on link_stats before JSON parse; log actual exception on parse failure
- app.py: wrap suppressions_page network_snapshot parse in try/except (same protection as index page)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 08:50:51 -04:00
jared d4d4208145 fix: LDAP empty-password guard, expires_minutes bounds, snapshot JSON safety, rate dict cleanup
Lint / Python (flake8) (push) Failing after 39s
Lint / JS (eslint) (push) Failing after 12s
Security / Python Security (bandit) (push) Successful in 41s
Test / Python Tests (pytest) (push) Failing after 1m28s
Lint / Notify on failure (push) Successful in 2s
Lint / Deploy (push) Has been skipped
- app.py: fail loudly if LDAP bind_pw is not configured rather than attempting anonymous bind
- app.py: validate expires_minutes is 1–43200 (max 30 days) before storing suppression
- app.py: wrap network_snapshot JSON parse in try/except so a corrupt DB value returns degraded page instead of 500
- app.py: prune _diag_rate entries inactive for >1h to prevent unbounded growth

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 08:47:43 -04:00
jared 61408645a5 fix: LLDP input validation, mgmt_ip early validation, poll timer cleanup, monitor backoff
Lint / Python (flake8) (push) Failing after 41s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 42s
Test / Python Tests (pytest) (push) Failing after 1m35s
Lint / Notify on failure (push) Successful in 5s
Lint / Deploy (push) Has been skipped
- app.py: validate server_name from LLDP with fullmatch before use in logs/lookups (prevents log injection)
- app.py: validate each mgmt_ip candidate before assigning host_ip (avoids assigning non-IP string that then fails later check)
- app.py: log actual exception in link_stats JSON parse error
- inspector.html: clear _diagPollTimer in closePanel() so timer doesn't orphan when panel is closed mid-poll
- monitor.py: sleep 30s after a monitor loop exception before resuming normal poll interval

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 08:45:28 -04:00
jared 25baec67ac fix: diagnostic rate limiting, lock-held ownership check, iface name length cap
Lint / Python (flake8) (push) Failing after 47s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 43s
Test / Python Tests (pytest) (push) Failing after 1m22s
Lint / Notify on failure (push) Successful in 3s
Lint / Deploy (push) Has been skipped
- app.py: add per-user diagnostic rate limit (5/min) enforced atomically under _diag_lock
- app.py: move diagnostic job ownership check inside _diag_lock to close TOCTOU window; snapshot result before releasing lock
- monitor.py: cap interface name regex to 15 chars (Linux IFNAMSIZ limit)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 08:42:50 -04:00
jared c71d0da97d security: harden exception exposure, SSL config, and Pulse response parsing
Lint / Python (flake8) (push) Failing after 42s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 1m22s
Test / Python Tests (pytest) (push) Failing after 1m23s
Lint / Notify on failure (push) Successful in 3s
Lint / Deploy (push) Has been skipped
- app.py: replace raw str(e) in diagnostic _run() with generic client message; log internally only
- app.py: /health endpoint no longer leaks exception strings to unauthenticated callers; errors logged server-side
- monitor.py: UniFi SSL verification now defaults True, configurable via config.json unifi.verify_ssl; urllib3 warning suppression scoped to verify=False only (removed global disable)
- monitor.py: Pulse execution_id extracted with .get() + explicit None check to avoid KeyError on malformed response
- monitor.py: interface name regex drops '@' (not a valid kernel interface char) to match app.py and fix inconsistency

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 08:40:25 -04:00
jared 38297e616f arch+security: route all server contact through Pulse, harden SSH
Lint / Python (flake8) (push) Failing after 43s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 1m4s
Test / Python Tests (pytest) (push) Failing after 1m5s
Lint / Notify on failure (push) Successful in 2s
Lint / Deploy (push) Has been skipped
Architecture:
- Remove direct subprocess ping from Gandalf; add PulseClient.ping()
  which runs the ping via the Pulse worker instead
- Remove standalone ping() function and subprocess import from monitor.py
- Add self.pulse alias to NetworkMonitor for convenience
- Both _process_ping_hosts() and snapshot builder now use self.pulse.ping()

Security:
- Change StrictHostKeyChecking=no → accept-new in both SSH command
  builders (monitor.py _ssh_batch, diagnose.py build_ssh_command).
  The Pulse worker's known_hosts is now authoritative; host keys are
  recorded on first connection and verified on all subsequent ones.
  MITM attacks after initial key exchange are now detectable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 23:58:16 -04:00
jared ca41486c45 security+a11y: job ownership check, aria-live chips, aria-hidden topo
Lint / Python (flake8) (push) Failing after 45s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 1m5s
Test / Python Tests (pytest) (push) Successful in 49s
Lint / Notify on failure (push) Successful in 3s
Lint / Deploy (push) Has been skipped
security:
- Fix bare open(sentinel, 'w').close() file descriptor leak; use
  context manager instead
- Store requesting username in _diag_jobs at creation time; return 403
  from api_diagnose_poll if the polling user does not match the job owner

accessibility:
- Add aria-live="polite" aria-atomic="true" to .status-chips container
  so screen readers announce critical/warning count changes on refresh
- Add aria-controls="events-table-wrap" to critical and warning stat
  cards so assistive tech knows these buttons control the events table
- Add aria-hidden sync to topology setCollapsed() — hidden topology
  content is now removed from the accessibility tree when collapsed,
  preventing keyboard focus from entering invisible elements

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 23:53:17 -04:00
jared 0f2506d5a4 refactor: const for _inspInterval in inspector.html
Lint / Python (flake8) (push) Successful in 54s
Lint / JS (eslint) (push) Successful in 9s
Security / Python Security (bandit) (push) Successful in 1m17s
Test / Python Tests (pytest) (push) Successful in 53s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
Last remaining var declaration; matches the pattern in index.html and
links.html.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 23:45:42 -04:00
jared 678ede4e76 refactor: replace inline onclick with data-action event delegation
Lint / Python (flake8) (push) Successful in 42s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 1m0s
Test / Python Tests (pytest) (push) Successful in 50s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 2s
The command palette button used an inline onclick handler while every
other interactive element in base.html uses data-action + event
delegation. Now consistent: data-action="open-cmdpalette" handled in
the global footer click listener.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 23:45:09 -04:00
jared b51b39c3a7 a11y: keyboard-accessible panel toggles, region landmarks in inspector
Lint / Python (flake8) (push) Successful in 43s
Lint / JS (eslint) (push) Successful in 14s
Security / Python Security (bandit) (push) Successful in 45s
Test / Python Tests (pytest) (push) Successful in 50s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
- Add role="button" tabindex="0" aria-expanded to .link-host-title
  in both static and JS-rendered panels (host panels + UniFi switches)
- Sync aria-expanded in togglePanel(), restoreCollapseState(),
  collapseAll(), and expandAll()
- Add keydown handler (Enter/Space) so panel headers are keyboard-operable
- Add role="region" aria-label to inspector main chassis area
- Add role="complementary" aria-label to inspector port detail panel
- Replace last inline date-parse in renderLinks() with _toIso() helper

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 23:44:23 -04:00
jared 41695a3faa security: escape user input in 403 error response to prevent XSS
Lint / Python (flake8) (push) Successful in 40s
Lint / JS (eslint) (push) Successful in 9s
Security / Python Security (bandit) (push) Successful in 1m0s
Test / Python Tests (pytest) (push) Successful in 51s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 2s
The require_auth decorator was interpolating user['username'] and the
allowed_groups list directly into HTML strings. An attacker with a
crafted username or control over group names could inject arbitrary HTML.

Use html.escape() on both values before insertion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 23:41:31 -04:00
13 changed files with 372 additions and 179 deletions
+21
View File
@@ -0,0 +1,21 @@
{
"env": {
"browser": true,
"es2021": true
},
"globals": {
"lt": "readonly",
"GANDALF_CONFIG": "readonly",
"CSS": "readonly"
},
"rules": {
"no-undef": "error",
"no-unused-vars": ["warn", { "argsIgnorePattern": "^_", "varsIgnorePattern": "^_" }],
"no-console": "off",
"eqeqeq": ["error", "always", { "null": "ignore" }]
},
"parserOptions": {
"ecmaVersion": 2021,
"sourceType": "script"
}
}
+151 -42
View File
@@ -5,6 +5,7 @@ management UI. Authentication via Authelia forward-auth headers.
All monitoring and alerting is handled by the separate monitor.py daemon.
"""
import hashlib
import html
import ipaddress
import json
import logging
@@ -58,6 +59,8 @@ def inject_config():
# In-memory diagnostic job store { job_id: { status, result, created_at } }
_diag_jobs: dict = {}
_diag_lock = threading.Lock()
# Per-user rate-limit: { username: [epoch_float, ...] } — cleaned inside _diag_lock
_diag_rate: dict = {}
def _purge_old_jobs_loop():
@@ -91,6 +94,14 @@ def _config() -> dict:
return _cfg
@app.after_request
def add_security_headers(response):
response.headers.setdefault('X-Content-Type-Options', 'nosniff')
response.headers.setdefault('X-Frame-Options', 'DENY')
response.headers.setdefault('Referrer-Policy', 'strict-origin-when-cross-origin')
return response
def _daemon_ok(last_check: str) -> bool:
"""Return True if monitor last checked within 20 minutes."""
if not last_check or last_check == 'Never':
@@ -132,16 +143,29 @@ def require_auth(f):
)
allowed = _config().get('auth', {}).get('allowed_groups', ['admin'])
if not any(g in allowed for g in user['groups']):
safe_user = html.escape(user['username'])
safe_groups = html.escape(', '.join(allowed))
return (
f'<h1>403 Access denied</h1>'
f'<p>Your account ({user["username"]}) is not in an allowed group '
f'({", ".join(allowed)}).</p>',
f'<p>Your account ({safe_user}) is not in an allowed group '
f'({safe_groups}).</p>',
403,
)
return f(*args, **kwargs)
return wrapper
def require_admin(f):
"""Decorator: require require_auth AND membership in the 'admin' group."""
@wraps(f)
def wrapper(*args, **kwargs):
user = _get_user()
if 'admin' not in user.get('groups', []):
return jsonify({'error': 'Admin access required'}), 403
return f(*args, **kwargs)
return wrapper
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
@@ -150,17 +174,26 @@ _PAGE_LIMIT = 200 # max events returned per request
def _annotate_suppressions(events: list, suppressions: list) -> None:
"""Annotate each event dict in-place with an is_suppressed bool."""
"""Annotate each event dict in-place with an is_suppressed bool.
Mirrors the suppression check order in monitor.py exactly:
interface_down → interface OR host
unifi_device_* → unifi_device
everything else → host
"""
for ev in events:
sup_type = (
'unifi_device' if ev.get('event_type') == 'unifi_device_offline'
else 'interface' if ev.get('event_type') == 'interface_down'
else 'host'
)
ev['is_suppressed'] = db.check_suppressed(
suppressions, sup_type,
ev.get('target_name', ''), ev.get('target_detail', '') or '',
)
etype = ev.get('event_type', '')
name = ev.get('target_name', '')
detail = ev.get('target_detail', '') or ''
if etype == 'interface_down':
ev['is_suppressed'] = (
db.check_suppressed(suppressions, 'interface', name, detail) or
db.check_suppressed(suppressions, 'host', name)
)
elif etype == 'unifi_device_offline':
ev['is_suppressed'] = db.check_suppressed(suppressions, 'unifi_device', name, detail)
else:
ev['is_suppressed'] = db.check_suppressed(suppressions, 'host', name, detail)
# ---------------------------------------------------------------------------
@@ -177,7 +210,11 @@ def index():
summary = db.get_status_summary()
snapshot_raw = db.get_state('network_snapshot')
last_check = db.get_state('last_check', 'Never')
snapshot = json.loads(snapshot_raw) if snapshot_raw else {}
try:
snapshot = json.loads(snapshot_raw) if snapshot_raw else {}
except Exception as e:
logger.error(f'Failed to parse network_snapshot JSON: {e}')
snapshot = {}
suppressions = db.get_active_suppressions()
_annotate_suppressions(events, suppressions)
recent_resolved = db.get_recent_resolved(hours=24, limit=10)
@@ -211,12 +248,17 @@ def inspector():
@app.route('/suppressions')
@require_auth
@require_admin
def suppressions_page():
user = _get_user()
active = db.get_active_suppressions()
history = db.get_suppression_history(limit=50)
snapshot_raw = db.get_state('network_snapshot')
snapshot = json.loads(snapshot_raw) if snapshot_raw else {}
try:
snapshot = json.loads(snapshot_raw) if snapshot_raw else {}
except Exception as e:
logger.error(f'Failed to parse network_snapshot JSON: {e}')
snapshot = {}
return render_template(
'suppressions.html',
user=user,
@@ -263,11 +305,14 @@ def api_network():
def api_links():
raw = db.get_state('link_stats')
if raw:
if len(raw) > 10_000_000:
logger.error(f'link_stats exceeds 10 MB ({len(raw)} bytes); possible corruption')
return jsonify({'error': 'Invalid cached data'}), 503
try:
return jsonify(json.loads(raw))
except Exception:
logger.error('Failed to parse link_stats JSON')
return jsonify({'hosts': {}, 'updated': None})
except Exception as e:
logger.error(f'Failed to parse link_stats JSON: {e}')
return jsonify({'hosts': {}, 'unifi_switches': {}, 'updated': None})
@app.route('/api/events')
@@ -299,6 +344,7 @@ def api_get_suppressions():
@app.route('/api/suppressions', methods=['POST'])
@require_auth
@require_admin
def api_create_suppression():
user = _get_user()
data = request.get_json(silent=True) or {}
@@ -322,13 +368,21 @@ def api_create_suppression():
if len(target_detail) > 255:
return jsonify({'error': 'target_detail must be 255 characters or fewer'}), 400
if expires_minutes is not None:
try:
expires_minutes = int(expires_minutes)
if expires_minutes <= 0 or expires_minutes > 43200:
return jsonify({'error': 'expires_minutes must be between 1 and 43200 (30 days)'}), 400
except (ValueError, TypeError):
return jsonify({'error': 'expires_minutes must be a valid integer'}), 400
sup_id = db.create_suppression(
target_type=target_type,
target_name=target_name,
target_detail=target_detail,
reason=reason,
suppressed_by=user['username'],
expires_minutes=int(expires_minutes) if expires_minutes else None,
expires_minutes=expires_minutes,
)
logger.info(
f'Suppression #{sup_id} created by {user["username"]}: '
@@ -339,6 +393,7 @@ def api_create_suppression():
@app.route('/api/suppressions/<int:sup_id>', methods=['DELETE'])
@require_auth
@require_admin
def api_delete_suppression(sup_id: int):
user = _get_user()
db.deactivate_suppression(sup_id)
@@ -366,8 +421,8 @@ def api_diagnose_start():
return jsonify({'error': 'No link_stats data available'}), 503
try:
link_data = json.loads(raw)
except Exception:
logger.error('Failed to parse link_stats JSON in /api/diagnose')
except Exception as e:
logger.error(f'Failed to parse link_stats JSON in /api/diagnose: {e}')
return jsonify({'error': 'Internal data error'}), 500
switches = link_data.get('unifi_switches', {})
@@ -391,6 +446,9 @@ def api_diagnose_start():
return jsonify({'error': 'No LLDP neighbor data for this port'}), 400
server_name = lldp['system_name']
if not re.fullmatch(r'[a-zA-Z0-9._-]+', server_name):
logger.error(f'Refusing diagnostic: invalid server_name from LLDP: {server_name!r}')
return jsonify({'error': 'LLDP neighbor name contains invalid characters'}), 400
lldp_port_id = lldp.get('port_id', '')
# Find matching host + interface in link_stats hosts
@@ -416,9 +474,14 @@ def api_diagnose_start():
# Resolve host IP from link_stats host data
host_ip = (server_ifaces.get(matched_iface) or {}).get('host_ip')
if not host_ip:
# Fallback: use LLDP mgmt IPs
mgmt_ips = lldp.get('mgmt_ips') or []
host_ip = mgmt_ips[0] if mgmt_ips else None
# Fallback: use first valid IP from LLDP mgmt IPs
for candidate in (lldp.get('mgmt_ips') or []):
try:
ipaddress.ip_address(candidate)
host_ip = candidate
break
except ValueError:
continue
if not host_ip:
return jsonify({'error': 'Cannot determine host IP for SSH'}), 400
@@ -433,8 +496,22 @@ def api_diagnose_start():
return jsonify({'error': 'Resolved interface name contains invalid characters'}), 400
job_id = str(uuid.uuid4())
requesting_user = _get_user()['username']
now = time.time()
with _diag_lock:
_diag_jobs[job_id] = {'status': 'running', 'result': None, 'created_at': time.time()}
# Rate limit: max 5 diagnostic jobs per user per minute; prune stale user entries
stale_users = [u for u, ts in _diag_rate.items() if not ts or max(ts) < now - 3600]
for u in stale_users:
del _diag_rate[u]
recent = [t for t in _diag_rate.get(requesting_user, []) if now - t < 60]
if len(recent) >= 5:
return jsonify({'error': 'Rate limit exceeded: max 5 diagnostics per minute'}), 429
recent.append(now)
_diag_rate[requesting_user] = recent
_diag_jobs[job_id] = {
'status': 'running', 'result': None,
'created_at': now, 'user': requesting_user,
}
def _run():
try:
@@ -444,7 +521,7 @@ def api_diagnose_start():
result = runner.run(host_ip, server_name, matched_iface, port_data)
except Exception as e:
logger.error(f'Diagnostic job {job_id} failed: {e}', exc_info=True)
result = {'status': 'error', 'error': str(e)}
result = {'status': 'error', 'error': 'Diagnostic failed; check server logs.'}
with _diag_lock:
if job_id in _diag_jobs:
_diag_jobs[job_id]['status'] = 'done'
@@ -460,11 +537,15 @@ def api_diagnose_start():
@require_auth
def api_diagnose_poll(job_id: str):
"""Poll a diagnostic job. Returns {status, result}."""
current_user = _get_user()['username']
with _diag_lock:
job = _diag_jobs.get(job_id)
if not job:
return jsonify({'error': 'Job not found'}), 404
return jsonify({'status': job['status'], 'result': job.get('result')})
if not job:
return jsonify({'error': 'Job not found'}), 404
if job.get('user') != current_user:
return jsonify({'error': 'Forbidden'}), 403
snapshot = {'status': job['status'], 'result': job.get('result')}
return jsonify(snapshot)
@app.route('/api/avatar')
@@ -481,11 +562,21 @@ def api_avatar():
# Build a safe cache filename from the username (alphanumeric + - _ .)
safe_name = re.sub(r'[^a-zA-Z0-9._-]', '_', username)
cache_dir = ldap_cfg.get('cache_dir', os.path.join(tempfile.gettempdir(), 'gandalf_avatars'))
cache_dir = os.path.abspath(
ldap_cfg.get('cache_dir', os.path.join(tempfile.gettempdir(), 'gandalf_avatars'))
)
os.makedirs(cache_dir, exist_ok=True)
cache_file = os.path.join(cache_dir, f'user_{safe_name}.jpg')
sentinel = os.path.join(cache_dir, f'user_{safe_name}.none')
cache_ttl = int(ldap_cfg.get('cache_ttl', 3600))
cache_file = os.path.abspath(os.path.join(cache_dir, f'user_{safe_name}.jpg'))
sentinel = os.path.abspath(os.path.join(cache_dir, f'user_{safe_name}.none'))
# Guard against path escape (shouldn't happen with sanitised safe_name, but be explicit)
if not cache_file.startswith(cache_dir + os.sep) or not sentinel.startswith(cache_dir + os.sep):
logger.error(f'Avatar path escape detected for user {username!r}')
return '', 404
try:
cache_ttl = int(ldap_cfg.get('cache_ttl', 3600))
except (ValueError, TypeError):
logger.warning('Invalid cache_ttl in ldap config; using default 3600')
cache_ttl = 3600
now = time.time()
@@ -495,33 +586,48 @@ def api_avatar():
max_age=cache_ttl, conditional=True)
# Skip LDAP if we already know this user has no avatar
if os.path.exists(sentinel) and now - os.path.getmtime(sentinel) < cache_ttl:
return '', 404
try:
if os.path.exists(sentinel) and now - os.path.getmtime(sentinel) < cache_ttl:
return '', 404
except OSError:
pass
# Query lldap
bind_pw = ldap_cfg.get('bind_pw', '')
if not bind_pw:
logger.error('LDAP bind_pw not configured — avatar lookup disabled')
return '', 404
avatar_data = None
conn = None
try:
import ldap3
server = ldap3.Server(ldap_cfg['host'], port=int(ldap_cfg.get('port', 3890)))
conn = ldap3.Connection(server,
user=ldap_cfg['bind_dn'],
password=ldap_cfg.get('bind_pw', ''),
password=bind_pw,
auto_bind=True, receive_timeout=5)
safe_uid = ldap3.utils.conv.escape_filter_chars(username)
conn.search(ldap_cfg.get('user_base', 'ou=people,dc=example,dc=com'),
f'(uid={safe_uid})', attributes=['avatar'])
if conn.entries and conn.entries[0]['avatar'].value:
avatar_data = conn.entries[0]['avatar'].value
conn.unbind()
except ImportError:
logger.error('ldap3 not installed — run: pip install ldap3')
return '', 404
except Exception as e:
logger.error(f'LDAP avatar lookup failed for {username}: {e}')
return '', 404
finally:
if conn is not None:
try:
conn.unbind()
except Exception:
pass
if not avatar_data or len(avatar_data) < 100:
open(sentinel, 'w').close()
with open(sentinel, 'w'):
pass
return '', 404
# Validate JPEG magic bytes (FF D8 FF)
@@ -529,7 +635,8 @@ def api_avatar():
avatar_data = avatar_data.encode('latin-1')
if avatar_data[:3] != b'\xFF\xD8\xFF':
logger.warning(f'Non-JPEG avatar data for {username}')
open(sentinel, 'w').close()
with open(sentinel, 'w'):
pass
return '', 404
with open(cache_file, 'wb') as f:
@@ -554,7 +661,8 @@ def health():
db.get_state('last_check')
checks['db'] = 'ok'
except Exception as e:
checks['db'] = f'error: {e}'
logger.error(f'Health check db error: {e}')
checks['db'] = 'error'
overall = 'degraded'
# Monitor freshness: fail if last_check is older than 20 minutes
@@ -564,14 +672,15 @@ def health():
ts = datetime.strptime(last_check, '%Y-%m-%d %H:%M:%S UTC').replace(tzinfo=timezone.utc)
age_s = (datetime.now(timezone.utc) - ts).total_seconds()
if age_s > 1200:
checks['monitor'] = f'stale ({int(age_s)}s since last check)'
checks['monitor'] = 'stale'
overall = 'degraded'
else:
checks['monitor'] = f'ok ({int(age_s)}s ago)'
checks['monitor'] = 'ok'
else:
checks['monitor'] = 'no data yet'
except Exception as e:
checks['monitor'] = f'error: {e}'
logger.error(f'Health check monitor error: {e}')
checks['monitor'] = 'error'
overall = 'degraded'
status_code = 200 if overall == 'ok' else 503
+7 -7
View File
@@ -3,7 +3,7 @@ import json
import logging
import threading
from contextlib import contextmanager
from datetime import datetime, timedelta
from datetime import datetime, timedelta, timezone
from typing import Optional
import pymysql
@@ -182,7 +182,7 @@ def get_active_events(limit: int = 200, offset: int = 0) -> list:
for r in rows:
for k in ('first_seen', 'last_seen'):
if r.get(k) and hasattr(r[k], 'isoformat'):
r[k] = r[k].isoformat()
r[k] = r[k].isoformat() + 'Z'
return rows
@@ -210,7 +210,7 @@ def get_recent_resolved(hours: int = 24, limit: int = 50) -> list:
for r in rows:
for k in ('first_seen', 'last_seen', 'resolved_at'):
if r.get(k) and hasattr(r[k], 'isoformat'):
r[k] = r[k].isoformat()
r[k] = r[k].isoformat() + 'Z'
return rows
@@ -252,7 +252,7 @@ def get_active_suppressions() -> list:
for r in rows:
for k in ('created_at', 'expires_at'):
if r.get(k) and hasattr(r[k], 'isoformat'):
r[k] = r[k].isoformat()
r[k] = r[k].isoformat() + 'Z'
return rows
@@ -267,7 +267,7 @@ def get_suppression_history(limit: int = 50) -> list:
for r in rows:
for k in ('created_at', 'expires_at'):
if r.get(k) and hasattr(r[k], 'isoformat'):
r[k] = r[k].isoformat()
r[k] = r[k].isoformat() + 'Z'
return rows
@@ -281,7 +281,7 @@ def create_suppression(
) -> int:
expires_at = None
if expires_minutes:
expires_at = datetime.utcnow() + timedelta(minutes=int(expires_minutes))
expires_at = datetime.now(timezone.utc) + timedelta(minutes=int(expires_minutes))
with get_conn() as conn:
with conn.cursor() as cur:
cur.execute(
@@ -365,7 +365,7 @@ def is_suppressed(target_type: str, target_name: str, target_detail: str = '') -
"""SELECT id FROM suppression_rules
WHERE active=TRUE AND (expires_at IS NULL OR expires_at > NOW())
AND target_type=%s AND target_name=%s
AND (target_detail IS NULL OR target_detail='') LIMIT 1""",
AND target_detail='' LIMIT 1""",
(target_type, target_name),
)
if cur.fetchone():
+3 -5
View File
@@ -68,17 +68,17 @@ class DiagnosticsRunner:
f' echo "=== ip_route ===";'
f' ip route show dev {q} 2>/dev/null;'
f' echo "=== dmesg ===";'
f' dmesg 2>/dev/null | grep {q} | tail -50;'
f' dmesg 2>/dev/null | grep -F -- {q} | tail -50;'
f' echo "=== lldpctl ===";'
f' lldpctl 2>/dev/null || echo "lldpd not running";'
f' echo "=== end ==="'
)
return (
f'ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 '
f'ssh -o StrictHostKeyChecking=accept-new -o ConnectTimeout=5 '
f'-o BatchMode=yes -o LogLevel=ERROR '
f'-o ServerAliveInterval=10 -o ServerAliveCountMax=2 '
f'root@{ip_q} \'{remote_cmd}\''
f'root@{ip_q} {shlex.quote(remote_cmd)}'
)
# ------------------------------------------------------------------
@@ -221,8 +221,6 @@ class DiagnosticsRunner:
data['auto_neg'] = (val.lower() == 'on')
elif key == 'Link detected':
data['link_detected'] = (val.lower() == 'yes')
elif 'Supported link modes' in key:
data.setdefault('supported_modes', []).append(val)
return data
@staticmethod
+41 -37
View File
@@ -11,9 +11,8 @@ import json
import logging
import re
import shlex
import subprocess
import time
from datetime import datetime
from datetime import datetime, timezone
from typing import Dict, List, Optional
import requests
@@ -21,7 +20,6 @@ from urllib3.exceptions import InsecureRequestWarning
import db
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
logging.basicConfig(
level=logging.INFO,
@@ -91,7 +89,9 @@ class UnifiClient:
self.base_url = cfg['controller']
self.site_id = cfg.get('site_id', 'default')
self.session = requests.Session()
self.session.verify = False
self.session.verify = cfg.get('verify_ssl', True)
if not self.session.verify:
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
self.headers = {
'X-API-KEY': cfg['api_key'],
'Accept': 'application/json',
@@ -215,7 +215,10 @@ class TicketClient:
resp.raise_for_status()
data = resp.json()
if data.get('success'):
tid = data['ticket_id']
tid = data.get('ticket_id')
if not tid:
logger.warning(f'Ticket API success but no ticket_id in response: {data}')
return None
logger.info(f'Created ticket #{tid}: {title}')
return tid
if data.get('existing_ticket_id'):
@@ -263,7 +266,10 @@ class PulseClient:
timeout=10,
)
resp.raise_for_status()
execution_id = resp.json()['execution_id']
execution_id = resp.json().get('execution_id')
if not execution_id:
logger.error('Pulse submit response missing execution_id')
return None
self.last_execution_id = execution_id
except Exception as e:
logger.error(f'Pulse command submit failed: {e}')
@@ -315,6 +321,14 @@ class PulseClient:
return self.run_command(command, _retry=False)
return None
def ping(self, ip: str, count: int = 3, timeout: int = 2) -> bool:
"""Ping *ip* via the Pulse worker. Returns True if host responds."""
ip_q = shlex.quote(ip)
output = self.run_command(
f'ping -c {count} -W {timeout} {ip_q} >/dev/null 2>&1 && echo REACHABLE || echo UNREACHABLE'
)
return output is not None and output.strip() == 'REACHABLE'
# --------------------------------------------------------------------------
# Link stats collector (ethtool + Prometheus traffic metrics)
@@ -344,8 +358,8 @@ class LinkStatsCollector:
if not ifaces or not self.pulse.url:
return {}
# Validate interface names (kernel names only contain [a-zA-Z0-9_.-])
safe_ifaces = [i for i in ifaces if re.match(r'^[a-zA-Z0-9_.@-]+$', i)]
# Validate interface names (kernel names: [a-zA-Z0-9_.-], max 15 chars per IFNAMSIZ)
safe_ifaces = [i for i in ifaces if re.match(r'^[a-zA-Z0-9_.-]{1,15}$', i)]
if not safe_ifaces:
return {}
@@ -363,10 +377,10 @@ class LinkStatsCollector:
shell_cmd = ' '.join(parts)
ssh_cmd = (
f'ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 '
f'ssh -o StrictHostKeyChecking=accept-new -o ConnectTimeout=5 '
f'-o BatchMode=yes -o LogLevel=ERROR '
f'-o ServerAliveInterval=10 -o ServerAliveCountMax=2 '
f'root@{ip} "{shell_cmd}"'
f'root@{ip} {shlex.quote(shell_cmd)}'
)
output = self.pulse.run_command(ssh_cmd)
if output is None:
@@ -604,7 +618,7 @@ class LinkStatsCollector:
return {
'hosts': result_hosts,
'unifi_switches': unifi_switches,
'updated': datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S UTC'),
'updated': datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M:%S UTC'),
}
def _compute_unifi_rates(self, raw: Dict[str, dict], now: float) -> Dict[str, dict]:
@@ -638,21 +652,8 @@ class LinkStatsCollector:
# --------------------------------------------------------------------------
# Helpers
# --------------------------------------------------------------------------
def ping(ip: str, count: int = 3, timeout: int = 2) -> bool:
try:
r = subprocess.run(
['ping', '-c', str(count), '-W', str(timeout), ip],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
timeout=30,
)
return r.returncode == 0
except Exception:
return False
def _now_utc() -> str:
return datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S UTC')
return datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M:%S UTC')
# --------------------------------------------------------------------------
@@ -671,6 +672,7 @@ class NetworkMonitor:
self.unifi = UnifiClient(self.cfg['unifi'])
self.tickets = TicketClient(self.cfg.get('ticket_api', {}))
self.link_stats = LinkStatsCollector(self.cfg, self.prom, self.unifi)
self.pulse = self.link_stats.pulse # convenience alias
mon = self.cfg.get('monitor', {})
self.poll_interval = mon.get('poll_interval', 120)
@@ -732,7 +734,7 @@ class NetworkMonitor:
f'Interface {iface} on {host} went link-down ({_now_utc()})',
)
if not sup and consec >= self.fail_thresh:
self._ticket_interface(event_id, is_new, host, iface, consec)
self._ticket_interface(event_id, host, iface, consec)
if host_has_regression:
hosts_with_regression.append(host)
@@ -769,7 +771,7 @@ class NetworkMonitor:
db.resolve_event('cluster_network_issue', self.cluster_name, '')
def _ticket_interface(
self, event_id: int, is_new: bool, host: str, iface: str, consec: int
self, event_id: int, host: str, iface: str, consec: int
) -> None:
title = (
f'[{host}][auto][production][issue][network][single-node] '
@@ -787,7 +789,7 @@ class NetworkMonitor:
f'Please inspect the cable/SFP/switch port for {host}/{iface}.'
)
tid = self.tickets.create(title, desc, priority='2')
if tid and is_new:
if tid:
db.set_ticket_id(event_id, tid)
# ------------------------------------------------------------------
@@ -808,11 +810,11 @@ class NetworkMonitor:
f'UniFi {name} ({d.get("ip","")}) offline ({_now_utc()})',
)
if not sup and consec >= self.fail_thresh:
self._ticket_unifi(event_id, is_new, d)
self._ticket_unifi(event_id, d)
else:
db.resolve_event('unifi_device_offline', name, d.get('type', ''))
def _ticket_unifi(self, event_id: int, is_new: bool, device: dict) -> None:
def _ticket_unifi(self, event_id: int, device: dict) -> None:
name = device['name']
title = (
f'[{name}][auto][production][issue][network][single-node] '
@@ -829,7 +831,7 @@ class NetworkMonitor:
f'Please check power and cable connectivity.'
)
tid = self.tickets.create(title, desc, priority='2')
if tid and is_new:
if tid:
db.set_ticket_id(event_id, tid)
# ------------------------------------------------------------------
@@ -838,7 +840,7 @@ class NetworkMonitor:
def _process_ping_hosts(self, suppressions: list) -> None:
for h in self.cfg.get('monitor', {}).get('ping_hosts', []):
name, ip = h['name'], h['ip']
reachable = ping(ip)
reachable = self.pulse.ping(ip)
if not reachable:
sup = db.check_suppressed(suppressions, 'host', name)
@@ -848,12 +850,12 @@ class NetworkMonitor:
f'Host {name} ({ip}) unreachable via ping ({_now_utc()})',
)
if not sup and consec >= self.fail_thresh:
self._ticket_unreachable(event_id, is_new, name, ip, consec)
self._ticket_unreachable(event_id, name, ip, consec)
else:
db.resolve_event('host_unreachable', name, ip)
def _ticket_unreachable(
self, event_id: int, is_new: bool, name: str, ip: str, consec: int
self, event_id: int, name: str, ip: str, consec: int
) -> None:
title = (
f'[{name}][auto][production][issue][network][single-node] '
@@ -871,7 +873,7 @@ class NetworkMonitor:
f'Please check the host power, management interface, and network connectivity.'
)
tid = self.tickets.create(title, desc, priority='2')
if tid and is_new:
if tid:
db.set_ticket_id(event_id, tid)
# ------------------------------------------------------------------
@@ -908,7 +910,7 @@ class NetworkMonitor:
for h in self.cfg.get('monitor', {}).get('ping_hosts', []):
name, ip = h['name'], h['ip']
reachable = ping(ip, count=1, timeout=2)
reachable = self.pulse.ping(ip, count=1, timeout=2)
hosts[name] = {
'ip': ip,
'interfaces': {},
@@ -919,7 +921,7 @@ class NetworkMonitor:
return {
'hosts': hosts,
'unifi': display_unifi,
'updated': datetime.utcnow().isoformat(),
'updated': datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z'),
}
# ------------------------------------------------------------------
@@ -967,6 +969,8 @@ class NetworkMonitor:
except Exception as e:
logger.error(f'Monitor loop error: {e}', exc_info=True)
time.sleep(30)
continue
time.sleep(self.poll_interval)
+20 -5
View File
@@ -220,7 +220,7 @@ function updateEventsTable(events, totalActive) {
? GANDALF_CONFIG.ticket_web_url : 'http://t.lotusguild.org/ticket/';
const ticket = e.ticket_id
? `<a href="${lt.escHtml(ticketBase)}${lt.escHtml(String(e.ticket_id))}" target="_blank"
class="ticket-link">#${e.ticket_id}</a>`
class="ticket-link">#${lt.escHtml(String(e.ticket_id))}</a>`
: '';
const supBadge = e.is_suppressed
? `<span class="lt-badge badge-suppressed" title="Alert suppressed">🔕 sup</span>`
@@ -294,18 +294,33 @@ function updateSuppressForm() {
const type = document.getElementById('sup-type').value;
const nameGrp = document.getElementById('sup-name-group');
const detailGrp = document.getElementById('sup-detail-group');
const nameInput = document.getElementById('sup-name');
const detailInput = document.getElementById('sup-detail');
if (nameGrp) nameGrp.style.display = (type === 'all') ? 'none' : '';
if (detailGrp) detailGrp.style.display = (type === 'interface') ? '' : 'none';
if (nameInput) {
const req = (type !== 'all');
nameInput.required = req;
nameInput.setAttribute('aria-required', String(req));
}
if (detailInput) {
const req = (type === 'interface');
detailInput.required = req;
detailInput.setAttribute('aria-required', String(req));
}
}
function setDuration(mins, el) {
document.getElementById('sup-expires').value = mins || '';
document.querySelectorAll('#suppress-modal .pill').forEach(p => {
function setDuration(mins, el, opts) {
const o = opts || {};
const expiresEl = document.getElementById(o.expiresId || 'sup-expires');
const pillSel = o.pillSel || '#suppress-modal .pill';
const hint = document.getElementById(o.hintId || 'duration-hint');
if (expiresEl) expiresEl.value = mins || '';
document.querySelectorAll(pillSel).forEach(p => {
p.classList.remove('active');
p.setAttribute('aria-pressed', 'false');
});
if (el) { el.classList.add('active'); el.setAttribute('aria-pressed', 'true'); }
const hint = document.getElementById('duration-hint');
if (hint) {
if (mins) {
const h = Math.floor(mins / 60), m = mins % 60;
+1
View File
@@ -217,6 +217,7 @@
.sev-pills { display: flex; gap: 4px; }
.g-page-sub { font-size: .78em; color: var(--text-muted); margin-top: 4px; }
.g-page-sub-aside { font-size: .78em; color: var(--text-muted); margin-left: 8px; }
.g-stale-warn { color: var(--orange); font-weight: 600; }
/* ── Badge severity color variants (used with lt-badge) ───────────── */
.badge-critical { color: var(--red); border-color: var(--red); text-shadow: var(--glow-red); }
+12 -11
View File
@@ -144,9 +144,9 @@
<!-- ⌘K affordance -->
<button type="button"
class="lt-btn lt-btn-ghost lt-btn-sm lt-cmd-hint-btn"
data-action="open-cmdpalette"
title="Command palette (Ctrl+K)"
aria-label="Open command palette"
onclick="if(window.lt&&lt.cmdPalette)lt.cmdPalette.open()">&#x2315;&nbsp;K</button>
aria-label="Open command palette">&#x2315;&nbsp;K</button>
<button type="button" class="lt-theme-btn" id="lt-theme-btn"
aria-label="Toggle theme" title="Toggle light/dark mode">&#x2600;</button>
@@ -227,16 +227,16 @@
<div class="lt-form-group">
<label class="lt-label" for="sup-reason">Reason <span class="required">*</span></label>
<input type="text" class="lt-input" id="sup-reason" name="reason"
placeholder="e.g. Planned switch reboot" required>
placeholder="e.g. Planned switch reboot" required aria-required="true">
</div>
<div class="lt-form-group lt-form-group--last">
<label class="lt-label">Duration</label>
<div class="duration-pills" role="group" aria-label="Select suppression duration">
<button type="button" class="pill" data-duration="30" aria-pressed="false">30 min</button>
<button type="button" class="pill" data-duration="60" aria-pressed="false">1 hr</button>
<button type="button" class="pill" data-duration="240" aria-pressed="false">4 hr</button>
<button type="button" class="pill" data-duration="480" aria-pressed="false">8 hr</button>
<button type="button" class="pill pill-manual active" data-duration="" aria-pressed="true">Manual &#x221E;</button>
<button type="button" class="pill" data-duration="30" aria-pressed="false" aria-label="30 minutes">30 min</button>
<button type="button" class="pill" data-duration="60" aria-pressed="false" aria-label="1 hour">1 hr</button>
<button type="button" class="pill" data-duration="240" aria-pressed="false" aria-label="4 hours">4 hr</button>
<button type="button" class="pill" data-duration="480" aria-pressed="false" aria-label="8 hours">8 hr</button>
<button type="button" class="pill pill-manual active" data-duration="" aria-pressed="true" aria-label="Manual, no expiry">Manual &#x221E;</button>
</div>
<input type="hidden" id="sup-expires" name="expires_minutes" value="">
<div class="lt-field-hint" id="duration-hint">Persists until manually removed.</div>
@@ -313,7 +313,7 @@
<script>
const GANDALF_CONFIG = {
ticket_web_url: "{{ config.get('ticket_api', {}).get('web_url', 'http://t.lotusguild.org/ticket/') }}"
ticket_web_url: {{ config.get('ticket_api', {}).get('web_url', 'http://t.lotusguild.org/ticket/') | tojson }}
};
</script>
<script src="{{ url_for('static', filename='app.js') }}"></script>
@@ -346,8 +346,9 @@
const btn = e.target.closest('[data-action]');
if (!btn) return;
const action = btn.getAttribute('data-action');
if (action === 'show-keyboard-help' && window.lt) lt.modal.open('lt-keys-help');
if (action === 'open-settings' && window.lt) lt.modal.open('lt-settings-modal');
if (action === 'open-cmdpalette' && window.lt && lt.cmdPalette) lt.cmdPalette.open();
if (action === 'show-keyboard-help' && window.lt) lt.modal.open('lt-keys-help');
if (action === 'open-settings' && window.lt) lt.modal.open('lt-settings-modal');
});
lt.keys.on('r', function() { lt.autoRefresh.now(); });
+10 -6
View File
@@ -5,7 +5,7 @@
<!-- ── Status bar ──────────────────────────────────────────────────── -->
<div class="status-bar">
<div class="status-chips">
<div class="status-chips" id="status-chips" aria-live="polite" aria-atomic="true">
{% if not daemon_ok %}
<span class="chip chip-critical">⚠ MONITOR OFFLINE</span>
{% endif %}
@@ -30,7 +30,8 @@
<div class="lt-stats-grid">
<div class="lt-stat-card{% if summary.critical %} lt-stat-card--alert{% endif %}"
id="stat-critical" role="button" tabindex="0"
data-stat-filter="critical" aria-label="{{ summary.critical or 0 }} critical alerts">
data-stat-filter="critical" aria-label="{{ summary.critical or 0 }} critical alerts"
aria-controls="events-table-wrap">
<span class="lt-stat-icon lt-text-red" aria-hidden="true"></span>
<div class="lt-stat-info">
<span class="lt-stat-value lt-text-red" id="stat-critical-val">{{ summary.critical or 0 }}</span>
@@ -39,7 +40,8 @@
</div>
<div class="lt-stat-card"
id="stat-warning" role="button" tabindex="0"
data-stat-filter="warning" aria-label="{{ summary.warning or 0 }} warning alerts">
data-stat-filter="warning" aria-label="{{ summary.warning or 0 }} warning alerts"
aria-controls="events-table-wrap">
<span class="lt-stat-icon lt-text-amber" aria-hidden="true"></span>
<div class="lt-stat-info">
<span class="lt-stat-value lt-text-amber" id="stat-warning-val">{{ summary.warning or 0 }}</span>
@@ -90,7 +92,7 @@
<div id="events-table-wrap">
{% if events %}
{% if total_active is defined and total_active > events|length %}
<div class="pagination-notice">Showing {{ events|length }} of {{ total_active }} active alerts &mdash; <a href="/api/events?limit=1000">view all via API</a></div>
<div class="pagination-notice">Showing {{ events|length }} of {{ total_active }} active alerts — use the search box to filter, or <a href="/api/events?limit=1000" target="_blank" rel="noopener">export all as JSON</a></div>
{% endif %}
<div class="lt-table-wrap">
<table class="lt-table" id="events-table">
@@ -322,6 +324,7 @@
</div>
</div>
<div class="host-grid" id="host-grid">
{%- set has_global_sup = suppressions | selectattr('target_type', 'equalto', 'all') | list | length > 0 -%}
{% for name, host in snapshot.hosts.items() %}
{% set suppressed = suppressions | selectattr('target_name', 'equalto', name) | list %}
<div class="host-card host-card-{{ host.status }}" data-host="{{ name }}">
@@ -329,7 +332,7 @@
<div class="host-name-row">
<span class="host-status-dot dot-{{ host.status }}"></span>
<span class="host-name">{{ name }}</span>
{% if suppressed %}
{% if suppressed or has_global_sup %}
<span class="badge-suppressed" title="Suppressed">🔕</span>
{% endif %}
</div>
@@ -466,7 +469,7 @@
{% block scripts %}
<script>
// Start auto-refresh using saved settings interval (default 30 s)
const _savedInterval = (window.gandalfSettings && window.gandalfSettings.refreshInterval) || 30;
const _savedInterval = window.gandalfSettings?.refreshInterval ?? 30;
if (_savedInterval > 0) lt.autoRefresh.start(refreshAll, _savedInterval * 1000);
// When settings change, restart auto-refresh with new interval
@@ -484,6 +487,7 @@
function setCollapsed(v) {
wrap.classList.toggle('is-collapsed', v);
wrap.setAttribute('aria-hidden', v ? 'true' : 'false');
btn.setAttribute('aria-expanded', v ? 'false' : 'true');
btn.textContent = v ? '▾ Expand' : '▴ Collapse';
try { localStorage.setItem(LS_KEY, v ? '1' : '0'); } catch(_) {}
+42 -17
View File
@@ -14,10 +14,10 @@
</div>
<div class="inspector-layout">
<div class="inspector-main" id="inspector-main">
<div class="inspector-main" id="inspector-main" role="region" aria-label="Switch chassis diagrams">
<div class="link-loading">Loading inspector data</div>
</div>
<div class="inspector-panel" id="inspector-panel">
<div class="inspector-panel" id="inspector-panel" role="complementary" aria-label="Port detail panel">
<div class="inspector-panel-inner" id="inspector-panel-inner"></div>
</div>
</div>
@@ -107,10 +107,8 @@ function portBlockHtml(idx, port, swName, sfpBlock) {
const sfpCls = sfpBlock ? ' sfp-block' : '';
const speedTxt = portSpeedLabel(port);
// LLDP neighbor: first 6 chars of hostname
const lldpName = (port && port.lldp_table && port.lldp_table.length)
? escHtml((port.lldp_table[0].chassis_id_subtype === 'local'
? port.lldp_table[0].chassis_id
: port.lldp_table[0].system_name || port.lldp_table[0].chassis_id || '').slice(0, 6))
const lldpName = (port && port.lldp && (port.lldp.system_name || port.lldp.chassis_id))
? escHtml((port.lldp.system_name || port.lldp.chassis_id || '').slice(0, 6))
: '';
const lldpHtml = lldpName ? `<span class="port-lldp">${lldpName}</span>` : '';
const speedHtml = speedTxt ? `<span class="port-speed">${speedTxt}</span>` : '';
@@ -162,10 +160,8 @@ function renderChassis(swName, sw) {
const state = portBlockState(port);
const title = port ? escHtml(port.name) : `Port ${idx}`;
const speedTxt = portSpeedLabel(port);
const lldpName = (port && port.lldp_table && port.lldp_table.length)
? escHtml((port.lldp_table[0].chassis_id_subtype === 'local'
? port.lldp_table[0].chassis_id
: port.lldp_table[0].system_name || port.lldp_table[0].chassis_id || '').slice(0, 6))
const lldpName = (port && port.lldp && (port.lldp.system_name || port.lldp.chassis_id))
? escHtml((port.lldp.system_name || port.lldp.chassis_id || '').slice(0, 6))
: '';
const speedHtml = speedTxt ? `<span class="port-speed">${speedTxt}</span>` : '';
const lldpHtml = lldpName ? `<span class="port-lldp">${lldpName}</span>` : '';
@@ -222,6 +218,7 @@ let _apiData = null;
function selectPort(el) {
const swName = el.dataset.switch;
const idx = parseInt(el.dataset.portIdx, 10);
if (_diagPollTimer) { clearInterval(_diagPollTimer); _diagPollTimer = null; }
document.querySelectorAll('.switch-port-block.selected')
.forEach(e => e.classList.remove('selected'));
el.classList.add('selected');
@@ -231,6 +228,7 @@ function selectPort(el) {
}
function closePanel() {
if (_diagPollTimer) { clearInterval(_diagPollTimer); _diagPollTimer = null; }
document.getElementById('inspector-panel').classList.remove('open');
document.querySelectorAll('.switch-port-block.selected')
.forEach(el => el.classList.remove('selected'));
@@ -262,7 +260,7 @@ function renderPanel(swName, idx) {
const poeCurStr = (d.poe_power != null && d.poe_power > 0) ? ` / draw <span class="val-amber">${d.poe_power.toFixed(1)}W</span>` : '';
poeHtml = `
<div class="lt-divider"><span class="lt-divider-label">PoE</span></div>
<div class="panel-row"><span class="panel-label">Class</span><span class="panel-val">class ${d.poe_class}${poeMaxStr}</span></div>
<div class="panel-row"><span class="panel-label">Class</span><span class="panel-val">class ${escHtml(String(d.poe_class))}${poeMaxStr}</span></div>
${d.poe_power != null ? `<div class="panel-row"><span class="panel-label">Draw</span><span class="panel-val">${d.poe_power > 0 ? `<span class="val-amber">${d.poe_power.toFixed(1)}W</span>` : '0W'}</span></div>` : ''}
${d.poe_mode ? `<div class="panel-row"><span class="panel-label">Mode</span><span class="panel-val">${escHtml(d.poe_mode)}</span></div>` : ''}`;
}
@@ -431,7 +429,14 @@ function renderInspector(data) {
const updEl = document.getElementById('inspector-updated');
if (updEl && data.updated) {
updEl.textContent = 'Updated: ' + new Date(data.updated + (data.updated.includes('Z') ? '' : 'Z')).toLocaleTimeString();
const updMs = new Date(_toIso(data.updated));
const ageMin = (Date.now() - updMs) / 60000;
const timeStr = updMs.toLocaleTimeString();
if (ageMin > 15) {
updEl.innerHTML = `<span class="g-stale-warn" title="Data is ${Math.floor(ageMin)} minutes old — monitor may be down">⚠ Stale: ${timeStr}</span>`;
} else {
updEl.textContent = 'Updated: ' + timeStr;
}
}
if (!Object.keys(switches).length) {
@@ -468,7 +473,7 @@ async function loadInspector() {
}
loadInspector();
var _inspInterval = (window.gandalfSettings && window.gandalfSettings.refreshInterval) || 60;
const _inspInterval = window.gandalfSettings?.refreshInterval ?? 60;
if (_inspInterval > 0) lt.autoRefresh.start(loadInspector, Math.max(_inspInterval, 15) * 1000);
window.onGandalfSettingsChanged = function(s) {
@@ -490,7 +495,13 @@ document.addEventListener('click', e => {
if (diagBtn) { runDiagnostic(diagBtn.dataset.sw, parseInt(diagBtn.dataset.idx, 10)); return; }
const toggleDiag = e.target.closest('[data-action="toggle-diag"]');
if (toggleDiag) { toggleDiag.parentElement.classList.toggle('diag-open'); return; }
if (toggleDiag) {
const section = toggleDiag.parentElement;
const nowOpen = section.classList.toggle('diag-open');
const hint = toggleDiag.querySelector('.diag-toggle-hint');
if (hint) hint.textContent = nowOpen ? '[collapse]' : '[expand]';
return;
}
});
// ── Link Diagnostics ─────────────────────────────────────────────────
@@ -513,7 +524,10 @@ function runDiagnostic(swName, portIdx) {
pollDiagnostic(resp.job_id, statusEl, resultsEl);
})
.catch(e => {
statusEl.textContent = 'Error: ' + (e.message || 'Request failed');
const msg = (e && e.status === 429)
? 'Rate limit reached — max 5 diagnostics per minute. Please wait.'
: 'Error: ' + (e && e.message || 'Request failed');
statusEl.textContent = msg;
});
}
@@ -523,7 +537,13 @@ function pollDiagnostic(jobId, statusEl, resultsEl) {
attempts++;
if (attempts > 120) { // 2min timeout
clearInterval(_diagPollTimer);
statusEl.textContent = 'Timed out waiting for results.';
_diagPollTimer = null;
statusEl.innerHTML = 'Timed out waiting for results. '
+ '<button class="lt-btn lt-btn-ghost lt-btn-sm" id="diag-retry-btn">Retry</button>';
document.getElementById('diag-retry-btn')?.addEventListener('click', () => {
const sel = document.querySelector('.switch-port-block.selected');
if (sel) runDiagnostic(sel.dataset.switch, parseInt(sel.dataset.portIdx));
});
return;
}
lt.api.get(`/api/diagnose/${jobId}`)
@@ -538,7 +558,12 @@ function pollDiagnostic(jobId, statusEl, resultsEl) {
.catch(() => {
clearInterval(_diagPollTimer);
_diagPollTimer = null;
statusEl.textContent = 'Error: lost connection while collecting diagnostics.';
statusEl.innerHTML = 'Error: lost connection while collecting diagnostics. '
+ '<button class="lt-btn lt-btn-ghost lt-btn-sm" id="diag-retry-btn">Retry</button>';
document.getElementById('diag-retry-btn')?.addEventListener('click', () => {
const sel = document.querySelector('.switch-port-block.selected');
if (sel) runDiagnostic(sel.dataset.switch, parseInt(sel.dataset.portIdx));
});
});
}, 2000);
}
+41 -23
View File
@@ -36,7 +36,6 @@
{% block scripts %}
<script>
const escHtml = s => lt.escHtml(s);
const _toIso = s => s ? s.replace(' UTC', 'Z').replace(' ', 'T') : s;
// ── Formatting helpers ────────────────────────────────────────────
function fmtRate(bytesPerSec) {
@@ -349,7 +348,7 @@ function renderUnifiSwitches(unifiSwitches, dataUpdated) {
return `
<div class="link-host-panel" id="panel-${CSS.escape(swName)}">
<div class="link-host-title" data-action="toggle-panel">
<div class="link-host-title" data-action="toggle-panel" role="button" tabindex="0" aria-expanded="true">
<span class="link-host-name">${escHtml(swName)}</span>
<span class="link-host-ip">${escHtml(sw.ip || '')}</span>
<span class="link-host-upd">${escHtml(sw.model || '')}${updStr ? ' · ' + updStr : ''}${poeLoad}</span>
@@ -366,25 +365,32 @@ function renderUnifiSwitches(unifiSwitches, dataUpdated) {
// ── Panel collapse / expand ───────────────────────────────────────
function togglePanel(panel) {
panel.classList.toggle('collapsed');
const btn = panel.querySelector('.panel-toggle');
if (btn) btn.textContent = panel.classList.contains('collapsed') ? '[+]' : '[]';
const isCollapsed = panel.classList.contains('collapsed');
const btn = panel.querySelector('.panel-toggle');
const title = panel.querySelector('.link-host-title');
if (btn) btn.textContent = isCollapsed ? '[+]' : '[]';
if (title) title.setAttribute('aria-expanded', isCollapsed ? 'false' : 'true');
const id = panel.id;
if (id) {
const collapsed = JSON.parse(sessionStorage.getItem('linksCollapsed') || '{}');
let collapsed = {};
try { collapsed = JSON.parse(sessionStorage.getItem('linksCollapsed') || '{}'); } catch(_) {}
collapsed[id] = panel.classList.contains('collapsed');
sessionStorage.setItem('linksCollapsed', JSON.stringify(collapsed));
try { sessionStorage.setItem('linksCollapsed', JSON.stringify(collapsed)); } catch(_) {}
}
}
function restoreCollapseState() {
const collapsed = JSON.parse(sessionStorage.getItem('linksCollapsed') || '{}');
let collapsed = {};
try { collapsed = JSON.parse(sessionStorage.getItem('linksCollapsed') || '{}'); } catch(_) {}
for (const [id, isCollapsed] of Object.entries(collapsed)) {
const panel = document.getElementById(id);
if (!panel) continue;
if (isCollapsed) {
panel.classList.add('collapsed');
const btn = panel.querySelector('.panel-toggle');
if (btn) btn.textContent = '[+]';
const btn = panel.querySelector('.panel-toggle');
const title = panel.querySelector('.link-host-title');
if (btn) btn.textContent = '[+]';
if (title) title.setAttribute('aria-expanded', 'false');
}
}
}
@@ -463,12 +469,12 @@ function renderLinks(data) {
const sample = Object.values(ifaces)[0] || {};
const ip = sample.host_ip || '';
const updStr = data.updated
? new Date(data.updated.replace(' UTC', 'Z').replace(' ', 'T')).toLocaleTimeString()
? new Date(_toIso(data.updated)).toLocaleTimeString()
: '';
parts.push(`
<div class="link-host-panel" id="panel-${CSS.escape(hostname)}">
<div class="link-host-title" data-action="toggle-panel">
<div class="link-host-title" data-action="toggle-panel" role="button" tabindex="0" aria-expanded="true">
<span class="link-host-name">${escHtml(hostname)}</span>
<span class="link-host-ip">${escHtml(ip)}</span>
<span class="link-host-upd">${updStr}</span>
@@ -498,27 +504,33 @@ function applyLinksSearch() {
function collapseAll() {
document.querySelectorAll('.link-host-panel').forEach(p => {
p.classList.add('collapsed');
const btn = p.querySelector('.panel-toggle');
if (btn) btn.textContent = '[+]';
const btn = p.querySelector('.panel-toggle');
const title = p.querySelector('.link-host-title');
if (btn) btn.textContent = '[+]';
if (title) title.setAttribute('aria-expanded', 'false');
});
sessionStorage.setItem('linksCollapsed', JSON.stringify(
Object.fromEntries([...document.querySelectorAll('.link-host-panel')].map(p => [p.id, true]))
));
try {
sessionStorage.setItem('linksCollapsed', JSON.stringify(
Object.fromEntries([...document.querySelectorAll('.link-host-panel')].map(p => [p.id, true]))
));
} catch(_) {}
}
function expandAll() {
document.querySelectorAll('.link-host-panel').forEach(p => {
p.classList.remove('collapsed');
const btn = p.querySelector('.panel-toggle');
if (btn) btn.textContent = '[]';
const btn = p.querySelector('.panel-toggle');
const title = p.querySelector('.link-host-title');
if (btn) btn.textContent = '[]';
if (title) title.setAttribute('aria-expanded', 'true');
});
sessionStorage.setItem('linksCollapsed', '{}');
try { sessionStorage.setItem('linksCollapsed', '{}'); } catch(_) {}
}
// ── Stale data warning ────────────────────────────────────────────
function checkLinksStale(updatedStr) {
if (!updatedStr) return;
const age = (Date.now() - new Date(updatedStr + (updatedStr.includes('Z') ? '' : 'Z'))) / 1000;
const age = (Date.now() - new Date(_toIso(updatedStr))) / 1000;
let banner = document.getElementById('links-stale-banner');
if (age > 120) {
if (!banner) {
@@ -540,14 +552,14 @@ function checkLinksStale(updatedStr) {
async function loadLinks() {
try {
const data = await lt.api.get('/api/links');
if (!data.hosts && !data.unifi_switches) {
if ((!data.hosts || !Object.keys(data.hosts).length) && (!data.unifi_switches || !Object.keys(data.unifi_switches).length)) {
document.getElementById('links-container').innerHTML =
'<div class="link-no-data">No link data yet — monitor has not completed a full cycle.</div>';
return;
}
const updEl = document.getElementById('links-updated');
if (updEl && data.updated) {
updEl.textContent = 'Updated: ' + new Date(data.updated + (data.updated.includes('Z') ? '' : 'Z')).toLocaleTimeString();
updEl.textContent = 'Updated: ' + new Date(_toIso(data.updated)).toLocaleTimeString();
}
renderLinks(data);
checkLinksStale(data.updated);
@@ -559,7 +571,7 @@ async function loadLinks() {
}
loadLinks();
const _linksInterval = (window.gandalfSettings && window.gandalfSettings.refreshInterval) || 60;
const _linksInterval = window.gandalfSettings?.refreshInterval ?? 60;
if (_linksInterval > 0) lt.autoRefresh.start(loadLinks, Math.max(_linksInterval, 15) * 1000);
window.onGandalfSettingsChanged = function(s) {
@@ -575,6 +587,12 @@ document.addEventListener('click', e => {
if (e.target.closest('[data-action="expand-all"]')) { expandAll(); return; }
});
document.addEventListener('keydown', e => {
if (e.key !== 'Enter' && e.key !== ' ') return;
const toggleTitle = e.target.closest('[data-action="toggle-panel"]');
if (toggleTitle) { e.preventDefault(); togglePanel(toggleTitle.closest('.link-host-panel')); }
});
document.getElementById('links-search')?.addEventListener('input', applyLinksSearch);
</script>
{% endblock %}
+15 -24
View File
@@ -32,7 +32,7 @@
<label class="lt-label" for="s-name">Target Name <span class="required">*</span></label>
<input type="text" class="lt-input" id="s-name" name="target_name"
placeholder="hostname or device name" autocomplete="off"
list="target-name-list">
required aria-required="true" list="target-name-list">
<datalist id="target-name-list">
{% for name in snapshot.hosts.keys() | sort %}
<option value="{{ name }}">
@@ -51,7 +51,7 @@
<label class="lt-label" for="s-reason">Reason <span class="required">*</span></label>
<input type="text" class="lt-input" id="s-reason" name="reason"
placeholder="e.g. Planned switch maintenance, replacing SFP on large1/enp43s0"
required>
required aria-required="true">
</div>
</div>
@@ -59,11 +59,11 @@
<div class="lt-form-group">
<label class="lt-label">Duration</label>
<div class="duration-pills" role="group" aria-label="Select suppression duration">
<button type="button" class="pill" data-duration="30" aria-pressed="false">30 min</button>
<button type="button" class="pill" data-duration="60" aria-pressed="false">1 hr</button>
<button type="button" class="pill" data-duration="240" aria-pressed="false">4 hr</button>
<button type="button" class="pill" data-duration="480" aria-pressed="false">8 hr</button>
<button type="button" class="pill pill-manual active" data-duration="" aria-pressed="true">Manual ∞</button>
<button type="button" class="pill" data-duration="30" aria-pressed="false" aria-label="30 minutes">30 min</button>
<button type="button" class="pill" data-duration="60" aria-pressed="false" aria-label="1 hour">1 hr</button>
<button type="button" class="pill" data-duration="240" aria-pressed="false" aria-label="4 hours">4 hr</button>
<button type="button" class="pill" data-duration="480" aria-pressed="false" aria-label="8 hours">8 hr</button>
<button type="button" class="pill pill-manual active" data-duration="" aria-pressed="true" aria-label="Manual, no expiry">Manual ∞</button>
</div>
<input type="hidden" id="s-expires" name="expires_minutes" value="">
<div class="lt-field-hint" id="s-dur-hint">Persists until manually removed.</div>
@@ -217,23 +217,16 @@
const t = document.getElementById('s-type').value;
document.getElementById('name-group').style.display = (t==='all') ? 'none' : '';
document.getElementById('detail-group').style.display = (t==='interface') ? '' : 'none';
document.getElementById('s-name').required = (t!=='all');
const nameInput = document.getElementById('s-name');
if (nameInput) {
const req = (t !== 'all');
nameInput.required = req;
nameInput.setAttribute('aria-required', String(req));
}
}
function setDur(mins, el) {
document.getElementById('s-expires').value = mins || '';
document.querySelectorAll('.duration-pills .pill').forEach(p => {
p.classList.remove('active');
p.setAttribute('aria-pressed', 'false');
});
if (el) { el.classList.add('active'); el.setAttribute('aria-pressed', 'true'); }
const hint = document.getElementById('s-dur-hint');
if (mins) {
const h = Math.floor(mins/60), m = mins%60;
hint.textContent = `Expires in ${h?h+'h ':''}${m?m+'m':''}`.trim()+'.';
} else {
hint.textContent = 'Persists until manually removed.';
}
setDuration(mins, el, { expiresId: 's-expires', pillSel: '#create-suppression-form .pill', hintId: 's-dur-hint' });
}
function renderActiveRows(rows) {
@@ -302,9 +295,7 @@
showToast('Suppression applied', 'success');
form.reset();
onTypeChange();
document.querySelectorAll('.duration-pills .pill').forEach(p => p.classList.remove('active'));
document.querySelector('.duration-pills .pill-manual')?.classList.add('active');
document.getElementById('s-dur-hint').textContent = 'Persists until manually removed.';
setDur(null, document.querySelector('#create-suppression-form .pill-manual'));
await refreshActive();
} catch (err) {
showToast(err.message || 'Error', 'error');
+8 -2
View File
@@ -9,9 +9,9 @@ from diagnose import DiagnosticsRunner # noqa: E402
# ── build_ssh_command ────────────────────────────────────────────────────────
class TestBuildSshCommand:
def test_contains_stricthostkeychecking_no(self):
def test_contains_stricthostkeychecking_accept_new(self):
cmd = DiagnosticsRunner.build_ssh_command('10.0.0.1', 'eth0')
assert 'StrictHostKeyChecking=no' in cmd
assert 'StrictHostKeyChecking=accept-new' in cmd
def test_contains_host_ip(self):
cmd = DiagnosticsRunner.build_ssh_command('10.0.0.1', 'eth0')
@@ -36,6 +36,12 @@ class TestBuildSshCommand:
cmd = DiagnosticsRunner.build_ssh_command('10.0.0.1', 'eth0')
assert 'ethtool' in cmd
def test_dmesg_uses_fixed_string_grep(self):
# grep -F prevents iface names with dots (e.g. eth0.1) being treated as
# regex wildcards; -- prevents leading - from being parsed as a flag
cmd = DiagnosticsRunner.build_ssh_command('10.0.0.1', 'eth0')
assert 'grep -F --' in cmd
# ── parse_output ─────────────────────────────────────────────────────────────