upsert_event now returns ticket_id (4th element) so callers can skip
ticket creation when one already exists. This prevents calling the ticket
API every poll cycle for ongoing issues while still retrying if the
previous creation attempt failed (ticket_id stays NULL until success).
Cluster events use (is_new or not ticket_id) so they too get retried
on failure rather than relying solely on is_new.
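A minimal sketch of the two guards described above. The helper names are hypothetical, and ticket_id is assumed to be None (SQL NULL) until a ticket has been created successfully:

```python
from typing import Optional


def needs_ticket(ticket_id: Optional[str]) -> bool:
    """Regular events: create a ticket only while none is recorded."""
    # ticket_id stays None until a creation attempt succeeds, so a failed
    # attempt is retried on the next poll cycle.
    return not ticket_id


def cluster_needs_ticket(is_new: bool, ticket_id: Optional[str]) -> bool:
    """Cluster events: the (is_new or not ticket_id) condition."""
    return is_new or not ticket_id
```

With this shape an ongoing event that already has a ticket never reaches the ticket API again.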
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The comment claimed the function "runs daily event purge" — that
housekeeping is done by monitor.py's main loop, not here.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_collect_snapshot called pulse.ping(count=1) independently from
_process_ping_hosts which called pulse.ping(count=3). This doubled
network load and could show a host as 'up' in the dashboard while
simultaneously firing an 'unreachable' alert, or vice versa.
Now ping_states is computed once in run() using the alert-quality
parameters (count=3) and shared by both snapshot and alert processing.
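Rough sketch of the "ping once, share the result" shape. The function names and the ping callable are illustrative stand-ins, not the real pulse.ping() signature:

```python
from typing import Callable, Dict, Iterable, List


def compute_ping_states(
    hosts: Iterable[str],
    ping: Callable[..., bool],
    count: int = 3,
) -> Dict[str, bool]:
    """Ping each host exactly once using the alert-quality count."""
    return {host: ping(host, count=count) for host in hosts}


def build_snapshot(ping_states: Dict[str, bool]) -> Dict[str, str]:
    """Dashboard view derived from the shared results."""
    return {host: ("up" if ok else "down") for host, ok in ping_states.items()}


def unreachable_hosts(ping_states: Dict[str, bool]) -> List[str]:
    """Alert view derived from the same results, so both always agree."""
    return sorted(host for host, ok in ping_states.items() if not ok)
```

Because both consumers read one dict, the dashboard and the alert path cannot disagree within a poll cycle.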
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
datetime.utcnow() is deprecated as of Python 3.12 and scheduled for
removal in a future release. Replace all four call sites with
timezone-aware equivalents so the codebase runs without deprecation
warnings on Python 3.12+.
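For illustration, the timezone-aware replacement generally looks like the sketch below. The helper names are hypothetical, and the naive variant is only an assumption about call sites that must keep storing offset-free UTC values:

```python
from datetime import datetime, timezone


def now_utc() -> datetime:
    """Timezone-aware replacement for datetime.utcnow()."""
    return datetime.now(timezone.utc)


def now_utc_naive() -> datetime:
    """Same instant with tzinfo dropped, for code that expects naive UTC."""
    return datetime.now(timezone.utc).replace(tzinfo=None)
```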
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without a config file, ESLint was running with no-undef disabled, meaning
undefined variable references in static/app.js were silently ignored.
Add .eslintrc.json with no-undef: error and eqeqeq: error so CI actually
catches JS bugs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Using || 30 / || 60 as a fallback treats refreshInterval=0 (Off) as
falsy and replaces it with the default, causing auto-refresh to start
even when the user saved 'Off'. Replace with nullish coalescing (??)
so only null/undefined triggers the default.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On exception the monitor slept 30s inside the except block, then fell
through to time.sleep(poll_interval), so recovery took 150s (30s plus
the poll interval) instead of 30s. Adding continue after the error
sleep fixes this.
Also adds a regression test asserting dmesg filtering uses grep -F --
so a future refactor cannot silently reintroduce the regex wildcard bug.
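Sketch of the loop shape after the fix. run_loop, max_cycles and the injectable sleep are hypothetical and exist only to make the example finite and testable. The 120s interval in the usage note is inferred from the 150s figure above, not read from config:

```python
import time
from typing import Callable


def run_loop(
    poll_once: Callable[[], None],
    poll_interval: float,
    error_sleep: float = 30.0,
    max_cycles: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll repeatedly; after a failure wait only error_sleep seconds."""
    for _ in range(max_cycles):
        try:
            poll_once()
        except Exception:
            sleep(error_sleep)
            # Without this continue, execution falls through to the normal
            # sleep below and the recovery gap becomes
            # error_sleep + poll_interval.
            continue
        sleep(poll_interval)
```

With poll_interval=120 and a failure on the first cycle, the sleeps are 30, 120, 120 rather than 30, 120, 120, 120.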
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
monitor.py checks both 'interface' and 'host' suppressions for interface_down
events, but _annotate_suppressions only checked 'interface'. A host-level
suppression would silently suppress tickets but not mark the table row as
suppressed in the UI.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Global suppressions (target_type='all') have an empty target_name, so
the selectattr filter never matched them, leaving no visual indicator
when a global maintenance window was active. Pre-compute has_global_sup
before the host loop and OR it into the badge condition.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
grep {iface} treats dots and other special chars as regex metacharacters.
Switch to grep -F -- {iface} for fixed-string matching and to prevent
a leading dash in the interface name from being parsed as a grep flag.
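A hedged sketch of how such a command string can be built. dmesg_grep_command is a hypothetical helper, and applying shlex.quote() to the interface name is an assumption about the surrounding code:

```python
import shlex


def dmesg_grep_command(iface: str) -> str:
    """Build a dmesg filter that matches iface as a literal string."""
    # -F : fixed-string match, so '.' in 'eth0.100' is not a regex wildcard
    # -- : end of options, so a name starting with '-' is not read as a flag
    return f"dmesg | grep -F -- {shlex.quote(iface)}"
```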
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After fixing the is_new guard bug, is_new is no longer used inside
_ticket_interface, _ticket_unifi, or _ticket_unreachable. Drop it from
their signatures and call sites.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
monitor.py: _ticket_interface/_ticket_unifi/_ticket_unreachable all used
`if tid and is_new` to guard db.set_ticket_id(). Since is_new is True only
on the first upsert (consec=1) but tickets are created at consec>=fail_thresh
(default 2), is_new is always False when the ticket is created, so the
ticket link never appeared in the UI. Changed to `if tid:`.
links.html: JSON.parse(sessionStorage.getItem(...)) in togglePanel and
restoreCollapseState had no try-catch. Corrupt/stale session storage would
throw an uncaught SyntaxError. Also wrapped all sessionStorage.setItem
calls in try-catch to defend against storage-full / private-browsing errors.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously switching ports while a diagnostic was running left the
setInterval timer active, causing the result to be written into the
old (now detached) DOM elements and never shown to the user.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The check `!data.hosts && !data.unifi_switches` never caught empty
objects `{}`, which are truthy. Replace with Object.keys length checks
so the friendly "no data yet" banner renders when both collections
are empty.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Matches the pattern already used in monitor.py's _ssh_batch(); prevents
quoting breakage if shlex.quote(iface) emits single-quoted tokens inside
the remote command string.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ticket_id was already escaped in the href attribute but the visible
text (#<id>) used the raw value in an innerHTML template literal.
Apply lt.escHtml() for defense-in-depth against a compromised ticket API.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The reason input had `required` for browser validation but was missing
`aria-required="true"`, so screen readers did not announce it as required.
Matches the fix already applied to the equivalent field in base.html.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
monitor.py _ssh_batch(): the remote command was wrapped in double-quotes
(f'root@{ip} "{shell_cmd}"') but shell_cmd itself contains double-quoted
echo sentinels ("___IFACE:eth0___"). When Pulse's shell parses the full
ssh invocation, the nested double-quotes cause mis-parsing — the remote
command is split incorrectly, silently breaking all ethtool/SFP DOM
collection. Fix: use shlex.quote(shell_cmd) so the entire remote command
is single-quoted, leaving inner double-quotes untouched.
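Illustrative sketch of the quoting change. build_ssh_target is a hypothetical helper, not the actual _ssh_batch() code:

```python
import shlex


def build_ssh_target(ip: str, shell_cmd: str) -> str:
    """Return the 'user@host <remote command>' tail of an ssh invocation."""
    # shlex.quote() wraps the whole remote command in single quotes, so the
    # double-quoted echo sentinels inside it reach the remote shell intact.
    return f"root@{ip} {shlex.quote(shell_cmd)}"
```

Parsing the result with a POSIX shell tokenizer such as shlex.split() yields exactly two tokens: the target and the unmodified remote command.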
TicketClient.create(): data['ticket_id'] raises KeyError if the Tinker
Tickets API returns success=true without a ticket_id field (malformed
response). Use data.get('ticket_id') with an explicit warning log.
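Sketch of the defensive lookup. extract_ticket_id and the logger name are hypothetical, and the response is assumed to be an already-decoded JSON dict:

```python
import logging
from typing import Any, Mapping, Optional

logger = logging.getLogger("ticket_client")


def extract_ticket_id(data: Mapping[str, Any]) -> Optional[Any]:
    """Return the ticket id from an API response, or None if unavailable."""
    if not data.get("success"):
        return None
    ticket_id = data.get("ticket_id")  # .get() avoids the KeyError
    if ticket_id is None:
        logger.warning("ticket API reported success without ticket_id: %r", data)
    return ticket_id
```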
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
db.py returned all datetime columns (first_seen, last_seen, resolved_at,
created_at, expires_at) as bare ISO strings like "2026-03-14T14:14:21"
with no timezone marker. Per the ECMAScript spec, new Date() parses a
date-time string that lacks a UTC offset as LOCAL time, not UTC.
This made lt.time.ago() and stale-detection wrong for any user whose
browser is not in UTC — event ages and stale warnings would be off by
the client's UTC offset.
monitor.py had the same issue on the network_snapshot 'updated' field.
Fix: append 'Z' to all isoformat() calls (UTC datetimes confirmed by
MySQL server timezone and _now_utc() pattern used throughout codebase).
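A small sketch of the serialisation rule. to_iso_z is a hypothetical helper, and naive datetimes are assumed to already be in UTC, as stated above:

```python
from datetime import datetime, timezone
from typing import Optional


def to_iso_z(dt: Optional[datetime]) -> Optional[str]:
    """Serialise a UTC datetime as ISO 8601 with an explicit 'Z' suffix."""
    if dt is None:
        return None
    if dt.tzinfo is not None:
        # Normalise aware values to naive UTC so the result never ends up
        # as "+00:00Z".
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt.isoformat() + "Z"
```

For the timestamp quoted above this produces "2026-03-14T14:14:21Z", which new Date() parses as UTC in every browser timezone.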
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Same bug as was just fixed in links.html: data.updated is stored as
"YYYY-MM-DD HH:MM:SS UTC" by monitor.py, so appending 'Z' produced
"…UTCZ" — an invalid date. The stale-data warning and Updated timestamp
in Inspector were silently showing "Invalid Date" and the stale overlay
never fired. Fixed to use _toIso() (already global via app.js).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Security: add require_admin decorator; apply to POST/DELETE /api/suppressions
and /suppressions page. Previously any user in allowed_groups could create or
delete suppressions even though the nav restricts the UI to admins.
Bug: links.html "Updated:" timestamp and stale-warning both produced
Invalid Date because the raw "YYYY-MM-DD HH:MM:SS UTC" string was appended
with 'Z' instead of being normalised through _toIso(). Fix both call sites to
use _toIso(), and remove the now-redundant local _toIso redefinition.
Style: use `with open(sentinel, 'w'): pass` consistently (was open().close()
at avatar JPEG validation path).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- updateSuppressForm() now sets required + aria-required on sup-name/sup-detail
when target type changes; sup-reason gets static aria-required="true"
- onTypeChange() in suppressions page syncs aria-required on s-name
- s-name in suppressions.html gets initial required/aria-required (default type=host)
- Duration pills in both modal and suppressions page now have descriptive
aria-label ("30 minutes", "1 hour", etc.) alongside the group aria-label
- setDuration() in app.js accepts optional {expiresId,pillSel,hintId} opts so
logic lives in one place; suppressions.html setDur() delegates to it
- Post-create form reset uses setDur() instead of manually patching DOM
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- inspector.html: show orange '⚠ Stale: HH:MM' with tooltip when link_stats data is >15 min old (previously just showed the time with no visual warning)
- style.css: add .g-stale-warn helper class (orange, bold) for the stale indicator
- diagnose.py: remove supported_modes accumulation from parse_ethtool() — field was collected but never consumed by analyze() or displayed anywhere
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- inspector.html: collapsible section hint text now toggles between [expand]/[collapse] when clicked
- inspector.html: timeout and connection-loss during diagnostic poll now show a Retry button instead of a dead end
- inspector.html: 429 rate-limit response shows a clear human-readable message instead of generic error
- app.py: empty link_stats fallback now includes unifi_switches:{} for schema consistency with real data shape
- index.html: pagination overflow notice now says "export all as JSON" (opens in new tab) instead of misleadingly linking to raw API as navigation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- inspector.html: fix LLDP neighbor label in port blocks — port.lldp_table never exists; data is at port.lldp (dict with system_name/chassis_id); both port block renderers corrected
- db.py: remove dead 'target_detail IS NULL' branch in suppression check — target_detail is always stored as '' not NULL; query simplified to target_detail=''
- app.py: resolve cache_dir/cache_file/sentinel to absolute paths; guard against path escape before use
- app.py: wrap sentinel os.path.getmtime() in try/except OSError to handle TOCTOU deletion race
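The two app.py path hardening items above could look roughly like this. Both helper names are hypothetical, and the containment check is one common approach rather than necessarily the one used:

```python
import os
from typing import Optional


def is_within(base_dir: str, candidate: str) -> bool:
    """True if candidate resolves to a path inside base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(candidate)
    return os.path.commonpath([base, target]) == base


def safe_mtime(path: str) -> Optional[float]:
    """mtime of path, or None if it vanished between check and use."""
    try:
        return os.path.getmtime(path)
    except OSError:
        return None
```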
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- app.py: split 'with open(sentinel): pass' onto two lines (flake8 E701)
- tests/test_diagnose.py: rename test and assert StrictHostKeyChecking=accept-new (not =no which was fixed earlier)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- app.py: wrap int(cache_ttl) in try/except so a misconfigured non-integer value falls back to 3600 instead of raising ValueError
- base.html: use Jinja2 tojson filter for ticket_web_url to ensure proper JS string escaping regardless of URL contents
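Sketch of the cache_ttl fallback from the first item. parse_cache_ttl is a hypothetical name, and catching TypeError (for a missing/None value) is an addition beyond the ValueError case described:

```python
from typing import Any


def parse_cache_ttl(raw: Any, default: int = 3600) -> int:
    """Parse a configured TTL, falling back to default on bad input."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        return default
```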
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- app.py: move conn.unbind() into finally block in api_avatar() so connection is always closed even if conn.search() throws
- app.py: remove elapsed-time strings from /health response (unauthenticated endpoint no longer leaks monitor timing)
- app.py: add after_request hook setting X-Content-Type-Options, X-Frame-Options, Referrer-Policy on all responses
- app.py: add 10 MB size guard on link_stats before JSON parse; log actual exception on parse failure
- app.py: wrap suppressions_page network_snapshot parse in try/except (same protection as index page)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- app.py: fail loudly if LDAP bind_pw is not configured rather than attempting anonymous bind
- app.py: validate expires_minutes is 1–43200 (max 30 days) before storing suppression
- app.py: wrap network_snapshot JSON parse in try/except so a corrupt DB value returns degraded page instead of 500
- app.py: prune _diag_rate entries inactive for >1h to prevent unbounded growth
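A minimal sketch of the expires_minutes range check. The function name, the constant and the rejection of non-integer input are assumptions:

```python
MAX_EXPIRES_MINUTES = 30 * 24 * 60  # 30 days = 43200 minutes


def validate_expires_minutes(value: object) -> int:
    """Return value if it is an int in 1..43200, else raise ValueError."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("expires_minutes must be an integer")
    if not 1 <= value <= MAX_EXPIRES_MINUTES:
        raise ValueError(
            f"expires_minutes must be between 1 and {MAX_EXPIRES_MINUTES}"
        )
    return value
```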
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- app.py: validate server_name from LLDP with fullmatch before use in logs/lookups (prevents log injection)
- app.py: validate each mgmt_ip candidate before assigning host_ip (avoids assigning non-IP string that then fails later check)
- app.py: log actual exception in link_stats JSON parse error
- inspector.html: clear _diagPollTimer in closePanel() so timer doesn't orphan when panel is closed mid-poll
- monitor.py: sleep 30s after a monitor loop exception before resuming normal poll interval
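Sketch of the server_name check from the first item. The pattern is an assumed hostname whitelist and clean_server_name is a hypothetical helper. The point is that fullmatch() must cover the entire string, so an embedded or trailing newline cannot slip through:

```python
import re
from typing import Optional

# Assumed whitelist: letters, digits, dot, underscore, hyphen.
_SERVER_NAME_RE = re.compile(r"[A-Za-z0-9._-]{1,253}")


def clean_server_name(raw: str) -> Optional[str]:
    """Return raw if it is a plausible hostname, else None."""
    return raw if _SERVER_NAME_RE.fullmatch(raw) else None
```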
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- app.py: replace raw str(e) in diagnostic _run() with generic client message; log internally only
- app.py: /health endpoint no longer leaks exception strings to unauthenticated callers; errors logged server-side
- monitor.py: UniFi SSL verification now defaults True, configurable via config.json unifi.verify_ssl; urllib3 warning suppression scoped to verify=False only (removed global disable)
- monitor.py: Pulse execution_id extracted with .get() + explicit None check to avoid KeyError on malformed response
- monitor.py: interface name regex drops '@' (iproute2 prints it only as the name@parent separator, so it is not expected inside an interface name) to match app.py and fix inconsistency
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Architecture:
- Remove direct subprocess ping from Gandalf; add PulseClient.ping()
which runs the ping via the Pulse worker instead
- Remove standalone ping() function and subprocess import from monitor.py
- Add self.pulse alias to NetworkMonitor for convenience
- Both _process_ping_hosts() and snapshot builder now use self.pulse.ping()
Security:
- Change StrictHostKeyChecking=no → accept-new in both SSH command
builders (monitor.py _ssh_batch, diagnose.py build_ssh_command).
The Pulse worker's known_hosts is now authoritative; host keys are
recorded on first connection and verified on all subsequent ones.
MITM attacks after initial key exchange are now detectable.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
security:
- Fix bare open(sentinel, 'w').close() file descriptor leak; use
context manager instead
- Store requesting username in _diag_jobs at creation time; return 403
from api_diagnose_poll if the polling user does not match the job owner
accessibility:
- Add aria-live="polite" aria-atomic="true" to .status-chips container
so screen readers announce critical/warning count changes on refresh
- Add aria-controls="events-table-wrap" to critical and warning stat
cards so assistive tech knows these buttons control the events table
- Add aria-hidden sync to topology setCollapsed() — hidden topology
content is now removed from the accessibility tree when collapsed,
preventing keyboard focus from entering invisible elements
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The command palette button used an inline onclick handler while every
other interactive element in base.html uses data-action + event
delegation. Now consistent: data-action="open-cmdpalette" handled in
the global footer click listener.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add role="button" tabindex="0" aria-expanded to .link-host-title
in both static and JS-rendered panels (host panels + UniFi switches)
- Sync aria-expanded in togglePanel(), restoreCollapseState(),
collapseAll(), and expandAll()
- Add keydown handler (Enter/Space) so panel headers are keyboard-operable
- Add role="region" aria-label to inspector main chassis area
- Add role="complementary" aria-label to inspector port detail panel
- Replace last inline date-parse in renderLinks() with _toIso() helper
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The require_auth decorator was interpolating user['username'] and the
allowed_groups list directly into HTML strings. An attacker with a
crafted username or control over group names could inject arbitrary HTML.
Use html.escape() on both values before insertion.
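Illustrative sketch of the escaping. render_denied and the message wording are hypothetical; only the html.escape() calls reflect the change:

```python
import html
from typing import Iterable


def render_denied(username: str, allowed_groups: Iterable[str]) -> str:
    """Build an access-denied snippet with all dynamic values escaped."""
    safe_user = html.escape(username)
    safe_groups = html.escape(", ".join(allowed_groups))
    return (
        f"<p>User <strong>{safe_user}</strong> is not a member of an "
        f"allowed group ({safe_groups}).</p>"
    )
```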
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extract identical suppression-annotation loop from index() and
api_status() into _annotate_suppressions() helper to remove the
duplication
- Improve stuck-job error message: 'thread crash' → 'no activity for
5 minutes' (less alarming, more accurate)
- Remove orphaned .events-filter-bar CSS class (never referenced in
any template or JS file)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace all var declarations in base.html, index.html scripts with
const/let (const for bindings that are never reassigned, let otherwise)
- Add _toIso() helper to links.html script block and replace the two
inline .replace(' UTC','Z').replace(' ','T') patterns with it
- Replace var with const in links.html _linksInterval
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add role="group" + aria-label to duration-pills and sev-pills containers
- Add aria-pressed to severity filter, duration, and refresh-interval pills
- Keep aria-pressed in sync with JS (setDuration, applyRefreshPillUI, modal reset)
- Add aria-label to events-search, host-search, links-search inputs
- Add aria-label to host and UniFi device suppress buttons in templates
- Replace dynamic style color strings in links.html stat cards with TDS
utility classes (lt-text-red/green/amber) via downCls/errCls variables
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace .empty-state (removed class) with TDS lt-empty-state--sm in
both error branches of renderInspector() and loadInspector()
- Diagnostic run button: add aria-label, apply lt-btn TDS classes for
consistent styling instead of custom btn-diag-only styling
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- lt-divider--unifi / lt-divider-label--unifi: replace hardcoded margin
and cyan label color on the UniFi switch section divider
- lt-text-amber / lt-text-cyan on stat card icons and values (matches
same migration done in index.html)
- lt-stats-grid--mb: margin-bottom:16px on the summary stats grid
- g-page-sub-aside: replaces margin-left:8px on the updated timestamp
span in links and inspector page subtitle
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- lt-notif-empty: replaces all hardcoded padding/font/color/align on
the empty-state and loading/error text in the notification bell panel
- lt-notif-view-all: replaces width/text-align/display/font-size inline
style on the 'View dashboard' footer link
- lt-notif-dot: moves border-radius:50%;margin-top from inline style
(only background color remains inline, which is dynamic per-severity)
- Initial 'Loading…' text in the panel HTML uses lt-notif-empty
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Stat card icons and values: style="color:var(--red)" etc replaced with
lt-text-red, lt-text-amber, lt-text-cyan, lt-text-green (defined in
base.css with both color and glow-shadow)
- Host search input: style="width:180px" extracted to .lt-search-input--sm
- base.html: suppress modal form groups use lt-form-group--last for last
item (already committed); lt-divider--compact applied to settings divider
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Suppress modal form groups: strip margin-bottom:12px (lt-form-group
already has margin-bottom in TDS); use lt-form-group--last for the
final group where zero margin is needed
- Keyboard shortcuts table: remove width:100% (lt-table is already full-
width in base.css)
- Settings divider: replace style=margin override with .lt-divider--compact
- Topology bus section: move max-width:860px into .topo-bus-section rule
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Active events now carry an is_suppressed boolean (added in api_status()
and the index() route via check_suppressed() against the pre-loaded
suppression list). The events table renders a muted '🔕 sup' badge next
to the severity and dims the entire row (.row-suppressed) so operators
can immediately see which firing alerts are silenced.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>