Compare commits

..

19 Commits

Author SHA1 Message Date
jared a34898b8e8 Fix ping-only hosts polled twice per cycle with inconsistent parameters
Lint / Python (flake8) (push) Successful in 57s
Lint / JS (eslint) (push) Successful in 28s
Security / Python Security (bandit) (push) Successful in 1m14s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 7s
Test / Python Tests (pytest) (push) Failing after 13m52s
_collect_snapshot called pulse.ping(count=1) independently from
_process_ping_hosts which called pulse.ping(count=3). This doubled
network load and could show a host as 'up' in the dashboard while
simultaneously firing an 'unreachable' alert, or vice versa.

Now ping_states is computed once in run() using the alert-quality
parameters (count=3) and shared by both snapshot and alert processing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 23:13:43 -04:00
jared 31747c4bd3 Replace deprecated datetime.utcnow() with datetime.now(timezone.utc)
Lint / Python (flake8) (push) Successful in 1m9s
Lint / JS (eslint) (push) Successful in 11s
Security / Python Security (bandit) (push) Successful in 44s
Test / Python Tests (pytest) (push) Successful in 58s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
datetime.utcnow() is deprecated in Python 3.12 and removed in 3.13.
Replace all four call sites with timezone-aware equivalents so the
codebase is ready for Python 3.12+.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 15:34:41 -04:00
jared faa0707f79 Add ESLint config enforcing no-undef and eqeqeq
Lint / Python (flake8) (push) Successful in 53s
Lint / JS (eslint) (push) Successful in 12s
Security / Python Security (bandit) (push) Successful in 1m44s
Test / Python Tests (pytest) (push) Successful in 59s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
Without a config file, ESLint was running with no-undef disabled, meaning
undefined variable references in static/app.js were silently ignored.
Add .eslintrc.json with no-undef: error and eqeqeq: error so CI actually
catches JS bugs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 15:33:26 -04:00
jared 9c52e4ad1a Fix inspector auto-refresh ignoring 'Off' setting on page load
Lint / Python (flake8) (push) Successful in 41s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 1m0s
Test / Python Tests (pytest) (push) Successful in 50s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 2s
Same ?? / || issue as the previous fix in index.html and links.html.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 13:20:42 -04:00
jared 156ef97667 Fix auto-refresh ignoring 'Off' setting on page load
Lint / Python (flake8) (push) Successful in 40s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 39s
Test / Python Tests (pytest) (push) Successful in 53s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 2s
Using || 30 / || 60 as a fallback treats refreshInterval=0 (Off) as
falsy and replaces it with the default, causing auto-refresh to start
even when the user saved 'Off'. Replace with nullish coalescing (??)
so only null/undefined triggers the default.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 13:19:44 -04:00
jared 2f74266bd9 Fix monitor loop double-sleep on error; add grep -F regression test
Lint / Python (flake8) (push) Successful in 49s
Lint / JS (eslint) (push) Successful in 9s
Security / Python Security (bandit) (push) Successful in 42s
Test / Python Tests (pytest) (push) Successful in 51s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
On exception the monitor slept 30s inside the except block then fell
through to time.sleep(poll_interval), giving a 150s recovery gap instead
of 30s. Adding continue after the error sleep fixes this.

Also adds a regression test asserting dmesg filtering uses grep -F --
so a future refactor cannot silently reintroduce the regex wildcard bug.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 13:16:43 -04:00
jared 222bdb08ab Fix suppression annotation for interface_down not checking host-level rules
Lint / Python (flake8) (push) Successful in 38s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 39s
Test / Python Tests (pytest) (push) Successful in 1m5s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 4s
monitor.py checks both 'interface' and 'host' suppressions for interface_down
events, but _annotate_suppressions only checked 'interface'. A host-level
suppression would silently suppress tickets but not mark the table row as
suppressed in the UI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 13:14:46 -04:00
jared 8dd744b039 Show suppressed badge on host cards during global maintenance windows
Lint / Python (flake8) (push) Successful in 40s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 38s
Test / Python Tests (pytest) (push) Successful in 52s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
Global suppressions (target_type='all') have an empty target_name, so
the selectattr filter never matched them, leaving no visual indicator
when a global maintenance window was active. Pre-compute has_global_sup
before the host loop and OR it into the badge condition.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 13:12:25 -04:00
jared 9e2be150b5 Use grep -F in dmesg filter to prevent interface name treated as regex
Lint / Python (flake8) (push) Successful in 38s
Lint / JS (eslint) (push) Failing after 13s
Security / Python Security (bandit) (push) Successful in 42s
Test / Python Tests (pytest) (push) Successful in 50s
Lint / Notify on failure (push) Successful in 2s
Lint / Deploy (push) Has been skipped
grep {iface} treats dots and other special chars as regex metacharacters.
Switch to grep -F -- {iface} for fixed-string matching and to prevent
a leading dash in the interface name from being parsed as a grep flag.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 11:12:02 -04:00
jared ed5ba5c59e Remove unused is_new parameter from ticket helper methods
After fixing the is_new guard bug, is_new is no longer used inside
_ticket_interface, _ticket_unifi, or _ticket_unreachable. Drop it from
their signatures and call sites.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 11:10:32 -04:00
jared 2be44d8b24 Fix ticket_id never stored when fail_thresh>1; guard sessionStorage JSON.parse
Lint / Python (flake8) (push) Successful in 45s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 43s
Test / Python Tests (pytest) (push) Successful in 51s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
monitor.py: _ticket_interface/_ticket_unifi/_ticket_unreachable all used
`if tid and is_new` to guard db.set_ticket_id(). Since is_new is True only
on the first upsert (consec=1) but tickets are created at consec>=fail_thresh
(default 2), is_new is always False when the ticket is created, so the
ticket link never appeared in the UI. Changed to `if tid:`.

links.html: JSON.parse(sessionStorage.getItem(...)) in togglePanel and
restoreCollapseState had no try-catch. Corrupt/stale session storage would
throw an uncaught SyntaxError. Also wrapped all sessionStorage.setItem
calls in try-catch to defend against storage-full / private-browsing errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 23:45:20 -04:00
jared 2d6dcd782f Cancel in-flight diagnostic poll when user selects a new port
Lint / Python (flake8) (push) Successful in 45s
Lint / JS (eslint) (push) Successful in 10s
Security / Python Security (bandit) (push) Successful in 52s
Test / Python Tests (pytest) (push) Successful in 1m2s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 2s
Previously switching ports while a diagnostic was running left the
setInterval timer active, causing the result to be written into the
old (now detached) DOM elements and never shown to the user.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 23:26:53 -04:00
jared a1a3a52dd8 Fix empty-object false negative in links page no-data check
Lint / Python (flake8) (push) Successful in 51s
Lint / JS (eslint) (push) Successful in 10s
Security / Python Security (bandit) (push) Successful in 46s
Test / Python Tests (pytest) (push) Successful in 1m3s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
The check `!data.hosts && !data.unifi_switches` never caught empty
objects `{}`, which are truthy. Replace with Object.keys length checks
so the friendly "no data yet" banner renders when both collections
are empty.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 23:21:50 -04:00
jared bcc2ad7f5c Use shlex.quote for remote_cmd in build_ssh_command
Lint / Python (flake8) (push) Successful in 1m3s
Lint / JS (eslint) (push) Successful in 10s
Security / Python Security (bandit) (push) Successful in 49s
Test / Python Tests (pytest) (push) Successful in 1m10s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 4s
Matches the pattern already used in monitor.py's _ssh_batch(); prevents
quoting breakage if shlex.quote(iface) emits single-quoted tokens inside
the remote command string.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 23:17:11 -04:00
jared d4f159ee7c fix: escape ticket_id text content in dynamic events table
Lint / Python (flake8) (push) Successful in 44s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 42s
Test / Python Tests (pytest) (push) Successful in 1m7s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
ticket_id was already escaped in the href attribute but the visible
text (#<id>) used the raw value in an innerHTML template literal.
Apply lt.escHtml() for defense-in-depth against a compromised ticket API.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 23:02:09 -04:00
jared 61019418d3 fix: add aria-required to s-reason field in suppressions form
Lint / Python (flake8) (push) Successful in 40s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 57s
Test / Python Tests (pytest) (push) Successful in 1m27s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 6s
The reason input had `required` for browser validation but was missing
`aria-required="true"`, so screen readers did not announce it as required.
Matches the fix already applied to the equivalent field in base.html.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 15:11:05 -04:00
jared 1a53718cc5 fix: SSH shell quoting bug breaks ethtool collection; ticket_id KeyError
Lint / Python (flake8) (push) Successful in 41s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 55s
Test / Python Tests (pytest) (push) Successful in 51s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
monitor.py _ssh_batch(): the remote command was wrapped in double-quotes
(f'root@{ip} "{shell_cmd}"') but shell_cmd itself contains double-quoted
echo sentinels ("___IFACE:eth0___"). When Pulse's shell parses the full
ssh invocation, the nested double-quotes cause mis-parsing — the remote
command is split incorrectly, silently breaking all ethtool/SFP DOM
collection. Fix: use shlex.quote(shell_cmd) so the entire remote command
is single-quoted, leaving inner double-quotes untouched.

TicketClient.create(): data['ticket_id'] raises KeyError if the Tinker
Tickets API returns success=true without a ticket_id field (malformed
response). Use data.get('ticket_id') with an explicit warning log.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 13:41:09 -04:00
jared afaeb64636 fix: UTC timezone suffix missing from all isoformat() timestamp outputs
db.py returned all datetime columns (first_seen, last_seen, resolved_at,
created_at, expires_at) as bare ISO strings like "2026-03-14T14:14:21"
with no timezone marker. Per the ECMAScript spec, new Date() on a
datetime string without timezone treats it as LOCAL time, not UTC.
This made lt.time.ago() and stale-detection wrong for any user whose
browser is not in UTC — event ages and stale warnings would be off by
the client's UTC offset.

monitor.py had the same issue on the network_snapshot 'updated' field.

Fix: append 'Z' to all isoformat() calls (UTC datetimes confirmed by
MySQL server timezone and _now_utc() pattern used throughout codebase).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 13:28:49 -04:00
jared b6ee45a842 fix: inspector.html stale/updated timestamp broken date parsing
Lint / Python (flake8) (push) Successful in 1m8s
Lint / JS (eslint) (push) Successful in 10s
Security / Python Security (bandit) (push) Successful in 50s
Test / Python Tests (pytest) (push) Successful in 52s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
Same bug as was just fixed in links.html: data.updated is stored as
"YYYY-MM-DD HH:MM:SS UTC" by monitor.py, so appending 'Z' produced
"…UTCZ" — an invalid date. The stale-data warning and Updated timestamp
in Inspector were silently showing "Invalid Date" and the stale overlay
never fired. Fixed to use _toIso() (already global via app.js).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 13:25:17 -04:00
11 changed files with 107 additions and 54 deletions
+21
View File
@@ -0,0 +1,21 @@
{
"env": {
"browser": true,
"es2021": true
},
"globals": {
"lt": "readonly",
"GANDALF_CONFIG": "readonly",
"CSS": "readonly"
},
"rules": {
"no-undef": "error",
"no-unused-vars": ["warn", { "argsIgnorePattern": "^_", "varsIgnorePattern": "^_" }],
"no-console": "off",
"eqeqeq": ["error", "always", { "null": "ignore" }]
},
"parserOptions": {
"ecmaVersion": 2021,
"sourceType": "script"
}
}
+19 -10
View File
@@ -174,17 +174,26 @@ _PAGE_LIMIT = 200 # max events returned per request
def _annotate_suppressions(events: list, suppressions: list) -> None:
"""Annotate each event dict in-place with an is_suppressed bool."""
"""Annotate each event dict in-place with an is_suppressed bool.
Mirrors the suppression check order in monitor.py exactly:
interface_down → interface OR host
unifi_device_* → unifi_device
everything else → host
"""
for ev in events:
sup_type = (
'unifi_device' if ev.get('event_type') == 'unifi_device_offline'
else 'interface' if ev.get('event_type') == 'interface_down'
else 'host'
)
ev['is_suppressed'] = db.check_suppressed(
suppressions, sup_type,
ev.get('target_name', ''), ev.get('target_detail', '') or '',
)
etype = ev.get('event_type', '')
name = ev.get('target_name', '')
detail = ev.get('target_detail', '') or ''
if etype == 'interface_down':
ev['is_suppressed'] = (
db.check_suppressed(suppressions, 'interface', name, detail) or
db.check_suppressed(suppressions, 'host', name)
)
elif etype == 'unifi_device_offline':
ev['is_suppressed'] = db.check_suppressed(suppressions, 'unifi_device', name, detail)
else:
ev['is_suppressed'] = db.check_suppressed(suppressions, 'host', name, detail)
# ---------------------------------------------------------------------------
+6 -6
View File
@@ -3,7 +3,7 @@ import json
import logging
import threading
from contextlib import contextmanager
from datetime import datetime, timedelta
from datetime import datetime, timedelta, timezone
from typing import Optional
import pymysql
@@ -182,7 +182,7 @@ def get_active_events(limit: int = 200, offset: int = 0) -> list:
for r in rows:
for k in ('first_seen', 'last_seen'):
if r.get(k) and hasattr(r[k], 'isoformat'):
r[k] = r[k].isoformat()
r[k] = r[k].isoformat() + 'Z'
return rows
@@ -210,7 +210,7 @@ def get_recent_resolved(hours: int = 24, limit: int = 50) -> list:
for r in rows:
for k in ('first_seen', 'last_seen', 'resolved_at'):
if r.get(k) and hasattr(r[k], 'isoformat'):
r[k] = r[k].isoformat()
r[k] = r[k].isoformat() + 'Z'
return rows
@@ -252,7 +252,7 @@ def get_active_suppressions() -> list:
for r in rows:
for k in ('created_at', 'expires_at'):
if r.get(k) and hasattr(r[k], 'isoformat'):
r[k] = r[k].isoformat()
r[k] = r[k].isoformat() + 'Z'
return rows
@@ -267,7 +267,7 @@ def get_suppression_history(limit: int = 50) -> list:
for r in rows:
for k in ('created_at', 'expires_at'):
if r.get(k) and hasattr(r[k], 'isoformat'):
r[k] = r[k].isoformat()
r[k] = r[k].isoformat() + 'Z'
return rows
@@ -281,7 +281,7 @@ def create_suppression(
) -> int:
expires_at = None
if expires_minutes:
expires_at = datetime.utcnow() + timedelta(minutes=int(expires_minutes))
expires_at = datetime.now(timezone.utc) + timedelta(minutes=int(expires_minutes))
with get_conn() as conn:
with conn.cursor() as cur:
cur.execute(
+2 -2
View File
@@ -68,7 +68,7 @@ class DiagnosticsRunner:
f' echo "=== ip_route ===";'
f' ip route show dev {q} 2>/dev/null;'
f' echo "=== dmesg ===";'
f' dmesg 2>/dev/null | grep {q} | tail -50;'
f' dmesg 2>/dev/null | grep -F -- {q} | tail -50;'
f' echo "=== lldpctl ===";'
f' lldpctl 2>/dev/null || echo "lldpd not running";'
f' echo "=== end ==="'
@@ -78,7 +78,7 @@ class DiagnosticsRunner:
f'ssh -o StrictHostKeyChecking=accept-new -o ConnectTimeout=5 '
f'-o BatchMode=yes -o LogLevel=ERROR '
f'-o ServerAliveInterval=10 -o ServerAliveCountMax=2 '
f'root@{ip_q} \'{remote_cmd}\''
f'root@{ip_q} {shlex.quote(remote_cmd)}'
)
# ------------------------------------------------------------------
+32 -21
View File
@@ -12,7 +12,7 @@ import logging
import re
import shlex
import time
from datetime import datetime
from datetime import datetime, timezone
from typing import Dict, List, Optional
import requests
@@ -215,7 +215,10 @@ class TicketClient:
resp.raise_for_status()
data = resp.json()
if data.get('success'):
tid = data['ticket_id']
tid = data.get('ticket_id')
if not tid:
logger.warning(f'Ticket API success but no ticket_id in response: {data}')
return None
logger.info(f'Created ticket #{tid}: {title}')
return tid
if data.get('existing_ticket_id'):
@@ -377,7 +380,7 @@ class LinkStatsCollector:
f'ssh -o StrictHostKeyChecking=accept-new -o ConnectTimeout=5 '
f'-o BatchMode=yes -o LogLevel=ERROR '
f'-o ServerAliveInterval=10 -o ServerAliveCountMax=2 '
f'root@{ip} "{shell_cmd}"'
f'root@{ip} {shlex.quote(shell_cmd)}'
)
output = self.pulse.run_command(ssh_cmd)
if output is None:
@@ -615,7 +618,7 @@ class LinkStatsCollector:
return {
'hosts': result_hosts,
'unifi_switches': unifi_switches,
'updated': datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S UTC'),
'updated': datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M:%S UTC'),
}
def _compute_unifi_rates(self, raw: Dict[str, dict], now: float) -> Dict[str, dict]:
@@ -650,7 +653,7 @@ class LinkStatsCollector:
# Helpers
# --------------------------------------------------------------------------
def _now_utc() -> str:
return datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S UTC')
return datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M:%S UTC')
# --------------------------------------------------------------------------
@@ -731,7 +734,7 @@ class NetworkMonitor:
f'Interface {iface} on {host} went link-down ({_now_utc()})',
)
if not sup and consec >= self.fail_thresh:
self._ticket_interface(event_id, is_new, host, iface, consec)
self._ticket_interface(event_id, host, iface, consec)
if host_has_regression:
hosts_with_regression.append(host)
@@ -768,7 +771,7 @@ class NetworkMonitor:
db.resolve_event('cluster_network_issue', self.cluster_name, '')
def _ticket_interface(
self, event_id: int, is_new: bool, host: str, iface: str, consec: int
self, event_id: int, host: str, iface: str, consec: int
) -> None:
title = (
f'[{host}][auto][production][issue][network][single-node] '
@@ -786,7 +789,7 @@ class NetworkMonitor:
f'Please inspect the cable/SFP/switch port for {host}/{iface}.'
)
tid = self.tickets.create(title, desc, priority='2')
if tid and is_new:
if tid:
db.set_ticket_id(event_id, tid)
# ------------------------------------------------------------------
@@ -807,11 +810,11 @@ class NetworkMonitor:
f'UniFi {name} ({d.get("ip","")}) offline ({_now_utc()})',
)
if not sup and consec >= self.fail_thresh:
self._ticket_unifi(event_id, is_new, d)
self._ticket_unifi(event_id, d)
else:
db.resolve_event('unifi_device_offline', name, d.get('type', ''))
def _ticket_unifi(self, event_id: int, is_new: bool, device: dict) -> None:
def _ticket_unifi(self, event_id: int, device: dict) -> None:
name = device['name']
title = (
f'[{name}][auto][production][issue][network][single-node] '
@@ -828,16 +831,16 @@ class NetworkMonitor:
f'Please check power and cable connectivity.'
)
tid = self.tickets.create(title, desc, priority='2')
if tid and is_new:
if tid:
db.set_ticket_id(event_id, tid)
# ------------------------------------------------------------------
# Ping-only hosts (no node_exporter)
# ------------------------------------------------------------------
def _process_ping_hosts(self, suppressions: list) -> None:
def _process_ping_hosts(self, suppressions: list, ping_states: Dict[str, bool]) -> None:
for h in self.cfg.get('monitor', {}).get('ping_hosts', []):
name, ip = h['name'], h['ip']
reachable = self.pulse.ping(ip)
reachable = ping_states.get(name, False)
if not reachable:
sup = db.check_suppressed(suppressions, 'host', name)
@@ -847,12 +850,12 @@ class NetworkMonitor:
f'Host {name} ({ip}) unreachable via ping ({_now_utc()})',
)
if not sup and consec >= self.fail_thresh:
self._ticket_unreachable(event_id, is_new, name, ip, consec)
self._ticket_unreachable(event_id, name, ip, consec)
else:
db.resolve_event('host_unreachable', name, ip)
def _ticket_unreachable(
self, event_id: int, is_new: bool, name: str, ip: str, consec: int
self, event_id: int, name: str, ip: str, consec: int
) -> None:
title = (
f'[{name}][auto][production][issue][network][single-node] '
@@ -870,7 +873,7 @@ class NetworkMonitor:
f'Please check the host power, management interface, and network connectivity.'
)
tid = self.tickets.create(title, desc, priority='2')
if tid and is_new:
if tid:
db.set_ticket_id(event_id, tid)
# ------------------------------------------------------------------
@@ -879,6 +882,7 @@ class NetworkMonitor:
def _collect_snapshot(
self, iface_states: Dict[str, Dict[str, bool]],
unifi_devices: Optional[List[dict]] = None,
ping_states: Optional[Dict[str, bool]] = None,
) -> dict:
# Accept pre-fetched devices; fall back to empty list if unavailable
display_unifi = unifi_devices if unifi_devices is not None else []
@@ -907,7 +911,7 @@ class NetworkMonitor:
for h in self.cfg.get('monitor', {}).get('ping_hosts', []):
name, ip = h['name'], h['ip']
reachable = self.pulse.ping(ip, count=1, timeout=2)
reachable = (ping_states or {}).get(name, False)
hosts[name] = {
'ip': ip,
'interfaces': {},
@@ -918,7 +922,7 @@ class NetworkMonitor:
return {
'hosts': hosts,
'unifi': display_unifi,
'updated': datetime.utcnow().isoformat(),
'updated': datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z'),
}
# ------------------------------------------------------------------
@@ -939,8 +943,14 @@ class NetworkMonitor:
# 2. Fetch UniFi devices once — used by both snapshot and alert processing
unifi_devices = self.unifi.get_devices()
# 3. Collect and store snapshot for dashboard
snapshot = self._collect_snapshot(iface_states, unifi_devices)
# 3a. Ping-only hosts once — shared by snapshot and alert processing
ping_states: Dict[str, bool] = {
h['name']: self.pulse.ping(h['ip'])
for h in self.cfg.get('monitor', {}).get('ping_hosts', [])
}
# 3b. Collect and store snapshot for dashboard
snapshot = self._collect_snapshot(iface_states, unifi_devices, ping_states)
db.set_state('network_snapshot', snapshot)
db.set_state('last_check', _now_utc())
@@ -956,7 +966,7 @@ class NetworkMonitor:
self._process_interfaces(iface_states, suppressions)
self._process_unifi(unifi_devices, suppressions)
self._process_ping_hosts(suppressions)
self._process_ping_hosts(suppressions, ping_states)
# Housekeeping: deactivate expired suppressions and purge old resolved events
db.cleanup_expired_suppressions()
@@ -967,6 +977,7 @@ class NetworkMonitor:
except Exception as e:
logger.error(f'Monitor loop error: {e}', exc_info=True)
time.sleep(30)
continue
time.sleep(self.poll_interval)
+1 -1
View File
@@ -220,7 +220,7 @@ function updateEventsTable(events, totalActive) {
? GANDALF_CONFIG.ticket_web_url : 'http://t.lotusguild.org/ticket/';
const ticket = e.ticket_id
? `<a href="${lt.escHtml(ticketBase)}${lt.escHtml(String(e.ticket_id))}" target="_blank"
class="ticket-link">#${e.ticket_id}</a>`
class="ticket-link">#${lt.escHtml(String(e.ticket_id))}</a>`
: '';
const supBadge = e.is_suppressed
? `<span class="lt-badge badge-suppressed" title="Alert suppressed">🔕 sup</span>`
+3 -2
View File
@@ -324,6 +324,7 @@
</div>
</div>
<div class="host-grid" id="host-grid">
{%- set has_global_sup = suppressions | selectattr('target_type', 'equalto', 'all') | list | length > 0 -%}
{% for name, host in snapshot.hosts.items() %}
{% set suppressed = suppressions | selectattr('target_name', 'equalto', name) | list %}
<div class="host-card host-card-{{ host.status }}" data-host="{{ name }}">
@@ -331,7 +332,7 @@
<div class="host-name-row">
<span class="host-status-dot dot-{{ host.status }}"></span>
<span class="host-name">{{ name }}</span>
{% if suppressed %}
{% if suppressed or has_global_sup %}
<span class="badge-suppressed" title="Suppressed">🔕</span>
{% endif %}
</div>
@@ -468,7 +469,7 @@
{% block scripts %}
<script>
// Start auto-refresh using saved settings interval (default 30 s)
const _savedInterval = (window.gandalfSettings && window.gandalfSettings.refreshInterval) || 30;
const _savedInterval = window.gandalfSettings?.refreshInterval ?? 30;
if (_savedInterval > 0) lt.autoRefresh.start(refreshAll, _savedInterval * 1000);
// When settings change, restart auto-refresh with new interval
+3 -2
View File
@@ -218,6 +218,7 @@ let _apiData = null;
function selectPort(el) {
const swName = el.dataset.switch;
const idx = parseInt(el.dataset.portIdx, 10);
if (_diagPollTimer) { clearInterval(_diagPollTimer); _diagPollTimer = null; }
document.querySelectorAll('.switch-port-block.selected')
.forEach(e => e.classList.remove('selected'));
el.classList.add('selected');
@@ -428,7 +429,7 @@ function renderInspector(data) {
const updEl = document.getElementById('inspector-updated');
if (updEl && data.updated) {
const updMs = new Date(data.updated + (data.updated.includes('Z') ? '' : 'Z'));
const updMs = new Date(_toIso(data.updated));
const ageMin = (Date.now() - updMs) / 60000;
const timeStr = updMs.toLocaleTimeString();
if (ageMin > 15) {
@@ -472,7 +473,7 @@ async function loadInspector() {
}
loadInspector();
const _inspInterval = (window.gandalfSettings && window.gandalfSettings.refreshInterval) || 60;
const _inspInterval = window.gandalfSettings?.refreshInterval ?? 60;
if (_inspInterval > 0) lt.autoRefresh.start(loadInspector, Math.max(_inspInterval, 15) * 1000);
window.onGandalfSettingsChanged = function(s) {
+13 -9
View File
@@ -372,14 +372,16 @@ function togglePanel(panel) {
if (title) title.setAttribute('aria-expanded', isCollapsed ? 'false' : 'true');
const id = panel.id;
if (id) {
const collapsed = JSON.parse(sessionStorage.getItem('linksCollapsed') || '{}');
let collapsed = {};
try { collapsed = JSON.parse(sessionStorage.getItem('linksCollapsed') || '{}'); } catch(_) {}
collapsed[id] = panel.classList.contains('collapsed');
sessionStorage.setItem('linksCollapsed', JSON.stringify(collapsed));
try { sessionStorage.setItem('linksCollapsed', JSON.stringify(collapsed)); } catch(_) {}
}
}
function restoreCollapseState() {
const collapsed = JSON.parse(sessionStorage.getItem('linksCollapsed') || '{}');
let collapsed = {};
try { collapsed = JSON.parse(sessionStorage.getItem('linksCollapsed') || '{}'); } catch(_) {}
for (const [id, isCollapsed] of Object.entries(collapsed)) {
const panel = document.getElementById(id);
if (!panel) continue;
@@ -507,9 +509,11 @@ function collapseAll() {
if (btn) btn.textContent = '[+]';
if (title) title.setAttribute('aria-expanded', 'false');
});
sessionStorage.setItem('linksCollapsed', JSON.stringify(
Object.fromEntries([...document.querySelectorAll('.link-host-panel')].map(p => [p.id, true]))
));
try {
sessionStorage.setItem('linksCollapsed', JSON.stringify(
Object.fromEntries([...document.querySelectorAll('.link-host-panel')].map(p => [p.id, true]))
));
} catch(_) {}
}
function expandAll() {
@@ -520,7 +524,7 @@ function expandAll() {
if (btn) btn.textContent = '[]';
if (title) title.setAttribute('aria-expanded', 'true');
});
sessionStorage.setItem('linksCollapsed', '{}');
try { sessionStorage.setItem('linksCollapsed', '{}'); } catch(_) {}
}
// ── Stale data warning ────────────────────────────────────────────
@@ -548,7 +552,7 @@ function checkLinksStale(updatedStr) {
async function loadLinks() {
try {
const data = await lt.api.get('/api/links');
if (!data.hosts && !data.unifi_switches) {
if ((!data.hosts || !Object.keys(data.hosts).length) && (!data.unifi_switches || !Object.keys(data.unifi_switches).length)) {
document.getElementById('links-container').innerHTML =
'<div class="link-no-data">No link data yet — monitor has not completed a full cycle.</div>';
return;
@@ -567,7 +571,7 @@ async function loadLinks() {
}
loadLinks();
const _linksInterval = (window.gandalfSettings && window.gandalfSettings.refreshInterval) || 60;
const _linksInterval = window.gandalfSettings?.refreshInterval ?? 60;
if (_linksInterval > 0) lt.autoRefresh.start(loadLinks, Math.max(_linksInterval, 15) * 1000);
window.onGandalfSettingsChanged = function(s) {
+1 -1
View File
@@ -51,7 +51,7 @@
<label class="lt-label" for="s-reason">Reason <span class="required">*</span></label>
<input type="text" class="lt-input" id="s-reason" name="reason"
placeholder="e.g. Planned switch maintenance, replacing SFP on large1/enp43s0"
required>
required aria-required="true">
</div>
</div>
+6
View File
@@ -36,6 +36,12 @@ class TestBuildSshCommand:
cmd = DiagnosticsRunner.build_ssh_command('10.0.0.1', 'eth0')
assert 'ethtool' in cmd
def test_dmesg_uses_fixed_string_grep(self):
# grep -F prevents iface names with dots (e.g. eth0.1) being treated as
# regex wildcards; -- prevents leading - from being parsed as a flag
cmd = DiagnosticsRunner.build_ssh_command('10.0.0.1', 'eth0')
assert 'grep -F --' in cmd
# ── parse_output ─────────────────────────────────────────────────────────────