Compare commits

13 Commits

Author SHA1 Message Date
jared fc2be88915 fix: escape poe_class in inspector panel for consistency
Lint / Python (flake8) (push) Successful in 1m49s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 42s
Test / Python Tests (pytest) (push) Successful in 1m35s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 5s
d.poe_mode was already wrapped in escHtml(); apply same to d.poe_class.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 11:56:11 -04:00
jared cd0b725f3e fix: LLDP port label bug, suppression SQL dead code, avatar path hardening
Lint / Python (flake8) (push) Successful in 1m13s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 42s
Test / Python Tests (pytest) (push) Successful in 50s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
- inspector.html: fix LLDP neighbor label in port blocks — port.lldp_table never exists; data is at port.lldp (dict with system_name/chassis_id); both port block renderers corrected
- db.py: remove dead 'target_detail IS NULL' branch in suppression check — target_detail is always stored as '' not NULL; query simplified to target_detail=''
- app.py: resolve cache_dir/cache_file/sentinel to absolute paths; guard against path escape before use
- app.py: wrap sentinel os.path.getmtime() in try/except OSError to handle TOCTOU deletion race

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 09:31:25 -04:00
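The avatar path hardening described in commit cd0b725f3e can be sketched roughly as follows. This is a minimal illustration, not the project's code: the helper names `cache_paths` and `sentinel_is_fresh` are hypothetical, and only the standard library is assumed.

```python
import os
import re
import tempfile


def cache_paths(cache_dir: str, username: str):
    """Return (cache_file, sentinel) inside cache_dir, or None on path escape.

    Hypothetical helper; mirrors the abspath + prefix check in the commit.
    """
    safe_name = re.sub(r'[^a-zA-Z0-9._-]', '_', username)
    base = os.path.abspath(cache_dir)
    cache_file = os.path.abspath(os.path.join(base, f'user_{safe_name}.jpg'))
    sentinel = os.path.abspath(os.path.join(base, f'user_{safe_name}.none'))
    prefix = base + os.sep
    # Both resolved paths must still live under the cache directory.
    if not cache_file.startswith(prefix) or not sentinel.startswith(prefix):
        return None
    return cache_file, sentinel


def sentinel_is_fresh(sentinel: str, now: float, ttl: int) -> bool:
    """True if the sentinel exists and is younger than ttl seconds."""
    try:
        return now - os.path.getmtime(sentinel) < ttl
    except OSError:
        # The file was missing or deleted between check and use (TOCTOU race).
        return False
```

Calling `os.path.getmtime()` directly and catching `OSError` avoids a separate `os.path.exists()` check, which is one way to close the race the commit message mentions.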
jared 77c74098a3 fix: flake8 E701 in avatar handler; update SSH test to match accept-new
Lint / Python (flake8) (push) Successful in 55s
Lint / JS (eslint) (push) Successful in 11s
Security / Python Security (bandit) (push) Successful in 1m15s
Test / Python Tests (pytest) (push) Successful in 59s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
- app.py: split 'with open(sentinel): pass' onto two lines (flake8 E701)
- tests/test_diagnose.py: rename test and assert StrictHostKeyChecking=accept-new (not =no which was fixed earlier)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 09:23:06 -04:00
jared aa52047016 fix: cache_ttl config validation; ticket_web_url via tojson in base.html
Lint / Python (flake8) (push) Failing after 44s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 42s
Test / Python Tests (pytest) (push) Failing after 1m13s
Lint / Notify on failure (push) Successful in 4s
Lint / Deploy (push) Has been skipped
- app.py: wrap int(cache_ttl) in try/except so a misconfigured non-integer value falls back to 3600 instead of raising ValueError
- base.html: use Jinja2 tojson filter for ticket_web_url to ensure proper JS string escaping regardless of URL contents

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 09:05:53 -04:00
jared e166e3fcb4 fix: LDAP conn leak, health timing info, security headers, link_stats size guard
Lint / Python (flake8) (push) Failing after 51s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 1m2s
Test / Python Tests (pytest) (push) Failing after 1m21s
Lint / Notify on failure (push) Successful in 2s
Lint / Deploy (push) Has been skipped
- app.py: move conn.unbind() into finally block in api_avatar() so connection is always closed even if conn.search() throws
- app.py: remove elapsed-time strings from /health response (unauthenticated endpoint no longer leaks monitor timing)
- app.py: add after_request hook setting X-Content-Type-Options, X-Frame-Options, Referrer-Policy on all responses
- app.py: add 10 MB size guard on link_stats before JSON parse; log actual exception on parse failure
- app.py: wrap suppressions_page network_snapshot parse in try/except (same protection as index page)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 08:50:51 -04:00
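The size guard and fallback behaviour from commit e166e3fcb4 could look something like the sketch below. The function name `load_cached_json` and the `(payload, status)` return shape are assumptions made for illustration; the real handler returns Flask responses.

```python
import json
import logging
from typing import Optional, Tuple

logger = logging.getLogger(__name__)

MAX_CHARS = 10_000_000  # roughly 10 MB; len() on a str counts characters, not bytes


def load_cached_json(raw: Optional[str], default: dict) -> Tuple[dict, int]:
    """Parse a cached JSON string, refusing oversized input (hypothetical helper)."""
    if not raw:
        return default, 200
    if len(raw) > MAX_CHARS:
        logger.error('cached value exceeds size limit (%d chars)', len(raw))
        return {'error': 'Invalid cached data'}, 503
    try:
        return json.loads(raw), 200
    except Exception as e:
        # Log the actual exception so corrupt data can be diagnosed later.
        logger.error('Failed to parse cached JSON: %s', e)
        return default, 200
```

Checking the length before parsing keeps a corrupted or runaway value from being handed to `json.loads()` at all.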
jared d4d4208145 fix: LDAP empty-password guard, expires_minutes bounds, snapshot JSON safety, rate dict cleanup
Lint / Python (flake8) (push) Failing after 39s
Lint / JS (eslint) (push) Failing after 12s
Security / Python Security (bandit) (push) Successful in 41s
Test / Python Tests (pytest) (push) Failing after 1m28s
Lint / Notify on failure (push) Successful in 2s
Lint / Deploy (push) Has been skipped
- app.py: fail loudly if LDAP bind_pw is not configured rather than attempting anonymous bind
- app.py: validate expires_minutes is 1–43200 (max 30 days) before storing suppression
- app.py: wrap network_snapshot JSON parse in try/except so a corrupt DB value returns degraded page instead of 500
- app.py: prune _diag_rate entries inactive for >1h to prevent unbounded growth

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 08:47:43 -04:00
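The `expires_minutes` bounds check from commit d4d4208145 can be illustrated with a small standalone function. The name `parse_expires_minutes` and the `(value, error)` return convention are assumptions for this sketch.

```python
from typing import Optional, Tuple

MAX_MINUTES = 43200  # 30 days


def parse_expires_minutes(raw) -> Tuple[Optional[int], Optional[str]]:
    """Return (value, error). A None input means the suppression never expires."""
    if raw is None:
        return None, None
    try:
        value = int(raw)
    except (ValueError, TypeError):
        return None, 'expires_minutes must be a valid integer'
    if value < 1 or value > MAX_MINUTES:
        return None, 'expires_minutes must be between 1 and 43200 (30 days)'
    return value, None
```

Validating before the value reaches the database layer means a caller gets a 400-style error instead of an exception from a later `int()` call.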
jared 61408645a5 fix: LLDP input validation, mgmt_ip early validation, poll timer cleanup, monitor backoff
Lint / Python (flake8) (push) Failing after 41s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 42s
Test / Python Tests (pytest) (push) Failing after 1m35s
Lint / Notify on failure (push) Successful in 5s
Lint / Deploy (push) Has been skipped
- app.py: validate server_name from LLDP with fullmatch before use in logs/lookups (prevents log injection)
- app.py: validate each mgmt_ip candidate before assigning host_ip (avoids assigning non-IP string that then fails later check)
- app.py: log actual exception in link_stats JSON parse error
- inspector.html: clear _diagPollTimer in closePanel() so timer doesn't orphan when panel is closed mid-poll
- monitor.py: sleep 30s after a monitor loop exception before resuming normal poll interval

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 08:45:28 -04:00
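The two input checks from commit 61408645a5 (LLDP name validation and picking the first valid management IP) can be sketched as below. The helper names are hypothetical; the regex is the one shown in the diff.

```python
import ipaddress
import re
from typing import Iterable, Optional

_NAME_RE = re.compile(r'[a-zA-Z0-9._-]+')


def is_safe_server_name(name: str) -> bool:
    """True only if the whole string consists of allowed hostname characters."""
    return bool(_NAME_RE.fullmatch(name))


def first_valid_ip(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate that parses as an IPv4/IPv6 address."""
    for candidate in candidates:
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            continue
        return candidate
    return None
```

`fullmatch` matters here: unlike `re.match` with a `$` anchor, it does not accept a trailing newline, so a name carrying an embedded line break cannot slip into log output.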
jared 25baec67ac fix: diagnostic rate limiting, lock-held ownership check, iface name length cap
Lint / Python (flake8) (push) Failing after 47s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 43s
Test / Python Tests (pytest) (push) Failing after 1m22s
Lint / Notify on failure (push) Successful in 3s
Lint / Deploy (push) Has been skipped
- app.py: add per-user diagnostic rate limit (5/min) enforced atomically under _diag_lock
- app.py: move diagnostic job ownership check inside _diag_lock to close TOCTOU window; snapshot result before releasing lock
- monitor.py: cap interface name regex to 15 chars (Linux IFNAMSIZ limit)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 08:42:50 -04:00
jared c71d0da97d security: harden exception exposure, SSL config, and Pulse response parsing
Lint / Python (flake8) (push) Failing after 42s
Lint / JS (eslint) (push) Successful in 7s
Security / Python Security (bandit) (push) Successful in 1m22s
Test / Python Tests (pytest) (push) Failing after 1m23s
Lint / Notify on failure (push) Successful in 3s
Lint / Deploy (push) Has been skipped
- app.py: replace raw str(e) in diagnostic _run() with generic client message; log internally only
- app.py: /health endpoint no longer leaks exception strings to unauthenticated callers; errors logged server-side
- monitor.py: UniFi SSL verification now defaults True, configurable via config.json unifi.verify_ssl; urllib3 warning suppression scoped to verify=False only (removed global disable)
- monitor.py: Pulse execution_id extracted with .get() + explicit None check to avoid KeyError on malformed response
- monitor.py: interface name regex drops '@' (not a valid kernel interface char) to match app.py and fix inconsistency

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 08:40:25 -04:00
jared 38297e616f arch+security: route all server contact through Pulse, harden SSH
Lint / Python (flake8) (push) Failing after 43s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 1m4s
Test / Python Tests (pytest) (push) Failing after 1m5s
Lint / Notify on failure (push) Successful in 2s
Lint / Deploy (push) Has been skipped
Architecture:
- Remove direct subprocess ping from Gandalf; add PulseClient.ping()
  which runs the ping via the Pulse worker instead
- Remove standalone ping() function and subprocess import from monitor.py
- Add self.pulse alias to NetworkMonitor for convenience
- Both _process_ping_hosts() and snapshot builder now use self.pulse.ping()

Security:
- Change StrictHostKeyChecking=no → accept-new in both SSH command
  builders (monitor.py _ssh_batch, diagnose.py build_ssh_command).
  The Pulse worker's known_hosts is now authoritative; host keys are
  recorded on first connection and verified on all subsequent ones.
  MITM attacks after initial key exchange are now detectable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 23:58:16 -04:00
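The remote ping introduced in commit 38297e616f builds a shell command for the worker and interprets a marker string in its output. The sketch below separates those two steps into hypothetical helpers (`build_ping_command`, `parse_ping_output`); how the command is actually submitted to Pulse is not shown.

```python
import shlex
from typing import Optional


def build_ping_command(ip: str, count: int = 3, timeout: int = 2) -> str:
    """Build the shell command a remote worker would run (illustrative only)."""
    ip_q = shlex.quote(ip)  # neutralises shell metacharacters in the address
    return (
        f'ping -c {int(count)} -W {int(timeout)} {ip_q} >/dev/null 2>&1 '
        '&& echo REACHABLE || echo UNREACHABLE'
    )


def parse_ping_output(output: Optional[str]) -> bool:
    """A missing result (worker error) counts as unreachable."""
    return output is not None and output.strip() == 'REACHABLE'
```

Echoing a fixed marker means the caller does not need the remote exit code, only the command's text output.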
jared ca41486c45 security+a11y: job ownership check, aria-live chips, aria-hidden topo
Lint / Python (flake8) (push) Failing after 45s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 1m5s
Test / Python Tests (pytest) (push) Successful in 49s
Lint / Notify on failure (push) Successful in 3s
Lint / Deploy (push) Has been skipped
security:
- Fix bare open(sentinel, 'w').close() file descriptor leak; use
  context manager instead
- Store requesting username in _diag_jobs at creation time; return 403
  from api_diagnose_poll if the polling user does not match the job owner

accessibility:
- Add aria-live="polite" aria-atomic="true" to .status-chips container
  so screen readers announce critical/warning count changes on refresh
- Add aria-controls="events-table-wrap" to critical and warning stat
  cards so assistive tech knows these buttons control the events table
- Add aria-hidden sync to topology setCollapsed() — hidden topology
  content is now removed from the accessibility tree when collapsed,
  preventing keyboard focus from entering invisible elements

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 23:53:17 -04:00
jared 0f2506d5a4 refactor: const for _inspInterval in inspector.html
Lint / Python (flake8) (push) Successful in 54s
Lint / JS (eslint) (push) Successful in 9s
Security / Python Security (bandit) (push) Successful in 1m17s
Test / Python Tests (pytest) (push) Successful in 53s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 3s
Last remaining var declaration; matches the pattern in index.html and
links.html.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 23:45:42 -04:00
jared 678ede4e76 refactor: replace inline onclick with data-action event delegation
Lint / Python (flake8) (push) Successful in 42s
Lint / JS (eslint) (push) Successful in 8s
Security / Python Security (bandit) (push) Successful in 1m0s
Test / Python Tests (pytest) (push) Successful in 50s
Lint / Notify on failure (push) Has been skipped
Lint / Deploy (push) Successful in 2s
The command palette button used an inline onclick handler while every
other interactive element in base.html uses data-action + event
delegation. Now consistent: data-action="open-cmdpalette" handled in
the global footer click listener.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 23:45:09 -04:00
8 changed files with 155 additions and 72 deletions
+110 -28
@@ -59,6 +59,8 @@ def inject_config():
 # In-memory diagnostic job store { job_id: { status, result, created_at } }
 _diag_jobs: dict = {}
 _diag_lock = threading.Lock()
+# Per-user rate-limit: { username: [epoch_float, ...] } — cleaned inside _diag_lock
+_diag_rate: dict = {}
 def _purge_old_jobs_loop():
@@ -92,6 +94,14 @@ def _config() -> dict:
     return _cfg
+@app.after_request
+def add_security_headers(response):
+    response.headers.setdefault('X-Content-Type-Options', 'nosniff')
+    response.headers.setdefault('X-Frame-Options', 'DENY')
+    response.headers.setdefault('Referrer-Policy', 'strict-origin-when-cross-origin')
+    return response
 def _daemon_ok(last_check: str) -> bool:
     """Return True if monitor last checked within 20 minutes."""
     if not last_check or last_check == 'Never':
@@ -180,7 +190,11 @@ def index():
     summary = db.get_status_summary()
     snapshot_raw = db.get_state('network_snapshot')
     last_check = db.get_state('last_check', 'Never')
-    snapshot = json.loads(snapshot_raw) if snapshot_raw else {}
+    try:
+        snapshot = json.loads(snapshot_raw) if snapshot_raw else {}
+    except Exception as e:
+        logger.error(f'Failed to parse network_snapshot JSON: {e}')
+        snapshot = {}
     suppressions = db.get_active_suppressions()
     _annotate_suppressions(events, suppressions)
     recent_resolved = db.get_recent_resolved(hours=24, limit=10)
@@ -219,7 +233,11 @@ def suppressions_page():
     active = db.get_active_suppressions()
     history = db.get_suppression_history(limit=50)
     snapshot_raw = db.get_state('network_snapshot')
-    snapshot = json.loads(snapshot_raw) if snapshot_raw else {}
+    try:
+        snapshot = json.loads(snapshot_raw) if snapshot_raw else {}
+    except Exception as e:
+        logger.error(f'Failed to parse network_snapshot JSON: {e}')
+        snapshot = {}
     return render_template(
         'suppressions.html',
         user=user,
@@ -266,10 +284,13 @@ def api_network():
 def api_links():
     raw = db.get_state('link_stats')
     if raw:
+        if len(raw) > 10_000_000:
+            logger.error(f'link_stats exceeds 10 MB ({len(raw)} bytes); possible corruption')
+            return jsonify({'error': 'Invalid cached data'}), 503
         try:
             return jsonify(json.loads(raw))
-        except Exception:
-            logger.error('Failed to parse link_stats JSON')
+        except Exception as e:
+            logger.error(f'Failed to parse link_stats JSON: {e}')
     return jsonify({'hosts': {}, 'updated': None})
@@ -325,13 +346,21 @@ def api_create_suppression():
     if len(target_detail) > 255:
         return jsonify({'error': 'target_detail must be 255 characters or fewer'}), 400
+    if expires_minutes is not None:
+        try:
+            expires_minutes = int(expires_minutes)
+            if expires_minutes <= 0 or expires_minutes > 43200:
+                return jsonify({'error': 'expires_minutes must be between 1 and 43200 (30 days)'}), 400
+        except (ValueError, TypeError):
+            return jsonify({'error': 'expires_minutes must be a valid integer'}), 400
     sup_id = db.create_suppression(
         target_type=target_type,
         target_name=target_name,
         target_detail=target_detail,
         reason=reason,
         suppressed_by=user['username'],
-        expires_minutes=int(expires_minutes) if expires_minutes else None,
+        expires_minutes=expires_minutes,
     )
     logger.info(
         f'Suppression #{sup_id} created by {user["username"]}: '
@@ -369,8 +398,8 @@ def api_diagnose_start():
         return jsonify({'error': 'No link_stats data available'}), 503
     try:
         link_data = json.loads(raw)
-    except Exception:
-        logger.error('Failed to parse link_stats JSON in /api/diagnose')
+    except Exception as e:
+        logger.error(f'Failed to parse link_stats JSON in /api/diagnose: {e}')
         return jsonify({'error': 'Internal data error'}), 500
     switches = link_data.get('unifi_switches', {})
@@ -394,6 +423,9 @@ def api_diagnose_start():
         return jsonify({'error': 'No LLDP neighbor data for this port'}), 400
     server_name = lldp['system_name']
+    if not re.fullmatch(r'[a-zA-Z0-9._-]+', server_name):
+        logger.error(f'Refusing diagnostic: invalid server_name from LLDP: {server_name!r}')
+        return jsonify({'error': 'LLDP neighbor name contains invalid characters'}), 400
     lldp_port_id = lldp.get('port_id', '')
     # Find matching host + interface in link_stats hosts
@@ -419,9 +451,14 @@ def api_diagnose_start():
     # Resolve host IP from link_stats host data
     host_ip = (server_ifaces.get(matched_iface) or {}).get('host_ip')
     if not host_ip:
-        # Fallback: use LLDP mgmt IPs
-        mgmt_ips = lldp.get('mgmt_ips') or []
-        host_ip = mgmt_ips[0] if mgmt_ips else None
+        # Fallback: use first valid IP from LLDP mgmt IPs
+        for candidate in (lldp.get('mgmt_ips') or []):
+            try:
+                ipaddress.ip_address(candidate)
+                host_ip = candidate
+                break
+            except ValueError:
+                continue
     if not host_ip:
         return jsonify({'error': 'Cannot determine host IP for SSH'}), 400
@@ -436,8 +473,22 @@ def api_diagnose_start():
         return jsonify({'error': 'Resolved interface name contains invalid characters'}), 400
     job_id = str(uuid.uuid4())
+    requesting_user = _get_user()['username']
+    now = time.time()
     with _diag_lock:
-        _diag_jobs[job_id] = {'status': 'running', 'result': None, 'created_at': time.time()}
+        # Rate limit: max 5 diagnostic jobs per user per minute; prune stale user entries
+        stale_users = [u for u, ts in _diag_rate.items() if not ts or max(ts) < now - 3600]
+        for u in stale_users:
+            del _diag_rate[u]
+        recent = [t for t in _diag_rate.get(requesting_user, []) if now - t < 60]
+        if len(recent) >= 5:
+            return jsonify({'error': 'Rate limit exceeded: max 5 diagnostics per minute'}), 429
+        recent.append(now)
+        _diag_rate[requesting_user] = recent
+        _diag_jobs[job_id] = {
+            'status': 'running', 'result': None,
+            'created_at': now, 'user': requesting_user,
+        }
     def _run():
         try:
@@ -447,7 +498,7 @@ def api_diagnose_start():
             result = runner.run(host_ip, server_name, matched_iface, port_data)
         except Exception as e:
             logger.error(f'Diagnostic job {job_id} failed: {e}', exc_info=True)
-            result = {'status': 'error', 'error': str(e)}
+            result = {'status': 'error', 'error': 'Diagnostic failed; check server logs.'}
         with _diag_lock:
             if job_id in _diag_jobs:
                 _diag_jobs[job_id]['status'] = 'done'
@@ -463,11 +514,15 @@ def api_diagnose_start():
 @require_auth
 def api_diagnose_poll(job_id: str):
     """Poll a diagnostic job. Returns {status, result}."""
+    current_user = _get_user()['username']
     with _diag_lock:
         job = _diag_jobs.get(job_id)
         if not job:
             return jsonify({'error': 'Job not found'}), 404
-    return jsonify({'status': job['status'], 'result': job.get('result')})
+        if job.get('user') != current_user:
+            return jsonify({'error': 'Forbidden'}), 403
+        snapshot = {'status': job['status'], 'result': job.get('result')}
+    return jsonify(snapshot)
 @app.route('/api/avatar')
@@ -484,11 +539,21 @@ def api_avatar():
     # Build a safe cache filename from the username (alphanumeric + - _ .)
     safe_name = re.sub(r'[^a-zA-Z0-9._-]', '_', username)
-    cache_dir = ldap_cfg.get('cache_dir', os.path.join(tempfile.gettempdir(), 'gandalf_avatars'))
+    cache_dir = os.path.abspath(
+        ldap_cfg.get('cache_dir', os.path.join(tempfile.gettempdir(), 'gandalf_avatars'))
+    )
     os.makedirs(cache_dir, exist_ok=True)
-    cache_file = os.path.join(cache_dir, f'user_{safe_name}.jpg')
-    sentinel = os.path.join(cache_dir, f'user_{safe_name}.none')
-    cache_ttl = int(ldap_cfg.get('cache_ttl', 3600))
+    cache_file = os.path.abspath(os.path.join(cache_dir, f'user_{safe_name}.jpg'))
+    sentinel = os.path.abspath(os.path.join(cache_dir, f'user_{safe_name}.none'))
+    # Guard against path escape (shouldn't happen with sanitised safe_name, but be explicit)
+    if not cache_file.startswith(cache_dir + os.sep) or not sentinel.startswith(cache_dir + os.sep):
+        logger.error(f'Avatar path escape detected for user {username!r}')
+        return '', 404
+    try:
+        cache_ttl = int(ldap_cfg.get('cache_ttl', 3600))
+    except (ValueError, TypeError):
+        logger.warning('Invalid cache_ttl in ldap config; using default 3600')
+        cache_ttl = 3600
     now = time.time()
@@ -498,33 +563,48 @@ def api_avatar():
                 max_age=cache_ttl, conditional=True)
     # Skip LDAP if we already know this user has no avatar
-    if os.path.exists(sentinel) and now - os.path.getmtime(sentinel) < cache_ttl:
-        return '', 404
+    try:
+        if os.path.exists(sentinel) and now - os.path.getmtime(sentinel) < cache_ttl:
+            return '', 404
+    except OSError:
+        pass
     # Query lldap
+    bind_pw = ldap_cfg.get('bind_pw', '')
+    if not bind_pw:
+        logger.error('LDAP bind_pw not configured — avatar lookup disabled')
+        return '', 404
     avatar_data = None
+    conn = None
     try:
         import ldap3
         server = ldap3.Server(ldap_cfg['host'], port=int(ldap_cfg.get('port', 3890)))
         conn = ldap3.Connection(server,
                                 user=ldap_cfg['bind_dn'],
-                                password=ldap_cfg.get('bind_pw', ''),
+                                password=bind_pw,
                                 auto_bind=True, receive_timeout=5)
         safe_uid = ldap3.utils.conv.escape_filter_chars(username)
         conn.search(ldap_cfg.get('user_base', 'ou=people,dc=example,dc=com'),
                     f'(uid={safe_uid})', attributes=['avatar'])
         if conn.entries and conn.entries[0]['avatar'].value:
             avatar_data = conn.entries[0]['avatar'].value
-        conn.unbind()
     except ImportError:
         logger.error('ldap3 not installed — run: pip install ldap3')
         return '', 404
     except Exception as e:
         logger.error(f'LDAP avatar lookup failed for {username}: {e}')
         return '', 404
+    finally:
+        if conn is not None:
+            try:
+                conn.unbind()
+            except Exception:
+                pass
     if not avatar_data or len(avatar_data) < 100:
-        open(sentinel, 'w').close()
+        with open(sentinel, 'w'):
+            pass
         return '', 404
     # Validate JPEG magic bytes (FF D8 FF)
@@ -557,7 +637,8 @@ def health():
         db.get_state('last_check')
         checks['db'] = 'ok'
     except Exception as e:
-        checks['db'] = f'error: {e}'
+        logger.error(f'Health check db error: {e}')
+        checks['db'] = 'error'
         overall = 'degraded'
     # Monitor freshness: fail if last_check is older than 20 minutes
@@ -567,14 +648,15 @@ def health():
             ts = datetime.strptime(last_check, '%Y-%m-%d %H:%M:%S UTC').replace(tzinfo=timezone.utc)
             age_s = (datetime.now(timezone.utc) - ts).total_seconds()
             if age_s > 1200:
-                checks['monitor'] = f'stale ({int(age_s)}s since last check)'
+                checks['monitor'] = 'stale'
                 overall = 'degraded'
             else:
-                checks['monitor'] = f'ok ({int(age_s)}s ago)'
+                checks['monitor'] = 'ok'
         else:
             checks['monitor'] = 'no data yet'
     except Exception as e:
-        checks['monitor'] = f'error: {e}'
+        logger.error(f'Health check monitor error: {e}')
+        checks['monitor'] = 'error'
         overall = 'degraded'
     status_code = 200 if overall == 'ok' else 503
+1 -1
@@ -365,7 +365,7 @@ def is_suppressed(target_type: str, target_name: str, target_detail: str = '') -
             """SELECT id FROM suppression_rules
                WHERE active=TRUE AND (expires_at IS NULL OR expires_at > NOW())
                AND target_type=%s AND target_name=%s
-               AND (target_detail IS NULL OR target_detail='') LIMIT 1""",
+               AND target_detail='' LIMIT 1""",
             (target_type, target_name),
         )
         if cur.fetchone():
+1 -1
@@ -75,7 +75,7 @@ class DiagnosticsRunner:
         )
         return (
-            f'ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 '
+            f'ssh -o StrictHostKeyChecking=accept-new -o ConnectTimeout=5 '
             f'-o BatchMode=yes -o LogLevel=ERROR '
             f'-o ServerAliveInterval=10 -o ServerAliveCountMax=2 '
             f'root@{ip_q} \'{remote_cmd}\''
+22 -22
@@ -11,7 +11,6 @@ import json
 import logging
 import re
 import shlex
-import subprocess
 import time
 from datetime import datetime
 from typing import Dict, List, Optional
@@ -21,7 +20,6 @@ from urllib3.exceptions import InsecureRequestWarning
 import db
-requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
 logging.basicConfig(
     level=logging.INFO,
@@ -91,7 +89,9 @@ class UnifiClient:
         self.base_url = cfg['controller']
         self.site_id = cfg.get('site_id', 'default')
         self.session = requests.Session()
-        self.session.verify = False
+        self.session.verify = cfg.get('verify_ssl', True)
+        if not self.session.verify:
+            requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
         self.headers = {
             'X-API-KEY': cfg['api_key'],
             'Accept': 'application/json',
@@ -263,7 +263,10 @@ class PulseClient:
                 timeout=10,
             )
             resp.raise_for_status()
-            execution_id = resp.json()['execution_id']
+            execution_id = resp.json().get('execution_id')
+            if not execution_id:
+                logger.error('Pulse submit response missing execution_id')
+                return None
             self.last_execution_id = execution_id
         except Exception as e:
             logger.error(f'Pulse command submit failed: {e}')
@@ -315,6 +318,14 @@ class PulseClient:
             return self.run_command(command, _retry=False)
         return None
+    def ping(self, ip: str, count: int = 3, timeout: int = 2) -> bool:
+        """Ping *ip* via the Pulse worker. Returns True if host responds."""
+        ip_q = shlex.quote(ip)
+        output = self.run_command(
+            f'ping -c {count} -W {timeout} {ip_q} >/dev/null 2>&1 && echo REACHABLE || echo UNREACHABLE'
+        )
+        return output is not None and output.strip() == 'REACHABLE'
 # --------------------------------------------------------------------------
 # Link stats collector (ethtool + Prometheus traffic metrics)
@@ -344,8 +355,8 @@ class LinkStatsCollector:
         if not ifaces or not self.pulse.url:
             return {}
-        # Validate interface names (kernel names only contain [a-zA-Z0-9_.-])
-        safe_ifaces = [i for i in ifaces if re.match(r'^[a-zA-Z0-9_.@-]+$', i)]
+        # Validate interface names (kernel names: [a-zA-Z0-9_.-], max 15 chars per IFNAMSIZ)
+        safe_ifaces = [i for i in ifaces if re.match(r'^[a-zA-Z0-9_.-]{1,15}$', i)]
         if not safe_ifaces:
             return {}
@@ -363,7 +374,7 @@ class LinkStatsCollector:
         shell_cmd = ' '.join(parts)
         ssh_cmd = (
-            f'ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 '
+            f'ssh -o StrictHostKeyChecking=accept-new -o ConnectTimeout=5 '
             f'-o BatchMode=yes -o LogLevel=ERROR '
             f'-o ServerAliveInterval=10 -o ServerAliveCountMax=2 '
             f'root@{ip} "{shell_cmd}"'
@@ -638,19 +649,6 @@ class LinkStatsCollector:
 # --------------------------------------------------------------------------
 # Helpers
 # --------------------------------------------------------------------------
-def ping(ip: str, count: int = 3, timeout: int = 2) -> bool:
-    try:
-        r = subprocess.run(
-            ['ping', '-c', str(count), '-W', str(timeout), ip],
-            stdout=subprocess.DEVNULL,
-            stderr=subprocess.DEVNULL,
-            timeout=30,
-        )
-        return r.returncode == 0
-    except Exception:
-        return False
 def _now_utc() -> str:
     return datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S UTC')
@@ -671,6 +669,7 @@ class NetworkMonitor:
         self.unifi = UnifiClient(self.cfg['unifi'])
         self.tickets = TicketClient(self.cfg.get('ticket_api', {}))
         self.link_stats = LinkStatsCollector(self.cfg, self.prom, self.unifi)
+        self.pulse = self.link_stats.pulse  # convenience alias
         mon = self.cfg.get('monitor', {})
         self.poll_interval = mon.get('poll_interval', 120)
@@ -838,7 +837,7 @@ class NetworkMonitor:
     def _process_ping_hosts(self, suppressions: list) -> None:
         for h in self.cfg.get('monitor', {}).get('ping_hosts', []):
             name, ip = h['name'], h['ip']
-            reachable = ping(ip)
+            reachable = self.pulse.ping(ip)
             if not reachable:
                 sup = db.check_suppressed(suppressions, 'host', name)
@@ -908,7 +907,7 @@ class NetworkMonitor:
         for h in self.cfg.get('monitor', {}).get('ping_hosts', []):
             name, ip = h['name'], h['ip']
-            reachable = ping(ip, count=1, timeout=2)
+            reachable = self.pulse.ping(ip, count=1, timeout=2)
             hosts[name] = {
                 'ip': ip,
                 'interfaces': {},
@@ -967,6 +966,7 @@ class NetworkMonitor:
except Exception as e: except Exception as e:
logger.error(f'Monitor loop error: {e}', exc_info=True) logger.error(f'Monitor loop error: {e}', exc_info=True)
time.sleep(30)
time.sleep(self.poll_interval) time.sleep(self.poll_interval)
+6 -5
@@ -144,9 +144,9 @@
<!-- ⌘K affordance --> <!-- ⌘K affordance -->
<button type="button" <button type="button"
class="lt-btn lt-btn-ghost lt-btn-sm lt-cmd-hint-btn" class="lt-btn lt-btn-ghost lt-btn-sm lt-cmd-hint-btn"
data-action="open-cmdpalette"
title="Command palette (Ctrl+K)" title="Command palette (Ctrl+K)"
aria-label="Open command palette" aria-label="Open command palette">&#x2315;&nbsp;K</button>
onclick="if(window.lt&&lt.cmdPalette)lt.cmdPalette.open()">&#x2315;&nbsp;K</button>
<button type="button" class="lt-theme-btn" id="lt-theme-btn" <button type="button" class="lt-theme-btn" id="lt-theme-btn"
aria-label="Toggle theme" title="Toggle light/dark mode">&#x2600;</button> aria-label="Toggle theme" title="Toggle light/dark mode">&#x2600;</button>
@@ -313,7 +313,7 @@
<script> <script>
const GANDALF_CONFIG = { const GANDALF_CONFIG = {
ticket_web_url: "{{ config.get('ticket_api', {}).get('web_url', 'http://t.lotusguild.org/ticket/') }}" ticket_web_url: {{ config.get('ticket_api', {}).get('web_url', 'http://t.lotusguild.org/ticket/') | tojson }}
}; };
</script> </script>
<script src="{{ url_for('static', filename='app.js') }}"></script> <script src="{{ url_for('static', filename='app.js') }}"></script>
@@ -346,8 +346,9 @@
const btn = e.target.closest('[data-action]'); const btn = e.target.closest('[data-action]');
if (!btn) return; if (!btn) return;
const action = btn.getAttribute('data-action'); const action = btn.getAttribute('data-action');
if (action === 'show-keyboard-help' && window.lt) lt.modal.open('lt-keys-help'); if (action === 'open-cmdpalette' && window.lt && lt.cmdPalette) lt.cmdPalette.open();
if (action === 'open-settings' && window.lt) lt.modal.open('lt-settings-modal'); if (action === 'show-keyboard-help' && window.lt) lt.modal.open('lt-keys-help');
if (action === 'open-settings' && window.lt) lt.modal.open('lt-settings-modal');
}); });
lt.keys.on('r', function() { lt.autoRefresh.now(); }); lt.keys.on('r', function() { lt.autoRefresh.now(); });
+6 -3
@@ -5,7 +5,7 @@
<!-- ── Status bar ──────────────────────────────────────────────────── --> <!-- ── Status bar ──────────────────────────────────────────────────── -->
<div class="status-bar"> <div class="status-bar">
<div class="status-chips"> <div class="status-chips" id="status-chips" aria-live="polite" aria-atomic="true">
{% if not daemon_ok %} {% if not daemon_ok %}
<span class="chip chip-critical">⚠ MONITOR OFFLINE</span> <span class="chip chip-critical">⚠ MONITOR OFFLINE</span>
{% endif %} {% endif %}
@@ -30,7 +30,8 @@
<div class="lt-stats-grid"> <div class="lt-stats-grid">
<div class="lt-stat-card{% if summary.critical %} lt-stat-card--alert{% endif %}" <div class="lt-stat-card{% if summary.critical %} lt-stat-card--alert{% endif %}"
id="stat-critical" role="button" tabindex="0" id="stat-critical" role="button" tabindex="0"
data-stat-filter="critical" aria-label="{{ summary.critical or 0 }} critical alerts"> data-stat-filter="critical" aria-label="{{ summary.critical or 0 }} critical alerts"
aria-controls="events-table-wrap">
<span class="lt-stat-icon lt-text-red" aria-hidden="true"></span> <span class="lt-stat-icon lt-text-red" aria-hidden="true"></span>
<div class="lt-stat-info"> <div class="lt-stat-info">
<span class="lt-stat-value lt-text-red" id="stat-critical-val">{{ summary.critical or 0 }}</span> <span class="lt-stat-value lt-text-red" id="stat-critical-val">{{ summary.critical or 0 }}</span>
@@ -39,7 +40,8 @@
</div> </div>
<div class="lt-stat-card" <div class="lt-stat-card"
id="stat-warning" role="button" tabindex="0" id="stat-warning" role="button" tabindex="0"
data-stat-filter="warning" aria-label="{{ summary.warning or 0 }} warning alerts"> data-stat-filter="warning" aria-label="{{ summary.warning or 0 }} warning alerts"
aria-controls="events-table-wrap">
<span class="lt-stat-icon lt-text-amber" aria-hidden="true"></span> <span class="lt-stat-icon lt-text-amber" aria-hidden="true"></span>
<div class="lt-stat-info"> <div class="lt-stat-info">
<span class="lt-stat-value lt-text-amber" id="stat-warning-val">{{ summary.warning or 0 }}</span> <span class="lt-stat-value lt-text-amber" id="stat-warning-val">{{ summary.warning or 0 }}</span>
@@ -484,6 +486,7 @@
function setCollapsed(v) { function setCollapsed(v) {
wrap.classList.toggle('is-collapsed', v); wrap.classList.toggle('is-collapsed', v);
wrap.setAttribute('aria-hidden', v ? 'true' : 'false');
btn.setAttribute('aria-expanded', v ? 'false' : 'true'); btn.setAttribute('aria-expanded', v ? 'false' : 'true');
btn.textContent = v ? '▾ Expand' : '▴ Collapse'; btn.textContent = v ? '▾ Expand' : '▴ Collapse';
try { localStorage.setItem(LS_KEY, v ? '1' : '0'); } catch(_) {} try { localStorage.setItem(LS_KEY, v ? '1' : '0'); } catch(_) {}
+7 -10
@@ -107,10 +107,8 @@ function portBlockHtml(idx, port, swName, sfpBlock) {
const sfpCls = sfpBlock ? ' sfp-block' : ''; const sfpCls = sfpBlock ? ' sfp-block' : '';
const speedTxt = portSpeedLabel(port); const speedTxt = portSpeedLabel(port);
// LLDP neighbor: first 6 chars of hostname // LLDP neighbor: first 6 chars of hostname
const lldpName = (port && port.lldp_table && port.lldp_table.length) const lldpName = (port && port.lldp && (port.lldp.system_name || port.lldp.chassis_id))
? escHtml((port.lldp_table[0].chassis_id_subtype === 'local' ? escHtml((port.lldp.system_name || port.lldp.chassis_id || '').slice(0, 6))
? port.lldp_table[0].chassis_id
: port.lldp_table[0].system_name || port.lldp_table[0].chassis_id || '').slice(0, 6))
: ''; : '';
const lldpHtml = lldpName ? `<span class="port-lldp">${lldpName}</span>` : ''; const lldpHtml = lldpName ? `<span class="port-lldp">${lldpName}</span>` : '';
const speedHtml = speedTxt ? `<span class="port-speed">${speedTxt}</span>` : ''; const speedHtml = speedTxt ? `<span class="port-speed">${speedTxt}</span>` : '';
@@ -162,10 +160,8 @@ function renderChassis(swName, sw) {
const state = portBlockState(port); const state = portBlockState(port);
const title = port ? escHtml(port.name) : `Port ${idx}`; const title = port ? escHtml(port.name) : `Port ${idx}`;
const speedTxt = portSpeedLabel(port); const speedTxt = portSpeedLabel(port);
const lldpName = (port && port.lldp_table && port.lldp_table.length) const lldpName = (port && port.lldp && (port.lldp.system_name || port.lldp.chassis_id))
? escHtml((port.lldp_table[0].chassis_id_subtype === 'local' ? escHtml((port.lldp.system_name || port.lldp.chassis_id || '').slice(0, 6))
? port.lldp_table[0].chassis_id
: port.lldp_table[0].system_name || port.lldp_table[0].chassis_id || '').slice(0, 6))
: ''; : '';
const speedHtml = speedTxt ? `<span class="port-speed">${speedTxt}</span>` : ''; const speedHtml = speedTxt ? `<span class="port-speed">${speedTxt}</span>` : '';
const lldpHtml = lldpName ? `<span class="port-lldp">${lldpName}</span>` : ''; const lldpHtml = lldpName ? `<span class="port-lldp">${lldpName}</span>` : '';
@@ -231,6 +227,7 @@ function selectPort(el) {
} }
function closePanel() { function closePanel() {
if (_diagPollTimer) { clearInterval(_diagPollTimer); _diagPollTimer = null; }
document.getElementById('inspector-panel').classList.remove('open'); document.getElementById('inspector-panel').classList.remove('open');
document.querySelectorAll('.switch-port-block.selected') document.querySelectorAll('.switch-port-block.selected')
.forEach(el => el.classList.remove('selected')); .forEach(el => el.classList.remove('selected'));
@@ -262,7 +259,7 @@ function renderPanel(swName, idx) {
const poeCurStr = (d.poe_power != null && d.poe_power > 0) ? ` / draw <span class="val-amber">${d.poe_power.toFixed(1)}W</span>` : ''; const poeCurStr = (d.poe_power != null && d.poe_power > 0) ? ` / draw <span class="val-amber">${d.poe_power.toFixed(1)}W</span>` : '';
poeHtml = ` poeHtml = `
<div class="lt-divider"><span class="lt-divider-label">PoE</span></div> <div class="lt-divider"><span class="lt-divider-label">PoE</span></div>
<div class="panel-row"><span class="panel-label">Class</span><span class="panel-val">class ${d.poe_class}${poeMaxStr}</span></div> <div class="panel-row"><span class="panel-label">Class</span><span class="panel-val">class ${escHtml(String(d.poe_class))}${poeMaxStr}</span></div>
${d.poe_power != null ? `<div class="panel-row"><span class="panel-label">Draw</span><span class="panel-val">${d.poe_power > 0 ? `<span class="val-amber">${d.poe_power.toFixed(1)}W</span>` : '0W'}</span></div>` : ''} ${d.poe_power != null ? `<div class="panel-row"><span class="panel-label">Draw</span><span class="panel-val">${d.poe_power > 0 ? `<span class="val-amber">${d.poe_power.toFixed(1)}W</span>` : '0W'}</span></div>` : ''}
${d.poe_mode ? `<div class="panel-row"><span class="panel-label">Mode</span><span class="panel-val">${escHtml(d.poe_mode)}</span></div>` : ''}`; ${d.poe_mode ? `<div class="panel-row"><span class="panel-label">Mode</span><span class="panel-val">${escHtml(d.poe_mode)}</span></div>` : ''}`;
} }
@@ -468,7 +465,7 @@ async function loadInspector() {
} }
loadInspector(); loadInspector();
var _inspInterval = (window.gandalfSettings && window.gandalfSettings.refreshInterval) || 60; const _inspInterval = (window.gandalfSettings && window.gandalfSettings.refreshInterval) || 60;
if (_inspInterval > 0) lt.autoRefresh.start(loadInspector, Math.max(_inspInterval, 15) * 1000); if (_inspInterval > 0) lt.autoRefresh.start(loadInspector, Math.max(_inspInterval, 15) * 1000);
window.onGandalfSettingsChanged = function(s) { window.onGandalfSettingsChanged = function(s) {
+2 -2
@@ -9,9 +9,9 @@ from diagnose import DiagnosticsRunner # noqa: E402
# ── build_ssh_command ──────────────────────────────────────────────────────── # ── build_ssh_command ────────────────────────────────────────────────────────
class TestBuildSshCommand: class TestBuildSshCommand:
def test_contains_stricthostkeychecking_no(self): def test_contains_stricthostkeychecking_accept_new(self):
cmd = DiagnosticsRunner.build_ssh_command('10.0.0.1', 'eth0') cmd = DiagnosticsRunner.build_ssh_command('10.0.0.1', 'eth0')
assert 'StrictHostKeyChecking=no' in cmd assert 'StrictHostKeyChecking=accept-new' in cmd
def test_contains_host_ip(self): def test_contains_host_ip(self):
cmd = DiagnosticsRunner.build_ssh_command('10.0.0.1', 'eth0') cmd = DiagnosticsRunner.build_ssh_command('10.0.0.1', 'eth0')