From 0c0150f698eb193ce83f7bf7b723f27889dd07f7 Mon Sep 17 00:00:00 2001 From: Jared Vititoe Date: Sun, 1 Mar 2026 23:03:18 -0500 Subject: [PATCH] Complete rewrite: full-featured network monitoring dashboard MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Two-service architecture: Flask web app (gandalf.service) + background polling daemon (gandalf-monitor.service) - Monitor polls Prometheus node_network_up for physical NIC states on all 6 hypervisors (added storage-01 at 10.10.10.11:9100) - UniFi API monitoring for switches, APs, and gateway device status - Ping reachability for hosts without node_exporter (pbs only now) - Smart baseline: interfaces first seen as down are never alerted on; only UP→DOWN regressions trigger tickets - Cluster-wide P1 ticket when 3+ hosts have genuine simultaneous interface regressions (guards against false positives on startup) - Tinker Tickets integration with 24-hour hash-based deduplication - Alert suppression: manual toggle or timed windows (30m/1h/4h/8h) - Authelia SSO via forward-auth headers, admin group required - Network topology: Internet → UDM-Pro → Agg Switch (10G DAC) → PoE Switch (10G DAC) → Hosts - MariaDB schema, suppression management UI, host/interface cards Co-Authored-By: Claude Sonnet 4.6 --- README.md | 218 ++++++++-- app.py | 323 ++++++++------ config.json | 60 ++- db.py | 304 +++++++++++++ gandalf-monitor.service | 22 + monitor.py | 479 +++++++++++++++++++++ requirements.txt | 5 + schema.sql | 50 +++ static/app.js | 373 ++++++++++------ static/style.css | 823 +++++++++++++++++++++++++++++------- templates/base.html | 36 ++ templates/index.html | 354 +++++++++++++--- templates/suppressions.html | 252 +++++++++++ 13 files changed, 2787 insertions(+), 512 deletions(-) create mode 100644 db.py create mode 100644 gandalf-monitor.service create mode 100644 monitor.py create mode 100644 requirements.txt create mode 100644 schema.sql create mode 100644 templates/base.html 
create mode 100644 templates/suppressions.html diff --git a/README.md b/README.md index 22cc514..34fee9d 100644 --- a/README.md +++ b/README.md @@ -2,61 +2,199 @@ > Because it shall not let problems pass! -## Multiple Distributed Servers Approach +Network monitoring dashboard for the LotusGuild Proxmox cluster. +Deployed on **LXC 157** (monitor-02 / 10.10.10.9), reachable at `gandalf.lotusguild.org`. -This architecture represents the most robust implementation approach for the system. +--- -### Core Components +## Architecture -1. Multiple monitoring nodes across different network segments -2. Distributed database for sharing state -3. Consensus mechanism for alert verification +Gandalf is two processes that share a MariaDB database: -### System Architecture +| Process | Service | Role | +|---|---|---| +| `app.py` | `gandalf.service` | Flask web dashboard (gunicorn, port 8000) | +| `monitor.py` | `gandalf-monitor.service` | Background polling daemon | -#### A. Monitoring Layer +``` +[Prometheus :9090] ──▶ + monitor.py ──▶ MariaDB ◀── app.py ──▶ nginx ──▶ Authelia ──▶ Browser +[UniFi Controller] ──▶ +``` -- Multiple monitoring nodes in different locations/segments -- Each node runs independent health checks -- Mix of internal and external perspectives +### Data Sources -#### B. Data Collection +| Source | What it monitors | +|---|---| +| **Prometheus** (`10.10.10.48:9090`) | Physical NIC link state (`node_network_up`) for 6 Proxmox hypervisors | +| **UniFi API** (`https://10.10.10.1`) | Switch, AP, and gateway device status | +| **Ping** | pbs (10.10.10.3), the only host without node_exporter | -Each node collects: -- Link status -- Latency measurements -- Error rates -- Bandwidth utilization -- Device health metrics +### Monitored Hosts (Prometheus / node_exporter) -#### C. 
Consensus Mechanism +| Host | Instance | +|---|---| +| large1 | 10.10.10.2:9100 | +| compute-storage-01 | 10.10.10.4:9100 | +| micro1 | 10.10.10.8:9100 | +| monitor-02 | 10.10.10.9:9100 | +| compute-storage-gpu-01 | 10.10.10.10:9100 | -- Multiple nodes must agree before declaring an outage -- Voting system implementation: - - 2/3 node agreement required for issue confirmation - - Weighted checks based on type - - Time-based consensus requirements (X seconds persistence) +--- -#### D. Alert Verification +## Features -- Cross-reference multiple data points -- Check from different network paths -- Verify both ends of connections -- Consider network topology +- **Interface monitoring** – tracks link state for all physical NICs via Prometheus +- **UniFi device monitoring** – detects offline switches, APs, and gateways +- **Ping reachability** – covers hosts without node_exporter +- **Cluster-wide detection** – creates a separate P1 ticket when 3+ hosts have simultaneous interface failures (likely a switch failure) +- **Smart baseline tracking** – interfaces that are down on first observation (unused ports) are never alerted on; only regressions from UP→DOWN trigger tickets +- **Ticket creation** – integrates with Tinker Tickets (`t.lotusguild.org`) with 24-hour deduplication +- **Alert suppression** – manual toggle or timed windows (30min / 1hr / 4hr / 8hr / manual) +- **Authelia SSO** – restricted to `admin` group via forward-auth headers -#### E. Redundancy +--- -- Eliminates single points of failure -- Nodes distributed across availability zones -- Independent power and network paths +## Alert Logic -#### F. 
Central Coordination +### Ticket Triggers -- Distributed database for state sharing -- Leader election for coordinating responses -- Backup coordinators ready to take over +| Condition | Priority | +|---|---| +| UniFi device offline (2+ consecutive checks) | P2 High | +| Proxmox host NIC link-down regression (2+ consecutive checks) | P2 High | +| Host unreachable via ping (2+ consecutive checks) | P2 High | +| 3+ hosts simultaneously reporting interface failures | P1 Critical | -### Additional Features +### Suppression Targets -- Alarm suppression capabilities -- Ticket creation system integration \ No newline at end of file +| Type | Suppresses | +|---|---| +| `host` | All interface alerts for a named host | +| `interface` | A specific NIC on a specific host | +| `unifi_device` | A specific UniFi device | +| `all` | Everything (global maintenance mode) | + +Suppressions can be manual (persist until removed) or timed (auto-expire). + +--- + +## Configuration + +**`config.json`** – shared by both processes: + +| Key | Description | +|---|---| +| `unifi.api_key` | UniFi API key from controller | +| `prometheus.url` | Prometheus base URL | +| `database.*` | MariaDB credentials | +| `ticket_api.api_key` | Tinker Tickets Bearer token | +| `monitor.poll_interval` | Seconds between checks (default: 120) | +| `monitor.failure_threshold` | Consecutive failures before ticketing (default: 2) | +| `monitor.cluster_threshold` | Hosts with failures to trigger cluster alert (default: 3) | +| `monitor.ping_hosts` | Hosts checked via ping (no node_exporter) | +| `hosts` | Maps Prometheus instance labels to hostnames | + +--- + +## Deployment (LXC 157) + +### 1. 
Database (MariaDB LXC 149 at 10.10.10.50) + +```sql +CREATE DATABASE gandalf CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; +CREATE USER 'gandalf'@'10.10.10.61' IDENTIFIED BY 'your_password'; +GRANT ALL PRIVILEGES ON gandalf.* TO 'gandalf'@'10.10.10.61'; +FLUSH PRIVILEGES; +``` + +Then import the schema: +```bash +mysql -h 10.10.10.50 -u gandalf -p gandalf < schema.sql +``` + +### 2. LXC 157 – Install dependencies + +```bash +pip3 install -r requirements.txt +``` + +### 3. Deploy files + +```bash +cp -r app.py db.py monitor.py config.json templates/ static/ /var/www/html/prod/ +``` + +### 4. Configure secrets in `config.json` + +- `database.password` – set the gandalf DB password +- `ticket_api.api_key` – copy from the Tinker Tickets admin panel + +### 5. Install the monitor service + +```bash +cp gandalf-monitor.service /etc/systemd/system/ +systemctl daemon-reload +systemctl enable gandalf-monitor +systemctl start gandalf-monitor +``` + +Update the existing `gandalf.service` to use a single worker: +``` +ExecStart=/usr/bin/python3 -m gunicorn --workers 1 --bind 127.0.0.1:8000 app:app +``` + +### 6. Authelia rule + +Add to `/etc/authelia/configuration.yml` access_control rules: +```yaml +- domain: gandalf.lotusguild.org + policy: one_factor + subject: + - group:admin +``` + +Reload Authelia: `systemctl reload authelia` + +### 7. 
NPM proxy host + +- Domain: `gandalf.lotusguild.org` +- Forward to: `http://10.10.10.61:80` (nginx on LXC 157) +- Enable Authelia forward auth +- WebSockets: **not required** + +--- + +## Service Management + +```bash +# Monitor daemon +systemctl status gandalf-monitor +journalctl -u gandalf-monitor -f + +# Web server +systemctl status gandalf +journalctl -u gandalf -f + +# Restart both after config/code changes +systemctl restart gandalf-monitor gandalf +``` + +--- + +## Troubleshooting + +**Monitor not creating tickets** +- Check `config.json` → `ticket_api.api_key` is set +- Check `journalctl -u gandalf-monitor` for errors + +**Baseline re-initializing on every restart** +- `interface_baseline` is stored in the `monitor_state` DB table; it persists across restarts + +**Interface always showing as "initial_down"** +- That interface was down on the first poll after the monitor started +- It will begin tracking once it comes up; or manually update the baseline in DB if needed + +**Prometheus data missing for a host** +- Verify node_exporter is running: `systemctl status prometheus-node-exporter` +- Check Prometheus targets: `http://10.10.10.48:9090/targets` diff --git a/app.py b/app.py index 7632b7e..a8900b5 100644 --- a/app.py +++ b/app.py @@ -1,144 +1,207 @@ -import logging +"""Gandalf – Global Advanced Network Detection And Link Facilitator. + +Flask web application serving the monitoring dashboard and suppression +management UI. Authentication via Authelia forward-auth headers. +All monitoring and alerting is handled by the separate monitor.py daemon. 
+""" import json -import platform -import subprocess -import threading -import time -from datetime import datetime -from flask import Flask, render_template, jsonify -import requests -from urllib3.exceptions import InsecureRequestWarning +import logging +from functools import wraps + +from flask import Flask, jsonify, redirect, render_template, request, url_for + +import db + +logging.basicConfig( + level=logging.INFO, + format='%(asctime)s %(levelname)s %(name)s %(message)s', +) +logger = logging.getLogger('gandalf.web') -logging.basicConfig(level=logging.DEBUG) -logger = logging.getLogger(__name__) -requests.packages.urllib3.disable_warnings(InsecureRequestWarning) app = Flask(__name__) -device_status = {} -def load_config(): - with open('config.json') as f: - return json.load(f) +_cfg = None -class UnifiAPI: - def __init__(self, config): - self.base_url = config['unifi']['controller'] - self.session = requests.Session() - self.session.verify = False - self.headers = { - 'X-API-KEY': config['unifi']['api_key'], - 'Accept': 'application/json' - } - self.site_id = "default" - - def get_devices(self): - try: - url = f"{self.base_url}/proxy/network/v2/api/site/{self.site_id}/device" - response = self.session.get(url, headers=self.headers) - response.raise_for_status() - - # Log raw response - logger.debug(f"Response status: {response.status_code}") - logger.debug(f"Response headers: {response.headers}") - logger.debug(f"Raw response text: {response.text}") - - devices_data = response.json() - logger.debug(f"Parsed JSON: {devices_data}") - - # Extract network_devices from the response - network_devices = devices_data.get('network_devices', []) - - devices = [] - for device in network_devices: - devices.append({ - 'name': device.get('name', 'Unknown'), - 'ip': device.get('ip', '0.0.0.0'), - 'type': device.get('type', 'unknown'), - 'connection_type': 'fiber' if device.get('uplink', {}).get('media') == 'sfp' else 'copper', - 'critical': True if device.get('type') in 
['udm', 'usw'] else False, - 'device_id': device.get('mac') - }) - - logger.debug(f"Processed devices: {devices}") - return devices - - except Exception as e: - logger.error(f"Error fetching devices: {e}") - logger.exception("Full traceback:") - return [] - - def get_device_details(self, device_id): - try: - url = f"{self.base_url}/proxy/network/v2/api/site/{self.site_id}/device/{device_id}" - response = self.session.get(url, headers=self.headers) - response.raise_for_status() - return response.json() - except Exception as e: - logger.error(f"Failed to get device details: {e}") - return None - def get_device_diagnostics(self, device): - details = self.get_device_details(device['device_id']) - if not details: - return {'state': 'ERROR', 'error': 'Failed to fetch device details'} - - diagnostics = { - 'state': details.get('state', 'unknown'), - 'interfaces': { - 'ports': {} - } - } - - # Parse port information - for port in details.get('port_table', []): - diagnostics['interfaces']['ports'][f"Port {port.get('port_idx')}"] = { - 'state': 'up' if port.get('up') else 'down', - 'speed': { - 'current': port.get('speed', 0), - 'max': port.get('max_speed', 0) - }, - 'poe': port.get('poe_enable', False), - 'media': port.get('media', 'unknown') - } - - return diagnostics +def _config() -> dict: + global _cfg + if _cfg is None: + with open('config.json') as f: + _cfg = json.load(f) + return _cfg - def _parse_interfaces(self, interfaces): - result = { - 'ports': {}, - 'radios': {} - } - for port in interfaces: - result['ports'][f"port_{port['index']}"] = { - 'state': port['up'] and 'up' or 'down', - 'speed': { - 'current': port.get('speed', 0), - 'max': port.get('max_speed', 0) - } - } - return result + +# --------------------------------------------------------------------------- +# Auth helpers +# --------------------------------------------------------------------------- + +def _get_user() -> dict: + return { + 'username': request.headers.get('Remote-User', ''), + 'name': 
request.headers.get('Remote-Name', ''), + 'email': request.headers.get('Remote-Email', ''), + 'groups': [ + g.strip() + for g in request.headers.get('Remote-Groups', '').split(',') + if g.strip() + ], + } + + +def require_auth(f): + @wraps(f) + def wrapper(*args, **kwargs): + user = _get_user() + if not user['username']: + return ( + '
401 – Not authenticated' + 'Please access Gandalf through ' + 'auth.lotusguild.org.
', + 401, + ) + allowed = _config().get('auth', {}).get('allowed_groups', ['admin']) + if not any(g in allowed for g in user['groups']): + return ( + f'
403 – Access denied' + f'Your account ({user["username"]}) is not in an allowed group ' + f'({", ".join(allowed)}).
', + 403, + ) + return f(*args, **kwargs) + return wrapper + + +# --------------------------------------------------------------------------- +# Page routes +# --------------------------------------------------------------------------- @app.route('/') -def home(): - config = load_config() - unifi = UnifiAPI(config) - devices = unifi.get_devices() - return render_template('index.html', devices=devices) +@require_auth +def index(): + user = _get_user() + events = db.get_active_events() + summary = db.get_status_summary() + snapshot_raw = db.get_state('network_snapshot') + last_check = db.get_state('last_check', 'Never') + snapshot = json.loads(snapshot_raw) if snapshot_raw else {} + suppressions = db.get_active_suppressions() + return render_template( + 'index.html', + user=user, + events=events, + summary=summary, + snapshot=snapshot, + last_check=last_check, + suppressions=suppressions, + ) + + +@app.route('/suppressions') +@require_auth +def suppressions_page(): + user = _get_user() + active = db.get_active_suppressions() + history = db.get_suppression_history(limit=50) + snapshot_raw = db.get_state('network_snapshot') + snapshot = json.loads(snapshot_raw) if snapshot_raw else {} + return render_template( + 'suppressions.html', + user=user, + active=active, + history=history, + snapshot=snapshot, + ) + + +# --------------------------------------------------------------------------- +# API routes +# --------------------------------------------------------------------------- @app.route('/api/status') -def status(): - return jsonify(device_status) +@require_auth +def api_status(): + return jsonify({ + 'summary': db.get_status_summary(), + 'last_check': db.get_state('last_check', 'Never'), + 'events': db.get_active_events(), + }) + + +@app.route('/api/network') +@require_auth +def api_network(): + raw = db.get_state('network_snapshot') + if raw: + try: + return jsonify(json.loads(raw)) + except Exception: + pass + return jsonify({'hosts': {}, 'unifi': [], 'updated': 
None}) + + +@app.route('/api/events') +@require_auth +def api_events(): + return jsonify({ + 'active': db.get_active_events(), + 'resolved': db.get_recent_resolved(hours=24, limit=30), + }) + + +@app.route('/api/suppressions', methods=['GET']) +@require_auth +def api_get_suppressions(): + return jsonify(db.get_active_suppressions()) + + +@app.route('/api/suppressions', methods=['POST']) +@require_auth +def api_create_suppression(): + user = _get_user() + data = request.get_json(silent=True) or {} + + target_type = data.get('target_type', 'host') + target_name = (data.get('target_name') or '').strip() + target_detail = (data.get('target_detail') or '').strip() + reason = (data.get('reason') or '').strip() + expires_minutes = data.get('expires_minutes') # None = manual/permanent + + if target_type not in ('host', 'interface', 'unifi_device', 'all'): + return jsonify({'error': 'Invalid target_type'}), 400 + if target_type != 'all' and not target_name: + return jsonify({'error': 'target_name required'}), 400 + if not reason: + return jsonify({'error': 'reason required'}), 400 + + sup_id = db.create_suppression( + target_type=target_type, + target_name=target_name, + target_detail=target_detail, + reason=reason, + suppressed_by=user['username'], + expires_minutes=int(expires_minutes) if expires_minutes else None, + ) + logger.info( + f'Suppression #{sup_id} created by {user["username"]}: ' + f'{target_type}/{target_name}/{target_detail} – {reason}' + ) + return jsonify({'success': True, 'id': sup_id}) + + +@app.route('/api/suppressions/<int:sup_id>', methods=['DELETE']) +@require_auth +def api_delete_suppression(sup_id: int): + user = _get_user() + db.deactivate_suppression(sup_id) + logger.info(f'Suppression #{sup_id} removed by {user["username"]}') + return jsonify({'success': True}) + + +@app.route('/health') +def health(): + """Health check endpoint (no auth).""" + return jsonify({'status': 'ok', 'service': 'gandalf'}) -@app.route('/api/diagnostics') -def get_diagnostics(): - 
config = load_config() - unifi = UnifiAPI(config) - devices = unifi.get_devices() - diagnostics = {} - for device in devices: - diagnostics[device['name']] = unifi.get_device_diagnostics(device) - return jsonify(diagnostics) if __name__ == '__main__': - status_thread = threading.Thread(target=update_status, daemon=True) - status_thread.start() - app.run(debug=True) \ No newline at end of file + app.run(debug=True, host='0.0.0.0', port=5000) diff --git a/config.json b/config.json index a2f1313..4a3e966 100644 --- a/config.json +++ b/config.json @@ -4,5 +4,61 @@ "api_key": "kyPfIsAVie3hwMD4Bc1MjAu8N7HVPIb8", "site_id": "default" }, - "check_interval": 30 -} \ No newline at end of file + "prometheus": { + "url": "http://10.10.10.48:9090" + }, + "database": { + "host": "10.10.10.50", + "port": 3306, + "user": "gandalf", + "password": "Gandalf2026Lotus", + "name": "gandalf" + }, + "ticket_api": { + "url": "http://10.10.10.45/create_ticket_api.php", + "api_key": "5acc5d3c647b84f7c6f59082ce4450ee772e2d1633238b960136f653d20c93af" + }, + "auth": { + "allowed_groups": ["admin"] + }, + "monitor": { + "poll_interval": 120, + "failure_threshold": 2, + "cluster_threshold": 3, + "ping_hosts": [ + {"name": "pbs", "ip": "10.10.10.3"} + ] + }, + "hosts": [ + { + "name": "large1", + "ip": "10.10.10.2", + "prometheus_instance": "10.10.10.2:9100" + }, + { + "name": "compute-storage-01", + "ip": "10.10.10.4", + "prometheus_instance": "10.10.10.4:9100" + }, + { + "name": "micro1", + "ip": "10.10.10.8", + "prometheus_instance": "10.10.10.8:9100" + }, + { + "name": "monitor-02", + "ip": "10.10.10.9", + "prometheus_instance": "10.10.10.9:9100" + }, + { + "name": "compute-storage-gpu-01", + "ip": "10.10.10.10", + "prometheus_instance": "10.10.10.10:9100" + }, + { + "name": "storage-01", + "ip": "10.10.10.11", + "prometheus_instance": "10.10.10.11:9100" + } + ] +} diff --git a/db.py b/db.py new file mode 100644 index 0000000..5cd638b --- /dev/null +++ b/db.py @@ -0,0 +1,304 @@ +"""Database 
operations for Gandalf network monitor.""" +import json +import logging +from contextlib import contextmanager +from datetime import datetime, timedelta +from typing import Optional + +import pymysql +import pymysql.cursors + +logger = logging.getLogger(__name__) + +_config_cache = None + + +def _config() -> dict: + global _config_cache + if _config_cache is None: + with open('config.json') as f: + _config_cache = json.load(f)['database'] + return _config_cache + + +@contextmanager +def get_conn(): + cfg = _config() + conn = pymysql.connect( + host=cfg['host'], + port=cfg.get('port', 3306), + user=cfg['user'], + password=cfg['password'], + database=cfg['name'], + autocommit=True, + cursorclass=pymysql.cursors.DictCursor, + connect_timeout=10, + charset='utf8mb4', + ) + try: + yield conn + finally: + conn.close() + + +# --------------------------------------------------------------------------- +# Monitor state (key/value store) +# --------------------------------------------------------------------------- + +def set_state(key: str, value) -> None: + if not isinstance(value, str): + value = json.dumps(value, default=str) + with get_conn() as conn: + with conn.cursor() as cur: + cur.execute( + """INSERT INTO monitor_state (key_name, value) + VALUES (%s, %s) + ON DUPLICATE KEY UPDATE value=VALUES(value), updated_at=NOW()""", + (key, value), + ) + + +def get_state(key: str, default=None): + with get_conn() as conn: + with conn.cursor() as cur: + cur.execute('SELECT value FROM monitor_state WHERE key_name=%s', (key,)) + row = cur.fetchone() + return row['value'] if row else default + + +# --------------------------------------------------------------------------- +# Interface baseline tracking +# --------------------------------------------------------------------------- + +def get_baseline() -> dict: + raw = get_state('interface_baseline') + if raw: + try: + return json.loads(raw) + except Exception: + pass + return {} + + +def set_baseline(baseline: dict) -> None: + 
set_state('interface_baseline', json.dumps(baseline)) + + +# --------------------------------------------------------------------------- +# Network events +# --------------------------------------------------------------------------- + +def upsert_event( + event_type: str, + severity: str, + source_type: str, + target_name: str, + target_detail: str, + description: str, +) -> tuple: + """Insert or update a network event. Returns (id, is_new, consecutive_failures).""" + detail = target_detail or '' + with get_conn() as conn: + with conn.cursor() as cur: + cur.execute( + """SELECT id, consecutive_failures FROM network_events + WHERE event_type=%s AND target_name=%s AND target_detail=%s + AND resolved_at IS NULL LIMIT 1""", + (event_type, target_name, detail), + ) + existing = cur.fetchone() + + if existing: + new_count = existing['consecutive_failures'] + 1 + cur.execute( + """UPDATE network_events + SET last_seen=NOW(), consecutive_failures=%s, description=%s + WHERE id=%s""", + (new_count, description, existing['id']), + ) + return existing['id'], False, new_count + else: + cur.execute( + """INSERT INTO network_events + (event_type, severity, source_type, target_name, target_detail, description) + VALUES (%s, %s, %s, %s, %s, %s)""", + (event_type, severity, source_type, target_name, detail, description), + ) + return cur.lastrowid, True, 1 + + +def resolve_event(event_type: str, target_name: str, target_detail: str = '') -> None: + detail = target_detail or '' + with get_conn() as conn: + with conn.cursor() as cur: + cur.execute( + """UPDATE network_events SET resolved_at=NOW() + WHERE event_type=%s AND target_name=%s AND target_detail=%s + AND resolved_at IS NULL""", + (event_type, target_name, detail), + ) + + +def set_ticket_id(event_id: int, ticket_id: str) -> None: + with get_conn() as conn: + with conn.cursor() as cur: + cur.execute( + 'UPDATE network_events SET ticket_id=%s WHERE id=%s', + (ticket_id, event_id), + ) + + +def get_active_events() -> list: + 
with get_conn() as conn: + with conn.cursor() as cur: + cur.execute( + """SELECT * FROM network_events + WHERE resolved_at IS NULL + ORDER BY + FIELD(severity,'critical','warning','info'), + first_seen DESC""" + ) + rows = cur.fetchall() + for r in rows: + for k in ('first_seen', 'last_seen'): + if r.get(k) and hasattr(r[k], 'isoformat'): + r[k] = r[k].isoformat() + return rows + + +def get_recent_resolved(hours: int = 24, limit: int = 50) -> list: + with get_conn() as conn: + with conn.cursor() as cur: + cur.execute( + """SELECT * FROM network_events + WHERE resolved_at IS NOT NULL + AND resolved_at > DATE_SUB(NOW(), INTERVAL %s HOUR) + ORDER BY resolved_at DESC LIMIT %s""", + (hours, limit), + ) + rows = cur.fetchall() + for r in rows: + for k in ('first_seen', 'last_seen', 'resolved_at'): + if r.get(k) and hasattr(r[k], 'isoformat'): + r[k] = r[k].isoformat() + return rows + + +def get_status_summary() -> dict: + with get_conn() as conn: + with conn.cursor() as cur: + cur.execute( + """SELECT severity, COUNT(*) as cnt FROM network_events + WHERE resolved_at IS NULL GROUP BY severity""" + ) + counts = {r['severity']: r['cnt'] for r in cur.fetchall()} + return { + 'critical': counts.get('critical', 0), + 'warning': counts.get('warning', 0), + 'info': counts.get('info', 0), + } + + +# --------------------------------------------------------------------------- +# Suppression rules +# --------------------------------------------------------------------------- + +def get_active_suppressions() -> list: + with get_conn() as conn: + with conn.cursor() as cur: + cur.execute( + """SELECT * FROM suppression_rules + WHERE active=TRUE AND (expires_at IS NULL OR expires_at > NOW()) + ORDER BY created_at DESC""" + ) + rows = cur.fetchall() + for r in rows: + for k in ('created_at', 'expires_at'): + if r.get(k) and hasattr(r[k], 'isoformat'): + r[k] = r[k].isoformat() + return rows + + +def get_suppression_history(limit: int = 50) -> list: + with get_conn() as conn: + with 
conn.cursor() as cur: + cur.execute( + 'SELECT * FROM suppression_rules ORDER BY created_at DESC LIMIT %s', + (limit,), + ) + rows = cur.fetchall() + for r in rows: + for k in ('created_at', 'expires_at'): + if r.get(k) and hasattr(r[k], 'isoformat'): + r[k] = r[k].isoformat() + return rows + + +def create_suppression( + target_type: str, + target_name: str, + target_detail: str, + reason: str, + suppressed_by: str, + expires_minutes: Optional[int] = None, +) -> int: + expires_at = None + if expires_minutes: + expires_at = datetime.utcnow() + timedelta(minutes=int(expires_minutes)) + with get_conn() as conn: + with conn.cursor() as cur: + cur.execute( + """INSERT INTO suppression_rules + (target_type, target_name, target_detail, reason, suppressed_by, expires_at, active) + VALUES (%s, %s, %s, %s, %s, %s, TRUE)""", + (target_type, target_name or '', target_detail or '', reason, suppressed_by, expires_at), + ) + return cur.lastrowid + + +def deactivate_suppression(sup_id: int) -> None: + with get_conn() as conn: + with conn.cursor() as cur: + cur.execute( + 'UPDATE suppression_rules SET active=FALSE WHERE id=%s', (sup_id,) + ) + + +def is_suppressed(target_type: str, target_name: str, target_detail: str = '') -> bool: + with get_conn() as conn: + with conn.cursor() as cur: + # Global suppression (all) + cur.execute( + """SELECT id FROM suppression_rules + WHERE active=TRUE AND (expires_at IS NULL OR expires_at > NOW()) + AND target_type='all' LIMIT 1""" + ) + if cur.fetchone(): + return True + + if not target_name: + return False + + # Host-level suppression (covers all interfaces on that host) + cur.execute( + """SELECT id FROM suppression_rules + WHERE active=TRUE AND (expires_at IS NULL OR expires_at > NOW()) + AND target_type=%s AND target_name=%s + AND (target_detail IS NULL OR target_detail='') LIMIT 1""", + (target_type, target_name), + ) + if cur.fetchone(): + return True + + # Interface/device-specific suppression + if target_detail: + cur.execute( + 
"""SELECT id FROM suppression_rules + WHERE active=TRUE AND (expires_at IS NULL OR expires_at > NOW()) + AND target_type=%s AND target_name=%s AND target_detail=%s LIMIT 1""", + (target_type, target_name, target_detail), + ) + if cur.fetchone(): + return True + + return False diff --git a/gandalf-monitor.service b/gandalf-monitor.service new file mode 100644 index 0000000..aa3296d --- /dev/null +++ b/gandalf-monitor.service @@ -0,0 +1,22 @@ +[Unit] +Description=Gandalf Network Monitor Daemon +Documentation=https://gitea.lotusguild.org/LotusGuild/gandalf +After=network.target +Wants=network-online.target + +[Service] +Type=simple +User=www-data +WorkingDirectory=/var/www/html/prod +ExecStart=/usr/bin/python3 /var/www/html/prod/monitor.py +Restart=on-failure +RestartSec=30 +TimeoutStopSec=10 + +# Logging +StandardOutput=journal +StandardError=journal +SyslogIdentifier=gandalf-monitor + +[Install] +WantedBy=multi-user.target diff --git a/monitor.py b/monitor.py new file mode 100644 index 0000000..8d09607 --- /dev/null +++ b/monitor.py @@ -0,0 +1,479 @@ +#!/usr/bin/env python3 +"""Gandalf network monitor daemon. + +Polls Prometheus (node_exporter) and the UniFi controller for network +interface and device state. Creates tickets in Tinker Tickets when issues +are detected, with deduplication and suppression support. + +Run as a separate systemd service alongside the Flask web app. 
+""" +import json +import logging +import re +import subprocess +import time +from datetime import datetime +from typing import Dict, List, Optional + +import requests +from urllib3.exceptions import InsecureRequestWarning + +import db + +requests.packages.urllib3.disable_warnings(InsecureRequestWarning) + +logging.basicConfig( + level=logging.INFO, + format='%(asctime)s %(levelname)s %(name)s %(message)s', +) +logger = logging.getLogger('gandalf.monitor') + +# -------------------------------------------------------------------------- +# Interface filtering +# -------------------------------------------------------------------------- +_SKIP_PREFIXES = ( + 'lo', 'veth', 'tap', 'fwbr', 'fwln', 'fwpr', + 'docker', 'dummy', 'br-', 'virbr', 'vmbr', +) +_VLAN_SUFFIX = re.compile(r'\.\d+$') + + +def is_physical_interface(name: str) -> bool: + """Return True for physical/bond interfaces worth monitoring.""" + if any(name.startswith(p) for p in _SKIP_PREFIXES): + return False + if _VLAN_SUFFIX.search(name): + return False + return True + + +# -------------------------------------------------------------------------- +# Prometheus client +# -------------------------------------------------------------------------- +class PrometheusClient: + def __init__(self, url: str): + self.url = url.rstrip('/') + + def query(self, promql: str) -> list: + try: + resp = requests.get( + f'{self.url}/api/v1/query', + params={'query': promql}, + timeout=15, + ) + resp.raise_for_status() + data = resp.json() + if data.get('status') == 'success': + return data['data']['result'] + except Exception as e: + logger.error(f'Prometheus query failed ({promql!r}): {e}') + return [] + + def get_interface_states(self) -> Dict[str, Dict[str, bool]]: + """Return {instance: {device: is_up}} for physical interfaces.""" + results = self.query('node_network_up') + hosts: Dict[str, Dict[str, bool]] = {} + for r in results: + instance = r['metric'].get('instance', '') + device = r['metric'].get('device', '') + 
if not is_physical_interface(device): + continue + hosts.setdefault(instance, {})[device] = (r['value'][1] == '1') + return hosts + + +# -------------------------------------------------------------------------- +# UniFi client +# -------------------------------------------------------------------------- +class UnifiClient: + def __init__(self, cfg: dict): + self.base_url = cfg['controller'] + self.site_id = cfg.get('site_id', 'default') + self.session = requests.Session() + self.session.verify = False + self.headers = { + 'X-API-KEY': cfg['api_key'], + 'Accept': 'application/json', + } + + def get_devices(self) -> Optional[List[dict]]: + """Return list of UniFi devices, or None if the controller is unreachable.""" + try: + url = f'{self.base_url}/proxy/network/v2/api/site/{self.site_id}/device' + resp = self.session.get(url, headers=self.headers, timeout=15) + resp.raise_for_status() + data = resp.json() + devices = [] + for d in data.get('network_devices', []): + state = d.get('state', 1) + devices.append({ + 'name': d.get('name') or d.get('mac', 'unknown'), + 'mac': d.get('mac', ''), + 'ip': d.get('ip', ''), + 'type': d.get('type', 'unknown'), + 'model': d.get('model', ''), + 'state': state, + 'connected': state == 1, + }) + return devices + except Exception as e: + logger.error(f'UniFi API error: {e}') + return None + + +# -------------------------------------------------------------------------- +# Ticket client +# -------------------------------------------------------------------------- +class TicketClient: + def __init__(self, cfg: dict): + self.url = cfg.get('url', '') + self.api_key = cfg.get('api_key', '') + + def create(self, title: str, description: str, priority: str = '2') -> Optional[str]: + if not self.api_key or not self.url: + logger.warning('Ticket API not configured – skipping ticket creation') + return None + try: + resp = requests.post( + self.url, + json={ + 'title': title, + 'description': description, + 'status': 'Open', + 'priority': 
priority, + 'category': 'Network', + 'type': 'Issue', + }, + headers={'Authorization': f'Bearer {self.api_key}'}, + timeout=15, + ) + resp.raise_for_status() + data = resp.json() + if data.get('success'): + tid = data['ticket_id'] + logger.info(f'Created ticket #{tid}: {title}') + return tid + if data.get('existing_ticket_id'): + logger.info(f'Duplicate suppressed by API – existing #{data["existing_ticket_id"]}') + return data['existing_ticket_id'] + logger.warning(f'Unexpected ticket API response: {data}') + except Exception as e: + logger.error(f'Ticket creation failed: {e}') + return None + + +# -------------------------------------------------------------------------- +# Helpers +# -------------------------------------------------------------------------- +def ping(ip: str, count: int = 3, timeout: int = 2) -> bool: + try: + r = subprocess.run( + ['ping', '-c', str(count), '-W', str(timeout), ip], + stdout=subprocess.DEVNULL, + stderr=subprocess.DEVNULL, + timeout=30, + ) + return r.returncode == 0 + except Exception: + return False + + +def _now_utc() -> str: + return datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S UTC') + + +# -------------------------------------------------------------------------- +# Monitor +# -------------------------------------------------------------------------- +CLUSTER_NAME = 'proxmox-cluster' + + +class NetworkMonitor: + def __init__(self): + with open('config.json') as f: + self.cfg = json.load(f) + + prom_url = self.cfg['prometheus']['url'] + self.prom = PrometheusClient(prom_url) + self.unifi = UnifiClient(self.cfg['unifi']) + self.tickets = TicketClient(self.cfg.get('ticket_api', {})) + + mon = self.cfg.get('monitor', {}) + self.poll_interval = mon.get('poll_interval', 120) + self.fail_thresh = mon.get('failure_threshold', 2) + self.cluster_thresh = mon.get('cluster_threshold', 3) + + # Build Prometheus instance → hostname lookup + self._instance_map: Dict[str, str] = { + h['prometheus_instance']: h['name'] + for h in 
self.cfg.get('hosts', []) + if 'prometheus_instance' in h + } + + def _hostname(self, instance: str) -> str: + return self._instance_map.get(instance, instance.split(':')[0]) + + # ------------------------------------------------------------------ + # Interface monitoring (Prometheus) + # ------------------------------------------------------------------ + def _process_interfaces(self, states: Dict[str, Dict[str, bool]]) -> None: + baseline = db.get_baseline() + new_baseline = {k: dict(v) for k, v in baseline.items()} + # Only count hosts with genuine regressions (UP→DOWN) toward cluster threshold + hosts_with_regression: List[str] = [] + + for instance, ifaces in states.items(): + host = self._hostname(instance) + new_baseline.setdefault(host, {}) + host_has_regression = False + + for iface, is_up in ifaces.items(): + prev = baseline.get(host, {}).get(iface) # 'up', 'initial_down', or None + + if is_up: + new_baseline[host][iface] = 'up' + db.resolve_event('interface_down', host, iface) + else: + if prev is None: + # First observation is down – could be unused port, don't alert + new_baseline[host][iface] = 'initial_down' + + elif prev == 'initial_down': + # Persistently down since first observation – no alert + pass + + else: # prev == 'up' + # Regression: was UP, now DOWN + host_has_regression = True + sup = ( + db.is_suppressed('interface', host, iface) or + db.is_suppressed('host', host) + ) + event_id, is_new, consec = db.upsert_event( + 'interface_down', 'critical', 'prometheus', + host, iface, + f'Interface {iface} on {host} went link-down ({_now_utc()})', + ) + if not sup and consec >= self.fail_thresh: + self._ticket_interface(event_id, is_new, host, iface, consec) + + if host_has_regression: + hosts_with_regression.append(host) + + db.set_baseline(new_baseline) + + # Cluster-wide check – only genuine regressions count + if len(hosts_with_regression) >= self.cluster_thresh: + sup = db.is_suppressed('all', '') + event_id, is_new, consec = db.upsert_event( 
+ 'cluster_network_issue', 'critical', 'prometheus', + CLUSTER_NAME, '', + f'{len(hosts_with_regression)} hosts reporting simultaneous interface failures: ' + f'{", ".join(hosts_with_regression)}', + ) + if not sup and is_new: + title = ( + f'[{CLUSTER_NAME}][auto][production][issue][network][cluster-wide] ' + f'Multiple hosts reporting interface failures' + ) + desc = ( + f'Cluster Network Alert\n{"=" * 40}\n\n' + f'Affected hosts: {", ".join(hosts_with_regression)}\n' + f'Detected: {_now_utc()}\n\n' + f'{len(hosts_with_regression)} Proxmox hosts simultaneously reported ' + f'interface regressions (link-down on interfaces previously known UP).\n' + f'This likely indicates a switch or upstream network failure.\n\n' + f'Please check the core and management switches immediately.' + ) + tid = self.tickets.create(title, desc, priority='1') + if tid: + db.set_ticket_id(event_id, tid) + else: + db.resolve_event('cluster_network_issue', CLUSTER_NAME, '') + + def _ticket_interface( + self, event_id: int, is_new: bool, host: str, iface: str, consec: int + ) -> None: + title = ( + f'[{host}][auto][production][issue][network][single-node] ' + f'Interface {iface} link-down' + ) + desc = ( + f'Network Interface Alert\n{"=" * 40}\n\n' + f'Host: {host}\n' + f'Interface: {iface}\n' + f'Detected: {_now_utc()}\n' + f'Consecutive check failures: {consec}\n\n' + f'Interface {iface} on {host} is reporting link-down state via ' + f'Prometheus node_exporter.\n\n' + f'Note: {host} may still be reachable via its other network interface.\n' + f'Please inspect the cable/SFP/switch port for {host}/{iface}.' 
+ ) + tid = self.tickets.create(title, desc, priority='2') + if tid and is_new: + db.set_ticket_id(event_id, tid) + + # ------------------------------------------------------------------ + # UniFi device monitoring + # ------------------------------------------------------------------ + def _process_unifi(self, devices: Optional[List[dict]]) -> None: + if devices is None: + logger.warning('UniFi API unreachable this cycle') + return + + for d in devices: + name = d['name'] + if not d['connected']: + sup = db.is_suppressed('unifi_device', name) + event_id, is_new, consec = db.upsert_event( + 'unifi_device_offline', 'critical', 'unifi', + name, d.get('type', ''), + f'UniFi {name} ({d.get("ip","")}) offline ({_now_utc()})', + ) + if not sup and consec >= self.fail_thresh: + self._ticket_unifi(event_id, is_new, d) + else: + db.resolve_event('unifi_device_offline', name, d.get('type', '')) + + def _ticket_unifi(self, event_id: int, is_new: bool, device: dict) -> None: + name = device['name'] + title = ( + f'[{name}][auto][production][issue][network][single-node] ' + f'UniFi device offline' + ) + desc = ( + f'UniFi Device Alert\n{"=" * 40}\n\n' + f'Device: {name}\n' + f'Type: {device.get("type","unknown")}\n' + f'Model: {device.get("model","")}\n' + f'Last Known IP: {device.get("ip","unknown")}\n' + f'Detected: {_now_utc()}\n\n' + f'The UniFi device {name} is offline per the UniFi controller.\n' + f'Please check power and cable connectivity.' 
+ ) + tid = self.tickets.create(title, desc, priority='2') + if tid and is_new: + db.set_ticket_id(event_id, tid) + + # ------------------------------------------------------------------ + # Ping-only hosts (no node_exporter) + # ------------------------------------------------------------------ + def _process_ping_hosts(self) -> None: + for h in self.cfg.get('monitor', {}).get('ping_hosts', []): + name, ip = h['name'], h['ip'] + reachable = ping(ip) + + if not reachable: + sup = db.is_suppressed('host', name) + event_id, is_new, consec = db.upsert_event( + 'host_unreachable', 'critical', 'ping', + name, ip, + f'Host {name} ({ip}) unreachable via ping ({_now_utc()})', + ) + if not sup and consec >= self.fail_thresh: + self._ticket_unreachable(event_id, is_new, name, ip, consec) + else: + db.resolve_event('host_unreachable', name, ip) + + def _ticket_unreachable( + self, event_id: int, is_new: bool, name: str, ip: str, consec: int + ) -> None: + title = ( + f'[{name}][auto][production][issue][network][single-node] ' + f'Host unreachable' + ) + desc = ( + f'Host Reachability Alert\n{"=" * 40}\n\n' + f'Host: {name}\n' + f'IP: {ip}\n' + f'Detected: {_now_utc()}\n' + f'Consecutive check failures: {consec}\n\n' + f'Host {name} ({ip}) is not responding to ping from the Gandalf monitor.\n' + f'This host does not have a Prometheus node_exporter, so interface-level ' + f'detail is unavailable.\n\n' + f'Please check the host power, management interface, and network connectivity.' 
+ ) + tid = self.tickets.create(title, desc, priority='2') + if tid and is_new: + db.set_ticket_id(event_id, tid) + + # ------------------------------------------------------------------ + # Snapshot collection (for dashboard) + # ------------------------------------------------------------------ + def _collect_snapshot(self) -> dict: + iface_states = self.prom.get_interface_states() + unifi_devices = self.unifi.get_devices() or [] + + hosts = {} + for instance, ifaces in iface_states.items(): + host = self._hostname(instance) + phys = {k: v for k, v in ifaces.items()} + up_count = sum(1 for v in phys.values() if v) + total = len(phys) + if total == 0 or up_count == total: + status = 'up' + elif up_count == 0: + status = 'down' + else: + status = 'degraded' + + hosts[host] = { + 'ip': instance.split(':')[0], + 'interfaces': {k: ('up' if v else 'down') for k, v in phys.items()}, + 'status': status, + 'source': 'prometheus', + } + + for h in self.cfg.get('monitor', {}).get('ping_hosts', []): + name, ip = h['name'], h['ip'] + reachable = ping(ip, count=1, timeout=2) + hosts[name] = { + 'ip': ip, + 'interfaces': {}, + 'status': 'up' if reachable else 'down', + 'source': 'ping', + } + + return { + 'hosts': hosts, + 'unifi': unifi_devices, + 'updated': datetime.utcnow().isoformat(), + } + + # ------------------------------------------------------------------ + # Main loop + # ------------------------------------------------------------------ + def run(self) -> None: + logger.info( + f'Gandalf monitor started – poll_interval={self.poll_interval}s ' + f'fail_thresh={self.fail_thresh}' + ) + while True: + try: + logger.info('Starting network check cycle') + + # 1. Collect and store snapshot for dashboard + snapshot = self._collect_snapshot() + db.set_state('network_snapshot', snapshot) + db.set_state('last_check', _now_utc()) + + # 2. 
Process alerts (separate Prometheus call for fresh data) + iface_states = self.prom.get_interface_states() + self._process_interfaces(iface_states) + + unifi_devices = self.unifi.get_devices() + self._process_unifi(unifi_devices) + + self._process_ping_hosts() + + logger.info('Network check cycle complete') + + except Exception as e: + logger.error(f'Monitor loop error: {e}', exc_info=True) + + time.sleep(self.poll_interval) + + +if __name__ == '__main__': + monitor = NetworkMonitor() + monitor.run() diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 0000000..e506263 --- /dev/null +++ b/requirements.txt @@ -0,0 +1,5 @@ +flask>=2.2.0 +gunicorn>=20.1.0 +pymysql>=1.1.0 +requests>=2.31.0 +urllib3>=2.0.0 diff --git a/schema.sql b/schema.sql new file mode 100644 index 0000000..d663361 --- /dev/null +++ b/schema.sql @@ -0,0 +1,50 @@ +-- Gandalf Network Monitor – Database Schema +-- Run on MariaDB LXC 149 (10.10.10.50) + +CREATE DATABASE IF NOT EXISTS gandalf + CHARACTER SET utf8mb4 + COLLATE utf8mb4_unicode_ci; + +USE gandalf; + +-- ── Network events (open and resolved alerts) ───────────────────────── +CREATE TABLE IF NOT EXISTS network_events ( + id INT AUTO_INCREMENT PRIMARY KEY, + event_type VARCHAR(60) NOT NULL, + severity ENUM('critical','warning','info') NOT NULL DEFAULT 'warning', + source_type VARCHAR(20) NOT NULL, -- 'prometheus', 'unifi', 'ping' + target_name VARCHAR(255) NOT NULL, -- hostname or device name + target_detail VARCHAR(255) NOT NULL DEFAULT '', -- interface name, device type, IP + description TEXT, + first_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + last_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, + resolved_at TIMESTAMP NULL, + consecutive_failures INT NOT NULL DEFAULT 1, + ticket_id VARCHAR(20) NULL, + + INDEX idx_active (resolved_at), + INDEX idx_target (target_name, target_detail), + INDEX idx_type (event_type) +) ENGINE=InnoDB; + +-- ── Suppression rules 
───────────────────────────────────────────────── +CREATE TABLE IF NOT EXISTS suppression_rules ( + id INT AUTO_INCREMENT PRIMARY KEY, + target_type VARCHAR(50) NOT NULL, -- 'host', 'interface', 'unifi_device', 'all' + target_name VARCHAR(255) NOT NULL DEFAULT '', + target_detail VARCHAR(255) NOT NULL DEFAULT '', + reason TEXT NOT NULL, + suppressed_by VARCHAR(255) NOT NULL, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + expires_at TIMESTAMP NULL, -- NULL = manual (never auto-expires) + active BOOLEAN NOT NULL DEFAULT TRUE, + + INDEX idx_active_exp (active, expires_at) +) ENGINE=InnoDB; + +-- ── Monitor state (key/value store for snapshot + baseline) ─────────── +CREATE TABLE IF NOT EXISTS monitor_state ( + key_name VARCHAR(100) PRIMARY KEY, + value MEDIUMTEXT NOT NULL, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP +) ENGINE=InnoDB; diff --git a/static/app.js b/static/app.js index f1ce2ed..73095be 100644 --- a/static/app.js +++ b/static/app.js @@ -1,147 +1,272 @@ -// Initialization -const UPDATE_INTERVALS = { - deviceStatus: 30000, - diagnostics: 60000 -}; +'use strict'; -// Core update functions -function updateDeviceStatus() { - console.log('Fetching device status...'); - fetch('/api/status') - .then(response => response.json()) - .then(data => { - console.log('Received status data:', data); - Object.entries(data).forEach(([deviceName, status]) => { - const deviceElement = document.querySelector(`.device-status[data-device-name="${deviceName}"]`); - if (deviceElement) { - const indicator = deviceElement.querySelector('.status-indicator'); - indicator.className = `status-indicator status-${status ? 
'up' : 'down'}`; - } - }); - }); +// ── Toast notifications ─────────────────────────────────────────────── +function showToast(msg, type = 'success') { + let container = document.querySelector('.toast-container'); + if (!container) { + container = document.createElement('div'); + container.className = 'toast-container'; + document.body.appendChild(container); + } + const toast = document.createElement('div'); + toast.className = `toast toast-${type}`; + toast.textContent = msg; + container.appendChild(toast); + setTimeout(() => toast.remove(), 3500); } -function toggleInterfaces(header) { - const list = header.nextElementSibling; - const icon = header.querySelector('.expand-icon'); - list.classList.toggle('collapsed'); - icon.style.transform = list.classList.contains('collapsed') ? 'rotate(-90deg)' : 'rotate(0deg)'; +// ── Dashboard auto-refresh ──────────────────────────────────────────── +async function refreshAll() { + try { + const [netResp, statusResp] = await Promise.all([ + fetch('/api/network'), + fetch('/api/status'), + ]); + if (!netResp.ok || !statusResp.ok) return; + + const net = await netResp.json(); + const status = await statusResp.json(); + + updateHostGrid(net.hosts || {}); + updateUnifiTable(net.unifi || []); + updateEventsTable(status.events || []); + updateStatusBar(status.summary || {}, status.last_check || ''); + updateTopology(net.hosts || {}); + + } catch (e) { + console.warn('Refresh failed:', e); + } } -function updateInterfaceStatus(deviceName, interfaces) { - const interfaceList = document.querySelector(`.interface-group[data-device-name="${deviceName}"] .interface-list`); - if (interfaceList && interfaces) { - interfaceList.innerHTML = ''; - Object.entries(interfaces.ports || {}).forEach(([portName, port]) => { - interfaceList.innerHTML += ` -
- ${portName} - ${port.speed.current}/${port.speed.max} Mbps - ${port.state} -
- `; - }); +function updateStatusBar(summary, lastCheck) { + const bar = document.querySelector('.status-chips'); + if (!bar) return; + const chips = []; + if (summary.critical) chips.push(`⬤ ${summary.critical} Critical`); + if (summary.warning) chips.push(`⬤ ${summary.warning} Warning`); + if (!summary.critical && !summary.warning) chips.push('✔ All systems nominal'); + bar.innerHTML = chips.join(''); + + const lc = document.getElementById('last-check'); + if (lc && lastCheck) lc.textContent = `Last check: ${lastCheck}`; +} + +function updateHostGrid(hosts) { + for (const [name, host] of Object.entries(hosts)) { + const card = document.querySelector(`.host-card[data-host="${CSS.escape(name)}"]`); + if (!card) continue; + + // Update card border class + card.className = card.className.replace(/host-card-(up|down|degraded|unknown)/g, ''); + card.classList.add(`host-card-${host.status}`); + + // Update status dot in header + const dot = card.querySelector('.host-status-dot'); + if (dot) dot.className = `host-status-dot dot-${host.status}`; + + // Update interface rows + const ifaceList = card.querySelector('.iface-list'); + if (ifaceList && host.interfaces && Object.keys(host.interfaces).length > 0) { + ifaceList.innerHTML = Object.entries(host.interfaces) + .sort(([a], [b]) => a.localeCompare(b)) + .map(([iface, state]) => ` +
+ + ${escHtml(iface)} + ${state} +
+ `).join(''); } + } } -function updateSystemHealth(deviceName, diagnostics) { - const metricsContainer = document.querySelector(`.health-metrics[data-device-name="${deviceName}"] .metrics-list`); - if (metricsContainer && diagnostics) { - const cpu = metricsContainer.querySelector('.cpu'); - const memory = metricsContainer.querySelector('.memory'); - const temperature = metricsContainer.querySelector('.temperature'); - - cpu.innerHTML = `CPU: ${diagnostics.system?.cpu || 'N/A'}%`; - memory.innerHTML = `Memory: ${diagnostics.system?.memory || 'N/A'}%`; - temperature.innerHTML = `Temp: ${diagnostics.system?.temperature || 'N/A'}°C`; +function updateTopology(hosts) { + document.querySelectorAll('.topo-host').forEach(node => { + const name = node.dataset.host; + const host = hosts[name]; + if (!host) return; + node.className = node.className.replace(/topo-status-(up|down|degraded|unknown)/g, ''); + node.classList.add(`topo-status-${host.status}`); + const badge = node.querySelector('.topo-badge'); + if (badge) { + badge.className = `topo-badge topo-badge-${host.status}`; + badge.textContent = host.status; } + }); } -function updateSystemMetrics() { - fetch('/api/metrics') - .then(response => response.json()) - .then(data => { - updateInterfaceStatus(data.interfaces); - updatePowerMetrics(data.power); - updateSystemHealth(data.health); - }); +function updateUnifiTable(devices) { + const tbody = document.querySelector('#unifi-table tbody'); + if (!tbody || !devices.length) return; + + tbody.innerHTML = devices.map(d => { + const statusClass = d.connected ? '' : 'row-critical'; + const dotClass = d.connected ? 'dot-up' : 'dot-down'; + const statusText = d.connected ? 'Online' : 'Offline'; + const suppressBtn = !d.connected + ? 
`` + : ''; + return ` + + ${statusText} + ${escHtml(d.name)} + ${escHtml(d.type)} + ${escHtml(d.model)} + ${escHtml(d.ip)} + ${suppressBtn} + `; + }).join(''); } -//Metric updates like interfaces, power, and health +function updateEventsTable(events) { + const wrap = document.getElementById('events-table-wrap'); + if (!wrap) return; -function updateDiagnostics() { - fetch('/api/diagnostics') - .then(response => response.json()) - .then(data => { - Object.entries(data).forEach(([deviceName, diagnostics]) => { - updateInterfaceStatus(deviceName, diagnostics.interfaces); - updateSystemHealth(deviceName, diagnostics); - }); - }); + const active = events.filter(e => e.severity !== 'info'); + if (!active.length) { + wrap.innerHTML = '

No active alerts ✔

'; + return; + } + + const rows = active.map(e => { + const supType = e.event_type === 'unifi_device_offline' ? 'unifi_device' + : e.event_type === 'interface_down' ? 'interface' + : 'host'; + const ticket = e.ticket_id + ? `#${e.ticket_id}` + : '–'; + return ` + + ${e.severity} + ${escHtml(e.event_type.replace(/_/g,' '))} + ${escHtml(e.target_name)} + ${escHtml(e.target_detail || '–')} + ${escHtml((e.description||'').substring(0,60))}${(e.description||'').length>60?'…':''} + ${escHtml(e.first_seen||'')} + ${e.consecutive_failures} + ${ticket} + + + + `; + }).join(''); + + wrap.innerHTML = ` + + + + + + + + ${rows} +
SeverityTypeTargetDetailDescriptionFirst SeenFailuresTicketActions
`; } -// Element creation functions -function createDiagnosticElement(device, diagnostics) { - const element = document.createElement('div'); - element.className = `diagnostic-item ${diagnostics.connection_type}-diagnostic`; - - const content = ` -

${device}

-
-
- Status: - ${diagnostics.state} -
-
- Firmware: - ${diagnostics.firmware.version} -
- ${createInterfaceHTML(diagnostics.interfaces)} -
- `; - - element.innerHTML = content; - return element; +// ── Suppression modal (dashboard) ──────────────────────────────────── +function openSuppressModal(type, name, detail) { + const modal = document.getElementById('suppress-modal'); + if (!modal) return; + + document.getElementById('sup-type').value = type; + document.getElementById('sup-name').value = name; + document.getElementById('sup-detail').value = detail; + document.getElementById('sup-reason').value = ''; + document.getElementById('sup-expires').value = ''; + + updateSuppressForm(); + modal.style.display = 'flex'; + + document.querySelectorAll('#suppress-modal .pill').forEach(p => p.classList.remove('active')); + const manualPill = document.querySelector('#suppress-modal .pill-manual'); + if (manualPill) manualPill.classList.add('active'); + const hint = document.getElementById('duration-hint'); + if (hint) hint.textContent = 'Suppression will persist until manually removed.'; } -function createInterfaceHTML(interfaces) { - let html = '
'; - - // Add port information - Object.entries(interfaces.ports || {}).forEach(([portName, port]) => { - html += ` -
- ${portName}: - ${port.speed.current}/${port.speed.max} Mbps - ${port.state} -
- `; +function closeSuppressModal() { + const modal = document.getElementById('suppress-modal'); + if (modal) modal.style.display = 'none'; +} + +function updateSuppressForm() { + const type = document.getElementById('sup-type').value; + const nameGrp = document.getElementById('sup-name-group'); + const detailGrp = document.getElementById('sup-detail-group'); + if (nameGrp) nameGrp.style.display = (type === 'all') ? 'none' : ''; + if (detailGrp) detailGrp.style.display = (type === 'interface') ? '' : 'none'; +} + +function setDuration(mins) { + document.getElementById('sup-expires').value = mins || ''; + + document.querySelectorAll('#suppress-modal .pill').forEach(p => p.classList.remove('active')); + event.currentTarget.classList.add('active'); + + const hint = document.getElementById('duration-hint'); + if (hint) { + if (mins) { + const h = Math.floor(mins / 60), m = mins % 60; + hint.textContent = `Expires in ${h ? h + 'h ' : ''}${m ? m + 'm' : ''}.`; + } else { + hint.textContent = 'Suppression will persist until manually removed.'; + } + } +} + +async function submitSuppress(e) { + e.preventDefault(); + const type = document.getElementById('sup-type').value; + const name = document.getElementById('sup-name').value; + const detail = document.getElementById('sup-detail').value; + const reason = document.getElementById('sup-reason').value; + const expires = document.getElementById('sup-expires').value; + + if (!reason.trim()) { showToast('Reason is required', 'error'); return; } + if (type !== 'all' && !name.trim()) { showToast('Target name is required', 'error'); return; } + + try { + const resp = await fetch('/api/suppressions', { + method: 'POST', + headers: { 'Content-Type': 'application/json' }, + body: JSON.stringify({ + target_type: type, + target_name: name, + target_detail: detail, + reason: reason, + expires_minutes: expires ? 
parseInt(expires, 10) : null, + }), }); - - // Add radio information - Object.entries(interfaces.radios || {}).forEach(([radioName, radio]) => { - html += ` -
- ${radioName}: - ${radio.standard} - Ch${radio.channel} (${radio.width}) -
- `; - }); - - html += '
'; - return html; + const data = await resp.json(); + if (data.success) { + closeSuppressModal(); + showToast('Suppression applied ✔', 'success'); + setTimeout(refreshAll, 500); + } else { + showToast(data.error || 'Failed to apply suppression', 'error'); + } + } catch (err) { + showToast('Network error', 'error'); + } } -// Initialize updates -function initializeUpdates() { - // Set update intervals - setInterval(updateDeviceStatus, UPDATE_INTERVALS.deviceStatus); - setInterval(updateDiagnostics, UPDATE_INTERVALS.diagnostics); +// ── Close modal on backdrop click ───────────────────────────────────── +document.addEventListener('click', e => { + const modal = document.getElementById('suppress-modal'); + if (modal && e.target === modal) closeSuppressModal(); +}); - // Initial updates - updateDeviceStatus(); - updateDiagnostics(); +// ── Utility ─────────────────────────────────────────────────────────── +function escHtml(str) { + if (str === null || str === undefined) return ''; + return String(str) + .replace(/&/g, '&amp;') + .replace(/</g, '&lt;') + .replace(/>/g, '&gt;') + .replace(/"/g, '&quot;'); } - -// Start the application -initializeUpdates(); \ No newline at end of file diff --git a/static/style.css b/static/style.css index a5cdeb3..051dde9 100644 --- a/static/style.css +++ b/static/style.css @@ -1,222 +1,747 @@ +/* ── Variables ──────────────────────────────────────────────────────── */ :root { - --primary-color: #006FFF; - --secondary-color: #00439C; - --background-color: #f8f9fa; - --card-background: #ffffff; - --text-color: #2c3e50; - --border-radius: 12px; + --blue: #006FFF; + --blue-dark: #00439C; + --blue-dim: rgba(0,111,255,.1); + --green: #10B981; + --red: #EF4444; + --orange: #F59E0B; + --yellow: #FBBF24; + --grey: #6B7280; + --grey-lt: #F3F4F6; + --border: #E5E7EB; + --text: #111827; + --text-sub: #6B7280; + --card-bg: #FFFFFF; + --bg: #F8FAFC; + --radius: 10px; + --shadow: 0 1px 3px rgba(0,0,0,.08), 0 4px 12px rgba(0,0,0,.06); + --font: -apple-system, BlinkMacSystemFont, 'Segoe
UI', Roboto, sans-serif; + --mono: 'SF Mono', 'Fira Code', Consolas, monospace; } +/* ── Reset ──────────────────────────────────────────────────────────── */ +*, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; } + body { - font-family: 'Inter', -apple-system, sans-serif; - background-color: var(--background-color); - color: var(--text-color); - margin: 0; - padding: 0; + font-family: var(--font); + background: var(--bg); + color: var(--text); + font-size: 14px; + line-height: 1.5; } -.container { - max-width: 1400px; - margin: 0 auto; - padding: 20px; +a { color: var(--blue); text-decoration: none; } +a:hover { text-decoration: underline; } + +/* ── Navbar ─────────────────────────────────────────────────────────── */ +.navbar { + background: linear-gradient(135deg, var(--blue-dark) 0%, var(--blue) 100%); + color: white; + display: flex; + align-items: center; + gap: 24px; + padding: 0 24px; + height: 56px; + box-shadow: 0 2px 8px rgba(0,0,0,.2); } -.header { - background: linear-gradient(to right, var(--primary-color), var(--secondary-color)); - color: white; - padding: 20px; - border-radius: var(--border-radius); - margin-bottom: 30px; +.nav-brand { + display: flex; + align-items: center; + gap: 8px; + flex-shrink: 0; } -.metrics-container { - display: grid; - grid-template-columns: repeat(auto-fit, minmax(350px, 1fr)); - gap: 25px; - margin-top: 20px; +.nav-logo { font-size: 20px; } + +.nav-title { + font-weight: 700; + font-size: 16px; + letter-spacing: .05em; } -.metric-card { - background: var(--card-background); - padding: 25px; - border-radius: var(--border-radius); - box-shadow: 0 4px 6px rgba(0,0,0,0.07); - transition: transform 0.2s ease; +.nav-sub { + font-size: 11px; + opacity: .7; + font-weight: 400; } -.metric-card:hover { - transform: translateY(-5px); +.nav-links { + display: flex; + gap: 4px; + flex: 1; } -.device-status { - display: flex; - align-items: center; - gap: 10px; - margin: 10px 0; +.nav-link { + color: 
rgba(255,255,255,.8); + padding: 6px 14px; + border-radius: 6px; + font-size: 13px; + transition: background .15s, color .15s; } -.status-indicator { - width: 12px; - height: 12px; - border-radius: 50%; +.nav-link:hover, .nav-link.active { + background: rgba(255,255,255,.15); + color: white; + text-decoration: none; } -.status-up { - background-color: #10B981; +.nav-user { + font-size: 12px; + opacity: .8; } -.status-down { - background-color: #EF4444; +/* ── Main layout ─────────────────────────────────────────────────────── */ +.main { max-width: 1400px; margin: 0 auto; padding: 24px 20px; } + +.page-header { margin-bottom: 24px; } +.page-title { font-size: 22px; font-weight: 700; } +.page-sub { color: var(--text-sub); margin-top: 4px; } + +/* ── Status bar ──────────────────────────────────────────────────────── */ +.status-bar { + display: flex; + align-items: center; + justify-content: space-between; + gap: 16px; + background: var(--card-bg); + border: 1px solid var(--border); + border-radius: var(--radius); + padding: 12px 20px; + margin-bottom: 24px; + box-shadow: var(--shadow); } -.diagnostics-panel { - margin-top: 15px; +.status-chips { display: flex; gap: 8px; flex-wrap: wrap; } + +.chip { + display: inline-flex; + align-items: center; + gap: 6px; + padding: 5px 12px; + border-radius: 20px; + font-size: 13px; + font-weight: 600; } -.diagnostic-item { - padding: 10px; - border-left: 4px solid var(--primary-color); - margin: 10px 0; - background: rgba(0,111,255,0.1); +.chip-critical { background: rgba(239,68,68,.12); color: var(--red); border: 1px solid rgba(239,68,68,.3); } +.chip-warning { background: rgba(245,158,11,.12); color: var(--orange); border: 1px solid rgba(245,158,11,.3); } +.chip-ok { background: rgba(16,185,129,.12); color: var(--green); border: 1px solid rgba(16,185,129,.3); } + +.status-meta { + display: flex; + align-items: center; + gap: 12px; + white-space: nowrap; } -.fiber-diagnostic { - border-color: #10B981; +.last-check { font-size: 
12px; color: var(--text-sub); } + +.btn-refresh { + background: var(--blue-dim); + border: 1px solid rgba(0,111,255,.3); + color: var(--blue); + border-radius: 6px; + padding: 4px 12px; + font-size: 12px; + cursor: pointer; + transition: background .15s; +} +.btn-refresh:hover { background: rgba(0,111,255,.2); } + +/* ── Sections ────────────────────────────────────────────────────────── */ +.section { margin-bottom: 32px; } + +.section-title { + font-size: 16px; + font-weight: 700; + margin-bottom: 14px; + display: flex; + align-items: center; + gap: 8px; } -.copper-diagnostic { - border-color: #F59E0B; +.section-badge { + font-size: 11px; + font-weight: 600; + background: var(--red); + color: white; + padding: 2px 7px; + border-radius: 10px; } -.device-info { - display: flex; - flex-direction: column; - gap: 4px; +.section-badge:not(.badge-critical) { + background: var(--grey); } -.device-details { - font-size: 0.8em; - color: #666; +/* ── Topology diagram ────────────────────────────────────────────────── */ +.topology { + background: var(--card-bg); + border: 1px solid var(--border); + border-radius: var(--radius); + padding: 20px 16px 16px; + margin-bottom: 20px; + text-align: center; + box-shadow: var(--shadow); + overflow-x: auto; } -.diagnostic-details { - display: grid; - gap: 15px; - padding: 10px; +.topo-row { + display: flex; + justify-content: center; + gap: 16px; + flex-wrap: wrap; } -.status-group, .firmware-group, .interfaces-group { - display: flex; - flex-direction: column; - gap: 8px; +.topo-row-internet { margin-bottom: 4px; } +.topo-hosts-row { flex-wrap: wrap; gap: 12px; } + +.topo-connectors { + display: flex; + justify-content: center; + gap: 80px; + height: 20px; + margin: 2px 0; } -.interface-item { - display: flex; - align-items: center; - gap: 10px; +.topo-connectors.single { gap: 0; } +.topo-connectors.wide { gap: 60px; } + +.topo-line { + width: 2px; + height: 100%; + background: var(--border); } -.label { - font-weight: 500; - color: 
#666; +.topo-line-labeled { + position: relative; +} +.topo-line-labeled::after { + content: attr(data-link-label); + position: absolute; + left: 6px; + top: 50%; + transform: translateY(-50%); + font-size: 10px; + color: var(--text-dim); + white-space: nowrap; } -.value { - font-family: monospace; -} -.interface-header { - display: flex; - justify-content: space-between; - align-items: center; - padding: 10px; - cursor: pointer; - background: rgba(0,111,255,0.05); - border-radius: 8px; - margin-bottom: 5px; +.topo-node { + display: flex; + flex-direction: column; + align-items: center; + gap: 4px; + padding: 8px 14px; + border-radius: 8px; + border: 1.5px solid var(--border); + background: var(--grey-lt); + min-width: 100px; + font-size: 12px; + position: relative; + transition: border-color .2s; } -.interface-header:hover { - background: rgba(0,111,255,0.1); +.topo-internet { + border-color: var(--blue); + background: var(--blue-dim); + font-weight: 600; } -.interface-list { - max-height: 500px; - overflow-y: auto; - transition: max-height 0.3s ease-out; +.topo-switch { + border-color: var(--blue); + background: var(--blue-dim); } -.interface-list.collapsed { - max-height: 0; - overflow: hidden; +.topo-host { cursor: default; } + +.topo-icon { font-size: 16px; } + +.topo-label { + font-weight: 500; + font-size: 11px; + text-align: center; } -.interface-item { - display: grid; - grid-template-columns: 1fr 1fr auto; - padding: 8px; - border-bottom: 1px solid #eee; - align-items: center; +.topo-badge { + font-size: 10px; + padding: 2px 6px; + border-radius: 4px; + font-weight: 600; } -.port-status { - padding: 4px 8px; - border-radius: 4px; - font-size: 0.8em; - font-weight: 500; +.topo-badge-up { background: rgba(16,185,129,.15); color: var(--green); } +.topo-badge-down { background: rgba(239,68,68,.15); color: var(--red); } +.topo-badge-degraded { background: rgba(245,158,11,.15); color: var(--orange); } + 
+.topo-status-down { border-color: var(--red); } +.topo-status-degraded { border-color: var(--orange); } + +.topo-status-up { border-color: var(--green); } +.topo-status-dot { + width: 8px; height: 8px; + border-radius: 50%; + background: var(--grey); + position: absolute; + top: 6px; right: 6px; } -.port-status.up { - background-color: #10B981; - color: white; +/* ── Host cards ──────────────────────────────────────────────────────── */ +.host-grid { + display: grid; + grid-template-columns: repeat(auto-fill, minmax(240px, 1fr)); + gap: 14px; } -.port-status.down { - background-color: #EF4444; - color: white; +.host-card { + background: var(--card-bg); + border: 1.5px solid var(--border); + border-radius: var(--radius); + padding: 14px; + box-shadow: var(--shadow); + transition: border-color .2s, box-shadow .2s; } -.expand-icon { - transition: transform 0.3s ease; +.host-card:hover { box-shadow: 0 4px 16px rgba(0,0,0,.1); } + +.host-card-up { border-left: 4px solid var(--green); } +.host-card-down { border-left: 4px solid var(--red); } +.host-card-degraded { border-left: 4px solid var(--orange); } + +.host-card-header { margin-bottom: 10px; } + +.host-name-row { + display: flex; + align-items: center; + gap: 7px; + margin-bottom: 4px; } -.collapsed + .expand-icon { - transform: rotate(-90deg); +.host-name { + font-weight: 700; + font-size: 14px; } -.port-speed { - font-family: monospace; - color: var(--secondary-color); -} -.metrics-list { - display: grid; - grid-template-columns: repeat(3, 1fr); - gap: 15px; - margin-top: 10px; +.host-meta { + display: flex; + gap: 8px; + align-items: center; } -.metric-item { - background: rgba(0,111,255,0.1); - padding: 10px; - border-radius: 8px; - text-align: center; -} -.online { - color: #10B981; +.host-ip { + font-family: var(--mono); + font-size: 11px; + color: var(--text-sub); } -.offline { - color: #EF4444; +.host-source { + font-size: 10px; + padding: 1px 6px; + border-radius: 4px; + font-weight: 600; + background: 
var(--grey-lt); + color: var(--text-sub); } -.interface-grid { - display: grid; - grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); - gap: 15px; +.source-prometheus { color: #E6522C; background: rgba(230,82,44,.1); } +.source-ping { color: var(--blue); background: var(--blue-dim); } + +.iface-list { + border-top: 1px solid var(--border); + padding-top: 8px; + margin-bottom: 10px; } -.metric-value { - font-family: monospace; - font-size: 1.2em; - color: var(--primary-color); +.iface-row { + display: flex; + align-items: center; + gap: 7px; + padding: 3px 0; +} + +.iface-name { + font-family: var(--mono); + font-size: 12px; + flex: 1; + color: var(--text); +} + +.iface-state { + font-size: 11px; + font-weight: 600; +} + +.state-up { color: var(--green); } +.state-down { color: var(--red); } + +.host-ping-note { + font-size: 11px; + color: var(--text-sub); + font-style: italic; + margin-bottom: 10px; + padding-top: 6px; + border-top: 1px solid var(--border); +} + +.host-actions { + border-top: 1px solid var(--border); + padding-top: 8px; +} + +/* ── Status dots ─────────────────────────────────────────────────────── */ +.host-status-dot, .iface-dot, .dot-up, .dot-down, .dot-degraded, .dot-unknown { + display: inline-block; + width: 10px; + height: 10px; + border-radius: 50%; + flex-shrink: 0; +} + +.dot-up, .host-status-dot.dot-up { background: var(--green); box-shadow: 0 0 0 2px rgba(16,185,129,.2); } +.dot-down, .host-status-dot.dot-down { background: var(--red); box-shadow: 0 0 0 2px rgba(239,68,68,.2); animation: pulse-red 2s infinite; } +.dot-degraded { background: var(--orange); box-shadow: 0 0 0 2px rgba(245,158,11,.2); } +.dot-unknown { background: var(--grey); } + +@keyframes pulse-red { + 0%,100% { box-shadow: 0 0 0 2px rgba(239,68,68,.2); } + 50% { box-shadow: 0 0 0 5px rgba(239,68,68,.4); } +} + +/* ── Badges ──────────────────────────────────────────────────────────── */ +.badge { + display: inline-block; + padding: 2px 8px; + border-radius: 
6px; + font-size: 11px; + font-weight: 700; + text-transform: uppercase; + letter-spacing: .04em; +} + +.badge-critical { background: rgba(239,68,68,.12); color: var(--red); } +.badge-warning { background: rgba(245,158,11,.12); color: var(--orange); } +.badge-info { background: rgba(0,111,255,.1); color: var(--blue); } +.badge-ok { background: rgba(16,185,129,.12); color: var(--green); } +.badge-neutral { background: var(--grey-lt); color: var(--grey); } +.badge-suppressed { background: rgba(107,114,128,.12); color: var(--grey); font-size: 14px; padding: 0; } + +/* ── Tables ──────────────────────────────────────────────────────────── */ +.table-wrap { + background: var(--card-bg); + border: 1px solid var(--border); + border-radius: var(--radius); + box-shadow: var(--shadow); + overflow: hidden; +} + +.data-table { + width: 100%; + border-collapse: collapse; +} + +.data-table th { + background: var(--grey-lt); + padding: 10px 14px; + text-align: left; + font-size: 11px; + font-weight: 700; + color: var(--text-sub); + text-transform: uppercase; + letter-spacing: .06em; + border-bottom: 1px solid var(--border); + white-space: nowrap; +} + +.data-table td { + padding: 10px 14px; + border-bottom: 1px solid var(--border); + vertical-align: middle; +} + +.data-table tr:last-child td { border-bottom: none; } + +.data-table tr:hover td { background: rgba(0,111,255,.03); } + +.row-critical td { background: rgba(239,68,68,.04); } +.row-critical td:first-child { border-left: 3px solid var(--red); } + +.row-warning td { background: rgba(245,158,11,.04); } +.row-warning td:first-child { border-left: 3px solid var(--orange); } + +.row-resolved td { opacity: .6; } + +.data-table-sm td, .data-table-sm th { padding: 7px 12px; font-size: 12px; } + +.ts-cell { font-family: var(--mono); font-size: 11px; color: var(--text-sub); } +.desc-cell { max-width: 300px; white-space: nowrap; overflow: hidden; text-overflow: ellipsis; } +.ticket-link { font-family: var(--mono); font-weight: 600; 
} + +.empty-state { padding: 32px; text-align: center; color: var(--text-sub); } +.empty-row td { text-align: center; color: var(--text-sub); } + +/* ── Buttons ─────────────────────────────────────────────────────────── */ +.btn { + display: inline-flex; + align-items: center; + gap: 6px; + padding: 8px 16px; + border-radius: 6px; + border: none; + cursor: pointer; + font-size: 13px; + font-weight: 600; + transition: opacity .15s, background .15s; +} + +.btn:hover { opacity: .88; } +.btn:active { opacity: .75; } + +.btn-primary { background: var(--blue); color: white; } +.btn-secondary { background: var(--grey-lt); color: var(--text); border: 1px solid var(--border); } +.btn-danger { background: rgba(239,68,68,.1); color: var(--red); border: 1px solid rgba(239,68,68,.2); } +.btn-lg { padding: 10px 20px; font-size: 14px; } + +.btn-sm { + padding: 3px 8px; + font-size: 11px; + border-radius: 5px; + cursor: pointer; + border: none; + font-weight: 600; + transition: opacity .15s; +} + +.btn-suppress { + background: rgba(107,114,128,.1); + color: var(--grey); + border: 1px solid var(--border) !important; +} + +.btn-suppress:hover { background: rgba(107,114,128,.2); } + +.btn-danger.btn-sm { + background: rgba(239,68,68,.1); + color: var(--red); + border: 1px solid rgba(239,68,68,.2) !important; +} + +/* ── Modal ───────────────────────────────────────────────────────────── */ +.modal-overlay { + position: fixed; + inset: 0; + background: rgba(0,0,0,.45); + z-index: 100; + display: flex; + align-items: center; + justify-content: center; + backdrop-filter: blur(2px); +} + +.modal { + background: var(--card-bg); + border-radius: 12px; + box-shadow: 0 20px 60px rgba(0,0,0,.2); + width: 480px; + max-width: 95vw; + padding: 24px; +} + +.modal-header { + display: flex; + justify-content: space-between; + align-items: center; + margin-bottom: 20px; +} + +.modal-header h3 { font-size: 17px; font-weight: 700; } + +.modal-close { + background: none; + border: none; + cursor: 
pointer; + font-size: 18px; + color: var(--text-sub); + line-height: 1; + padding: 2px 6px; + border-radius: 4px; + transition: background .15s; +} + +.modal-close:hover { background: var(--grey-lt); } + +.modal-actions { + display: flex; + gap: 10px; + justify-content: flex-end; + margin-top: 20px; + padding-top: 16px; + border-top: 1px solid var(--border); +} + +/* ── Forms ───────────────────────────────────────────────────────────── */ +.form-card { + background: var(--card-bg); + border: 1px solid var(--border); + border-radius: var(--radius); + padding: 20px; + box-shadow: var(--shadow); +} + +.form-row { + display: flex; + gap: 16px; + flex-wrap: wrap; + margin-bottom: 14px; +} + +.form-row-align { align-items: flex-end; } + +.form-group { display: flex; flex-direction: column; gap: 5px; min-width: 180px; flex: 1; } +.form-group-wide { flex: 3; } +.form-group-submit { flex: 0 0 auto; min-width: unset; } + +.form-group label { + font-size: 12px; + font-weight: 600; + color: var(--text-sub); + text-transform: uppercase; + letter-spacing: .05em; +} + +.form-group input, +.form-group select { + padding: 8px 10px; + border: 1px solid var(--border); + border-radius: 6px; + font-size: 13px; + background: white; + color: var(--text); + transition: border-color .15s, box-shadow .15s; +} + +.form-group input:focus, +.form-group select:focus { + outline: none; + border-color: var(--blue); + box-shadow: 0 0 0 3px var(--blue-dim); +} + +.form-hint { font-size: 11px; color: var(--text-sub); margin-top: 2px; } +.required { color: var(--red); } + +/* ── Duration pills ──────────────────────────────────────────────────── */ +.duration-pills { + display: flex; + gap: 6px; + flex-wrap: wrap; + margin-bottom: 6px; +} + +.pill { + padding: 5px 12px; + border-radius: 20px; + border: 1.5px solid var(--border); + background: white; + font-size: 12px; + font-weight: 600; + cursor: pointer; + color: var(--text-sub); + transition: all .15s; +} + +.pill:hover { border-color: 
var(--blue); color: var(--blue); } + +.pill.active, +.pill-manual.active { + background: var(--blue); + border-color: var(--blue); + color: white; +} + +/* ── Targets grid (suppressions page) ───────────────────────────────── */ +.targets-grid { + display: grid; + grid-template-columns: repeat(auto-fill, minmax(200px, 1fr)); + gap: 12px; +} + +.target-card { + background: var(--card-bg); + border: 1px solid var(--border); + border-radius: 8px; + padding: 12px; +} + +.target-name { + font-weight: 700; + font-size: 14px; + margin-bottom: 4px; +} + +.target-type { + font-size: 11px; + color: var(--text-sub); + margin-bottom: 8px; +} + +.target-ifaces { + display: flex; + flex-wrap: wrap; + gap: 4px; +} + +.iface-chip { + font-family: var(--mono); + font-size: 10px; + background: var(--grey-lt); + border-radius: 4px; + padding: 1px 6px; + color: var(--text-sub); +} + +/* ── Card (generic) ──────────────────────────────────────────────────── */ +.card { + background: var(--card-bg); + border: 1px solid var(--border); + border-radius: var(--radius); + padding: 20px; + box-shadow: var(--shadow); +} + +/* ── Toast notifications ─────────────────────────────────────────────── */ +.toast-container { + position: fixed; + bottom: 24px; + right: 24px; + z-index: 200; + display: flex; + flex-direction: column; + gap: 10px; +} + +.toast { + padding: 12px 20px; + border-radius: 8px; + font-size: 13px; + font-weight: 600; + box-shadow: 0 4px 16px rgba(0,0,0,.15); + animation: slide-in .2s ease; +} + +.toast-success { background: #065f46; color: white; } +.toast-error { background: #7f1d1d; color: white; } + +@keyframes slide-in { + from { transform: translateX(120%); opacity: 0; } + to { transform: translateX(0); opacity: 1; } +} + +/* ── Responsive ──────────────────────────────────────────────────────── */ +@media (max-width: 768px) { + .host-grid { grid-template-columns: 1fr; } + .topology { display: none; } + .form-row { flex-direction: column; } + .status-bar { flex-direction: 
column; align-items: flex-start; } } diff --git a/templates/base.html b/templates/base.html new file mode 100644 index 0000000..78d1d90 --- /dev/null +++ b/templates/base.html @@ -0,0 +1,36 @@ + + + + + + {% block title %}GANDALF{% endblock %} + + + + + +
+ {% block content %}{% endblock %} +
+ + + {% block scripts %}{% endblock %} + + diff --git a/templates/index.html b/templates/index.html index 23aac03..13d4d18 100644 --- a/templates/index.html +++ b/templates/index.html @@ -1,69 +1,289 @@ - - - - GANDALF - Network Monitor - - - - -
-
-

GANDALF (Global Advanced Network Detection And Link Facilitator)

-

Ubiquiti Network Management Dashboard

-
+{% extends "base.html" %} +{% block title %}Dashboard – GANDALF{% endblock %} -
-
-

Network Overview

-
- {%- for device in devices %} -
- -
- {{ device.name }} - {{ device.ip }} - {{ device.type }} ({{ device.connection_type }}) - {% if device.critical %} - Critical - {% endif %} -
-
- {%- endfor %} -
-
- - - -
-

System Health

-
- {%- for device in devices %} -
-

{{ device.name }}

-
-
-
-
-
-
- {%- endfor %} -
-
-
+{% block content %} + + +
+
+ {% if summary.critical %} + ⬤ {{ summary.critical }} Critical + {% endif %} + {% if summary.warning %} + ⬤ {{ summary.warning }} Warning + {% endif %} + {% if not summary.critical and not summary.warning %} + ✔ All systems nominal + {% endif %} +
+
+ Last check: {{ last_check }} + +
+
+ + +
+

Network Hosts

+ + +
+
+
🌐 Internet
- - - \ No newline at end of file +
+
+
+
+
+ + UDM-Pro + +
+
+
+
+
+
+
+ + Agg Switch + +
+
+
+
+
+
+
+ + PoE Switch + +
+
+
+ {% for name in snapshot.hosts %} +
+ {% endfor %} +
+
+ {% for name, host in snapshot.hosts.items() %} +
+ + {{ name }} + {{ host.status }} +
+ {% endfor %} +
+
+ + +
+ {% for name, host in snapshot.hosts.items() %} + {% set suppressed = suppressions | selectattr('target_name', 'equalto', name) | list %} +
+
+
+ + {{ name }} + {% if suppressed %} + 🔕 + {% endif %} +
+
+ {{ host.ip }} + {{ host.source }} +
+
+ + {% if host.interfaces %} +
+ {% for iface, state in host.interfaces.items() | sort %} +
+ + {{ iface }} + {{ state }} +
+ {% endfor %} +
+ {% else %} +
Monitored via ping only
+ {% endif %} + +
+ +
+
+ {% else %} +

No host data yet – monitor is initializing.

+ {% endfor %} +
+
+ + +{% if snapshot.unifi %} +
+

UniFi Devices

+
+ + + + + + + + + + + + + {% for d in snapshot.unifi %} + + + + + + + + + {% endfor %} + +
StatusNameTypeModelIPActions
+ + {{ 'Online' if d.connected else 'Offline' }} + {{ d.name }}{{ d.type }}{{ d.model }}{{ d.ip }} + {% if not d.connected %} + + {% endif %} +
+
+
+{% endif %} + + +
+

+ Active Alerts + {% if summary.critical or summary.warning %} + {{ (summary.critical or 0) + (summary.warning or 0) }} open + {% endif %} +

+
+ {% if events %} + + + + + + + + + + + + + + + + {% for e in events %} + {% if e.severity != 'info' %} + + + + + + + + + + + + {% endif %} + {% else %} + + + + {% endfor %} + +
SeverityTypeTargetDetailDescriptionFirst SeenFailuresTicketActions
{{ e.severity }}{{ e.event_type | replace('_', ' ') }}{{ e.target_name }}{{ e.target_detail or '–' }}{{ e.description | truncate(60) }}{{ e.first_seen }}{{ e.consecutive_failures }} + {% if e.ticket_id %} + #{{ e.ticket_id }} + {% else %}–{% endif %} + + +
No active alerts ✔
+ {% else %} +

No active alerts ✔

+ {% endif %} +
+
+ + + + +{% endblock %} + +{% block scripts %} + +{% endblock %} diff --git a/templates/suppressions.html b/templates/suppressions.html new file mode 100644 index 0000000..ad5b922 --- /dev/null +++ b/templates/suppressions.html @@ -0,0 +1,252 @@ +{% extends "base.html" %} +{% block title %}Suppressions – GANDALF{% endblock %} + +{% block content %} + + + + +
+

Create Suppression

+
+
+
+
+ + +
+ +
+ + +
+ + +
+ +
+
+ + +
+
+ +
+
+ +
+ + + + + +
+ +
+ This suppression will persist until manually removed. +
+
+
+ +
+
+
+
+
+ + +
+

+ Active Suppressions + {{ active | length }} +

+ {% if active %} +
+ + + + + + + + + + + + + + + {% for s in active %} + + + + + + + + + + + {% endfor %} + +
TypeTargetDetailReasonByCreatedExpiresActions
{{ s.target_type }}{{ s.target_name or 'all' }}{{ s.target_detail or '–' }}{{ s.reason }}{{ s.suppressed_by }}{{ s.created_at }} + {% if s.expires_at %}{{ s.expires_at }}{% else %}manual{% endif %} + + +
+
+ {% else %} +

No active suppressions.

+ {% endif %} +
+ + +
+

History {{ history | length }}

+ {% if history %} +
+ + + + + + + + + + + + + + + {% for s in history %} + + + + + + + + + + + {% endfor %} + +
TypeTargetDetailReasonByCreatedExpiresActive
{{ s.target_type }}{{ s.target_name or 'all' }}{{ s.target_detail or '–' }}{{ s.reason }}{{ s.suppressed_by }}{{ s.created_at }} + {% if s.expires_at %}{{ s.expires_at }}{% else %}manual{% endif %} + + {% if s.active %} + Yes + {% else %} + No + {% endif %} +
+
+ {% else %} +

No suppression history yet.

+ {% endif %} +
+ + +
+

Available Targets

+
+ {% for name, host in snapshot.hosts.items() %} +
+
{{ name }}
+
Proxmox Host
+ {% if host.interfaces %} +
+ {% for iface in host.interfaces.keys() | sort %} + {{ iface }} + {% endfor %} +
+ {% else %} +
ping-only
+ {% endif %} +
+ {% endfor %} +
+
+ +{% endblock %} + +{% block scripts %} + +{% endblock %}