data retention and large refactor of codebase

2025-09-03 12:43:16 -04:00
parent 3d902620b0
commit bc73a691df
2 changed files with 461 additions and 145 deletions

@@ -13,6 +13,10 @@ A robust system health monitoring daemon that tracks hardware status and automat
- Configurable thresholds and monitoring parameters
- Dry-run mode for testing
- Systemd integration for automated daily checks
- LXC container storage monitoring
- Historical trend analysis for predictive failure detection
- Manufacturer-specific SMART attribute interpretation
- ECC memory error detection
## Installation
@@ -53,19 +57,33 @@ python3 hwmonDaemon.py
The daemon monitors:
- Disk usage (warns at 80%, critical at 90%)
- LXC storage usage (warns at 80%, critical at 90%)
- Memory usage (warns at 80%)
-- CPU usage (warns at 80%)
+- CPU usage (warns at 95%)
- Network connectivity to management (10.10.10.1) and Ceph (10.10.90.1) networks
-- SMART status of physical drives
+- SMART status of physical drives with manufacturer-specific profiles
- Temperature monitoring (warns at 65°C)
- Automatic duplicate ticket prevention
- Enhanced logging with debug capabilities
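
The thresholds listed above can be expressed as a small lookup table. The following is a minimal sketch, not the daemon's actual code: the `THRESHOLDS` layout, the metric names, and the `classify` helper are hypothetical, and treating a reading equal to the threshold as a breach is an assumption. It uses the 95% CPU value.

```python
from typing import Optional

# Values come from the README list; the structure and key names are
# assumptions made for illustration only.
THRESHOLDS = {
    "disk": {"warning": 80.0, "critical": 90.0},
    "lxc_storage": {"warning": 80.0, "critical": 90.0},
    "memory": {"warning": 80.0},
    "cpu": {"warning": 95.0},
    "temperature": {"warning": 65.0},  # degrees Celsius
}


def classify(metric: str, value: float) -> Optional[str]:
    """Return "critical", "warning", or None for a single reading."""
    limits = THRESHOLDS[metric]
    critical = limits.get("critical")
    # Assumption: a reading exactly at the threshold counts as a breach.
    if critical is not None and value >= critical:
        return "critical"
    if value >= limits["warning"]:
        return "warning"
    return None
```

For example, `classify("disk", 85.0)` yields a warning, while `classify("cpu", 90.0)` yields no alert.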
## Data Storage
The daemon creates and maintains:
- **Log Directory**: `/var/log/hwmonDaemon/`
- **Historical SMART Data**: JSON files for trend analysis
- **Data Retention**: 30 days of historical monitoring data
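
A 30-day retention policy for the JSON history files could be enforced roughly as follows. This is a sketch under stated assumptions: the `prune_old_history` helper is hypothetical, file age is judged by modification time, and the README does not say which directory holds the JSON files, so the caller passes it in.

```python
import time
from pathlib import Path
from typing import List, Optional

RETENTION_DAYS = 30  # retention period stated in the README


def prune_old_history(
    directory: Path,
    retention_days: int = RETENTION_DAYS,
    now: Optional[float] = None,
) -> List[Path]:
    """Delete *.json files older than retention_days; return what was removed."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400  # seconds per day
    removed = []
    for path in sorted(directory.glob("*.json")):
        # Assumption: age is measured by the file's modification time.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Only files matching `*.json` directly inside the given directory are considered, so regular log files in the same location are left alone.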
## Ticket Creation
The daemon automatically creates tickets with:
- Standardized titles including hostname, hardware type, and scope
-- Detailed descriptions of detected issues
+- Detailed descriptions of detected issues with drive specifications
- Priority levels based on severity (P2-P4)
- Proper categorization and status tracking
- Executive summaries and technical analysis
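
One way to derive a ticket priority from severity is a plain lookup. The README only states that priorities range from P2 to P4, so the specific mapping below, the severity names, and the `priority_for` helper are all assumptions for illustration.

```python
# Assumed mapping: the README gives the P2-P4 range but not which
# severity corresponds to which priority.
SEVERITY_TO_PRIORITY = {
    "critical": "P2",
    "warning": "P3",
    "info": "P4",
}


def priority_for(severity: str) -> str:
    """Map a severity label to a priority, defaulting to the lowest (P4)."""
    return SEVERITY_TO_PRIORITY.get(severity.lower(), "P4")
```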
## Dependencies
@ -73,7 +91,17 @@ The daemon automatically creates tickets with:
- Required Python packages:
  - psutil
  - requests
- System tools:
  - smartmontools (for SMART disk monitoring)
  - nvme-cli (for NVMe drive monitoring)
## Excluded Paths
The following paths are automatically excluded from monitoring:
- `/media/*`
- `/mnt/pve/mediafs/*`
- `/opt/metube_downloads`
- Pattern-based exclusions for media and download directories
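
Glob-style matching is one plausible way to apply the explicit exclusions above. This sketch uses Python's standard `fnmatch` module. The `EXCLUDED_PATTERNS` list and the `is_excluded` helper are hypothetical, and the additional pattern-based exclusions are not spelled out in the README, so they are omitted here.

```python
from fnmatch import fnmatchcase

# The three explicit paths from the README. How the daemon actually
# stores or matches them is an assumption.
EXCLUDED_PATTERNS = [
    "/media/*",
    "/mnt/pve/mediafs/*",
    "/opt/metube_downloads",
]


def is_excluded(mountpoint: str) -> bool:
    """Return True if the mountpoint matches any excluded pattern."""
    # fnmatchcase's "*" also matches "/" characters, so nested paths
    # under an excluded directory match as well.
    return any(fnmatchcase(mountpoint, pattern) for pattern in EXCLUDED_PATTERNS)
```

Note that `/media/*` matches paths below `/media` but not `/media` itself, which may or may not mirror the daemon's behavior.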
## Service Configuration
@@ -83,6 +111,19 @@ The daemon runs:
- As root user for hardware access
- With automatic restart on failure
## Troubleshooting
```bash
# View service logs
sudo journalctl -u hwmon.service -f
# Check service status
sudo systemctl status hwmon.timer
# Manual test run
python3 hwmonDaemon.py --dry-run
```
## Security Note
Because the service downloads and executes code from a specified URL, make sure that URL points to a trusted source and that network access to it is appropriately restricted.