Fix critical reliability and security issues in hwmonDaemon

Critical fixes implemented:
- Add 10MB storage limit with automatic cleanup of old history files
- Add file locking (fcntl) to prevent race conditions in history writes
- Disable SMART monitoring for unreliable Ridata drives
- Fix bare except clause in _read_ecc_count() to properly catch errors
- Add timeouts to all network and subprocess calls (10s for API, 30s for subprocess)
- Check the regex match result in ticket creation before using it, to prevent AttributeError when nothing matches
- Add JSON decode error handling for ticket API responses
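
As an illustration of the first two items, here is a minimal sketch of locked history writes and size-based cleanup. It is not the daemon's actual code: the function names `append_history` and `prune_history`, the JSON-lines layout, and the oldest-first (by mtime) cleanup order are assumptions. Only the 10MB figure and the use of fcntl come from this commit message.

```python
import fcntl
import json
import os

MAX_HISTORY_BYTES = 10 * 1024 * 1024  # 10MB limit named in the commit message


def append_history(path, record):
    """Append one JSON line while holding an exclusive advisory lock (hypothetical helper)."""
    with open(path, "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until other writers release the lock
        try:
            fh.write(json.dumps(record) + "\n")
            fh.flush()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def prune_history(directory, limit=MAX_HISTORY_BYTES):
    """Delete the oldest files until the directory total fits within limit (hypothetical helper)."""
    files = [os.path.join(directory, name) for name in os.listdir(directory)]
    files = [f for f in files if os.path.isfile(f)]
    files.sort(key=os.path.getmtime)  # oldest first
    total = sum(os.path.getsize(f) for f in files)
    removed = []
    for f in files:
        if total <= limit:
            break
        total -= os.path.getsize(f)
        os.remove(f)
        removed.append(f)
    return removed
```

flock locks are advisory, so this only helps if every writer of the history file takes the same lock.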

Service configuration improvements:
- hwmon.timer: Reduce jitter (RandomizedDelaySec) from 300s to 60s, add Persistent=true, correct Description from Daily to Hourly
- hwmon.service: Add Restart=on-failure, TimeoutStartSec=300, logging to journal

Together, these changes keep hung network or subprocess calls from stalling
the daemon, remove the race condition in history writes, and handle malformed
ticket API responses instead of crashing.
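
The error-handling fixes could look roughly like the sketch below. The helper names, the `/dev/...` pattern, and the choice to return None on failure are illustrative assumptions, not taken from hwmonDaemon. The 30s subprocess timeout is the value stated above.

```python
import json
import re
import subprocess
import sys

SUBPROCESS_TIMEOUT = 30  # seconds, per the commit message


def run_command(args, timeout=SUBPROCESS_TIMEOUT):
    """Return stdout, or None if the command hangs, fails, or cannot be started."""
    try:
        result = subprocess.run(
            args, capture_output=True, text=True, timeout=timeout, check=True
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
        return None
    return result.stdout


def extract_drive_name(text):
    """Hypothetical pattern; the point is checking the match before calling .group()."""
    match = re.search(r"/dev/(\w+)", text)
    return match.group(1) if match else None


def parse_ticket_response(body):
    """Return the decoded JSON value, or None if body is not valid JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None
```

Catching named exceptions rather than using a bare except keeps KeyboardInterrupt and SystemExit from being swallowed, which is the usual reason for a fix like the one in _read_ecc_count().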

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-06 16:55:48 -05:00
parent 0577c7fc1b
commit fe832c42f3
3 changed files with 170 additions and 76 deletions

@@ -1,9 +1,10 @@
 [Unit]
-Description=Run System Health Monitoring Daemon Daily
+Description=Run System Health Monitoring Daemon Hourly
 [Timer]
 OnCalendar=hourly
-RandomizedDelaySec=300
+RandomizedDelaySec=60
+Persistent=true
 [Install]
 WantedBy=timers.target