Added intelligent categorization to match tickets with correct category and type
instead of defaulting everything to Hardware/Problem.
Changes:
- Added TICKET_CATEGORIES and TICKET_TYPES mappings for API consistency
- Created _categorize_issue() method to determine proper classification:
Hardware Issues:
- SMART/drive/disk errors → Hardware + Incident (critical/failed)
- SMART warnings → Hardware + Problem (needs investigation)
Software Issues:
- LXC/container/storage usage/CPU → Software category
- Critical levels → Software + Incident (service degradation)
- Warning levels → Software + Problem (preventive investigation)
Network Issues:
- Network failures/unreachable → Network + Incident
- Network warnings → Network + Problem
- Updated ticket creation to use _categorize_issue() and _determine_ticket_priority()
- Tickets now have correct tags: [incident] vs [problem] instead of always [maintenance]
- Category field in API payload now matches issue type (Hardware/Software/Network)
- Type field in API payload now reflects actual situation (Incident/Problem/Task)
Examples:
- "LXC storage usage >80%" → Software + Problem
- "Critical SMART errors" → Hardware + Incident
- "High CPU usage" → Software + Problem
- "Network unreachable" → Network + Incident
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed all box alignment issues and improved visual consistency:
- Standardized box width to 78 chars across all sections
- Unified field width calculations (62 chars for values)
- Fixed executive summary box with proper dynamic width
- Fixed drive specifications box alignment
- Fixed drive timeline box with proper field widths
- Fixed SMART status box and improved temperature handling (None check)
- Fixed SMART attributes box with consistent widths
- Improved partition boxes:
- 50-char usage meter (2% per block) instead of 20-char
- Added percentage display next to meter
- Truncate long mountpoints in header to prevent overflow
- Consistent field widths across all fields
- Fixed firmware alerts box alignment
All boxes now use consistent Unicode box-drawing characters (┏━┓┃┗┛│)
with proper width calculations for perfect alignment.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Problem: Drive capacity was being extracted but never inserted into ticket titles.
The drive_size variable was calculated from drive details but omitted from the
ticket_title string construction.
Solution: Added drive_size to ticket title format between category and issue.
Example ticket titles now show:
- Before: "[hostname][auto][hardware]Drive /dev/sda has SMART issues..."
- After: "[hostname][auto][hardware][16.0 TB] Drive /dev/sda has SMART issues..."
This makes it easier to identify which drives need attention at a glance.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added comprehensive manual execution section with both:
- Direct execution from local file (python3 hwmonDaemon.py)
- Remote execution matching systemd service (one-liner download+exec)
Both modes include dry-run and normal execution examples for testing
and production use.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Problem: Seagate drives were triggering tickets for "Critical Seek_Error_Rate"
and "Critical Command_Timeout" even though these are operation counters used by
the manufacturer, not actual errors.
Solution: Added filtering in _detect_issues() method to skip known manufacturer
operation counters:
- Seek_Error_Rate (Seagate/WD operation counter)
- Command_Timeout (OOS/Seagate operation counter)
- Raw_Read_Error_Rate (Seagate/WD operation counter)
These attributes are already correctly excluded from monitoring in manufacturer
profiles, but were still appearing in smart_issues list. This fix prevents them
from creating tickets while still catching legitimate SMART errors.
Changes:
- hwmonDaemon.py:1351-1378 - Added operation counter filtering in _detect_issues()
- Added debug logging when filtering manufacturer counters
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Critical fixes implemented:
- Add 10MB storage limit with automatic cleanup of old history files
- Add file locking (fcntl) to prevent race conditions in history writes
- Disable SMART monitoring for unreliable Ridata drives
- Fix bare except clause in _read_ecc_count() to properly catch errors
- Add timeouts to all network and subprocess calls (10s for API, 30s for subprocess)
- Fix unchecked regex in ticket creation to prevent AttributeError
- Add JSON decode error handling for ticket API responses
Service configuration improvements:
- hwmon.timer: Reduce jitter from 300s to 60s, add Persistent=true
- hwmon.service: Add Restart=on-failure, TimeoutStartSec=300, logging to journal
These changes improve reliability, prevent hung processes, eliminate race
conditions, and add proper error handling throughout the daemon.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>