- lint.yml: add notify-failure Matrix alert job
- test.yml: new workflow running pytest with pytest-cov for coverage
- .coveragerc: omit tests and site-packages from coverage
- .gitignore: ignore __pycache__ and .pyc files
- tests/test_hwmon.py: 49 unit tests covering SystemHealthMonitor
  (temperature parsing, service monitoring, disk usage, metric collection,
  dry-run behaviour); uses unittest.mock to isolate tests from the
  environment and filesystem

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

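For reference, the mock-based isolation described for tests/test_hwmon.py could look roughly like the sketch below. It is illustrative only: `read_temperature_celsius` is a hypothetical stand-in, not a method of SystemHealthMonitor, and the sysfs path is an assumed example. It relies on the kernel hwmon convention that temp*_input files report millidegrees Celsius.

```python
from unittest import mock


def read_temperature_celsius(path):
    """Hypothetical helper: read a hwmon temp*_input file (millidegrees C)."""
    with open(path) as handle:
        return int(handle.read().strip()) / 1000.0


def test_read_temperature_celsius():
    # mock_open stands in for the real file, so the test never touches sysfs.
    fake_file = mock.mock_open(read_data="45500\n")
    with mock.patch("builtins.open", fake_file):
        value = read_temperature_celsius("/sys/class/hwmon/hwmon0/temp1_input")
    assert value == 45.5
    # Confirm the helper opened exactly the path it was given.
    fake_file.assert_called_once_with("/sys/class/hwmon/hwmon0/temp1_input")
```

Patching `builtins.open` keeps the test independent of the host's hardware, which is the same goal the commit states for the real suite.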
- Add comprehensive Ceph cluster health monitoring
  - Check cluster health status (HEALTH_OK/HEALTH_WARN/HEALTH_ERR)
  - Monitor cluster usage with configurable thresholds
  - Track OSD status (up/down) per node
  - Separate cluster-wide vs node-specific issues
- Cluster-wide ticket deduplication
  - Add [cluster-wide] scope tag for Ceph issues
  - Cluster-wide issues deduplicate across all nodes
  - Node-specific issues (OSD down) include hostname
- Add Prometheus metrics export
  - export_prometheus_metrics() method
  - write_prometheus_metrics() for textfile collector
  - --metrics CLI flag to output metrics to stdout
  - --export-json CLI flag to export health report as JSON
- Add Grafana dashboard template (grafana-dashboard.json)
- Add .gitignore

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
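
For reference, a textfile-collector export like the one listed above might be sketched as follows. This is not the actual body of export_prometheus_metrics() or write_prometheus_metrics(). The function names `render_metrics` and `write_textfile`, the metric names, and the choice to treat every metric as a gauge are all assumptions made for illustration. The sketch writes to a temporary file and renames it into place, so a scrape by node_exporter never reads a half-written .prom file.

```python
import os
import tempfile


def render_metrics(metrics):
    """Render {name: (help_text, value)} in Prometheus text exposition format.

    Assumption: every metric is a gauge with no labels.
    """
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def write_textfile(metrics, path):
    """Atomically write metrics to path (expected to end in .prom)."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so os.replace stays atomic;
    # the .tmp suffix keeps the collector from picking it up early.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(render_metrics(metrics))
        # mkstemp creates the file as 0600; widen it so the exporter's
        # user can read the result.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A caller would pass something like `{"hwmon_ceph_osds_down": ("Number of OSDs reported down", 0)}` (a made-up metric name) and a path inside the directory given to node_exporter's `--collector.textfile.directory` flag.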