Add Ceph cluster monitoring and Prometheus metrics export

- Add comprehensive Ceph cluster health monitoring
  - Check cluster health status (HEALTH_OK/WARN/ERR)
  - Monitor cluster usage with configurable thresholds
  - Track OSD status (up/down) per node
  - Separate cluster-wide vs node-specific issues
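
The split between cluster-wide and node-specific issues can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function name `classify_ceph_issues`, the `CephIssues` container, the shape of the `osds` records, and the default thresholds are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CephIssues:
    """Hypothetical container that keeps the two issue scopes apart."""
    cluster_wide: list = field(default_factory=list)
    node_specific: list = field(default_factory=list)


def classify_ceph_issues(health_status, usage_percent, osds, hostname,
                         warn_threshold=70.0, crit_threshold=85.0):
    """Sort Ceph findings into cluster-wide and node-specific issues.

    `osds` is assumed to be a list of dicts such as
    {"id": 3, "host": "node1", "up": False}; the threshold defaults
    are placeholders, not values taken from the commit.
    """
    issues = CephIssues()

    # Health and usage describe the cluster as a whole.
    if health_status != "HEALTH_OK":
        issues.cluster_wide.append(f"Ceph cluster health is {health_status}")
    if usage_percent >= crit_threshold:
        issues.cluster_wide.append(
            f"Ceph cluster usage critical: {usage_percent:.1f}%")
    elif usage_percent >= warn_threshold:
        issues.cluster_wide.append(
            f"Ceph cluster usage warning: {usage_percent:.1f}%")

    # A down OSD is attributed only to the node that hosts it.
    for osd in osds:
        if osd["host"] == hostname and not osd["up"]:
            issues.node_specific.append(
                f"OSD {osd['id']} is down on {hostname}")

    return issues
```

Keeping the two lists separate lets the caller apply a different ticketing scope to each.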

- Add cluster-wide ticket deduplication
  - Add [cluster-wide] scope tag for Ceph issues
  - Deduplicate cluster-wide issues across all nodes
  - Include hostname in node-specific issues (e.g. OSD down)
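
One way the scope tag could drive deduplication is sketched below. The function name `ticket_dedup_key` and the use of a SHA-256 hash over the title are assumptions for illustration; the commit does not state how the key is derived.

```python
import hashlib


def ticket_dedup_key(issue, hostname, cluster_wide):
    """Return (title, key) for a ticket.

    Cluster-wide issues get the literal [cluster-wide] tag, so every
    node produces the same title and therefore the same key.
    Node-specific issues embed the hostname, so keys differ per node.
    """
    scope = "cluster-wide" if cluster_wide else hostname
    title = f"[{scope}] {issue}"
    key = hashlib.sha256(title.encode("utf-8")).hexdigest()
    return title, key
```

With this scheme, two nodes reporting the same cluster-wide problem map to a single ticket, while an OSD failure on one node never collides with a failure on another.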

- Add Prometheus metrics export
  - export_prometheus_metrics() method
  - write_prometheus_metrics() for textfile collector
  - --metrics CLI flag to output metrics to stdout
  - --export-json CLI flag to export health report as JSON
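
For readers unfamiliar with the textfile collector, the sketch below shows the general shape of such an export: metrics rendered in the Prometheus text exposition format, then written atomically so the collector never reads a half-written file. The metric names, the `render_metrics` and `write_metrics_atomically` helpers, and the mapping of unknown statuses to 2 are assumptions, not the names or behavior used by `export_prometheus_metrics()` and `write_prometheus_metrics()` in this commit.

```python
import os
import tempfile

# Assumed numeric encoding of the Ceph health states.
HEALTH_VALUES = {"HEALTH_OK": 0, "HEALTH_WARN": 1, "HEALTH_ERR": 2}


def render_metrics(hostname, health_status, usage_percent, osds_down):
    """Render a few gauges in the Prometheus text exposition format."""
    # Unknown statuses are treated as errors (an assumption).
    status_value = HEALTH_VALUES.get(health_status, 2)
    label = f'{{hostname="{hostname}"}}'
    lines = [
        "# HELP ceph_monitor_health_status Ceph health (0=OK, 1=WARN, 2=ERR).",
        "# TYPE ceph_monitor_health_status gauge",
        f"ceph_monitor_health_status{label} {status_value}",
        "# HELP ceph_monitor_usage_percent Ceph cluster usage in percent.",
        "# TYPE ceph_monitor_usage_percent gauge",
        f"ceph_monitor_usage_percent{label} {usage_percent}",
        "# HELP ceph_monitor_osds_down Number of OSDs down on this node.",
        "# TYPE ceph_monitor_osds_down gauge",
        f"ceph_monitor_osds_down{label} {osds_down}",
    ]
    return "\n".join(lines) + "\n"


def write_metrics_atomically(text, path):
    """Write to a temp file in the target directory, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        # mkstemp creates the file as 0600; relax it so the exporter can read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)  # atomic within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The textfile collector only picks up files ending in `.prom`, which is why the temporary file uses a different suffix until the rename.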

- Add Grafana dashboard template (grafana-dashboard.json)
- Add .gitignore

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Jared Vititoe

2026-01-17 15:54:16 -05:00

parent 3322c5878a

commit 0f8918fb8b

3 changed files with 956 additions and 13 deletions

.gitignore

@@ -0,0 +1,2 @@
+.claude
+settings.local.json