15 Commits

Author SHA1 Message Date
1848b71c2a Optimize OSD analyzer: prioritize failing drives and improve SMART collection
Major improvements to scoring and data collection:

**Scoring Changes:**
- Failed SMART reads now return 0/100 health (was 50/100)
- Critical health issues get much higher penalties:
  * Reallocated sectors: -50 pts, 5x multiplier (was -20, 2x)
  * Pending sectors: -60 pts, 10x multiplier (was -25, 5x)
  * Uncorrectable sectors: -70 pts, 15x multiplier (was -30, 5x)
  * NVMe media errors: -60 pts, 10x multiplier (was -25, 5x)
- Revised weights: 80% health, 15% capacity, 5% resilience (was 60/30/10)
- Added priority bonuses:
  * Failed SMART + small drive (<5TB): +30 points
  * Failed SMART alone: +20 points
  * Health issues + small drive: +15 points
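
The weights and bonuses above might combine roughly as in the following sketch. It is an illustration only, not the analyzer's actual code: the function names `priority_bonus` and `combined_score` are hypothetical, the sub-scores are assumed to be 0-100 "replace sooner" values, and the clamp at 100 is an assumption. The per-attribute penalties and multipliers are left out because the message does not say how they are applied.

```python
# Hypothetical sketch of the weighting and bonus rules listed above.
# Names, input conventions, and the clamp at 100 are assumptions.

SMALL_DRIVE_TB = 5.0  # "small drive (<5TB)" threshold from the message


def priority_bonus(smart_failed: bool, has_health_issues: bool, size_tb: float) -> int:
    """Return the highest applicable priority bonus for one drive."""
    small = size_tb < SMALL_DRIVE_TB
    if smart_failed and small:
        return 30  # Failed SMART + small drive
    if smart_failed:
        return 20  # Failed SMART alone
    if has_health_issues and small:
        return 15  # Health issues + small drive
    return 0


def combined_score(health: int, capacity: int, resilience: int, bonus: int = 0) -> float:
    """Weight the three sub-scores 80/15/5, add the bonus, clamp to 100.

    Each sub-score is assumed to be 0-100, where higher means the drive
    should be replaced sooner.
    """
    weighted = (80 * health + 15 * capacity + 5 * resilience) / 100
    return min(100.0, weighted + bonus)


if __name__ == "__main__":
    # A 4 TB drive whose SMART read failed, assuming maximum health urgency.
    bonus = priority_bonus(smart_failed=True, has_health_issues=False, size_tb=4.0)
    print(combined_score(health=100, capacity=50, resilience=0, bonus=bonus))
```

Under these assumptions a drive with a failed SMART read reaches the top of the range from the health weight and bonus alone, which matches the intent that failed-SMART drives rank first.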

**Priority Order Now Enforced:**
1. Failed SMART drives (score 90-100)
2. Small drives beginning to fail (70-85)
3. Large failing drives (60-75)
4. Small healthy drives (40-60)

**Enhanced SMART Collection:**
- Added metadata.devices field parsing
- Enhanced dm-device and /dev/mapper/ resolution
- Added ceph-volume lvm list fallback
- Retry logic with 3 command variations per device
- Try with/without sudo, different device flags
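
The retry behaviour described above could look something like the sketch below. The three command variations are assumptions chosen for illustration (the message does not list the exact ones), and `smartctl_variants`, `run_command`, and `collect_smart` are hypothetical names. It relies on `smartctl`'s `-a`, `-j` (JSON output, smartmontools 7.0+), and `-d` options.

```python
# Hypothetical sketch: try several smartctl invocations per device and
# return the first one that yields parseable JSON.
import json
import subprocess
from typing import Callable, List, Optional


def smartctl_variants(device: str) -> List[List[str]]:
    """Three assumed command variations: plain, with sudo, sudo + device type."""
    return [
        ["smartctl", "-a", "-j", device],
        ["sudo", "smartctl", "-a", "-j", device],
        ["sudo", "smartctl", "-a", "-j", "-d", "sat", device],
    ]


def run_command(argv: List[str]) -> str:
    """Run one command and return its stdout, or '' if it could not run."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout


def collect_smart(device: str, run: Callable[[List[str]], str] = run_command) -> Optional[dict]:
    """Return parsed SMART data from the first variation that produces it.

    Success is judged by the output rather than the exit code, because
    smartctl's exit status is a bitmask that can be non-zero even when
    the report itself was produced.
    """
    for argv in smartctl_variants(device):
        try:
            data = json.loads(run(argv))
        except json.JSONDecodeError:
            continue  # empty or non-JSON output: try the next variation
        if isinstance(data, dict) and data:
            return data
    return None  # every variation failed for this device
```

Passing the runner as a parameter keeps the retry loop testable without real hardware; in practice the same loop could wrap an SSH invocation per host.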

**Expected Impact:**
- osd.28 with reallocated sectors jumps from #14 to top 3
- SMART collection failures should drop from 6 to 0-2
- All failing drives rank above healthy drives regardless of size

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-06 15:05:25 -05:00
3b15377821 separate smartctl depending on device class 2025-12-22 18:23:06 -05:00
c315fa3efc Updated readme again 2025-12-22 18:15:46 -05:00
89037ed93f Merge branch 'main' of code.lotusguild.org:LotusGuild/analyzeOSDs 2025-12-22 18:12:42 -05:00
c87c13eb1f revert 9793f8bcbe
revert Updated quick execute commands
2025-12-22 18:10:00 -05:00
c252dbcdc4 resolves /dev/dm-* now 2025-12-22 18:05:32 -05:00
9793f8bcbe Updated quick execute commands 2025-12-22 17:51:33 -05:00
1610aa2606 removed pg and latency counters 2025-12-22 17:14:02 -05:00
db757345fb Better patterns and error handling 2025-12-22 17:08:13 -05:00
e12b53238e Pushed malformed code, whoops 2025-12-22 17:00:45 -05:00
559ed9fc94 adds /dev in front of block devices 2025-12-22 16:57:53 -05:00
43d35feb46 Enables ssh to all hosts to gather smart data 2025-12-22 16:50:04 -05:00
a861276013 Created README 2025-12-22 16:46:02 -05:00
7dab2591b1 First test 2025-12-22 16:40:19 -05:00
983b1f1c29 first commit 2025-12-22 16:39:49 -05:00