|
|
35a16a1793
|
Fix reallocated sector scoring - drives with bad sectors now rank correctly
**Problem**: osd.28 with 16 reallocated sectors only ranked #7 with score 40.8
This is a CRITICAL failing drive that should rank just below failed SMART reads.
**Changes**:
- Reallocated sectors now use tiered penalties:
  * 10+ sectors: -95 points (health = 5/100) - DRIVE FAILING
  * 5-9 sectors: -85 points (health = 15/100) - CRITICAL
  * 1-4 sectors: -70 points (health = 30/100) - SERIOUS
- Added critical_issues detection for sector problems
- Critical issues get a +20 bonus (large drives) or +25 (small drives) in scoring
- Updated issue text to "DRIVE FAILING" for clarity
**Expected Result**:
- osd.28 will now score ~96/100 and rank #7 (right after 6 failed SMART)
- Any drive with reallocated/pending/uncorrectable sectors gets top priority
- Matches priority: Failed SMART > Critical sectors > Small failing > Rest
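The tiered penalties listed above can be sketched roughly as follows. This is a minimal illustration, not the analyzer's actual code: the function names and the "OK" label for zero sectors are assumptions, and only the thresholds and point values come from this commit message.

```python
from typing import Tuple


def reallocated_sector_penalty(sectors: int) -> Tuple[int, str]:
    """Return (penalty_points, label) for a reallocated-sector count.

    Thresholds and penalties follow the commit message; the function
    name and the "OK" label are hypothetical.
    """
    if sectors >= 10:
        return 95, "DRIVE FAILING"
    if sectors >= 5:
        return 85, "CRITICAL"
    if sectors >= 1:
        return 70, "SERIOUS"
    return 0, "OK"


def health_score(sectors: int) -> int:
    """Health on a 0-100 scale after subtracting the tiered penalty."""
    penalty, _label = reallocated_sector_penalty(sectors)
    return max(0, 100 - penalty)
```

Under this sketch, a drive with 16 reallocated sectors (the osd.28 case) lands in the top tier and gets a health of 5/100.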
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
2026-01-06 15:08:46 -05:00 |
|
|
|
1848b71c2a
|
Optimize OSD analyzer: prioritize failing drives and improve SMART collection
Major improvements to scoring and data collection:
**Scoring Changes:**
- Failed SMART reads now return 0/100 health (was 50/100)
- Critical health issues get much higher penalties:
  * Reallocated sectors: -50 pts, 5x multiplier (was -20, 2x)
  * Pending sectors: -60 pts, 10x multiplier (was -25, 5x)
  * Uncorrectable sectors: -70 pts, 15x multiplier (was -30, 5x)
  * NVMe media errors: -60 pts, 10x multiplier (was -25, 5x)
- Revised weights: 80% health, 15% capacity, 5% resilience (was 60/30/10)
- Added priority bonuses:
  * Failed SMART + small drive (<5TB): +30 points
  * Failed SMART alone: +20 points
  * Health issues + small drive: +15 points
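One way the revised weights and priority bonuses could combine is sketched below. Several parts are assumptions not stated in this commit: that poor health raises the replacement score (so the health term is inverted), that the capacity and resilience components are already on a 0-100 scale, and that the total is capped at 100. All function and parameter names are hypothetical.

```python
SMALL_DRIVE_TB = 5.0  # "small drive" threshold from the commit message


def priority_bonus(smart_failed: bool, has_health_issues: bool, size_tb: float) -> int:
    """Bonus points per the commit message; checked from most to least severe."""
    small = size_tb < SMALL_DRIVE_TB
    if smart_failed and small:
        return 30
    if smart_failed:
        return 20
    if has_health_issues and small:
        return 15
    return 0


def replacement_score(health: float, capacity_score: float, resilience_score: float,
                      smart_failed: bool, has_health_issues: bool, size_tb: float) -> float:
    """Weighted score: 80% health, 15% capacity, 5% resilience, plus bonus.

    Assumption: lower health means higher replacement priority, so the
    health term uses (100 - health). The cap at 100 is also an assumption.
    """
    base = 0.80 * (100 - health) + 0.15 * capacity_score + 0.05 * resilience_score
    bonus = priority_bonus(smart_failed, has_health_issues, size_tb)
    return min(100.0, base + bonus)
```

With these assumptions, a small drive whose SMART read failed (health 0) reaches the cap, which would be consistent with the 90-100 band listed for failed SMART drives.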
**Priority Order Now Enforced:**
1. Failed SMART drives (score 90-100)
2. Small drives beginning to fail (70-85)
3. Small healthy drives (40-60)
4. Large failing drives (60-75)
**Enhanced SMART Collection:**
- Added metadata.devices field parsing
- Enhanced dm-device and /dev/mapper/ resolution
- Added ceph-volume lvm list fallback
- Retry logic with 3 command variations per device
- Try with/without sudo, different device flags
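The retry behaviour described above might look something like the sketch below. The three command variations shown are illustrative guesses, since the commit does not list the exact commands; `smartctl -a` and `-d sat` are real smartctl options, but which variations the analyzer actually tries is an assumption, as are the function names.

```python
import subprocess
from typing import Callable, List, Optional


def smartctl_variants(device: str) -> List[List[str]]:
    """Candidate commands, tried in order (hypothetical selection)."""
    return [
        ["smartctl", "-a", device],                      # plain, no sudo
        ["sudo", "smartctl", "-a", device],              # with sudo
        ["sudo", "smartctl", "-a", "-d", "sat", device], # explicit device type
    ]


def run_command(cmd: List[str]) -> Optional[str]:
    """Run one command; return stdout, or None if the attempt failed."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    # smartctl's exit status is a bitmask: bit 0 means the command line did
    # not parse, bit 1 means the device could not be opened. Higher bits
    # report drive health and still come with usable output.
    if result.returncode & 0b11:
        return None
    return result.stdout if result.stdout.strip() else None


def collect_smart(device: str,
                  runner: Callable[[List[str]], Optional[str]] = run_command) -> Optional[str]:
    """Try each variation until one returns output; None if all fail."""
    for cmd in smartctl_variants(device):
        output = runner(cmd)
        if output:
            return output
    return None
```

Passing the runner as a parameter keeps the retry loop testable without real hardware; a caller could also wrap the commands in an ssh invocation to reach remote hosts.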
**Expected Impact:**
- osd.28 with reallocated sectors jumps from #14 to top 3
- SMART collection failures should drop from 6 to 0-2
- All failing drives rank above healthy drives regardless of size
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
2026-01-06 15:05:25 -05:00 |
|
|
|
3b15377821
|
separate smartctl depending on device class
|
2025-12-22 18:23:06 -05:00 |
|
|
|
c315fa3efc
|
Updated readme again
|
2025-12-22 18:15:46 -05:00 |
|
|
|
89037ed93f
|
Merge branch 'main' of code.lotusguild.org:LotusGuild/analyzeOSDs
|
2025-12-22 18:12:42 -05:00 |
|
|
|
c87c13eb1f
|
revert 9793f8bcbe
revert Updated quick execute commands
|
2025-12-22 18:10:00 -05:00 |
|
|
|
c252dbcdc4
|
resolves /dev/dm-* now
|
2025-12-22 18:05:32 -05:00 |
|
|
|
9793f8bcbe
|
Updated quick execute commands
|
2025-12-22 17:51:33 -05:00 |
|
|
|
1610aa2606
|
removed pg and latency counters
|
2025-12-22 17:14:02 -05:00 |
|
|
|
db757345fb
|
Better patterns and error handling
|
2025-12-22 17:08:13 -05:00 |
|
|
|
e12b53238e
|
Pushed malformed code, whoops
|
2025-12-22 17:00:45 -05:00 |
|
|
|
559ed9fc94
|
adds /dev in front of block devices
|
2025-12-22 16:57:53 -05:00 |
|
|
|
43d35feb46
|
Enables ssh to all hosts to gather smart data
|
2025-12-22 16:50:04 -05:00 |
|
|
|
a861276013
|
Created README
|
2025-12-22 16:46:02 -05:00 |
|
|
|
7dab2591b1
|
First test
|
2025-12-22 16:40:19 -05:00 |
|
|
|
983b1f1c29
|
first commit
|
2025-12-22 16:39:49 -05:00 |
|