Variable descriptions for drive tickets

2025-03-03 19:14:29 -05:00
parent 0bf29c44e8
commit 2be4f9072c

@@ -150,90 +150,185 @@ class SystemHealthMonitor:
 # Add SMART attribute explanations
 SMART_DESCRIPTIONS = {
+    'Reported_Uncorrect': """
+    Number of errors that could not be recovered using hardware ECC.
+    Impact:
+    - Indicates permanent data loss in affected sectors
+    - High correlation with drive hardware failure
+    - Critical reliability indicator
+    Recommended Actions:
+    1. Backup critical data immediately
+    2. Check drive logs for related errors
+    3. Plan for drive replacement
+    4. Monitor for error count increases
+    """,
     'Reallocated_Sector_Ct': """
     Number of sectors that have been reallocated due to errors.
+    Impact:
     - High counts indicate degrading media
     - Each reallocation uses one of the drive's limited spare sectors
     - Rapid increases suggest accelerating drive wear
+    Recommended Actions:
+    1. Monitor rate of increase
+    2. Check drive temperature
+    3. Plan replacement if count grows rapidly
     """,
     'Current_Pending_Sector': """
     Sectors waiting to be reallocated due to read/write errors.
+    Impact:
     - Indicates potentially unstable sectors
     - May result in data loss if unrecoverable
     - Should be monitored for increases
+    Recommended Actions:
+    1. Backup affected files
+    2. Run extended SMART tests
+    3. Monitor for conversion to reallocated sectors
     """,
     'Offline_Uncorrectable': """
     Count of uncorrectable errors detected during offline data collection.
+    Impact:
     - Direct indicator of media reliability issues
     - May affect data integrity
     - High values suggest drive replacement needed
-    """,
-    'Reported_Uncorrect': """
-    Number of errors that could not be recovered using hardware ECC.
-    - Critical indicator of drive health
-    - Directly impacts data reliability
-    - Any non-zero value requires attention
+    Recommended Actions:
+    1. Run extended SMART tests
+    2. Check drive logs
+    3. Plan replacement if count is increasing
     """,
     'Spin_Retry_Count': """
     Number of spin start retry attempts.
+    Impact:
     - Indicates potential motor or bearing issues
     - May predict imminent mechanical failure
     - Increasing values suggest degrading drive health
+    Recommended Actions:
+    1. Monitor for rapid increases
+    2. Check drive temperature
+    3. Plan replacement if count grows rapidly
     """,
     'Power_On_Hours': """
     Total number of hours the device has been powered on.
+    Impact:
     - Normal aging metric
     - Used to gauge overall drive lifetime
     - Compare against manufacturer's MTBF rating
+    Recommended Actions:
+    1. Compare to warranty period
+    2. Plan replacement if approaching rated lifetime
     """,
     'Media_Wearout_Indicator': """
     Percentage of drive's rated life remaining (SSDs).
+    Impact:
     - 100 indicates new drive
     - 0 indicates exceeded rated writes
     - Critical for SSD lifecycle management
+    Recommended Actions:
+    1. Plan replacement below 20%
+    2. Monitor write workload
+    3. Consider workload redistribution
     """,
     'Temperature_Celsius': """
     Current drive temperature.
+    Impact:
     - High temperatures accelerate wear
     - Optimal range: 20-45°C
     - Sustained high temps reduce lifespan
+    Recommended Actions:
+    1. Check system cooling
+    2. Verify airflow
+    3. Monitor for sustained high temperatures
     """,
     'Available_Spare': """
     Percentage of spare blocks remaining (SSDs).
+    Impact:
     - Critical for SSD endurance
     - Low values indicate approaching end-of-life
     - Rapid decreases suggest excessive writes
+    Recommended Actions:
+    1. Plan replacement if below 20%
+    2. Monitor write patterns
+    3. Consider workload changes
     """,
     'Program_Fail_Count': """
     Number of flash program operation failures.
+    Impact:
     - Indicates NAND cell reliability
     - Important for SSD health assessment
     - Increasing values suggest flash degradation
+    Recommended Actions:
+    1. Monitor rate of increase
+    2. Check firmware updates
+    3. Plan replacement if rapidly increasing
     """,
     'Erase_Fail_Count': """
     Number of flash erase operation failures.
+    Impact:
     - Related to NAND block health
     - Critical for SSD reliability
     - High counts suggest failing flash blocks
+    Recommended Actions:
+    1. Monitor count increases
+    2. Check firmware version
+    3. Plan replacement if count is high
+    """,
+    'Load_Cycle_Count': """
+    Number of power cycles and head load/unload events.
+    Impact:
+    - Normal operation metric
+    - High counts may indicate power management issues
+    - Compare against rated cycles (typically 600k-1M)
+    Recommended Actions:
+    1. Review power management settings
+    2. Monitor rate of increase
+    3. Plan replacement near rated limit
+    """,
+    'Wear_Leveling_Count': """
+    SSD block erase distribution metric.
+    Impact:
+    - Indicates wear pattern uniformity
+    - Higher values show more balanced wear
+    - Critical for SSD longevity
+    Recommended Actions:
+    1. Monitor trend over time
+    2. Compare with similar drives
+    3. Check workload distribution
     """
 }
+# Add relevant SMART descriptions
+for attr in SMART_DESCRIPTIONS:
+    if attr in issue:
+        description += f"\n{attr}:\n{SMART_DESCRIPTIONS[attr]}\n"
 if "SMART" in issue:
     description += """
-    SMART (Self-Monitoring, Analysis, and Reporting Technology) issues indicate potential drive reliability problems.
-    - Reallocated sectors indicate bad blocks that have been remapped
-    - Pending sectors are potentially failing blocks waiting to be remapped
-    - Uncorrectable errors indicate data that could not be read
+    SMART (Self-Monitoring, Analysis, and Reporting Technology) Attribute Details:
+    - Possible drive failure!
     """
 if "Temperature" in issue:
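The loop added after the dictionary appends the explanation for every SMART attribute whose name appears anywhere in the issue text. A minimal standalone sketch of that lookup follows. It assumes `issue` and `description` are plain strings. The helper name `append_smart_descriptions` and the two-entry dictionary with shortened text are hypothetical stand-ins, not code from the commit.

```python
# Hypothetical, trimmed-down stand-in for the commit's SMART_DESCRIPTIONS
# (two entries, shortened text; illustrative only).
SMART_DESCRIPTIONS = {
    "Reported_Uncorrect": "Number of errors that could not be recovered using hardware ECC.",
    "Reallocated_Sector_Ct": "Number of sectors that have been reallocated due to errors.",
}


def append_smart_descriptions(issue: str, description: str) -> str:
    """Hypothetical helper mirroring the commit's loop: for every attribute
    name found as a substring of the issue text, append its explanation."""
    for attr, text in SMART_DESCRIPTIONS.items():
        if attr in issue:
            description += f"\n{attr}:\n{text}\n"
    return description


if __name__ == "__main__":
    issue = "Drive /dev/sda has SMART issues: Reallocated_Sector_Ct=8"
    print(append_smart_descriptions(issue, "Ticket for /dev/sda"))
```

Matching uses a plain substring test, so any attribute name mentioned in the issue pulls in its explanation. None of the thirteen keys in the commit is a substring of another key, so one entry cannot trigger another by accident. Explanations are appended in dictionary order rather than in the order the attributes appear in the issue.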
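Several of the Recommended Actions quote concrete numbers:

- replacement below 20% for Media_Wearout_Indicator and Available_Spare
- an optimal range of 20-45°C for Temperature_Celsius
- rated Load_Cycle_Count values of typically 600k-1M

The sketch below turns those numbers into checks. It relies on three assumptions:

- The percentage attributes are read as normalized 0-100 values where 100 means new.
- 600,000, the low end of the quoted range, is used as the rating.
- "Near rated limit" means at least 90% of that rating.

The function name `recommended_action` and its constants are hypothetical.

```python
from typing import Optional

# Numbers quoted in the commit's "Recommended Actions" text.
PERCENT_FLOOR = 20                 # Media_Wearout_Indicator, Available_Spare
TEMP_OPTIMAL_C = (20, 45)          # Temperature_Celsius optimal range
# Assumptions made for this sketch only:
RATED_LOAD_CYCLES = 600_000        # low end of the "typically 600k-1M" range
NEAR_LIMIT_FRACTION = 0.9          # "near rated limit" taken to mean >= 90%


def recommended_action(attr: str, value: int) -> Optional[str]:
    """Return a short recommendation for one attribute reading, or None."""
    if attr in ("Media_Wearout_Indicator", "Available_Spare"):
        # Assumes a normalized percentage where 100 means a new drive.
        if value < PERCENT_FLOOR:
            return f"{attr} at {value}%: plan replacement (below {PERCENT_FLOOR}%)"
    elif attr == "Temperature_Celsius":
        low, high = TEMP_OPTIMAL_C
        if value > high:
            return f"{attr} at {value} degrees C: check system cooling and airflow"
        if value < low:
            return f"{attr} at {value} degrees C: below the {low}-{high} optimal range"
    elif attr == "Load_Cycle_Count":
        if value >= NEAR_LIMIT_FRACTION * RATED_LOAD_CYCLES:
            return f"{attr} at {value}: near rated limit, plan replacement"
    # Attributes without a numeric threshold in the descriptions fall through.
    return None


if __name__ == "__main__":
    for attr, value in [("Available_Spare", 15), ("Temperature_Celsius", 52),
                        ("Load_Cycle_Count", 100_000)]:
        print(attr, "->", recommended_action(attr, value))
```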
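Many entries say to monitor the rate of increase and to plan replacement if a count grows rapidly, but "rapidly" is never defined. One way to make it measurable is to compare two readings of a raw counter and normalize by the power-on hours that elapsed between them. The `Reading` type, the function names, and the one-event-per-day threshold are hypothetical choices for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One sample of a raw SMART counter, e.g. Reallocated_Sector_Ct."""
    power_on_hours: int   # Power_On_Hours at the time of the sample
    raw_value: int        # raw counter value at the time of the sample


# The commit does not define "rapidly"; one new event per 24 power-on hours
# is an arbitrary threshold chosen for this sketch.
RAPID_PER_DAY = 1.0


def increase_per_day(older: Reading, newer: Reading) -> float:
    """Counter increase per 24 power-on hours between two samples."""
    elapsed = newer.power_on_hours - older.power_on_hours
    if elapsed <= 0:
        raise ValueError("newer reading must have more power-on hours")
    return (newer.raw_value - older.raw_value) * 24 / elapsed


def grows_rapidly(older: Reading, newer: Reading,
                  threshold: float = RAPID_PER_DAY) -> bool:
    """True if the counter grew at or above `threshold` per power-on day."""
    return increase_per_day(older, newer) >= threshold


if __name__ == "__main__":
    # 4 new reallocated sectors over 48 power-on hours is 2.0 per day.
    print(increase_per_day(Reading(1000, 8), Reading(1048, 12)))
```

Measuring against power-on hours rather than wall-clock time means a drive that sat powered off between samples is not credited with idle days.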
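For Power_On_Hours, the recommended action is to compare against the warranty period. Warranties are usually stated in years, so the comparison needs a unit conversion. The sketch below assumes 24/7 operation and a 365-day year (8,760 hours), and the name `warranty_fraction_used` is hypothetical. Drives that are not powered continuously accumulate fewer power-on hours than calendar hours, so for them this ratio is a lower bound on how much of a time-based warranty has elapsed.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap days


def warranty_fraction_used(power_on_hours: int, warranty_years: float) -> float:
    """Share of the warranty period covered by the drive's power-on hours."""
    if warranty_years <= 0:
        raise ValueError("warranty_years must be positive")
    return power_on_hours / (warranty_years * HOURS_PER_YEAR)


if __name__ == "__main__":
    # A 5-year warranty is 43,800 hours; 21,900 hours is half of that.
    print(warranty_fraction_used(21_900, 5))
```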