- Remove `local` from max_parallel_jobs/job_count (not inside a function)
- Document storage-01 physical layout: mobo SATA ports, HBA Mini-SAS HD
ports C0-C3, U.2 NVMe serial numbers
Ref #25
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add bash 4.2+ version check since script uses declare -g -A
- Add cleanup trap (EXIT/INT/TERM) for SMART_CACHE_DIR temp directory
- Sanitize hostname to strip unexpected characters
- Limit parallel SMART collection to 10 concurrent jobs
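The four changes above could look roughly like the sketch below. Only SMART_CACHE_DIR is taken from this message; the function names, the exit codes in the signal traps, and the polling interval are assumptions made for illustration, and the background job is a stand-in for a real smartctl call.

```shell
#!/usr/bin/env bash
# Sketch only. SMART_CACHE_DIR comes from the commit message; every other
# name here is hypothetical.

# declare -g first appeared in bash 4.2, so anything older is rejected.
# Arguments default to the running shell's version, which keeps it testable.
bash_is_at_least_4_2() {
    local major="${1:-${BASH_VERSINFO[0]}}" minor="${2:-${BASH_VERSINFO[1]}}"
    (( major > 4 || (major == 4 && minor >= 2) ))
}

# Drop every character that is not a letter, digit, dot, or hyphen.
sanitize_hostname() {
    printf '%s' "$1" | tr -cd 'A-Za-z0-9.-'
}

# Temp directory for cached SMART output, removed however the script ends.
SMART_CACHE_DIR="$(mktemp -d)"
cleanup() { rm -rf -- "$SMART_CACHE_DIR"; }
trap cleanup EXIT
trap 'exit 130' INT    # exiting here also fires the EXIT trap
trap 'exit 143' TERM

# Run one background job per argument, never more than $1 at a time.
run_throttled() {
    local max_parallel_jobs="$1" item
    shift
    for item in "$@"; do
        # Poll until the count of running background jobs drops below the cap.
        while (( $(jobs -rp | wc -l) >= max_parallel_jobs )); do
            sleep 0.05
        done
        # Stand-in for a real smartctl call: just record the item name.
        printf '%s\n' "$item" > "$SMART_CACHE_DIR/$item.txt" &
    done
    wait
}

if ! bash_is_at_least_4_2; then
    printf 'bash 4.2 or newer is required\n' >&2
    exit 1
fi
run_throttled 10 sda sdb sdc
```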
Fixes #25
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The ceph-volume lvm list output varies the number of trailing equals
signs based on OSD number length:
- Single digit: "====== osd.5 =======" (7 equals)
- Double digit: "====== osd.19 ======" (6 equals)
Changed regex to require only 6 trailing equals (any 7th is ignored),
which matches both formats.
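As a sketch of why a six-sign quantifier covers both headers: with no end-of-line anchor, a seventh sign is simply left unmatched. The helper name and the exact pattern below are assumptions, not the script's actual regex.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the OSD number from a ceph-volume header line.
osd_id_from_header() {
    # ={6} has no end-of-line anchor, so a 7th "=" is ignored.
    local re='^=+ osd\.([0-9]+) ={6}'
    [[ "$1" =~ $re ]] || return 1
    printf '%s\n' "${BASH_REMATCH[1]}"
}

osd_id_from_header '====== osd.5 ======='    # single-digit header
osd_id_from_header '====== osd.19 ======'    # double-digit header
```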
Fixes: #17
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
SMART output for Temperature_Celsius often includes extra sensor data
in parentheses like "26 (0 14 0 0 0)". The previous awk command was
finding "0" from the parenthetical instead of the actual temperature.
Now strips parenthetical content with sed before extracting the last
numeric value.
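A minimal sketch of the strip-then-extract approach, assuming the attribute line is passed as a single string. The function name and the sample line are illustrative, not copied from the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: $1 is one Temperature_Celsius attribute line.
extract_temp_celsius() {
    printf '%s\n' "$1" |
        sed 's/([^)]*)//g' |    # drop "(0 14 0 0 0)" style sensor extras
        awk '{
            # Walk backwards and print the last purely numeric field.
            for (i = NF; i > 0; i--)
                if ($i ~ /^[0-9]+$/) { print $i; exit }
        }'
}

extract_temp_celsius '194 Temperature_Celsius 0x0022 026 045 000 Old_age Always - 26 (0 14 0 0 0)'
```

Without the sed step, the backwards walk would stop at the bare "0" inside the parentheses, which is the bug described above.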
Fixes: #11
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added support for SAS drive temperature format "Current Drive Temperature:"
and made temperature extraction more robust by:
- Removing ^ anchor that was preventing matches with leading whitespace
- Using awk to find the first numeric value in the line
- Adding explicit SAS drive temperature format handling
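The SAS handling might be sketched as below, assuming smartctl-style text arrives on stdin. The function name and sample input are hypothetical; the pattern is unanchored so leading whitespace does not matter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read smartctl-style text on stdin, print the SAS
# drive temperature (first numeric field on the matching line).
sas_drive_temp() {
    awk '/Current Drive Temperature:/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+$/) { print $i; exit }
    }'
}

# Illustrative input with leading whitespace on the first line.
printf '%s\n' '  Current Drive Temperature:     34 C' \
              'Drive Trip Temperature:        60 C' | sas_drive_temp
```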
Fixes: #11
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The "block device" line in ceph-volume output shows LVM paths like
ceph-xxx/osd-block-xxx, not physical device names. Changed to parse
the "devices" line which contains the actual physical device path
like /dev/sda.
Also reset current_osd after match to avoid duplicate matches.
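A rough sketch of the "devices" line parsing with the current_osd reset. CEPH_DEVICE_TO_OSD and current_osd are names from these commits; the function name, both regexes, the stored value format, and the abbreviated sample input are assumptions.

```shell
#!/usr/bin/env bash
declare -A CEPH_DEVICE_TO_OSD=()

# Read ceph-volume-style text on stdin and fill CEPH_DEVICE_TO_OSD.
build_device_map() {
    local line current_osd=""
    local re_header='^=+ osd\.([0-9]+) ={6}'
    local re_devices='^[[:space:]]*devices[[:space:]]+(/dev/[^[:space:]]+)'
    while IFS= read -r line; do
        if [[ "$line" =~ $re_header ]]; then
            current_osd="${BASH_REMATCH[1]}"
        elif [[ -n "$current_osd" && "$line" =~ $re_devices ]]; then
            CEPH_DEVICE_TO_OSD["${BASH_REMATCH[1]}"]="osd.$current_osd"
            current_osd=""    # reset so a later stray line cannot match again
        fi
    done
}

# Abbreviated, illustrative input; real output has many more fields.
build_device_map <<'EOF'
====== osd.5 =======
      block device              /dev/ceph-xxx/osd-block-xxx
      devices                   /dev/sda
====== osd.19 ======
      block device              /dev/ceph-yyy/osd-block-yyy
      devices                   /dev/sdb
EOF

for dev in "${!CEPH_DEVICE_TO_OSD[@]}"; do
    printf '%s -> %s\n' "$dev" "${CEPH_DEVICE_TO_OSD[$dev]}"
done
```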
Fixes: #17
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The smartctl output has leading whitespace before field names:
"Rotation Rate: 7200 rpm"
Removed the ^ anchor from the regex so it matches lines with
leading whitespace. This fixes HDD detection for drives that
have proper Rotation Rate fields in their SMART data.
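To illustrate the effect of the anchor, the sketch below tests one indented line against both patterns. The patterns and the helper name are assumptions, not the script's exact regex.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when line $2 matches extended regex $1.
matches() { [[ "$2" =~ $1 ]]; }

line='    Rotation Rate:    7200 rpm'
anchored='^Rotation Rate:[[:space:]]*[0-9]+ rpm'
unanchored='Rotation Rate:[[:space:]]*[0-9]+ rpm'

if matches "$anchored" "$line"; then
    printf 'anchored: match\n'
else
    printf 'anchored: no match (leading whitespace defeats ^)\n'
fi

if matches "$unanchored" "$line"; then
    printf 'unanchored: match\n'
fi
```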
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added log_info messages to show:
- Count of OSDs found
- Each device-to-OSD mapping as discovered
Also fixed array subscript quoting in CEPH_DEVICE_TO_OSD.
Run with --verbose to see Ceph detection diagnostics.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Improved device type detection:
- Use anchored regex (^Rotation Rate:) to avoid false matches
- Check for actual RPM values (e.g., "7200 rpm") to confirm HDD
- Only match SSD in model name field, not anywhere in output
- Default to HDD when Rotation Rate field is missing
This fixes drives like WDC WD80EFZZ being incorrectly detected
as SSDs when the Rotation Rate field wasn't being matched.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Split SMART data handling into two functions:
- parse_smart_data(): Parses raw smartctl output (no I/O)
- get_drive_smart_info(): Fetches and parses (wrapper)
Changed parallel collection to save raw smartctl output to cache
files, then parse during the display loop. This avoids issues
with function availability in background subshells when running
from process substitution (bash <(curl ...)).
Also fixed:
- Removed orphan code that was outside function scope
- Fixed lsblk caching to use separate calls for SIZE and MOUNTPOINT
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Split lsblk queries into two separate calls:
1. lsblk -dn for disk sizes (whole disk only, simpler parsing)
2. lsblk -rn for mount points (handles partition-to-parent mapping)
This fixes issues where:
- SIZE was empty due to delimiter confusion
- Mount points with spaces caused parsing errors
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Removed 'local' keyword from colored_warnings variable assignment
in the main script body. The 'local' keyword can only be used
inside functions in bash.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added path constant for disk-by-path location:
DISK_BY_PATH="/dev/disk/by-path"
Updated build_drive_map() to use the constant instead of
hardcoded path strings.
Note: LOG_DIR not added as the script does not currently use
logging to files. Can be added if logging feature is implemented.
Fixes: #24
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added 'set -o pipefail' to ensure pipe failures are detected.
Not using -e (errexit) as the script is designed for graceful
degradation when optional tools (smartctl, ceph) are missing.
Many commands intentionally redirect stderr to /dev/null.
Not using -u (nounset) as the script uses ${var:-default}
patterns extensively for optional variables.
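A small sketch of what pipefail changes, and why leaving errexit off suits a script that tolerates missing tools. The function names and the deliberately missing command are hypothetical.

```shell
#!/usr/bin/env bash
# Exit status of "false | cat" with and without pipefail, run in subshells
# so the option does not leak into the caller.
status_without_pipefail() { ( set +o pipefail; false | cat; printf '%s\n' "$?" ); }
status_with_pipefail()    { ( set -o pipefail; false | cat; printf '%s\n' "$?" ); }

status_without_pipefail    # prints 0: only the last command counts
status_with_pipefail       # prints 1: the failing stage is reported

set -o pipefail
# The probe below fails because the (hypothetical) tool does not exist, but
# without -e the script carries on and falls back to a default.
model="$(some_optional_tool_that_is_missing 2>/dev/null | head -n 1)"
printf 'model=%s\n' "${model:-unknown}"
```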
Fixes: #23
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Improved quoting consistency throughout the script:
- Array subscripts now quoted: DEVICE_TO_BAY["$device"]="$bay"
- Command substitution quoted: all_bays="$(cmd)"
- Function arguments already fixed in earlier commits
Most variable assignments were already properly quoted. The
remaining unquoted uses (like 'for x in $var') are intentional
for word-splitting on whitespace-separated lists.
Fixes: #22
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The magic numbers mentioned have been addressed:
- grep -B 20 for Ceph: Fixed in issue #9 with proper block parsing
in build_ceph_cache() that reads structured output
- awk column 10 for temperature: Fixed in issue #2 with dynamic
last-numeric-field extraction that doesn't rely on column position
- SMART thresholds: Added as named constants in issue #12:
SMART_TEMP_WARN, SMART_TEMP_CRIT, SMART_REALLOCATED_WARN, etc.
Fixes: #21
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Converted all echo -e commands to printf for better portability
across different systems and shells. Unlike echo -e, printf is
specified by POSIX and behaves consistently.
Updated functions:
- colorize_health(): Uses printf %b for escape sequences
- colorize_temp(): Uses printf %b for escape sequences
- colorize_header(): Uses printf with newline
- log_error(), log_warn(), log_info(): Use printf to write to stderr
Also simplified header output by calling colorize_header directly
since it now handles its own newline.
Fixes: #19
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The build_ceph_cache() function added in commit 9d39332 already
addresses this issue by:
- Querying ceph-volume lvm list once, building CEPH_DEVICE_TO_OSD map
- Querying ceph osd tree once, building CEPH_OSD_STATUS and CEPH_OSD_IN maps
- Eliminating per-device Ceph queries in the main loop
Fixes: #18
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The lspci command is now called only once on first invocation of
get_storage_controllers, with results cached in LSPCI_CACHE.
Subsequent calls from different layout generators (10bay, large1,
micro) reuse the cached output, reducing subprocess overhead.
Also added function documentation.
Fixes: #17
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Instead of calling lsblk twice per device (once for size, once for
mount points), now performs a single lsblk call at start and caches:
- LSBLK_SIZE: Device sizes
- LSBLK_MOUNTS: Mount points (accumulated for partitions)
This reduces the number of subprocess calls significantly,
especially on systems with many drives.
Fixes: #16
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
SMART queries are now run in parallel using background jobs:
1. First pass launches background jobs for all devices
2. Each job writes to a temp file in SMART_CACHE_DIR
3. Wait for all jobs to complete
4. Second pass reads cached data for display
This significantly reduces script runtime when multiple drives
are present, as SMART queries can take 1-2 seconds each.
Cache directory is automatically cleaned up after use.
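The two-pass flow above can be sketched as follows. SMART_CACHE_DIR is the name used in this message; query_smart is a hypothetical stand-in so the sketch runs without smartctl, and the cache file naming is an assumption.

```shell
#!/usr/bin/env bash
SMART_CACHE_DIR="$(mktemp -d)"
trap 'rm -rf -- "$SMART_CACHE_DIR"' EXIT

# Hypothetical stand-in for the real query so the sketch runs anywhere.
query_smart() { printf 'SMART data for %s\n' "$1"; }

collect_smart_parallel() {
    local dev
    # Pass 1: one background job per device, each writing its own cache file.
    for dev in "$@"; do
        query_smart "$dev" > "$SMART_CACHE_DIR/${dev##*/}.txt" &
    done
    # Block until every background job has finished.
    wait
}

collect_smart_parallel /dev/sda /dev/sdb

# Pass 2: read the cached output back in a stable order for display.
for dev in /dev/sda /dev/sdb; do
    cat "$SMART_CACHE_DIR/${dev##*/}.txt"
done
```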
Fixes: #15
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added --verbose flag that enables detailed logging:
- log_error(): Always shown, critical errors
- log_warn(): Shown in verbose mode, potential issues
- log_info(): Shown in verbose mode, informational messages
Now provides helpful feedback for:
- SMART query failures with specific error messages
- Missing drive mappings for the current host
- Empty bays (no device at configured PCI path)
- Ceph command availability and query status
- Drive mapping statistics (mapped vs empty)
Color-coded output when using --color with --verbose.
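A minimal sketch of the three log levels. The function names come from this message; the VERBOSE variable name and the message prefixes are assumptions.

```shell
#!/usr/bin/env bash
VERBOSE=0    # hypothetical flag variable, set to 1 by --verbose

# Errors are always shown.
log_error() { printf 'ERROR: %s\n' "$*" >&2; }

# Warnings and info are shown only in verbose mode.
log_warn() {
    if [[ "$VERBOSE" == 1 ]]; then printf 'WARN: %s\n' "$*" >&2; fi
}
log_info() {
    if [[ "$VERBOSE" == 1 ]]; then printf 'INFO: %s\n' "$*" >&2; fi
}

log_info 'hidden unless --verbose was given'
VERBOSE=1
log_info 'shown in verbose mode'
```

Using if rather than a bare test-and-print keeps each function's exit status at zero when nothing is logged.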
Fixes: #14
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The --show-pci flag, added in commit 71a4e3b, displays the PCI path
for each drive in the output table.
Fixes: #13
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New warning detection for concerning SMART values:
- Temperature: Warning at 50°C, Critical at 60°C
- Reallocated sectors: Warning at >= 1
- Pending sectors: Warning at >= 1
- UDMA CRC errors: Warning at >= 100
- Power-on hours: Warning at >= 43800 (5 years)
Health indicator now shows ⚠ when SMART passed but has warnings.
Added WARNINGS column to output showing codes like:
TEMP_WARN, TEMP_CRIT, REALLOC:5, PENDING:2, CRC:150, HOURS:50000
Thresholds are configurable via constants at top of script.
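The threshold checks might be sketched like this. SMART_TEMP_WARN, SMART_TEMP_CRIT and SMART_REALLOCATED_WARN are names mentioned elsewhere in this log; the remaining constant names, the function signature, and the comma separator are assumptions.

```shell
#!/usr/bin/env bash
SMART_TEMP_WARN=50
SMART_TEMP_CRIT=60
SMART_REALLOCATED_WARN=1
SMART_PENDING_WARN=1       # hypothetical name
SMART_CRC_WARN=100         # hypothetical name
SMART_HOURS_WARN=43800     # hypothetical name; 5 years x 8760 hours

# Args: temp reallocated pending crc_errors power_on_hours
# Prints a comma-separated list of warning codes (empty line if none).
smart_warnings() {
    local temp="$1" realloc="$2" pending="$3" crc="$4" hours="$5"
    local out=()
    if (( temp >= SMART_TEMP_CRIT )); then
        out+=("TEMP_CRIT")
    elif (( temp >= SMART_TEMP_WARN )); then
        out+=("TEMP_WARN")
    fi
    if (( realloc >= SMART_REALLOCATED_WARN )); then out+=("REALLOC:$realloc"); fi
    if (( pending >= SMART_PENDING_WARN )); then out+=("PENDING:$pending"); fi
    if (( crc >= SMART_CRC_WARN )); then out+=("CRC:$crc"); fi
    if (( hours >= SMART_HOURS_WARN )); then out+=("HOURS:$hours"); fi
    local IFS=,
    printf '%s\n' "${out[*]}"
}

smart_warnings 52 5 0 150 50000
```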
Fixes: #12
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When enabled, colors are applied to:
- Headers: Blue/bold for section titles
- Health status: Green for ✓ (passed), Red for ✗ (failed)
- Temperature: Green (<50°C), Yellow (50-59°C), Red (≥60°C)
Added colorize_health, colorize_temp, and colorize_header helper
functions that respect the USE_COLOR flag.
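A sketch of how colorize_temp could apply the temperature bands while respecting USE_COLOR. The function and flag names are from this message; the ANSI codes and the pass-through for non-numeric values are assumptions.

```shell
#!/usr/bin/env bash
USE_COLOR=1

# Print a temperature, colored by band when USE_COLOR is 1.
colorize_temp() {
    local t="$1" color
    # Pass the value through untouched when color is off or it is not numeric.
    if [[ "${USE_COLOR:-0}" != 1 || ! "$t" =~ ^[0-9]+$ ]]; then
        printf '%s' "$t"
        return 0
    fi
    if (( t >= 60 )); then
        color='\033[0;31m'    # red
    elif (( t >= 50 )); then
        color='\033[0;33m'    # yellow
    else
        color='\033[0;32m'    # green
    fi
    # %b expands the backslash escapes held in the color strings.
    printf '%b%s%b' "$color" "$t" '\033[0m'
}

colorize_temp 45; printf '\n'
colorize_temp 55; printf '\n'
colorize_temp 65; printf '\n'
```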
Fixes: #11
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added comprehensive command-line interface with:
- -h, --help: Show usage information
- -v, --version: Show version
- -d, --debug: Enable debug output
- -s, --skip-smart: Skip SMART data collection (faster)
- --no-ceph: Skip Ceph OSD information
- --show-pci: Display PCI paths for debugging
The script now properly respects these flags throughout execution.
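One way the option handling could be structured is shown below. The flags are the ones listed above; the variable names, the ACTION values, and the return code for unknown options are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical state variables with their defaults.
ACTION="run"
DEBUG=0
SKIP_SMART=0
USE_CEPH=1
SHOW_PCI=0

parse_args() {
    while (( $# > 0 )); do
        case "$1" in
            -h|--help)       ACTION="help" ;;
            -v|--version)    ACTION="version" ;;
            -d|--debug)      DEBUG=1 ;;
            -s|--skip-smart) SKIP_SMART=1 ;;
            --no-ceph)       USE_CEPH=0 ;;
            --show-pci)      SHOW_PCI=1 ;;
            *)
                printf 'Unknown option: %s\n' "$1" >&2
                return 2
                ;;
        esac
        shift
    done
}

parse_args --skip-smart --no-ceph
printf 'skip_smart=%s use_ceph=%s show_pci=%s\n' "$SKIP_SMART" "$USE_CEPH" "$SHOW_PCI"
```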
Fixes: #10
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace fragile per-device ceph-volume parsing (grep -B 20) with a
single upfront query that builds lookup tables.
New build_ceph_cache function:
- Parses ceph-volume lvm list output using proper block detection
- Extracts OSD IDs by matching "====== osd.X =======" headers
- Maps block devices to their corresponding OSDs
- Queries ceph osd tree once for all status info
- Creates CEPH_DEVICE_TO_OSD, CEPH_OSD_STATUS, CEPH_OSD_IN arrays
This is both more reliable and more efficient.
Fixes: #9
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use lsblk instead of mount command to detect mount points. This
properly detects mounts on partitions (e.g., /dev/sda1) rather
than only whole-device mounts.
- Shows multiple mount points (up to 3) comma-separated
- Correctly identifies BOOT drives with root partition
- Handles NVMe partition naming (nvme0n1p1, etc.)
Fixes: #8
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The device type detection was updated in commit 90055be to:
- Check for NVMe devices by name prefix first
- Handle "Solid State" and "0 rpm" in Rotation Rate field
- Fall back to checking for SSD/Solid State in SMART output
Fixes: #7
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The serial number parsing was updated in commit 90055be to use
'cut -d: -f2 | xargs' which captures the full serial including
spaces, instead of 'awk {print $3}' which only got the first word.
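The difference between the two pipelines can be seen on an illustrative line (the serial value is made up, and the function names are hypothetical):

```shell
#!/usr/bin/env bash
# New approach: everything after the first colon, trimmed by xargs.
parse_serial() {
    grep -m 1 'Serial Number:' | cut -d: -f2 | xargs
}

# Old approach: third whitespace-separated field only.
parse_serial_old() {
    grep -m 1 'Serial Number:' | awk '{print $3}'
}

sample='Serial Number:    WD-WX 12345'
printf '%s\n' "$sample" | parse_serial        # full serial, space included
printf '%s\n' "$sample" | parse_serial_old    # first word only
```

Note that xargs also interprets quotes and backslashes, so a serial containing those characters would need different trimming.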
Fixes: #6
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
NVMe drives mapped to m2-1, m2-2 slots now appear in the main drive
table with their bay position, instead of in a separate unmapped
section.
- Extended bay loop to include m2-* slots after numeric bays
- NVMe section now only shows truly unmapped NVMe drives
- Mapped NVMe drives show full SMART data like other drives
Fixes: #5
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use awk BEGIN block for comparing Ceph OSD reweight values instead
of bc. Awk is more universally available and the previous fallback
to "echo 0" could incorrectly evaluate to true.
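A sketch of a bc-free comparison, assuming a reweight above zero means the OSD is "in". The function name is hypothetical; the awk exit status carries the result, so there is no textual fallback to misread.

```shell
#!/usr/bin/env bash
# Succeed (exit 0) when the reweight value $1 is greater than zero.
osd_is_in() {
    awk -v rw="$1" 'BEGIN { exit !(rw + 0 > 0) }'
}

for reweight in 1.00000 0.85000 0; do
    if osd_is_in "$reweight"; then
        printf '%s -> in\n' "$reweight"
    else
        printf '%s -> out\n' "$reweight"
    fi
done
```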
Fixes: #4
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Script now verifies required (lsblk, lspci, etc.) and optional
(smartctl, ceph, bc, nvme) dependencies at startup.
- Exits with clear error if required dependencies are missing
- Warns when missing optional dependencies will reduce functionality
- Directs users to freshStartScript for easy installation
- Checks for sudo access needed for SMART operations
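The required/optional split might be sketched as below. The function names and message wording are assumptions, and the sketch returns a status instead of exiting so it can be run safely on any machine.

```shell
#!/usr/bin/env bash
# Print each command from the argument list that is not on PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# $1: required commands, $2: optional commands (space-separated lists).
check_dependencies() {
    local missing
    missing="$(missing_commands $1)"    # unquoted on purpose: split the list
    if [[ -n "$missing" ]]; then
        printf 'ERROR: missing required commands: %s\n' "${missing//$'\n'/ }" >&2
        return 1
    fi
    missing="$(missing_commands $2)"
    if [[ -n "$missing" ]]; then
        printf 'WARN: missing optional commands: %s\n' "${missing//$'\n'/ }" >&2
    fi
    return 0
}

# A real script would exit on failure; the sketch only reports it.
check_dependencies "lsblk lspci" "smartctl ceph nvme" ||
    printf 'startup would stop here\n' >&2
```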
Fixes: #3
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Temperature parsing now correctly handles:
- SATA Temperature_Celsius attribute (extracts last numeric value)
- Simple "Temperature: XX Celsius" format
- "Current Temperature: XX Celsius" format
- NVMe temperature reporting
Also improved device type detection for NVMe, SSD (including 0 RPM),
and fixed serial number parsing to capture full serial with spaces.
Fixes: #2
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Declare DRIVE_MAP as global at function start and populate directly,
instead of creating a local array and copying to global. Also added
proper variable quoting and function documentation.
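A minimal sketch of the pattern, assuming bash 4.2 or newer for declare -g. DRIVE_MAP and build_drive_map are names from this log; the PCI path keys and bay values are invented for illustration.

```shell
#!/usr/bin/env bash
# Populate the global associative array directly from inside the function.
build_drive_map() {
    # -g makes the array global even though it is declared in a function.
    declare -g -A DRIVE_MAP=()
    DRIVE_MAP["pci-0000:00:17.0-ata-1"]="1"    # hypothetical path -> bay
    DRIVE_MAP["pci-0000:00:17.0-ata-2"]="2"
}

build_drive_map

# The array is still visible after the function returns.
for path in "${!DRIVE_MAP[@]}"; do
    printf '%s -> bay %s\n' "$path" "${DRIVE_MAP[$path]}"
done
```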
Fixes: #1
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added get_storage_controllers() function that detects SAS, SATA, RAID,
and NVMe controllers via lspci. Updated all layout functions (10bay,
large1, micro) to display detected storage controllers with their
PCI address and model info.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Reflects actual physical layout: 3 stacks x 5 rows (15 bays)
- Added note that physical bay mapping is TBD
- Shows Stack A, B, C columns
- Keeps 2x M.2 NVMe slots
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add PCI path mappings for large1 (12 SATA/SAS + 2 NVMe drives)
- Update large1 layout to show actual drive assignments
- Controllers: LSI SAS2008 (7), AMD SATA (3), ASMedia SATA (2), NVMe (2)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add PCI path mappings for storage-01 (4 SATA drives on AMD controller)
- Fix NVMe serial: use smartctl instead of nvme list for accurate serial numbers
- NVMe now shows actual serial number instead of /dev/ng device path
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Updated README.md:
- Added feature list with emojis for visual clarity
- Documented all output columns with descriptions
- Added Ceph integration details
- Included troubleshooting for common issues
- Updated example output with current format
- Added status indicators (✅⚠️) for server mapping status
Created CLAUDE.md:
- Documented AI-assisted development process
- Chronicled evolution from basic script to comprehensive tool
- Detailed technical challenges and solutions
- Listed all phases of development
- Provided metrics and future enhancement ideas
- Lessons learned for future AI collaboration
This documents the complete journey from broken PCI paths to a
production-ready storage infrastructure management tool.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed parsing of ceph osd tree output:
- Column 5 is STATUS (up/down), not column 6
- Column 6 is REWEIGHT (1.0 = in, 0 = out)
- Now correctly shows up/in for active OSDs
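A sketch of the corrected column handling, run against an illustrative sample rather than a live cluster. The function name and the output format are assumptions; the column positions are the ones stated above.

```shell
#!/usr/bin/env bash
# Read "ceph osd tree"-style text on stdin; print "osd.N status/in-or-out".
parse_osd_tree() {
    awk '$4 ~ /^osd\.[0-9]+$/ {
        # $5 is STATUS (up/down); $6 is REWEIGHT (above 0 means "in").
        printf "%s %s/%s\n", $4, $5, ($6 + 0 > 0 ? "in" : "out")
    }'
}

# Illustrative sample; root and host rows are skipped by the $4 test.
parse_osd_tree <<'EOF'
ID  CLASS  WEIGHT    TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         14.55478  root default
-3         14.55478      host node1
 5    hdd   7.27739          osd.5         up   1.00000  1.00000
19    hdd   7.27739          osd.19      down         0  1.00000
EOF
```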
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
New features:
- Added STATUS column showing Ceph OSD up/down and in/out status
Format: "up/in", "up/out", "down/in", etc.
- Added USAGE column to identify boot drives and mount points
Shows "BOOT" for root filesystem, mount point for others, "-" for OSDs
- Improved table layout with all relevant drive information
Now you can see at a glance:
- Which drives are boot drives
- Which OSDs are up and in the cluster
- Any problematic OSDs that are down or out
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Major enhancements:
- Drive details now sorted by physical bay position (1-10) instead of alphabetically
- Added BAY column to show physical location
- Added CEPH OSD column to show which OSD each drive hosts
- Fixed ASCII art right border alignment (final fix)
- Drives now display in logical order matching physical layout
This makes it much easier to correlate physical drives with their Ceph OSDs
and understand the layout at a glance.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>