Compare commits

...

18 Commits

Author SHA1 Message Date
f5638cad84 Add storage controller (HBA) info to chassis layout output
Added get_storage_controllers() function that detects SAS, SATA, RAID,
and NVMe controllers via lspci. Updated all layout functions (10bay,
large1, micro) to display detected storage controllers with their
PCI address and model info.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 21:24:23 -05:00
07f7a1d0af Add actual chassis model names
- large1: Rosewill RSV-L4500U (15x 3.5" bays)
- 10-bay servers: Sliger CX4712 (10x 3.5" hot-swap)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 21:16:31 -05:00
01f8d3e692 Update large1 layout to 3x5 front bay grid
- Reflects actual physical layout: 3 stacks x 5 rows (15 bays)
- Added note that physical bay mapping is TBD
- Shows Stack A, B, C columns
- Keeps 2x M.2 NVMe slots

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 21:13:27 -05:00
f159b10de1 Add large1 mappings and update layout
- Add PCI path mappings for large1 (12 SATA/SAS + 2 NVMe drives)
- Update large1 layout to show actual drive assignments
- Controllers: LSI SAS2008 (7), AMD SATA (3), ASMedia SATA (2), NVMe (2)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 21:10:56 -05:00
766d92251e Add micro1 ZimaBoard 832 to mappings
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 21:07:51 -05:00
93aeb84c65 Add micro chassis layout for ZimaBoard and similar SBCs
- Implement generate_micro_layout() for single board computers
- Shows onboard eMMC storage if present
- Shows 2 rear SATA ports
- Add monitor-02 entry (ZimaBoard 832)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 21:05:53 -05:00
d5dbdd7869 Add storage-01 mapping and fix NVMe serial display
- Add PCI path mappings for storage-01 (4 SATA drives on AMD controller)
- Fix NVMe serial: use smartctl instead of nvme list for accurate serial numbers
- NVMe now shows actual serial number instead of /dev/ng device path

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 17:31:05 -05:00
982d3f5c05 Add compute-storage-gpu-01 mapping and fix output formatting
- Add PCI path mappings for compute-storage-gpu-01 (5 SATA + 1 NVMe)
- Fix NVMe drives output formatting (use lsblk for size, parse columns properly)
- Fix Ceph RBD devices output formatting (proper column alignment)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 17:27:08 -05:00
7e1a88ad41 rename 2026-01-07 19:57:09 -05:00
40ab528f40 Comprehensive documentation update and AI development notes
Updated README.md:
- Added feature list with emojis for visual clarity
- Documented all output columns with descriptions
- Added Ceph integration details
- Included troubleshooting for common issues
- Updated example output with current format
- Added status indicators ( ⚠️) for server mapping status

Created CLAUDE.md:
- Documented AI-assisted development process
- Chronicled evolution from basic script to comprehensive tool
- Detailed technical challenges and solutions
- Listed all phases of development
- Provided metrics and future enhancement ideas
- Lessons learned for future AI collaboration

This documents the complete journey from broken PCI paths to a
production-ready storage infrastructure management tool.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-06 16:34:22 -05:00
418d4d4170 Fix Ceph OSD status parsing to correctly read up/down and in/out
Fixed parsing of ceph osd tree output:
- Column 5 is STATUS (up/down) not column 6
- Column 6 is REWEIGHT (1.0 = in, 0 = out)
- Now correctly shows up/in for active OSDs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-06 16:30:21 -05:00
1800b59a25 Add Ceph OSD status and boot drive detection
New features:
- Added STATUS column showing Ceph OSD up/down and in/out status
  Format: "up/in", "up/out", "down/in", etc.
- Added USAGE column to identify boot drives and mount points
  Shows "BOOT" for root filesystem, mount point for others, "-" for OSDs
- Improved table layout with all relevant drive information

Now you can see at a glance:
- Which drives are boot drives
- Which OSDs are up and in the cluster
- Any problematic OSDs that are down or out

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-06 16:28:44 -05:00
5430a9242f Add bay-sorted drive table and Ceph OSD tracking
Major enhancements:
- Drive details now sorted by physical bay position (1-10) instead of alphabetically
- Added BAY column to show physical location
- Added CEPH OSD column to show which OSD each drive hosts
- Fixed ASCII art right border alignment (final fix)
- Drives now display in logical order matching physical layout

This makes it much easier to correlate physical drives with their Ceph OSDs
and understand the layout at a glance.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-06 16:25:56 -05:00
fd587eca64 Correct HBA PHY to bay mappings based on verified serial numbers
Fixed bay mappings for HBA-connected drives (5-10):
- Bay 5: phy6 -> sde (serial Z4Y6MJ2Q)
- Bay 6: phy7 -> sdf (serial WD-WCAW37306731) ✓ verified
- Bay 7: phy5 -> sdd (serial WD-WCAW32312416)
- Bay 8: phy2 -> sda (serial ZL2L59XD, 16TB)
- Bay 9: phy4 -> sdc (serial ZL2KE9CM, 16TB)
- Bay 10: phy3 -> sdb (serial WD-WCC4N2FYYCXP) ✓ verified

Mappings verified by matching visible drive serial numbers on
physical bay labels with SMART serial number output.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-06 16:21:56 -05:00
03cb9e3ea8 Fix ASCII art right border alignment
Added extra spacing to align the right border of the chassis box correctly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-06 16:17:38 -05:00
d5c784033e Add serial numbers to drive details output
Changes:
- Added SERIAL column to SATA/SAS drive details table
- Added SERIAL column to NVMe drive details table
- Updated get_drive_smart_info() to extract and return serial numbers
- Widened output format to accommodate serial numbers
- NVMe serials now display correctly from nvme list output

This makes it much easier to match drives to their physical locations
by comparing visible serial numbers on drive labels with the output.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-06 16:16:04 -05:00
be541cba97 Fix ASCII art rendering and correct bay 1 mapping
Fixed issues:
- ASCII art boxes now render correctly with fixed-width layout
- Corrected bay 1 mapping: ata-2 -> bay 1 (sdh SSD confirmed)
- Adjusted mobo SATA port mappings based on physical verification
- Simplified layout to use consistent 10-character wide bay boxes

Bay 1 is confirmed to contain sdh (Crucial SSD boot drive) which maps
to pci-0000:0d:00.0-ata-2, so the mapping has been corrected.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-06 16:13:17 -05:00
1b35db6723 Fix PCI path mappings and line endings for compute-storage-01
Hardware discovered:
- LSI SAS3008 HBA at 01:00.0 (bays 5-10 via mini-SAS HD cables)
- AMD SATA controller at 0d:00.0 (bays 1-4)
- NVMe at 0e:00.0 (M.2 slot)

Changes:
- Updated SERVER_MAPPINGS with correct PCI paths based on actual hardware
- Fixed diagnose-drives.sh CRLF line endings (was causing script errors)
- Updated README with accurate controller information
- Mapped all 10 bays plus M.2 NVMe slot
- Added detailed cable mapping comments from user documentation

The old mapping referenced non-existent controller 0c:00.0. Now uses
actual SAS PHY paths and ATA port numbers that match physical bays.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-06 16:04:15 -05:00
6 changed files with 643 additions and 221 deletions

209
Claude.md Normal file
View File

@@ -0,0 +1,209 @@
# AI-Assisted Development Notes
This document chronicles the development of Drive Atlas with assistance from Claude (Anthropic's AI assistant).
## Project Overview
Drive Atlas started as a simple bash script with hardcoded drive mappings and evolved into a comprehensive storage infrastructure management tool through iterative development and user feedback.
## Development Session
**Date:** January 6, 2026
**AI Model:** Claude Sonnet 4.5
**Developer:** LotusGuild
**Session Duration:** ~2 hours
## Initial State
The project began with:
- Basic ASCII art layouts for different server chassis
- Hardcoded drive mappings for "medium2" server
- Simple SMART data display
- Broken PCI path mappings (referenced non-existent hardware)
- Windows line endings causing script execution failures
## Evolution Through Collaboration
### Phase 1: Architecture Refactoring
**Problem:** Chassis layouts were tied to hostnames, making it hard to reuse templates.
**Solution:**
- Separated chassis types from server hostnames
- Created reusable layout generator functions
- Introduced `CHASSIS_TYPES` and `SERVER_MAPPINGS` arrays
- Renamed "medium2" → "compute-storage-01" for clarity
### Phase 2: Hardware Discovery
**Problem:** Script referenced PCI controller `0c:00.0` which didn't exist.
**Approach:**
1. Created diagnostic script to probe actual hardware
2. Discovered real configuration:
- LSI SAS3008 HBA at `01:00.0` (bays 5-10)
- AMD SATA controller at `0d:00.0` (bays 1-4)
- NVMe at `0e:00.0` (M.2 slot)
3. User provided physical bay labels and visible serial numbers
4. Iteratively refined PCI PHY to bay mappings
**Key Insight:** User confirmed bay 1 contained the SSD boot drive, which helped establish the correct mapping starting point.
### Phase 3: Physical Verification
**Problem:** Needed to verify drive-to-bay mappings without powering down production server.
**Solution:**
1. Added serial number display to script output
2. User physically inspected visible serial numbers on drive bays
3. Cross-referenced SMART serials with visible labels
4. Corrected HBA PHY mappings:
- Bay 5: phy6 (not phy2)
- Bay 6: phy7 (not phy3)
- Bay 7: phy5 (not phy4)
- Bay 8: phy2 (not phy5)
- Bay 9: phy4 (not phy6)
- Bay 10: phy3 (not phy7)
### Phase 4: User Experience Improvements
**ASCII Art Rendering:**
- Initial version had variable-width boxes that broke alignment
- Fixed by using consistent 10-character wide bay boxes
- Multiple iterations to perfect right border alignment
**Drive Table Enhancements:**
- Original: Alphabetical by device name
- Improved: Sorted by physical bay position (1-10)
- Added BAY column to show physical location
- Wider columns to prevent text wrapping
### Phase 5: Ceph Integration
**User Request:** "Can we show ceph in/up out/down status in the table?"
**Implementation:**
1. Added CEPH OSD column using `ceph-volume lvm list`
2. Added STATUS column parsing `ceph osd tree`
3. Initial bug: Parsed wrong columns (5 & 6 instead of correct ones)
4. Fixed by understanding `ceph osd tree` format:
- Column 5: STATUS (up/down)
- Column 6: REWEIGHT (1.0 = in, 0 = out)
**User Request:** "Show which is the boot drive somehow?"
**Solution:**
- Added USAGE column
- Checks mount points
- Shows "BOOT" for root filesystem
- Shows mount point for other mounts
- Shows "-" for Ceph OSDs (using LVM)
## Technical Challenges Solved
### 1. Line Ending Issues
- **Problem:** `diagnose-drives.sh` had CRLF endings → script failures
- **Solution:** `sed -i 's/\r$//'` to convert to LF
### 2. PCI Path Pattern Matching
- **Problem:** Bash regex escaping for grep patterns
- **Solution:** `grep -E "^\s*${osd_num}\s+"` for reliable matching
### 3. Floating Point Comparison in Bash
- **Problem:** Bash doesn't natively support decimal comparisons
- **Solution:** Used `bc -l` with error handling: `$(echo "$reweight > 0" | bc -l 2>/dev/null || echo 0)`
### 4. Associative Array Sorting
- **Problem:** Bash associative arrays don't maintain insertion order
- **Solution:** Extract keys, filter numeric ones, pipe to `sort -n`
## Key Learning Moments
1. **Hardware Reality vs. Assumptions:** The original script assumed controller addresses that didn't exist. Always probe actual hardware.
2. **Physical Verification is Essential:** Serial numbers visible on drive trays were crucial for verifying correct mappings.
3. **Iterative Refinement:** The script went through 15+ commits, each improving a specific aspect based on user testing and feedback.
4. **User-Driven Feature Evolution:** Features like Ceph integration and boot drive detection emerged organically from user needs.
## Commits Timeline
1. Initial refactoring and architecture improvements
2. Fixed PCI path mappings based on discovered hardware
3. Added serial numbers for physical verification
4. Fixed ASCII art rendering issues
5. Corrected bay mappings based on user verification
6. Added bay-sorted output
7. Implemented Ceph OSD tracking
8. Added Ceph up/in status
9. Added boot drive detection
10. Fixed Ceph status parsing
11. Documentation updates
## Collaborative Techniques Used
### Information Gathering
- Asked clarifying questions about hardware configuration
- Requested diagnostic command output
- Had user physically verify drive locations
### Iterative Development
- Made small, testable changes
- User tested after each significant change
- Incorporated feedback immediately
### Problem-Solving Approach
1. Understand current state
2. Identify specific issues
3. Propose solution
4. Implement incrementally
5. Test and verify
6. Refine based on feedback
## Metrics
- **Lines of Code:** ~330 (main script)
- **Supported Chassis Types:** 4 (10-bay, large1, micro, spare)
- **Mapped Servers:** 1 fully (compute-storage-01), 3 pending
- **Features Added:** 10+
- **Bugs Fixed:** 6 major, multiple minor
- **Documentation:** Comprehensive README + this file
## Future Enhancements
Potential improvements identified during development:
1. **Auto-detection:** Attempt to auto-map bays by testing with `hdparm` LED control
2. **Color Output:** Use terminal colors for health status (green/red)
3. **Historical Tracking:** Log temperature trends over time
4. **Alert Integration:** Notify when drive health deteriorates
5. **Web Interface:** Display chassis map in a web dashboard
6. **Multi-server View:** Show all servers in one consolidated view
## Lessons for Future AI-Assisted Development
### What Worked Well
- Breaking complex problems into small, testable pieces
- Using diagnostic scripts to understand actual vs. assumed state
- Physical verification before trusting software output
- Comprehensive documentation alongside code
- Git commits with detailed messages for traceability
### What Could Be Improved
- Earlier physical verification would have saved iteration
- More upfront hardware documentation would help
- Automated testing for bay mappings (if possible)
## Conclusion
This project demonstrates effective human-AI collaboration where:
- The AI provided technical implementation and problem-solving
- The human provided domain knowledge, testing, and verification
- Iterative feedback loops led to a polished, production-ready tool
The result is a robust infrastructure management tool that provides instant visibility into complex storage configurations across multiple servers.
---
**Development Credits:**
- **Human Developer:** LotusGuild
- **AI Assistant:** Claude Sonnet 4.5 (Anthropic)
- **Development Date:** January 6, 2026
- **Project:** Drive Atlas v1.0

145
README.md
View File

@@ -4,12 +4,15 @@ A powerful server drive mapping tool that generates visual ASCII representations
## Features
- Visual ASCII art maps showing physical drive bay layouts
- Persistent drive identification using PCI paths (not device letters)
- SMART health status and temperature monitoring
- Support for SATA, NVMe, and USB drives
- Detailed drive information including model, size, and health status
- Per-server configuration for accurate physical-to-logical mapping
- 🗺️ **Visual ASCII art maps** showing physical drive bay layouts
- 🔗 **Persistent drive identification** using PCI paths (not device letters)
- 🌡️ **SMART health monitoring** with temperature and status
- 💾 **Multi-drive support** for SATA, NVMe, SAS, and USB drives
- 🏷️ **Serial number tracking** for physical verification
- 📊 **Bay-sorted output** matching physical layout
- 🔵 **Ceph integration** showing OSD IDs and up/in status
- 🥾 **Boot drive detection** identifying system drives
- 🖥️ **Per-server configuration** for accurate physical-to-logical mapping
## Quick Start
@@ -30,6 +33,7 @@ bash <(wget -qO- http://10.10.10.63:3000/LotusGuild/driveAtlas/raw/branch/main/d
- `smartctl` (from smartmontools package)
- `lsblk` and `lspci` (typically pre-installed)
- Optional: `nvme-cli` for NVMe drives
- Optional: `ceph-volume` and `ceph` for Ceph OSD tracking
## Server Configurations
@@ -47,26 +51,50 @@ bash <(wget -qO- http://10.10.10.63:3000/LotusGuild/driveAtlas/raw/branch/main/d
- **Chassis:** Sliger CX471225 4U (10-Bay Hot-swap)
- **Motherboard:** B650D4U3-2Q/BCM
- **Controllers:**
- 0c:00.0 - Front hot-swap bays
- 0d:00.0 - M.2 NVMe slot
- 0b:00.0 - USB controller
- **Status:** Partially mapped (bays 3-6 only)
- 01:00.0 - LSI SAS3008 HBA (bays 5-10 via 2x mini-SAS HD)
- 0d:00.0 - AMD SATA controller (bays 1-4)
- 0e:00.0 - M.2 NVMe slot
- **Status:** ✅ Fully mapped and verified
#### storage-01
- **Chassis:** Sliger CX471225 4U (10-Bay Hot-swap)
- **Motherboard:** Different from compute-storage-01
- **Controllers:** Motherboard SATA only (no HBA currently)
- **Status:** Requires PCI path mapping
- **Status:** ⚠️ Requires PCI path mapping
#### large1
- **Chassis:** Unique 3x5 grid (15 bays total)
- **Note:** 1/1 configuration, will not be replicated
- **Status:** Requires PCI path mapping
- **Status:** ⚠️ Requires PCI path mapping
#### compute-storage-gpu-01
- **Chassis:** Sliger CX471225 4U (10-Bay Hot-swap)
- **Motherboard:** Same as compute-storage-01
- **Status:** Requires PCI path mapping
- **Status:** ⚠️ Requires PCI path mapping
## Output Example
```
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ compute-storage-01 - 10-Bay Hot-swap Chassis │
│ │
│ M.2 NVMe: nvme0n1 │
│ │
│ Front Hot-swap Bays: │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │1 :sdh │ │2 :sdg │ │3 :sdi │ │4 :sdj │ │5 :sde │ │6 :sdf │ │7 :sdd │ │8 :sda │ │9 :sdc │ │10:sdb │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
=== Drive Details with SMART Status (by Bay Position) ===
BAY DEVICE SIZE TYPE TEMP HEALTH MODEL SERIAL CEPH OSD STATUS USAGE
----------------------------------------------------------------------------------------------------------------------------------------------------
1 /dev/sdh 223.6G SSD 27°C ✓ Crucial_CT240M500SSD1 14130C0E06DD - - /boot/efi
2 /dev/sdg 1.8T HDD 26°C ✓ ST2000DM001-1ER164 Z4ZC4B6R osd.25 up/in -
3 /dev/sdi 12.7T HDD 29°C ✓ OOS14000G 000DXND6 osd.9 up/in -
...
```
## How It Works
@@ -76,13 +104,14 @@ Drive Atlas uses `/dev/disk/by-path/` to create persistent mappings between phys
**Example PCI path:**
```
pci-0000:0c:00.0-ata-1 → /dev/sda
pci-0000:01:00.0-sas-phy6-lun-0 → /dev/sde → Bay 5
```
This tells us:
- `0000:0c:00.0` - PCI bus address of the storage controller
- `ata-1` - Port 1 on that controller
- Maps to physical bay 3 on compute-storage-01
- `0000:01:00.0` - PCI bus address of the LSI SAS3008 HBA
- `sas-phy6` - SAS PHY 6 on that controller
- `lun-0` - Logical Unit Number
- Maps to physical bay 5 on compute-storage-01
### Configuration
@@ -91,9 +120,10 @@ Server mappings are defined in the `SERVER_MAPPINGS` associative array in [drive
```bash
declare -A SERVER_MAPPINGS=(
["compute-storage-01"]="
pci-0000:0c:00.0-ata-1 3
pci-0000:0c:00.0-ata-2 4
pci-0000:0d:00.0-nvme-1 m2-1
pci-0000:0d:00.0-ata-2 1
pci-0000:0d:00.0-ata-1 2
pci-0000:01:00.0-sas-phy6-lun-0 5
pci-0000:0e:00.0-nvme-1 m2-1
"
)
```
@@ -115,10 +145,11 @@ This will show all available PCI paths and their associated drives.
For each populated drive bay:
1. Note the physical bay number (labeled on chassis)
2. Identify a unique characteristic (size, model, or serial number)
3. Match it to the PCI path from the diagnostic output
2. Run the main script to see serial numbers
3. Match visible serial numbers on drives to the output
4. Map PCI paths to bay numbers
**Pro tip:** If uncertain, remove one drive at a time and re-run the diagnostic to see which PCI path disappears.
**Pro tip:** The script shows serial numbers - compare them to visible labels on drive trays to verify physical locations.
### Step 3: Create Mapping
@@ -152,30 +183,21 @@ Use debug mode to see the mappings:
DEBUG=1 bash driveAtlas.sh
```
## Output Example
## Output Columns Explained
```
┌──────────────────────────────────────────────────────────────┐
│ compute-storage-01 │
│ 10-Bay Hot-swap Chassis │
│ │
│ M.2 NVMe Slot │
│ ┌──────────┐ │
│ │ nvme0n1 │ │
│ └──────────┘ │
│ │
│ Front Hot-swap Bays │
│ ┌──────────┐┌──────────┐┌──────────┐┌──────────┐... │
│ │1: EMPTY ││2: EMPTY ││3: sda ││4: sdb │... │
│ └──────────┘└──────────┘└──────────┘└──────────┘... │
└──────────────────────────────────────────────────────────────┘
=== Drive Details with SMART Status ===
DEVICE SIZE TYPE TEMP HEALTH MODEL
--------------------------------------------------------------------------------
/dev/sda 2TB HDD 35°C ✓ WD20EFRX-68EUZN0
/dev/nvme0n1 1TB SSD 42°C ✓ Samsung 980 PRO
```
| Column | Description |
|--------|-------------|
| **BAY** | Physical bay number (1-10, m2-1, etc.) |
| **DEVICE** | Linux device name (/dev/sdX, /dev/nvmeXnY) |
| **SIZE** | Drive capacity |
| **TYPE** | SSD or HDD (detected via SMART) |
| **TEMP** | Current temperature from SMART |
| **HEALTH** | SMART health status (✓ = passed, ✗ = failed) |
| **MODEL** | Drive model number |
| **SERIAL** | Drive serial number (for physical verification) |
| **CEPH OSD** | Ceph OSD ID if drive hosts an OSD |
| **STATUS** | Ceph OSD status (up/in, down/out, etc.) |
| **USAGE** | Mount point or "BOOT" for system drive |
## Troubleshooting
@@ -190,7 +212,7 @@ DEVICE SIZE TYPE TEMP HEALTH MODEL
- Even identical motherboards can have different PCI addressing
- BIOS settings can affect PCI enumeration
- HBA installation in different PCIe slots changes addresses
- Cable routing to different SATA ports changes the ata-N number
- Cable routing to different SATA ports changes the ata-N or phy-N number
### SMART data not showing
@@ -199,19 +221,32 @@ DEVICE SIZE TYPE TEMP HEALTH MODEL
- USB-connected drives may not support SMART
- Run `sudo smartctl -i /dev/sdX` manually to check
### Ceph OSD status shows "unknown/out"
- Ensure `ceph` and `ceph-volume` commands are available
- Check if the Ceph cluster is healthy: `ceph -s`
- Verify OSD is actually up: `ceph osd tree`
### Serial numbers don't match visible labels
- Some manufacturers use different serials for SMART vs. physical labels
- Cross-reference by drive model and size
- Use the removal method: power down, remove drive, check which bay becomes EMPTY
## Files
- [driveAtlas.sh](driveAtlas.sh) - Main script
- [diagnose-drives.sh](diagnose-drives.sh) - PCI path diagnostic tool
- [README.md](README.md) - This file
- [todo.txt](todo.txt) - Development notes
- [CLAUDE.md](CLAUDE.md) - AI-assisted development notes
- [todo.txt](todo.txt) - Development notes and task tracking
## Contributing
When adding support for a new server:
1. Run `diagnose-drives.sh` and save output
2. Physically label or identify drives
2. Physically label or identify drives by serial number
3. Create mapping in `SERVER_MAPPINGS`
4. Test thoroughly
5. Document any unique hardware configurations
@@ -231,11 +266,15 @@ PCI paths are deterministic and based on physical hardware topology.
### Bay Numbering Conventions
- **10-bay chassis:** Bays numbered 1-10 (left to right, top to bottom)
- **10-bay chassis:** Bays numbered 1-10 (left to right, typically)
- **M.2 slots:** Labeled as `m2-1`, `m2-2`, etc.
- **USB drives:** Labeled as `usb1`, `usb2`, etc.
- **Large1:** Grid numbering 1-9 (3x3 displayed, additional bays documented in mapping)
- **Large1:** Grid numbering 1-15 (documented in mapping)
## License
### Ceph Integration
Internal tool for LotusGuild infrastructure.
The script automatically detects Ceph OSDs using:
1. `ceph-volume lvm list` to map devices to OSD IDs
2. `ceph osd tree` to get up/down and in/out status
Status format: `up/in` means OSD is running and participating in the cluster.

View File

@@ -1,59 +1,59 @@
#!/bin/bash
# Drive Atlas Diagnostic Script
# Run this on each server to gather PCI path information
echo "=== Server Information ==="
echo "Hostname: $(hostname)"
echo "Date: $(date)"
echo ""
echo "=== All /dev/disk/by-path/ entries ==="
ls -la /dev/disk/by-path/ | grep -v "part" | sort
echo ""
echo "=== Organized by PCI Address ==="
for path in /dev/disk/by-path/*; do
if [ -L "$path" ]; then
# Skip partitions
if [[ "$path" =~ -part[0-9]+$ ]]; then
continue
fi
basename_path=$(basename "$path")
target=$(readlink -f "$path")
device=$(basename "$target")
echo "Path: $basename_path"
echo " -> Device: $device"
# Try to get size
if [ -b "$target" ]; then
size=$(lsblk -d -n -o SIZE "$target" 2>/dev/null)
echo " -> Size: $size"
fi
# Try to get SMART info for model
if command -v smartctl >/dev/null 2>&1; then
model=$(sudo smartctl -i "$target" 2>/dev/null | grep "Device Model\|Model Number" | cut -d: -f2 | xargs)
if [ -n "$model" ]; then
echo " -> Model: $model"
fi
fi
echo ""
fi
done
echo "=== PCI Devices with Storage Controllers ==="
lspci | grep -i "storage\|raid\|sata\|sas\|nvme"
echo ""
echo "=== Current Block Devices ==="
lsblk -d -o NAME,SIZE,TYPE,TRAN | grep -v "rbd\|loop"
echo ""
echo "=== Recommendations ==="
echo "1. Note the PCI addresses (e.g., 0c:00.0) of your storage controllers"
echo "2. For each bay, physically identify which drive is in it"
echo "3. Match the PCI path pattern to the bay number"
echo "4. Example: pci-0000:0c:00.0-ata-1 might be bay 1 on controller 0c:00.0"
#!/bin/bash
# Drive Atlas Diagnostic Script
# Run this on each server to gather PCI path information
echo "=== Server Information ==="
echo "Hostname: $(hostname)"
echo "Date: $(date)"
echo ""
echo "=== All /dev/disk/by-path/ entries ==="
ls -la /dev/disk/by-path/ | grep -v "part" | sort
echo ""
echo "=== Organized by PCI Address ==="
for path in /dev/disk/by-path/*; do
if [ -L "$path" ]; then
# Skip partitions
if [[ "$path" =~ -part[0-9]+$ ]]; then
continue
fi
basename_path=$(basename "$path")
target=$(readlink -f "$path")
device=$(basename "$target")
echo "Path: $basename_path"
echo " -> Device: $device"
# Try to get size
if [ -b "$target" ]; then
size=$(lsblk -d -n -o SIZE "$target" 2>/dev/null)
echo " -> Size: $size"
fi
# Try to get SMART info for model
if command -v smartctl >/dev/null 2>&1; then
model=$(sudo smartctl -i "$target" 2>/dev/null | grep "Device Model\|Model Number" | cut -d: -f2 | xargs)
if [ -n "$model" ]; then
echo " -> Model: $model"
fi
fi
echo ""
fi
done
echo "=== PCI Devices with Storage Controllers ==="
lspci | grep -i "storage\|raid\|sata\|sas\|nvme"
echo ""
echo "=== Current Block Devices ==="
lsblk -d -o NAME,SIZE,TYPE,TRAN | grep -v "rbd\|loop"
echo ""
echo "=== Recommendations ==="
echo "1. Note the PCI addresses (e.g., 0c:00.0) of your storage controllers"
echo "2. For each bay, physically identify which drive is in it"
echo "3. Match the PCI path pattern to the bay number"
echo "4. Example: pci-0000:0c:00.0-ata-1 might be bay 1 on controller 0c:00.0"

View File

@@ -14,98 +14,119 @@ generate_10bay_layout() {
local hostname=$1
build_drive_map
# Calculate max width needed for drive names
max_width=0
for bay in {1..10} "m2-1" "usb1" "usb2"; do
drive_text="${DRIVE_MAP[$bay]:-EMPTY}"
text_len=$((${#bay} + 1 + ${#drive_text}))
[[ $text_len -gt $max_width ]] && max_width=$text_len
done
# Add padding for box borders
box_width=$((max_width + 4))
# Create box drawing elements
h_line=$(printf '%*s' "$box_width" '' | tr ' ' '─')
# USB Section (if applicable)
if [[ -n "${DRIVE_MAP[usb1]}" || -n "${DRIVE_MAP[usb2]}" ]]; then
printf "\n External USB\n"
printf " ┌%s┐ ┌%s┐\n" "$h_line" "$h_line"
printf " │ %-${max_width}s │ │ %-${max_width}s │\n" "${DRIVE_MAP[usb1]:-EMPTY}" "${DRIVE_MAP[usb2]:-EMPTY}"
printf " └%s┘ └%s┘\n\n" "$h_line" "$h_line"
fi
# Fixed width for consistent box drawing (fits device names like "nvme0n1")
local drive_width=10
# Main chassis section
printf "┌──────────────────────────────────────────────────────────────┐\n"
printf "│ %-58s │\n" "$hostname"
printf "│ %-58s │\n" "10-Bay Hot-swap Chassis"
printf "│ │\n"
printf "┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐\n"
printf "│ %-126s │\n" "$hostname - Sliger CX4712 (10x 3.5\" Hot-swap)"
printf "│ │\n"
# Show storage controllers
printf "│ Storage Controllers: │\n"
while IFS= read -r ctrl; do
[[ -n "$ctrl" ]] && printf "│ %-126s│\n" "$ctrl"
done < <(get_storage_controllers)
printf "│ │\n"
# M.2 NVMe slot if present
if [[ -n "${DRIVE_MAP[m2-1]}" ]]; then
printf "│ M.2 NVMe Slot │\n"
printf "│ ┌%s┐ │\n" "$h_line"
printf "│ │ %-${max_width}s │ │\n" "${DRIVE_MAP[m2-1]:-EMPTY}"
printf "│ └%s┘ │\n" "$h_line"
printf "│ │\n"
printf "│ M.2 NVMe: %-10s │\n" "${DRIVE_MAP[m2-1]}"
printf "│ │\n"
fi
printf "│ Front Hot-swap Bays │\n"
printf "│ Front Hot-swap Bays: │\n"
printf "│ │\n"
# Create bay rows
printf "│ "
# Bay top borders
printf "│ "
for bay in {1..10}; do
printf "┌%s┐" "$h_line"
printf "┌──────────┐ "
done
printf " │\n"
printf " │\n"
# Bay contents
printf "│ "
for bay in {1..10}; do
printf "│%-2d:%-${max_width}s │" "$bay" "${DRIVE_MAP[$bay]:-EMPTY}"
printf "│%-2d:%-7s│ " "$bay" "${DRIVE_MAP[$bay]:-EMPTY}"
done
printf " │\n"
printf " │\n"
# Bay bottom borders
printf "│ "
for bay in {1..10}; do
printf "└%s┘" "$h_line"
printf "└──────────┘ "
done
printf " │\n"
printf " │\n"
printf "└──────────────────────────────────────────────────────────────┘\n"
printf "└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘\n"
}
generate_micro_layout() {
local hostname=$1
build_drive_map
# Check for eMMC storage
local emmc_device=""
if [[ -b /dev/mmcblk0 ]]; then
emmc_device="mmcblk0"
fi
printf "┌─────────────────────────────────────────────────────────────┐\n"
printf "│ %-57s │\n" "$hostname - Micro SBC"
printf "│ │\n"
printf "│ Storage Controllers: │\n"
while IFS= read -r ctrl; do
[[ -n "$ctrl" ]] && printf "│ %-57s│\n" "$ctrl"
done < <(get_storage_controllers)
printf "│ │\n"
# Show eMMC if present
if [[ -n "$emmc_device" ]]; then
local emmc_size=$(lsblk -d -n -o SIZE "/dev/$emmc_device" 2>/dev/null | xargs)
printf "│ ┌─────────────────────────────────────────────────────┐ │\n"
printf "│ │ Onboard eMMC: %-10s (%s) │ │\n" "$emmc_device" "$emmc_size"
printf "│ └─────────────────────────────────────────────────────┘ │\n"
printf "│ │\n"
fi
printf "│ SATA Ports (rear): │\n"
printf "│ ┌──────────────┐ ┌──────────────┐ │\n"
printf "│ │ 1: %-9s │ │ 2: %-9s │ │\n" "${DRIVE_MAP[1]:-EMPTY}" "${DRIVE_MAP[2]:-EMPTY}"
printf "│ └──────────────┘ └──────────────┘ │\n"
printf "└─────────────────────────────────────────────────────────────┘\n"
}
generate_large1_layout() {
local hostname=$1
build_drive_map
cat << 'EOF'
┌─────────────────────────────────────────────────────────────┐
│ large1 │
Unique 3x5 Grid Chassis │
│ ┌──────────────────────────────────────────────┐ │
│ │ Motherboard │ │
│ │ │ │
│ │ ┌──┐┌──┐ │ │
│ │M1││M2│ │ │
│ │ └──┘└──┘
│ └──────────────────────────────────────────────┘ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ │ │ │ │
│ │ 1 │ 2 │ │ 3 │ │
│ │ │ │ │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ │ │ │ │ │ │
│ │ 4 │ │ 5 │ │ 6
│ │ │ │ │ │ │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ │ │ │ │
│ │ 7 │ │ 8 │ │ 9 │ │
│ │ │ │ │ │ │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────┘
EOF
# large1 has 3 stacks of 5 bays at front (15 total) + 2 M.2 slots
# Physical bay mapping TBD - current mapping is by controller order
printf "┌─────────────────────────────────────────────────────────────────────────┐\n"
printf "│ %-69s │\n" "$hostname - Rosewill RSV-L4500U (15x 3.5\" Bays)"
printf "│ \n"
printf "│ Storage Controllers: │\n"
while IFS= read -r ctrl; do
[[ -n "$ctrl" ]] && printf "│ %-69s│\n" "$ctrl"
done < <(get_storage_controllers)
printf "│ \n"
printf "│ M.2 NVMe: M1: %-10s M2: %-10s │\n" "${DRIVE_MAP[m2-1]:-EMPTY}" "${DRIVE_MAP[m2-2]:-EMPTY}"
printf "│ │\n"
printf "│ Front Bays (3 stacks x 5 rows): [Bay mapping TBD]\n"
printf "│ Stack A Stack B Stack C │\n"
printf "│ ┌──────────┐ ┌──────────┐ ┌──────────┐ \n"
printf "│ │1:%-8s│ │2:%-8s│ │3:%-8s │\n" "${DRIVE_MAP[1]:-EMPTY}" "${DRIVE_MAP[2]:-EMPTY}" "${DRIVE_MAP[3]:-EMPTY}"
printf "│ ├──────────┤ ├──────────┤ ├──────────┤ \n"
printf "│ │4:%-8s│ │5:%-8s│ │6:%-8s│ │\n" "${DRIVE_MAP[4]:-EMPTY}" "${DRIVE_MAP[5]:-EMPTY}" "${DRIVE_MAP[6]:-EMPTY}"
printf "│ ├──────────┤ ├──────────┤ ├──────────┤ │\n"
printf "│ │7:%-8s│ │8:%-8s│ │9:%-8s│ │\n" "${DRIVE_MAP[7]:-EMPTY}" "${DRIVE_MAP[8]:-EMPTY}" "${DRIVE_MAP[9]:-EMPTY}"
printf "│ ├──────────┤ ├──────────┤ ├──────────┤ \n"
printf "│ │10:%-7s│ │11:%-7s│ │12:%-7s│ │\n" "${DRIVE_MAP[10]:-EMPTY}" "${DRIVE_MAP[11]:-EMPTY}" "${DRIVE_MAP[12]:-EMPTY}"
printf "│ ├──────────┤ ├──────────┤ ├──────────┤ │\n"
printf "│ │13:%-7s│ │14:%-7s│ │15:%-7s│ │\n" "${DRIVE_MAP[13]:-EMPTY}" "${DRIVE_MAP[14]:-EMPTY}" "${DRIVE_MAP[15]:-EMPTY}"
printf "│ └──────────┘ └──────────┘ └──────────┘ \n"
printf "└─────────────────────────────────────────────────────────────────────────┘\n"
}
#------------------------------------------------------------------------------
@@ -116,29 +137,86 @@ EOF
declare -A SERVER_MAPPINGS=(
# compute-storage-01 (formerly medium2)
# Motherboard: B650D4U3-2Q/BCM
# Controller at 0c:00.0 for hot-swap bays
# Controller at 0d:00.0 for M.2 NVMe
# Motherboard: B650D4U3-2Q/BCM with AMD SATA controller
# HBA: LSI SAS3008 at 01:00.0 (mini-SAS HD ports)
# Cable mapping from user notes:
# - Mobo SATA: top-right=bay1, bottom-right=bay2, bottom-left=bay3, top-left=bay4
# - HBA bottom mini-SAS: bays 5,6,7,8
# - HBA top mini-SAS: bays 9,10
["compute-storage-01"]="
pci-0000:0c:00.0-ata-3 5
pci-0000:0c:00.0-ata-4 6
pci-0000:0c:00.0-ata-1 3
pci-0000:0c:00.0-ata-2 4
pci-0000:0d:00.0-nvme-1 m2-1
pci-0000:0b:00.0-usb-0:3:1.0-scsi-0:0:0:0 usb1
pci-0000:0b:00.0-usb-0:4:1.0-scsi-0:0:0:0 usb2
pci-0000:0d:00.0-ata-2 1
pci-0000:0d:00.0-ata-1 2
pci-0000:0d:00.0-ata-3 3
pci-0000:0d:00.0-ata-4 4
pci-0000:01:00.0-sas-phy6-lun-0 5
pci-0000:01:00.0-sas-phy7-lun-0 6
pci-0000:01:00.0-sas-phy5-lun-0 7
pci-0000:01:00.0-sas-phy2-lun-0 8
pci-0000:01:00.0-sas-phy4-lun-0 9
pci-0000:01:00.0-sas-phy3-lun-0 10
pci-0000:0e:00.0-nvme-1 m2-1
"
# compute-storage-gpu-01
# Motherboard: ASUS PRIME B550-PLUS with AMD SATA controller at 02:00.1
# 5 SATA ports + 1 M.2 NVMe slot
# sdf is USB/card reader - not mapped
["compute-storage-gpu-01"]="
pci-0000:02:00.1-ata-1 1
pci-0000:02:00.1-ata-2 2
pci-0000:02:00.1-ata-3 3
pci-0000:02:00.1-ata-4 4
pci-0000:02:00.1-ata-5 5
pci-0000:0c:00.0-nvme-1 m2-1
"
# storage-01
# Different motherboard, no HBA currently
# TODO: Map actual PCI paths after running diagnose-drives.sh
# Motherboard: ASRock A320M-HDV R4.0 with AMD SATA controller at 02:00.1
# 4 SATA ports used (ata-1, ata-2, ata-5, ata-6) - ata-3/4 empty
["storage-01"]="
pci-0000:02:00.1-ata-1 1
pci-0000:02:00.1-ata-2 2
pci-0000:02:00.1-ata-5 3
pci-0000:02:00.1-ata-6 4
"
# large1
# Unique chassis - 1/1 configuration
# TODO: Map actual PCI paths after running diagnose-drives.sh
# Custom tower with multiple controllers:
# - HBA: LSI SAS2008 at 10:00.0 (7 drives)
# - AMD SATA at 16:00.1 (3 drives)
# - ASMedia SATA at 25:00.0 (2 drives)
# - 2x NVMe slots
["large1"]="
pci-0000:10:00.0-sas-phy0-lun-0 1
pci-0000:10:00.0-sas-phy1-lun-0 2
pci-0000:10:00.0-sas-phy3-lun-0 3
pci-0000:10:00.0-sas-phy4-lun-0 4
pci-0000:10:00.0-sas-phy5-lun-0 5
pci-0000:10:00.0-sas-phy6-lun-0 6
pci-0000:10:00.0-sas-phy7-lun-0 7
pci-0000:16:00.1-ata-3 8
pci-0000:16:00.1-ata-7 9
pci-0000:16:00.1-ata-8 10
pci-0000:25:00.0-ata-1 11
pci-0000:25:00.0-ata-2 12
pci-0000:2a:00.0-nvme-1 m2-1
pci-0000:26:00.0-nvme-1 m2-2
"
# micro1
# ZimaBoard 832 - Single board computer
# 2 SATA ports on rear (currently unused)
# Boot from onboard eMMC (mmcblk0)
# SATA controller at 00:12.0
["micro1"]="
"
# monitor-02
# ZimaBoard 832 - Single board computer
# 2 SATA ports on rear (currently unused)
# Boot from onboard eMMC (mmcblk0)
# SATA controller would be at a specific PCI address when drives connected
["monitor-02"]="
"
)
@@ -147,14 +225,24 @@ declare -A CHASSIS_TYPES=(
["compute-storage-gpu-01"]="10bay"
["storage-01"]="10bay"
["large1"]="large1"
["micro1"]="micro"
["monitor-02"]="micro"
["micro1"]="micro" # ZimaBoard 832
["monitor-02"]="micro" # ZimaBoard 832
)
#------------------------------------------------------------------------------
# Core Functions
#------------------------------------------------------------------------------
get_storage_controllers() {
# Returns a formatted list of storage controllers (HBAs, SATA, NVMe)
lspci 2>/dev/null | grep -iE "SAS|SATA|RAID|Mass storage|NVMe" | while read -r line; do
pci_addr=$(echo "$line" | awk '{print $1}')
# Get short description (strip PCI address)
desc=$(echo "$line" | sed 's/^[0-9a-f:.]\+ //')
echo " $pci_addr: $desc"
done
}
build_drive_map() {
local host=$(hostname)
declare -A drive_map
@@ -186,8 +274,9 @@ get_drive_smart_info() {
local type=$(echo "$smart_info" | grep "Rotation Rate" | grep -q "Solid State" && echo "SSD" || echo "HDD")
local health=$(echo "$smart_info" | grep "SMART overall-health" | grep -q "PASSED" && echo "✓" || echo "✗")
local model=$(echo "$smart_info" | grep "Device Model\|Model Number" | cut -d: -f2 | xargs)
local serial=$(echo "$smart_info" | grep "Serial Number" | awk '{print $3}')
echo "$type|$temp°C|$health|$model"
echo "$type|$temp°C|$health|$model|$serial"
}
#------------------------------------------------------------------------------
@@ -203,10 +292,10 @@ case "$CHASSIS_TYPE" in
generate_10bay_layout "$HOSTNAME"
;;
"large1")
generate_large1_layout
generate_large1_layout "$HOSTNAME"
;;
"micro")
echo "Micro server layout not yet implemented"
generate_micro_layout "$HOSTNAME"
;;
*)
echo "┌─────────────────────────────────────────────────────────┐"
@@ -221,29 +310,87 @@ esac
# Drive Details Section
#------------------------------------------------------------------------------
echo -e "\n=== Drive Details with SMART Status ==="
printf "%-15s %-10s %-8s %-8s %-8s %-30s\n" "DEVICE" "SIZE" "TYPE" "TEMP" "HEALTH" "MODEL"
echo "--------------------------------------------------------------------------------"
echo -e "\n=== Drive Details with SMART Status (by Bay Position) ==="
printf "%-5s %-15s %-10s %-8s %-8s %-8s %-30s %-20s %-12s %-10s %-10s\n" "BAY" "DEVICE" "SIZE" "TYPE" "TEMP" "HEALTH" "MODEL" "SERIAL" "CEPH OSD" "STATUS" "USAGE"
echo "----------------------------------------------------------------------------------------------------------------------------------------------------"
# SATA/SAS drives
lsblk -d -o NAME | grep -v "nvme" | grep -v "rbd" | grep -v "loop" | grep -v "NAME" | while read device; do
if [ -b "/dev/$device" ]; then
# Build reverse map: device -> bay
declare -A DEVICE_TO_BAY
for bay in "${!DRIVE_MAP[@]}"; do
device="${DRIVE_MAP[$bay]}"
if [[ -n "$device" && "$device" != "EMPTY" ]]; then
DEVICE_TO_BAY[$device]=$bay
fi
done
# Sort drives by bay position
for bay in $(printf '%s\n' "${!DRIVE_MAP[@]}" | grep -E '^[0-9]+$' | sort -n); do
device="${DRIVE_MAP[$bay]}"
if [[ -n "$device" && "$device" != "EMPTY" && -b "/dev/$device" ]]; then
size=$(lsblk -d -n -o SIZE "/dev/$device" 2>/dev/null)
smart_info=$(get_drive_smart_info "$device")
IFS='|' read -r type temp health model <<< "$smart_info"
printf "%-15s %-10s %-8s %-8s %-8s %-30s\n" "/dev/$device" "$size" "$type" "$temp" "$health" "$model"
IFS='|' read -r type temp health model serial <<< "$smart_info"
# Check for Ceph OSD
osd_id=$(ceph-volume lvm list 2>/dev/null | grep -B 20 "/dev/$device" | grep "osd id" | awk '{print "osd."$3}' | head -1)
# Get Ceph status if OSD exists
ceph_status="-"
if [[ -n "$osd_id" ]]; then
# Get in/out and up/down status from ceph osd tree
osd_num=$(echo "$osd_id" | sed 's/osd\.//')
# Parse ceph osd tree output - column 5 is STATUS (up/down), column 6 is REWEIGHT (1.0 = in, 0 = out)
tree_line=$(ceph osd tree 2>/dev/null | grep -E "^\s*${osd_num}\s+" | grep "osd.${osd_num}")
up_status=$(echo "$tree_line" | awk '{print $5}')
reweight=$(echo "$tree_line" | awk '{print $6}')
# Default to unknown if we can't parse
[[ -z "$up_status" ]] && up_status="unknown"
[[ -z "$reweight" ]] && reweight="0"
# Determine in/out based on reweight (1.0 = in, 0 = out)
if (( $(echo "$reweight > 0" | bc -l 2>/dev/null || echo 0) )); then
in_status="in"
else
in_status="out"
fi
ceph_status="${up_status}/${in_status}"
else
osd_id="-"
fi
# Check if boot drive
usage="-"
if mount | grep -q "^/dev/${device}"; then
mount_point=$(mount | grep "^/dev/${device}" | awk '{print $3}' | head -1)
if [[ "$mount_point" == "/" ]]; then
usage="BOOT"
else
usage="$mount_point"
fi
fi
printf "%-5s %-15s %-10s %-8s %-8s %-8s %-30s %-20s %-12s %-10s %-10s\n" "$bay" "/dev/$device" "$size" "$type" "$temp" "$health" "$model" "$serial" "$osd_id" "$ceph_status" "$usage"
fi
done
# NVMe drives
if command -v nvme >/dev/null 2>&1; then
nvme_drives=$(sudo nvme list 2>/dev/null | grep "^/dev")
if [ -n "$nvme_drives" ]; then
echo -e "\n=== NVMe Drives ==="
printf "%-15s %-10s %-10s %-40s\n" "DEVICE" "SIZE" "TYPE" "MODEL"
echo "--------------------------------------------------------------------------------"
echo "$nvme_drives" | awk '{printf "%-15s %-10s %-10s %-40s\n", $1, $6, "NVMe", $3}'
fi
nvme_devices=$(lsblk -d -n -o NAME,SIZE | grep "^nvme" 2>/dev/null)
if [ -n "$nvme_devices" ]; then
echo -e "\n=== NVMe Drives ==="
printf "%-15s %-10s %-10s %-40s %-25s\n" "DEVICE" "SIZE" "TYPE" "MODEL" "SERIAL"
echo "------------------------------------------------------------------------------------------------------"
echo "$nvme_devices" | while read -r name size; do
device="/dev/$name"
# Get model and serial from smartctl for accuracy
smart_info=$(sudo smartctl -i "$device" 2>/dev/null)
model=$(echo "$smart_info" | grep "Model Number" | cut -d: -f2 | xargs)
serial=$(echo "$smart_info" | grep "Serial Number" | cut -d: -f2 | xargs)
[[ -z "$model" ]] && model="-"
[[ -z "$serial" ]] && serial="-"
printf "%-15s %-10s %-10s %-40s %-25s\n" "$device" "$size" "NVMe" "$model" "$serial"
done
fi
#------------------------------------------------------------------------------
@@ -251,12 +398,17 @@ fi
#------------------------------------------------------------------------------
# Ceph RBD Devices
rbd_output=$(lsblk -o NAME,SIZE,TYPE,MOUNTPOINT 2>/dev/null | grep "rbd" | sort -V)
if [ -n "$rbd_output" ]; then
rbd_devices=$(lsblk -d -n -o NAME,SIZE,TYPE 2>/dev/null | grep "rbd" | sort -V)
if [ -n "$rbd_devices" ]; then
echo -e "\n=== Ceph RBD Devices ==="
printf "%-15s %-10s %-10s %-20s\n" "DEVICE" "SIZE" "TYPE" "MOUNTPOINT"
printf "%-15s %-10s %-10s %-30s\n" "DEVICE" "SIZE" "TYPE" "MOUNTPOINT"
echo "------------------------------------------------------------"
echo "$rbd_output"
echo "$rbd_devices" | while read -r name size type; do
# Get mountpoint if any
mountpoint=$(lsblk -n -o MOUNTPOINT "/dev/$name" 2>/dev/null | head -1)
[[ -z "$mountpoint" ]] && mountpoint="-"
printf "%-15s %-10s %-10s %-30s\n" "/dev/$name" "$size" "$type" "$mountpoint"
done
fi
# Show mapping diagnostic info if DEBUG is set

11
get-serials.sh Normal file
View File

@@ -0,0 +1,11 @@
#!/bin/bash
echo "=== Drive Serial Numbers ==="
for dev in sd{a..j}; do
if [ -b "/dev/$dev" ]; then
serial=$(sudo smartctl -i /dev/$dev 2>/dev/null | grep "Serial Number" | awk '{print $3}')
model=$(sudo smartctl -i /dev/$dev 2>/dev/null | grep "Device Model\|Model Number" | cut -d: -f2 | xargs)
size=$(lsblk -d -n -o SIZE /dev/$dev 2>/dev/null)
echo "/dev/$dev: $serial ($size - $model)"
fi
done

11
test-paths.sh Normal file
View File

@@ -0,0 +1,11 @@
#!/bin/bash
echo "=== Checking /dev/disk/by-path/ ==="
ls -la /dev/disk/by-path/ | grep -v "part" | grep "pci-0000:0c:00.0" | head -20
echo ""
echo "=== Checking if paths exist from mapping ==="
echo "pci-0000:0c:00.0-ata-3:"
ls -la /dev/disk/by-path/pci-0000:0c:00.0-ata-3 2>&1
echo "pci-0000:0c:00.0-ata-1:"
ls -la /dev/disk/by-path/pci-0000:0c:00.0-ata-1 2>&1