Compare commits
20 Commits
585240b03f...main
| Author | SHA1 | Date | |
|---|---|---|---|
| | f5638cad84 | | |
| | 07f7a1d0af | | |
| | 01f8d3e692 | | |
| | f159b10de1 | | |
| | 766d92251e | | |
| | 93aeb84c65 | | |
| | d5dbdd7869 | | |
| | 982d3f5c05 | | |
| | 7e1a88ad41 | | |
| | 40ab528f40 | | |
| | 418d4d4170 | | |
| | 1800b59a25 | | |
| | 5430a9242f | | |
| | fd587eca64 | | |
| | 03cb9e3ea8 | | |
| | d5c784033e | | |
| | be541cba97 | | |
| | 1b35db6723 | | |
| | 38c3dc910e | | |
| | 657b7d9a2d | | |
4
.gitignore
vendored
@@ -1,3 +1 @@
todo.txt
medium1_notes.txt
medium2_notes.txt
todo.txt
209
Claude.md
Normal file
@@ -0,0 +1,209 @@
# AI-Assisted Development Notes

This document chronicles the development of Drive Atlas with assistance from Claude (Anthropic's AI assistant).

## Project Overview

Drive Atlas started as a simple bash script with hardcoded drive mappings and evolved into a comprehensive storage infrastructure management tool through iterative development and user feedback.

## Development Session

**Date:** January 6, 2026
**AI Model:** Claude Sonnet 4.5
**Developer:** LotusGuild
**Session Duration:** ~2 hours

## Initial State

The project began with:
- Basic ASCII art layouts for different server chassis
- Hardcoded drive mappings for "medium2" server
- Simple SMART data display
- Broken PCI path mappings (referenced non-existent hardware)
- Windows line endings causing script execution failures

## Evolution Through Collaboration

### Phase 1: Architecture Refactoring
**Problem:** Chassis layouts were tied to hostnames, making it hard to reuse templates.

**Solution:**
- Separated chassis types from server hostnames
- Created reusable layout generator functions
- Introduced `CHASSIS_TYPES` and `SERVER_MAPPINGS` arrays
- Renamed "medium2" → "compute-storage-01" for clarity

### Phase 2: Hardware Discovery
**Problem:** Script referenced PCI controller `0c:00.0` which didn't exist.

**Approach:**
1. Created diagnostic script to probe actual hardware
2. Discovered real configuration:
   - LSI SAS3008 HBA at `01:00.0` (bays 5-10)
   - AMD SATA controller at `0d:00.0` (bays 1-4)
   - NVMe at `0e:00.0` (M.2 slot)
3. User provided physical bay labels and visible serial numbers
4. Iteratively refined PCI PHY to bay mappings

**Key Insight:** User confirmed bay 1 contained the SSD boot drive, which helped establish the correct mapping starting point.

### Phase 3: Physical Verification
**Problem:** Needed to verify drive-to-bay mappings without powering down production server.

**Solution:**
1. Added serial number display to script output
2. User physically inspected visible serial numbers on drive bays
3. Cross-referenced SMART serials with visible labels
4. Corrected HBA PHY mappings:
   - Bay 5: phy6 (not phy2)
   - Bay 6: phy7 (not phy3)
   - Bay 7: phy5 (not phy4)
   - Bay 8: phy2 (not phy5)
   - Bay 9: phy4 (not phy6)
   - Bay 10: phy3 (not phy7)

### Phase 4: User Experience Improvements

**ASCII Art Rendering:**
- Initial version had variable-width boxes that broke alignment
- Fixed by using consistent 10-character wide bay boxes
- Multiple iterations to perfect right border alignment
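
The fixed-width idea can be sketched roughly as follows. This is an illustration rather than the script's actual code: `render_bay_row` is a hypothetical helper and the device names are sample data. Each box interior is always 10 characters (2 for the bay number, 1 for the colon, 7 for the device field), so the right borders line up whatever the device name is.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one row of bay boxes with a fixed 10-character interior.
# Arguments are device names in bay order; an empty argument means an empty bay.
render_bay_row() {
    local bay=1 dev
    for dev in "$@"; do
        # %-2d left-aligns the bay number, %-7s pads the device name to 7 characters.
        printf '│%-2d:%-7s│ ' "$bay" "${dev:-EMPTY}"
        bay=$((bay + 1))
    done
    printf '\n'
}

render_bay_row sdh "" sdi
```

Note that `%-7s` pads but does not truncate, so a name longer than 7 characters would still push the border out; the width has to be chosen to fit the longest expected name.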

**Drive Table Enhancements:**
- Original: Alphabetical by device name
- Improved: Sorted by physical bay position (1-10)
- Added BAY column to show physical location
- Wider columns to prevent text wrapping

### Phase 5: Ceph Integration
**User Request:** "Can we show ceph in/up out/down status in the table?"

**Implementation:**
1. Added CEPH OSD column using `ceph-volume lvm list`
2. Added STATUS column parsing `ceph osd tree`
3. Initial bug: Parsed wrong columns (5 & 6 instead of correct ones)
4. Fixed by understanding `ceph osd tree` format:
   - Column 5: STATUS (up/down)
   - Column 6: REWEIGHT (1.0 = in, 0 = out)
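
The column layout described above could be parsed along these lines. This is a sketch under assumptions: `osd_status` is a hypothetical helper, and the sample text is illustrative rather than output captured from the real cluster. It reads `ceph osd tree`-style text on stdin so it can be tried without a Ceph installation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given an OSD number, read `ceph osd tree`-style text on
# stdin and print STATUS/in-or-out (e.g. "up/in"), or "unknown" if not found.
osd_status() {
    awk -v id="$1" '
        $1 == id && $4 == ("osd." id) {
            # Field 5 is STATUS (up/down); field 6 is REWEIGHT (0 means out).
            printf "%s/%s\n", $5, ((($6 + 0) > 0) ? "in" : "out")
            found = 1
        }
        END { if (!found) print "unknown" }
    '
}

# Illustrative sample only; the IDs and weights are made up for the example.
sample_tree='ID  CLASS  WEIGHT    TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
-1         14.55199  root default
-3         14.55199      host compute-storage-01
 9    hdd  12.73340          osd.9                  up   1.00000  1.00000
25    hdd   1.81859          osd.25               down         0  1.00000'

printf '%s\n' "$sample_tree" | osd_status 9
printf '%s\n' "$sample_tree" | osd_status 25
```

In the real script the input would come from `ceph osd tree` itself, for example `ceph osd tree | osd_status 9`.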

**User Request:** "Show which is the boot drive somehow?"

**Solution:**
- Added USAGE column
- Checks mount points
- Shows "BOOT" for root filesystem
- Shows mount point for other mounts
- Shows "-" for Ceph OSDs (using LVM)

## Technical Challenges Solved

### 1. Line Ending Issues
- **Problem:** `diagnose-drives.sh` had CRLF endings → script failures
- **Solution:** `sed -i 's/\r$//'` to convert to LF
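
A small way to reproduce and check the fix, assuming GNU `sed` (for `-i` and the `\r` escape). The temporary file stands in for a real script and `count_crlf_lines` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Create a throwaway script with Windows (CRLF) line endings.
tmp_script=$(mktemp)
printf '#!/bin/bash\r\necho "hello"\r\n' > "$tmp_script"

# Hypothetical helper: count lines that still end in a carriage return.
count_crlf_lines() {
    grep -c $'\r$' "$1"
}

before=$(count_crlf_lines "$tmp_script")
sed -i 's/\r$//' "$tmp_script"      # strip the trailing CR from every line
after=$(count_crlf_lines "$tmp_script")

echo "CRLF lines before: $before, after: $after"
```

Running `file` on a script is another quick check; it typically reports "with CRLF line terminators" for affected files.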

### 2. PCI Path Pattern Matching
- **Problem:** Bash regex escaping for grep patterns
- **Solution:** `grep -E "^\s*${osd_num}\s+"` for reliable matching

### 3. Floating Point Comparison in Bash
- **Problem:** Bash doesn't natively support decimal comparisons
- **Solution:** Used `bc -l` with error handling: `$(echo "$reweight > 0" | bc -l 2>/dev/null || echo 0)`
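
Wrapped in a function, the comparison might look like the sketch below. `is_positive` is a hypothetical name, and the `awk` branch is an added fallback for hosts without `bc`, not something taken from the script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the decimal value $1 is greater than 0.
is_positive() {
    local value=$1 result
    if command -v bc >/dev/null 2>&1; then
        # Same expression as in the notes: bc prints 1 for true, 0 for false.
        result=$(echo "$value > 0" | bc -l 2>/dev/null || echo 0)
    else
        # Added fallback (assumption): awk performs the floating point comparison.
        result=$(awk -v v="$value" 'BEGIN { print (((v + 0) > 0) ? 1 : 0) }')
    fi
    [ "$result" = "1" ]
}

for reweight in 1.00000 0 0.85; do
    if is_positive "$reweight"; then
        echo "$reweight -> in"
    else
        echo "$reweight -> out"
    fi
done
```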

### 4. Associative Array Sorting
- **Problem:** Bash associative arrays don't maintain insertion order
- **Solution:** Extract keys, filter numeric ones, pipe to `sort -n`
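
A minimal sketch of that approach, using sample data. `DRIVE_MAP` is the array name the script uses, but the contents here are made up and `sorted_numeric_bays` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sample data only: bay -> device, including one non-numeric key (m2-1).
declare -A DRIVE_MAP=( [10]=sdb [2]=sdg [m2-1]=nvme0n1 [1]=sdh [5]=sde )

# Hypothetical helper: print the numeric keys of DRIVE_MAP in ascending order.
sorted_numeric_bays() {
    printf '%s\n' "${!DRIVE_MAP[@]}" | grep -E '^[0-9]+$' | sort -n
}

for bay in $(sorted_numeric_bays); do
    printf '%-3s %s\n' "$bay" "${DRIVE_MAP[$bay]}"
done
```

`sort -n` matters here: a plain `sort` would place bay 10 before bay 2.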

## Key Learning Moments

1. **Hardware Reality vs. Assumptions:** The original script assumed controller addresses that didn't exist. Always probe actual hardware.

2. **Physical Verification is Essential:** Serial numbers visible on drive trays were crucial for verifying correct mappings.

3. **Iterative Refinement:** The script went through 15+ commits, each improving a specific aspect based on user testing and feedback.

4. **User-Driven Feature Evolution:** Features like Ceph integration and boot drive detection emerged organically from user needs.

## Commits Timeline

1. Initial refactoring and architecture improvements
2. Fixed PCI path mappings based on discovered hardware
3. Added serial numbers for physical verification
4. Fixed ASCII art rendering issues
5. Corrected bay mappings based on user verification
6. Added bay-sorted output
7. Implemented Ceph OSD tracking
8. Added Ceph up/in status
9. Added boot drive detection
10. Fixed Ceph status parsing
11. Documentation updates

## Collaborative Techniques Used

### Information Gathering
- Asked clarifying questions about hardware configuration
- Requested diagnostic command output
- Had user physically verify drive locations

### Iterative Development
- Made small, testable changes
- User tested after each significant change
- Incorporated feedback immediately

### Problem-Solving Approach
1. Understand current state
2. Identify specific issues
3. Propose solution
4. Implement incrementally
5. Test and verify
6. Refine based on feedback

## Metrics

- **Lines of Code:** ~330 (main script)
- **Supported Chassis Types:** 4 (10-bay, large1, micro, spare)
- **Mapped Servers:** 1 fully (compute-storage-01), 3 pending
- **Features Added:** 10+
- **Bugs Fixed:** 6 major, multiple minor
- **Documentation:** Comprehensive README + this file

## Future Enhancements

Potential improvements identified during development:
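
1. **Auto-detection:** Attempt to auto-map bays by lighting one drive's LED at a time, for example with `ledctl` (ledmon) where the backplane supports it, or by generating read activity with `hdparm -t`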
2. **Color Output:** Use terminal colors for health status (green/red)
3. **Historical Tracking:** Log temperature trends over time
4. **Alert Integration:** Notify when drive health deteriorates
5. **Web Interface:** Display chassis map in a web dashboard
6. **Multi-server View:** Show all servers in one consolidated view

## Lessons for Future AI-Assisted Development

### What Worked Well
- Breaking complex problems into small, testable pieces
- Using diagnostic scripts to understand actual vs. assumed state
- Physical verification before trusting software output
- Comprehensive documentation alongside code
- Git commits with detailed messages for traceability

### What Could Be Improved
- Earlier physical verification would have saved iteration
- More upfront hardware documentation would help
- Automated testing for bay mappings (if possible)

## Conclusion

This project demonstrates effective human-AI collaboration where:
- The AI provided technical implementation and problem-solving
- The human provided domain knowledge, testing, and verification
- Iterative feedback loops led to a polished, production-ready tool

The result is a robust infrastructure management tool that provides instant visibility into complex storage configurations across multiple servers.

---

**Development Credits:**
- **Human Developer:** LotusGuild
- **AI Assistant:** Claude Sonnet 4.5 (Anthropic)
- **Development Date:** January 6, 2026
- **Project:** Drive Atlas v1.0
294
README.md
@@ -1,44 +1,280 @@
# Drive Atlas

A powerful server drive mapping tool that generates visual ASCII representations of server layouts and provides comprehensive drive information based on the server's hostname.
A powerful server drive mapping tool that generates visual ASCII representations of server layouts and provides comprehensive drive information. Maps physical drive bays to logical Linux device names using PCI bus paths for reliable, persistent identification.

## Features

- 🖼️ Visual ASCII art maps for different server types (large1, medium1, medium2, micro1/2)
- 💽 Detailed NVMe drive information
- 🔄 SATA drive listing with size and mount points
- 🔍 PCI Bus Device Function (BDF) details
- 📝 Drive identification by unique device IDs
- 🗺️ **Visual ASCII art maps** showing physical drive bay layouts
- 🔗 **Persistent drive identification** using PCI paths (not device letters)
- 🌡️ **SMART health monitoring** with temperature and status
- 💾 **Multi-drive support** for SATA, NVMe, SAS, and USB drives
- 🏷️ **Serial number tracking** for physical verification
- 📊 **Bay-sorted output** matching physical layout
- 🔵 **Ceph integration** showing OSD IDs and up/in status
- 🥾 **Boot drive detection** identifying system drives
- 🖥️ **Per-server configuration** for accurate physical-to-logical mapping

## Quick Start

Execute remotely using either command:

Execute remotely using curl:
```bash
bash <(curl -s http://10.10.10.110:3000/JWS/driveAtlas/raw/branch/main/driveAtlas.sh)
```
- Using `wget`:
```bash
bash <(wget -qO- http://10.10.10.110:3000/JWS/driveAtlas/raw/branch/main/driveAtlas.sh)
bash <(curl -s http://10.10.10.63:3000/LotusGuild/driveAtlas/raw/branch/main/driveAtlas.sh)
```

Or using wget:
```bash
bash <(wget -qO- http://10.10.10.63:3000/LotusGuild/driveAtlas/raw/branch/main/driveAtlas.sh)
```

## Requirements
Linux environment with bash
sudo privileges for NVMe operations
curl or wget for remote execution
## Server Types
The script supports different server layouts:

Type Description
large1 3x3 grid layout (9 drives)
medium1 2x4 + 2 layout (10 drives)
medium2 Single row layout (10 drives)
micro1/2 Compact 2-drive layout
- Linux environment with bash
- `sudo` privileges for SMART operations
- `smartctl` (from smartmontools package)
- `lsblk` and `lspci` (typically pre-installed)
- Optional: `nvme-cli` for NVMe drives
- Optional: `ceph-volume` and `ceph` for Ceph OSD tracking

## Server Configurations

### Chassis Types

| Chassis Type | Description | Servers Using It |
|-------------|-------------|------------------|
| **10-Bay Hot-swap** | Sliger CX471225 4U 10x 3.5" NAS (with unused 2x 5.25" bays) | compute-storage-01, compute-storage-gpu-01, storage-01 |
| **Large1 Grid** | Unique 3x5 grid layout (1/1 configuration) | large1 |
| **Micro** | Compact 2-drive layout | micro1, monitor-02 |

### Server Details

#### compute-storage-01 (formerly medium2)
- **Chassis:** Sliger CX471225 4U (10-Bay Hot-swap)
- **Motherboard:** B650D4U3-2Q/BCM
- **Controllers:**
  - 01:00.0 - LSI SAS3008 HBA (bays 5-10 via 2x mini-SAS HD)
  - 0d:00.0 - AMD SATA controller (bays 1-4)
  - 0e:00.0 - M.2 NVMe slot
- **Status:** ✅ Fully mapped and verified

#### storage-01
- **Chassis:** Sliger CX471225 4U (10-Bay Hot-swap)
- **Motherboard:** Different from compute-storage-01
- **Controllers:** Motherboard SATA only (no HBA currently)
- **Status:** ⚠️ Requires PCI path mapping

#### large1
- **Chassis:** Unique 3x5 grid (15 bays total)
- **Note:** 1/1 configuration, will not be replicated
- **Status:** ⚠️ Requires PCI path mapping

#### compute-storage-gpu-01
- **Chassis:** Sliger CX471225 4U (10-Bay Hot-swap)
- **Motherboard:** Same as compute-storage-01
- **Status:** ⚠️ Requires PCI path mapping

## Output Example
The script provides:

ASCII visualization of server layout
NVMe drive listing
SATA drive information
PCI BDF details
Drive ID mappings
```
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ compute-storage-01 - 10-Bay Hot-swap Chassis │
│ │
│ M.2 NVMe: nvme0n1 │
│ │
│ Front Hot-swap Bays: │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │1 :sdh │ │2 :sdg │ │3 :sdi │ │4 :sdj │ │5 :sde │ │6 :sdf │ │7 :sdd │ │8 :sda │ │9 :sdc │ │10:sdb │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

=== Drive Details with SMART Status (by Bay Position) ===
BAY DEVICE SIZE TYPE TEMP HEALTH MODEL SERIAL CEPH OSD STATUS USAGE
----------------------------------------------------------------------------------------------------------------------------------------------------
1 /dev/sdh 223.6G SSD 27°C ✓ Crucial_CT240M500SSD1 14130C0E06DD - - /boot/efi
2 /dev/sdg 1.8T HDD 26°C ✓ ST2000DM001-1ER164 Z4ZC4B6R osd.25 up/in -
3 /dev/sdi 12.7T HDD 29°C ✓ OOS14000G 000DXND6 osd.9 up/in -
...
```

## How It Works

### PCI Path-Based Mapping

Drive Atlas uses `/dev/disk/by-path/` to create persistent mappings between physical drive bays and Linux device names. This is superior to using device letters (sda, sdb, etc.) which can change between boots.

**Example PCI path:**
```
pci-0000:01:00.0-sas-phy6-lun-0 → /dev/sde → Bay 5
```

This tells us:
- `0000:01:00.0` - PCI bus address of the LSI SAS3008 HBA
- `sas-phy6` - SAS PHY 6 on that controller
- `lun-0` - Logical Unit Number
- Maps to physical bay 5 on compute-storage-01
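
The breakdown above can be reproduced with a short bash sketch. `parse_by_path` is a hypothetical helper, not part of Drive Atlas, and it only handles the simple `pci-<address>-<port>` shape shown in this README:

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a /dev/disk/by-path name into the controller's
# PCI address and the port-specific remainder (ata-N, sas-phyN-lun-N, nvme-N).
parse_by_path() {
    local name=$1
    if [[ $name =~ ^pci-([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])-(.+)$ ]]; then
        printf 'controller=%s port=%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        return 1    # not a name of the expected shape
    fi
}

parse_by_path "pci-0000:01:00.0-sas-phy6-lun-0"
parse_by_path "pci-0000:0d:00.0-ata-2"
```

The controller part can then be looked up with `lspci -s`, for example `lspci -s 01:00.0`, to confirm which card a drive hangs off.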

### Configuration

Server mappings are defined in the `SERVER_MAPPINGS` associative array in [driveAtlas.sh](driveAtlas.sh):

```bash
declare -A SERVER_MAPPINGS=(
    ["compute-storage-01"]="
        pci-0000:0d:00.0-ata-2 1
        pci-0000:0d:00.0-ata-1 2
        pci-0000:01:00.0-sas-phy6-lun-0 5
        pci-0000:0e:00.0-nvme-1 m2-1
    "
)
```
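
To show how a mapping string in this format can be consumed, here is a minimal sketch. `lookup_bay` is a hypothetical helper (the script's own lookup code is not shown in this section), and the array is a trimmed copy of the example above:

```bash
#!/usr/bin/env bash
# Trimmed copy of the mapping format: one "pci-path bay" pair per line.
declare -A SERVER_MAPPINGS=(
    ["compute-storage-01"]="
        pci-0000:0d:00.0-ata-2 1
        pci-0000:01:00.0-sas-phy6-lun-0 5
        pci-0000:0e:00.0-nvme-1 m2-1
    "
)

# Hypothetical helper: print the bay assigned to a by-path name on a given host.
lookup_bay() {
    local host=$1 wanted=$2 path bay
    # read splits each line on whitespace, so leading indentation is ignored.
    while read -r path bay; do
        if [[ $path == "$wanted" ]]; then
            echo "$bay"
            return 0
        fi
    done <<< "${SERVER_MAPPINGS[$host]}"
    return 1    # unknown host or unmapped path
}

lookup_bay compute-storage-01 pci-0000:01:00.0-sas-phy6-lun-0
```

Resolving the path to a device name is a separate step, for example `readlink -f /dev/disk/by-path/pci-0000:01:00.0-sas-phy6-lun-0`.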

## Setting Up a New Server

### Step 1: Run Diagnostic Script

First, gather PCI path information:

```bash
bash diagnose-drives.sh > server-diagnostic.txt
```

This will show all available PCI paths and their associated drives.

### Step 2: Physical Bay Identification

For each populated drive bay:

1. Note the physical bay number (labeled on chassis)
2. Run the main script to see serial numbers
3. Match visible serial numbers on drives to the output
4. Map PCI paths to bay numbers

**Pro tip:** The script shows serial numbers - compare them to visible labels on drive trays to verify physical locations.

### Step 3: Create Mapping

Add a new entry to `SERVER_MAPPINGS` in [driveAtlas.sh](driveAtlas.sh):

```bash
["your-hostname"]="
pci-0000:XX:XX.X-ata-1 1
pci-0000:XX:XX.X-ata-2 2
# ... etc
"
```

Also add the chassis type to `CHASSIS_TYPES`:

```bash
["your-hostname"]="10bay"
```

### Step 4: Test

Run the main script and verify the layout matches your physical configuration:

```bash
bash driveAtlas.sh
```

Use debug mode to see the mappings:

```bash
DEBUG=1 bash driveAtlas.sh
```

## Output Columns Explained

| Column | Description |
|--------|-------------|
| **BAY** | Physical bay number (1-10, m2-1, etc.) |
| **DEVICE** | Linux device name (/dev/sdX, /dev/nvmeXnY) |
| **SIZE** | Drive capacity |
| **TYPE** | SSD or HDD (detected via SMART) |
| **TEMP** | Current temperature from SMART |
| **HEALTH** | SMART health status (✓ = passed, ✗ = failed) |
| **MODEL** | Drive model number |
| **SERIAL** | Drive serial number (for physical verification) |
| **CEPH OSD** | Ceph OSD ID if drive hosts an OSD |
| **STATUS** | Ceph OSD status (up/in, down/out, etc.) |
| **USAGE** | Mount point or "BOOT" for system drive |

## Troubleshooting

### Drive shows as EMPTY but is physically present

- Check if the drive is detected: `ls -la /dev/disk/by-path/`
- Verify the PCI path in the mapping matches the actual path
- Ensure the drive's power and SATA data connections are secure

### PCI paths don't match between servers with "identical" hardware

- Even identical motherboards can have different PCI addressing
- BIOS settings can affect PCI enumeration
- HBA installation in different PCIe slots changes addresses
- Cable routing to different SATA ports changes the ata-N or phy-N number

### SMART data not showing

- Ensure `smartmontools` is installed: `sudo apt install smartmontools`
- Some drives don't report temperature
- USB-connected drives may not support SMART
- Run `sudo smartctl -i /dev/sdX` manually to check

### Ceph OSD status shows "unknown/out"

- Ensure `ceph` and `ceph-volume` commands are available
- Check if the Ceph cluster is healthy: `ceph -s`
- Verify OSD is actually up: `ceph osd tree`

### Serial numbers don't match visible labels

- Some manufacturers use different serials for SMART vs. physical labels
- Cross-reference by drive model and size
- Use the removal method: power down, remove drive, check which bay becomes EMPTY

## Files

- [driveAtlas.sh](driveAtlas.sh) - Main script
- [diagnose-drives.sh](diagnose-drives.sh) - PCI path diagnostic tool
- [README.md](README.md) - This file
- [Claude.md](Claude.md) - AI-assisted development notes
- [todo.txt](todo.txt) - Development notes and task tracking

## Contributing

When adding support for a new server:

1. Run `diagnose-drives.sh` and save output
2. Physically label or identify drives by serial number
3. Create mapping in `SERVER_MAPPINGS`
4. Test thoroughly
5. Document any unique hardware configurations
6. Update this README

## Technical Notes

### Why PCI Paths?

Linux device names (sda, sdb, etc.) are assigned in discovery order, which can change:
- Between kernel versions
- After BIOS updates
- When drives are added/removed
- Due to timing variations at boot
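
PCI paths are derived from the physical hardware topology (controller address plus port or PHY number), so they stay stable as long as the controller slot, BIOS settings, and cabling are unchanged.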

### Bay Numbering Conventions

- **10-bay chassis:** Bays numbered 1-10 (left to right, typically)
- **M.2 slots:** Labeled as `m2-1`, `m2-2`, etc.
- **USB drives:** Labeled as `usb1`, `usb2`, etc.
- **Large1:** Grid numbering 1-15 (documented in mapping)

### Ceph Integration

The script automatically detects Ceph OSDs using:
1. `ceph-volume lvm list` to map devices to OSD IDs
2. `ceph osd tree` to get up/down and in/out status

Status format: `up/in` means OSD is running and participating in the cluster.
59
diagnose-drives.sh
Normal file
@@ -0,0 +1,59 @@
#!/bin/bash

# Drive Atlas Diagnostic Script
# Run this on each server to gather PCI path information

echo "=== Server Information ==="
echo "Hostname: $(hostname)"
echo "Date: $(date)"
echo ""

echo "=== All /dev/disk/by-path/ entries ==="
ls -la /dev/disk/by-path/ | grep -v "part" | sort
echo ""

echo "=== Organized by PCI Address ==="
for path in /dev/disk/by-path/*; do
    if [ -L "$path" ]; then
        # Skip partitions
        if [[ "$path" =~ -part[0-9]+$ ]]; then
            continue
        fi

        basename_path=$(basename "$path")
        target=$(readlink -f "$path")
        device=$(basename "$target")

        echo "Path: $basename_path"
        echo " -> Device: $device"

        # Try to get size
        if [ -b "$target" ]; then
            size=$(lsblk -d -n -o SIZE "$target" 2>/dev/null)
            echo " -> Size: $size"
        fi

        # Try to get SMART info for model
        if command -v smartctl >/dev/null 2>&1; then
            model=$(sudo smartctl -i "$target" 2>/dev/null | grep "Device Model\|Model Number" | cut -d: -f2 | xargs)
            if [ -n "$model" ]; then
                echo " -> Model: $model"
            fi
        fi
        echo ""
    fi
done

echo "=== PCI Devices with Storage Controllers ==="
lspci | grep -i "storage\|raid\|sata\|sas\|nvme"
echo ""

echo "=== Current Block Devices ==="
lsblk -d -o NAME,SIZE,TYPE,TRAN | grep -v "rbd\|loop"
echo ""

echo "=== Recommendations ==="
echo "1. Note the PCI addresses (e.g., 0c:00.0) of your storage controllers"
echo "2. For each bay, physically identify which drive is in it"
echo "3. Match the PCI path pattern to the bay number"
echo "4. Example: pci-0000:0c:00.0-ata-1 might be bay 1 on controller 0c:00.0"
667
driveAtlas.sh
@@ -1,15 +1,270 @@
#!/bin/bash

get_device_info() {
local pci_addr=$1
local info=$(lspci -s "$pci_addr")
echo "$info"
#==============================================================================
# Drive Atlas - Server Drive Mapping Tool
# Maps physical drive bays to logical device names using PCI paths
#==============================================================================

#------------------------------------------------------------------------------
# Chassis Type Definitions
# These define the physical layout and display formatting for each chassis type
#------------------------------------------------------------------------------

generate_10bay_layout() {
local hostname=$1
build_drive_map

# Fixed width for consistent box drawing (fits device names like "nvme0n1")
local drive_width=10

# Main chassis section
printf "┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐\n"
printf "│ %-126s │\n" "$hostname - Sliger CX4712 (10x 3.5\" Hot-swap)"
printf "│ │\n"

# Show storage controllers
printf "│ Storage Controllers: │\n"
while IFS= read -r ctrl; do
[[ -n "$ctrl" ]] && printf "│ %-126s│\n" "$ctrl"
done < <(get_storage_controllers)
printf "│ │\n"

# M.2 NVMe slot if present
if [[ -n "${DRIVE_MAP[m2-1]}" ]]; then
printf "│ M.2 NVMe: %-10s │\n" "${DRIVE_MAP[m2-1]}"
printf "│ │\n"
fi

printf "│ Front Hot-swap Bays: │\n"
printf "│ │\n"

# Bay top borders
printf "│ "
for bay in {1..10}; do
printf "┌──────────┐ "
done
printf " │\n"

# Bay contents
printf "│ "
for bay in {1..10}; do
printf "│%-2d:%-7s│ " "$bay" "${DRIVE_MAP[$bay]:-EMPTY}"
done
printf " │\n"

# Bay bottom borders
printf "│ "
for bay in {1..10}; do
printf "└──────────┘ "
done
printf " │\n"

printf "└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘\n"
}

get_drive_details() {
local device=$1
local size=$(lsblk -d -o NAME,SIZE | grep "$device" | awk '{print $2}')
echo "$size"
generate_micro_layout() {
local hostname=$1
build_drive_map

# Check for eMMC storage
local emmc_device=""
if [[ -b /dev/mmcblk0 ]]; then
emmc_device="mmcblk0"
fi

printf "┌─────────────────────────────────────────────────────────────┐\n"
printf "│ %-57s │\n" "$hostname - Micro SBC"
printf "│ │\n"
printf "│ Storage Controllers: │\n"
while IFS= read -r ctrl; do
[[ -n "$ctrl" ]] && printf "│ %-57s│\n" "$ctrl"
done < <(get_storage_controllers)
printf "│ │\n"

# Show eMMC if present
if [[ -n "$emmc_device" ]]; then
local emmc_size=$(lsblk -d -n -o SIZE "/dev/$emmc_device" 2>/dev/null | xargs)
printf "│ ┌─────────────────────────────────────────────────────┐ │\n"
printf "│ │ Onboard eMMC: %-10s (%s) │ │\n" "$emmc_device" "$emmc_size"
printf "│ └─────────────────────────────────────────────────────┘ │\n"
printf "│ │\n"
fi

printf "│ SATA Ports (rear): │\n"
printf "│ ┌──────────────┐ ┌──────────────┐ │\n"
printf "│ │ 1: %-9s │ │ 2: %-9s │ │\n" "${DRIVE_MAP[1]:-EMPTY}" "${DRIVE_MAP[2]:-EMPTY}"
printf "│ └──────────────┘ └──────────────┘ │\n"
printf "└─────────────────────────────────────────────────────────────┘\n"
}

generate_large1_layout() {
local hostname=$1
build_drive_map

# large1 has 3 stacks of 5 bays at front (15 total) + 2 M.2 slots
# Physical bay mapping TBD - current mapping is by controller order
printf "┌─────────────────────────────────────────────────────────────────────────┐\n"
printf "│ %-69s │\n" "$hostname - Rosewill RSV-L4500U (15x 3.5\" Bays)"
printf "│ │\n"
printf "│ Storage Controllers: │\n"
while IFS= read -r ctrl; do
[[ -n "$ctrl" ]] && printf "│ %-69s│\n" "$ctrl"
done < <(get_storage_controllers)
printf "│ │\n"
printf "│ M.2 NVMe: M1: %-10s M2: %-10s │\n" "${DRIVE_MAP[m2-1]:-EMPTY}" "${DRIVE_MAP[m2-2]:-EMPTY}"
printf "│ │\n"
printf "│ Front Bays (3 stacks x 5 rows): [Bay mapping TBD] │\n"
printf "│ Stack A Stack B Stack C │\n"
printf "│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │\n"
printf "│ │1:%-8s│ │2:%-8s│ │3:%-8s│ │\n" "${DRIVE_MAP[1]:-EMPTY}" "${DRIVE_MAP[2]:-EMPTY}" "${DRIVE_MAP[3]:-EMPTY}"
printf "│ ├──────────┤ ├──────────┤ ├──────────┤ │\n"
printf "│ │4:%-8s│ │5:%-8s│ │6:%-8s│ │\n" "${DRIVE_MAP[4]:-EMPTY}" "${DRIVE_MAP[5]:-EMPTY}" "${DRIVE_MAP[6]:-EMPTY}"
printf "│ ├──────────┤ ├──────────┤ ├──────────┤ │\n"
printf "│ │7:%-8s│ │8:%-8s│ │9:%-8s│ │\n" "${DRIVE_MAP[7]:-EMPTY}" "${DRIVE_MAP[8]:-EMPTY}" "${DRIVE_MAP[9]:-EMPTY}"
printf "│ ├──────────┤ ├──────────┤ ├──────────┤ │\n"
printf "│ │10:%-7s│ │11:%-7s│ │12:%-7s│ │\n" "${DRIVE_MAP[10]:-EMPTY}" "${DRIVE_MAP[11]:-EMPTY}" "${DRIVE_MAP[12]:-EMPTY}"
printf "│ ├──────────┤ ├──────────┤ ├──────────┤ │\n"
printf "│ │13:%-7s│ │14:%-7s│ │15:%-7s│ │\n" "${DRIVE_MAP[13]:-EMPTY}" "${DRIVE_MAP[14]:-EMPTY}" "${DRIVE_MAP[15]:-EMPTY}"
printf "│ └──────────┘ └──────────┘ └──────────┘ │\n"
printf "└─────────────────────────────────────────────────────────────────────────┘\n"
|
||||
}
|
||||
|
||||
#------------------------------------------------------------------------------
|
||||
# Server-Specific Drive Mappings
|
||||
# Maps PCI paths to physical bay numbers for each server
|
||||
# Format: "pci-path bay-number"
|
||||
#------------------------------------------------------------------------------
|
||||
|
||||
declare -A SERVER_MAPPINGS=(
    # compute-storage-01 (formerly medium2)
    # Motherboard: B650D4U3-2Q/BCM with AMD SATA controller
    # HBA: LSI SAS3008 at 01:00.0 (mini-SAS HD ports)
    # Cable mapping from user notes:
    # - Mobo SATA: top-right=bay1, bottom-right=bay2, bottom-left=bay3, top-left=bay4
    # - HBA bottom mini-SAS: bays 5,6,7,8
    # - HBA top mini-SAS: bays 9,10
    ["compute-storage-01"]="
        pci-0000:0d:00.0-ata-2 1
        pci-0000:0d:00.0-ata-1 2
        pci-0000:0d:00.0-ata-3 3
        pci-0000:0d:00.0-ata-4 4
        pci-0000:01:00.0-sas-phy6-lun-0 5
        pci-0000:01:00.0-sas-phy7-lun-0 6
        pci-0000:01:00.0-sas-phy5-lun-0 7
        pci-0000:01:00.0-sas-phy2-lun-0 8
        pci-0000:01:00.0-sas-phy4-lun-0 9
        pci-0000:01:00.0-sas-phy3-lun-0 10
        pci-0000:0e:00.0-nvme-1 m2-1
    "

    # compute-storage-gpu-01
    # Motherboard: ASUS PRIME B550-PLUS with AMD SATA controller at 02:00.1
    # 5 SATA ports + 1 M.2 NVMe slot
    # sdf is USB/card reader - not mapped
    ["compute-storage-gpu-01"]="
        pci-0000:02:00.1-ata-1 1
        pci-0000:02:00.1-ata-2 2
        pci-0000:02:00.1-ata-3 3
        pci-0000:02:00.1-ata-4 4
        pci-0000:02:00.1-ata-5 5
        pci-0000:0c:00.0-nvme-1 m2-1
    "

    # storage-01
    # Motherboard: ASRock A320M-HDV R4.0 with AMD SATA controller at 02:00.1
    # 4 SATA ports used (ata-1, ata-2, ata-5, ata-6) - ata-3/4 empty
    ["storage-01"]="
        pci-0000:02:00.1-ata-1 1
        pci-0000:02:00.1-ata-2 2
        pci-0000:02:00.1-ata-5 3
        pci-0000:02:00.1-ata-6 4
    "

    # large1
    # Custom tower with multiple controllers:
    # - HBA: LSI SAS2008 at 10:00.0 (7 drives)
    # - AMD SATA at 16:00.1 (3 drives)
    # - ASMedia SATA at 25:00.0 (2 drives)
    # - 2x NVMe slots
    ["large1"]="
        pci-0000:10:00.0-sas-phy0-lun-0 1
        pci-0000:10:00.0-sas-phy1-lun-0 2
        pci-0000:10:00.0-sas-phy3-lun-0 3
        pci-0000:10:00.0-sas-phy4-lun-0 4
        pci-0000:10:00.0-sas-phy5-lun-0 5
        pci-0000:10:00.0-sas-phy6-lun-0 6
        pci-0000:10:00.0-sas-phy7-lun-0 7
        pci-0000:16:00.1-ata-3 8
        pci-0000:16:00.1-ata-7 9
        pci-0000:16:00.1-ata-8 10
        pci-0000:25:00.0-ata-1 11
        pci-0000:25:00.0-ata-2 12
        pci-0000:2a:00.0-nvme-1 m2-1
        pci-0000:26:00.0-nvme-1 m2-2
    "

    # micro1
    # ZimaBoard 832 - Single board computer
    # 2 SATA ports on rear (currently unused)
    # Boot from onboard eMMC (mmcblk0)
    # SATA controller at 00:12.0
    ["micro1"]="
    "

    # monitor-02
    # ZimaBoard 832 - Single board computer
    # 2 SATA ports on rear (currently unused)
    # Boot from onboard eMMC (mmcblk0)
    # SATA controller would be at a specific PCI address when drives connected
    ["monitor-02"]="
    "
)

declare -A CHASSIS_TYPES=(
    ["compute-storage-01"]="10bay"
    ["compute-storage-gpu-01"]="10bay"
    ["storage-01"]="10bay"
    ["large1"]="large1"
    ["micro1"]="micro" # ZimaBoard 832
    ["monitor-02"]="micro" # ZimaBoard 832
)

#------------------------------------------------------------------------------
# Core Functions
#------------------------------------------------------------------------------

get_storage_controllers() {
    # Returns a formatted list of storage controllers (HBAs, SATA, NVMe)
    lspci 2>/dev/null | grep -iE "SAS|SATA|RAID|Mass storage|NVMe" | while read -r line; do
        pci_addr=$(echo "$line" | awk '{print $1}')
        # Get short description (strip PCI address)
        desc=$(echo "$line" | sed 's/^[0-9a-f:.]\+ //')
        echo " $pci_addr: $desc"
    done
}

build_drive_map() {
    local host=$(hostname)
    declare -A drive_map

    local mapping=${SERVER_MAPPINGS[$host]}

    if [[ -n "$mapping" ]]; then
        while read -r path slot; do
            [[ -z "$path" || -z "$slot" ]] && continue

            if [[ -L "/dev/disk/by-path/$path" ]]; then
                local drive=$(readlink -f "/dev/disk/by-path/$path" | sed 's/.*\///')
                drive_map[$slot]=$drive
            fi
        done <<< "$mapping"
    fi

    # Make drive_map available globally
    declare -g -A DRIVE_MAP=()
    for key in "${!drive_map[@]}"; do
        DRIVE_MAP[$key]=${drive_map[$key]}
    done
}
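Review note: the "pci-path bay-number" format consumed by `build_drive_map` can be tried without touching `/dev`. The following is a minimal sketch, not the script's own code: `sample_mapping`, `BAY_TO_PATH`, and `parse_mapping` are hypothetical names introduced for illustration, and the three entries are copied from the compute-storage-01 mapping. It only parses the text into an associative array and prints a lookup with an `EMPTY` fallback.

```bash
#!/usr/bin/env bash
# Hypothetical sample; the real entries live in SERVER_MAPPINGS.
sample_mapping="
    pci-0000:0d:00.0-ata-2 1
    pci-0000:0d:00.0-ata-1 2
    pci-0000:0e:00.0-nvme-1 m2-1
"

# Bay label -> by-path name (illustrative array, not part of the script).
declare -A BAY_TO_PATH=()

parse_mapping() {
    local mapping=$1 path slot
    # read -r splits each line on whitespace, so leading indentation
    # inside the quoted block is harmless.
    while read -r path slot; do
        # Skip blank lines and lines that lack a bay label.
        [[ -z "$path" || -z "$slot" ]] && continue
        BAY_TO_PATH[$slot]=$path
    done <<< "$mapping"
}

parse_mapping "$sample_mapping"

# Bay 3 is deliberately absent to show the EMPTY fallback.
for bay in 1 2 3 m2-1; do
    printf 'bay %s -> %s\n' "$bay" "${BAY_TO_PATH[$bay]:-EMPTY}"
done
```

Because bay labels such as `m2-1` are not integers, the array has to be declared with `declare -A`; an indexed array would try to evaluate the label arithmetically.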

get_drive_smart_info() {
@@ -18,300 +273,148 @@ get_drive_smart_info() {
    local temp=$(echo "$smart_info" | grep "Temperature" | awk '{print $10}' | head -1)
    local type=$(echo "$smart_info" | grep "Rotation Rate" | grep -q "Solid State" && echo "SSD" || echo "HDD")
    local health=$(echo "$smart_info" | grep "SMART overall-health" | grep -q "PASSED" && echo "✓" || echo "✗")
    local model=$(echo "$smart_info" | grep "Device Model" | cut -d: -f2 | xargs)

    echo "$type|$temp°C|$health|$model"
    local model=$(echo "$smart_info" | grep "Device Model\|Model Number" | cut -d: -f2 | xargs)
    local serial=$(echo "$smart_info" | grep "Serial Number" | awk '{print $3}')

    echo "$type|$temp°C|$health|$model|$serial"
}
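Review note: `get_drive_smart_info` now emits five pipe-separated fields instead of four, so every caller has to read five variables. The sketch below illustrates why, under the assumption that the record has the layout `type|temp|health|model|serial`; the record value is made up, and `record` and `four_field_model` are hypothetical names.

```bash
#!/usr/bin/env bash
# Made-up record in the "type|temp|health|model|serial" layout.
record='SSD|35°C|✓|Samsung SSD 870 EVO 1TB|S5Y1NX0R123456'

# Four variables: read assigns the whole remainder, delimiter included,
# to the last variable, so the serial is folded into the model.
IFS='|' read -r type temp health model <<< "$record"
four_field_model=$model

# Five variables: model and serial land in separate fields.
IFS='|' read -r type temp health model serial <<< "$record"

printf 'four-variable read: model=%s\n' "$four_field_model"
printf 'five-variable read: model=%s serial=%s\n' "$model" "$serial"
```

Prefixing `IFS='|'` to `read` scopes the separator change to that one command, so the rest of the script keeps the default word splitting.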

get_drives_info() {
    local path="/dev/disk/by-path"
    for drive in "$path"/*; do
        if [ -L "$drive" ]; then
            echo "$(basename "$drive") $(readlink -f "$drive")"
        fi
    done
}
#------------------------------------------------------------------------------
|
||||
# Main Display Logic
|
||||
#------------------------------------------------------------------------------
|
||||
|
||||
declare -A DRIVE_MAPPINGS=(
|
||||
["medium2"]="
|
||||
pci-0000:0c:00.0-ata-3 5
|
||||
pci-0000:0c:00.0-ata-4 6
|
||||
pci-0000:0c:00.0-ata-1 3
|
||||
pci-0000:0c:00.0-ata-2 4
|
||||
pci-0000:0d:00.0-nvme-1 11
|
||||
pci-0000:0b:00.0-usb-0:3:1.0-scsi-0:0:0:0 usb1
|
||||
pci-0000:0b:00.0-usb-0:4:1.0-scsi-0:0:0:0 usb2
|
||||
"
|
||||
)
|
||||
|
||||
build_drive_map() {
|
||||
local host=$(hostname)
|
||||
declare -A drive_map
|
||||
|
||||
echo "DEBUG: Current host: $host"
|
||||
echo "DEBUG: Mapping found: ${DRIVE_MAPPINGS[$host]}"
|
||||
|
||||
local mapping=${DRIVE_MAPPINGS[$host]}
|
||||
|
||||
if [[ -n "$mapping" ]]; then
|
||||
while read -r path slot; do
|
||||
[[ -z "$path" || -z "$slot" ]] && continue
|
||||
|
||||
echo "DEBUG: Checking path: $path for slot: $slot"
|
||||
|
||||
if [[ -L "/dev/disk/by-path/$path" ]]; then
|
||||
local drive=$(readlink -f "/dev/disk/by-path/$path" | sed 's/.*\///')
|
||||
drive_map[$slot]=$drive
|
||||
echo "DEBUG: Mapped slot $slot to drive $drive"
|
||||
fi
|
||||
done <<< "$mapping"
|
||||
fi
|
||||
|
||||
# Make drive_map available globally
|
||||
declare -g -A DRIVE_MAP=()
|
||||
for key in "${!drive_map[@]}"; do
|
||||
DRIVE_MAP[$key]=${drive_map[$key]}
|
||||
echo "DEBUG: Final mapping - slot $key: ${drive_map[$key]}"
|
||||
done
|
||||
}
|
||||
|
||||
# Define the ASCII art maps
|
||||
large1='''
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ large1 │
|
||||
│ │
|
||||
│ ┌──────────────────────────────────────────────┐ │
|
||||
│ │ Motherboard │ │
|
||||
│ │ │ │
|
||||
│ │ ┌──┐┌──┐ │ │
|
||||
│ │ │M1││M2│ │ │
|
||||
│ │ └──┘└──┘ │ │
|
||||
│ └──────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
|
||||
│ │ │ │ │ │ │ │
|
||||
│ │ 1 │ │ 2 │ │ 3 │ │
|
||||
│ │ │ │ │ │ │ │
|
||||
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
|
||||
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
|
||||
│ │ │ │ │ │ │ │
|
||||
│ │ 4 │ │ 5 │ │ 6 │ │
|
||||
│ │ │ │ │ │ │ │
|
||||
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
|
||||
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
|
||||
│ │ │ │ │ │ │ │
|
||||
│ │ 7 │ │ 8 │ │ 9 │ │
|
||||
│ │ │ │ │ │ │ │
|
||||
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
'''
|
||||
|
||||
compute-storage-gpu-01='''
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ │
|
||||
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
|
||||
│ │ 1 │ │ 2 │ │ 3 │ │ 4 │ │
|
||||
│ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │
|
||||
│ │
|
||||
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
|
||||
│ │ 5 │ │ 6 │ │ 7 │ │ 8 │ │
|
||||
│ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │
|
||||
│ │
|
||||
│ │
|
||||
│ │
|
||||
│ ┌─────────┐ │
|
||||
│ compute-storage-gpu-01 │ 9 │ │
|
||||
│ └─────────┘ │
|
||||
│ ┌─────────┐ │
|
||||
│ │ 10 │ │
|
||||
│ └─────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
'''
|
||||
|
||||
generate_medium2_layout() {
|
||||
build_drive_map
|
||||
|
||||
# Calculate max width needed for drive names
|
||||
max_width=0
|
||||
for bay in {1..10} "11" "usb1" "usb2"; do
|
||||
drive_text="${DRIVE_MAP[$bay]:-EMPTY}"
|
||||
text_len=$((${#bay} + 1 + ${#drive_text}))
|
||||
[[ $text_len -gt $max_width ]] && max_width=$text_len
|
||||
done
|
||||
|
||||
# Add padding for box borders
|
||||
box_width=$((max_width + 4))
|
||||
|
||||
# Create box drawing elements
|
||||
h_line=$(printf '%*s' "$box_width" '' | tr ' ' '─')
|
||||
|
||||
# USB Section
|
||||
printf "\n External USB [0b:00.0]\n"
|
||||
printf " ┌%s┐ ┌%s┐\n" "$h_line" "$h_line"
|
||||
printf " │ %-${max_width}s │ │ %-${max_width}s │\n" "${DRIVE_MAP[usb1]:-EMPTY}" "${DRIVE_MAP[usb2]:-EMPTY}"
|
||||
printf " └%s┘ └%s┘\n\n" "$h_line" "$h_line"
|
||||
|
||||
# Main chassis section
|
||||
printf "┌──────────────────────────────────────────────────────────────┐\n"
|
||||
printf "│ B650D4U3-2Q/BCM │\n"
|
||||
printf "│ │\n"
|
||||
printf "│ NVMe [0d:00.0] Bay 11 │\n"
|
||||
printf "│ ┌%s┐ │\n" "$h_line"
|
||||
printf "│ │ %-${max_width}s │ │\n" "${DRIVE_MAP[11]:-EMPTY}"
|
||||
printf "│ └%s┘ │\n" "$h_line"
|
||||
printf "│ │\n"
|
||||
printf "│ Front Hot-swap Bays [0c:00.0] │\n"
|
||||
|
||||
# Create bay rows
|
||||
printf "│ "
|
||||
for bay in {1..10}; do
|
||||
printf "┌%s┐" "$h_line"
|
||||
done
|
||||
printf " │\n│ "
|
||||
|
||||
for bay in {1..10}; do
|
||||
printf "│%-2d:%-${max_width}s │" "$bay" "${DRIVE_MAP[$bay]:-EMPTY}"
|
||||
done
|
||||
printf " │\n│ "
|
||||
|
||||
for bay in {1..10}; do
|
||||
printf "└%s┘" "$h_line"
|
||||
done
|
||||
printf " │\n"
|
||||
|
||||
printf "└──────────────────────────────────────────────────────────────┘\n"
|
||||
}
|
||||
|
||||
microGeneric='''
|
||||
┌─┐ ┌─┐
|
||||
┌└─┘──└─┘┐
|
||||
│ 1 2 │
|
||||
│ │
|
||||
│ │
|
||||
│ │
|
||||
│ │
|
||||
│ │
|
||||
│ │
|
||||
│ │
|
||||
└────────┘
|
||||
'''
|
||||
|
||||
# Get the hostname
|
||||
HOSTNAME=$(hostname)
|
||||
CHASSIS_TYPE=${CHASSIS_TYPES[$HOSTNAME]:-"unknown"}
|
||||
|
||||
# ASCII art based on hostname
|
||||
case "$HOSTNAME" in
|
||||
# Display chassis layout
|
||||
case "$CHASSIS_TYPE" in
|
||||
"10bay")
|
||||
generate_10bay_layout "$HOSTNAME"
|
||||
;;
|
||||
"large1")
|
||||
echo -e "$large1"
|
||||
generate_large1_layout "$HOSTNAME"
|
||||
;;
|
||||
"compute-storage-gpu-01")
|
||||
echo -e "$compute-storage-gpu-01"
|
||||
;;
|
||||
"medium2")
|
||||
generate_medium2_layout
|
||||
;;
|
||||
"micro1" | "monitor-02")
|
||||
echo -e "$microGeneric"
|
||||
"micro")
|
||||
generate_micro_layout "$HOSTNAME"
|
||||
;;
|
||||
*)
|
||||
echo -e "No ASCII map defined for this hostname."
|
||||
echo "┌─────────────────────────────────────────────────────────┐"
|
||||
echo "│ Unknown server: $HOSTNAME"
|
||||
echo "│ No chassis mapping defined yet"
|
||||
echo "│ Run diagnose-drives.sh to gather PCI path information"
|
||||
echo "└─────────────────────────────────────────────────────────┘"
|
||||
;;
|
||||
esac
|
||||
|
||||
map_drives_to_layout() {
|
||||
local server_type=$1
|
||||
case $server_type in
|
||||
"large1")
|
||||
for i in {1..9}; do
|
||||
local drive_info=$(get_drive_info_for_position $i)
|
||||
echo "Position $i: $drive_info"
|
||||
done
|
||||
;;
|
||||
esac
|
||||
}
|
||||
#------------------------------------------------------------------------------
|
||||
# Drive Details Section
|
||||
#------------------------------------------------------------------------------
|
||||
|
||||
DRIVE_PATHS=$(get_drives_info | awk '{print $1, $2}')
|
||||
echo -e "\n=== Drive Details with SMART Status (by Bay Position) ==="
|
||||
printf "%-5s %-15s %-10s %-8s %-8s %-8s %-30s %-20s %-12s %-10s %-10s\n" "BAY" "DEVICE" "SIZE" "TYPE" "TEMP" "HEALTH" "MODEL" "SERIAL" "CEPH OSD" "STATUS" "USAGE"
|
||||
echo "----------------------------------------------------------------------------------------------------------------------------------------------------"
|
||||
|
||||
# Initialize array for "not found" messages
|
||||
not_found=()
|
||||
|
||||
|
||||
echo -e "\n=== Drive Details with SMART Status ===\n"
|
||||
printf "%-15s %-10s %-8s %-8s %-20s %-30s\n" "DEVICE" "SIZE" "TYPE" "TEMP" "HEALTH" "MODEL"
|
||||
echo "--------------------------------------------------------------------------------"
|
||||
|
||||
# For SATA drives
|
||||
lsblk -d -o NAME | grep -v "nvme" | grep -v "rbd" | while read device; do
|
||||
size=$(get_drive_details "$device")
|
||||
smart_info=$(get_drive_smart_info "$device")
|
||||
IFS='|' read -r type temp health model <<< "$smart_info"
|
||||
printf "%-15s %-10s %-8s %-8s %-20s %-30s\n" "/dev/$device" "$size" "$type" "$temp" "$health" "$model"
|
||||
# Build reverse map: device -> bay
|
||||
declare -A DEVICE_TO_BAY
|
||||
for bay in "${!DRIVE_MAP[@]}"; do
|
||||
device="${DRIVE_MAP[$bay]}"
|
||||
if [[ -n "$device" && "$device" != "EMPTY" ]]; then
|
||||
DEVICE_TO_BAY[$device]=$bay
|
||||
fi
|
||||
done
|
||||
|
||||
# For NVMe drives
|
||||
if [ -n "$nvme_drives" ]; then
|
||||
while read -r line; do
|
||||
device=$(echo "$line" | awk '{print $1}' | sed 's/.*\///')
|
||||
size=$(echo "$line" | awk '{print $6}')
|
||||
# Sort drives by bay position
|
||||
for bay in $(printf '%s\n' "${!DRIVE_MAP[@]}" | grep -E '^[0-9]+$' | sort -n); do
|
||||
device="${DRIVE_MAP[$bay]}"
|
||||
if [[ -n "$device" && "$device" != "EMPTY" && -b "/dev/$device" ]]; then
|
||||
size=$(lsblk -d -n -o SIZE "/dev/$device" 2>/dev/null)
|
||||
smart_info=$(get_drive_smart_info "$device")
|
||||
IFS='|' read -r type temp health model <<< "$smart_info"
|
||||
printf "%-15s %-10s %-8s %-8s %-20s %-30s\n" "/dev/$device" "$size" "$type" "$temp" "$health" "$model"
|
||||
done <<< "$nvme_drives"
|
||||
        IFS='|' read -r type temp health model serial <<< "$smart_info"

        # Check for Ceph OSD
        osd_id=$(ceph-volume lvm list 2>/dev/null | grep -B 20 "/dev/$device" | grep "osd id" | awk '{print "osd."$3}' | head -1)

        # Get Ceph status if OSD exists
        ceph_status="-"
        if [[ -n "$osd_id" ]]; then
            # Get in/out and up/down status from ceph osd tree
            osd_num=$(echo "$osd_id" | sed 's/osd\.//')
            # Parse ceph osd tree output - column 5 is STATUS (up/down), column 6 is REWEIGHT (1.0 = in, 0 = out)
            tree_line=$(ceph osd tree 2>/dev/null | grep -E "^\s*${osd_num}\s+" | grep "osd.${osd_num}")
            up_status=$(echo "$tree_line" | awk '{print $5}')
            reweight=$(echo "$tree_line" | awk '{print $6}')

            # Default to unknown if we can't parse
            [[ -z "$up_status" ]] && up_status="unknown"
            [[ -z "$reweight" ]] && reweight="0"

            # Determine in/out based on reweight (1.0 = in, 0 = out)
            if (( $(echo "$reweight > 0" | bc -l 2>/dev/null || echo 0) )); then
                in_status="in"
            else
                in_status="out"
            fi

            ceph_status="${up_status}/${in_status}"
        else
            osd_id="-"
        fi

        # Check if boot drive
        usage="-"
        if mount | grep -q "^/dev/${device}"; then
            mount_point=$(mount | grep "^/dev/${device}" | awk '{print $3}' | head -1)
            if [[ "$mount_point" == "/" ]]; then
                usage="BOOT"
            else
                usage="$mount_point"
            fi
        fi

        printf "%-5s %-15s %-10s %-8s %-8s %-8s %-30s %-20s %-12s %-10s %-10s\n" "$bay" "/dev/$device" "$size" "$type" "$temp" "$health" "$model" "$serial" "$osd_id" "$ceph_status" "$usage"
    fi
done
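Review note: the up/in derivation above relies on the column layout stated in the comment (field 5 is STATUS, field 6 is REWEIGHT) and shells out to `bc` for the comparison. As a hedged alternative, the same decision can be made in a single `awk` pass. This is a sketch only: `osd_state` and `sample_tree` are hypothetical names, and the two rows are illustrative text in the shape of `ceph osd tree` output, not captured from a real cluster.

```bash
#!/usr/bin/env bash
# Illustrative rows: ID CLASS WEIGHT NAME STATUS REWEIGHT PRI-AFF
sample_tree=' 0   hdd  1.81940      osd.0   up   1.00000  1.00000
 3   hdd  1.81940      osd.3   down        0  1.00000'

# Print "<up|down>/<in|out>" for one OSD number, or "unknown/out" when
# the OSD is not listed (the same defaults the script falls back to).
# $1 = OSD number, $2 = text of `ceph osd tree`
osd_state() {
    local osd_num=$1 tree=$2
    awk -v id="$osd_num" '
        $1 == id && $4 == "osd." id {
            # A reweight above zero is treated as "in".
            state = ($6 + 0 > 0) ? "in" : "out"
            print $5 "/" state
            found = 1
            exit
        }
        END { if (!found) print "unknown/out" }
    ' <<< "$tree"
}

osd_state 0 "$sample_tree"
osd_state 3 "$sample_tree"
osd_state 7 "$sample_tree"
```

Against a live cluster this would be called as `osd_state "$osd_num" "$(ceph osd tree 2>/dev/null)"`. The field positions assume a device-class column is present; output without that column would shift them by one.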
|
||||
# NVMe drives
|
||||
nvme_devices=$(lsblk -d -n -o NAME,SIZE | grep "^nvme" 2>/dev/null)
|
||||
if [ -n "$nvme_devices" ]; then
|
||||
echo -e "\n=== NVMe Drives ==="
|
||||
printf "%-15s %-10s %-10s %-40s %-25s\n" "DEVICE" "SIZE" "TYPE" "MODEL" "SERIAL"
|
||||
echo "------------------------------------------------------------------------------------------------------"
|
||||
echo "$nvme_devices" | while read -r name size; do
|
||||
device="/dev/$name"
|
||||
# Get model and serial from smartctl for accuracy
|
||||
smart_info=$(sudo smartctl -i "$device" 2>/dev/null)
|
||||
model=$(echo "$smart_info" | grep "Model Number" | cut -d: -f2 | xargs)
|
||||
serial=$(echo "$smart_info" | grep "Serial Number" | cut -d: -f2 | xargs)
|
||||
[[ -z "$model" ]] && model="-"
|
||||
[[ -z "$serial" ]] && serial="-"
|
||||
printf "%-15s %-10s %-10s %-40s %-25s\n" "$device" "$size" "NVMe" "$model" "$serial"
|
||||
done
|
||||
fi
|
||||
|
||||
# Show NVMe Drives only if present
|
||||
nvme_drives=$(sudo nvme list | grep "^/dev")
|
||||
if [ -n "$nvme_drives" ]; then
|
||||
echo -e "\n=== NVMe Drives ===\n"
|
||||
printf "%-15s %-10s %-10s %-20s\n" "DEVICE" "SIZE" "TYPE" "MODEL"
|
||||
#------------------------------------------------------------------------------
|
||||
# Optional sections
|
||||
#------------------------------------------------------------------------------
|
||||
|
||||
# Ceph RBD Devices
|
||||
rbd_devices=$(lsblk -d -n -o NAME,SIZE,TYPE 2>/dev/null | grep "rbd" | sort -V)
|
||||
if [ -n "$rbd_devices" ]; then
|
||||
echo -e "\n=== Ceph RBD Devices ==="
|
||||
printf "%-15s %-10s %-10s %-30s\n" "DEVICE" "SIZE" "TYPE" "MOUNTPOINT"
|
||||
echo "------------------------------------------------------------"
|
||||
echo "$nvme_drives" | awk '{printf "%-15s %-10s %-10s %-20s\n", $1, $6, "NVMe", $3}'
|
||||
else
|
||||
not_found+=("NVMe drives")
|
||||
echo "$rbd_devices" | while read -r name size type; do
|
||||
# Get mountpoint if any
|
||||
mountpoint=$(lsblk -n -o MOUNTPOINT "/dev/$name" 2>/dev/null | head -1)
|
||||
[[ -z "$mountpoint" ]] && mountpoint="-"
|
||||
printf "%-15s %-10s %-10s %-30s\n" "/dev/$name" "$size" "$type" "$mountpoint"
|
||||
done
|
||||
fi
|
||||
|
||||
# Show MMC Drives only if present
|
||||
mmc_output=$(lsblk -o NAME,SIZE,TYPE,MOUNTPOINT | grep "mmcblk" | sort)
|
||||
if [ -n "$mmc_output" ]; then
|
||||
echo -e "\n=== MMC Drives ===\n"
|
||||
printf "%-15s %-10s %-10s %-20s\n" "DEVICE" "SIZE" "TYPE" "MOUNTPOINT"
|
||||
echo "------------------------------------------------------------"
|
||||
echo "$mmc_output"
|
||||
# Show mapping diagnostic info if DEBUG is set
|
||||
if [[ -n "$DEBUG" ]]; then
|
||||
echo -e "\n=== DEBUG: Drive Mappings ==="
|
||||
for key in "${!DRIVE_MAP[@]}"; do
|
||||
echo "Bay $key: ${DRIVE_MAP[$key]}"
|
||||
done | sort -n
|
||||
fi
|
||||
|
||||
# Show SATA Drives only if present
|
||||
sata_output=$(lsblk -d -o NAME,SIZE,TYPE,MOUNTPOINT | grep "disk" | grep -v "nvme" | grep -v "rbd" | sort | column -t)s
|
||||
if [ -n "$sata_output" ]; then
|
||||
echo -e "\n=== SATA Drives ===\n"
|
||||
printf "%-15s %-10s %-10s %-20s\n" "DEVICE" "SIZE" "TYPE" "MOUNTPOINT"
|
||||
echo "------------------------------------------------------------"
|
||||
echo "$sata_output"
|
||||
fi
|
||||
|
||||
# Show Ceph RBD Devices only if present
|
||||
rbd_output=$(lsblk -o NAME,SIZE,TYPE,MOUNTPOINT | grep "rbd" | sort -V)
|
||||
if [ -n "$rbd_output" ]; then
|
||||
echo -e "\n=== Ceph RBD Devices ===\n"
|
||||
printf "%-15s %-10s %-10s %-20s\n" "DEVICE" "SIZE" "TYPE" "MOUNTPOINT"
|
||||
echo "------------------------------------------------------------"
|
||||
echo "$rbd_output"
|
||||
else
|
||||
not_found+=("RBD devices")
|
||||
fi
|
||||
|
||||
# Check RAID
|
||||
if ! [ -f /proc/mdstat ] || ! grep -q "active" /proc/mdstat; then
|
||||
not_found+=("Software RAID")
|
||||
fi
|
||||
|
||||
# Check ZFS
|
||||
if ! command -v zpool >/dev/null 2>&1 || [ -z "$(sudo zpool status 2>/dev/null)" ]; then
|
||||
not_found+=("ZFS pools")
|
||||
fi
|
||||
|
||||
# Display consolidated "not found" messages at the end
|
||||
if [ ${#not_found[@]} -gt 0 ]; then
|
||||
echo -e "\n=== Not Found ===\n"
|
||||
printf "%s\n" "${not_found[@]}"
|
||||
fi
|
||||
11
get-serials.sh
Normal file
@@ -0,0 +1,11 @@
#!/bin/bash

echo "=== Drive Serial Numbers ==="
for dev in sd{a..j}; do
    if [ -b "/dev/$dev" ]; then
        serial=$(sudo smartctl -i /dev/$dev 2>/dev/null | grep "Serial Number" | awk '{print $3}')
        model=$(sudo smartctl -i /dev/$dev 2>/dev/null | grep "Device Model\|Model Number" | cut -d: -f2 | xargs)
        size=$(lsblk -d -n -o SIZE /dev/$dev 2>/dev/null)
        echo "/dev/$dev: $serial ($size - $model)"
    fi
done
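Review note: `get-serials.sh` runs `smartctl -i` twice per drive, once for the serial and once for the model. A possible refinement is to parse both values from a single capture. The sketch below assumes the usual `Label:   value` layout of `smartctl -i` output; `parse_smart_identity` and `sample_info` are hypothetical names, and the sample values are made up.

```bash
#!/usr/bin/env bash
# Print "model|serial" from `smartctl -i` text supplied on stdin.
# Missing values are reported as "-".
parse_smart_identity() {
    awk -F': *' '
        /^(Device Model|Model Number):/ { model = $2 }
        /^Serial Number:/               { serial = $2 }
        END {
            printf "%s|%s\n", (model == "" ? "-" : model), (serial == "" ? "-" : serial)
        }
    '
}

# Abridged sample in the style of `smartctl -i` output; values are made up.
sample_info='Device Model:     WDC WD40EFRX-68N32N0
Serial Number:    WD-WCC7K0000000
User Capacity:    4,000,787,030,016 bytes [4.00 TB]'

parse_smart_identity <<< "$sample_info"
```

In the loop this would look like `sudo smartctl -i "/dev/$dev" 2>/dev/null | parse_smart_identity`, followed by `IFS='|' read -r model serial`. Splitting on the colon rather than taking the third whitespace field also keeps a serial that contains spaces intact.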
11
test-paths.sh
Normal file
@@ -0,0 +1,11 @@
#!/bin/bash

echo "=== Checking /dev/disk/by-path/ ==="
ls -la /dev/disk/by-path/ | grep -v "part" | grep "pci-0000:0c:00.0" | head -20
echo ""
echo "=== Checking if paths exist from mapping ==="
echo "pci-0000:0c:00.0-ata-3:"
ls -la /dev/disk/by-path/pci-0000:0c:00.0-ata-3 2>&1

echo "pci-0000:0c:00.0-ata-1:"
ls -la /dev/disk/by-path/pci-0000:0c:00.0-ata-1 2>&1
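Review note: `test-paths.sh` probes two hardcoded paths on controller `0c:00.0`. A more general check could walk a whole mapping block and report each entry. This is a sketch under stated assumptions: `check_mapping_paths` is a hypothetical helper, and its second argument (the directory to inspect) is added here only so the function can be pointed at a scratch directory; it defaults to `/dev/disk/by-path`.

```bash
#!/usr/bin/env bash
# Report whether each by-path name in a mapping exists as a symlink.
# $1 = mapping text ("pci-path bay" per line)
# $2 = directory to check (defaults to /dev/disk/by-path)
check_mapping_paths() {
    local mapping=$1 base=${2:-/dev/disk/by-path} path slot
    while read -r path slot; do
        # Skip blank lines and lines that lack a bay label.
        [[ -z "$path" || -z "$slot" ]] && continue
        if [[ -L "$base/$path" ]]; then
            printf 'OK      bay %-5s %s\n' "$slot" "$path"
        else
            printf 'MISSING bay %-5s %s\n' "$slot" "$path"
        fi
    done <<< "$mapping"
}

# The two paths probed by test-paths.sh, with the bay numbers they had
# in the old medium2 mapping.
check_mapping_paths "
    pci-0000:0c:00.0-ata-3 5
    pci-0000:0c:00.0-ata-1 3
"
```

Passing `"${SERVER_MAPPINGS[$(hostname)]}"` as the first argument would check every entry defined for the current host, which is how stale PCI addresses in a mapping could be spotted after a hardware change.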