Add --diagnose flag, remove obsolete helper scripts, fix docs

- Add --diagnose option that shows all PCI paths, storage controllers, block devices, and validates current mappings. Replaces the separate diagnose-drives.sh script.
- Remove diagnose-drives.sh (incorporated into --diagnose).
- Remove get-serials.sh (redundant with SMART data in main table).
- Remove test-paths.sh (referenced non-existent 0c:00.0 controller).
- Remove todo.md (massively outdated).
- Fix storage controller text overflowing box borders in large1 and micro layouts by adding truncation (%-69.69s, %-57.57s).
- Fix chassis name to CX4712 in README.
- Update server mapping statuses from "Requires mapping" to actual partially-mapped states.
- Add ⚠ health indicator to README output column docs.
- Update Claude.md metrics to match current state.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
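The truncation fix relies on `printf` accepting both a minimum field width and a maximum precision for `%s`: `%-69.69s` left-justifies, pads to at least 69 characters, and prints at most 69. A minimal sketch of that behaviour, assuming a hypothetical helper `box_row` (not part of driveAtlas.sh), ASCII-only text, and a plain `|` border in place of the box-drawing characters:

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only; it is not taken from driveAtlas.sh.
# Prints one box row whose text cell is both padded and truncated to $1 characters.
box_row() {
    local width="$1" text="$2"
    # %-N.Ns : left-justify, pad to at least N, print at most N characters.
    printf "| %-${width}.${width}s |\n" "$text"
}

# A short string is padded; a long string is cut off, so the right border stays aligned.
box_row 20 "short name"
box_row 20 "a controller description that is far too long"
```

With only a width (`%-20s`), the second row would push the closing border to the right, which is the overflow the commit describes.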
@@ -159,9 +159,9 @@ The project began with:

 ## Metrics

-- **Lines of Code:** ~330 (main script)
-- **Supported Chassis Types:** 4 (10-bay, large1, micro, spare)
-- **Mapped Servers:** 1 fully (compute-storage-01), 3 pending
+- **Lines of Code:** ~1178 (main script)
+- **Supported Chassis Types:** 3 (10bay, large1, micro)
+- **Mapped Servers:** 1 fully (compute-storage-01), 3 partially (storage-01, large1, compute-storage-gpu-01), 2 stubs (micro1, monitor-02)
 - **Features Added:** 10+
 - **Bugs Fixed:** 6 major, multiple minor
 - **Documentation:** Comprehensive README + this file
@@ -206,4 +206,4 @@ The result is a robust infrastructure management tool that provides instant visi
 - **Human Developer:** LotusGuild
 - **AI Assistant:** Claude Sonnet 4.5 (Anthropic)
 - **Development Date:** January 6, 2026
-- **Project:** Drive Atlas v1.0
+- **Project:** Drive Atlas v1.1.0
34
README.md
@@ -41,14 +41,14 @@ bash <(wget -qO- http://10.10.10.63:3000/LotusGuild/driveAtlas/raw/branch/main/d

 | Chassis Type | Description | Servers Using It |
 |-------------|-------------|------------------|
-| **10-Bay Hot-swap** | Sliger CX471225 4U 10x 3.5" NAS (with unused 2x 5.25" bays) | compute-storage-01, compute-storage-gpu-01, storage-01 |
+| **10-Bay Hot-swap** | Sliger CX4712 4U 10x 3.5" NAS (with unused 2x 5.25" bays) | compute-storage-01, compute-storage-gpu-01, storage-01 |
 | **Large1 Grid** | Unique 3x5 grid layout (1/1 configuration) | large1 |
 | **Micro** | Compact 2-drive layout | micro1, monitor-02 |

 ### Server Details

 #### compute-storage-01 (formerly medium2)
-- **Chassis:** Sliger CX471225 4U (10-Bay Hot-swap)
+- **Chassis:** Sliger CX4712 4U (10-Bay Hot-swap)
 - **Motherboard:** B650D4U3-2Q/BCM
 - **Controllers:**
   - 01:00.0 - LSI SAS3008 HBA (bays 5-10 via 2x mini-SAS HD)
@@ -57,20 +57,20 @@ bash <(wget -qO- http://10.10.10.63:3000/LotusGuild/driveAtlas/raw/branch/main/d
 - **Status:** ✅ Fully mapped and verified

 #### storage-01
-- **Chassis:** Sliger CX471225 4U (10-Bay Hot-swap)
-- **Motherboard:** Different from compute-storage-01
-- **Controllers:** Motherboard SATA only (no HBA currently)
-- **Status:** ⚠️ Requires PCI path mapping
+- **Chassis:** Sliger CX4712 4U (10-Bay Hot-swap)
+- **Motherboard:** ASRock A320M-HDV R4.0
+- **Controllers:** AMD SATA (bays 1-4), LSI SAS3416 HBA (bays 5+, U.2 NVMe)
+- **Status:** ⚠️ Partially mapped (5 of 10 bays)

 #### large1
 - **Chassis:** Unique 3x5 grid (15 bays total)
 - **Note:** 1/1 configuration, will not be replicated
-- **Status:** ⚠️ Requires PCI path mapping
+- **Status:** ⚠️ Partially mapped (14 bays + 2 M.2)

 #### compute-storage-gpu-01
-- **Chassis:** Sliger CX471225 4U (10-Bay Hot-swap)
-- **Motherboard:** Same as compute-storage-01
-- **Status:** ⚠️ Requires PCI path mapping
+- **Chassis:** Sliger CX4712 4U (10-Bay Hot-swap)
+- **Motherboard:** ASUS PRIME B550-PLUS
+- **Status:** ⚠️ Partially mapped (5 SATA + 1 M.2)

 ## Output Example

@@ -130,15 +130,15 @@ declare -A SERVER_MAPPINGS=(

 ## Setting Up a New Server

-### Step 1: Run Diagnostic Script
+### Step 1: Run Diagnostic Mode

 First, gather PCI path information:

 ```bash
-bash diagnose-drives.sh > server-diagnostic.txt
+bash driveAtlas.sh --diagnose
 ```

-This will show all available PCI paths and their associated drives.
+This will show all available PCI paths, storage controllers, and their associated drives.

 ### Step 2: Physical Bay Identification

@@ -192,7 +192,7 @@ DEBUG=1 bash driveAtlas.sh
 | **SIZE** | Drive capacity |
 | **TYPE** | SSD or HDD (detected via SMART) |
 | **TEMP** | Current temperature from SMART |
-| **HEALTH** | SMART health status (✓ = passed, ✗ = failed) |
+| **HEALTH** | SMART health status (✓ = passed, ⚠ = passed with warnings, ✗ = failed) |
 | **MODEL** | Drive model number |
 | **SERIAL** | Drive serial number (for physical verification) |
 | **CEPH OSD** | Ceph OSD ID if drive hosts an OSD |
@@ -235,17 +235,15 @@ DEBUG=1 bash driveAtlas.sh

 ## Files

-- [driveAtlas.sh](driveAtlas.sh) - Main script
-- [diagnose-drives.sh](diagnose-drives.sh) - PCI path diagnostic tool
+- [driveAtlas.sh](driveAtlas.sh) - Main script (includes `--diagnose` mode for PCI path discovery)
 - [README.md](README.md) - This file
 - [CLAUDE.md](CLAUDE.md) - AI-assisted development notes
-- [todo.txt](todo.txt) - Development notes and task tracking

 ## Contributing

 When adding support for a new server:

-1. Run `diagnose-drives.sh` and save output
+1. Run `driveAtlas.sh --diagnose` and save output
 2. Physically label or identify drives by serial number
 3. Create mapping in `SERVER_MAPPINGS`
 4. Test thoroughly
@@ -1,59 +0,0 @@
-#!/bin/bash
-
-# Drive Atlas Diagnostic Script
-# Run this on each server to gather PCI path information
-
-echo "=== Server Information ==="
-echo "Hostname: $(hostname)"
-echo "Date: $(date)"
-echo ""
-
-echo "=== All /dev/disk/by-path/ entries ==="
-ls -la /dev/disk/by-path/ | grep -v "part" | sort
-echo ""
-
-echo "=== Organized by PCI Address ==="
-for path in /dev/disk/by-path/*; do
-    if [ -L "$path" ]; then
-        # Skip partitions
-        if [[ "$path" =~ -part[0-9]+$ ]]; then
-            continue
-        fi
-
-        basename_path=$(basename "$path")
-        target=$(readlink -f "$path")
-        device=$(basename "$target")
-
-        echo "Path: $basename_path"
-        echo " -> Device: $device"
-
-        # Try to get size
-        if [ -b "$target" ]; then
-            size=$(lsblk -d -n -o SIZE "$target" 2>/dev/null)
-            echo " -> Size: $size"
-        fi
-
-        # Try to get SMART info for model
-        if command -v smartctl >/dev/null 2>&1; then
-            model=$(sudo smartctl -i "$target" 2>/dev/null | grep "Device Model\|Model Number" | cut -d: -f2 | xargs)
-            if [ -n "$model" ]; then
-                echo " -> Model: $model"
-            fi
-        fi
-        echo ""
-    fi
-done
-
-echo "=== PCI Devices with Storage Controllers ==="
-lspci | grep -i "storage\|raid\|sata\|sas\|nvme"
-echo ""
-
-echo "=== Current Block Devices ==="
-lsblk -d -o NAME,SIZE,TYPE,TRAN | grep -v "rbd\|loop"
-echo ""
-
-echo "=== Recommendations ==="
-echo "1. Note the PCI addresses (e.g., 0c:00.0) of your storage controllers"
-echo "2. For each bay, physically identify which drive is in it"
-echo "3. Match the PCI path pattern to the bay number"
-echo "4. Example: pci-0000:0c:00.0-ata-1 might be bay 1 on controller 0c:00.0"
@@ -60,6 +60,7 @@ OPTIONS:
     --verbose       Show detailed error messages and warnings
     --no-ceph       Skip Ceph OSD information
     --show-pci      Show PCI paths in output
+    --diagnose      Show all PCI paths and block devices (for mapping new servers)

 EXAMPLES:
     $(basename "$0")              # Normal run with all features
@@ -67,6 +68,7 @@ EXAMPLES:
     $(basename "$0") --color      # Run with colored output
     $(basename "$0") --verbose    # Show all errors and warnings
     $(basename "$0") --debug      # Show mapping debug info
+    $(basename "$0") --diagnose   # Gather PCI paths for new server setup

 ENVIRONMENT VARIABLES:
     DEBUG=1         Same as --debug flag
@@ -83,6 +85,7 @@ SKIP_CEPH=false
 SHOW_PCI=false
 USE_COLOR=false
 VERBOSE=false
+RUN_DIAGNOSE=false

 while [[ $# -gt 0 ]]; do
     case "$1" in
@@ -118,6 +121,10 @@ while [[ $# -gt 0 ]]; do
             VERBOSE=true
             shift
             ;;
+        --diagnose)
+            RUN_DIAGNOSE=true
+            shift
+            ;;
         *)
             echo "Unknown option: $1" >&2
             echo "Use --help for usage information." >&2
@@ -321,6 +328,68 @@ check_dependencies() {
 # Run dependency check at script start
 check_dependencies

+#------------------------------------------------------------------------------
+# run_diagnose
+#
+# Displays all PCI disk paths, storage controllers, and block devices.
+# Used to gather information needed when mapping a new server.
+#------------------------------------------------------------------------------
+run_diagnose() {
+    local hostname
+    hostname="$(hostname)"
+
+    echo "=== Server Information ==="
+    echo "Hostname: $hostname"
+    echo "Date: $(date)"
+    echo ""
+
+    echo "=== Storage Controllers ==="
+    lspci 2>/dev/null | grep -iE "SAS|SATA|RAID|Mass storage|NVMe"
+    echo ""
+
+    echo "=== All /dev/disk/by-path/ entries (whole disks only) ==="
+    for path in "${DISK_BY_PATH}"/*; do
+        [[ -L "$path" ]] || continue
+        # Skip partitions
+        [[ "$path" =~ -part[0-9]+$ ]] && continue
+
+        local basename_path target device size serial model
+        basename_path="$(basename "$path")"
+        target="$(readlink -f "$path")"
+        device="$(basename "$target")"
+        size="$(lsblk -d -n -o SIZE "$target" 2>/dev/null | xargs)"
+
+        printf " %-55s -> %-10s %s\n" "$basename_path" "$device" "${size:+($size)}"
+    done
+    echo ""
+
+    echo "=== Block Devices ==="
+    lsblk -d -o NAME,SIZE,TYPE,TRAN 2>/dev/null | grep -v "rbd\|loop"
+    echo ""
+
+    # Check if this server has a mapping
+    local sanitized
+    sanitized="$(echo "$hostname" | tr -cd '[:alnum:]-_.')"
+    if [[ -n "${SERVER_MAPPINGS[$sanitized]:-}" ]]; then
+        echo "=== Current Mapping for $sanitized ==="
+        echo "${SERVER_MAPPINGS[$sanitized]}" | while read -r pci_path bay; do
+            [[ -z "$pci_path" || -z "$bay" ]] && continue
+            if [[ -L "${DISK_BY_PATH}/$pci_path" ]]; then
+                local dev
+                dev="$(readlink -f "${DISK_BY_PATH}/$pci_path" | sed 's/.*\///')"
+                printf " Bay %-5s %-55s -> %s\n" "$bay" "$pci_path" "$dev"
+            else
+                printf " Bay %-5s %-55s -> (not connected)\n" "$bay" "$pci_path"
+            fi
+        done
+    else
+        echo "NOTE: No mapping exists yet for '$sanitized'."
+        echo "Use the PCI paths above to create a SERVER_MAPPINGS entry."
+    fi
+
+    exit 0
+}
+
 #------------------------------------------------------------------------------
 # Chassis Layout Generator Functions
 # These define the physical layout and display formatting for each chassis type
@@ -418,7 +487,7 @@ generate_micro_layout() {
     printf "│ │\n"
     printf "│ Storage Controllers: │\n"
     while IFS= read -r ctrl; do
-        [[ -n "$ctrl" ]] && printf "│ %-57s│\n" "$ctrl"
+        [[ -n "$ctrl" ]] && printf "│ %-57.57s│\n" "$ctrl"
     done < <(get_storage_controllers)
     printf "│ │\n"

@@ -460,7 +529,7 @@ generate_large1_layout() {
     printf "│ │\n"
     printf "│ Storage Controllers: │\n"
     while IFS= read -r ctrl; do
-        [[ -n "$ctrl" ]] && printf "│ %-69s│\n" "$ctrl"
+        [[ -n "$ctrl" ]] && printf "│ %-69.69s│\n" "$ctrl"
     done < <(get_storage_controllers)
     printf "│ │\n"
     printf "│ M.2 NVMe: M1: %-10s M2: %-10s │\n" "${DRIVE_MAP[m2-1]:-EMPTY}" "${DRIVE_MAP[m2-2]:-EMPTY}"
@@ -648,7 +717,7 @@ build_drive_map() {
     declare -g -A BAY_TO_PCI_PATH=()

     if [[ -z "$mapping" ]]; then
-        log_warn "No drive mapping found for host '$host'. Run diagnose-drives.sh to create one."
+        log_warn "No drive mapping found for host '$host'. Run with --diagnose to gather PCI path info."
         return
     fi

@@ -919,6 +988,11 @@ get_drive_smart_info() {
 # Main Display Logic
 #------------------------------------------------------------------------------

+# Run diagnose mode if requested (exits after printing)
+if [[ "$RUN_DIAGNOSE" == true ]]; then
+    run_diagnose
+fi
+
 HOSTNAME=$(hostname | tr -cd '[:alnum:]-_.')
 CHASSIS_TYPE=${CHASSIS_TYPES[$HOSTNAME]:-"unknown"}

@@ -937,7 +1011,7 @@ case "$CHASSIS_TYPE" in
         echo "┌─────────────────────────────────────────────────────────┐"
         echo "│ Unknown server: $HOSTNAME"
         echo "│ No chassis mapping defined yet"
-        echo "│ Run diagnose-drives.sh to gather PCI path information"
+        echo "│ Run with --diagnose to gather PCI path information"
         echo "└─────────────────────────────────────────────────────────┘"
         ;;
 esac
@@ -1,11 +0,0 @@
-#!/bin/bash
-
-echo "=== Drive Serial Numbers ==="
-for dev in sd{a..j}; do
-    if [ -b "/dev/$dev" ]; then
-        serial=$(sudo smartctl -i /dev/$dev 2>/dev/null | grep "Serial Number" | awk '{print $3}')
-        model=$(sudo smartctl -i /dev/$dev 2>/dev/null | grep "Device Model\|Model Number" | cut -d: -f2 | xargs)
-        size=$(lsblk -d -n -o SIZE /dev/$dev 2>/dev/null)
-        echo "/dev/$dev: $serial ($size - $model)"
-    fi
-done
@@ -1,11 +0,0 @@
-#!/bin/bash
-
-echo "=== Checking /dev/disk/by-path/ ==="
-ls -la /dev/disk/by-path/ | grep -v "part" | grep "pci-0000:0c:00.0" | head -20
-echo ""
-echo "=== Checking if paths exist from mapping ==="
-echo "pci-0000:0c:00.0-ata-3:"
-ls -la /dev/disk/by-path/pci-0000:0c:00.0-ata-3 2>&1
-
-echo "pci-0000:0c:00.0-ata-1:"
-ls -la /dev/disk/by-path/pci-0000:0c:00.0-ata-1 2>&1