diff --git a/Claude.md b/Claude.md
index 751bee5..d36dcfe 100644
--- a/Claude.md
+++ b/Claude.md
@@ -159,9 +159,9 @@ The project began with:
 
 ## Metrics
 
-- **Lines of Code:** ~330 (main script)
-- **Supported Chassis Types:** 4 (10-bay, large1, micro, spare)
-- **Mapped Servers:** 1 fully (compute-storage-01), 3 pending
+- **Lines of Code:** ~1178 (main script)
+- **Supported Chassis Types:** 3 (10bay, large1, micro)
+- **Mapped Servers:** 1 fully (compute-storage-01), 3 partially (storage-01, large1, compute-storage-gpu-01), 2 stubs (micro1, monitor-02)
 - **Features Added:** 10+
 - **Bugs Fixed:** 6 major, multiple minor
 - **Documentation:** Comprehensive README + this file
@@ -206,4 +206,4 @@ The result is a robust infrastructure management tool that provides instant visi
 - **Human Developer:** LotusGuild
 - **AI Assistant:** Claude Sonnet 4.5 (Anthropic)
 - **Development Date:** January 6, 2026
-- **Project:** Drive Atlas v1.0
+- **Project:** Drive Atlas v1.1.0
diff --git a/README.md b/README.md
index 504ec0c..f9f9e01 100644
--- a/README.md
+++ b/README.md
@@ -41,14 +41,14 @@ bash <(wget -qO- http://10.10.10.63:3000/LotusGuild/driveAtlas/raw/branch/main/d
 
 | Chassis Type | Description | Servers Using It |
 |-------------|-------------|------------------|
-| **10-Bay Hot-swap** | Sliger CX471225 4U 10x 3.5" NAS (with unused 2x 5.25" bays) | compute-storage-01, compute-storage-gpu-01, storage-01 |
+| **10-Bay Hot-swap** | Sliger CX4712 4U 10x 3.5" NAS (with unused 2x 5.25" bays) | compute-storage-01, compute-storage-gpu-01, storage-01 |
 | **Large1 Grid** | Unique 3x5 grid layout (1/1 configuration) | large1 |
 | **Micro** | Compact 2-drive layout | micro1, monitor-02 |
 
 ### Server Details
 
 #### compute-storage-01 (formerly medium2)
-- **Chassis:** Sliger CX471225 4U (10-Bay Hot-swap)
+- **Chassis:** Sliger CX4712 4U (10-Bay Hot-swap)
 - **Motherboard:** B650D4U3-2Q/BCM
 - **Controllers:**
   - 01:00.0 - LSI SAS3008 HBA (bays 5-10 via 2x mini-SAS HD)
@@ -57,20 +57,20 @@ bash <(wget -qO- http://10.10.10.63:3000/LotusGuild/driveAtlas/raw/branch/main/d
 - **Status:** ✅ Fully mapped and verified
 
 #### storage-01
-- **Chassis:** Sliger CX471225 4U (10-Bay Hot-swap)
-- **Motherboard:** Different from compute-storage-01
-- **Controllers:** Motherboard SATA only (no HBA currently)
-- **Status:** ⚠️ Requires PCI path mapping
+- **Chassis:** Sliger CX4712 4U (10-Bay Hot-swap)
+- **Motherboard:** ASRock A320M-HDV R4.0
+- **Controllers:** AMD SATA (bays 1-4), LSI SAS3416 HBA (bays 5+, U.2 NVMe)
+- **Status:** ⚠️ Partially mapped (5 of 10 bays)
 
 #### large1
 - **Chassis:** Unique 3x5 grid (15 bays total)
 - **Note:** 1/1 configuration, will not be replicated
-- **Status:** ⚠️ Requires PCI path mapping
+- **Status:** ⚠️ Partially mapped (14 bays + 2 M.2)
 
 #### compute-storage-gpu-01
-- **Chassis:** Sliger CX471225 4U (10-Bay Hot-swap)
-- **Motherboard:** Same as compute-storage-01
-- **Status:** ⚠️ Requires PCI path mapping
+- **Chassis:** Sliger CX4712 4U (10-Bay Hot-swap)
+- **Motherboard:** ASUS PRIME B550-PLUS
+- **Status:** ⚠️ Partially mapped (5 SATA + 1 M.2)
 
 ## Output Example
 
@@ -130,15 +130,15 @@ declare -A SERVER_MAPPINGS=(
 
 ## Setting Up a New Server
 
-### Step 1: Run Diagnostic Script
+### Step 1: Run Diagnostic Mode
 
 First, gather PCI path information:
 
 ```bash
-bash diagnose-drives.sh > server-diagnostic.txt
+bash driveAtlas.sh --diagnose
 ```
 
-This will show all available PCI paths and their associated drives.
+This will show all available PCI paths, storage controllers, and their associated drives.
 
 ### Step 2: Physical Bay Identification
 
@@ -192,7 +192,7 @@ DEBUG=1 bash driveAtlas.sh
 | **SIZE** | Drive capacity |
 | **TYPE** | SSD or HDD (detected via SMART) |
 | **TEMP** | Current temperature from SMART |
-| **HEALTH** | SMART health status (✓ = passed, ✗ = failed) |
+| **HEALTH** | SMART health status (✓ = passed, ⚠ = passed with warnings, ✗ = failed) |
 | **MODEL** | Drive model number |
 | **SERIAL** | Drive serial number (for physical verification) |
 | **CEPH OSD** | Ceph OSD ID if drive hosts an OSD |
@@ -235,17 +235,15 @@ DEBUG=1 bash driveAtlas.sh
 
 ## Files
 
-- [driveAtlas.sh](driveAtlas.sh) - Main script
-- [diagnose-drives.sh](diagnose-drives.sh) - PCI path diagnostic tool
+- [driveAtlas.sh](driveAtlas.sh) - Main script (includes `--diagnose` mode for PCI path discovery)
 - [README.md](README.md) - This file
 - [CLAUDE.md](CLAUDE.md) - AI-assisted development notes
-- [todo.txt](todo.txt) - Development notes and task tracking
 
 ## Contributing
 
 When adding support for a new server:
 
-1. Run `diagnose-drives.sh` and save output
+1. Run `driveAtlas.sh --diagnose` and save output
 2. Physically label or identify drives by serial number
 3. Create mapping in `SERVER_MAPPINGS`
 4. Test thoroughly
diff --git a/diagnose-drives.sh b/diagnose-drives.sh
deleted file mode 100644
index 490bf62..0000000
--- a/diagnose-drives.sh
+++ /dev/null
@@ -1,59 +0,0 @@
-#!/bin/bash
-
-# Drive Atlas Diagnostic Script
-# Run this on each server to gather PCI path information
-
-echo "=== Server Information ==="
-echo "Hostname: $(hostname)"
-echo "Date: $(date)"
-echo ""
-
-echo "=== All /dev/disk/by-path/ entries ==="
-ls -la /dev/disk/by-path/ | grep -v "part" | sort
-echo ""
-
-echo "=== Organized by PCI Address ==="
-for path in /dev/disk/by-path/*; do
-    if [ -L "$path" ]; then
-        # Skip partitions
-        if [[ "$path" =~ -part[0-9]+$ ]]; then
-            continue
-        fi
-
-        basename_path=$(basename "$path")
-        target=$(readlink -f "$path")
-        device=$(basename "$target")
-
-        echo "Path: $basename_path"
-        echo " -> Device: $device"
-
-        # Try to get size
-        if [ -b "$target" ]; then
-            size=$(lsblk -d -n -o SIZE "$target" 2>/dev/null)
-            echo " -> Size: $size"
-        fi
-
-        # Try to get SMART info for model
-        if command -v smartctl >/dev/null 2>&1; then
-            model=$(sudo smartctl -i "$target" 2>/dev/null | grep "Device Model\|Model Number" | cut -d: -f2 | xargs)
-            if [ -n "$model" ]; then
-                echo " -> Model: $model"
-            fi
-        fi
-        echo ""
-    fi
-done
-
-echo "=== PCI Devices with Storage Controllers ==="
-lspci | grep -i "storage\|raid\|sata\|sas\|nvme"
-echo ""
-
-echo "=== Current Block Devices ==="
-lsblk -d -o NAME,SIZE,TYPE,TRAN | grep -v "rbd\|loop"
-echo ""
-
-echo "=== Recommendations ==="
-echo "1. Note the PCI addresses (e.g., 0c:00.0) of your storage controllers"
-echo "2. For each bay, physically identify which drive is in it"
-echo "3. Match the PCI path pattern to the bay number"
-echo "4. Example: pci-0000:0c:00.0-ata-1 might be bay 1 on controller 0c:00.0"
diff --git a/driveAtlas.sh b/driveAtlas.sh
index 8177baa..62d957e 100644
--- a/driveAtlas.sh
+++ b/driveAtlas.sh
@@ -60,6 +60,7 @@ OPTIONS:
     --verbose       Show detailed error messages and warnings
     --no-ceph       Skip Ceph OSD information
     --show-pci      Show PCI paths in output
+    --diagnose      Show all PCI paths and block devices (for mapping new servers)
 
 EXAMPLES:
     $(basename "$0")                # Normal run with all features
@@ -67,6 +68,7 @@ EXAMPLES:
     $(basename "$0") --color        # Run with colored output
     $(basename "$0") --verbose      # Show all errors and warnings
     $(basename "$0") --debug        # Show mapping debug info
+    $(basename "$0") --diagnose     # Gather PCI paths for new server setup
 
 ENVIRONMENT VARIABLES:
     DEBUG=1         Same as --debug flag
@@ -83,6 +85,7 @@ SKIP_CEPH=false
 SHOW_PCI=false
 USE_COLOR=false
 VERBOSE=false
+RUN_DIAGNOSE=false
 
 while [[ $# -gt 0 ]]; do
     case "$1" in
@@ -118,6 +121,10 @@ while [[ $# -gt 0 ]]; do
             VERBOSE=true
             shift
             ;;
+        --diagnose)
+            RUN_DIAGNOSE=true
+            shift
+            ;;
         *)
             echo "Unknown option: $1" >&2
             echo "Use --help for usage information." >&2
@@ -321,6 +328,68 @@ check_dependencies() {
 # Run dependency check at script start
 check_dependencies
 
+#------------------------------------------------------------------------------
+# run_diagnose
+#
+# Displays all PCI disk paths, storage controllers, and block devices.
+# Used to gather information needed when mapping a new server.
+#------------------------------------------------------------------------------
+run_diagnose() {
+    local hostname
+    hostname="$(hostname)"
+
+    echo "=== Server Information ==="
+    echo "Hostname: $hostname"
+    echo "Date: $(date)"
+    echo ""
+
+    echo "=== Storage Controllers ==="
+    lspci 2>/dev/null | grep -iE "SAS|SATA|RAID|Mass storage|NVMe"
+    echo ""
+
+    echo "=== All /dev/disk/by-path/ entries (whole disks only) ==="
+    for path in "${DISK_BY_PATH}"/*; do
+        [[ -L "$path" ]] || continue
+        # Skip partitions
+        [[ "$path" =~ -part[0-9]+$ ]] && continue
+
+        local basename_path target device size serial model
+        basename_path="$(basename "$path")"
+        target="$(readlink -f "$path")"
+        device="$(basename "$target")"
+        size="$(lsblk -d -n -o SIZE "$target" 2>/dev/null | xargs)"
+
+        printf " %-55s -> %-10s %s\n" "$basename_path" "$device" "${size:+($size)}"
+    done
+    echo ""
+
+    echo "=== Block Devices ==="
+    lsblk -d -o NAME,SIZE,TYPE,TRAN 2>/dev/null | grep -v "rbd\|loop"
+    echo ""
+
+    # Check if this server has a mapping
+    local sanitized
+    sanitized="$(echo "$hostname" | tr -cd '[:alnum:]-_.')"
+    if [[ -n "${SERVER_MAPPINGS[$sanitized]:-}" ]]; then
+        echo "=== Current Mapping for $sanitized ==="
+        echo "${SERVER_MAPPINGS[$sanitized]}" | while read -r pci_path bay; do
+            [[ -z "$pci_path" || -z "$bay" ]] && continue
+            if [[ -L "${DISK_BY_PATH}/$pci_path" ]]; then
+                local dev
+                dev="$(readlink -f "${DISK_BY_PATH}/$pci_path" | sed 's/.*\///')"
+                printf " Bay %-5s %-55s -> %s\n" "$bay" "$pci_path" "$dev"
+            else
+                printf " Bay %-5s %-55s -> (not connected)\n" "$bay" "$pci_path"
+            fi
+        done
+    else
+        echo "NOTE: No mapping exists yet for '$sanitized'."
+        echo "Use the PCI paths above to create a SERVER_MAPPINGS entry."
+    fi
+
+    exit 0
+}
+
 #------------------------------------------------------------------------------
 # Chassis Layout Generator Functions
 # These define the physical layout and display formatting for each chassis type
@@ -418,7 +487,7 @@ generate_micro_layout() {
     printf "│ │\n"
     printf "│ Storage Controllers: │\n"
     while IFS= read -r ctrl; do
-        [[ -n "$ctrl" ]] && printf "│ %-57s│\n" "$ctrl"
+        [[ -n "$ctrl" ]] && printf "│ %-57.57s│\n" "$ctrl"
     done < <(get_storage_controllers)
     printf "│ │\n"
 
@@ -460,7 +529,7 @@ generate_large1_layout() {
     printf "│ │\n"
     printf "│ Storage Controllers: │\n"
     while IFS= read -r ctrl; do
-        [[ -n "$ctrl" ]] && printf "│ %-69s│\n" "$ctrl"
+        [[ -n "$ctrl" ]] && printf "│ %-69.69s│\n" "$ctrl"
     done < <(get_storage_controllers)
     printf "│ │\n"
     printf "│ M.2 NVMe: M1: %-10s M2: %-10s │\n" "${DRIVE_MAP[m2-1]:-EMPTY}" "${DRIVE_MAP[m2-2]:-EMPTY}"
@@ -648,7 +717,7 @@ build_drive_map() {
     declare -g -A BAY_TO_PCI_PATH=()
 
     if [[ -z "$mapping" ]]; then
-        log_warn "No drive mapping found for host '$host'. Run diagnose-drives.sh to create one."
+        log_warn "No drive mapping found for host '$host'. Run with --diagnose to gather PCI path info."
         return
     fi
 
@@ -919,6 +988,11 @@ get_drive_smart_info() {
 # Main Display Logic
 #------------------------------------------------------------------------------
 
+# Run diagnose mode if requested (exits after printing)
+if [[ "$RUN_DIAGNOSE" == true ]]; then
+    run_diagnose
+fi
+
 HOSTNAME=$(hostname | tr -cd '[:alnum:]-_.')
 CHASSIS_TYPE=${CHASSIS_TYPES[$HOSTNAME]:-"unknown"}
 
@@ -937,7 +1011,7 @@ case "$CHASSIS_TYPE" in
         echo "┌─────────────────────────────────────────────────────────┐"
         echo "│ Unknown server: $HOSTNAME"
         echo "│ No chassis mapping defined yet"
-        echo "│ Run diagnose-drives.sh to gather PCI path information"
+        echo "│ Run with --diagnose to gather PCI path information"
         echo "└─────────────────────────────────────────────────────────┘"
         ;;
 esac
diff --git a/get-serials.sh b/get-serials.sh
deleted file mode 100644
index 760ee2c..0000000
--- a/get-serials.sh
+++ /dev/null
@@ -1,11 +0,0 @@
-#!/bin/bash
-
-echo "=== Drive Serial Numbers ==="
-for dev in sd{a..j}; do
-    if [ -b "/dev/$dev" ]; then
-        serial=$(sudo smartctl -i /dev/$dev 2>/dev/null | grep "Serial Number" | awk '{print $3}')
-        model=$(sudo smartctl -i /dev/$dev 2>/dev/null | grep "Device Model\|Model Number" | cut -d: -f2 | xargs)
-        size=$(lsblk -d -n -o SIZE /dev/$dev 2>/dev/null)
-        echo "/dev/$dev: $serial ($size - $model)"
-    fi
-done
diff --git a/test-paths.sh b/test-paths.sh
deleted file mode 100644
index a318626..0000000
--- a/test-paths.sh
+++ /dev/null
@@ -1,11 +0,0 @@
-#!/bin/bash
-
-echo "=== Checking /dev/disk/by-path/ ==="
-ls -la /dev/disk/by-path/ | grep -v "part" | grep "pci-0000:0c:00.0" | head -20
-echo ""
-echo "=== Checking if paths exist from mapping ==="
-echo "pci-0000:0c:00.0-ata-3:"
-ls -la /dev/disk/by-path/pci-0000:0c:00.0-ata-3 2>&1
-
-echo "pci-0000:0c:00.0-ata-1:"
-ls -la /dev/disk/by-path/pci-0000:0c:00.0-ata-1 2>&1
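
Reviewer sketch (not part of the patch): the whole-disk filter that `run_diagnose` applies to `/dev/disk/by-path/` can be tried without real hardware. The snippet below is a minimal illustration, assuming a temporary directory stands in for the by-path directory; the function name `list_whole_disks` and the fake `sda`/`sda1` files are hypothetical and exist only for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not present in driveAtlas.sh: print "symlink -> device"
# for every whole-disk symlink in a by-path style directory, using the same
# two checks as run_diagnose (must be a symlink, must not end in -partN).
list_whole_disks() {
    local dir="$1" path
    for path in "$dir"/*; do
        [[ -L "$path" ]] || continue                 # only symlinks count
        [[ "$path" =~ -part[0-9]+$ ]] && continue    # skip partition entries
        printf '%s -> %s\n' \
            "$(basename "$path")" \
            "$(basename "$(readlink -f "$path")")"
    done
}

# Demo against a throwaway directory that mimics /dev/disk/by-path.
demo_dir="$(mktemp -d)"
touch "$demo_dir/sda" "$demo_dir/sda1"
ln -s "$demo_dir/sda"  "$demo_dir/pci-0000:0c:00.0-ata-1"
ln -s "$demo_dir/sda1" "$demo_dir/pci-0000:0c:00.0-ata-1-part1"

list_whole_disks "$demo_dir"    # prints: pci-0000:0c:00.0-ata-1 -> sda
rm -rf "$demo_dir"
```

Only the whole-disk symlink is reported; the `-part1` entry and the plain files are skipped, which is the behaviour the new `--diagnose` listing relies on when it labels its output "whole disks only".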