
AI-Assisted Development Notes

This document chronicles the development of Drive Atlas with assistance from Claude (Anthropic's AI assistant).

Project Overview

Drive Atlas started as a simple bash script with hardcoded drive mappings and evolved into a comprehensive storage infrastructure management tool through iterative development and user feedback.

Development Session

  • Date: January 6, 2026
  • AI Model: Claude Sonnet 4.5
  • Developer: LotusGuild
  • Session Duration: ~2 hours

Initial State

The project began with:

  • Basic ASCII art layouts for different server chassis
  • Hardcoded drive mappings for "medium2" server
  • Simple SMART data display
  • Broken PCI path mappings (referenced non-existent hardware)
  • Windows line endings causing script execution failures

Evolution Through Collaboration

Phase 1: Architecture Refactoring

Problem: Chassis layouts were tied to hostnames, making it hard to reuse templates.

Solution:

  • Separated chassis types from server hostnames
  • Created reusable layout generator functions
  • Introduced CHASSIS_TYPES and SERVER_MAPPINGS arrays
  • Renamed "medium2" → "compute-storage-01" for clarity
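
A minimal sketch of that decoupling, assuming the array names from the notes above (the entries and the generate_layout helper are illustrative, not the script's exact code):

```bash
#!/usr/bin/env bash
# Hostnames map to a chassis type; layouts are keyed by chassis type,
# so a template is written once and reused across servers.
declare -A SERVER_MAPPINGS=(
    [compute-storage-01]="10-bay"   # formerly "medium2"
    [large1]="large1"
)

generate_layout() {
    local chassis="$1"
    case "$chassis" in
        10-bay) echo "rendering 10-bay chassis art..." ;;
        large1) echo "rendering large1 chassis art..." ;;
        *)      echo "unknown chassis: $chassis" >&2; return 1 ;;
    esac
}

generate_layout "${SERVER_MAPPINGS[$(hostname -s)]:-unknown}"
```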

Phase 2: Hardware Discovery

Problem: Script referenced PCI controller 0c:00.0, which didn't exist.

Approach:

  1. Created diagnostic script to probe actual hardware
  2. Discovered real configuration:
    • LSI SAS3008 HBA at 01:00.0 (bays 5-10)
    • AMD SATA controller at 0d:00.0 (bays 1-4)
    • NVMe at 0e:00.0 (M.2 slot)
  3. User provided physical bay labels and visible serial numbers
  4. Iteratively refined PCI PHY to bay mappings
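
The diagnostic script itself isn't reproduced here; a minimal probe along these lines would surface the same information:

```bash
#!/usr/bin/env bash
# List storage controllers with their PCI addresses.
lspci | grep -Ei 'sas|sata|nvme|raid'

# Resolve each block device to its sysfs path; the controller's PCI
# address (and, for SAS devices, the phy) appears in the resolved path.
for dev in /sys/block/sd* /sys/block/nvme*n*; do
    [ -e "$dev" ] || continue
    printf '%-10s %s\n' "$(basename "$dev")" "$(readlink -f "$dev")"
done
```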

Key Insight: User confirmed bay 1 contained the SSD boot drive, which helped establish the correct mapping starting point.

Phase 3: Physical Verification

Problem: Needed to verify drive-to-bay mappings without powering down production server.

Solution:

  1. Added serial number display to script output
  2. User physically inspected visible serial numbers on drive bays
  3. Cross-referenced SMART serials with visible labels
  4. Corrected HBA PHY mappings:
    • Bay 5: phy6 (not phy2)
    • Bay 6: phy7 (not phy3)
    • Bay 7: phy5 (not phy4)
    • Bay 8: phy2 (not phy5)
    • Bay 9: phy4 (not phy6)
    • Bay 10: phy3 (not phy7)
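
The verified mapping reduces to a small lookup table; a sketch, with a hypothetical array name:

```bash
# Verified HBA phy -> bay lookup from the inspection above
# (array name illustrative, not necessarily the script's own).
declare -A HBA_PHY_TO_BAY=(
    [6]=5  [7]=6  [5]=7  [2]=8  [4]=9  [3]=10
)
echo "phy6 maps to bay ${HBA_PHY_TO_BAY[6]}"
```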

Phase 4: User Experience Improvements

ASCII Art Rendering:

  • Initial version had variable-width boxes that broke alignment
  • Fixed by using consistent 10-character-wide bay boxes
  • Multiple iterations to perfect right border alignment
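
A sketch of the fixed-width approach, with a hypothetical draw_bay helper: a 10-character box is one border character, eight padded label characters, and one border character, where %-8.8s both pads and truncates so no label can push the right border out of line.

```bash
# Each bay box is exactly 10 characters wide: |--8 chars--|
draw_bay() {
    printf '|%-8.8s|' "$1"
}
draw_bay "Bay 1"; draw_bay "Bay 10"; echo
```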

Drive Table Enhancements:

  • Original: Alphabetical by device name
  • Improved: Sorted by physical bay position (1-10)
  • Added BAY column to show physical location
  • Wider columns to prevent text wrapping

Phase 5: Ceph Integration

User Request: "Can we show ceph in/up out/down status in the table?"

Implementation:

  1. Added CEPH OSD column using ceph-volume lvm list
  2. Added STATUS column parsing ceph osd tree
  3. Initial bug: the first pass parsed the wrong fields for status and membership
  4. Fixed by understanding the ceph osd tree format:
    • Column 5: STATUS (up/down)
    • Column 6: REWEIGHT (1.0 = in, 0 = out)
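
A condensed sketch of the corrected lookup for a single OSD, assuming the stock ceph osd tree column layout noted above (variable names illustrative):

```bash
osd_num=3   # illustrative OSD id
line=$(ceph osd tree | grep -E "^\s*${osd_num}\s+")
status=$(echo "$line" | awk '{print $5}')      # up/down
reweight=$(echo "$line" | awk '{print $6}')    # 1.0 = in, 0 = out

# bc handles the decimal comparison bash can't do natively.
if [ "$(echo "$reweight > 0" | bc -l 2>/dev/null || echo 0)" = "1" ]; then
    membership="in"
else
    membership="out"
fi
echo "osd.${osd_num}: ${status}/${membership}"
```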

User Request: "Show which is the boot drive somehow?"

Solution:

  • Added USAGE column
  • Checks mount points
  • Shows "BOOT" for root filesystem
  • Shows mount point for other mounts
  • Shows "-" for Ceph OSDs (using LVM)

Technical Challenges Solved

1. Line Ending Issues

  • Problem: diagnose-drives.sh had CRLF endings → script failures
  • Solution: sed -i 's/\r$//' to convert to LF

2. PCI Path Pattern Matching

  • Problem: Bash regex escaping for grep patterns
  • Solution: grep -E "^\s*${osd_num}\s+" for reliable matching

3. Floating Point Comparison in Bash

  • Problem: Bash doesn't natively support decimal comparisons
  • Solution: Used bc -l with error handling: $(echo "$reweight > 0" | bc -l 2>/dev/null || echo 0)

4. Associative Array Sorting

  • Problem: Bash associative arrays don't maintain insertion order
  • Solution: Extract keys, filter numeric ones, pipe to sort -n
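
A sketch of that pattern, with illustrative data:

```bash
# Associative arrays iterate in arbitrary order, so sort the
# numeric keys explicitly before printing.
declare -A BAY_TO_DEV=( [10]=sdf [1]=sda [5]=sdb )
for bay in $(printf '%s\n' "${!BAY_TO_DEV[@]}" | grep -E '^[0-9]+$' | sort -n); do
    echo "Bay ${bay}: /dev/${BAY_TO_DEV[$bay]}"
done
```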

Key Learning Moments

  1. Hardware Reality vs. Assumptions: The original script assumed controller addresses that didn't exist. Always probe actual hardware.

  2. Physical Verification is Essential: Serial numbers visible on drive trays were crucial for verifying correct mappings.

  3. Iterative Refinement: The script went through 15+ commits, each improving a specific aspect based on user testing and feedback.

  4. User-Driven Feature Evolution: Features like Ceph integration and boot drive detection emerged organically from user needs.

Commits Timeline

  1. Initial refactoring and architecture improvements
  2. Fixed PCI path mappings based on discovered hardware
  3. Added serial numbers for physical verification
  4. Fixed ASCII art rendering issues
  5. Corrected bay mappings based on user verification
  6. Added bay-sorted output
  7. Implemented Ceph OSD tracking
  8. Added Ceph up/in status
  9. Added boot drive detection
  10. Fixed Ceph status parsing
  11. Documentation updates

Collaborative Techniques Used

Information Gathering

  • Asked clarifying questions about hardware configuration
  • Requested diagnostic command output
  • Had user physically verify drive locations

Iterative Development

  • Made small, testable changes
  • User tested after each significant change
  • Incorporated feedback immediately

Problem-Solving Approach

  1. Understand current state
  2. Identify specific issues
  3. Propose solution
  4. Implement incrementally
  5. Test and verify
  6. Refine based on feedback

Metrics

  • Lines of Code: ~330 (main script)
  • Supported Chassis Types: 4 (10-bay, large1, micro, spare)
  • Mapped Servers: 1 fully (compute-storage-01), 3 pending
  • Features Added: 10+
  • Bugs Fixed: 6 major, multiple minor
  • Documentation: Comprehensive README + this file

Future Enhancements

Potential improvements identified during development:

  1. Auto-detection: Attempt to auto-map bays by generating drive activity (e.g., hdparm read tests) to blink bay LEDs
  2. Color Output: Use terminal colors for health status (green/red)
  3. Historical Tracking: Log temperature trends over time
  4. Alert Integration: Notify when drive health deteriorates
  5. Web Interface: Display chassis map in a web dashboard
  6. Multi-server View: Show all servers in one consolidated view

Lessons for Future AI-Assisted Development

What Worked Well

  • Breaking complex problems into small, testable pieces
  • Using diagnostic scripts to understand actual vs. assumed state
  • Physical verification before trusting software output
  • Comprehensive documentation alongside code
  • Git commits with detailed messages for traceability

What Could Be Improved

  • Earlier physical verification would have saved several iterations
  • More upfront hardware documentation would help
  • Automated testing for bay mappings (if possible)

Conclusion

This project demonstrates effective human-AI collaboration where:

  • The AI provided technical implementation and problem-solving
  • The human provided domain knowledge, testing, and verification
  • Iterative feedback loops led to a polished, production-ready tool

The result is a robust infrastructure management tool that provides instant visibility into complex storage configurations across multiple servers.


Development Credits:

  • Human Developer: LotusGuild
  • AI Assistant: Claude Sonnet 4.5 (Anthropic)
  • Development Date: January 6, 2026
  • Project: Drive Atlas v1.0