# AI-Assisted Development Notes
This document chronicles the development of Drive Atlas with assistance from Claude (Anthropic's AI assistant).

## Project Overview

Drive Atlas started as a simple bash script with hardcoded drive mappings and evolved into a comprehensive storage infrastructure management tool through iterative development and user feedback.

## Development Session

**Date:** January 6, 2026
**AI Model:** Claude Sonnet 4.5
**Developer:** LotusGuild
**Session Duration:** ~2 hours

## Initial State

The project began with:

- Basic ASCII art layouts for different server chassis
- Hardcoded drive mappings for the "medium2" server
- A simple SMART data display
- Broken PCI path mappings referencing non-existent hardware
- Windows (CRLF) line endings causing script execution failures

## Evolution Through Collaboration

### Phase 1: Architecture Refactoring

**Problem:** Chassis layouts were tied to hostnames, making it hard to reuse templates.

**Solution:**

- Separated chassis types from server hostnames
- Created reusable layout generator functions
- Introduced `CHASSIS_TYPES` and `SERVER_MAPPINGS` arrays
- Renamed "medium2" → "compute-storage-01" for clarity

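The decoupling can be sketched with two bash associative arrays. `CHASSIS_TYPES` and `SERVER_MAPPINGS` are the names from the list above; the chassis keys and layout function names here are illustrative, not taken from the actual script:

```shell
#!/usr/bin/env bash
# Chassis layouts are keyed by chassis type, not hostname,
# so one template serves any number of servers.
declare -A CHASSIS_TYPES=(
    [10bay]="draw_10bay_layout"    # illustrative layout generator names
    [micro]="draw_micro_layout"
)

# Each server simply points at a chassis type.
declare -A SERVER_MAPPINGS=(
    [compute-storage-01]="10bay"
)

# Resolve hostname -> chassis type -> layout generator function.
chassis_for() { echo "${SERVER_MAPPINGS[$1]:-unknown}"; }
layout_fn()   { echo "${CHASSIS_TYPES[$(chassis_for "$1")]:-}"; }

layout_fn compute-storage-01    # -> draw_10bay_layout
```

With this split, adding a server is a one-line entry in `SERVER_MAPPINGS`, and a new chassis shape is a new layout function plus one `CHASSIS_TYPES` entry.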
### Phase 2: Hardware Discovery

**Problem:** The script referenced PCI controller `0c:00.0`, which didn't exist on the machine.

**Approach:**

1. Created a diagnostic script to probe the actual hardware
2. Discovered the real configuration:
   - LSI SAS3008 HBA at `01:00.0` (bays 5-10)
   - AMD SATA controller at `0d:00.0` (bays 1-4)
   - NVMe at `0e:00.0` (M.2 slot)
3. User provided physical bay labels and visible serial numbers
4. Iteratively refined the PCI PHY-to-bay mappings

**Key Insight:** The user confirmed that bay 1 contained the SSD boot drive, which established the correct starting point for the mapping.

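The probing in step 1 can be done entirely from sysfs: each `/sys/block/<dev>` symlink resolves to a device path that embeds the PCI address of the owning controller. A minimal sketch of extracting that address (the sample path below is illustrative, not captured from the real server):

```shell
#!/usr/bin/env bash
# A resolved /sys/block/<dev> path embeds every PCI function on the way
# to the disk; the last BB:DD.F segment is the disk's own controller.
controller_of() {
    echo "$1" | grep -oE '[0-9a-f]{2}:[0-9a-f]{2}\.[0-9]' | tail -n1
}

# Illustrative sysfs path for a disk behind an HBA at 01:00.0:
path="/sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/host0/port-0:6/end_device-0:6/target0:0:6/0:0:6:0/block/sdb"
controller_of "$path"    # -> 01:00.0
```

Running this for every entry in `/sys/block` groups the disks by controller, which is how the `01:00.0` / `0d:00.0` / `0e:00.0` split above can be confirmed without vendor tools.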
### Phase 3: Physical Verification

**Problem:** Needed to verify drive-to-bay mappings without powering down a production server.

**Solution:**

1. Added serial number display to the script output
2. User physically inspected the visible serial numbers on the drive bays
3. Cross-referenced SMART serials with the visible labels
4. Corrected the HBA PHY mappings:
   - Bay 5: phy6 (not phy2)
   - Bay 6: phy7 (not phy3)
   - Bay 7: phy5 (not phy4)
   - Bay 8: phy2 (not phy5)
   - Bay 9: phy4 (not phy6)
   - Bay 10: phy3 (not phy7)

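The corrected mapping can be captured as a single lookup table, so any future fix lives in one place. This sketch uses the verified values from the list above (the array name is illustrative):

```shell
#!/usr/bin/env bash
# Verified HBA PHY -> bay mapping for compute-storage-01 (bays 5-10,
# behind the LSI SAS3008 at 01:00.0). Keys are the SAS PHY numbers.
declare -A HBA_PHY_TO_BAY=(
    [phy6]=5
    [phy7]=6
    [phy5]=7
    [phy2]=8
    [phy4]=9
    [phy3]=10
)

echo "phy4 is bay ${HBA_PHY_TO_BAY[phy4]}"    # -> phy4 is bay 9
```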
### Phase 4: User Experience Improvements

**ASCII Art Rendering:**

- Initial version had variable-width boxes that broke alignment
- Fixed by using consistent 10-character-wide bay boxes
- Multiple iterations to perfect the right-border alignment

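The fixed-width fix amounts to printing every bay cell with the same `printf` width; a sketch of one row of 10-character cells (the `row` helper and labels are illustrative, not the script's actual renderer):

```shell
#!/usr/bin/env bash
# Render one row of 10-character-wide bay boxes so borders always
# line up regardless of label length: two border chars + %-8s padding.
row() {
    local top="" mid="" bot="" label
    for label in "$@"; do
        top+="+--------+"
        mid+=$(printf '|%-8s|' "$label")
        bot+="+--------+"
    done
    printf '%s\n%s\n%s\n' "$top" "$mid" "$bot"
}

row "BAY 1" "BAY 2" "BAY 3"
```

Because every cell is exactly 10 characters, the right border of the chassis box stays aligned no matter what the bay contains.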
**Drive Table Enhancements:**

- Original: sorted alphabetically by device name
- Improved: sorted by physical bay position (1-10)
- Added a BAY column to show physical location
- Widened columns to prevent text wrapping

### Phase 5: Ceph Integration

**User Request:** "Can we show ceph in/up out/down status in the table?"

**Implementation:**

1. Added a CEPH OSD column using `ceph-volume lvm list`
2. Added a STATUS column parsing `ceph osd tree`
3. Initial bug: read status and reweight from the wrong columns
4. Fixed by matching the actual `ceph osd tree` column layout:
   - Column 5: STATUS (up/down)
   - Column 6: REWEIGHT (1.0 = in, 0 = out)

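The corrected parsing can be sketched against sample `ceph osd tree` output. The sample lines below are illustrative rather than captured from the real cluster, and reuse the grep pattern described later under Technical Challenges:

```shell
#!/usr/bin/env bash
# Extract STATUS (column 5) and REWEIGHT (column 6) for one OSD from
# `ceph osd tree`-style output. Sample output for illustration only.
tree_output='ID CLASS WEIGHT  TYPE NAME  STATUS REWEIGHT PRI-AFF
-1       7.27737 root default
 0  hdd  3.63869     osd.0      up     1.00000 1.00000
 1  hdd  3.63869     osd.1      down   0       1.00000'

osd_status() {
    local osd_num=$1
    # Anchored pattern so osd 1 does not also match the -1 root row.
    echo "$tree_output" | grep -E "^\s*${osd_num}\s+" | awk '{print $5, $6}'
}

osd_status 1    # -> down 0
```

The reweight value then drives the in/out flag (`1.0` = in, `0` = out), compared via `bc` since bash cannot compare decimals directly.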
**User Request:** "Show which is the boot drive somehow?"

**Solution:** Added a USAGE column that:

- Checks mount points
- Shows "BOOT" for the root filesystem
- Shows the mount point for other mounts
- Shows "-" for Ceph OSDs (which sit on LVM rather than regular mounts)

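The USAGE logic reduces to a small classifier over a drive's mount point. A sketch (the function name is illustrative; it takes the mount point as a string, e.g. from `lsblk -no MOUNTPOINT`, so it can be exercised without real drives):

```shell
#!/usr/bin/env bash
# Classify a drive for the USAGE column from its mount point and
# whether it backs a Ceph OSD.
usage_label() {
    local mountpoint=$1 is_ceph_osd=$2
    if [ "$mountpoint" = "/" ]; then
        echo "BOOT"             # the root filesystem is the boot drive
    elif [ -n "$mountpoint" ]; then
        echo "$mountpoint"      # any other mounted filesystem
    else
        echo "-"                # unmounted, including Ceph OSDs on LVM
    fi
}

usage_label "/" no              # -> BOOT
usage_label "/mnt/scratch" no   # -> /mnt/scratch
usage_label "" yes              # -> -
```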
## Technical Challenges Solved

### 1. Line Ending Issues

- **Problem:** `diagnose-drives.sh` had CRLF line endings, causing script execution failures
- **Solution:** `sed -i 's/\r$//'` to convert them to LF

### 2. PCI Path Pattern Matching

- **Problem:** Bash regex escaping for grep patterns
- **Solution:** `grep -E "^\s*${osd_num}\s+"` for reliable matching

### 3. Floating Point Comparison in Bash

- **Problem:** Bash doesn't natively support decimal comparisons
- **Solution:** Used `bc -l` with error handling: `$(echo "$reweight > 0" | bc -l 2>/dev/null || echo 0)`

### 4. Associative Array Sorting

- **Problem:** Bash associative arrays don't maintain insertion order
- **Solution:** Extract the keys, filter the numeric ones, and pipe them to `sort -n`

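The sorting trick from challenge 4 in runnable form: extract the keys, keep only the numeric ones, and let `sort -n` impose bay order (the array contents here are illustrative):

```shell
#!/usr/bin/env bash
# Bash associative arrays return keys in arbitrary order, so iterate
# over a numerically sorted key list instead.
declare -A BAY_DRIVES=( [10]=sdg [2]=sdb [1]=sda [5]=sdd )

sorted_bays() {
    printf '%s\n' "${!BAY_DRIVES[@]}" | grep -E '^[0-9]+$' | sort -n
}

for bay in $(sorted_bays); do
    echo "bay $bay: ${BAY_DRIVES[$bay]}"
done
```

Plain `sort` would put bay 10 before bay 2; `-n` is what makes the table come out in physical bay order 1-10.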
## Key Learning Moments

1. **Hardware Reality vs. Assumptions:** The original script assumed controller addresses that didn't exist. Always probe the actual hardware.
2. **Physical Verification Is Essential:** Serial numbers visible on the drive trays were crucial for verifying correct mappings.
3. **Iterative Refinement:** The script went through 15+ commits, each improving a specific aspect based on user testing and feedback.
4. **User-Driven Feature Evolution:** Features like Ceph integration and boot drive detection emerged organically from user needs.

## Commits Timeline

1. Initial refactoring and architecture improvements
2. Fixed PCI path mappings based on discovered hardware
3. Added serial numbers for physical verification
4. Fixed ASCII art rendering issues
5. Corrected bay mappings based on user verification
6. Added bay-sorted output
7. Implemented Ceph OSD tracking
8. Added Ceph up/in status
9. Added boot drive detection
10. Fixed Ceph status parsing
11. Documentation updates

## Collaborative Techniques Used

### Information Gathering

- Asked clarifying questions about the hardware configuration
- Requested diagnostic command output
- Had the user physically verify drive locations

### Iterative Development

- Made small, testable changes
- User tested after each significant change
- Incorporated feedback immediately

### Problem-Solving Approach

1. Understand the current state
2. Identify specific issues
3. Propose a solution
4. Implement incrementally
5. Test and verify
6. Refine based on feedback

## Metrics

- **Lines of Code:** ~330 (main script)
- **Supported Chassis Types:** 4 (10-bay, large1, micro, spare)
- **Mapped Servers:** 1 fully (compute-storage-01), 3 pending
- **Features Added:** 10+
- **Bugs Fixed:** 6 major, multiple minor
- **Documentation:** Comprehensive README plus this file

## Future Enhancements

Potential improvements identified during development:

1. **Auto-detection:** Attempt to auto-map bays by testing with `hdparm` LED control
2. **Color Output:** Use terminal colors for health status (green/red)
3. **Historical Tracking:** Log temperature trends over time
4. **Alert Integration:** Notify when drive health deteriorates
5. **Web Interface:** Display the chassis map in a web dashboard
6. **Multi-server View:** Show all servers in one consolidated view

## Lessons for Future AI-Assisted Development

### What Worked Well

- Breaking complex problems into small, testable pieces
- Using diagnostic scripts to understand actual vs. assumed state
- Physical verification before trusting software output
- Comprehensive documentation alongside the code
- Git commits with detailed messages for traceability

### What Could Be Improved

- Earlier physical verification would have saved iteration
- More upfront hardware documentation would have helped
- Automated testing for bay mappings (if feasible)

## Conclusion

This project demonstrates effective human-AI collaboration in which:

- The AI provided technical implementation and problem-solving
- The human provided domain knowledge, testing, and verification
- Iterative feedback loops led to a polished, production-ready tool

The result is a robust infrastructure management tool that provides instant visibility into complex storage configurations across multiple servers.

---

**Development Credits:**

- **Human Developer:** LotusGuild
- **AI Assistant:** Claude Sonnet 4.5 (Anthropic)
- **Development Date:** January 6, 2026
- **Project:** Drive Atlas v1.0