Created README

2024-12-05 21:26:01 -05:00
parent d1d41106bd
commit 790c40ec0d
2 changed files with 85 additions and 1 deletions

84
README.md Normal file

@@ -0,0 +1,84 @@
# System Health Monitoring Daemon
A robust system health monitoring daemon that tracks hardware status and automatically creates tickets for detected issues.
## Features
- Comprehensive system health monitoring:
  - Drive health (SMART status and disk usage)
  - Memory usage
  - CPU utilization
  - Network connectivity (Management and Ceph networks)
- Automatic ticket creation for detected issues
- Configurable thresholds and monitoring parameters
- Dry-run mode for testing
- Systemd integration for automated daily checks
## Installation
1. Copy the service and timer files to systemd:
```bash
sudo cp hwmon.service /etc/systemd/system/
sudo cp hwmon.timer /etc/systemd/system/
```
2. Reload the systemd daemon:
```bash
sudo systemctl daemon-reload
```
3. Enable and start the timer:
```bash
sudo systemctl enable hwmon.timer
sudo systemctl start hwmon.timer
```
## Manual Execution
1. Run the daemon in dry-run mode to test it:
```bash
python3 hwmonDaemon.py --dry-run
```
2. Run the daemon normally:
```bash
python3 hwmonDaemon.py
```
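The `--dry-run` flag could be wired up roughly as follows. This is a minimal sketch, not the actual contents of `hwmonDaemon.py`: it assumes that dry-run mode runs the checks but skips ticket submission, and `build_parser` and `submit_ticket` are hypothetical names introduced for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser exposing the --dry-run flag used above."""
    parser = argparse.ArgumentParser(description="System health monitoring daemon")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="run all checks but only report the tickets that would be created",
    )
    return parser


def submit_ticket(ticket: dict, dry_run: bool) -> str:
    """Hypothetical helper: in dry-run mode, describe the ticket instead of sending it."""
    if dry_run:
        return f"[dry-run] would create ticket: {ticket['title']}"
    # Real submission to the ticketing system is outside the scope of this sketch.
    raise NotImplementedError("ticket submission is not part of this sketch")


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--dry-run"])
print(submit_ticket({"title": "example issue"}, args.dry_run))
```

Keeping the dry-run decision in one place means every check can build its ticket the same way, whether or not anything is actually sent.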
## Configuration
The daemon monitors:
- Disk usage (warns at 80%, critical at 90%)
- Memory usage (warns at 80%)
- CPU usage (warns at 80%)
- Network connectivity to management (10.10.10.1) and Ceph (10.10.90.1) networks
- SMART status of physical drives
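The usage thresholds above can be expressed as a small classification function. This is a sketch under stated assumptions: the constant names and `classify` are hypothetical, and it assumes a value exactly equal to a threshold already triggers it, which the README does not specify.

```python
from typing import Optional

# Thresholds taken from the list above, as percentages.
DISK_WARNING = 80.0
DISK_CRITICAL = 90.0
MEMORY_WARNING = 80.0
CPU_WARNING = 80.0


def classify(percent: float, warning: float, critical: Optional[float] = None) -> str:
    """Return "ok", "warning", or "critical" for a usage percentage.

    Assumption: a value equal to a threshold counts as reaching it.
    """
    if critical is not None and percent >= critical:
        return "critical"
    if percent >= warning:
        return "warning"
    return "ok"


print(classify(85.0, DISK_WARNING, DISK_CRITICAL))  # warning
print(classify(93.5, DISK_WARNING, DISK_CRITICAL))  # critical
print(classify(42.0, MEMORY_WARNING))               # ok
```

Memory and CPU only define a warning level, so `critical` is optional and those checks never report anything above "warning".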
## Ticket Creation
The daemon automatically creates tickets with:
- Standardized titles including hostname, hardware type, and scope
- Detailed descriptions of detected issues
- Priority levels based on severity (P2-P4)
- Proper categorization and status tracking
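A ticket with these fields might be assembled as in the sketch below. The title layout, the severity names, and the severity-to-priority mapping are assumptions made for illustration; the README only states that titles include hostname, hardware type, and scope, and that priorities range from P2 to P4. `build_ticket` is a hypothetical helper.

```python
import socket
from typing import Optional

# Assumed mapping; the README only gives the P2-P4 range.
PRIORITY_BY_SEVERITY = {"critical": "P2", "warning": "P3", "info": "P4"}


def build_ticket(
    hardware_type: str,
    scope: str,
    severity: str,
    description: str,
    hostname: Optional[str] = None,
) -> dict:
    """Assemble a ticket payload with a standardized title and a priority."""
    host = hostname or socket.gethostname()
    return {
        # Hypothetical title layout: [hostname][hardware type][scope] summary
        "title": f"[{host}][{hardware_type}][{scope}] {severity} issue detected",
        "description": description,
        "priority": PRIORITY_BY_SEVERITY[severity],
    }


ticket = build_ticket(
    "disk", "single-node", "critical", "Disk usage on / is at 93%", hostname="node1"
)
print(ticket["title"])
print(ticket["priority"])
```

Deriving the priority from a lookup table keeps the severity policy in one place, so changing which severity maps to which priority does not touch the checks themselves.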
## Dependencies
- Python 3
- Required Python packages:
  - psutil
  - requests
- smartmontools (system package providing `smartctl`, for SMART disk monitoring)
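Before enabling the timer, it can help to confirm that these dependencies are present. The sketch below is one possible pre-flight check, not part of the daemon; `find_missing` is a hypothetical helper. It uses `smartctl` as the binary to look for because that is the command-line tool shipped by smartmontools.

```python
import importlib.util
import shutil
from typing import List


def find_missing(modules: List[str], binaries: List[str]) -> List[str]:
    """Return the names of Python modules and executables that cannot be found."""
    # find_spec returns None when a top-level module is not importable.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    # shutil.which returns None when the executable is not on PATH.
    missing += [name for name in binaries if shutil.which(name) is None]
    return missing


# Prints an empty list when everything is installed.
print(find_missing(["psutil", "requests"], ["smartctl"]))
```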
## Service Configuration
The daemon runs:
- Daily via systemd timer
- As root user for hardware access
- With automatic restart on failure
## Security Note
The service downloads and executes code from a specified URL, and it runs as root. Restrict who can modify the content served at that URL and limit network access to its host.
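One possible mitigation is to pin a SHA-256 digest of the expected code and refuse to run anything that does not match. The README does not say the daemon does this; the sketch below is an assumption-laden illustration, and `matches_pinned_digest` is a hypothetical helper.

```python
import hashlib
import hmac


def matches_pinned_digest(payload: bytes, expected_sha256: str) -> bool:
    """Return True only if the payload's SHA-256 hex digest equals the pinned value."""
    actual = hashlib.sha256(payload).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_sha256.lower())


# Stand-in for downloaded content; a real run would use the fetched bytes.
payload = b"print('health check')\n"
pinned = hashlib.sha256(payload).hexdigest()

print(matches_pinned_digest(payload, pinned))
print(matches_pinned_digest(b"tampered content", pinned))
```

The pinned digest has to be stored and distributed separately from the download URL, otherwise anyone who can change the code can change the digest too.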