Created README
84
README.md
Normal file
@@ -0,0 +1,84 @@
# System Health Monitoring Daemon

A robust system health monitoring daemon that tracks hardware status and automatically creates tickets for detected issues.

## Features

- Comprehensive system health monitoring:
  - Drive health (SMART status and disk usage)
  - Memory usage
  - CPU utilization
  - Network connectivity (Management and Ceph networks)
- Automatic ticket creation for detected issues
- Configurable thresholds and monitoring parameters
- Dry-run mode for testing
- Systemd integration for automated daily checks

## Installation

1. Copy the service and timer files to the systemd unit directory:
```bash
sudo cp hwmon.service /etc/systemd/system/
sudo cp hwmon.timer /etc/systemd/system/
```
2. Reload the systemd daemon:
```bash
sudo systemctl daemon-reload
```
3. Enable and start the timer:
```bash
sudo systemctl enable hwmon.timer
sudo systemctl start hwmon.timer
```

## Manual Execution

1. Run the daemon in dry-run mode to test:
```bash
python3 hwmonDaemon.py --dry-run
```
2. Run the daemon normally:
```bash
python3 hwmonDaemon.py
```

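The `--dry-run` flag above could be wired up roughly as follows. This is a minimal sketch, not the code in `hwmonDaemon.py`: the function names `parse_args` and `handle_issue` are hypothetical, and it assumes that dry-run mode reports issues without creating tickets, which the README implies but does not spell out.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; --dry-run is the only flag the README documents."""
    parser = argparse.ArgumentParser(description="System health monitoring daemon")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="report detected issues without creating tickets (assumed behavior)",
    )
    return parser.parse_args(argv)


def handle_issue(issue, dry_run):
    """Describe the action for one detected issue (hypothetical helper)."""
    if dry_run:
        return f"[dry-run] would create ticket: {issue}"
    # A real run would call the ticketing API here; omitted in this sketch.
    return f"create ticket: {issue}"
```

Calling `parse_args(["--dry-run"])` yields a namespace whose `dry_run` attribute is `True`; passing that value to `handle_issue` keeps the ticketing call out of test runs.
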
## Configuration

The daemon monitors:

- Disk usage (warns at 80%, critical at 90%)
- Memory usage (warns at 80%)
- CPU usage (warns at 80%)
- Network connectivity to management (10.10.10.1) and Ceph (10.10.90.1) networks
- SMART status of physical drives

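The usage thresholds listed above can be expressed as a small lookup plus a classification function. This is an illustrative sketch only: the `THRESHOLDS` layout and the `classify` name are assumptions, and treating a value exactly at the threshold as a breach (`>=`) is a guess, since the README does not say whether the comparison is inclusive.

```python
# Threshold values come from the README; the dictionary layout is an assumption.
THRESHOLDS = {
    "disk": {"warning": 80.0, "critical": 90.0},
    "memory": {"warning": 80.0},
    "cpu": {"warning": 80.0},
}


def classify(metric, percent):
    """Return 'critical', 'warning', or 'ok' for a usage percentage."""
    levels = THRESHOLDS[metric]
    # Only disk usage has a critical level in the README.
    if "critical" in levels and percent >= levels["critical"]:
        return "critical"
    if percent >= levels["warning"]:
        return "warning"
    return "ok"
```

The percentages themselves could come from psutil, for example `psutil.disk_usage("/").percent`, `psutil.virtual_memory().percent`, and `psutil.cpu_percent(interval=1)`.
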
## Ticket Creation

The daemon automatically creates tickets with:

- Standardized titles including hostname, hardware type, and scope
- Detailed descriptions of detected issues
- Priority levels based on severity (P2-P4)
- Proper categorization and status tracking

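A ticket with the fields described above might be assembled like this. Everything specific here is hypothetical: the README gives only the P2-P4 range, so the severity-to-priority mapping, the bracketed title format, the field names, and the `category` and `status` values are assumptions made for illustration.

```python
import socket

# Hypothetical mapping; the README only states that priorities span P2-P4.
PRIORITY_BY_SEVERITY = {"critical": "P2", "warning": "P3", "info": "P4"}


def build_ticket(hardware_type, scope, severity, description, hostname=None):
    """Assemble a ticket payload as a plain dict (field names are assumptions)."""
    hostname = hostname or socket.gethostname()
    return {
        # Title carries hostname, hardware type, and scope, as the README describes.
        "title": f"[{hostname}][{hardware_type}][{scope}] {description}",
        "description": description,
        "priority": PRIORITY_BY_SEVERITY[severity],
        "category": "Hardware",  # assumed category label
        "status": "Open",  # assumed initial status
    }
```

A payload like this could then be sent to the ticketing system with `requests.post`, against whatever endpoint the deployment uses.
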
## Dependencies

- Python 3
- Required Python packages:
  - psutil
  - requests
- smartmontools (for SMART disk monitoring)

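Before installing the service, the dependencies above can be checked with a short script. This is a sketch under the assumption that smartmontools is detected through its `smartctl` executable on `PATH`; the function name `missing_dependencies` is made up for this example.

```python
import importlib.util
import shutil


def missing_dependencies(modules=("psutil", "requests"), binaries=("smartctl",)):
    """Return the names of Python modules and executables that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when the executable is not on PATH.
    missing += [b for b in binaries if shutil.which(b) is None]
    return missing
```

An empty list from `missing_dependencies()` means both Python packages and `smartctl` were found.
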
## Service Configuration

The daemon runs:

- Daily via systemd timer
- As the root user, for hardware access
- With automatic restart on failure

## Security Note

The service downloads and executes code from a specified URL. Make sure that URL points to a trusted source and that proper network security measures are in place.
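
One possible safeguard, which this README does not claim the service implements, is to compare the downloaded code against a pinned SHA-256 digest before running it. The sketch below assumes the payload is already in memory as bytes; `matches_pinned_digest` is a hypothetical helper name.

```python
import hashlib
import hmac


def matches_pinned_digest(payload: bytes, expected_sha256: str) -> bool:
    """Return True only if the payload's SHA-256 hex digest equals the pinned value."""
    actual = hashlib.sha256(payload).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(actual, expected_sha256.lower())
```

A caller would refuse to execute the payload whenever this returns `False`.
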
@@ -10,4 +10,4 @@ User=root
Group=root

[Install]
WantedBy=multi-user.target
WantedBy=multi-user.target