feat(monitoring): add edge-triggered threshold handling with group action orchestration and HA-aware Proxmox shutdowns
@@ -12,12 +12,12 @@ For reporting bugs, issues, or security vulnerabilities, please visit [community

 - **🔌 Multi-UPS Support** — Monitor multiple UPS devices from a single daemon
 - **📡 Dual Protocol Support** — SNMP (v1/v2c/v3) for network UPS + UPSD/NIS for USB-connected UPS via NUT
-- **🖥️ Proxmox Integration** — Gracefully shut down QEMU VMs and LXC containers before host shutdown (auto-detects CLI tools — no API token needed on Proxmox hosts)
+- **🖥️ Proxmox Integration** — Gracefully shut down QEMU VMs and LXC containers before host shutdown, with optional HA-aware stop requests for HA-managed guests
 - **👥 Group Management** — Organize UPS devices into groups with flexible operating modes
 - **Redundant Mode** — Only trigger actions when ALL UPS devices in a group are critical
 - **Non-Redundant Mode** — Trigger actions when ANY UPS device is critical
 - **⚙️ Action System** — Define custom responses with flexible trigger conditions
-- Battery & runtime threshold triggers
+- Edge-triggered battery & runtime threshold triggers
 - Power status change triggers
 - Webhook notifications (POST/GET)
 - Custom shell scripts
@@ -255,6 +255,7 @@ their own `shutdownDelay`.
 "triggerMode": "onlyThresholds",
 "thresholds": { "battery": 30, "runtime": 15 },
 "proxmoxMode": "auto",
+"proxmoxHaPolicy": "haStop",
 "proxmoxExcludeIds": [],
 "proxmoxForceStop": true
 },
@@ -360,6 +361,10 @@ For USB-connected UPS via [NUT (Network UPS Tools)](https://networkupstools.org/

 Actions define automated responses to UPS conditions. They run **sequentially in array order**, so place Proxmox actions before shutdown actions.

+Threshold-based actions are **edge-triggered**: they fire when the monitored UPS or group **enters** a threshold violation, not on every polling cycle while the threshold remains violated. If the condition clears and later re-enters, the action can fire again.
+
+Shutdown and Proxmox actions also suppress duplicate runs where possible, so overlapping UPS and group actions do not repeatedly schedule the same host or guest shutdown workflow.
+
 #### Action Types

 | Type | Description |
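The edge-triggered behaviour described in the hunk above can be illustrated with a small sketch. This is not NUPST's implementation: `Thresholds`, `violates`, and `EdgeTrigger` are hypothetical names, and treating a reading as violated when battery or runtime falls below its limit is an assumption drawn from the README wording.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    battery: float  # percent (hypothetical field, mirrors "thresholds.battery")
    runtime: float  # minutes (hypothetical field, mirrors "thresholds.runtime")


def violates(battery: float, runtime: float, limits: Thresholds) -> bool:
    """Assumption: a reading is in violation when either value is below its limit."""
    return battery < limits.battery or runtime < limits.runtime


class EdgeTrigger:
    """Reports True only on the polling cycle where the violation begins."""

    def __init__(self) -> None:
        self._was_violated = False

    def update(self, violated: bool) -> bool:
        # Fire on the False -> True transition only; stay quiet while the
        # violation persists, and re-arm once the condition clears.
        fire = violated and not self._was_violated
        self._was_violated = violated
        return fire
```

Feeding one `update()` call per polling cycle yields a single firing when the threshold is first crossed and another only after the condition has cleared and been re-entered.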
@@ -382,8 +387,8 @@ Actions define automated responses to UPS conditions. They run **sequentially in
 | Mode | Description |
 | ----------------------------- | -------------------------------------------------------- |
 | `onlyPowerChanges` | Only when power status changes (online ↔ onBattery) |
-| `onlyThresholds` | Only when battery or runtime thresholds are violated |
-| `powerChangesAndThresholds` | On power changes OR threshold violations (default) |
+| `onlyThresholds` | Only when battery or runtime thresholds are newly violated |
+| `powerChangesAndThresholds` | On power changes OR when thresholds are newly violated (default) |
 | `anyChange` | On every polling cycle |

 #### Shutdown Action
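As a reading aid for the trigger mode table in the hunk above, the four modes can be expressed as one decision function. This is a sketch under assumptions: `should_fire` and its two boolean inputs are hypothetical, with `threshold_entered` standing for "thresholds are newly violated" on this polling cycle.

```python
def should_fire(mode: str, power_changed: bool, threshold_entered: bool) -> bool:
    """Decide whether an action runs this polling cycle (illustrative only)."""
    if mode == "onlyPowerChanges":
        return power_changed
    if mode == "onlyThresholds":
        return threshold_entered
    if mode == "powerChangesAndThresholds":
        return power_changed or threshold_entered
    if mode == "anyChange":
        # The table describes this mode as firing on every polling cycle.
        return True
    raise ValueError(f"unknown triggerMode: {mode!r}")
```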
@@ -441,6 +446,8 @@ Actions define automated responses to UPS conditions. They run **sequentially in

 Gracefully shuts down QEMU VMs and LXC containers on a Proxmox node before the host is shut down.

+If you use Proxmox HA, NUPST can optionally request `state=stopped` for HA-managed guests instead of only issuing direct `qm` / `pct` shutdown commands.
+
 NUPST supports **two operation modes** for Proxmox:

 | Mode | Description | Requirements |
@@ -459,6 +466,7 @@ NUPST supports **two operation modes** for Proxmox:
 "thresholds": { "battery": 30, "runtime": 15 },
 "triggerMode": "onlyThresholds",
 "proxmoxMode": "auto",
+"proxmoxHaPolicy": "haStop",
 "proxmoxExcludeIds": [100, 101],
 "proxmoxStopTimeout": 120,
 "proxmoxForceStop": true
@@ -473,6 +481,7 @@ NUPST supports **two operation modes** for Proxmox:
 "thresholds": { "battery": 30, "runtime": 15 },
 "triggerMode": "onlyThresholds",
 "proxmoxMode": "api",
+"proxmoxHaPolicy": "haStop",
 "proxmoxHost": "localhost",
 "proxmoxPort": 8006,
 "proxmoxTokenId": "root@pam!nupst",
@@ -487,6 +496,7 @@ NUPST supports **two operation modes** for Proxmox:
 | Field | Description | Default |
 | --------------------- | ----------------------------------------------- | ------------- |
 | `proxmoxMode` | Operation mode | `auto` |
+| `proxmoxHaPolicy` | HA handling for HA-managed guests | `none`, `haStop` (`none` default) |
 | `proxmoxHost` | Proxmox API host (API mode only) | `localhost` |
 | `proxmoxPort` | Proxmox API port (API mode only) | `8006` |
 | `proxmoxNode` | Proxmox node name | Auto-detect via hostname |
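To make the effect of `proxmoxHaPolicy` concrete, here is a sketch of how a per-guest stop command might be chosen in CLI mode. It is an assumption about the behaviour, not NUPST's code: `stop_command` and its parameters are hypothetical, and only the general shape of the Proxmox commands (`qm shutdown`, `pct shutdown`, `ha-manager set <sid> --state stopped`) is taken from the Proxmox tooling.

```python
from typing import List


def stop_command(kind: str, vmid: int, ha_managed: bool, policy: str) -> List[str]:
    """Return the argv that would stop one guest (illustrative only).

    kind is "qemu" or "lxc"; policy is "none" or "haStop".
    """
    if kind not in ("qemu", "lxc"):
        raise ValueError(f"unknown guest kind: {kind!r}")
    if policy not in ("none", "haStop"):
        raise ValueError(f"unknown proxmoxHaPolicy: {policy!r}")

    if policy == "haStop" and ha_managed:
        # Ask the HA stack to stop the resource so it does not restart the guest.
        sid = f"{'vm' if kind == 'qemu' else 'ct'}:{vmid}"
        return ["ha-manager", "set", sid, "--state", "stopped"]

    # Non-HA guests, or policy "none": plain guest shutdown.
    tool = "qm" if kind == "qemu" else "pct"
    return [tool, "shutdown", str(vmid)]
```

With policy `none` the `ha_managed` flag is ignored, which matches the README's description that HA-managed and non-HA guests are then treated the same.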
@@ -504,11 +514,20 @@ NUPST supports **two operation modes** for Proxmox:
 pveum user token add root@pam nupst --privsep=0
 ```

+**HA Policy values:**
+
+- **`none`** — Treat HA-managed and non-HA guests the same. NUPST sends normal guest shutdown commands.
+- **`haStop`** — For HA-managed guests, NUPST requests HA resource state `stopped`. Non-HA guests still use normal shutdown commands.
+
 > ⚠️ **Important:** Place the Proxmox action **before** the shutdown action in the actions array so VMs are stopped before the host shuts down.

 ### Group Configuration

-Groups coordinate actions across multiple UPS devices:
+Groups coordinate actions across multiple UPS devices.
+
+Group actions are evaluated **after all UPS devices have been refreshed for a polling cycle**.
+
+There is **no aggregate battery math** across the group. Instead, each group action evaluates each member UPS against that action's own thresholds.

 | Field | Description | Values |
 | ------------- | ---------------------------------- | -------------------- |
@@ -520,8 +539,10 @@ Groups coordinate actions across multiple UPS devices:

 **Group Modes:**

-- **`redundant`** — Actions trigger only when ALL UPS devices in the group are critical. Use for setups with backup power units.
-- **`nonRedundant`** — Actions trigger when ANY UPS device is critical. Use when all UPS units must be operational.
+- **`redundant`** — A threshold-based action triggers only when **all** UPS devices in the group are on battery and below that action's thresholds. Use for setups with backup power units.
+- **`nonRedundant`** — A threshold-based action triggers when **any** UPS device in the group is on battery and below that action's thresholds. Use when all UPS units must be operational.
+
+For threshold-based **destructive** group actions (`shutdown` and `proxmox`), NUPST suppresses execution while any group member is `unreachable`. This prevents acting on partial data during network failures.

 ### HTTP Server Configuration
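The group rules added in the hunk above (per-member evaluation, `redundant` vs `nonRedundant`, and suppression of destructive actions while a member is `unreachable`) can be sketched as follows. The names `UpsState`, `member_critical`, and `group_violated` are hypothetical, and "below that action's thresholds" is assumed to mean battery or runtime under its limit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UpsState:
    power_status: str  # "online", "onBattery", or "unreachable"
    battery: float     # percent
    runtime: float     # minutes


def member_critical(ups: UpsState, battery_limit: float, runtime_limit: float) -> bool:
    """A member counts only if it is on battery and below a threshold."""
    below = ups.battery < battery_limit or ups.runtime < runtime_limit
    return ups.power_status == "onBattery" and below


def group_violated(mode: str, members: List[UpsState], battery_limit: float,
                   runtime_limit: float, destructive: bool) -> bool:
    """Evaluate one threshold-based group action; no aggregate battery math."""
    if destructive and any(m.power_status == "unreachable" for m in members):
        # Partial data: hold back shutdown/proxmox actions.
        return False
    flags = [member_critical(m, battery_limit, runtime_limit) for m in members]
    if mode == "redundant":
        return bool(flags) and all(flags)
    if mode == "nonRedundant":
        return any(flags)
    raise ValueError(f"unknown group mode: {mode!r}")
```

The boolean result would then feed an edge detector, so the group action fires when the group enters the violated state instead of on every cycle.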
@@ -597,6 +618,7 @@ NUPST tracks communication failures per UPS device:
 - After **3 consecutive failures**, the UPS status transitions to `unreachable`
 - **Shutdown actions will NOT fire** on `unreachable` — this prevents false shutdowns from network glitches
 - Webhook and script actions still fire, allowing you to send alerts
+- Threshold-based destructive **group** actions are also suppressed while any required group member is `unreachable`
 - When connectivity is restored, NUPST logs a recovery event with downtime duration
 - The failure counter is capped at 100 to prevent overflow
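The failure-tracking rules listed in the hunk above can be sketched as a small counter. `FailureTracker` is a hypothetical name; the limits 3 and 100 come from the README text, while resetting the counter on a successful poll is an assumption.

```python
class FailureTracker:
    """Per-UPS communication failure counter (illustrative only)."""

    UNREACHABLE_AFTER = 3  # consecutive failures before "unreachable"
    MAX_COUNT = 100        # cap stated in the README

    def __init__(self) -> None:
        self.count = 0

    def record_failure(self) -> None:
        # Saturate at the cap instead of growing without bound.
        self.count = min(self.count + 1, self.MAX_COUNT)

    def record_success(self) -> None:
        # Assumption: any successful poll clears the consecutive-failure count.
        self.count = 0

    @property
    def unreachable(self) -> bool:
        return self.count >= self.UNREACHABLE_AFTER
```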
@@ -613,7 +635,7 @@ UPS Devices (2):
 ✓ Main Server UPS (online - 100%, 3840min)
 Host: 192.168.1.100:161 (SNMP)
 Groups: Data Center
-Action: proxmox (onlyThresholds: battery<30%, runtime<15min)
+Action: proxmox (onlyThresholds: battery<30%, runtime<15min, ha=stop)
 Action: shutdown (onlyThresholds: battery<20%, runtime<10min, delay=10min)

 ✓ Local USB UPS (online - 95%, 2400min)
@@ -784,6 +806,9 @@ curl -k -H "Authorization: PVEAPIToken=root@pam!nupst=YOUR-SECRET" \

 # Check token permissions
 pveum user token list root@pam
+
+# If using proxmoxHaPolicy: haStop
+ha-manager config
 ```

 ### Actions Not Triggering