feat(monitoring): add edge-triggered threshold handling with group action orchestration and HA-aware Proxmox shutdowns

This commit is contained in:
2026-04-16 02:54:16 +00:00
parent bf4d519428
commit a435bd6fed
13 changed files with 1052 additions and 117 deletions
+33 -8
View File
@@ -12,12 +12,12 @@ For reporting bugs, issues, or security vulnerabilities, please visit [community
- **🔌 Multi-UPS Support** — Monitor multiple UPS devices from a single daemon
- **📡 Dual Protocol Support** — SNMP (v1/v2c/v3) for network UPS + UPSD/NIS for USB-connected UPS via NUT
- **🖥️ Proxmox Integration** — Gracefully shut down QEMU VMs and LXC containers before host shutdown (auto-detects CLI tools — no API token needed on Proxmox hosts)
- **🖥️ Proxmox Integration** — Gracefully shut down QEMU VMs and LXC containers before host shutdown, with optional HA-aware stop requests for HA-managed guests
- **👥 Group Management** — Organize UPS devices into groups with flexible operating modes
- **Redundant Mode** — Only trigger actions when ALL UPS devices in a group are critical
- **Non-Redundant Mode** — Trigger actions when ANY UPS device is critical
- **⚙️ Action System** — Define custom responses with flexible trigger conditions
- Battery & runtime threshold triggers
- Edge-triggered battery & runtime threshold triggers
- Power status change triggers
- Webhook notifications (POST/GET)
- Custom shell scripts
@@ -255,6 +255,7 @@ their own `shutdownDelay`.
"triggerMode": "onlyThresholds",
"thresholds": { "battery": 30, "runtime": 15 },
"proxmoxMode": "auto",
"proxmoxHaPolicy": "haStop",
"proxmoxExcludeIds": [],
"proxmoxForceStop": true
},
@@ -360,6 +361,10 @@ For USB-connected UPS via [NUT (Network UPS Tools)](https://networkupstools.org/
Actions define automated responses to UPS conditions. They run **sequentially in array order**, so place Proxmox actions before shutdown actions.
Threshold-based actions are **edge-triggered**: they fire when the monitored UPS or group **enters** a threshold violation, not on every polling cycle while the threshold remains violated. If the condition clears and later re-enters, the action can fire again.
Shutdown and Proxmox actions also suppress duplicate runs where possible, so overlapping UPS and group actions do not repeatedly schedule the same host or guest shutdown workflow.
#### Action Types
| Type | Description |
@@ -382,8 +387,8 @@ Actions define automated responses to UPS conditions. They run **sequentially in
| Mode | Description |
| ----------------------------- | -------------------------------------------------------- |
| `onlyPowerChanges` | Only when power status changes (online ↔ onBattery) |
| `onlyThresholds` | Only when battery or runtime thresholds are violated |
| `powerChangesAndThresholds` | On power changes OR threshold violations (default) |
| `onlyThresholds` | Only when battery or runtime thresholds are newly violated |
| `powerChangesAndThresholds` | On power changes OR when thresholds are newly violated (default) |
| `anyChange` | On every polling cycle |
#### Shutdown Action
@@ -441,6 +446,8 @@ Actions define automated responses to UPS conditions. They run **sequentially in
Gracefully shuts down QEMU VMs and LXC containers on a Proxmox node before the host is shut down.
If you use Proxmox HA, NUPST can optionally request `state=stopped` for HA-managed guests instead of only issuing direct `qm` / `pct` shutdown commands.
NUPST supports **two operation modes** for Proxmox:
| Mode | Description | Requirements |
@@ -459,6 +466,7 @@ NUPST supports **two operation modes** for Proxmox:
"thresholds": { "battery": 30, "runtime": 15 },
"triggerMode": "onlyThresholds",
"proxmoxMode": "auto",
"proxmoxHaPolicy": "haStop",
"proxmoxExcludeIds": [100, 101],
"proxmoxStopTimeout": 120,
"proxmoxForceStop": true
@@ -473,6 +481,7 @@ NUPST supports **two operation modes** for Proxmox:
"thresholds": { "battery": 30, "runtime": 15 },
"triggerMode": "onlyThresholds",
"proxmoxMode": "api",
"proxmoxHaPolicy": "haStop",
"proxmoxHost": "localhost",
"proxmoxPort": 8006,
"proxmoxTokenId": "root@pam!nupst",
@@ -487,6 +496,7 @@ NUPST supports **two operation modes** for Proxmox:
| Field | Description | Default |
| --------------------- | ----------------------------------------------- | ------------- |
| `proxmoxMode` | Operation mode | `auto` |
| `proxmoxHaPolicy` | HA handling for HA-managed guests | `none`, `haStop` (`none` default) |
| `proxmoxHost` | Proxmox API host (API mode only) | `localhost` |
| `proxmoxPort` | Proxmox API port (API mode only) | `8006` |
| `proxmoxNode` | Proxmox node name | Auto-detect via hostname |
@@ -504,11 +514,20 @@ NUPST supports **two operation modes** for Proxmox:
pveum user token add root@pam nupst --privsep=0
```
**HA Policy values:**
- **`none`** — Treat HA-managed and non-HA guests the same. NUPST sends normal guest shutdown commands.
- **`haStop`** — For HA-managed guests, NUPST requests HA resource state `stopped`. Non-HA guests still use normal shutdown commands.
> ⚠️ **Important:** Place the Proxmox action **before** the shutdown action in the actions array so VMs are stopped before the host shuts down.
### Group Configuration
Groups coordinate actions across multiple UPS devices:
Groups coordinate actions across multiple UPS devices.
Group actions are evaluated **after all UPS devices have been refreshed for a polling cycle**.
There is **no aggregate battery math** across the group. Instead, each group action evaluates each member UPS against that action's own thresholds.
| Field | Description | Values |
| ------------- | ---------------------------------- | -------------------- |
@@ -520,8 +539,10 @@ Groups coordinate actions across multiple UPS devices:
**Group Modes:**
- **`redundant`** — Actions trigger only when ALL UPS devices in the group are critical. Use for setups with backup power units.
- **`nonRedundant`** — Actions trigger when ANY UPS device is critical. Use when all UPS units must be operational.
- **`redundant`** — A threshold-based action triggers only when **all** UPS devices in the group are on battery and below that action's thresholds. Use for setups with backup power units.
- **`nonRedundant`** — A threshold-based action triggers when **any** UPS device in the group is on battery and below that action's thresholds. Use when all UPS units must be operational.
For threshold-based **destructive** group actions (`shutdown` and `proxmox`), NUPST suppresses execution while any group member is `unreachable`. This prevents acting on partial data during network failures.
### HTTP Server Configuration
@@ -597,6 +618,7 @@ NUPST tracks communication failures per UPS device:
- After **3 consecutive failures**, the UPS status transitions to `unreachable`
- **Shutdown actions will NOT fire** on `unreachable` — this prevents false shutdowns from network glitches
- Webhook and script actions still fire, allowing you to send alerts
- Threshold-based destructive **group** actions are also suppressed while any required group member is `unreachable`
- When connectivity is restored, NUPST logs a recovery event with downtime duration
- The failure counter is capped at 100 to prevent overflow
@@ -613,7 +635,7 @@ UPS Devices (2):
✓ Main Server UPS (online - 100%, 3840min)
Host: 192.168.1.100:161 (SNMP)
Groups: Data Center
Action: proxmox (onlyThresholds: battery<30%, runtime<15min)
Action: proxmox (onlyThresholds: battery<30%, runtime<15min, ha=stop)
Action: shutdown (onlyThresholds: battery<20%, runtime<10min, delay=10min)
✓ Local USB UPS (online - 95%, 2400min)
@@ -784,6 +806,9 @@ curl -k -H "Authorization: PVEAPIToken=root@pam!nupst=YOUR-SECRET" \
# Check token permissions
pveum user token list root@pam
# If using proxmoxHaPolicy: haStop
ha-manager config
```
### Actions Not Triggering