fix(metrics): fix metrics

Juergen Kunz
2025-06-23 13:07:30 +00:00
parent cc9e76fade
commit caa15e539e
3 changed files with 110 additions and 42 deletions

@ -2,6 +2,20 @@
## Byte Tracking and Metrics
### Throughput Drift Issue (Fixed)
**Problem**: Throughput numbers were gradually increasing over time for long-lived connections.
**Root Cause**: The `byRoute()` and `byIP()` methods were dividing cumulative total bytes (since connection start) by the window duration, causing rates to appear higher as connections aged:
- Hour 1: 1GB total / 60s ≈ 17 MB/s ✓
- Hour 2: 2GB total / 60s ≈ 33 MB/s ✗ (appears doubled!)
- Hour 3: 3GB total / 60s = 50 MB/s ✗ (keeps rising!)
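The drift can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: `cumulativeRate` and `windowedRate` are hypothetical helper names, not functions from this codebase, and decimal units (1 GB = 10^9 bytes) are assumed.

```typescript
// Hypothetical helpers for illustration; the real byRoute()/byIP() internals
// are not shown in this commit. Decimal units are assumed.
const WINDOW_SECONDS = 60;
const MB = 1_000_000;
const GB = 1_000_000_000;

// Drifting calculation: lifetime total divided by the window length.
function cumulativeRate(totalBytes: number, windowSeconds: number): number {
  return totalBytes / windowSeconds;
}

// Windowed calculation: only the bytes that moved inside the window count.
function windowedRate(
  bytesAtWindowStart: number,
  bytesAtWindowEnd: number,
  windowSeconds: number,
): number {
  return (bytesAtWindowEnd - bytesAtWindowStart) / windowSeconds;
}

// A connection whose lifetime total grew from 1 GB to 2 GB during the last window.
const driftingMBps = cumulativeRate(2 * GB, WINDOW_SECONDS) / MB;
const windowedMBps = windowedRate(1 * GB, 2 * GB, WINDOW_SECONDS) / MB;
```

With the same traffic in the window, the cumulative form reports roughly twice the windowed figure, and the gap widens as the lifetime total grows.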
**Solution**: Implemented snapshot-based byte tracking that calculates actual bytes transferred within each time window:
- Store periodic snapshots of byte counts with timestamps
- Calculate delta between window start and end snapshots
- Divide delta by window duration for accurate throughput
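A minimal sketch of the snapshot approach described above, under stated assumptions: the class name `SnapshotByteTracker`, its fields, and the pruning policy are hypothetical and are not taken from the actual implementation. One deliberate deviation is labeled in the code: the delta is divided by the time actually spanned by the two snapshots, which equals the window duration whenever a snapshot exists at the window start.

```typescript
// Hypothetical types and names; a sketch, not the project's implementation.
interface ByteSnapshot {
  timestamp: number; // milliseconds
  bytesIn: number;   // cumulative bytes received at this timestamp
  bytesOut: number;  // cumulative bytes sent at this timestamp
}

class SnapshotByteTracker {
  private snapshots: ByteSnapshot[] = [];
  private readonly maxAgeMs: number;

  constructor(maxAgeMs: number = 5 * 60 * 1000) {
    this.maxAgeMs = maxAgeMs;
  }

  // Store a periodic snapshot of the cumulative counters.
  record(timestamp: number, bytesIn: number, bytesOut: number): void {
    this.snapshots.push({ timestamp, bytesIn, bytesOut });
    // Drop snapshots that can no longer serve as a window baseline,
    // always keeping one at or before the cutoff.
    const cutoff = timestamp - this.maxAgeMs;
    while (this.snapshots.length > 1 && this.snapshots[1].timestamp <= cutoff) {
      this.snapshots.shift();
    }
  }

  // Bytes per second over the window ending at `now`.
  rate(now: number, windowMs: number): { inBps: number; outBps: number } {
    if (this.snapshots.length < 2) return { inBps: 0, outBps: 0 };
    const windowStart = now - windowMs;
    const end = this.snapshots[this.snapshots.length - 1];
    // Baseline: the latest snapshot at or before the window start,
    // falling back to the oldest snapshot for young connections.
    let start = this.snapshots[0];
    for (const s of this.snapshots) {
      if (s.timestamp <= windowStart) start = s;
      else break;
    }
    // Assumption: divide by the span the two snapshots actually cover.
    const elapsedSeconds = (end.timestamp - start.timestamp) / 1000;
    if (elapsedSeconds <= 0) return { inBps: 0, outBps: 0 };
    return {
      inBps: (end.bytesIn - start.bytesIn) / elapsedSeconds,
      outBps: (end.bytesOut - start.bytesOut) / elapsedSeconds,
    };
  }
}
```

Because the rate depends only on the difference between two snapshots, a connection's age no longer influences the result.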
### What Gets Counted (Network Interface Throughput)
Byte tracking is designed to match network interface throughput, i.e. the figures that UniFi and other network monitoring tools report:
@ -41,10 +55,13 @@ The byte tracking is designed to match network interface throughput (what Unifi/
The metrics system has three layers:
1. **Connection Records** (`record.bytesReceived/bytesSent`): Track total bytes per connection
-2. **ThroughputTracker**: Accumulates bytes between samples for rate calculations (bytes/second)
-3. **connectionByteTrackers**: Track bytes per connection with timestamps for per-route/IP metrics
+2. **ThroughputTracker**: Accumulates bytes between samples for global rate calculations (resets each second)
+3. **connectionByteTrackers**: Track bytes per connection with snapshots for accurate windowed per-route/IP metrics
Total byte counts come from connection records only, preventing double counting.
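To illustrate why a single source of truth avoids double counting, here is a small sketch that derives totals purely from connection records. The `bytesReceived` and `bytesSent` field names come from the text above; the `ConnectionRecordLike` interface and the `totalBytes` helper are assumptions made for this example.

```typescript
// Hypothetical shape; only bytesReceived/bytesSent are named in the source text.
interface ConnectionRecordLike {
  bytesReceived: number;
  bytesSent: number;
}

// Totals are summed from the records alone. The rate trackers are never
// consulted here, so bytes they also observed are not added a second time.
function totalBytes(records: Iterable<ConnectionRecordLike>): { received: number; sent: number } {
  let received = 0;
  let sent = 0;
  for (const record of Array.from(records)) {
    received += record.bytesReceived;
    sent += record.bytesSent;
  }
  return { received, sent };
}
```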
Key features:
- Global throughput uses sampling with accumulator reset (accurate)
- Per-route/IP throughput uses snapshots to calculate window-specific deltas (accurate)
- All byte counting happens exactly once at the data flow point
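The sampling-with-reset scheme used for global throughput can be sketched as follows. This is an assumption-laden illustration: `SampledThroughput`, `addBytes`, `takeSample`, and `averageRate` are hypothetical names, and the sketch assumes `takeSample()` is invoked once per second by some external timer.

```typescript
// Hypothetical class for illustration; not the project's ThroughputTracker.
class SampledThroughput {
  private accumulatedBytes = 0;
  private samples: number[] = []; // one entry per sample interval (assumed 1 s)
  private readonly maxSamples: number;

  constructor(maxSamples: number = 60) {
    this.maxSamples = maxSamples;
  }

  // Called at the single data flow point where bytes are counted.
  addBytes(count: number): void {
    this.accumulatedBytes += count;
  }

  // Assumed to run once per second: store the interval's bytes, then reset
  // the accumulator so the next sample starts from zero.
  takeSample(): void {
    this.samples.push(this.accumulatedBytes);
    this.accumulatedBytes = 0;
    if (this.samples.length > this.maxSamples) this.samples.shift();
  }

  // Average bytes per second over the most recent `windowSeconds` samples.
  averageRate(windowSeconds: number): number {
    if (windowSeconds <= 0) return 0;
    const recent = this.samples.slice(-windowSeconds);
    if (recent.length === 0) return 0;
    const total = recent.reduce((sum, bytes) => sum + bytes, 0);
    return total / recent.length;
  }
}
```

Since the accumulator is cleared at every sample, each sample holds only the bytes from its own interval, so the average cannot grow with process uptime.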
### Understanding "High" Byte Counts