fix(metrics): fix metrics

This commit is contained in:
Juergen Kunz
2025-06-23 15:42:04 +00:00
parent 4587940f38
commit 3857d2670f
2 changed files with 49 additions and 14 deletions

View File

@ -142,4 +142,45 @@ Keep-alive connections receive special treatment based on `keepAliveTreatment` s
The system supports both receiving and sending PROXY protocol:
- **Receiving**: Automatically detected from trusted proxy IPs (configured in `proxyIPs`)
- **Sending**: Enabled per-route or globally via `sendProxyProtocol` setting
- Real client IP is preserved and used for all connection tracking and security checks
- Real client IP is preserved and used for all connection tracking and security checks
## Metrics and Throughput Calculation
The metrics system tracks throughput using per-second sampling:
1. **Byte Recording**: Bytes are recorded as data flows through connections
2. **Sampling**: Every second, accumulated bytes are stored as a sample
3. **Rate Calculation**: Throughput is calculated by summing bytes over a time window
4. **Per-Route/IP Tracking**: Separate ThroughputTracker instances for each route and IP
Key implementation details:
- Bytes are recorded in the bidirectional forwarding callbacks
- The instant() method returns throughput over the last 1 second
- The recent() method returns throughput over the last 10 seconds
- Custom windows can be specified for different averaging periods
### Throughput Spikes Issue
There's a fundamental difference between application-layer and network-layer throughput:
**Application Layer (what we measure)**:
- Bytes are recorded when delivered to/from the application
- Large chunks can arrive "instantly" due to kernel/Node.js buffering
- Shows spikes when buffers are flushed (e.g., 20MB in 1 second = 160 Mbit/s)
**Network Layer (what Unifi shows)**:
- Actual packet flow through the network interface
- Limited by physical network speed (e.g., 20 Mbit/s)
- Data transfers over time, not in bursts
The spikes occur because:
1. Data flows over network at 20 Mbit/s (takes 8 seconds for 20MB)
2. Kernel/Node.js buffers this incoming data
3. When buffer is flushed, application receives large chunk at once
4. We record entire chunk in current second, creating artificial spike
**Potential Solutions**:
1. Use longer window for "instant" measurements (e.g., 5 seconds instead of 1)
2. Track socket write backpressure to estimate actual network flow
3. Implement bandwidth estimation based on connection duration
4. Accept that application-layer != network-layer throughput