fix accumulation

Juergen Kunz
2025-06-08 12:25:31 +00:00
parent 82a350bf51
commit dc3eda5e29
3 changed files with 153 additions and 4 deletions

@@ -673,4 +673,52 @@ if (!record.connectionClosed && record.outgoing && record.bytesReceived > 0 && r
- Connection is older than 60 seconds
- Both sockets are still alive (not destroyed)
This complements the zombie detection by handling cases where sockets remain technically alive but the connection is effectively dead.
## 🚨 CRITICAL FIX: Cleanup Queue Bug (January 2025)
### Critical Bug Found
The cleanup queue had a severe bug that caused connection accumulation when more than 100 connections needed cleanup:
```typescript
// BUG: This cleared the ENTIRE queue after processing only the first batch!
const toCleanup = Array.from(this.cleanupQueue).slice(0, this.cleanupBatchSize);
this.cleanupQueue.clear(); // ❌ This discarded all connections beyond the first 100!
```
### Fix Implemented
```typescript
// Now only removes the connections being processed
const toCleanup = Array.from(this.cleanupQueue).slice(0, this.cleanupBatchSize);
for (const connectionId of toCleanup) {
  this.cleanupQueue.delete(connectionId); // ✅ Only remove what we process
  const record = this.connectionRecords.get(connectionId);
  if (record) {
    this.cleanupConnection(record, record.incomingTerminationReason || 'normal');
  }
}
```
### Impact
- **Before**: If 150 connections needed cleanup, only the first 100 were processed. The remaining 50 were dropped from the queue without being cleaned up, so they accumulated.
- **After**: Each pass removes only the batch it processes, so every queued connection is eventually cleaned up.
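The before/after behaviour can be reproduced in isolation. The sketch below is a minimal stand-alone model, not the project's actual class: `drainBuggy`, `drainFixed`, `makeQueue`, and the `conn-N` IDs are hypothetical names, and the batch size of 100 is assumed from the description above.

```typescript
// Hypothetical stand-alone model of the cleanup queue; not the real class.
const BATCH_SIZE = 100; // assumed from "the first 100" in the description

// Old behaviour: take one batch, then clear the whole queue.
function drainBuggy(queue: Set<string>, cleaned: string[]): void {
  const batch = Array.from(queue).slice(0, BATCH_SIZE);
  queue.clear(); // also drops IDs that were never processed
  cleaned.push(...batch);
}

// Fixed behaviour: delete only the IDs that are in the current batch.
function drainFixed(queue: Set<string>, cleaned: string[]): void {
  const batch = Array.from(queue).slice(0, BATCH_SIZE);
  for (const id of batch) {
    queue.delete(id);
    cleaned.push(id);
  }
}

// Builds a queue holding n distinct connection IDs.
function makeQueue(n: number): Set<string> {
  return new Set(Array.from({ length: n }, (_, i) => `conn-${i}`));
}

const buggyQueue = makeQueue(150);
const buggyCleaned: string[] = [];
drainBuggy(buggyQueue, buggyCleaned);
// 100 IDs cleaned, queue empty: the other 50 are lost.

const fixedQueue = makeQueue(150);
const fixedCleaned: string[] = [];
drainFixed(fixedQueue, fixedCleaned); // first pass: 100 cleaned, 50 still queued
drainFixed(fixedQueue, fixedCleaned); // second pass: remaining 50 cleaned
```

Note that the fixed version relies on a later pass to pick up the leftover entries. How that pass is scheduled in the real code is not shown in this section.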
### Additional Improvements
1. **Faster Inactivity Checks**: Reduced from 30s to 10s intervals
   - Zombies and stuck connections are detected 3x faster
   - Reduces the window for accumulation
2. **Duplicate Prevention**: Added check in `queueCleanup` to prevent processing already-closed connections
   - Prevents unnecessary work
   - Ensures connections are only cleaned up once
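The duplicate-prevention check could look roughly like the sketch below. This is an illustration under assumptions, not the actual `queueCleanup` implementation: the `CleanupQueueSketch` class, its `track` and `pending` members, the boolean return value, and the choice to also skip unknown IDs are all hypothetical. Only the `connectionClosed` flag and the `queueCleanup` name come from the source.

```typescript
// Hypothetical minimal record; the real record has many more fields.
interface ConnectionRecord {
  id: string;
  connectionClosed: boolean;
}

class CleanupQueueSketch {
  private readonly cleanupQueue = new Set<string>();
  private readonly records = new Map<string, ConnectionRecord>();

  // Registers a record so queueCleanup can look it up (illustrative helper).
  track(record: ConnectionRecord): void {
    this.records.set(record.id, record);
  }

  // Returns true if the ID is queued after the call, false if it was skipped.
  queueCleanup(connectionId: string): boolean {
    const record = this.records.get(connectionId);
    if (!record || record.connectionClosed) {
      return false; // unknown (assumption) or already closed: nothing to do
    }
    this.cleanupQueue.add(connectionId); // a Set ignores repeated adds
    return true;
  }

  // Number of connections currently waiting for cleanup.
  get pending(): number {
    return this.cleanupQueue.size;
  }
}

const sketch = new CleanupQueueSketch();
sketch.track({ id: "a", connectionClosed: false });
sketch.track({ id: "b", connectionClosed: true });

const firstA = sketch.queueCleanup("a");  // queued
const secondA = sketch.queueCleanup("a"); // already in the Set, no duplicate entry
const closedB = sketch.queueCleanup("b"); // skipped: already closed
```

Using a `Set` for the queue already deduplicates repeated requests for the same ID; the closed-flag check covers the separate case of a connection that finished cleanup before being queued again.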
### Summary of All Fixes
1. **Connection Timeout** (already documented) - Prevents accumulation when backends are unreachable
2. **Zombie Detection** - Cleans up connections with destroyed sockets
3. **Stuck Connection Detection** - Cleans up connections to hanging backends
4. **Cleanup Queue Bug** - Ensures ALL connections get cleaned up, not just the first 100
5. **Faster Detection** - Reduced check interval from 30s to 10s
These fixes combined should prevent connection accumulation in all known scenarios.