fix(readme): update

2025-06-13 17:22:31 +00:00
parent 7e1b7b190c
commit 424407d879
14 changed files with 85 additions and 4804 deletions
--- a/readme.connections.md
+++ b/readme.connections.md
@@ -1,724 +0,0 @@
-# Connection Management in SmartProxy
-
-This document describes connection handling, cleanup mechanisms, and known issues in SmartProxy, particularly focusing on proxy chain configurations.
-
-## Connection Accumulation Investigation (January 2025)
-
-### Problem Statement
-Connections may accumulate on the outer proxy in proxy chain configurations, despite implemented fixes.
-
-### Historical Context
-- **v19.5.12-v19.5.15**: Major connection cleanup improvements
-- **v19.5.19+**: PROXY protocol support with WrappedSocket implementation
-- **v19.5.20**: Fixed race condition in immediate routing cleanup
-
-### Current Architecture
-
-#### Connection Flow in Proxy Chains
-```
-Client → Outer Proxy (8001) → Inner Proxy (8002) → Backend (httpbin.org:443)
-```
-
-1. **Outer Proxy**:
-   - Accepts client connection
-   - Sends PROXY protocol header to inner proxy
-   - Tracks connection in ConnectionManager
-   - Immediate routing for non-TLS ports
-
-2. **Inner Proxy**:
-   - Parses PROXY protocol to get real client IP
-   - Establishes connection to backend
-   - Tracks its own connections separately
-
-### Potential Causes of Connection Accumulation
-
-#### 1. Race Condition in Immediate Routing
-When a connection is immediately routed (non-TLS ports), there's a timing window:
-```typescript
-// route-connection-handler.ts, line ~231
-this.routeConnection(socket, record, '', undefined);
-// Connection is routed before all setup is complete
-```
-
-**Issue**: If the client disconnects during backend connection setup, cleanup may not trigger properly.
-
-#### 2. Outgoing Socket Assignment Timing
-Despite the fix in v19.5.20:
-```typescript
-// Line 1362 in setupDirectConnection
-record.outgoing = targetSocket;
-```
-There's still a window between socket creation and the `connect` event where cleanup might miss the outgoing socket.
-
-#### 3. Batch Cleanup Delays
-ConnectionManager uses queued cleanup:
-- Batch size: 100 connections
-- Batch interval: 100ms
-- Under rapid connection/disconnection, queue might lag
-
-#### 4. Different Cleanup Paths
-Multiple cleanup triggers exist:
-- Socket 'close' event
-- Socket 'error' event
-- Inactivity timeout
-- Connection timeout
-- Manual cleanup
-
-Not all paths may properly handle proxy chain scenarios.
-
-#### 5. Keep-Alive Connection Handling
-Keep-alive connections have special treatment:
-- Extended inactivity timeout (6x normal)
-- Warning before closure
-- May accumulate if backend is unresponsive
-
-### Observed Symptoms
-
-1. **Outer proxy connection count grows over time**
-2. **Inner proxy maintains zero or low connection count**
-3. **Connections show as closed in logs but remain in tracking**
-4. **Memory usage gradually increases**
-
-### Debug Strategies
-
-#### 1. Enhanced Logging
-Add connection state logging at key points:
-```typescript
-// When outgoing socket is created
-logger.log('debug', `Outgoing socket created for ${connectionId}`, {
-  hasOutgoing: !!record.outgoing,
-  outgoingState: record.outgoing?.readyState
-});
-```
-
-#### 2. Connection State Inspection
-Periodically log detailed connection state:
-```typescript
-for (const [id, record] of connectionManager.getConnections()) {
-  console.log({
-    id,
-    age: Date.now() - record.incomingStartTime,
-    incomingDestroyed: record.incoming.destroyed,
-    outgoingDestroyed: record.outgoing?.destroyed,
-    hasCleanupTimer: !!record.cleanupTimer
-  });
-}
-```
-
-#### 3. Cleanup Verification
-Track cleanup completion:
-```typescript
-// In cleanupConnection
-logger.log('debug', `Cleanup completed for ${record.id}`, {
-  recordsRemaining: this.connectionRecords.size
-});
-```
-
-### Recommendations
-
-1. **Immediate Cleanup for Proxy Chains**
-   - Skip batch queue for proxy chain connections
-   - Use synchronous cleanup when PROXY protocol is detected
-
-2. **Socket State Validation**
-   - Check both `destroyed` and `readyState` before cleanup decisions
-   - Handle 'opening' state sockets explicitly
-
-3. **Timeout Adjustments**
-   - Shorter timeouts for proxy chain connections
-   - More aggressive cleanup for connections without data transfer
-
-4. **Connection Limits**
-   - Per-route connection limits
-   - Backpressure when approaching limits
-
-5. **Monitoring**
-   - Export connection metrics
-   - Alert on connection count thresholds
-   - Track connection age distribution
-
-### Test Scenarios to Reproduce
-
-1. **Rapid Connect/Disconnect**
-   ```bash
-   # Create many short-lived connections
-   for i in {1..1000}; do
-     (echo -n | nc localhost 8001) &
-   done
-   ```
-
-2. **Slow Backend**
-   - Configure inner proxy to connect to unresponsive backend
-   - Monitor outer proxy connection count
-
-3. **Mixed Traffic**
-   - Combine TLS and non-TLS connections
-   - Add keep-alive connections
-   - Observe accumulation patterns
-
-### Future Improvements
-
-1. **Connection Pool Isolation**
-   - Separate pools for proxy chain vs direct connections
-   - Different cleanup strategies per pool
-
-2. **Circuit Breaker**
-   - Detect accumulation and trigger aggressive cleanup
-   - Temporarily refuse new connections when near limit
-
-3. **Connection State Machine**
-   - Explicit states: CONNECTING, ESTABLISHED, CLOSING, CLOSED
-   - State transition validation
-   - Timeout per state
-
-4. **Metrics Collection**
-   - Connection lifecycle events
-   - Cleanup success/failure rates
-   - Time spent in each state
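The connection state machine proposed above could be sketched roughly as follows. This is a minimal illustration, not SmartProxy code: the class name `ConnectionStateMachine`, the transition table, and the injectable clock are all assumptions; only the four state names come from the list.

```typescript
// Hypothetical sketch: state names are from the list above, everything else is assumed.
type ConnState = 'CONNECTING' | 'ESTABLISHED' | 'CLOSING' | 'CLOSED';

// Assumed transition table: a connection may only move "forward" toward CLOSED.
const allowedTransitions: Record<ConnState, readonly ConnState[]> = {
  CONNECTING: ['ESTABLISHED', 'CLOSING', 'CLOSED'],
  ESTABLISHED: ['CLOSING', 'CLOSED'],
  CLOSING: ['CLOSED'],
  CLOSED: [],
};

class ConnectionStateMachine {
  private state: ConnState = 'CONNECTING';
  private enteredAt: number;
  private now: () => number;

  // The clock is injectable so that per-state timeouts can be tested deterministically.
  constructor(now: () => number = Date.now) {
    this.now = now;
    this.enteredAt = now();
  }

  get current(): ConnState {
    return this.state;
  }

  // Returns false (and leaves the state unchanged) for an invalid transition.
  transition(next: ConnState): boolean {
    if (!allowedTransitions[this.state].includes(next)) return false;
    this.state = next;
    this.enteredAt = this.now();
    return true;
  }

  // Milliseconds spent in the current state; compare against a per-state timeout.
  timeInState(): number {
    return this.now() - this.enteredAt;
  }
}
```

A caller could poll `timeInState()` during the periodic inactivity check and force a `CLOSING` transition when a per-state limit is exceeded, which would also cover sockets stuck in the connect phase.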
-
-### Root Cause Identified (January 2025)
-
-**The primary issue is on the inner proxy when backends are unreachable:**
-
-When the backend is unreachable (e.g., non-routable IP like 10.255.255.1):
-1. The outgoing socket gets stuck in "opening" state indefinitely
-2. The `createSocketWithErrorHandler` in socket-utils.ts doesn't implement a connection timeout
-3. `socket.setTimeout()` only handles inactivity AFTER connection, not during connect phase
-4. Connections accumulate because they never transition to error state
-5. Socket timeout warnings fire but connections are preserved as keep-alive
-
-**Code Issue:**
-```typescript
-// socket-utils.ts line 275
-if (timeout) {
-  socket.setTimeout(timeout);  // This only handles inactivity, not connection!
-}
-```
-
-**Required Fix:**
-
-1. Add `connectionTimeout` to ISmartProxyOptions interface:
-```typescript
-// In interfaces.ts
-connectionTimeout?: number; // Timeout for establishing connection (ms), default: 30000 (30s)
-```
-
-2. Update `createSocketWithErrorHandler` in socket-utils.ts:
-```typescript
-export function createSocketWithErrorHandler(options: SafeSocketOptions): plugins.net.Socket {
-  const { port, host, onError, onConnect, timeout } = options;
-  
-  const socket = new plugins.net.Socket();
-  let connected = false;
-  let connectionTimeout: NodeJS.Timeout | null = null;
-  
-  socket.on('error', (error) => {
-    if (connectionTimeout) {
-      clearTimeout(connectionTimeout);
-      connectionTimeout = null;
-    }
-    if (onError) onError(error);
-  });
-  
-  socket.on('connect', () => {
-    connected = true;
-    if (connectionTimeout) {
-      clearTimeout(connectionTimeout);
-      connectionTimeout = null;
-    }
-    if (timeout) socket.setTimeout(timeout); // Set inactivity timeout
-    if (onConnect) onConnect();
-  });
-  
-  // Implement connection establishment timeout
-  if (timeout) {
-    connectionTimeout = setTimeout(() => {
-      if (!connected && !socket.destroyed) {
-        const error = new Error(`Connection timeout after ${timeout}ms to ${host}:${port}`);
-        (error as any).code = 'ETIMEDOUT';
-        socket.destroy();
-        if (onError) onError(error);
-      }
-    }, timeout);
-  }
-  
-  socket.connect(port, host);
-  return socket;
-}
-```
-
-3. Pass connectionTimeout in route-connection-handler.ts:
-```typescript
-const targetSocket = createSocketWithErrorHandler({
-  port: finalTargetPort,
-  host: finalTargetHost,
-  timeout: this.settings.connectionTimeout || 30000, // Connection timeout
-  onError: (error) => { /* existing */ },
-  onConnect: async () => { /* existing */ }
-});
-```
-
-### Investigation Results (January 2025)
-
-Based on extensive testing with debug scripts:
-
-1. **Normal Operation**: In controlled tests, connections are properly cleaned up:
-   - Immediate routing cleanup handler properly destroys outgoing connections
-   - Both outer and inner proxies maintain 0 connections after clients disconnect
-   - Keep-alive connections are tracked and cleaned up correctly
-
-2. **Potential Edge Cases Not Covered by Tests**:
-   - **HTTP/2 Connections**: May have different lifecycle than HTTP/1.1
-   - **WebSocket Connections**: Long-lived upgrade connections might persist
-   - **Partial TLS Handshakes**: Connections that start TLS but don't complete
-   - **PROXY Protocol Parse Failures**: Malformed headers from untrusted sources
-   - **Connection Pool Reuse**: HttpProxy component may maintain its own pools
-
-3. **Timing-Sensitive Scenarios**:
-   - Client disconnects exactly when `record.outgoing` is being assigned
-   - Backend connects but immediately RSTs
-   - Proxy chain where middle proxy restarts
-   - Multiple rapid reconnects with same source IP/port
-
-4. **Configuration-Specific Issues**:
-   - Mixed `sendProxyProtocol` settings in chain
-   - Different `keepAlive` settings between proxies
-   - Mismatched timeout values
-   - Routes with `forwardingEngine: 'nftables'`
-
-### Additional Debug Points
-
-Add these debug logs to identify the specific scenario:
-
-```typescript
-// In route-connection-handler.ts setupDirectConnection
-logger.log('debug', `Setting outgoing socket for ${connectionId}`, {
-  timestamp: Date.now(),
-  hasOutgoing: !!record.outgoing,
-  socketState: targetSocket.readyState
-});
-
-// In connection-manager.ts cleanupConnection
-logger.log('debug', `Cleanup attempt for ${record.id}`, {
-  alreadyClosed: record.connectionClosed,
-  hasIncoming: !!record.incoming,
-  hasOutgoing: !!record.outgoing,
-  incomingDestroyed: record.incoming?.destroyed,
-  outgoingDestroyed: record.outgoing?.destroyed
-});
-```
-
-### Workarounds
-
-Until the connection timeout fix described above is in place:
-
-1. **Periodic Force Cleanup**:
-   ```typescript
-   setInterval(() => {
-     const connections = connectionManager.getConnections();
-     for (const [id, record] of connections) {
-       if (record.incoming?.destroyed && !record.connectionClosed) {
-         connectionManager.cleanupConnection(record, 'force_cleanup');
-       }
-     }
-   }, 60000); // Every minute
-   ```
-
-2. **Connection Age Limit**:
-   ```typescript
-   // Add max connection age check
-   const maxAge = 3600000; // 1 hour
-   if (Date.now() - record.incomingStartTime > maxAge) {
-     connectionManager.cleanupConnection(record, 'max_age');
-   }
-   ```
-
-3. **Aggressive Timeout Settings**:
-   ```typescript
-   {
-     socketTimeout: 60000,        // 1 minute
-     inactivityTimeout: 300000,   // 5 minutes
-     connectionCleanupInterval: 30000  // 30 seconds
-   }
-   ```
-
-### Related Files
-- `/ts/proxies/smart-proxy/route-connection-handler.ts` - Main connection handling
-- `/ts/proxies/smart-proxy/connection-manager.ts` - Connection tracking and cleanup
-- `/ts/core/utils/socket-utils.ts` - Socket cleanup utilities
-- `/test/test.proxy-chain-cleanup.node.ts` - Test for connection cleanup
-- `/test/test.proxy-chaining-accumulation.node.ts` - Test for accumulation prevention
-- `/.nogit/debug/connection-accumulation-debug.ts` - Debug script for connection states
-- `/.nogit/debug/connection-accumulation-keepalive.ts` - Keep-alive specific tests
-- `/.nogit/debug/connection-accumulation-http.ts` - HTTP traffic through proxy chains
-
-### Summary
-
-**Issue Identified**: Connection accumulation occurs on the **inner proxy** (not outer) when backends are unreachable.
-
-**Root Cause**: The `createSocketWithErrorHandler` function in socket-utils.ts doesn't implement a connection establishment timeout. It only calls `socket.setTimeout()`, which handles inactivity AFTER the connection is established, not during the connect phase.
-
-**Impact**: When connecting to unreachable IPs (e.g., 10.255.255.1), outgoing sockets remain in "opening" state indefinitely, causing connections to accumulate.
-
-**Fix Required**:
-1. Add `connectionTimeout` setting to ISmartProxyOptions
-2. Implement proper connection timeout in `createSocketWithErrorHandler`
-3. Pass the timeout value from route-connection-handler
-
-**Workaround Until Fixed**: Configure shorter socket timeouts and use the periodic force cleanup suggested above.
-
-The connection cleanup mechanisms have been significantly improved in v19.5.20:
-1. Race condition fixed by setting `record.outgoing` before connecting
-2. Immediate routing cleanup handler always destroys outgoing connections
-3. Tests confirm no accumulation in standard scenarios with reachable backends
-
-However, the missing connection establishment timeout causes accumulation when backends are unreachable or very slow to connect.
-
-### Outer Proxy Sudden Accumulation After Hours
-
-**User Report**: "The counter goes up suddenly after some hours on the outer proxy"
-
-**Investigation Findings**:
-
-1. **Cleanup Queue Mechanism**:
-   - Connections are cleaned up in batches of 100 via a queue
-   - If the cleanup timer gets stuck or cleared without restart, connections accumulate
-   - The timer is set with `setTimeout` and could be affected by event loop blocking
-
-2. **Potential Causes for Sudden Spikes**:
-   
-   a) **Cleanup Timer Failure**:
-   ```typescript
-   // In ConnectionManager, if this timer gets cleared but not restarted:
-   this.cleanupTimer = this.setTimeout(() => {
-     this.processCleanupQueue();
-   }, 100);
-   ```
-   
-   b) **Memory Pressure**:
-   - After hours of operation, memory fragmentation or pressure could cause delays
-   - Garbage collection pauses might interfere with timer execution
-   
-   c) **Event Listener Accumulation**:
-   - Socket event listeners might accumulate over time
-   - Server 'connection' event handlers are particularly important
-   
-   d) **Keep-Alive Connection Cascades**:
-   - When many keep-alive connections timeout simultaneously
-   - Outer proxy has different timeout than inner proxy
-   - Mass disconnection events can overwhelm cleanup queue
-   
-   e) **HttpProxy Component Issues**:
-   - If using `useHttpProxy`, the HttpProxy bridge might maintain connection pools
-   - These pools might not be properly cleaned after hours
-
-3. **Why "Sudden" After Hours**:
-   - Not a gradual leak but triggered by specific conditions
-   - Likely related to periodic events or thresholds:
-     - Inactivity check runs every 30 seconds
-     - Keep-alive connections have extended timeouts (6x normal)
-     - Parity check has 30-minute timeout for half-closed connections
-   
-4. **Reproduction Scenarios**:
-   - Mass client disconnection/reconnection (network blip)
-   - Keep-alive timeout cascade when inner proxy times out first
-   - Cleanup timer getting stuck during high load
-   - Memory pressure causing event loop delays
-
-### Additional Monitoring Recommendations
-
-1. **Add Cleanup Queue Monitoring**:
-   ```typescript
-   setInterval(() => {
-     const cm = proxy.connectionManager;
-     if (cm.cleanupQueue.size > 100 && !cm.cleanupTimer) {
-       logger.error('Cleanup queue stuck!', {
-         queueSize: cm.cleanupQueue.size,
-         hasTimer: !!cm.cleanupTimer
-       });
-     }
-   }, 60000);
-   ```
-
-2. **Track Timer Health**:
-   - Monitor if cleanup timer is running
-   - Check for event loop blocking
-   - Log when batch processing takes too long
-
-3. **Memory Monitoring**:
-   - Track heap usage over time
-   - Monitor for memory leaks in long-running processes
-   - Force periodic garbage collection if needed
-
-### Immediate Mitigations
-
-1. **Restart Cleanup Timer**:
-   ```typescript
-   // Emergency cleanup timer restart
-   if (!cm.cleanupTimer && cm.cleanupQueue.size > 0) {
-     cm.cleanupTimer = setTimeout(() => {
-       cm.processCleanupQueue();
-     }, 100);
-   }
-   ```
-
-2. **Force Periodic Cleanup**:
-   ```typescript
-   setInterval(() => {
-     const cm = connectionManager;
-     if (cm.getConnectionCount() > threshold) {
-       cm.performOptimizedInactivityCheck();
-       // Force process cleanup queue
-       cm.processCleanupQueue();
-     }
-   }, 300000); // Every 5 minutes
-   ```
-
-3. **Connection Age Limits**:
-   - Set maximum connection lifetime
-   - Force close connections older than threshold
-   - More aggressive cleanup for proxy chains
-
-## ✅ FIXED: Zombie Connection Detection (January 2025)
-
-### Root Cause Identified
-"Zombie connections" occur when sockets are destroyed without triggering their close/error event handlers. This causes connections to remain tracked with both sockets destroyed but `connectionClosed=false`. This is particularly problematic in proxy chains where the inner proxy might close connections in ways that don't trigger proper events on the outer proxy.
-
-### Fix Implemented
-Added zombie detection to the periodic inactivity check in ConnectionManager:
-
-```typescript
-// In performOptimizedInactivityCheck()
-// Check ALL connections for zombie state
-for (const [connectionId, record] of this.connectionRecords) {
-  if (!record.connectionClosed) {
-    const incomingDestroyed = record.incoming?.destroyed || false;
-    const outgoingDestroyed = record.outgoing?.destroyed || false;
-    
-    // Check for zombie connections: both sockets destroyed but not cleaned up
-    if (incomingDestroyed && outgoingDestroyed) {
-      logger.log('warn', `Zombie connection detected: ${connectionId} - both sockets destroyed but not cleaned up`, {
-        connectionId,
-        remoteIP: record.remoteIP,
-        age: plugins.prettyMs(now - record.incomingStartTime),
-        component: 'connection-manager'
-      });
-      
-      // Clean up immediately
-      this.cleanupConnection(record, 'zombie_cleanup');
-      continue;
-    }
-    
-    // Check for half-zombie: one socket destroyed
-    if (incomingDestroyed || outgoingDestroyed) {
-      const age = now - record.incomingStartTime;
-      // Give it 30 seconds grace period for normal cleanup
-      if (age > 30000) {
-        logger.log('warn', `Half-zombie connection detected: ${connectionId} - ${incomingDestroyed ? 'incoming' : 'outgoing'} destroyed`, {
-          connectionId,
-          remoteIP: record.remoteIP,
-          age: plugins.prettyMs(age),
-          incomingDestroyed,
-          outgoingDestroyed,
-          component: 'connection-manager'
-        });
-        
-        // Clean up
-        this.cleanupConnection(record, 'half_zombie_cleanup');
-      }
-    }
-  }
-}
-```
-
-### How It Works
-1. **Full Zombie Detection**: Detects when both incoming and outgoing sockets are destroyed but the connection hasn't been cleaned up
-2. **Half-Zombie Detection**: Detects when only one socket is destroyed, with a 30-second grace period for normal cleanup to occur
-3. **Automatic Cleanup**: Immediately cleans up zombie connections when detected
-4. **Runs Periodically**: Integrated into the existing inactivity check that runs every 30 seconds
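The detection rules above can be isolated into a pure function, which makes them easy to unit test without real sockets. The sketch below is an illustration only: `classifyConnection`, `RecordLike`, and the verdict names are hypothetical, and the record shape is reduced to the fields the check reads.

```typescript
// Hypothetical, reduced shapes: only the fields the zombie check reads.
interface SocketLike {
  destroyed: boolean;
}

interface RecordLike {
  connectionClosed: boolean;
  incoming?: SocketLike;
  outgoing?: SocketLike;
  incomingStartTime: number;
}

type Verdict = 'ok' | 'zombie' | 'half_zombie' | 'within_grace';

// Mirrors the rules above: both sockets destroyed means zombie, one socket
// destroyed means half-zombie once the grace period (30 s here) has passed.
function classifyConnection(record: RecordLike, now: number, graceMs = 30000): Verdict {
  if (record.connectionClosed) return 'ok'; // already cleaned up, nothing to do

  const incomingDestroyed = record.incoming?.destroyed ?? false;
  const outgoingDestroyed = record.outgoing?.destroyed ?? false;

  if (incomingDestroyed && outgoingDestroyed) return 'zombie';

  if (incomingDestroyed || outgoingDestroyed) {
    const age = now - record.incomingStartTime;
    return age > graceMs ? 'half_zombie' : 'within_grace';
  }
  return 'ok';
}

// Usage: a record whose outgoing socket died 45 s after the connection started.
const verdict = classifyConnection(
  {
    connectionClosed: false,
    incoming: { destroyed: false },
    outgoing: { destroyed: true },
    incomingStartTime: 0,
  },
  45000,
);
console.log(verdict); // prints "half_zombie"
```

Keeping the classification separate from the cleanup call means the thresholds can be tuned or tested independently of `cleanupConnection`.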
-
-### Why This Fixes the Outer Proxy Accumulation
-- When inner proxy closes connections abruptly (e.g., due to backend failure), the outer proxy's outgoing socket might be destroyed without firing close/error events
-- These become zombie connections that previously accumulated indefinitely
-- Now they are detected and cleaned up within 30 seconds
-
-### Test Results
-Debug scripts confirmed:
-- Zombie connections can be created when sockets are destroyed directly without events
-- The zombie detection successfully identifies and cleans up these connections
-- Both full zombies (both sockets destroyed) and half-zombies (one socket destroyed) are handled
-
-This fix addresses the user's requirement that "connections that are closed on the inner proxy, always also close on the outer proxy".
-
-## 🔍 Production Diagnostics (January 2025)
-
-Since the zombie detection fix didn't fully resolve the issue, use the ProductionConnectionMonitor to diagnose the actual problem:
-
-### How to Use the Production Monitor
-
-1. **Add to your proxy startup script**:
-```typescript
-import ProductionConnectionMonitor from './production-connection-monitor.js';
-
-// After proxy.start()
-const monitor = new ProductionConnectionMonitor(proxy);
-monitor.start(5000); // Check every 5 seconds
-
-// Monitor will automatically capture diagnostics when:
-// - Connections exceed threshold (default: 50)
-// - Sudden spike occurs (default: +20 connections)
-```
-
-2. **Diagnostics are saved to**: `.nogit/connection-diagnostics/`
-
-3. **Force capture anytime**: `monitor.forceCaptureNow()`
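The two capture triggers described above (absolute threshold and sudden spike) amount to a small predicate. The following is a guess at the logic, not the actual `ProductionConnectionMonitor` implementation: `shouldCapture` and `CaptureThresholds` are hypothetical names, and whether the comparisons are strict is an assumption.

```typescript
// Hypothetical thresholds; the defaults (50 and +20) come from the comments above.
interface CaptureThresholds {
  maxConnections: number;
  spikeDelta: number;
}

const defaultThresholds: CaptureThresholds = { maxConnections: 50, spikeDelta: 20 };

// Returns true when a diagnostic snapshot should be captured.
// Assumption: "exceed" is read as strictly greater, "+20" as at least 20 more
// connections than at the previous check.
function shouldCapture(
  previousCount: number,
  currentCount: number,
  thresholds: CaptureThresholds = defaultThresholds,
): boolean {
  const overThreshold = currentCount > thresholds.maxConnections;
  const suddenSpike = currentCount - previousCount >= thresholds.spikeDelta;
  return overThreshold || suddenSpike;
}

console.log(shouldCapture(10, 35)); // prints "true" (spike of +25)
console.log(shouldCapture(48, 49)); // prints "false"
```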
-
-### What the Monitor Captures
-
-For each connection:
-- Socket states (destroyed, readable, writable, readyState)
-- Connection flags (closed, keepAlive, TLS status)
-- Data transfer statistics
-- Time since last activity
-- Cleanup queue status
-- Event listener counts
-- Termination reasons
-
-### Pattern Analysis
-
-The monitor automatically identifies:
-- **Zombie connections**: Both sockets destroyed but not cleaned up
-- **Half-zombies**: One socket destroyed
-- **Stuck connecting**: Outgoing socket stuck in connecting state
-- **No outgoing**: Missing outgoing socket
-- **Keep-alive stuck**: Keep-alive connections with no recent activity
-- **Old connections**: Connections older than 1 hour
-- **No data transfer**: Connections with no bytes transferred
-- **Listener leaks**: Excessive event listeners
-
-### Common Accumulation Patterns
-
-1. **Connecting State Stuck**
-   - Outgoing socket shows `connecting: true` indefinitely
-   - Usually means connection timeout not working
-   - Check if backend is reachable
-
-2. **Missing Outgoing Socket**
-   - Connection has no outgoing socket but isn't closed
-   - May indicate immediate routing issues
-   - Check error logs during connection setup
-
-3. **Event Listener Accumulation**
-   - High listener counts (>20) on sockets
-   - Indicates cleanup not removing all listeners
-   - Can cause memory leaks
-
-4. **Keep-Alive Zombies**
-   - Keep-alive connections not timing out
-   - Check keepAlive timeout settings
-   - May need more aggressive cleanup
-
-### Next Steps
-
-1. **Run the monitor in production** during accumulation
-2. **Share the diagnostic files** from `.nogit/connection-diagnostics/`
-3. **Look for patterns** in the captured snapshots
-4. **Check specific connection IDs** that accumulate
-
-The diagnostic files will show exactly what state connections are in when accumulation occurs, allowing targeted fixes for the specific issue.
-
-## ✅ FIXED: Stuck Connection Detection (January 2025) 
-
-### Additional Root Cause Found
-Connections to hanging backends (that accept but never respond) were not being cleaned up because:
-- Both sockets remain alive (not destroyed)
-- Keep-alive prevents normal timeout
-- No data is sent back to the client despite receiving data
-- These don't qualify as "zombies" since sockets aren't destroyed
-
-### Fix Implemented
-Added stuck connection detection to the periodic inactivity check:
-
-```typescript
-// Check for stuck connections: no data sent back to client
-if (!record.connectionClosed && record.outgoing && record.bytesReceived > 0 && record.bytesSent === 0) {
-  const age = now - record.incomingStartTime;
-  // If connection is older than 60 seconds and no data sent back, likely stuck
-  if (age > 60000) {
-    logger.log('warn', `Stuck connection detected: ${connectionId} - received ${record.bytesReceived} bytes but sent 0 bytes`, {
-      connectionId,
-      remoteIP: record.remoteIP,
-      age: plugins.prettyMs(age),
-      bytesReceived: record.bytesReceived,
-      targetHost: record.targetHost,
-      targetPort: record.targetPort,
-      component: 'connection-manager'
-    });
-    
-    // Clean up
-    this.cleanupConnection(record, 'stuck_no_response');
-  }
-}
-```
-
-### What This Fixes
-- Connections to backends that accept but never respond
-- Proxy chains where inner proxy connects to unresponsive services
-- Scenarios where keep-alive prevents normal timeout mechanisms
-- Connections that receive client data but never send anything back
-
-### Detection Criteria
-- Connection has received bytes from client (`bytesReceived > 0`)
-- No bytes sent back to client (`bytesSent === 0`)
-- Connection is older than 60 seconds
-- Both sockets are still alive (not destroyed)
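The detection criteria above can be expressed as a single predicate. This sketch is illustrative: `isStuckConnection` and `TrafficRecord` are hypothetical names, and the record is reduced to the fields the criteria mention (the presence of an outgoing socket is modelled as a boolean).

```typescript
// Hypothetical, reduced record: only the fields used by the stuck check.
interface TrafficRecord {
  connectionClosed: boolean;
  hasOutgoing: boolean;
  bytesReceived: number; // bytes received from the client
  bytesSent: number; // bytes sent back to the client
  incomingStartTime: number;
}

// A connection counts as stuck when the client has sent data, nothing has been
// sent back, and the connection is older than minAgeMs (60 s, per the criteria).
function isStuckConnection(record: TrafficRecord, now: number, minAgeMs = 60000): boolean {
  if (record.connectionClosed || !record.hasOutgoing) return false;
  if (record.bytesReceived === 0 || record.bytesSent !== 0) return false;
  return now - record.incomingStartTime > minAgeMs;
}

// Usage: 512 bytes received, nothing returned, 90 s old.
const stuck = isStuckConnection(
  { connectionClosed: false, hasOutgoing: true, bytesReceived: 512, bytesSent: 0, incomingStartTime: 0 },
  90000,
);
console.log(stuck); // prints "true"
```

Note that a legitimately slow backend (for example a long-running request that takes more than a minute to produce its first byte) would also match, so the age limit may need to be configurable per route.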
-
-This complements the zombie detection by handling cases where sockets remain technically alive but the connection is effectively dead.
-
-## 🚨 CRITICAL FIX: Cleanup Queue Bug (January 2025)
-
-### Critical Bug Found
-The cleanup queue had a severe bug that caused connection accumulation when more than 100 connections needed cleanup:
-
-```typescript
-// BUG: This cleared the ENTIRE queue after processing only the first batch!
-const toCleanup = Array.from(this.cleanupQueue).slice(0, this.cleanupBatchSize);
-this.cleanupQueue.clear(); // ❌ This discarded all connections beyond the first 100!
-```
-
-### Fix Implemented
-```typescript
-// Now only removes the connections being processed
-const toCleanup = Array.from(this.cleanupQueue).slice(0, this.cleanupBatchSize);
-for (const connectionId of toCleanup) {
-  this.cleanupQueue.delete(connectionId); // ✅ Only remove what we process
-  const record = this.connectionRecords.get(connectionId);
-  if (record) {
-    this.cleanupConnection(record, record.incomingTerminationReason || 'normal');
-  }
-}
-```
-
-### Impact
-- **Before**: If 150 connections needed cleanup, only the first 100 would be processed and the remaining 50 would accumulate forever
-- **After**: All connections are properly cleaned up in batches
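The before/after difference can be reproduced with a plain `Set`. The sketch below is a simplified model, not the ConnectionManager code: `drainFixed` and `processOnceBuggy` are hypothetical helpers, and the fixed variant drains in a synchronous loop, whereas the real queue schedules each batch on a timer.

```typescript
// Buggy behaviour (simplified): take one batch, then clear the whole queue,
// silently dropping every queued ID beyond the first batch.
function processOnceBuggy(queue: Set<string>, batchSize: number, cleanup: (id: string) => void): void {
  const batch = Array.from(queue).slice(0, batchSize);
  queue.clear();
  for (const id of batch) cleanup(id);
}

// Fixed behaviour (simplified): delete only the IDs that are actually processed
// and keep going until the queue is empty. Returns the number of batches run.
function drainFixed(queue: Set<string>, batchSize: number, cleanup: (id: string) => void): number {
  let batches = 0;
  while (queue.size > 0) {
    const batch = Array.from(queue).slice(0, batchSize);
    for (const id of batch) {
      queue.delete(id);
      cleanup(id);
    }
    batches++;
  }
  return batches;
}

// Usage: 150 queued connection IDs with a batch size of 100.
const makeQueue = () => new Set(Array.from({ length: 150 }, (_, i) => `conn-${i}`));

const cleanedBuggy: string[] = [];
processOnceBuggy(makeQueue(), 100, (id) => cleanedBuggy.push(id));

const cleanedFixed: string[] = [];
const batchesRun = drainFixed(makeQueue(), 100, (id) => cleanedFixed.push(id));

console.log(cleanedBuggy.length); // prints 100 (50 connections lost)
console.log(cleanedFixed.length, batchesRun); // prints 150 2
```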
-
-### Additional Improvements
-
-1. **Faster Inactivity Checks**: Reduced from 30s to 10s intervals
-   - Zombies and stuck connections are detected 3x faster
-   - Reduces the window for accumulation
-
-2. **Duplicate Prevention**: Added check in queueCleanup to prevent processing already-closed connections
-   - Prevents unnecessary work
-   - Ensures connections are only cleaned up once
-
-### Summary of All Fixes
-
-1. **Connection Timeout** (already documented) - Prevents accumulation when backends are unreachable
-2. **Zombie Detection** - Cleans up connections with destroyed sockets
-3. **Stuck Connection Detection** - Cleans up connections to hanging backends
-4. **Cleanup Queue Bug** - Ensures ALL connections get cleaned up, not just the first 100
-5. **Faster Detection** - Reduced check interval from 30s to 10s
-
-These fixes combined should prevent connection accumulation in all known scenarios.