# Connection Management in SmartProxy This document describes connection handling, cleanup mechanisms, and known issues in SmartProxy, particularly focusing on proxy chain configurations. ## Connection Accumulation Investigation (January 2025) ### Problem Statement Connections may accumulate on the outer proxy in proxy chain configurations, despite implemented fixes. ### Historical Context - **v19.5.12-v19.5.15**: Major connection cleanup improvements - **v19.5.19+**: PROXY protocol support with WrappedSocket implementation - **v19.5.20**: Fixed race condition in immediate routing cleanup ### Current Architecture #### Connection Flow in Proxy Chains ``` Client → Outer Proxy (8001) → Inner Proxy (8002) → Backend (httpbin.org:443) ``` 1. **Outer Proxy**: - Accepts client connection - Sends PROXY protocol header to inner proxy - Tracks connection in ConnectionManager - Immediate routing for non-TLS ports 2. **Inner Proxy**: - Parses PROXY protocol to get real client IP - Establishes connection to backend - Tracks its own connections separately ### Potential Causes of Connection Accumulation #### 1. Race Condition in Immediate Routing When a connection is immediately routed (non-TLS ports), there's a timing window: ```typescript // route-connection-handler.ts, line ~231 this.routeConnection(socket, record, '', undefined); // Connection is routed before all setup is complete ``` **Issue**: If client disconnects during backend connection setup, cleanup may not trigger properly. #### 2. Outgoing Socket Assignment Timing Despite the fix in v19.5.20: ```typescript // Line 1362 in setupDirectConnection record.outgoing = targetSocket; ``` There's still a window between socket creation and the `connect` event where cleanup might miss the outgoing socket. #### 3. Batch Cleanup Delays ConnectionManager uses queued cleanup: - Batch size: 100 connections - Batch interval: 100ms - Under rapid connection/disconnection, queue might lag #### 4. Different Cleanup Paths Multiple cleanup triggers exist: - Socket 'close' event - Socket 'error' event - Inactivity timeout - Connection timeout - Manual cleanup Not all paths may properly handle proxy chain scenarios. #### 5. Keep-Alive Connection Handling Keep-alive connections have special treatment: - Extended inactivity timeout (6x normal) - Warning before closure - May accumulate if backend is unresponsive ### Observed Symptoms 1. **Outer proxy connection count grows over time** 2. **Inner proxy maintains zero or low connection count** 3. **Connections show as closed in logs but remain in tracking** 4. **Memory usage gradually increases** ### Debug Strategies #### 1. Enhanced Logging Add connection state logging at key points: ```typescript // When outgoing socket is created logger.log('debug', `Outgoing socket created for ${connectionId}`, { hasOutgoing: !!record.outgoing, outgoingState: record.outgoing?.readyState }); ``` #### 2. Connection State Inspection Periodically log detailed connection state: ```typescript for (const [id, record] of connectionManager.getConnections()) { console.log({ id, age: Date.now() - record.incomingStartTime, incomingDestroyed: record.incoming.destroyed, outgoingDestroyed: record.outgoing?.destroyed, hasCleanupTimer: !!record.cleanupTimer }); } ``` #### 3. Cleanup Verification Track cleanup completion: ```typescript // In cleanupConnection logger.log('debug', `Cleanup completed for ${record.id}`, { recordsRemaining: this.connectionRecords.size }); ``` ### Recommendations 1. **Immediate Cleanup for Proxy Chains** - Skip batch queue for proxy chain connections - Use synchronous cleanup when PROXY protocol is detected 2. **Socket State Validation** - Check both `destroyed` and `readyState` before cleanup decisions - Handle 'opening' state sockets explicitly 3. **Timeout Adjustments** - Shorter timeouts for proxy chain connections - More aggressive cleanup for connections without data transfer 4. **Connection Limits** - Per-route connection limits - Backpressure when approaching limits 5. **Monitoring** - Export connection metrics - Alert on connection count thresholds - Track connection age distribution ### Test Scenarios to Reproduce 1. **Rapid Connect/Disconnect** ```bash # Create many short-lived connections for i in {1..1000}; do (echo -n | nc localhost 8001) & done ``` 2. **Slow Backend** - Configure inner proxy to connect to unresponsive backend - Monitor outer proxy connection count 3. **Mixed Traffic** - Combine TLS and non-TLS connections - Add keep-alive connections - Observe accumulation patterns ### Future Improvements 1. **Connection Pool Isolation** - Separate pools for proxy chain vs direct connections - Different cleanup strategies per pool 2. **Circuit Breaker** - Detect accumulation and trigger aggressive cleanup - Temporary refuse new connections when near limit 3. **Connection State Machine** - Explicit states: CONNECTING, ESTABLISHED, CLOSING, CLOSED - State transition validation - Timeout per state 4. **Metrics Collection** - Connection lifecycle events - Cleanup success/failure rates - Time spent in each state ### Root Cause Identified (January 2025) **The primary issue is on the inner proxy when backends are unreachable:** When the backend is unreachable (e.g., non-routable IP like 10.255.255.1): 1. The outgoing socket gets stuck in "opening" state indefinitely 2. The `createSocketWithErrorHandler` in socket-utils.ts doesn't implement connection timeout 3. `socket.setTimeout()` only handles inactivity AFTER connection, not during connect phase 4. Connections accumulate because they never transition to error state 5. Socket timeout warnings fire but connections are preserved as keep-alive **Code Issue:** ```typescript // socket-utils.ts line 275 if (timeout) { socket.setTimeout(timeout); // This only handles inactivity, not connection! } ``` **Required Fix:** 1. Add `connectionTimeout` to ISmartProxyOptions interface: ```typescript // In interfaces.ts connectionTimeout?: number; // Timeout for establishing connection (ms), default: 30000 (30s) ``` 2. Update `createSocketWithErrorHandler` in socket-utils.ts: ```typescript export function createSocketWithErrorHandler(options: SafeSocketOptions): plugins.net.Socket { const { port, host, onError, onConnect, timeout } = options; const socket = new plugins.net.Socket(); let connected = false; let connectionTimeout: NodeJS.Timeout | null = null; socket.on('error', (error) => { if (connectionTimeout) { clearTimeout(connectionTimeout); connectionTimeout = null; } if (onError) onError(error); }); socket.on('connect', () => { connected = true; if (connectionTimeout) { clearTimeout(connectionTimeout); connectionTimeout = null; } if (timeout) socket.setTimeout(timeout); // Set inactivity timeout if (onConnect) onConnect(); }); // Implement connection establishment timeout if (timeout) { connectionTimeout = setTimeout(() => { if (!connected && !socket.destroyed) { const error = new Error(`Connection timeout after ${timeout}ms to ${host}:${port}`); (error as any).code = 'ETIMEDOUT'; socket.destroy(); if (onError) onError(error); } }, timeout); } socket.connect(port, host); return socket; } ``` 3. Pass connectionTimeout in route-connection-handler.ts: ```typescript const targetSocket = createSocketWithErrorHandler({ port: finalTargetPort, host: finalTargetHost, timeout: this.settings.connectionTimeout || 30000, // Connection timeout onError: (error) => { /* existing */ }, onConnect: async () => { /* existing */ } }); ``` ### Investigation Results (January 2025) Based on extensive testing with debug scripts: 1. **Normal Operation**: In controlled tests, connections are properly cleaned up: - Immediate routing cleanup handler properly destroys outgoing connections - Both outer and inner proxies maintain 0 connections after clients disconnect - Keep-alive connections are tracked and cleaned up correctly 2. **Potential Edge Cases Not Covered by Tests**: - **HTTP/2 Connections**: May have different lifecycle than HTTP/1.1 - **WebSocket Connections**: Long-lived upgrade connections might persist - **Partial TLS Handshakes**: Connections that start TLS but don't complete - **PROXY Protocol Parse Failures**: Malformed headers from untrusted sources - **Connection Pool Reuse**: HttpProxy component may maintain its own pools 3. **Timing-Sensitive Scenarios**: - Client disconnects exactly when `record.outgoing` is being assigned - Backend connects but immediately RSTs - Proxy chain where middle proxy restarts - Multiple rapid reconnects with same source IP/port 4. **Configuration-Specific Issues**: - Mixed `sendProxyProtocol` settings in chain - Different `keepAlive` settings between proxies - Mismatched timeout values - Routes with `forwardingEngine: 'nftables'` ### Additional Debug Points Add these debug logs to identify the specific scenario: ```typescript // In route-connection-handler.ts setupDirectConnection logger.log('debug', `Setting outgoing socket for ${connectionId}`, { timestamp: Date.now(), hasOutgoing: !!record.outgoing, socketState: targetSocket.readyState }); // In connection-manager.ts cleanupConnection logger.log('debug', `Cleanup attempt for ${record.id}`, { alreadyClosed: record.connectionClosed, hasIncoming: !!record.incoming, hasOutgoing: !!record.outgoing, incomingDestroyed: record.incoming?.destroyed, outgoingDestroyed: record.outgoing?.destroyed }); ``` ### Workarounds Until root cause is identified: 1. **Periodic Force Cleanup**: ```typescript setInterval(() => { const connections = connectionManager.getConnections(); for (const [id, record] of connections) { if (record.incoming?.destroyed && !record.connectionClosed) { connectionManager.cleanupConnection(record, 'force_cleanup'); } } }, 60000); // Every minute ``` 2. **Connection Age Limit**: ```typescript // Add max connection age check const maxAge = 3600000; // 1 hour if (Date.now() - record.incomingStartTime > maxAge) { connectionManager.cleanupConnection(record, 'max_age'); } ``` 3. **Aggressive Timeout Settings**: ```typescript { socketTimeout: 60000, // 1 minute inactivityTimeout: 300000, // 5 minutes connectionCleanupInterval: 30000 // 30 seconds } ``` ### Related Files - `/ts/proxies/smart-proxy/route-connection-handler.ts` - Main connection handling - `/ts/proxies/smart-proxy/connection-manager.ts` - Connection tracking and cleanup - `/ts/core/utils/socket-utils.ts` - Socket cleanup utilities - `/test/test.proxy-chain-cleanup.node.ts` - Test for connection cleanup - `/test/test.proxy-chaining-accumulation.node.ts` - Test for accumulation prevention - `/.nogit/debug/connection-accumulation-debug.ts` - Debug script for connection states - `/.nogit/debug/connection-accumulation-keepalive.ts` - Keep-alive specific tests - `/.nogit/debug/connection-accumulation-http.ts` - HTTP traffic through proxy chains ### Summary **Issue Identified**: Connection accumulation occurs on the **inner proxy** (not outer) when backends are unreachable. **Root Cause**: The `createSocketWithErrorHandler` function in socket-utils.ts doesn't implement connection establishment timeout. It only sets `socket.setTimeout()` which handles inactivity AFTER connection is established, not during the connect phase. **Impact**: When connecting to unreachable IPs (e.g., 10.255.255.1), outgoing sockets remain in "opening" state indefinitely, causing connections to accumulate. **Fix Required**: 1. Add `connectionTimeout` setting to ISmartProxyOptions 2. Implement proper connection timeout in `createSocketWithErrorHandler` 3. Pass the timeout value from route-connection-handler **Workaround Until Fixed**: Configure shorter socket timeouts and use the periodic force cleanup suggested above. The connection cleanup mechanisms have been significantly improved in v19.5.20: 1. Race condition fixed by setting `record.outgoing` before connecting 2. Immediate routing cleanup handler always destroys outgoing connections 3. Tests confirm no accumulation in standard scenarios with reachable backends However, the missing connection establishment timeout causes accumulation when backends are unreachable or very slow to connect.