diff --git a/readme.connections.md b/readme.connections.md new file mode 100644 index 0000000..5673989 --- /dev/null +++ b/readme.connections.md @@ -0,0 +1,375 @@ +# Connection Management in SmartProxy + +This document describes connection handling, cleanup mechanisms, and known issues in SmartProxy, particularly focusing on proxy chain configurations. + +## Connection Accumulation Investigation (January 2025) + +### Problem Statement +Connections may accumulate on the outer proxy in proxy chain configurations, despite implemented fixes. + +### Historical Context +- **v19.5.12-v19.5.15**: Major connection cleanup improvements +- **v19.5.19+**: PROXY protocol support with WrappedSocket implementation +- **v19.5.20**: Fixed race condition in immediate routing cleanup + +### Current Architecture + +#### Connection Flow in Proxy Chains +``` +Client → Outer Proxy (8001) → Inner Proxy (8002) → Backend (httpbin.org:443) +``` + +1. **Outer Proxy**: + - Accepts client connection + - Sends PROXY protocol header to inner proxy + - Tracks connection in ConnectionManager + - Immediate routing for non-TLS ports + +2. **Inner Proxy**: + - Parses PROXY protocol to get real client IP + - Establishes connection to backend + - Tracks its own connections separately + +### Potential Causes of Connection Accumulation + +#### 1. Race Condition in Immediate Routing +When a connection is immediately routed (non-TLS ports), there's a timing window: +```typescript +// route-connection-handler.ts, line ~231 +this.routeConnection(socket, record, '', undefined); +// Connection is routed before all setup is complete +``` + +**Issue**: If client disconnects during backend connection setup, cleanup may not trigger properly. + +#### 2. Outgoing Socket Assignment Timing +Despite the fix in v19.5.20: +```typescript +// Line 1362 in setupDirectConnection +record.outgoing = targetSocket; +``` +There's still a window between socket creation and the `connect` event where cleanup might miss the outgoing socket. + +#### 3. Batch Cleanup Delays +ConnectionManager uses queued cleanup: +- Batch size: 100 connections +- Batch interval: 100ms +- Under rapid connection/disconnection, queue might lag + +#### 4. Different Cleanup Paths +Multiple cleanup triggers exist: +- Socket 'close' event +- Socket 'error' event +- Inactivity timeout +- Connection timeout +- Manual cleanup + +Not all paths may properly handle proxy chain scenarios. + +#### 5. Keep-Alive Connection Handling +Keep-alive connections have special treatment: +- Extended inactivity timeout (6x normal) +- Warning before closure +- May accumulate if backend is unresponsive + +### Observed Symptoms + +1. **Outer proxy connection count grows over time** +2. **Inner proxy maintains zero or low connection count** +3. **Connections show as closed in logs but remain in tracking** +4. **Memory usage gradually increases** + +### Debug Strategies + +#### 1. Enhanced Logging +Add connection state logging at key points: +```typescript +// When outgoing socket is created +logger.log('debug', `Outgoing socket created for ${connectionId}`, { + hasOutgoing: !!record.outgoing, + outgoingState: record.outgoing?.readyState +}); +``` + +#### 2. Connection State Inspection +Periodically log detailed connection state: +```typescript +for (const [id, record] of connectionManager.getConnections()) { + console.log({ + id, + age: Date.now() - record.incomingStartTime, + incomingDestroyed: record.incoming.destroyed, + outgoingDestroyed: record.outgoing?.destroyed, + hasCleanupTimer: !!record.cleanupTimer + }); +} +``` + +#### 3. Cleanup Verification +Track cleanup completion: +```typescript +// In cleanupConnection +logger.log('debug', `Cleanup completed for ${record.id}`, { + recordsRemaining: this.connectionRecords.size +}); +``` + +### Recommendations + +1. **Immediate Cleanup for Proxy Chains** + - Skip batch queue for proxy chain connections + - Use synchronous cleanup when PROXY protocol is detected + +2. **Socket State Validation** + - Check both `destroyed` and `readyState` before cleanup decisions + - Handle 'opening' state sockets explicitly + +3. **Timeout Adjustments** + - Shorter timeouts for proxy chain connections + - More aggressive cleanup for connections without data transfer + +4. **Connection Limits** + - Per-route connection limits + - Backpressure when approaching limits + +5. **Monitoring** + - Export connection metrics + - Alert on connection count thresholds + - Track connection age distribution + +### Test Scenarios to Reproduce + +1. **Rapid Connect/Disconnect** + ```bash + # Create many short-lived connections + for i in {1..1000}; do + (echo -n | nc localhost 8001) & + done + ``` + +2. **Slow Backend** + - Configure inner proxy to connect to unresponsive backend + - Monitor outer proxy connection count + +3. **Mixed Traffic** + - Combine TLS and non-TLS connections + - Add keep-alive connections + - Observe accumulation patterns + +### Future Improvements + +1. **Connection Pool Isolation** + - Separate pools for proxy chain vs direct connections + - Different cleanup strategies per pool + +2. **Circuit Breaker** + - Detect accumulation and trigger aggressive cleanup + - Temporary refuse new connections when near limit + +3. **Connection State Machine** + - Explicit states: CONNECTING, ESTABLISHED, CLOSING, CLOSED + - State transition validation + - Timeout per state + +4. **Metrics Collection** + - Connection lifecycle events + - Cleanup success/failure rates + - Time spent in each state + +### Root Cause Identified (January 2025) + +**The primary issue is on the inner proxy when backends are unreachable:** + +When the backend is unreachable (e.g., non-routable IP like 10.255.255.1): +1. The outgoing socket gets stuck in "opening" state indefinitely +2. The `createSocketWithErrorHandler` in socket-utils.ts doesn't implement connection timeout +3. `socket.setTimeout()` only handles inactivity AFTER connection, not during connect phase +4. Connections accumulate because they never transition to error state +5. Socket timeout warnings fire but connections are preserved as keep-alive + +**Code Issue:** +```typescript +// socket-utils.ts line 275 +if (timeout) { + socket.setTimeout(timeout); // This only handles inactivity, not connection! +} +``` + +**Required Fix:** + +1. Add `connectionTimeout` to ISmartProxyOptions interface: +```typescript +// In interfaces.ts +connectionTimeout?: number; // Timeout for establishing connection (ms), default: 30000 (30s) +``` + +2. Update `createSocketWithErrorHandler` in socket-utils.ts: +```typescript +export function createSocketWithErrorHandler(options: SafeSocketOptions): plugins.net.Socket { + const { port, host, onError, onConnect, timeout } = options; + + const socket = new plugins.net.Socket(); + let connected = false; + let connectionTimeout: NodeJS.Timeout | null = null; + + socket.on('error', (error) => { + if (connectionTimeout) { + clearTimeout(connectionTimeout); + connectionTimeout = null; + } + if (onError) onError(error); + }); + + socket.on('connect', () => { + connected = true; + if (connectionTimeout) { + clearTimeout(connectionTimeout); + connectionTimeout = null; + } + if (timeout) socket.setTimeout(timeout); // Set inactivity timeout + if (onConnect) onConnect(); + }); + + // Implement connection establishment timeout + if (timeout) { + connectionTimeout = setTimeout(() => { + if (!connected && !socket.destroyed) { + const error = new Error(`Connection timeout after ${timeout}ms to ${host}:${port}`); + (error as any).code = 'ETIMEDOUT'; + socket.destroy(); + if (onError) onError(error); + } + }, timeout); + } + + socket.connect(port, host); + return socket; +} +``` + +3. Pass connectionTimeout in route-connection-handler.ts: +```typescript +const targetSocket = createSocketWithErrorHandler({ + port: finalTargetPort, + host: finalTargetHost, + timeout: this.settings.connectionTimeout || 30000, // Connection timeout + onError: (error) => { /* existing */ }, + onConnect: async () => { /* existing */ } +}); +``` + +### Investigation Results (January 2025) + +Based on extensive testing with debug scripts: + +1. **Normal Operation**: In controlled tests, connections are properly cleaned up: + - Immediate routing cleanup handler properly destroys outgoing connections + - Both outer and inner proxies maintain 0 connections after clients disconnect + - Keep-alive connections are tracked and cleaned up correctly + +2. **Potential Edge Cases Not Covered by Tests**: + - **HTTP/2 Connections**: May have different lifecycle than HTTP/1.1 + - **WebSocket Connections**: Long-lived upgrade connections might persist + - **Partial TLS Handshakes**: Connections that start TLS but don't complete + - **PROXY Protocol Parse Failures**: Malformed headers from untrusted sources + - **Connection Pool Reuse**: HttpProxy component may maintain its own pools + +3. **Timing-Sensitive Scenarios**: + - Client disconnects exactly when `record.outgoing` is being assigned + - Backend connects but immediately RSTs + - Proxy chain where middle proxy restarts + - Multiple rapid reconnects with same source IP/port + +4. **Configuration-Specific Issues**: + - Mixed `sendProxyProtocol` settings in chain + - Different `keepAlive` settings between proxies + - Mismatched timeout values + - Routes with `forwardingEngine: 'nftables'` + +### Additional Debug Points + +Add these debug logs to identify the specific scenario: + +```typescript +// In route-connection-handler.ts setupDirectConnection +logger.log('debug', `Setting outgoing socket for ${connectionId}`, { + timestamp: Date.now(), + hasOutgoing: !!record.outgoing, + socketState: targetSocket.readyState +}); + +// In connection-manager.ts cleanupConnection +logger.log('debug', `Cleanup attempt for ${record.id}`, { + alreadyClosed: record.connectionClosed, + hasIncoming: !!record.incoming, + hasOutgoing: !!record.outgoing, + incomingDestroyed: record.incoming?.destroyed, + outgoingDestroyed: record.outgoing?.destroyed +}); +``` + +### Workarounds + +Until root cause is identified: + +1. **Periodic Force Cleanup**: + ```typescript + setInterval(() => { + const connections = connectionManager.getConnections(); + for (const [id, record] of connections) { + if (record.incoming?.destroyed && !record.connectionClosed) { + connectionManager.cleanupConnection(record, 'force_cleanup'); + } + } + }, 60000); // Every minute + ``` + +2. **Connection Age Limit**: + ```typescript + // Add max connection age check + const maxAge = 3600000; // 1 hour + if (Date.now() - record.incomingStartTime > maxAge) { + connectionManager.cleanupConnection(record, 'max_age'); + } + ``` + +3. **Aggressive Timeout Settings**: + ```typescript + { + socketTimeout: 60000, // 1 minute + inactivityTimeout: 300000, // 5 minutes + connectionCleanupInterval: 30000 // 30 seconds + } + ``` + +### Related Files +- `/ts/proxies/smart-proxy/route-connection-handler.ts` - Main connection handling +- `/ts/proxies/smart-proxy/connection-manager.ts` - Connection tracking and cleanup +- `/ts/core/utils/socket-utils.ts` - Socket cleanup utilities +- `/test/test.proxy-chain-cleanup.node.ts` - Test for connection cleanup +- `/test/test.proxy-chaining-accumulation.node.ts` - Test for accumulation prevention +- `/.nogit/debug/connection-accumulation-debug.ts` - Debug script for connection states +- `/.nogit/debug/connection-accumulation-keepalive.ts` - Keep-alive specific tests +- `/.nogit/debug/connection-accumulation-http.ts` - HTTP traffic through proxy chains + +### Summary + +**Issue Identified**: Connection accumulation occurs on the **inner proxy** (not outer) when backends are unreachable. + +**Root Cause**: The `createSocketWithErrorHandler` function in socket-utils.ts doesn't implement connection establishment timeout. It only sets `socket.setTimeout()` which handles inactivity AFTER connection is established, not during the connect phase. + +**Impact**: When connecting to unreachable IPs (e.g., 10.255.255.1), outgoing sockets remain in "opening" state indefinitely, causing connections to accumulate. + +**Fix Required**: +1. Add `connectionTimeout` setting to ISmartProxyOptions +2. Implement proper connection timeout in `createSocketWithErrorHandler` +3. Pass the timeout value from route-connection-handler + +**Workaround Until Fixed**: Configure shorter socket timeouts and use the periodic force cleanup suggested above. + +The connection cleanup mechanisms have been significantly improved in v19.5.20: +1. Race condition fixed by setting `record.outgoing` before connecting +2. Immediate routing cleanup handler always destroys outgoing connections +3. Tests confirm no accumulation in standard scenarios with reachable backends + +However, the missing connection establishment timeout causes accumulation when backends are unreachable or very slow to connect. \ No newline at end of file diff --git a/test/test.proxy-chain-cleanup.node.ts b/test/test.proxy-chain-cleanup.node.ts new file mode 100644 index 0000000..9fcce63 --- /dev/null +++ b/test/test.proxy-chain-cleanup.node.ts @@ -0,0 +1,182 @@ +import { expect, tap } from '@git.zone/tstest/tapbundle'; +import * as plugins from '../ts/plugins.js'; +import { SmartProxy } from '../ts/index.js'; + +let outerProxy: SmartProxy; +let innerProxy: SmartProxy; + +tap.test('setup two smartproxies in a chain configuration', async () => { + // Setup inner proxy (backend proxy) + innerProxy = new SmartProxy({ + routes: [ + { + match: { + ports: 8002 + }, + action: { + type: 'forward', + target: { + host: 'httpbin.org', + port: 443 + } + } + } + ], + defaults: { + target: { + host: 'httpbin.org', + port: 443 + } + }, + acceptProxyProtocol: true, + sendProxyProtocol: false, + enableDetailedLogging: true, + connectionCleanupInterval: 5000, // More frequent cleanup for testing + inactivityTimeout: 10000 // Shorter timeout for testing + }); + await innerProxy.start(); + + // Setup outer proxy (frontend proxy) + outerProxy = new SmartProxy({ + routes: [ + { + match: { + ports: 8001 + }, + action: { + type: 'forward', + target: { + host: 'localhost', + port: 8002 + }, + sendProxyProtocol: true + } + } + ], + defaults: { + target: { + host: 'localhost', + port: 8002 + } + }, + sendProxyProtocol: true, + enableDetailedLogging: true, + connectionCleanupInterval: 5000, // More frequent cleanup for testing + inactivityTimeout: 10000 // Shorter timeout for testing + }); + await outerProxy.start(); +}); + +tap.test('should properly cleanup connections in proxy chain', async (tools) => { + const testDuration = 30000; // 30 seconds + const connectionInterval = 500; // Create new connection every 500ms + const connectionDuration = 2000; // Each connection lasts 2 seconds + + let connectionsCreated = 0; + let connectionsCompleted = 0; + + // Function to create a test connection + const createTestConnection = async () => { + connectionsCreated++; + const connectionId = connectionsCreated; + + try { + const socket = plugins.net.connect({ + port: 8001, + host: 'localhost' + }); + + await new Promise((resolve, reject) => { + socket.on('connect', () => { + console.log(`Connection ${connectionId} established`); + + // Send TLS Client Hello for httpbin.org + const clientHello = Buffer.from([ + 0x16, 0x03, 0x01, 0x00, 0xc8, // TLS handshake header + 0x01, 0x00, 0x00, 0xc4, // Client Hello + 0x03, 0x03, // TLS 1.2 + ...Array(32).fill(0), // Random bytes + 0x00, // Session ID length + 0x00, 0x02, 0x13, 0x01, // Cipher suites + 0x01, 0x00, // Compression methods + 0x00, 0x97, // Extensions length + 0x00, 0x00, 0x00, 0x0f, 0x00, 0x0d, // SNI extension + 0x00, 0x00, 0x0a, 0x68, 0x74, 0x74, 0x70, 0x62, 0x69, 0x6e, 0x2e, 0x6f, 0x72, 0x67 // "httpbin.org" + ]); + + socket.write(clientHello); + + // Keep connection alive for specified duration + setTimeout(() => { + socket.destroy(); + connectionsCompleted++; + console.log(`Connection ${connectionId} closed (completed: ${connectionsCompleted}/${connectionsCreated})`); + resolve(); + }, connectionDuration); + }); + + socket.on('error', (err) => { + console.log(`Connection ${connectionId} error: ${err.message}`); + connectionsCompleted++; + reject(err); + }); + }); + } catch (err) { + console.log(`Failed to create connection ${connectionId}: ${err.message}`); + connectionsCompleted++; + } + }; + + // Start creating connections + const startTime = Date.now(); + const connectionTimer = setInterval(() => { + if (Date.now() - startTime < testDuration) { + createTestConnection().catch(() => {}); + } else { + clearInterval(connectionTimer); + } + }, connectionInterval); + + // Monitor connection counts + const monitorInterval = setInterval(() => { + const outerConnections = (outerProxy as any).connectionManager.getConnectionCount(); + const innerConnections = (innerProxy as any).connectionManager.getConnectionCount(); + + console.log(`Active connections - Outer: ${outerConnections}, Inner: ${innerConnections}, Created: ${connectionsCreated}, Completed: ${connectionsCompleted}`); + }, 2000); + + // Wait for test duration + cleanup time + await tools.delayFor(testDuration + 10000); + + clearInterval(connectionTimer); + clearInterval(monitorInterval); + + // Wait for all connections to complete + while (connectionsCompleted < connectionsCreated) { + await tools.delayFor(100); + } + + // Give some time for cleanup + await tools.delayFor(5000); + + // Check final connection counts + const finalOuterConnections = (outerProxy as any).connectionManager.getConnectionCount(); + const finalInnerConnections = (innerProxy as any).connectionManager.getConnectionCount(); + + console.log(`\nFinal connection counts:`); + console.log(`Outer proxy: ${finalOuterConnections}`); + console.log(`Inner proxy: ${finalInnerConnections}`); + console.log(`Total created: ${connectionsCreated}`); + console.log(`Total completed: ${connectionsCompleted}`); + + // Both proxies should have cleaned up all connections + expect(finalOuterConnections).toEqual(0); + expect(finalInnerConnections).toEqual(0); +}); + +tap.test('cleanup proxies', async () => { + await outerProxy.stop(); + await innerProxy.stop(); +}); + +export default tap.start(); \ No newline at end of file diff --git a/ts/core/utils/socket-utils.ts b/ts/core/utils/socket-utils.ts index 86cae3c..fc5f58a 100644 --- a/ts/core/utils/socket-utils.ts +++ b/ts/core/utils/socket-utils.ts @@ -258,22 +258,61 @@ export function createSocketWithErrorHandler(options: SafeSocketOptions): plugin // Create socket with immediate error handler attachment const socket = new plugins.net.Socket(); + // Track if connected + let connected = false; + let connectionTimeout: NodeJS.Timeout | null = null; + // Attach error handler BEFORE connecting to catch immediate errors socket.on('error', (error) => { console.error(`Socket connection error to ${host}:${port}: ${error.message}`); + // Clear the connection timeout if it exists + if (connectionTimeout) { + clearTimeout(connectionTimeout); + connectionTimeout = null; + } if (onError) { onError(error); } }); - // Attach connect handler if provided - if (onConnect) { - socket.on('connect', onConnect); - } + // Attach connect handler + const handleConnect = () => { + connected = true; + // Clear the connection timeout + if (connectionTimeout) { + clearTimeout(connectionTimeout); + connectionTimeout = null; + } + // Set inactivity timeout if provided (after connection is established) + if (timeout) { + socket.setTimeout(timeout); + } + if (onConnect) { + onConnect(); + } + }; - // Set timeout if provided + socket.on('connect', handleConnect); + + // Implement connection establishment timeout if (timeout) { - socket.setTimeout(timeout); + connectionTimeout = setTimeout(() => { + if (!connected && !socket.destroyed) { + // Connection timed out - destroy the socket + const error = new Error(`Connection timeout after ${timeout}ms to ${host}:${port}`); + (error as any).code = 'ETIMEDOUT'; + + console.error(`Socket connection timeout to ${host}:${port} after ${timeout}ms`); + + // Destroy the socket + socket.destroy(); + + // Call error handler + if (onError) { + onError(error); + } + } + }, timeout); } // Now attempt to connect - any immediate errors will be caught diff --git a/ts/proxies/smart-proxy/models/interfaces.ts b/ts/proxies/smart-proxy/models/interfaces.ts index 4196154..c395d2d 100644 --- a/ts/proxies/smart-proxy/models/interfaces.ts +++ b/ts/proxies/smart-proxy/models/interfaces.ts @@ -69,6 +69,7 @@ export interface ISmartProxyOptions { maxVersion?: string; // Timeout settings + connectionTimeout?: number; // Timeout for establishing connection to backend (ms), default: 30000 (30s) initialDataTimeout?: number; // Timeout for initial data/SNI (ms), default: 60000 (60s) socketTimeout?: number; // Socket inactivity timeout (ms), default: 3600000 (1h) inactivityCheckInterval?: number; // How often to check for inactive connections (ms), default: 60000 (60s) diff --git a/ts/proxies/smart-proxy/route-connection-handler.ts b/ts/proxies/smart-proxy/route-connection-handler.ts index 736528b..bd1b67d 100644 --- a/ts/proxies/smart-proxy/route-connection-handler.ts +++ b/ts/proxies/smart-proxy/route-connection-handler.ts @@ -1125,6 +1125,7 @@ export class RouteConnectionHandler { const targetSocket = createSocketWithErrorHandler({ port: finalTargetPort, host: finalTargetHost, + timeout: this.settings.connectionTimeout || 30000, // Connection timeout (default: 30s) onError: (error) => { // Connection failed - clean up everything immediately // Check if connection record is still valid (client might have disconnected)