Connection Management in SmartProxy

This document describes connection handling, cleanup mechanisms, and known issues in SmartProxy, particularly focusing on proxy chain configurations.

Connection Accumulation Investigation (January 2025)

Problem Statement

Connections may accumulate on the outer proxy in proxy chain configurations, despite the fixes implemented so far.

Historical Context

  • v19.5.12-v19.5.15: Major connection cleanup improvements
  • v19.5.19+: PROXY protocol support with WrappedSocket implementation
  • v19.5.20: Fixed race condition in immediate routing cleanup

Current Architecture

Connection Flow in Proxy Chains

Client → Outer Proxy (8001) → Inner Proxy (8002) → Backend (httpbin.org:443)
  1. Outer Proxy:

    • Accepts client connection
    • Sends a PROXY protocol header to the inner proxy (see the example after this list)
    • Tracks connection in ConnectionManager
    • Immediate routing for non-TLS ports
  2. Inner Proxy:

    • Parses PROXY protocol to get real client IP
    • Establishes connection to backend
    • Tracks its own connections separately
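
For reference, the PROXY protocol v1 header sent in step 1 is a single text line prepended to the TCP stream before any client data. A minimal sketch of building it (the helper function is illustrative, not part of SmartProxy's API):

// Build a PROXY protocol v1 header line, e.g.
// "PROXY TCP4 203.0.113.7 192.168.1.10 52100 8002\r\n"
// Field order: protocol, client address, proxy address, client port, proxy port.
function buildProxyV1Header(
  clientIp: string,
  clientPort: number,
  proxyIp: string,
  proxyPort: number
): string {
  return `PROXY TCP4 ${clientIp} ${proxyIp} ${clientPort} ${proxyPort}\r\n`;
}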

Potential Causes of Connection Accumulation

1. Race Condition in Immediate Routing

When a connection is immediately routed (non-TLS ports), there's a timing window:

// route-connection-handler.ts, line ~231
this.routeConnection(socket, record, '', undefined);
// Connection is routed before all setup is complete

Issue: If client disconnects during backend connection setup, cleanup may not trigger properly.
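
One way to close this window is to register the client's close handler before routing starts, so a disconnect during backend setup still tears down whatever has already been created. A rough sketch, assuming it runs inside the route handler and reusing the record fields referenced elsewhere in this document:

// Register the cleanup handler before routeConnection() is called, so a client
// disconnect during backend setup cannot slip through the timing window.
socket.once('close', () => {
  if (!record.connectionClosed) {
    // cleanupConnection also destroys record.outgoing, even mid-connect.
    this.connectionManager.cleanupConnection(record, 'client_closed_during_setup');
  }
});
this.routeConnection(socket, record, '', undefined);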

2. Outgoing Socket Assignment Timing

Despite the fix in v19.5.20:

// Line 1362 in setupDirectConnection
record.outgoing = targetSocket;

There's still a window between socket creation and the connect event where cleanup might miss the outgoing socket.
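
Cleanup therefore has to treat a still-connecting outgoing socket the same as an established one; socket.destroy() is safe to call before 'connect' has fired. A defensive sketch (not the exact ConnectionManager code):

// record.outgoing may exist but still be connecting (readyState === 'opening'),
// so destroy it unconditionally rather than waiting for the 'connect' event.
const outgoing = record.outgoing;
if (outgoing && !outgoing.destroyed) {
  if (outgoing.readyState === 'opening') {
    logger.log('debug', `Destroying still-connecting outgoing socket for ${record.id}`);
  }
  outgoing.destroy();
}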

3. Batch Cleanup Delays

ConnectionManager uses queued cleanup:

  • Batch size: 100 connections
  • Batch interval: 100ms
  • Under rapid connection/disconnection, queue might lag
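
To put numbers on that: with a batch size of 100 and an interval of 100ms, the queue drains at most roughly 1,000 closed connections per second, so a burst of several thousand rapid disconnects can leave stale records in tracking for multiple seconds even when every individual cleanup succeeds.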

4. Different Cleanup Paths

Multiple cleanup triggers exist:

  • Socket 'close' event
  • Socket 'error' event
  • Inactivity timeout
  • Connection timeout
  • Manual cleanup

Not all paths may properly handle proxy chain scenarios.
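
One way to make the paths equivalent is to funnel every trigger through a single idempotent entry point guarded by the connectionClosed flag, so it no longer matters which event fires first. A sketch (the IConnectionRecord type name and the exact wiring are assumptions):

// Single idempotent cleanup entry point; every trigger funnels through it.
function triggerCleanup(record: IConnectionRecord, reason: string): void {
  if (record.connectionClosed) return; // another path already handled it
  connectionManager.cleanupConnection(record, reason); // marks the record closed
}

// All five triggers call the same function with a distinct reason string:
record.incoming.on('close', () => triggerCleanup(record, 'incoming_close'));
record.incoming.on('error', () => triggerCleanup(record, 'incoming_error'));
// ...and likewise for inactivity timeout, connection timeout, and manual cleanup.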

5. Keep-Alive Connection Handling

Keep-alive connections have special treatment:

  • Extended inactivity timeout (6x normal)
  • Warning before closure
  • May accumulate if backend is unresponsive

Observed Symptoms

  1. Outer proxy connection count grows over time
  2. Inner proxy maintains zero or low connection count
  3. Connections show as closed in logs but remain in tracking
  4. Memory usage gradually increases

Debug Strategies

1. Enhanced Logging

Add connection state logging at key points:

// When outgoing socket is created
logger.log('debug', `Outgoing socket created for ${connectionId}`, {
  hasOutgoing: !!record.outgoing,
  outgoingState: record.outgoing?.readyState
});

2. Connection State Inspection

Periodically log detailed connection state:

for (const [id, record] of connectionManager.getConnections()) {
  console.log({
    id,
    age: Date.now() - record.incomingStartTime,
    incomingDestroyed: record.incoming.destroyed,
    outgoingDestroyed: record.outgoing?.destroyed,
    hasCleanupTimer: !!record.cleanupTimer
  });
}

3. Cleanup Verification

Track cleanup completion:

// In cleanupConnection
logger.log('debug', `Cleanup completed for ${record.id}`, {
  recordsRemaining: this.connectionRecords.size
});

Recommendations

  1. Immediate Cleanup for Proxy Chains

    • Skip batch queue for proxy chain connections
    • Use synchronous cleanup when PROXY protocol is detected
  2. Socket State Validation

    • Check both destroyed and readyState before cleanup decisions
    • Handle 'opening' state sockets explicitly
  3. Timeout Adjustments

    • Shorter timeouts for proxy chain connections
    • More aggressive cleanup for connections without data transfer
  4. Connection Limits

    • Per-route connection limits
    • Backpressure when approaching limits (see the sketch after this list)
  5. Monitoring

    • Export connection metrics
    • Alert on connection count thresholds
    • Track connection age distribution
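
A per-route limit can be as small as a counter keyed by route name that refuses new sockets at a threshold. A minimal sketch (the limit value and helper names are placeholders, not existing SmartProxy options):

// Per-route connection counter with a hard limit (values are illustrative).
const perRouteCounts = new Map<string, number>();
const PER_ROUTE_LIMIT = 1000;

function tryAcquire(routeName: string): boolean {
  const current = perRouteCounts.get(routeName) ?? 0;
  if (current >= PER_ROUTE_LIMIT) return false; // backpressure: refuse the socket
  perRouteCounts.set(routeName, current + 1);
  return true;
}

function release(routeName: string): void {
  const current = perRouteCounts.get(routeName) ?? 1;
  perRouteCounts.set(routeName, Math.max(0, current - 1));
}

// On accept: if (!tryAcquire(routeName)) socket.destroy();
// On cleanup: release(routeName);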

Test Scenarios to Reproduce

  1. Rapid Connect/Disconnect

    # Create many short-lived connections
    for i in {1..1000}; do
      (echo -n | nc localhost 8001) &
    done
    
  2. Slow Backend

    • Configure the inner proxy to connect to an unresponsive or unreachable backend (see the probe after this list)
    • Monitor outer proxy connection count
  3. Mixed Traffic

    • Combine TLS and non-TLS connections
    • Add keep-alive connections
    • Observe accumulation patterns
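
The slow/unreachable backend case can also be reproduced without the proxies at all: connecting to a non-routable address such as 10.255.255.1 (the same address used in the root-cause analysis below) leaves the socket in the 'opening' state until an OS-level TCP timeout fires, if it fires at all. A small standalone probe:

import * as net from 'net';

// Connecting to a non-routable address: 'connect' never fires, and 'error'
// may only arrive after the OS-level TCP timeout (often minutes).
const socket = net.connect({ host: '10.255.255.1', port: 443 });
socket.on('connect', () => console.log('connected (unexpected)'));
socket.on('error', (err) => console.log('error:', err.message));

const started = Date.now();
setInterval(() => {
  const seconds = Math.round((Date.now() - started) / 1000);
  console.log(`${seconds}s readyState=${socket.readyState} destroyed=${socket.destroyed}`);
}, 5000);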

Future Improvements

  1. Connection Pool Isolation

    • Separate pools for proxy chain vs direct connections
    • Different cleanup strategies per pool
  2. Circuit Breaker

    • Detect accumulation and trigger aggressive cleanup
    • Temporarily refuse new connections when near the limit
  3. Connection State Machine

    • Explicit states: CONNECTING, ESTABLISHED, CLOSING, CLOSED
    • State transition validation
    • Timeout per state (see the sketch after this list)
  4. Metrics Collection

    • Connection lifecycle events
    • Cleanup success/failure rates
    • Time spent in each state
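
As a sketch of the state machine idea above, explicit states plus a validated transition table might look like this (states and transitions are illustrative):

// Explicit connection states with a validated transition table.
type ConnectionState = 'CONNECTING' | 'ESTABLISHED' | 'CLOSING' | 'CLOSED';

const allowedTransitions: Record<ConnectionState, ConnectionState[]> = {
  CONNECTING: ['ESTABLISHED', 'CLOSING', 'CLOSED'],
  ESTABLISHED: ['CLOSING', 'CLOSED'],
  CLOSING: ['CLOSED'],
  CLOSED: [],
};

function transition(record: { id: string; state: ConnectionState }, next: ConnectionState): void {
  if (!allowedTransitions[record.state].includes(next)) {
    throw new Error(`Invalid transition ${record.state} -> ${next} for ${record.id}`);
  }
  record.state = next;
  // A per-state timeout could be (re)armed here, e.g. shorter for CONNECTING.
}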

Root Cause Identified (January 2025)

The primary issue is on the inner proxy. When the backend is unreachable (e.g., a non-routable IP like 10.255.255.1):

  1. The outgoing socket gets stuck in the "opening" state indefinitely
  2. createSocketWithErrorHandler in socket-utils.ts doesn't implement a connection establishment timeout
  3. socket.setTimeout() only handles inactivity AFTER the connection is established, not during the connect phase
  4. Connections accumulate because they never transition to an error state
  5. Socket timeout warnings fire, but the connections are preserved as keep-alive

Code Issue:

// socket-utils.ts line 275
if (timeout) {
  socket.setTimeout(timeout);  // This only handles inactivity, not connection!
}

Required Fix:

  1. Add connectionTimeout to ISmartProxyOptions interface:
// In interfaces.ts
connectionTimeout?: number; // Timeout for establishing connection (ms), default: 30000 (30s)
  2. Update createSocketWithErrorHandler in socket-utils.ts:
export function createSocketWithErrorHandler(options: SafeSocketOptions): plugins.net.Socket {
  const { port, host, onError, onConnect, timeout } = options;
  
  const socket = new plugins.net.Socket();
  let connected = false;
  let connectionTimeout: NodeJS.Timeout | null = null;
  
  socket.on('error', (error) => {
    if (connectionTimeout) {
      clearTimeout(connectionTimeout);
      connectionTimeout = null;
    }
    if (onError) onError(error);
  });
  
  socket.on('connect', () => {
    connected = true;
    if (connectionTimeout) {
      clearTimeout(connectionTimeout);
      connectionTimeout = null;
    }
    if (timeout) socket.setTimeout(timeout); // Set inactivity timeout
    if (onConnect) onConnect();
  });
  
  // Implement connection establishment timeout
  if (timeout) {
    connectionTimeout = setTimeout(() => {
      if (!connected && !socket.destroyed) {
        const error = new Error(`Connection timeout after ${timeout}ms to ${host}:${port}`);
        (error as any).code = 'ETIMEDOUT';
        socket.destroy();
        if (onError) onError(error);
      }
    }, timeout);
  }
  
  socket.connect(port, host);
  return socket;
}
  3. Pass connectionTimeout in route-connection-handler.ts:
const targetSocket = createSocketWithErrorHandler({
  port: finalTargetPort,
  host: finalTargetHost,
  timeout: this.settings.connectionTimeout || 30000, // Connection timeout
  onError: (error) => { /* existing */ },
  onConnect: async () => { /* existing */ }
});
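
Note that the sketch above clears the establishment timer in both the 'connect' and 'error' handlers and additionally checks the connected flag inside the timer callback, so onError is never invoked for a connection that has actually established.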

Investigation Results (January 2025)

Based on extensive testing with debug scripts:

  1. Normal Operation: In controlled tests, connections are properly cleaned up:

    • Immediate routing cleanup handler properly destroys outgoing connections
    • Both outer and inner proxies maintain 0 connections after clients disconnect
    • Keep-alive connections are tracked and cleaned up correctly
  2. Potential Edge Cases Not Covered by Tests:

    • HTTP/2 Connections: May have different lifecycle than HTTP/1.1
    • WebSocket Connections: Long-lived upgrade connections might persist
    • Partial TLS Handshakes: Connections that start TLS but don't complete
    • PROXY Protocol Parse Failures: Malformed headers from untrusted sources
    • Connection Pool Reuse: HttpProxy component may maintain its own pools
  3. Timing-Sensitive Scenarios:

    • Client disconnects exactly when record.outgoing is being assigned
    • Backend connects but immediately RSTs
    • Proxy chain where middle proxy restarts
    • Multiple rapid reconnects with same source IP/port
  4. Configuration-Specific Issues:

    • Mixed sendProxyProtocol settings in chain
    • Different keepAlive settings between proxies
    • Mismatched timeout values
    • Routes with forwardingEngine: 'nftables'

Additional Debug Points

Add these debug logs to identify the specific scenario:

// In route-connection-handler.ts setupDirectConnection
logger.log('debug', `Setting outgoing socket for ${connectionId}`, {
  timestamp: Date.now(),
  hasOutgoing: !!record.outgoing,
  socketState: targetSocket.readyState
});

// In connection-manager.ts cleanupConnection
logger.log('debug', `Cleanup attempt for ${record.id}`, {
  alreadyClosed: record.connectionClosed,
  hasIncoming: !!record.incoming,
  hasOutgoing: !!record.outgoing,
  incomingDestroyed: record.incoming?.destroyed,
  outgoingDestroyed: record.outgoing?.destroyed
});

Workarounds

Until the fix described above is in place:

  1. Periodic Force Cleanup:

    setInterval(() => {
      const connections = connectionManager.getConnections();
      for (const [id, record] of connections) {
        if (record.incoming?.destroyed && !record.connectionClosed) {
          connectionManager.cleanupConnection(record, 'force_cleanup');
        }
      }
    }, 60000); // Every minute
    
  2. Connection Age Limit:

    // Add max connection age check
    const maxAge = 3600000; // 1 hour
    if (Date.now() - record.incomingStartTime > maxAge) {
      connectionManager.cleanupConnection(record, 'max_age');
    }
    
  3. Aggressive Timeout Settings:

    {
      socketTimeout: 60000,        // 1 minute
      inactivityTimeout: 300000,   // 5 minutes
      connectionCleanupInterval: 30000  // 30 seconds
    }
    
Related Files

  • /ts/proxies/smart-proxy/route-connection-handler.ts - Main connection handling
  • /ts/proxies/smart-proxy/connection-manager.ts - Connection tracking and cleanup
  • /ts/core/utils/socket-utils.ts - Socket cleanup utilities
  • /test/test.proxy-chain-cleanup.node.ts - Test for connection cleanup
  • /test/test.proxy-chaining-accumulation.node.ts - Test for accumulation prevention
  • /.nogit/debug/connection-accumulation-debug.ts - Debug script for connection states
  • /.nogit/debug/connection-accumulation-keepalive.ts - Keep-alive specific tests
  • /.nogit/debug/connection-accumulation-http.ts - HTTP traffic through proxy chains

Summary

Issue Identified: Connection accumulation occurs on the inner proxy (not outer) when backends are unreachable.

Root Cause: The createSocketWithErrorHandler function in socket-utils.ts doesn't implement connection establishment timeout. It only sets socket.setTimeout() which handles inactivity AFTER connection is established, not during the connect phase.

Impact: When connecting to unreachable IPs (e.g., 10.255.255.1), outgoing sockets remain in "opening" state indefinitely, causing connections to accumulate.

Fix Required:

  1. Add connectionTimeout setting to ISmartProxyOptions
  2. Implement proper connection timeout in createSocketWithErrorHandler
  3. Pass the timeout value from route-connection-handler

Workaround Until Fixed: Configure shorter socket timeouts and use the periodic force cleanup suggested above.

The connection cleanup mechanisms have been significantly improved in v19.5.20:

  1. Race condition fixed by setting record.outgoing before connecting
  2. Immediate routing cleanup handler always destroys outgoing connections
  3. Tests confirm no accumulation in standard scenarios with reachable backends

However, the missing connection establishment timeout causes accumulation when backends are unreachable or very slow to connect.