smartproxy/readme.plan.md
Philipp Kunz ad80798210 Enhance socket cleanup and management for improved connection handling
- Refactor cleanupSocket function to support options for immediate destruction, allowing drain, and grace periods.
- Introduce createIndependentSocketHandlers for better management of half-open connections between client and server sockets.
- Update various handlers (HTTP, HTTPS passthrough, HTTPS terminate) to utilize new cleanup and socket management functions.
- Implement custom timeout handling in socket setup to prevent immediate closure during keep-alive connections.
- Add tests for long-lived connections and half-open connection scenarios to ensure stability and reliability.
- Adjust connection manager to handle socket cleanup based on activity status, improving resource management.
2025-06-01 12:27:15 +00:00

11 KiB

SmartProxy Socket Cleanup Fix Plan

Problem Summary

The current socket cleanup implementation is too aggressive and closes long-lived connections prematurely. This affects:

  • WebSocket connections in HTTPS passthrough
  • Long-lived HTTP connections (SSE, streaming)
  • Database connections
  • Any connection that should remain open for hours

Root Causes

1. Bilateral Socket Cleanup

When one socket closes, both sockets are immediately destroyed:

// In createSocketCleanupHandler
cleanupSocket(clientSocket, 'client');
cleanupSocket(serverSocket, 'server');  // Both destroyed together!

2. Aggressive Timeout Handling

Timeout events immediately trigger connection cleanup:

socket.on('timeout', () => {
  handleClose(`${prefix}_timeout`);  // Destroys both sockets!
});

3. Parity Check Forces Closure

If one socket closes but the other remains open for >2 minutes, connection is forcefully terminated:

if (record.outgoingClosedTime && 
    !record.incoming.destroyed && 
    now - record.outgoingClosedTime > 120000) {
  this.cleanupConnection(record, 'parity_check');
}

4. No Half-Open Connection Support

The proxy doesn't support TCP half-open connections where one side closes while the other continues sending.

Fix Implementation Plan

Phase 1: Fix Socket Cleanup (Prevent Premature Closure)

1.1 Modify cleanupSocket() to support graceful shutdown

export interface CleanupOptions {
  immediate?: boolean;      // Force immediate destruction
  allowDrain?: boolean;     // Allow write buffer to drain
  gracePeriod?: number;     // Ms to wait before force close
}

export function cleanupSocket(
  socket: Socket | TLSSocket | null, 
  socketName?: string,
  options: CleanupOptions = {}
): Promise<void> {
  if (!socket || socket.destroyed) return Promise.resolve();
  
  return new Promise<void>((resolve) => {
    const cleanup = () => {
      socket.removeAllListeners();
      if (!socket.destroyed) {
        socket.destroy();
      }
      resolve();
    };
    
    if (options.immediate) {
      cleanup();
    } else if (options.allowDrain && socket.writable) {
      // Allow pending writes to complete
      socket.end(() => cleanup());
      
      // Force cleanup after grace period
      if (options.gracePeriod) {
        setTimeout(cleanup, options.gracePeriod);
      }
    } else {
      cleanup();
    }
  });
}

1.2 Implement Independent Socket Tracking

export function createIndependentSocketHandlers(
  clientSocket: Socket,
  serverSocket: Socket,
  onBothClosed: (reason: string) => void
): { cleanupClient: () => void, cleanupServer: () => void } {
  let clientClosed = false;
  let serverClosed = false;
  let clientReason = '';
  let serverReason = '';
  
  const checkBothClosed = () => {
    if (clientClosed && serverClosed) {
      onBothClosed(`client: ${clientReason}, server: ${serverReason}`);
    }
  };
  
  const cleanupClient = async (reason: string) => {
    if (clientClosed) return;
    clientClosed = true;
    clientReason = reason;
    
    // Allow server to continue if still active
    if (!serverClosed && serverSocket.writable) {
      // Half-close: stop reading from client, let server finish
      clientSocket.pause();
      clientSocket.unpipe(serverSocket);
      await cleanupSocket(clientSocket, 'client', { allowDrain: true });
    } else {
      await cleanupSocket(clientSocket, 'client');
    }
    
    checkBothClosed();
  };
  
  const cleanupServer = async (reason: string) => {
    if (serverClosed) return;
    serverClosed = true;
    serverReason = reason;
    
    // Allow client to continue if still active
    if (!clientClosed && clientSocket.writable) {
      // Half-close: stop reading from server, let client finish
      serverSocket.pause();
      serverSocket.unpipe(clientSocket);
      await cleanupSocket(serverSocket, 'server', { allowDrain: true });
    } else {
      await cleanupSocket(serverSocket, 'server');
    }
    
    checkBothClosed();
  };
  
  return { cleanupClient, cleanupServer };
}

Phase 2: Fix Timeout Handling

2.1 Separate timeout handling from connection closure

export function setupSocketHandlers(
  socket: Socket | TLSSocket,
  handleClose: (reason: string) => void,
  handleTimeout?: (socket: Socket) => void,  // New optional handler
  errorPrefix?: string
): void {
  socket.on('error', (error) => {
    const prefix = errorPrefix || 'Socket';
    handleClose(`${prefix}_error: ${error.message}`);
  });
  
  socket.on('close', () => {
    const prefix = errorPrefix || 'socket';
    handleClose(`${prefix}_closed`);
  });
  
  socket.on('timeout', () => {
    if (handleTimeout) {
      handleTimeout(socket);  // Custom timeout handling
    } else {
      // Default: just log, don't close
      console.warn(`Socket timeout: ${errorPrefix || 'socket'}`);
    }
  });
}

2.2 Update HTTPS passthrough handler

// In https-passthrough-handler.ts
const { cleanupClient, cleanupServer } = createIndependentSocketHandlers(
  clientSocket,
  serverSocket,
  (reason) => {
    this.emit(ForwardingHandlerEvents.DISCONNECTED, {
      remoteAddress,
      bytesSent,
      bytesReceived,
      reason
    });
  }
);

// Setup handlers with custom timeout handling
setupSocketHandlers(clientSocket, cleanupClient, (socket) => {
  // Just reset timeout, don't close
  socket.setTimeout(timeout);
}, 'client');

setupSocketHandlers(serverSocket, cleanupServer, (socket) => {
  // Just reset timeout, don't close
  socket.setTimeout(timeout);
}, 'server');

Phase 3: Fix Connection Manager

3.1 Remove aggressive parity check

// Remove or significantly increase the parity check timeout
// From 2 minutes to 30 minutes for long-lived connections
if (record.outgoingClosedTime &&
    !record.incoming.destroyed &&
    !record.connectionClosed &&
    now - record.outgoingClosedTime > 1800000) {  // 30 minutes
  // Only close if no data activity
  if (now - record.lastActivity > 600000) {  // 10 minutes of inactivity
    this.cleanupConnection(record, 'parity_check');
  }
}

3.2 Update cleanupConnection to check socket states

public cleanupConnection(record: IConnectionRecord, reason: string = 'normal'): void {
  if (!record.connectionClosed) {
    record.connectionClosed = true;
    
    // Only cleanup sockets that are actually closed or inactive
    if (record.incoming && (!record.incoming.writable || record.incoming.destroyed)) {
      cleanupSocket(record.incoming, `${record.id}-incoming`, { immediate: true });
    }
    
    if (record.outgoing && (!record.outgoing.writable || record.outgoing.destroyed)) {
      cleanupSocket(record.outgoing, `${record.id}-outgoing`, { immediate: true });
    }
    
    // If either socket is still active, don't remove the record yet
    if ((record.incoming && record.incoming.writable) || 
        (record.outgoing && record.outgoing.writable)) {
      record.connectionClosed = false;  // Reset flag
      return;  // Don't finish cleanup
    }
    
    // Continue with full cleanup...
  }
}

Phase 4: Testing and Validation

4.1 Test Cases to Implement

  1. WebSocket connection should stay open for >1 hour
  2. HTTP streaming response should continue after request closes
  3. Half-open connections should work correctly
  4. Verify no socket leaks with long-running connections
  5. Test graceful shutdown with pending data

4.2 Socket Leak Prevention

  • Ensure all event listeners are tracked and removed
  • Use WeakMap for socket metadata to prevent memory leaks
  • Implement connection count monitoring
  • Add periodic health checks for orphaned sockets

Implementation Order

  1. Day 1: Implement graceful cleanupSocket() and independent socket handlers
  2. Day 2: Update all handlers to use new cleanup mechanism
  3. Day 3: Fix timeout handling to not close connections
  4. Day 4: Update connection manager parity check and cleanup logic
  5. Day 5: Comprehensive testing and leak detection

Configuration Changes

Add new options to SmartProxyOptions:

interface ISmartProxyOptions {
  // Existing options...
  
  // New options for long-lived connections
  socketCleanupGracePeriod?: number;        // Default: 5000ms
  allowHalfOpenConnections?: boolean;       // Default: true
  parityCheckTimeout?: number;              // Default: 1800000ms (30 min)
  timeoutBehavior?: 'close' | 'reset' | 'ignore';  // Default: 'reset'
}

Success Metrics

  1. WebSocket connections remain stable for 24+ hours
  2. No premature connection closures reported
  3. Memory usage remains stable (no socket leaks)
  4. Half-open connections work correctly
  5. Graceful shutdown completes within grace period

Implementation Status: COMPLETED

Implemented Changes

  1. Modified cleanupSocket() in socket-utils.ts

    • Added CleanupOptions interface with immediate, allowDrain, and gracePeriod options
    • Implemented graceful shutdown support with write buffer draining
  2. Created createIndependentSocketHandlers() in socket-utils.ts

    • Tracks socket states independently
    • Supports half-open connections where one side can close while the other remains open
    • Only triggers full cleanup when both sockets are closed
  3. Updated setupSocketHandlers() in socket-utils.ts

    • Added optional handleTimeout parameter to customize timeout behavior
    • Prevents automatic connection closure on timeout events
  4. Updated HTTPS Passthrough Handler

    • Now uses createIndependentSocketHandlers for half-open support
    • Custom timeout handling that resets timer instead of closing connection
    • Manual data forwarding with backpressure handling
  5. Updated Connection Manager

    • Extended parity check from 2 minutes to 30 minutes
    • Added activity check before closing (10 minutes of inactivity required)
    • Modified cleanup to check socket states before destroying
  6. Updated Basic Forwarding in Route Connection Handler

    • Replaced simple pipe() with independent socket handlers
    • Added manual data forwarding with backpressure support
    • Removed bilateral close handlers to prevent premature cleanup

Test Results

All tests passing:

  • Long-lived connection test: Connection stayed open for 61+ seconds with periodic keep-alive
  • Half-open connection test: One side closed while the other continued to send data
  • No socket leaks or premature closures

Notes

  • The fix maintains backward compatibility
  • No configuration changes required for existing deployments
  • Long-lived connections now work correctly in both HTTPS passthrough and basic forwarding modes