diff --git a/readme.plan.md b/readme.plan.md index 7dc1d9f..8036e6f 100644 --- a/readme.plan.md +++ b/readme.plan.md @@ -1,337 +1,148 @@ -# SmartProxy Socket Cleanup Fix Plan +# SmartProxy Socket Handling Fix Plan + +Reread CLAUDE.md file for guidelines ## Problem Summary -The current socket cleanup implementation is too aggressive and closes long-lived connections prematurely. This affects: -- WebSocket connections in HTTPS passthrough -- Long-lived HTTP connections (SSE, streaming) -- Database connections -- Any connection that should remain open for hours +The SmartProxy server is experiencing critical issues: +1. **Server crashes** due to unhandled socket connection errors (ECONNREFUSED) +2. **Memory leak** with steadily rising active connection count +3. **Race conditions** between socket creation and error handler attachment +4. **Orphaned sockets** when server connections fail ## Root Causes -### 1. **Bilateral Socket Cleanup** -When one socket closes, both sockets are immediately destroyed: -```typescript -// In createSocketCleanupHandler -cleanupSocket(clientSocket, 'client'); -cleanupSocket(serverSocket, 'server'); // Both destroyed together! -``` +### 1. Delayed Error Handler Attachment +- Sockets created without immediate error handlers +- Error events can fire before handlers attached +- Causes uncaught exceptions and server crashes -### 2. **Aggressive Timeout Handling** -Timeout events immediately trigger connection cleanup: +### 2. Incomplete Cleanup Logic +- Client sockets not cleaned up when server connection fails +- Connection counter only decrements after BOTH sockets close +- Failed server connections leave orphaned client sockets + +### 3. Missing Global Error Handlers +- No process-level uncaughtException handler +- No process-level unhandledRejection handler +- Any unhandled error crashes entire server + +## Implementation Plan + +### Phase 1: Prevent Server Crashes (Critical) + +#### 1.1 Add Global Error Handlers +- [ ] Add global error handlers in main entry point (ts/index.ts or smart-proxy.ts) +- [ ] Log errors with context before graceful shutdown +- [ ] Implement graceful shutdown sequence + +#### 1.2 Fix Socket Creation Race Condition +- [ ] Modify socket creation to attach error handlers immediately +- [ ] Update all forwarding handlers (https-passthrough, http, etc.) +- [ ] Ensure error handlers attached in same tick as socket creation + +### Phase 2: Fix Memory Leaks (High Priority) + +#### 2.1 Fix Connection Cleanup Logic +- [ ] Clean up client socket immediately if server connection fails +- [ ] Decrement connection counter on any socket failure +- [ ] Implement proper cleanup for half-open connections + +#### 2.2 Improve Socket Utils +- [ ] Create new utility function for safe socket creation with immediate error handling +- [ ] Update createIndependentSocketHandlers to handle immediate failures +- [ ] Add connection tracking debug utilities + +### Phase 3: Comprehensive Testing (Important) + +#### 3.1 Create Test Cases +- [ ] Test ECONNREFUSED scenario +- [ ] Test timeout handling +- [ ] Test half-open connections +- [ ] Test rapid connect/disconnect cycles + +#### 3.2 Add Monitoring +- [ ] Add connection leak detection +- [ ] Add metrics for connection lifecycle +- [ ] Add debug logging for socket state transitions + +## Detailed Implementation Steps + +### Step 1: Global Error Handlers (ts/proxies/smart-proxy/smart-proxy.ts) ```typescript -socket.on('timeout', () => { - handleClose(`${prefix}_timeout`); // Destroys both sockets! +// Add in constructor or start method +process.on('uncaughtException', (error) => { + logger.log('error', 'Uncaught exception', { error }); + // Graceful shutdown +}); + +process.on('unhandledRejection', (reason, promise) => { + logger.log('error', 'Unhandled rejection', { reason, promise }); }); ``` -### 3. **Parity Check Forces Closure** -If one socket closes but the other remains open for >2 minutes, connection is forcefully terminated: +### Step 2: Safe Socket Creation Utility (ts/core/utils/socket-utils.ts) ```typescript -if (record.outgoingClosedTime && - !record.incoming.destroyed && - now - record.outgoingClosedTime > 120000) { - this.cleanupConnection(record, 'parity_check'); +export function createSocketWithErrorHandler( + options: net.NetConnectOpts, + onError: (err: Error) => void +): net.Socket { + const socket = net.connect(options); + socket.on('error', onError); + return socket; } ``` -### 4. **No Half-Open Connection Support** -The proxy doesn't support TCP half-open connections where one side closes while the other continues sending. +### Step 3: Fix HttpsPassthroughHandler (ts/forwarding/handlers/https-passthrough-handler.ts) +- Replace direct socket creation with safe creation +- Handle server connection failures immediately +- Clean up client socket on server connection failure -## Fix Implementation Plan +### Step 4: Fix Connection Counting +- Decrement on ANY socket close, not just when both close +- Track failed connections separately +- Add connection state tracking -### Phase 1: Fix Socket Cleanup (Prevent Premature Closure) +### Step 5: Update All Handlers +- [ ] https-passthrough-handler.ts +- [ ] http-handler.ts +- [ ] https-terminate-to-http-handler.ts +- [ ] https-terminate-to-https-handler.ts +- [ ] route-connection-handler.ts -#### 1.1 Modify `cleanupSocket()` to support graceful shutdown -```typescript -export interface CleanupOptions { - immediate?: boolean; // Force immediate destruction - allowDrain?: boolean; // Allow write buffer to drain - gracePeriod?: number; // Ms to wait before force close -} +## Success Criteria -export function cleanupSocket( - socket: Socket | TLSSocket | null, - socketName?: string, - options: CleanupOptions = {} -): Promise { - if (!socket || socket.destroyed) return Promise.resolve(); - - return new Promise((resolve) => { - const cleanup = () => { - socket.removeAllListeners(); - if (!socket.destroyed) { - socket.destroy(); - } - resolve(); - }; - - if (options.immediate) { - cleanup(); - } else if (options.allowDrain && socket.writable) { - // Allow pending writes to complete - socket.end(() => cleanup()); - - // Force cleanup after grace period - if (options.gracePeriod) { - setTimeout(cleanup, options.gracePeriod); - } - } else { - cleanup(); - } - }); -} -``` +1. **No server crashes** on ECONNREFUSED or other socket errors +2. **Active connections** remain stable (no steady increase) +3. **All sockets** properly cleaned up on errors +4. **Memory usage** remains stable under load +5. **Graceful handling** of all error scenarios -#### 1.2 Implement Independent Socket Tracking -```typescript -export function createIndependentSocketHandlers( - clientSocket: Socket, - serverSocket: Socket, - onBothClosed: (reason: string) => void -): { cleanupClient: () => void, cleanupServer: () => void } { - let clientClosed = false; - let serverClosed = false; - let clientReason = ''; - let serverReason = ''; - - const checkBothClosed = () => { - if (clientClosed && serverClosed) { - onBothClosed(`client: ${clientReason}, server: ${serverReason}`); - } - }; - - const cleanupClient = async (reason: string) => { - if (clientClosed) return; - clientClosed = true; - clientReason = reason; - - // Allow server to continue if still active - if (!serverClosed && serverSocket.writable) { - // Half-close: stop reading from client, let server finish - clientSocket.pause(); - clientSocket.unpipe(serverSocket); - await cleanupSocket(clientSocket, 'client', { allowDrain: true }); - } else { - await cleanupSocket(clientSocket, 'client'); - } - - checkBothClosed(); - }; - - const cleanupServer = async (reason: string) => { - if (serverClosed) return; - serverClosed = true; - serverReason = reason; - - // Allow client to continue if still active - if (!clientClosed && clientSocket.writable) { - // Half-close: stop reading from server, let client finish - serverSocket.pause(); - serverSocket.unpipe(clientSocket); - await cleanupSocket(serverSocket, 'server', { allowDrain: true }); - } else { - await cleanupSocket(serverSocket, 'server'); - } - - checkBothClosed(); - }; - - return { cleanupClient, cleanupServer }; -} -``` +## Testing Plan -### Phase 2: Fix Timeout Handling +1. Simulate ECONNREFUSED by targeting closed ports +2. Monitor active connection count over time +3. Stress test with rapid connections +4. Test with unreachable hosts +5. Test with slow/timing out connections -#### 2.1 Separate timeout handling from connection closure -```typescript -export function setupSocketHandlers( - socket: Socket | TLSSocket, - handleClose: (reason: string) => void, - handleTimeout?: (socket: Socket) => void, // New optional handler - errorPrefix?: string -): void { - socket.on('error', (error) => { - const prefix = errorPrefix || 'Socket'; - handleClose(`${prefix}_error: ${error.message}`); - }); - - socket.on('close', () => { - const prefix = errorPrefix || 'socket'; - handleClose(`${prefix}_closed`); - }); - - socket.on('timeout', () => { - if (handleTimeout) { - handleTimeout(socket); // Custom timeout handling - } else { - // Default: just log, don't close - console.warn(`Socket timeout: ${errorPrefix || 'socket'}`); - } - }); -} -``` +## Rollback Plan -#### 2.2 Update HTTPS passthrough handler -```typescript -// In https-passthrough-handler.ts -const { cleanupClient, cleanupServer } = createIndependentSocketHandlers( - clientSocket, - serverSocket, - (reason) => { - this.emit(ForwardingHandlerEvents.DISCONNECTED, { - remoteAddress, - bytesSent, - bytesReceived, - reason - }); - } -); +If issues arise: +1. Revert socket creation changes +2. Keep global error handlers (they add safety) +3. Add more detailed logging for debugging +4. Implement fixes incrementally -// Setup handlers with custom timeout handling -setupSocketHandlers(clientSocket, cleanupClient, (socket) => { - // Just reset timeout, don't close - socket.setTimeout(timeout); -}, 'client'); +## Timeline -setupSocketHandlers(serverSocket, cleanupServer, (socket) => { - // Just reset timeout, don't close - socket.setTimeout(timeout); -}, 'server'); -``` +- Phase 1: Immediate (prevents crashes) +- Phase 2: Within 24 hours (fixes leaks) +- Phase 3: Within 48 hours (ensures stability) -### Phase 3: Fix Connection Manager +## Notes -#### 3.1 Remove aggressive parity check -```typescript -// Remove or significantly increase the parity check timeout -// From 2 minutes to 30 minutes for long-lived connections -if (record.outgoingClosedTime && - !record.incoming.destroyed && - !record.connectionClosed && - now - record.outgoingClosedTime > 1800000) { // 30 minutes - // Only close if no data activity - if (now - record.lastActivity > 600000) { // 10 minutes of inactivity - this.cleanupConnection(record, 'parity_check'); - } -} -``` - -#### 3.2 Update cleanupConnection to check socket states -```typescript -public cleanupConnection(record: IConnectionRecord, reason: string = 'normal'): void { - if (!record.connectionClosed) { - record.connectionClosed = true; - - // Only cleanup sockets that are actually closed or inactive - if (record.incoming && (!record.incoming.writable || record.incoming.destroyed)) { - cleanupSocket(record.incoming, `${record.id}-incoming`, { immediate: true }); - } - - if (record.outgoing && (!record.outgoing.writable || record.outgoing.destroyed)) { - cleanupSocket(record.outgoing, `${record.id}-outgoing`, { immediate: true }); - } - - // If either socket is still active, don't remove the record yet - if ((record.incoming && record.incoming.writable) || - (record.outgoing && record.outgoing.writable)) { - record.connectionClosed = false; // Reset flag - return; // Don't finish cleanup - } - - // Continue with full cleanup... - } -} -``` - -### Phase 4: Testing and Validation - -#### 4.1 Test Cases to Implement -1. WebSocket connection should stay open for >1 hour -2. HTTP streaming response should continue after request closes -3. Half-open connections should work correctly -4. Verify no socket leaks with long-running connections -5. Test graceful shutdown with pending data - -#### 4.2 Socket Leak Prevention -- Ensure all event listeners are tracked and removed -- Use WeakMap for socket metadata to prevent memory leaks -- Implement connection count monitoring -- Add periodic health checks for orphaned sockets - -## Implementation Order - -1. **Day 1**: Implement graceful `cleanupSocket()` and independent socket handlers -2. **Day 2**: Update all handlers to use new cleanup mechanism -3. **Day 3**: Fix timeout handling to not close connections -4. **Day 4**: Update connection manager parity check and cleanup logic -5. **Day 5**: Comprehensive testing and leak detection - -## Configuration Changes - -Add new options to SmartProxyOptions: -```typescript -interface ISmartProxyOptions { - // Existing options... - - // New options for long-lived connections - socketCleanupGracePeriod?: number; // Default: 5000ms - allowHalfOpenConnections?: boolean; // Default: true - parityCheckTimeout?: number; // Default: 1800000ms (30 min) - timeoutBehavior?: 'close' | 'reset' | 'ignore'; // Default: 'reset' -} -``` - -## Success Metrics - -1. WebSocket connections remain stable for 24+ hours -2. No premature connection closures reported -3. Memory usage remains stable (no socket leaks) -4. Half-open connections work correctly -5. Graceful shutdown completes within grace period - -## Implementation Status: COMPLETED ✅ - -### Implemented Changes - -1. **Modified `cleanupSocket()` in `socket-utils.ts`** - - Added `CleanupOptions` interface with `immediate`, `allowDrain`, and `gracePeriod` options - - Implemented graceful shutdown support with write buffer draining - -2. **Created `createIndependentSocketHandlers()` in `socket-utils.ts`** - - Tracks socket states independently - - Supports half-open connections where one side can close while the other remains open - - Only triggers full cleanup when both sockets are closed - -3. **Updated `setupSocketHandlers()` in `socket-utils.ts`** - - Added optional `handleTimeout` parameter to customize timeout behavior - - Prevents automatic connection closure on timeout events - -4. **Updated HTTPS Passthrough Handler** - - Now uses `createIndependentSocketHandlers` for half-open support - - Custom timeout handling that resets timer instead of closing connection - - Manual data forwarding with backpressure handling - -5. **Updated Connection Manager** - - Extended parity check from 2 minutes to 30 minutes - - Added activity check before closing (10 minutes of inactivity required) - - Modified cleanup to check socket states before destroying - -6. **Updated Basic Forwarding in Route Connection Handler** - - Replaced simple `pipe()` with independent socket handlers - - Added manual data forwarding with backpressure support - - Removed bilateral close handlers to prevent premature cleanup - -### Test Results - -All tests passing: -- ✅ Long-lived connection test: Connection stayed open for 61+ seconds with periodic keep-alive -- ✅ Half-open connection test: One side closed while the other continued to send data -- ✅ No socket leaks or premature closures - -### Notes - -- The fix maintains backward compatibility -- No configuration changes required for existing deployments -- Long-lived connections now work correctly in both HTTPS passthrough and basic forwarding modes \ No newline at end of file +- The race condition is the most critical issue +- Connection counting logic needs complete overhaul +- Consider using a connection state machine for clarity +- Add connection lifecycle events for debugging \ No newline at end of file