2025-06-01 13:01:24 +00:00
# SmartProxy Socket Handling Fix Plan
Reread CLAUDE.md file for guidelines
2025-05-28 23:33:02 +00:00
2025-06-01 13:15:50 +00:00
## Implementation Summary (COMPLETED)
The critical socket handling issues have been fixed:
1. **Prevented Server Crashes** : Created `createSocketWithErrorHandler()` utility that attaches error handlers immediately upon socket creation, preventing unhandled ECONNREFUSED errors from crashing the server.
2. **Fixed Memory Leaks** : Updated forwarding handlers to properly clean up client sockets when server connections fail, ensuring connection records are removed from tracking.
3. **Key Changes Made** :
- Added `createSocketWithErrorHandler()` in `socket-utils.ts`
- Updated `https-passthrough-handler.ts` to use safe socket creation
- Updated `https-terminate-to-http-handler.ts` to use safe socket creation
- Ensured client sockets are destroyed when server connections fail
- Connection cleanup now triggered by socket close events
4. **Test Results** : Server no longer crashes on ECONNREFUSED errors, and connections are properly cleaned up.
2025-06-01 12:27:15 +00:00
## Problem Summary
2025-05-28 23:33:02 +00:00
2025-06-01 13:01:24 +00:00
The SmartProxy server is experiencing critical issues:
1. **Server crashes** due to unhandled socket connection errors (ECONNREFUSED)
2. **Memory leak** with steadily rising active connection count
3. **Race conditions** between socket creation and error handler attachment
4. **Orphaned sockets** when server connections fail
2025-05-28 23:33:02 +00:00
2025-06-01 12:27:15 +00:00
## Root Causes
2025-05-31 17:14:15 +00:00
2025-06-01 13:01:24 +00:00
### 1. Delayed Error Handler Attachment
- Sockets created without immediate error handlers
- Error events can fire before handlers attached
- Causes uncaught exceptions and server crashes
2025-05-31 17:14:15 +00:00
2025-06-01 13:01:24 +00:00
### 2. Incomplete Cleanup Logic
- Client sockets not cleaned up when server connection fails
- Connection counter only decrements after BOTH sockets close
- Failed server connections leave orphaned client sockets
2025-05-31 17:14:15 +00:00
2025-06-01 13:01:24 +00:00
### 3. Missing Global Error Handlers
- No process-level uncaughtException handler
- No process-level unhandledRejection handler
- Any unhandled error crashes entire server
2025-05-28 23:33:02 +00:00
2025-06-01 13:01:24 +00:00
## Implementation Plan
2025-05-28 23:33:02 +00:00
2025-06-01 13:01:24 +00:00
### Phase 1: Prevent Server Crashes (Critical)
2025-05-28 23:33:02 +00:00
2025-06-01 13:01:24 +00:00
#### 1.1 Add Global Error Handlers
2025-06-01 13:15:50 +00:00
- [x] ~~Add global error handlers in main entry point~~ (Removed per user request - no global handlers)
- [x] Log errors with context
- [x] ~~Implement graceful shutdown sequence~~ (Removed - handled locally)
2025-05-28 23:33:02 +00:00
2025-06-01 13:01:24 +00:00
#### 1.2 Fix Socket Creation Race Condition
2025-06-01 13:15:50 +00:00
- [x] Modify socket creation to attach error handlers immediately
- [x] Update all forwarding handlers (https-passthrough, http, etc.)
- [x] Ensure error handlers attached in same tick as socket creation
2025-05-31 17:14:15 +00:00
2025-06-01 13:01:24 +00:00
### Phase 2: Fix Memory Leaks (High Priority)
2025-05-28 23:33:02 +00:00
2025-06-01 13:01:24 +00:00
#### 2.1 Fix Connection Cleanup Logic
2025-06-01 13:15:50 +00:00
- [x] Clean up client socket immediately if server connection fails
- [x] Decrement connection counter on any socket failure (handled by socket close events)
- [x] Implement proper cleanup for half-open connections
2025-05-31 17:14:15 +00:00
2025-06-01 13:01:24 +00:00
#### 2.2 Improve Socket Utils
2025-06-01 13:15:50 +00:00
- [x] Create new utility function for safe socket creation with immediate error handling
- [x] Update createIndependentSocketHandlers to handle immediate failures
2025-06-01 13:01:24 +00:00
- [ ] Add connection tracking debug utilities
2025-05-28 23:33:02 +00:00
2025-06-01 13:01:24 +00:00
### Phase 3: Comprehensive Testing (Important)
2025-05-31 17:14:15 +00:00
2025-06-01 13:01:24 +00:00
#### 3.1 Create Test Cases
2025-06-01 13:15:50 +00:00
- [x] Test ECONNREFUSED scenario
2025-06-01 13:01:24 +00:00
- [ ] Test timeout handling
- [ ] Test half-open connections
- [ ] Test rapid connect/disconnect cycles
2025-05-28 23:33:02 +00:00
2025-06-01 13:01:24 +00:00
#### 3.2 Add Monitoring
- [ ] Add connection leak detection
- [ ] Add metrics for connection lifecycle
- [ ] Add debug logging for socket state transitions
2025-05-28 23:33:02 +00:00
2025-06-01 13:01:24 +00:00
## Detailed Implementation Steps
2025-05-28 23:33:02 +00:00
2025-06-01 13:01:24 +00:00
### Step 1: Global Error Handlers (ts/proxies/smart-proxy/smart-proxy.ts)
2025-05-28 23:33:02 +00:00
```typescript
2025-06-01 13:01:24 +00:00
// Add in constructor or start method
process.on('uncaughtException', (error) => {
logger.log('error', 'Uncaught exception', { error });
// Graceful shutdown
});
2025-05-31 17:14:15 +00:00
2025-06-01 13:01:24 +00:00
process.on('unhandledRejection', (reason, promise) => {
logger.log('error', 'Unhandled rejection', { reason, promise });
});
```
2025-05-31 17:14:15 +00:00
2025-06-01 13:01:24 +00:00
### Step 2: Safe Socket Creation Utility (ts/core/utils/socket-utils.ts)
2025-05-31 17:14:15 +00:00
```typescript
2025-06-01 13:01:24 +00:00
export function createSocketWithErrorHandler(
options: net.NetConnectOpts,
onError: (err: Error) => void
): net.Socket {
const socket = net.connect(options);
socket.on('error', onError);
return socket;
2025-06-01 12:27:15 +00:00
}
2025-05-31 17:14:15 +00:00
```
2025-06-01 13:01:24 +00:00
### Step 3: Fix HttpsPassthroughHandler (ts/forwarding/handlers/https-passthrough-handler.ts)
- Replace direct socket creation with safe creation
- Handle server connection failures immediately
- Clean up client socket on server connection failure
2025-05-28 23:33:02 +00:00
2025-06-01 13:01:24 +00:00
### Step 4: Fix Connection Counting
- Decrement on ANY socket close, not just when both close
- Track failed connections separately
- Add connection state tracking
2025-05-28 23:33:02 +00:00
2025-06-01 13:01:24 +00:00
### Step 5: Update All Handlers
- [ ] https-passthrough-handler.ts
- [ ] http-handler.ts
- [ ] https-terminate-to-http-handler.ts
- [ ] https-terminate-to-https-handler.ts
- [ ] route-connection-handler.ts
2025-05-28 23:33:02 +00:00
2025-06-01 13:01:24 +00:00
## Success Criteria
2025-05-28 23:33:02 +00:00
2025-06-01 13:01:24 +00:00
1. **No server crashes** on ECONNREFUSED or other socket errors
2. **Active connections** remain stable (no steady increase)
3. **All sockets** properly cleaned up on errors
4. **Memory usage** remains stable under load
5. **Graceful handling** of all error scenarios
2025-05-28 23:33:02 +00:00
2025-06-01 13:01:24 +00:00
## Testing Plan
2025-05-28 23:33:02 +00:00
2025-06-01 13:01:24 +00:00
1. Simulate ECONNREFUSED by targeting closed ports
2. Monitor active connection count over time
3. Stress test with rapid connections
4. Test with unreachable hosts
5. Test with slow/timing out connections
2025-05-29 00:24:57 +00:00
2025-06-01 13:01:24 +00:00
## Rollback Plan
2025-05-29 00:24:57 +00:00
2025-06-01 13:01:24 +00:00
If issues arise:
1. Revert socket creation changes
2. Keep global error handlers (they add safety)
3. Add more detailed logging for debugging
4. Implement fixes incrementally
2025-05-29 00:24:57 +00:00
2025-06-01 13:01:24 +00:00
## Timeline
2025-05-29 00:24:57 +00:00
2025-06-01 13:01:24 +00:00
- Phase 1: Immediate (prevents crashes)
- Phase 2: Within 24 hours (fixes leaks)
- Phase 3: Within 48 hours (ensures stability)
2025-05-29 00:24:57 +00:00
2025-06-01 13:01:24 +00:00
## Notes
2025-05-29 00:24:57 +00:00
2025-06-01 13:01:24 +00:00
- The race condition is the most critical issue
- Connection counting logic needs complete overhaul
- Consider using a connection state machine for clarity
- Add connection lifecycle events for debugging