2025-06-01 13:01:24 +00:00

# SmartProxy Socket Handling Fix Plan

Reread CLAUDE.md file for guidelines

## Problem Summary

The SmartProxy server is experiencing critical issues:

1. **Server crashes** due to unhandled socket connection errors (ECONNREFUSED)
2. **Memory leak** with steadily rising active connection count
3. **Race conditions** between socket creation and error handler attachment
4. **Orphaned sockets** when server connections fail

## Root Causes

### 1. Delayed Error Handler Attachment

- Sockets are created without immediate error handlers
- Error events can fire before handlers are attached
- An `error` event with no listener is thrown as an uncaught exception, which crashes the server
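
The crash mechanism described above can be reproduced without any network traffic. The sketch below relies on documented Node.js behaviour: an `EventEmitter` (which `net.Socket` extends) throws when `'error'` is emitted and no listener is registered. The helper name `emitErrorThrows` and the simulated error message are illustrative assumptions, not SmartProxy code.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical helper for illustration: reports whether emitting 'error'
// on the given emitter throws, which is what happens when no 'error'
// listener has been attached yet.
export function emitErrorThrows(emitter: EventEmitter): boolean {
  try {
    emitter.emit('error', new Error('connect ECONNREFUSED (simulated)'));
    return false; // a listener consumed the error
  } catch {
    return true; // no listener: the error was thrown at the emit site
  }
}

// Stand-ins for a socket before and after its handler is attached.
const unguarded = new EventEmitter();
const guarded = new EventEmitter();
guarded.on('error', () => {
  /* handled: log and clean up here */
});

console.log(emitErrorThrows(unguarded)); // true
console.log(emitErrorThrows(guarded)); // false
```

On a real socket the throw happens inside Node's I/O callback rather than in a `try` block, so it surfaces as an uncaught exception.
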

### 2. Incomplete Cleanup Logic

- Client sockets are not cleaned up when the server connection fails
- The connection counter only decrements after BOTH sockets close
- Failed server connections leave orphaned client sockets

### 3. Missing Global Error Handlers

- No process-level `uncaughtException` handler
- No process-level `unhandledRejection` handler
- Any unhandled error crashes the entire server

## Implementation Plan

### Phase 1: Prevent Server Crashes (Critical)

#### 1.1 Add Global Error Handlers

- [ ] Add global error handlers in the main entry point (ts/index.ts or smart-proxy.ts)
- [ ] Log errors with context before graceful shutdown
- [ ] Implement graceful shutdown sequence

#### 1.2 Fix Socket Creation Race Condition

- [ ] Modify socket creation to attach error handlers immediately
- [ ] Update all forwarding handlers (https-passthrough, http, etc.)
- [ ] Ensure error handlers are attached in the same tick as socket creation

### Phase 2: Fix Memory Leaks (High Priority)

#### 2.1 Fix Connection Cleanup Logic

- [ ] Clean up client socket immediately if server connection fails
- [ ] Decrement connection counter on any socket failure
- [ ] Implement proper cleanup for half-open connections

#### 2.2 Improve Socket Utils

- [ ] Create new utility function for safe socket creation with immediate error handling
- [ ] Update createIndependentSocketHandlers to handle immediate failures
- [ ] Add connection tracking debug utilities

### Phase 3: Comprehensive Testing (Important)

#### 3.1 Create Test Cases

- [ ] Test ECONNREFUSED scenario
- [ ] Test timeout handling
- [ ] Test half-open connections
- [ ] Test rapid connect/disconnect cycles

#### 3.2 Add Monitoring

- [ ] Add connection leak detection
- [ ] Add metrics for connection lifecycle
- [ ] Add debug logging for socket state transitions
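
One possible shape for the leak detection item is a periodic scan that flags connections with no recent activity. This is a minimal sketch under stated assumptions: the `ConnectionRecord` shape, the `findLeakSuspects` name, and the idle threshold are all hypothetical and not taken from SmartProxy.

```typescript
// Hypothetical record shape; SmartProxy's real connection records may differ.
export interface ConnectionRecord {
  id: string;
  lastActivityAt: number; // epoch milliseconds of the last read or write
}

// Returns the ids of connections that have been idle longer than maxIdleMs.
// `now` is passed in so the check stays deterministic and easy to test.
export function findLeakSuspects(
  records: readonly ConnectionRecord[],
  now: number,
  maxIdleMs: number,
): string[] {
  const suspects: string[] = [];
  for (const record of records) {
    if (now - record.lastActivityAt > maxIdleMs) {
      suspects.push(record.id);
    }
  }
  return suspects;
}
```

Calling this from an interval timer and logging the returned ids would give an early signal when the active connection count starts to drift upward.
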

## Detailed Implementation Steps

### Step 1: Global Error Handlers (ts/proxies/smart-proxy/smart-proxy.ts)

```typescript
// Add in constructor or start method.
// `logger` is assumed to be in scope (the project's logger).
process.on('uncaughtException', (error) => {
  logger.log('error', 'Uncaught exception', { error });
  // Graceful shutdown: process state is unreliable after an uncaught
  // exception, so stop accepting connections and exit.
});

process.on('unhandledRejection', (reason, promise) => {
  logger.log('error', 'Unhandled rejection', { reason, promise });
});
```
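
The "Graceful shutdown" placeholder in the handler above could be filled in along the following lines. This is a sketch, not the project's implementation: `Stoppable`, `ShutdownDeps`, and `createShutdown` are hypothetical names, and the proxy is only assumed to expose an async `stop()` method.

```typescript
// Every name here is hypothetical; the sketch only illustrates the sequence.
export interface Stoppable {
  stop(): Promise<void>;
}

export interface ShutdownDeps {
  log: (level: 'info' | 'error', message: string) => void;
  exit: (code: number) => void;
  timeoutMs: number;
}

// Builds a shutdown function that runs at most once: log the reason,
// stop the proxy, then exit. A timer forces the exit if stop() hangs.
export function createShutdown(
  proxy: Stoppable,
  deps: ShutdownDeps,
): (reason: string, exitCode: number) => Promise<void> {
  let started = false;
  return async (reason, exitCode) => {
    if (started) return; // a second fatal error must not restart the sequence
    started = true;
    deps.log('error', `Shutting down: ${reason}`);
    // Safety net: force the exit if stop() never settles.
    const timer = setTimeout(() => deps.exit(exitCode), deps.timeoutMs);
    try {
      await proxy.stop();
    } catch (err) {
      deps.log('error', `Error while stopping: ${String(err)}`);
    } finally {
      clearTimeout(timer);
      deps.exit(exitCode);
    }
  };
}
```

In the entry point, `exit` would typically be bound to `process.exit`, and the returned function would be called from both the `uncaughtException` and `unhandledRejection` handlers. Injecting `log` and `exit` keeps the sequence testable without terminating the test process.
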

### Step 2: Safe Socket Creation Utility (ts/core/utils/socket-utils.ts)

```typescript
import * as net from 'net';

// Creates an outbound socket and attaches the error handler in the same
// tick, before any connection error can be emitted.
export function createSocketWithErrorHandler(
  options: net.NetConnectOpts,
  onError: (err: Error) => void
): net.Socket {
  const socket = net.connect(options);
  socket.on('error', onError);
  return socket;
}
```

### Step 3: Fix HttpsPassthroughHandler (ts/forwarding/handlers/https-passthrough-handler.ts)

- Replace direct socket creation with safe creation
- Handle server connection failures immediately
- Clean up client socket on server connection failure
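
The three points above could be combined roughly as follows. This is a sketch only: `connectBackend` and its callback are hypothetical names and do not reflect the actual `HttpsPassthroughHandler` code. It uses only standard `net.Socket` calls (`net.connect`, `on`, `once`, `pipe`, `destroy`).

```typescript
import * as net from 'node:net';

// Hypothetical helper, not the actual HttpsPassthroughHandler code.
// Connects to the backend and guarantees that a failure on either side
// tears down both sockets and is reported exactly once.
export function connectBackend(
  clientSocket: net.Socket,
  target: { host: string; port: number },
  onFinished: (reason: string) => void,
): net.Socket {
  const serverSocket = net.connect(target);
  let finished = false;

  const finish = (reason: string): void => {
    if (finished) return; // 'error' is followed by 'close'; report only once
    finished = true;
    clientSocket.destroy();
    serverSocket.destroy();
    onFinished(reason);
  };

  // Attached in the same tick as net.connect(), before any I/O callback runs.
  serverSocket.on('error', (err: Error) => finish(`server_error: ${err.message}`));
  serverSocket.on('close', () => finish('server_closed'));
  clientSocket.on('error', (err: Error) => finish(`client_error: ${err.message}`));
  clientSocket.on('close', () => finish('client_closed'));

  // Start piping only after the backend connection is established.
  serverSocket.once('connect', () => {
    clientSocket.pipe(serverSocket);
    serverSocket.pipe(clientSocket);
  });

  return serverSocket;
}
```

The `onFinished` callback is the natural place to decrement the connection counter, since it fires once per connection regardless of which side failed first.
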

### Step 4: Fix Connection Counting

- Decrement on ANY socket close, not just when both close, and only once per connection
- Track failed connections separately
- Add connection state tracking
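
The counting rule above can be sketched with an idempotent release keyed by connection id, so that the first close from either socket decrements the count and later closes for the same connection are ignored. `ConnectionTracker`, `CloseReason`, and the method names are assumptions made for illustration.

```typescript
// Hypothetical tracker; type, field, and method names are assumptions.
export type CloseReason = 'normal' | 'client_error' | 'server_error' | 'timeout';

export class ConnectionTracker {
  private readonly active = new Set<string>();
  private failed = 0;

  open(id: string): void {
    this.active.add(id);
  }

  // Idempotent: the first close from either socket releases the connection;
  // later calls for the same id return false and change nothing.
  close(id: string, reason: CloseReason): boolean {
    if (!this.active.delete(id)) return false;
    if (reason !== 'normal') this.failed += 1;
    return true;
  }

  get activeCount(): number {
    return this.active.size;
  }

  get failedCount(): number {
    return this.failed;
  }
}
```

Using a set of ids instead of a bare integer means a double decrement cannot push the count below the true number of live connections.
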

### Step 5: Update All Handlers

- [ ] https-passthrough-handler.ts
- [ ] http-handler.ts
- [ ] https-terminate-to-http-handler.ts
- [ ] https-terminate-to-https-handler.ts
- [ ] route-connection-handler.ts

## Success Criteria

1. **No server crashes** on ECONNREFUSED or other socket errors
2. **Active connections** remain stable (no steady increase)
3. **All sockets** properly cleaned up on errors
4. **Memory usage** remains stable under load
5. **Graceful handling** of all error scenarios

## Testing Plan

1. Simulate ECONNREFUSED by targeting closed ports
2. Monitor active connection count over time
3. Stress test with rapid connections
4. Test with unreachable hosts
5. Test with slow or timed-out connections
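
For the first item, one way to obtain a port that is very likely closed is to listen on port 0, record the assigned port, and close the listener before connecting. The helper below is a hypothetical test utility (`probeClosedPort` is not an existing function); on a typical loopback interface the returned code would be expected to be `ECONNREFUSED`, though that depends on the environment.

```typescript
import * as net from 'node:net';

// Hypothetical test helper: reserves a free port by listening on port 0,
// closes the listener, then connects to the now-closed port and resolves
// with the resulting error code.
export function probeClosedPort(): Promise<string | undefined> {
  return new Promise((resolve, reject) => {
    const placeholder = net.createServer();
    placeholder.once('error', reject);
    placeholder.listen(0, '127.0.0.1', () => {
      const address = placeholder.address();
      if (address === null || typeof address === 'string') {
        placeholder.close();
        reject(new Error('expected a TCP address'));
        return;
      }
      const port = address.port;
      placeholder.close(() => {
        const socket = net.connect({ host: '127.0.0.1', port });
        // Handler attached in the same tick as net.connect().
        socket.once('error', (err: NodeJS.ErrnoException) => {
          socket.destroy();
          resolve(err.code);
        });
        socket.once('connect', () => {
          socket.destroy();
          reject(new Error(`port ${port} unexpectedly accepted a connection`));
        });
      });
    });
  });
}
```

The same closed port can then be configured as a SmartProxy route target to check that the proxy survives the refused connection and that the active connection count returns to its previous value.
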

## Rollback Plan

If issues arise:

1. Revert socket creation changes
2. Keep global error handlers (they add safety)
3. Add more detailed logging for debugging
4. Implement fixes incrementally

## Timeline

- Phase 1: Immediate (prevents crashes)
- Phase 2: Within 24 hours (fixes leaks)
- Phase 3: Within 48 hours (ensures stability)

## Notes

- The race condition is the most critical issue
- The connection counting logic needs a complete overhaul
- Consider using a connection state machine for clarity
- Add connection lifecycle events for debugging
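
The state machine idea from the notes could start as small as the sketch below. The state names, the transition table, and the `ConnectionStateMachine` class are assumptions chosen for illustration; the optional callback doubles as a lifecycle event hook for debugging.

```typescript
// Hypothetical states and transitions; adjust to the real connection lifecycle.
export type ConnState = 'connecting' | 'open' | 'closing' | 'closed';

const allowedTransitions: Record<ConnState, readonly ConnState[]> = {
  connecting: ['open', 'closing', 'closed'],
  open: ['closing', 'closed'],
  closing: ['closed'],
  closed: [],
};

export class ConnectionStateMachine {
  private current: ConnState = 'connecting';
  private readonly onTransition: (from: ConnState, to: ConnState) => void;

  constructor(onTransition: (from: ConnState, to: ConnState) => void = () => {}) {
    this.onTransition = onTransition;
  }

  get state(): ConnState {
    return this.current;
  }

  // Applies the transition if it is allowed and returns whether it happened.
  // Illegal transitions (for example closed -> open) are rejected, not thrown.
  transition(to: ConnState): boolean {
    if (allowedTransitions[this.current].indexOf(to) === -1) return false;
    const from = this.current;
    this.current = to;
    this.onTransition(from, to);
    return true;
  }
}
```

Because `closed` has no outgoing transitions, cleanup code that runs on the transition into `closed` executes at most once per connection.
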