Improve error handling and logging for outgoing connections in RouteConnectionHandler
165
readme.plan.md
@@ -1,165 +0,0 @@
# SmartProxy Socket Handling Fix Plan

Reread CLAUDE.md file for guidelines

## Implementation Summary (COMPLETED)

The critical socket handling issues have been fixed:

1. **Prevented Server Crashes**: Created the `createSocketWithErrorHandler()` utility, which attaches an error handler immediately upon socket creation so that unhandled ECONNREFUSED errors no longer crash the server.

2. **Fixed Memory Leaks**: Updated the forwarding handlers to clean up client sockets when server connections fail, ensuring connection records are removed from tracking.

3. **Key Changes Made**:
   - Added `createSocketWithErrorHandler()` in `socket-utils.ts`
   - Updated `https-passthrough-handler.ts` to use safe socket creation
   - Updated `https-terminate-to-http-handler.ts` to use safe socket creation
   - Ensured client sockets are destroyed when server connections fail
   - Connection cleanup is now triggered by socket close events

4. **Test Results**: The server no longer crashes on ECONNREFUSED errors, and connections are properly cleaned up.

## Problem Summary

The SmartProxy server is experiencing critical issues:
1. **Server crashes** due to unhandled socket connection errors (ECONNREFUSED)
2. **Memory leak** with steadily rising active connection count
3. **Race conditions** between socket creation and error handler attachment
4. **Orphaned sockets** when server connections fail

## Root Causes

### 1. Delayed Error Handler Attachment
- Sockets are created without immediate error handlers
- Error events can fire before handlers are attached
- This causes uncaught exceptions and server crashes

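The crash mechanism can be reproduced without any network traffic. `net.Socket` is an `EventEmitter`, and an `'error'` event emitted with no listener attached is thrown instead of delivered. The sketch below is illustrative only: `emitConnectionRefused` is a hypothetical helper, not part of SmartProxy, and the error message is made up for the demonstration.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical helper: emits an ECONNREFUSED-style error and reports
// whether the emit threw, i.e. whether the error went unhandled.
function emitConnectionRefused(emitter: EventEmitter): boolean {
  try {
    emitter.emit('error', new Error('connect ECONNREFUSED 127.0.0.1:9'));
    return false; // a listener consumed the error
  } catch {
    return true; // no listener attached: Node throws the error
  }
}

// No 'error' listener: the emit throws.
const unprotected = new EventEmitter();
const unhandledThrew = emitConnectionRefused(unprotected);

// Listener attached before the error fires: the emit is safe.
const guarded = new EventEmitter();
const seenMessages: string[] = [];
guarded.on('error', (err: Error) => seenMessages.push(err.message));
const handledThrew = emitConnectionRefused(guarded);

console.log({ unhandledThrew, handledThrew, seenMessages });
```

On a real socket the error is emitted asynchronously, so there is no surrounding `try`/`catch` to absorb the throw and it surfaces as an uncaught exception.
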
### 2. Incomplete Cleanup Logic
- Client sockets are not cleaned up when the server connection fails
- The connection counter only decrements after BOTH sockets close
- Failed server connections leave orphaned client sockets

### 3. Missing Global Error Handlers
- No process-level uncaughtException handler
- No process-level unhandledRejection handler
- Any unhandled error crashes entire server

## Implementation Plan

### Phase 1: Prevent Server Crashes (Critical)

#### 1.1 Add Global Error Handlers
- [x] ~~Add global error handlers in main entry point~~ (Removed per user request - no global handlers)
- [x] Log errors with context
- [x] ~~Implement graceful shutdown sequence~~ (Removed - handled locally)

#### 1.2 Fix Socket Creation Race Condition
- [x] Modify socket creation to attach error handlers immediately
- [x] Update all forwarding handlers (https-passthrough, http, etc.)
- [x] Ensure error handlers attached in same tick as socket creation

### Phase 2: Fix Memory Leaks (High Priority)

#### 2.1 Fix Connection Cleanup Logic
- [x] Clean up client socket immediately if server connection fails
- [x] Decrement connection counter on any socket failure (handled by socket close events)
- [x] Implement proper cleanup for half-open connections

#### 2.2 Improve Socket Utils
- [x] Create new utility function for safe socket creation with immediate error handling
- [x] Update createIndependentSocketHandlers to handle immediate failures
- [ ] Add connection tracking debug utilities

### Phase 3: Comprehensive Testing (Important)

#### 3.1 Create Test Cases
- [x] Test ECONNREFUSED scenario
- [ ] Test timeout handling
- [ ] Test half-open connections
- [ ] Test rapid connect/disconnect cycles

#### 3.2 Add Monitoring
- [ ] Add connection leak detection
- [ ] Add metrics for connection lifecycle
- [ ] Add debug logging for socket state transitions

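One possible shape for the connection leak detection item is a check over periodic samples of the active connection count. The sketch below is an assumption about how this could work, not existing SmartProxy code: `ConnectionLeakDetector`, its window size, and the "strictly rising across the whole window" rule are all hypothetical choices.

```typescript
// Hypothetical leak detector: reports a suspected leak when the sampled
// active connection count rises strictly across `windowSize` samples.
class ConnectionLeakDetector {
  private readonly windowSize: number;
  private samples: number[] = [];

  constructor(windowSize: number) {
    this.windowSize = windowSize;
  }

  // Record one sample; returns true if the current window rises steadily.
  record(activeConnections: number): boolean {
    this.samples.push(activeConnections);
    if (this.samples.length > this.windowSize) {
      this.samples.shift(); // keep only the most recent window
    }
    if (this.samples.length < this.windowSize) {
      return false; // not enough data yet
    }
    return this.samples.every(
      (value, i) => i === 0 || value > this.samples[i - 1],
    );
  }
}

// Example: four rising samples trip the detector, a drop resets it.
const detector = new ConnectionLeakDetector(4);
const steadyRise = [10, 12, 15, 19].map((n) => detector.record(n));
const afterDrop = detector.record(11);
console.log({ steadyRise, afterDrop });
```

In practice the samples would come from a timer reading the proxy's active connection count, and a real detector would likely tolerate small dips rather than require a strict rise.
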
## Detailed Implementation Steps

### Step 1: Global Error Handlers (ts/proxies/smart-proxy/smart-proxy.ts)
```typescript
// Add in constructor or start method
process.on('uncaughtException', (error) => {
  logger.log('error', 'Uncaught exception', { error });
  // Graceful shutdown
});

process.on('unhandledRejection', (reason, promise) => {
  logger.log('error', 'Unhandled rejection', { reason, promise });
});
```

### Step 2: Safe Socket Creation Utility (ts/core/utils/socket-utils.ts)
```typescript
import * as net from 'net';

export function createSocketWithErrorHandler(
  options: net.NetConnectOpts,
  onError: (err: Error) => void
): net.Socket {
  const socket = net.connect(options);
  // Attach the handler in the same tick as creation, before any
  // connection error can be emitted.
  socket.on('error', onError);
  return socket;
}
```

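A forwarding handler could use the utility roughly as follows, destroying the client socket when the backend connection fails. This is a sketch under assumptions: `makeBackendErrorHandler`, `connectToBackend`, and the `onCleanup` callback are hypothetical names, and the utility is repeated here only so the block is self-contained.

```typescript
import * as net from 'node:net';

// Repeated from the plan so this sketch runs on its own.
function createSocketWithErrorHandler(
  options: net.NetConnectOpts,
  onError: (err: Error) => void,
): net.Socket {
  const socket = net.connect(options);
  socket.on('error', onError);
  return socket;
}

// Hypothetical: builds the error callback a forwarding handler passes in.
// It destroys the client socket and reports the failure exactly once.
function makeBackendErrorHandler(
  clientSocket: net.Socket,
  onCleanup: (reason: string) => void,
): (err: Error) => void {
  let handled = false;
  return (err: Error) => {
    if (handled) return; // ignore repeated errors on the same socket
    handled = true;
    if (!clientSocket.destroyed) clientSocket.destroy();
    onCleanup(`backend_error: ${err.message}`);
  };
}

// Hypothetical wiring; defined but not invoked in this sketch.
function connectToBackend(
  clientSocket: net.Socket,
  host: string,
  port: number,
  onCleanup: (reason: string) => void,
): net.Socket {
  const backend = createSocketWithErrorHandler(
    { host, port },
    makeBackendErrorHandler(clientSocket, onCleanup),
  );
  backend.once('connect', () => {
    // Only start piping once the backend connection is established.
    clientSocket.pipe(backend);
    backend.pipe(clientSocket);
  });
  return backend;
}
```

Keeping the error callback as a separate function makes the cleanup path testable without opening a real connection.
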
### Step 3: Fix HttpsPassthroughHandler (ts/forwarding/handlers/https-passthrough-handler.ts)
- Replace direct socket creation with safe creation
- Handle server connection failures immediately
- Clean up client socket on server connection failure

### Step 4: Fix Connection Counting
- Decrement on ANY socket close, not just when both close
- Track failed connections separately
- Add connection state tracking

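The counting rules above can be sketched as a small tracker whose `close` is idempotent, so whichever socket closes first removes the record and later closes are no-ops. `ConnectionTracker`, `CloseReason`, and the connection ids are hypothetical names for illustration and do not reflect SmartProxy's actual connection manager.

```typescript
// Hypothetical close reasons; 'server_connect_failed' is counted separately.
type CloseReason = 'client_closed' | 'server_closed' | 'server_connect_failed';

interface ConnectionRecord {
  id: string;
  openedAt: number;
}

class ConnectionTracker {
  private readonly active = new Map<string, ConnectionRecord>();
  private failedCount = 0;

  open(id: string): void {
    this.active.set(id, { id, openedAt: Date.now() });
  }

  // Call from the 'close' handler of EITHER socket. The first call removes
  // the record; later calls for the same id return false and change nothing.
  close(id: string, reason: CloseReason): boolean {
    if (!this.active.delete(id)) return false;
    if (reason === 'server_connect_failed') this.failedCount++;
    return true;
  }

  get activeCount(): number {
    return this.active.size;
  }

  get failedConnections(): number {
    return this.failedCount;
  }
}

const tracker = new ConnectionTracker();
tracker.open('conn-1');
tracker.open('conn-2');
const firstClose = tracker.close('conn-1', 'server_connect_failed'); // backend refused
const secondClose = tracker.close('conn-1', 'client_closed'); // no double decrement
console.log(tracker.activeCount, tracker.failedConnections);
```

Deriving the active count from the map size, rather than keeping a separate counter, removes the possibility of the counter drifting away from the tracked records.
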
### Step 5: Update All Handlers
- [ ] https-passthrough-handler.ts
- [ ] http-handler.ts
- [ ] https-terminate-to-http-handler.ts
- [ ] https-terminate-to-https-handler.ts
- [ ] route-connection-handler.ts

## Success Criteria

1. **No server crashes** on ECONNREFUSED or other socket errors
2. **Active connections** remain stable (no steady increase)
3. **All sockets** properly cleaned up on errors
4. **Memory usage** remains stable under load
5. **Graceful handling** of all error scenarios

## Testing Plan

1. Simulate ECONNREFUSED by targeting closed ports
2. Monitor active connection count over time
3. Stress test with rapid connections
4. Test with unreachable hosts
5. Test with slow or timing-out connections

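For the first item, one way to get a port that is very likely closed is to let the OS assign a free port to a temporary server and then close that server. The sketch below assumes loopback networking is available; `getClosedPort` and `probe` are hypothetical helpers, not part of the project's test suite.

```typescript
import * as net from 'node:net';

// Ask the OS for a free port, then release it so nothing listens there.
function getClosedPort(): Promise<number> {
  return new Promise((resolve, reject) => {
    const server = net.createServer();
    server.once('error', reject);
    server.listen(0, '127.0.0.1', () => {
      const address = server.address() as net.AddressInfo;
      server.close(() => resolve(address.port));
    });
  });
}

// Connect to the port and resolve with the error code, or 'connected'.
function probe(port: number): Promise<string> {
  return new Promise((resolve) => {
    const socket = net.connect({ host: '127.0.0.1', port });
    // Handler attached in the same tick as socket creation.
    socket.once('error', (err: Error) => {
      resolve((err as NodeJS.ErrnoException).code ?? 'unknown');
    });
    socket.once('connect', () => {
      socket.destroy();
      resolve('connected');
    });
  });
}

// Usage (inside an async test):
//   const port = await getClosedPort();
//   const result = await probe(port); // typically 'ECONNREFUSED'
```

There is a small window in which another process could claim the released port, so a test built on this should treat `'connected'` as an inconclusive result rather than a pass.
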
## Rollback Plan

If issues arise:
1. Revert socket creation changes
2. Keep global error handlers (they add safety)
3. Add more detailed logging for debugging
4. Implement fixes incrementally

## Timeline

- Phase 1: Immediate (prevents crashes)
- Phase 2: Within 24 hours (fixes leaks)
- Phase 3: Within 48 hours (ensures stability)

## Notes

- The race condition is the most critical issue
- Connection counting logic needs complete overhaul
- Consider using a connection state machine for clarity
- Add connection lifecycle events for debugging
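
The state machine and lifecycle events suggested in the notes could look roughly like the sketch below. The state names, the transition table, and `ConnectionStateMachine` itself are assumptions made for illustration; the plan does not specify them.

```typescript
// Hypothetical connection states and the transitions allowed between them.
type ConnectionState = 'connecting' | 'established' | 'closing' | 'closed';

const allowedTransitions: Record<ConnectionState, ConnectionState[]> = {
  connecting: ['established', 'closed'], // straight to closed on ECONNREFUSED
  established: ['closing', 'closed'],
  closing: ['closed'],
  closed: [], // terminal state
};

type TransitionListener = (from: ConnectionState, to: ConnectionState) => void;

class ConnectionStateMachine {
  private current: ConnectionState = 'connecting';
  private readonly listeners: TransitionListener[] = [];

  get state(): ConnectionState {
    return this.current;
  }

  // Lifecycle hook, e.g. for debug logging of state transitions.
  onTransition(listener: TransitionListener): void {
    this.listeners.push(listener);
  }

  // Returns false and leaves the state unchanged for an illegal transition.
  transition(next: ConnectionState): boolean {
    if (!allowedTransitions[this.current].includes(next)) return false;
    const previous = this.current;
    this.current = next;
    for (const listener of this.listeners) listener(previous, next);
    return true;
  }
}

const machine = new ConnectionStateMachine();
const transitionLog: string[] = [];
machine.onTransition((from, to) => transitionLog.push(`${from}->${to}`));
const refused = machine.transition('closed'); // backend refused the connection
const reopened = machine.transition('established'); // rejected: closed is terminal
console.log(machine.state, transitionLog, refused, reopened);
```

Making `closed` terminal means a late event on an already cleaned-up connection is rejected instead of silently reviving its record.
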