Improve error handling and logging for outgoing connections in RouteConnectionHandler
This commit is contained in:
parent
705a59413d
commit
2d240671ab
165
readme.plan.md
165
readme.plan.md
@ -1,165 +0,0 @@
|
||||
# SmartProxy Socket Handling Fix Plan
|
||||
|
||||
Reread CLAUDE.md file for guidelines
|
||||
|
||||
## Implementation Summary (COMPLETED)
|
||||
|
||||
The critical socket handling issues have been fixed:
|
||||
|
||||
1. **Prevented Server Crashes**: Created `createSocketWithErrorHandler()` utility that attaches error handlers immediately upon socket creation, preventing unhandled ECONNREFUSED errors from crashing the server.
|
||||
|
||||
2. **Fixed Memory Leaks**: Updated forwarding handlers to properly clean up client sockets when server connections fail, ensuring connection records are removed from tracking.
|
||||
|
||||
3. **Key Changes Made**:
|
||||
- Added `createSocketWithErrorHandler()` in `socket-utils.ts`
|
||||
- Updated `https-passthrough-handler.ts` to use safe socket creation
|
||||
- Updated `https-terminate-to-http-handler.ts` to use safe socket creation
|
||||
- Ensured client sockets are destroyed when server connections fail
|
||||
- Connection cleanup now triggered by socket close events
|
||||
|
||||
4. **Test Results**: Server no longer crashes on ECONNREFUSED errors, and connections are properly cleaned up.
|
||||
|
||||
## Problem Summary
|
||||
|
||||
The SmartProxy server is experiencing critical issues:
|
||||
1. **Server crashes** due to unhandled socket connection errors (ECONNREFUSED)
|
||||
2. **Memory leak** with steadily rising active connection count
|
||||
3. **Race conditions** between socket creation and error handler attachment
|
||||
4. **Orphaned sockets** when server connections fail
|
||||
|
||||
## Root Causes
|
||||
|
||||
### 1. Delayed Error Handler Attachment
|
||||
- Sockets created without immediate error handlers
|
||||
- Error events can fire before handlers attached
|
||||
- Causes uncaught exceptions and server crashes
|
||||
|
||||
### 2. Incomplete Cleanup Logic
|
||||
- Client sockets not cleaned up when server connection fails
|
||||
- Connection counter only decrements after BOTH sockets close
|
||||
- Failed server connections leave orphaned client sockets
|
||||
|
||||
### 3. Missing Global Error Handlers
|
||||
- No process-level uncaughtException handler
|
||||
- No process-level unhandledRejection handler
|
||||
- Any unhandled error crashes entire server
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
### Phase 1: Prevent Server Crashes (Critical)
|
||||
|
||||
#### 1.1 Add Global Error Handlers
|
||||
- [x] ~~Add global error handlers in main entry point~~ (Removed per user request - no global handlers)
|
||||
- [x] Log errors with context
|
||||
- [x] ~~Implement graceful shutdown sequence~~ (Removed - handled locally)
|
||||
|
||||
#### 1.2 Fix Socket Creation Race Condition
|
||||
- [x] Modify socket creation to attach error handlers immediately
|
||||
- [x] Update all forwarding handlers (https-passthrough, http, etc.)
|
||||
- [x] Ensure error handlers attached in same tick as socket creation
|
||||
|
||||
### Phase 2: Fix Memory Leaks (High Priority)
|
||||
|
||||
#### 2.1 Fix Connection Cleanup Logic
|
||||
- [x] Clean up client socket immediately if server connection fails
|
||||
- [x] Decrement connection counter on any socket failure (handled by socket close events)
|
||||
- [x] Implement proper cleanup for half-open connections
|
||||
|
||||
#### 2.2 Improve Socket Utils
|
||||
- [x] Create new utility function for safe socket creation with immediate error handling
|
||||
- [x] Update createIndependentSocketHandlers to handle immediate failures
|
||||
- [ ] Add connection tracking debug utilities
|
||||
|
||||
### Phase 3: Comprehensive Testing (Important)
|
||||
|
||||
#### 3.1 Create Test Cases
|
||||
- [x] Test ECONNREFUSED scenario
|
||||
- [ ] Test timeout handling
|
||||
- [ ] Test half-open connections
|
||||
- [ ] Test rapid connect/disconnect cycles
|
||||
|
||||
#### 3.2 Add Monitoring
|
||||
- [ ] Add connection leak detection
|
||||
- [ ] Add metrics for connection lifecycle
|
||||
- [ ] Add debug logging for socket state transitions
|
||||
|
||||
## Detailed Implementation Steps
|
||||
|
||||
### Step 1: Global Error Handlers (ts/proxies/smart-proxy/smart-proxy.ts)
|
||||
```typescript
|
||||
// Add in constructor or start method
|
||||
process.on('uncaughtException', (error) => {
|
||||
logger.log('error', 'Uncaught exception', { error });
|
||||
// Graceful shutdown
|
||||
});
|
||||
|
||||
process.on('unhandledRejection', (reason, promise) => {
|
||||
logger.log('error', 'Unhandled rejection', { reason, promise });
|
||||
});
|
||||
```
|
||||
|
||||
### Step 2: Safe Socket Creation Utility (ts/core/utils/socket-utils.ts)
|
||||
```typescript
|
||||
export function createSocketWithErrorHandler(
|
||||
options: net.NetConnectOpts,
|
||||
onError: (err: Error) => void
|
||||
): net.Socket {
|
||||
const socket = net.connect(options);
|
||||
socket.on('error', onError);
|
||||
return socket;
|
||||
}
|
||||
```
|
||||
|
||||
### Step 3: Fix HttpsPassthroughHandler (ts/forwarding/handlers/https-passthrough-handler.ts)
|
||||
- Replace direct socket creation with safe creation
|
||||
- Handle server connection failures immediately
|
||||
- Clean up client socket on server connection failure
|
||||
|
||||
### Step 4: Fix Connection Counting
|
||||
- Decrement on ANY socket close, not just when both close
|
||||
- Track failed connections separately
|
||||
- Add connection state tracking
|
||||
|
||||
### Step 5: Update All Handlers
|
||||
- [ ] https-passthrough-handler.ts
|
||||
- [ ] http-handler.ts
|
||||
- [ ] https-terminate-to-http-handler.ts
|
||||
- [ ] https-terminate-to-https-handler.ts
|
||||
- [ ] route-connection-handler.ts
|
||||
|
||||
## Success Criteria
|
||||
|
||||
1. **No server crashes** on ECONNREFUSED or other socket errors
|
||||
2. **Active connections** remain stable (no steady increase)
|
||||
3. **All sockets** properly cleaned up on errors
|
||||
4. **Memory usage** remains stable under load
|
||||
5. **Graceful handling** of all error scenarios
|
||||
|
||||
## Testing Plan
|
||||
|
||||
1. Simulate ECONNREFUSED by targeting closed ports
|
||||
2. Monitor active connection count over time
|
||||
3. Stress test with rapid connections
|
||||
4. Test with unreachable hosts
|
||||
5. Test with slow/timing out connections
|
||||
|
||||
## Rollback Plan
|
||||
|
||||
If issues arise:
|
||||
1. Revert socket creation changes
|
||||
2. Keep global error handlers (they add safety)
|
||||
3. Add more detailed logging for debugging
|
||||
4. Implement fixes incrementally
|
||||
|
||||
## Timeline
|
||||
|
||||
- Phase 1: Immediate (prevents crashes)
|
||||
- Phase 2: Within 24 hours (fixes leaks)
|
||||
- Phase 3: Within 48 hours (ensures stability)
|
||||
|
||||
## Notes
|
||||
|
||||
- The race condition is the most critical issue
|
||||
- Connection counting logic needs complete overhaul
|
||||
- Consider using a connection state machine for clarity
|
||||
- Add connection lifecycle events for debugging
|
@ -1074,19 +1074,13 @@ export class RouteConnectionHandler {
|
||||
}
|
||||
|
||||
// Create the target socket with immediate error handling
|
||||
let targetSocket: plugins.net.Socket;
|
||||
let connectionEstablished = false;
|
||||
|
||||
// Flag to track if initial connection failed
|
||||
let connectionFailed = false;
|
||||
|
||||
targetSocket = createSocketWithErrorHandler({
|
||||
const targetSocket = createSocketWithErrorHandler({
|
||||
port: finalTargetPort,
|
||||
host: finalTargetHost,
|
||||
onError: (error) => {
|
||||
// Mark connection as failed
|
||||
connectionFailed = true;
|
||||
|
||||
// Connection failed - clean up immediately
|
||||
// Connection failed - clean up everything immediately
|
||||
logger.log('error',
|
||||
`Connection setup error for ${connectionId} to ${finalTargetHost}:${finalTargetPort}: ${error.message} (${(error as any).code})`,
|
||||
{
|
||||
@ -1099,6 +1093,20 @@ export class RouteConnectionHandler {
|
||||
}
|
||||
);
|
||||
|
||||
// Log specific error types for easier debugging
|
||||
if ((error as any).code === 'ECONNREFUSED') {
|
||||
logger.log('error',
|
||||
`Connection ${connectionId}: Target ${finalTargetHost}:${finalTargetPort} refused connection. Check if the target service is running and listening on that port.`,
|
||||
{
|
||||
connectionId,
|
||||
targetHost: finalTargetHost,
|
||||
targetPort: finalTargetPort,
|
||||
recommendation: 'Check if the target service is running and listening on that port.',
|
||||
component: 'route-handler'
|
||||
}
|
||||
);
|
||||
}
|
||||
|
||||
// Resume the incoming socket to prevent it from hanging
|
||||
socket.resume();
|
||||
|
||||
@ -1107,120 +1115,12 @@ export class RouteConnectionHandler {
|
||||
socket.destroy();
|
||||
}
|
||||
|
||||
// Clean up the connection record
|
||||
this.connectionManager.initiateCleanupOnce(record, `connection_failed_${(error as any).code || 'unknown'}`);
|
||||
}
|
||||
});
|
||||
// Clean up the connection record - this is critical!
|
||||
this.connectionManager.cleanupConnection(record, `connection_failed_${(error as any).code || 'unknown'}`);
|
||||
},
|
||||
onConnect: () => {
|
||||
connectionEstablished = true;
|
||||
|
||||
// Only proceed with setup if connection didn't fail immediately
|
||||
if (!connectionFailed) {
|
||||
record.outgoing = targetSocket;
|
||||
record.outgoingStartTime = Date.now();
|
||||
|
||||
// Apply socket optimizations
|
||||
targetSocket.setNoDelay(this.settings.noDelay);
|
||||
|
||||
// Apply keep-alive settings if enabled
|
||||
if (this.settings.keepAlive) {
|
||||
targetSocket.setKeepAlive(true, this.settings.keepAliveInitialDelay);
|
||||
|
||||
// Apply enhanced TCP keep-alive options if enabled
|
||||
if (this.settings.enableKeepAliveProbes) {
|
||||
try {
|
||||
if ('setKeepAliveProbes' in targetSocket) {
|
||||
(targetSocket as any).setKeepAliveProbes(10);
|
||||
}
|
||||
if ('setKeepAliveInterval' in targetSocket) {
|
||||
(targetSocket as any).setKeepAliveInterval(1000);
|
||||
}
|
||||
} catch (err) {
|
||||
// Ignore errors - these are optional enhancements
|
||||
if (this.settings.enableDetailedLogging) {
|
||||
logger.log('warn', `Enhanced TCP keep-alive not supported for outgoing socket on connection ${connectionId}: ${err}`, {
|
||||
connectionId,
|
||||
error: err,
|
||||
component: 'route-handler'
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Setup improved error handling for outgoing connection
|
||||
this.setupOutgoingErrorHandler(connectionId, targetSocket, record, socket, finalTargetHost, finalTargetPort);
|
||||
|
||||
// Note: Close handlers are managed by independent socket handlers above
|
||||
// We don't register handleClose here to avoid bilateral cleanup
|
||||
|
||||
// Setup error handlers for incoming socket
|
||||
socket.on('error', this.connectionManager.handleError('incoming', record));
|
||||
|
||||
// Handle timeouts with keep-alive awareness
|
||||
socket.on('timeout', () => {
|
||||
// For keep-alive connections, just log a warning instead of closing
|
||||
if (record.hasKeepAlive) {
|
||||
logger.log('warn', `Timeout event on incoming keep-alive connection ${connectionId} from ${record.remoteIP} after ${plugins.prettyMs(this.settings.socketTimeout || 3600000)}. Connection preserved.`, {
|
||||
connectionId,
|
||||
remoteIP: record.remoteIP,
|
||||
timeout: plugins.prettyMs(this.settings.socketTimeout || 3600000),
|
||||
status: 'Connection preserved',
|
||||
component: 'route-handler'
|
||||
});
|
||||
return;
|
||||
}
|
||||
|
||||
// For non-keep-alive connections, proceed with normal cleanup
|
||||
logger.log('warn', `Timeout on incoming side for connection ${connectionId} from ${record.remoteIP} after ${plugins.prettyMs(this.settings.socketTimeout || 3600000)}`, {
|
||||
connectionId,
|
||||
remoteIP: record.remoteIP,
|
||||
timeout: plugins.prettyMs(this.settings.socketTimeout || 3600000),
|
||||
component: 'route-handler'
|
||||
});
|
||||
if (record.incomingTerminationReason === null) {
|
||||
record.incomingTerminationReason = 'timeout';
|
||||
this.connectionManager.incrementTerminationStat('incoming', 'timeout');
|
||||
}
|
||||
this.connectionManager.initiateCleanupOnce(record, 'timeout_incoming');
|
||||
});
|
||||
|
||||
targetSocket.on('timeout', () => {
|
||||
// For keep-alive connections, just log a warning instead of closing
|
||||
if (record.hasKeepAlive) {
|
||||
logger.log('warn', `Timeout event on outgoing keep-alive connection ${connectionId} from ${record.remoteIP} after ${plugins.prettyMs(this.settings.socketTimeout || 3600000)}. Connection preserved.`, {
|
||||
connectionId,
|
||||
remoteIP: record.remoteIP,
|
||||
timeout: plugins.prettyMs(this.settings.socketTimeout || 3600000),
|
||||
status: 'Connection preserved',
|
||||
component: 'route-handler'
|
||||
});
|
||||
return;
|
||||
}
|
||||
|
||||
// For non-keep-alive connections, proceed with normal cleanup
|
||||
logger.log('warn', `Timeout on outgoing side for connection ${connectionId} from ${record.remoteIP} after ${plugins.prettyMs(this.settings.socketTimeout || 3600000)}`, {
|
||||
connectionId,
|
||||
remoteIP: record.remoteIP,
|
||||
timeout: plugins.prettyMs(this.settings.socketTimeout || 3600000),
|
||||
component: 'route-handler'
|
||||
});
|
||||
if (record.outgoingTerminationReason === null) {
|
||||
record.outgoingTerminationReason = 'timeout';
|
||||
this.connectionManager.incrementTerminationStat('outgoing', 'timeout');
|
||||
}
|
||||
this.connectionManager.initiateCleanupOnce(record, 'timeout_outgoing');
|
||||
});
|
||||
|
||||
// Apply socket timeouts
|
||||
this.timeoutManager.applySocketTimeouts(record);
|
||||
|
||||
// Track outgoing data for bytes counting
|
||||
targetSocket.on('data', (chunk: Buffer) => {
|
||||
record.bytesSent += chunk.length;
|
||||
this.timeoutManager.updateActivity(record);
|
||||
});
|
||||
|
||||
// Wait for the outgoing connection to be ready before setting up piping
|
||||
targetSocket.once('connect', () => {
|
||||
if (this.settings.enableDetailedLogging) {
|
||||
logger.log('info', `Connection ${connectionId} established to target ${finalTargetHost}:${finalTargetPort}`, {
|
||||
connectionId,
|
||||
@ -1230,7 +1130,7 @@ export class RouteConnectionHandler {
|
||||
});
|
||||
}
|
||||
|
||||
// Clear the initial connection error handler
|
||||
// Clear any error listeners added by createSocketWithErrorHandler
|
||||
targetSocket.removeAllListeners('error');
|
||||
|
||||
// Add the normal error handler for established connections
|
||||
@ -1384,7 +1284,107 @@ export class RouteConnectionHandler {
|
||||
if (record.isTLS) {
|
||||
record.tlsHandshakeComplete = true;
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
// Only set up basic properties - everything else happens in onConnect
|
||||
record.outgoing = targetSocket;
|
||||
record.outgoingStartTime = Date.now();
|
||||
|
||||
// Apply socket optimizations
|
||||
targetSocket.setNoDelay(this.settings.noDelay);
|
||||
|
||||
// Apply keep-alive settings if enabled
|
||||
if (this.settings.keepAlive) {
|
||||
targetSocket.setKeepAlive(true, this.settings.keepAliveInitialDelay);
|
||||
|
||||
// Apply enhanced TCP keep-alive options if enabled
|
||||
if (this.settings.enableKeepAliveProbes) {
|
||||
try {
|
||||
if ('setKeepAliveProbes' in targetSocket) {
|
||||
(targetSocket as any).setKeepAliveProbes(10);
|
||||
}
|
||||
if ('setKeepAliveInterval' in targetSocket) {
|
||||
(targetSocket as any).setKeepAliveInterval(1000);
|
||||
}
|
||||
} catch (err) {
|
||||
// Ignore errors - these are optional enhancements
|
||||
if (this.settings.enableDetailedLogging) {
|
||||
logger.log('warn', `Enhanced TCP keep-alive not supported for outgoing socket on connection ${connectionId}: ${err}`, {
|
||||
connectionId,
|
||||
error: err,
|
||||
component: 'route-handler'
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Setup error handlers for incoming socket
|
||||
socket.on('error', this.connectionManager.handleError('incoming', record));
|
||||
|
||||
// Handle timeouts with keep-alive awareness
|
||||
socket.on('timeout', () => {
|
||||
// For keep-alive connections, just log a warning instead of closing
|
||||
if (record.hasKeepAlive) {
|
||||
logger.log('warn', `Timeout event on incoming keep-alive connection ${connectionId} from ${record.remoteIP} after ${plugins.prettyMs(this.settings.socketTimeout || 3600000)}. Connection preserved.`, {
|
||||
connectionId,
|
||||
remoteIP: record.remoteIP,
|
||||
timeout: plugins.prettyMs(this.settings.socketTimeout || 3600000),
|
||||
status: 'Connection preserved',
|
||||
component: 'route-handler'
|
||||
});
|
||||
return;
|
||||
}
|
||||
|
||||
// For non-keep-alive connections, proceed with normal cleanup
|
||||
logger.log('warn', `Timeout on incoming side for connection ${connectionId} from ${record.remoteIP} after ${plugins.prettyMs(this.settings.socketTimeout || 3600000)}`, {
|
||||
connectionId,
|
||||
remoteIP: record.remoteIP,
|
||||
timeout: plugins.prettyMs(this.settings.socketTimeout || 3600000),
|
||||
component: 'route-handler'
|
||||
});
|
||||
if (record.incomingTerminationReason === null) {
|
||||
record.incomingTerminationReason = 'timeout';
|
||||
this.connectionManager.incrementTerminationStat('incoming', 'timeout');
|
||||
}
|
||||
this.connectionManager.initiateCleanupOnce(record, 'timeout_incoming');
|
||||
});
|
||||
|
||||
targetSocket.on('timeout', () => {
|
||||
// For keep-alive connections, just log a warning instead of closing
|
||||
if (record.hasKeepAlive) {
|
||||
logger.log('warn', `Timeout event on outgoing keep-alive connection ${connectionId} from ${record.remoteIP} after ${plugins.prettyMs(this.settings.socketTimeout || 3600000)}. Connection preserved.`, {
|
||||
connectionId,
|
||||
remoteIP: record.remoteIP,
|
||||
timeout: plugins.prettyMs(this.settings.socketTimeout || 3600000),
|
||||
status: 'Connection preserved',
|
||||
component: 'route-handler'
|
||||
});
|
||||
return;
|
||||
}
|
||||
|
||||
// For non-keep-alive connections, proceed with normal cleanup
|
||||
logger.log('warn', `Timeout on outgoing side for connection ${connectionId} from ${record.remoteIP} after ${plugins.prettyMs(this.settings.socketTimeout || 3600000)}`, {
|
||||
connectionId,
|
||||
remoteIP: record.remoteIP,
|
||||
timeout: plugins.prettyMs(this.settings.socketTimeout || 3600000),
|
||||
component: 'route-handler'
|
||||
});
|
||||
if (record.outgoingTerminationReason === null) {
|
||||
record.outgoingTerminationReason = 'timeout';
|
||||
this.connectionManager.incrementTerminationStat('outgoing', 'timeout');
|
||||
}
|
||||
this.connectionManager.initiateCleanupOnce(record, 'timeout_outgoing');
|
||||
});
|
||||
|
||||
// Apply socket timeouts
|
||||
this.timeoutManager.applySocketTimeouts(record);
|
||||
|
||||
// Track outgoing data for bytes counting (moved from the duplicate connect handler)
|
||||
targetSocket.on('data', (chunk: Buffer) => {
|
||||
record.bytesSent += chunk.length;
|
||||
this.timeoutManager.updateActivity(record);
|
||||
});
|
||||
} // End of if (!connectionFailed)
|
||||
}
|
||||
}
|
Loading…
x
Reference in New Issue
Block a user