smartproxy/readme.plan.md
Philipp Kunz ad80798210 Enhance socket cleanup and management for improved connection handling
- Refactor cleanupSocket function to support options for immediate destruction, allowing drain, and grace periods.
- Introduce createIndependentSocketHandlers for better management of half-open connections between client and server sockets.
- Update various handlers (HTTP, HTTPS passthrough, HTTPS terminate) to utilize new cleanup and socket management functions.
- Implement custom timeout handling in socket setup to prevent immediate closure during keep-alive connections.
- Add tests for long-lived connections and half-open connection scenarios to ensure stability and reliability.
- Adjust connection manager to handle socket cleanup based on activity status, improving resource management.
2025-06-01 12:27:15 +00:00

337 lines
11 KiB
Markdown

# SmartProxy Socket Cleanup Fix Plan
## Problem Summary
The current socket cleanup implementation is too aggressive and closes long-lived connections prematurely. This affects:
- WebSocket connections in HTTPS passthrough
- Long-lived HTTP connections (SSE, streaming)
- Database connections
- Any connection that should remain open for hours
## Root Causes
### 1. **Bilateral Socket Cleanup**
When one socket closes, both sockets are immediately destroyed:
```typescript
// In createSocketCleanupHandler
cleanupSocket(clientSocket, 'client');
cleanupSocket(serverSocket, 'server'); // Both destroyed together!
```
### 2. **Aggressive Timeout Handling**
Timeout events immediately trigger connection cleanup:
```typescript
socket.on('timeout', () => {
handleClose(`${prefix}_timeout`); // Destroys both sockets!
});
```
### 3. **Parity Check Forces Closure**
If one socket closes but the other remains open for >2 minutes, connection is forcefully terminated:
```typescript
if (record.outgoingClosedTime &&
!record.incoming.destroyed &&
now - record.outgoingClosedTime > 120000) {
this.cleanupConnection(record, 'parity_check');
}
```
### 4. **No Half-Open Connection Support**
The proxy doesn't support TCP half-open connections where one side closes while the other continues sending.
## Fix Implementation Plan
### Phase 1: Fix Socket Cleanup (Prevent Premature Closure)
#### 1.1 Modify `cleanupSocket()` to support graceful shutdown
```typescript
export interface CleanupOptions {
immediate?: boolean; // Force immediate destruction
allowDrain?: boolean; // Allow write buffer to drain
gracePeriod?: number; // Ms to wait before force close
}
export function cleanupSocket(
socket: Socket | TLSSocket | null,
socketName?: string,
options: CleanupOptions = {}
): Promise<void> {
if (!socket || socket.destroyed) return Promise.resolve();
return new Promise<void>((resolve) => {
const cleanup = () => {
socket.removeAllListeners();
if (!socket.destroyed) {
socket.destroy();
}
resolve();
};
if (options.immediate) {
cleanup();
} else if (options.allowDrain && socket.writable) {
// Allow pending writes to complete
socket.end(() => cleanup());
// Force cleanup after grace period
if (options.gracePeriod) {
setTimeout(cleanup, options.gracePeriod);
}
} else {
cleanup();
}
});
}
```
#### 1.2 Implement Independent Socket Tracking
```typescript
export function createIndependentSocketHandlers(
clientSocket: Socket,
serverSocket: Socket,
onBothClosed: (reason: string) => void
): { cleanupClient: () => void, cleanupServer: () => void } {
let clientClosed = false;
let serverClosed = false;
let clientReason = '';
let serverReason = '';
const checkBothClosed = () => {
if (clientClosed && serverClosed) {
onBothClosed(`client: ${clientReason}, server: ${serverReason}`);
}
};
const cleanupClient = async (reason: string) => {
if (clientClosed) return;
clientClosed = true;
clientReason = reason;
// Allow server to continue if still active
if (!serverClosed && serverSocket.writable) {
// Half-close: stop reading from client, let server finish
clientSocket.pause();
clientSocket.unpipe(serverSocket);
await cleanupSocket(clientSocket, 'client', { allowDrain: true });
} else {
await cleanupSocket(clientSocket, 'client');
}
checkBothClosed();
};
const cleanupServer = async (reason: string) => {
if (serverClosed) return;
serverClosed = true;
serverReason = reason;
// Allow client to continue if still active
if (!clientClosed && clientSocket.writable) {
// Half-close: stop reading from server, let client finish
serverSocket.pause();
serverSocket.unpipe(clientSocket);
await cleanupSocket(serverSocket, 'server', { allowDrain: true });
} else {
await cleanupSocket(serverSocket, 'server');
}
checkBothClosed();
};
return { cleanupClient, cleanupServer };
}
```
### Phase 2: Fix Timeout Handling
#### 2.1 Separate timeout handling from connection closure
```typescript
export function setupSocketHandlers(
socket: Socket | TLSSocket,
handleClose: (reason: string) => void,
handleTimeout?: (socket: Socket) => void, // New optional handler
errorPrefix?: string
): void {
socket.on('error', (error) => {
const prefix = errorPrefix || 'Socket';
handleClose(`${prefix}_error: ${error.message}`);
});
socket.on('close', () => {
const prefix = errorPrefix || 'socket';
handleClose(`${prefix}_closed`);
});
socket.on('timeout', () => {
if (handleTimeout) {
handleTimeout(socket); // Custom timeout handling
} else {
// Default: just log, don't close
console.warn(`Socket timeout: ${errorPrefix || 'socket'}`);
}
});
}
```
#### 2.2 Update HTTPS passthrough handler
```typescript
// In https-passthrough-handler.ts
const { cleanupClient, cleanupServer } = createIndependentSocketHandlers(
clientSocket,
serverSocket,
(reason) => {
this.emit(ForwardingHandlerEvents.DISCONNECTED, {
remoteAddress,
bytesSent,
bytesReceived,
reason
});
}
);
// Setup handlers with custom timeout handling
setupSocketHandlers(clientSocket, cleanupClient, (socket) => {
// Just reset timeout, don't close
socket.setTimeout(timeout);
}, 'client');
setupSocketHandlers(serverSocket, cleanupServer, (socket) => {
// Just reset timeout, don't close
socket.setTimeout(timeout);
}, 'server');
```
### Phase 3: Fix Connection Manager
#### 3.1 Remove aggressive parity check
```typescript
// Remove or significantly increase the parity check timeout
// From 2 minutes to 30 minutes for long-lived connections
if (record.outgoingClosedTime &&
!record.incoming.destroyed &&
!record.connectionClosed &&
now - record.outgoingClosedTime > 1800000) { // 30 minutes
// Only close if no data activity
if (now - record.lastActivity > 600000) { // 10 minutes of inactivity
this.cleanupConnection(record, 'parity_check');
}
}
```
#### 3.2 Update cleanupConnection to check socket states
```typescript
public cleanupConnection(record: IConnectionRecord, reason: string = 'normal'): void {
if (!record.connectionClosed) {
record.connectionClosed = true;
// Only cleanup sockets that are actually closed or inactive
if (record.incoming && (!record.incoming.writable || record.incoming.destroyed)) {
cleanupSocket(record.incoming, `${record.id}-incoming`, { immediate: true });
}
if (record.outgoing && (!record.outgoing.writable || record.outgoing.destroyed)) {
cleanupSocket(record.outgoing, `${record.id}-outgoing`, { immediate: true });
}
// If either socket is still active, don't remove the record yet
if ((record.incoming && record.incoming.writable) ||
(record.outgoing && record.outgoing.writable)) {
record.connectionClosed = false; // Reset flag
return; // Don't finish cleanup
}
// Continue with full cleanup...
}
}
```
### Phase 4: Testing and Validation
#### 4.1 Test Cases to Implement
1. WebSocket connection should stay open for >1 hour
2. HTTP streaming response should continue after request closes
3. Half-open connections should work correctly
4. Verify no socket leaks with long-running connections
5. Test graceful shutdown with pending data
#### 4.2 Socket Leak Prevention
- Ensure all event listeners are tracked and removed
- Use WeakMap for socket metadata to prevent memory leaks
- Implement connection count monitoring
- Add periodic health checks for orphaned sockets
## Implementation Order
1. **Day 1**: Implement graceful `cleanupSocket()` and independent socket handlers
2. **Day 2**: Update all handlers to use new cleanup mechanism
3. **Day 3**: Fix timeout handling to not close connections
4. **Day 4**: Update connection manager parity check and cleanup logic
5. **Day 5**: Comprehensive testing and leak detection
## Configuration Changes
Add new options to SmartProxyOptions:
```typescript
interface ISmartProxyOptions {
// Existing options...
// New options for long-lived connections
socketCleanupGracePeriod?: number; // Default: 5000ms
allowHalfOpenConnections?: boolean; // Default: true
parityCheckTimeout?: number; // Default: 1800000ms (30 min)
timeoutBehavior?: 'close' | 'reset' | 'ignore'; // Default: 'reset'
}
```
## Success Metrics
1. WebSocket connections remain stable for 24+ hours
2. No premature connection closures reported
3. Memory usage remains stable (no socket leaks)
4. Half-open connections work correctly
5. Graceful shutdown completes within grace period
## Implementation Status: COMPLETED ✅
### Implemented Changes
1. **Modified `cleanupSocket()` in `socket-utils.ts`**
- Added `CleanupOptions` interface with `immediate`, `allowDrain`, and `gracePeriod` options
- Implemented graceful shutdown support with write buffer draining
2. **Created `createIndependentSocketHandlers()` in `socket-utils.ts`**
- Tracks socket states independently
- Supports half-open connections where one side can close while the other remains open
- Only triggers full cleanup when both sockets are closed
3. **Updated `setupSocketHandlers()` in `socket-utils.ts`**
- Added optional `handleTimeout` parameter to customize timeout behavior
- Prevents automatic connection closure on timeout events
4. **Updated HTTPS Passthrough Handler**
- Now uses `createIndependentSocketHandlers` for half-open support
- Custom timeout handling that resets timer instead of closing connection
- Manual data forwarding with backpressure handling
5. **Updated Connection Manager**
- Extended parity check from 2 minutes to 30 minutes
- Added activity check before closing (10 minutes of inactivity required)
- Modified cleanup to check socket states before destroying
6. **Updated Basic Forwarding in Route Connection Handler**
- Replaced simple `pipe()` with independent socket handlers
- Added manual data forwarding with backpressure support
- Removed bilateral close handlers to prevent premature cleanup
### Test Results
All tests passing:
- ✅ Long-lived connection test: Connection stayed open for 61+ seconds with periodic keep-alive
- ✅ Half-open connection test: One side closed while the other continued to send data
- ✅ No socket leaks or premature closures
### Notes
- The fix maintains backward compatibility
- No configuration changes required for existing deployments
- Long-lived connections now work correctly in both HTTPS passthrough and basic forwarding modes