Implement connection timeout handling and improve connection cleanup in SmartProxy

This commit is contained in:
Juergen Kunz 2025-06-06 23:34:50 +00:00
parent 311691c2cc
commit 9aebcd488d
5 changed files with 604 additions and 6 deletions

375
readme.connections.md Normal file
View File

@ -0,0 +1,375 @@
# Connection Management in SmartProxy
This document describes connection handling, cleanup mechanisms, and known issues in SmartProxy, particularly focusing on proxy chain configurations.
## Connection Accumulation Investigation (January 2025)
### Problem Statement
Connections may accumulate on the outer proxy in proxy chain configurations, despite implemented fixes.
### Historical Context
- **v19.5.12-v19.5.15**: Major connection cleanup improvements
- **v19.5.19+**: PROXY protocol support with WrappedSocket implementation
- **v19.5.20**: Fixed race condition in immediate routing cleanup
### Current Architecture
#### Connection Flow in Proxy Chains
```
Client → Outer Proxy (8001) → Inner Proxy (8002) → Backend (httpbin.org:443)
```
1. **Outer Proxy**:
- Accepts client connection
- Sends PROXY protocol header to inner proxy
- Tracks connection in ConnectionManager
- Immediate routing for non-TLS ports
2. **Inner Proxy**:
- Parses PROXY protocol to get real client IP
- Establishes connection to backend
- Tracks its own connections separately
### Potential Causes of Connection Accumulation
#### 1. Race Condition in Immediate Routing
When a connection is immediately routed (non-TLS ports), there's a timing window:
```typescript
// route-connection-handler.ts, line ~231
this.routeConnection(socket, record, '', undefined);
// Connection is routed before all setup is complete
```
**Issue**: If client disconnects during backend connection setup, cleanup may not trigger properly.
#### 2. Outgoing Socket Assignment Timing
Despite the fix in v19.5.20:
```typescript
// Line 1362 in setupDirectConnection
record.outgoing = targetSocket;
```
There's still a window between socket creation and the `connect` event where cleanup might miss the outgoing socket.
#### 3. Batch Cleanup Delays
ConnectionManager uses queued cleanup:
- Batch size: 100 connections
- Batch interval: 100ms
- Under rapid connection/disconnection, queue might lag
#### 4. Different Cleanup Paths
Multiple cleanup triggers exist:
- Socket 'close' event
- Socket 'error' event
- Inactivity timeout
- Connection timeout
- Manual cleanup
Not all paths may properly handle proxy chain scenarios.
#### 5. Keep-Alive Connection Handling
Keep-alive connections have special treatment:
- Extended inactivity timeout (6x normal)
- Warning before closure
- May accumulate if backend is unresponsive
### Observed Symptoms
1. **Outer proxy connection count grows over time**
2. **Inner proxy maintains zero or low connection count**
3. **Connections show as closed in logs but remain in tracking**
4. **Memory usage gradually increases**
### Debug Strategies
#### 1. Enhanced Logging
Add connection state logging at key points:
```typescript
// When outgoing socket is created
logger.log('debug', `Outgoing socket created for ${connectionId}`, {
hasOutgoing: !!record.outgoing,
outgoingState: record.outgoing?.readyState
});
```
#### 2. Connection State Inspection
Periodically log detailed connection state:
```typescript
for (const [id, record] of connectionManager.getConnections()) {
console.log({
id,
age: Date.now() - record.incomingStartTime,
incomingDestroyed: record.incoming.destroyed,
outgoingDestroyed: record.outgoing?.destroyed,
hasCleanupTimer: !!record.cleanupTimer
});
}
```
#### 3. Cleanup Verification
Track cleanup completion:
```typescript
// In cleanupConnection
logger.log('debug', `Cleanup completed for ${record.id}`, {
recordsRemaining: this.connectionRecords.size
});
```
### Recommendations
1. **Immediate Cleanup for Proxy Chains**
- Skip batch queue for proxy chain connections
- Use synchronous cleanup when PROXY protocol is detected
2. **Socket State Validation**
- Check both `destroyed` and `readyState` before cleanup decisions
- Handle 'opening' state sockets explicitly
3. **Timeout Adjustments**
- Shorter timeouts for proxy chain connections
- More aggressive cleanup for connections without data transfer
4. **Connection Limits**
- Per-route connection limits
- Backpressure when approaching limits
5. **Monitoring**
- Export connection metrics
- Alert on connection count thresholds
- Track connection age distribution
### Test Scenarios to Reproduce
1. **Rapid Connect/Disconnect**
```bash
# Create many short-lived connections
for i in {1..1000}; do
(echo -n | nc localhost 8001) &
done
```
2. **Slow Backend**
- Configure inner proxy to connect to unresponsive backend
- Monitor outer proxy connection count
3. **Mixed Traffic**
- Combine TLS and non-TLS connections
- Add keep-alive connections
- Observe accumulation patterns
### Future Improvements
1. **Connection Pool Isolation**
- Separate pools for proxy chain vs direct connections
- Different cleanup strategies per pool
2. **Circuit Breaker**
- Detect accumulation and trigger aggressive cleanup
- Temporary refuse new connections when near limit
3. **Connection State Machine**
- Explicit states: CONNECTING, ESTABLISHED, CLOSING, CLOSED
- State transition validation
- Timeout per state
4. **Metrics Collection**
- Connection lifecycle events
- Cleanup success/failure rates
- Time spent in each state
### Root Cause Identified (January 2025)
**The primary issue is on the inner proxy when backends are unreachable:**
When the backend is unreachable (e.g., non-routable IP like 10.255.255.1):
1. The outgoing socket gets stuck in "opening" state indefinitely
2. The `createSocketWithErrorHandler` in socket-utils.ts doesn't implement connection timeout
3. `socket.setTimeout()` only handles inactivity AFTER connection, not during connect phase
4. Connections accumulate because they never transition to error state
5. Socket timeout warnings fire but connections are preserved as keep-alive
**Code Issue:**
```typescript
// socket-utils.ts line 275
if (timeout) {
socket.setTimeout(timeout); // This only handles inactivity, not connection!
}
```
**Required Fix:**
1. Add `connectionTimeout` to ISmartProxyOptions interface:
```typescript
// In interfaces.ts
connectionTimeout?: number; // Timeout for establishing connection (ms), default: 30000 (30s)
```
2. Update `createSocketWithErrorHandler` in socket-utils.ts:
```typescript
export function createSocketWithErrorHandler(options: SafeSocketOptions): plugins.net.Socket {
const { port, host, onError, onConnect, timeout } = options;
const socket = new plugins.net.Socket();
let connected = false;
let connectionTimeout: NodeJS.Timeout | null = null;
socket.on('error', (error) => {
if (connectionTimeout) {
clearTimeout(connectionTimeout);
connectionTimeout = null;
}
if (onError) onError(error);
});
socket.on('connect', () => {
connected = true;
if (connectionTimeout) {
clearTimeout(connectionTimeout);
connectionTimeout = null;
}
if (timeout) socket.setTimeout(timeout); // Set inactivity timeout
if (onConnect) onConnect();
});
// Implement connection establishment timeout
if (timeout) {
connectionTimeout = setTimeout(() => {
if (!connected && !socket.destroyed) {
const error = new Error(`Connection timeout after ${timeout}ms to ${host}:${port}`);
(error as any).code = 'ETIMEDOUT';
socket.destroy();
if (onError) onError(error);
}
}, timeout);
}
socket.connect(port, host);
return socket;
}
```
3. Pass connectionTimeout in route-connection-handler.ts:
```typescript
const targetSocket = createSocketWithErrorHandler({
port: finalTargetPort,
host: finalTargetHost,
timeout: this.settings.connectionTimeout || 30000, // Connection timeout
onError: (error) => { /* existing */ },
onConnect: async () => { /* existing */ }
});
```
### Investigation Results (January 2025)
Based on extensive testing with debug scripts:
1. **Normal Operation**: In controlled tests, connections are properly cleaned up:
- Immediate routing cleanup handler properly destroys outgoing connections
- Both outer and inner proxies maintain 0 connections after clients disconnect
- Keep-alive connections are tracked and cleaned up correctly
2. **Potential Edge Cases Not Covered by Tests**:
- **HTTP/2 Connections**: May have different lifecycle than HTTP/1.1
- **WebSocket Connections**: Long-lived upgrade connections might persist
- **Partial TLS Handshakes**: Connections that start TLS but don't complete
- **PROXY Protocol Parse Failures**: Malformed headers from untrusted sources
- **Connection Pool Reuse**: HttpProxy component may maintain its own pools
3. **Timing-Sensitive Scenarios**:
- Client disconnects exactly when `record.outgoing` is being assigned
- Backend connects but immediately RSTs
- Proxy chain where middle proxy restarts
- Multiple rapid reconnects with same source IP/port
4. **Configuration-Specific Issues**:
- Mixed `sendProxyProtocol` settings in chain
- Different `keepAlive` settings between proxies
- Mismatched timeout values
- Routes with `forwardingEngine: 'nftables'`
### Additional Debug Points
Add these debug logs to identify the specific scenario:
```typescript
// In route-connection-handler.ts setupDirectConnection
logger.log('debug', `Setting outgoing socket for ${connectionId}`, {
timestamp: Date.now(),
hasOutgoing: !!record.outgoing,
socketState: targetSocket.readyState
});
// In connection-manager.ts cleanupConnection
logger.log('debug', `Cleanup attempt for ${record.id}`, {
alreadyClosed: record.connectionClosed,
hasIncoming: !!record.incoming,
hasOutgoing: !!record.outgoing,
incomingDestroyed: record.incoming?.destroyed,
outgoingDestroyed: record.outgoing?.destroyed
});
```
### Workarounds
Until root cause is identified:
1. **Periodic Force Cleanup**:
```typescript
setInterval(() => {
const connections = connectionManager.getConnections();
for (const [id, record] of connections) {
if (record.incoming?.destroyed && !record.connectionClosed) {
connectionManager.cleanupConnection(record, 'force_cleanup');
}
}
}, 60000); // Every minute
```
2. **Connection Age Limit**:
```typescript
// Add max connection age check
const maxAge = 3600000; // 1 hour
if (Date.now() - record.incomingStartTime > maxAge) {
connectionManager.cleanupConnection(record, 'max_age');
}
```
3. **Aggressive Timeout Settings**:
```typescript
{
socketTimeout: 60000, // 1 minute
inactivityTimeout: 300000, // 5 minutes
connectionCleanupInterval: 30000 // 30 seconds
}
```
### Related Files
- `/ts/proxies/smart-proxy/route-connection-handler.ts` - Main connection handling
- `/ts/proxies/smart-proxy/connection-manager.ts` - Connection tracking and cleanup
- `/ts/core/utils/socket-utils.ts` - Socket cleanup utilities
- `/test/test.proxy-chain-cleanup.node.ts` - Test for connection cleanup
- `/test/test.proxy-chaining-accumulation.node.ts` - Test for accumulation prevention
- `/.nogit/debug/connection-accumulation-debug.ts` - Debug script for connection states
- `/.nogit/debug/connection-accumulation-keepalive.ts` - Keep-alive specific tests
- `/.nogit/debug/connection-accumulation-http.ts` - HTTP traffic through proxy chains
### Summary
**Issue Identified**: Connection accumulation occurs on the **inner proxy** (not outer) when backends are unreachable.
**Root Cause**: The `createSocketWithErrorHandler` function in socket-utils.ts doesn't implement connection establishment timeout. It only sets `socket.setTimeout()` which handles inactivity AFTER connection is established, not during the connect phase.
**Impact**: When connecting to unreachable IPs (e.g., 10.255.255.1), outgoing sockets remain in "opening" state indefinitely, causing connections to accumulate.
**Fix Required**:
1. Add `connectionTimeout` setting to ISmartProxyOptions
2. Implement proper connection timeout in `createSocketWithErrorHandler`
3. Pass the timeout value from route-connection-handler
**Workaround Until Fixed**: Configure shorter socket timeouts and use the periodic force cleanup suggested above.
The connection cleanup mechanisms have been significantly improved in v19.5.20:
1. Race condition fixed by setting `record.outgoing` before connecting
2. Immediate routing cleanup handler always destroys outgoing connections
3. Tests confirm no accumulation in standard scenarios with reachable backends
However, the missing connection establishment timeout causes accumulation when backends are unreachable or very slow to connect.

View File

@ -0,0 +1,182 @@
import { expect, tap } from '@git.zone/tstest/tapbundle';
import * as plugins from '../ts/plugins.js';
import { SmartProxy } from '../ts/index.js';
let outerProxy: SmartProxy;
let innerProxy: SmartProxy;
tap.test('setup two smartproxies in a chain configuration', async () => {
// Setup inner proxy (backend proxy)
innerProxy = new SmartProxy({
routes: [
{
match: {
ports: 8002
},
action: {
type: 'forward',
target: {
host: 'httpbin.org',
port: 443
}
}
}
],
defaults: {
target: {
host: 'httpbin.org',
port: 443
}
},
acceptProxyProtocol: true,
sendProxyProtocol: false,
enableDetailedLogging: true,
connectionCleanupInterval: 5000, // More frequent cleanup for testing
inactivityTimeout: 10000 // Shorter timeout for testing
});
await innerProxy.start();
// Setup outer proxy (frontend proxy)
outerProxy = new SmartProxy({
routes: [
{
match: {
ports: 8001
},
action: {
type: 'forward',
target: {
host: 'localhost',
port: 8002
},
sendProxyProtocol: true
}
}
],
defaults: {
target: {
host: 'localhost',
port: 8002
}
},
sendProxyProtocol: true,
enableDetailedLogging: true,
connectionCleanupInterval: 5000, // More frequent cleanup for testing
inactivityTimeout: 10000 // Shorter timeout for testing
});
await outerProxy.start();
});
tap.test('should properly cleanup connections in proxy chain', async (tools) => {
const testDuration = 30000; // 30 seconds
const connectionInterval = 500; // Create new connection every 500ms
const connectionDuration = 2000; // Each connection lasts 2 seconds
let connectionsCreated = 0;
let connectionsCompleted = 0;
// Function to create a test connection
const createTestConnection = async () => {
connectionsCreated++;
const connectionId = connectionsCreated;
try {
const socket = plugins.net.connect({
port: 8001,
host: 'localhost'
});
await new Promise<void>((resolve, reject) => {
socket.on('connect', () => {
console.log(`Connection ${connectionId} established`);
// Send TLS Client Hello for httpbin.org
const clientHello = Buffer.from([
0x16, 0x03, 0x01, 0x00, 0xc8, // TLS handshake header
0x01, 0x00, 0x00, 0xc4, // Client Hello
0x03, 0x03, // TLS 1.2
...Array(32).fill(0), // Random bytes
0x00, // Session ID length
0x00, 0x02, 0x13, 0x01, // Cipher suites
0x01, 0x00, // Compression methods
0x00, 0x97, // Extensions length
0x00, 0x00, 0x00, 0x0f, 0x00, 0x0d, // SNI extension
0x00, 0x00, 0x0a, 0x68, 0x74, 0x74, 0x70, 0x62, 0x69, 0x6e, 0x2e, 0x6f, 0x72, 0x67 // "httpbin.org"
]);
socket.write(clientHello);
// Keep connection alive for specified duration
setTimeout(() => {
socket.destroy();
connectionsCompleted++;
console.log(`Connection ${connectionId} closed (completed: ${connectionsCompleted}/${connectionsCreated})`);
resolve();
}, connectionDuration);
});
socket.on('error', (err) => {
console.log(`Connection ${connectionId} error: ${err.message}`);
connectionsCompleted++;
reject(err);
});
});
} catch (err) {
console.log(`Failed to create connection ${connectionId}: ${err.message}`);
connectionsCompleted++;
}
};
// Start creating connections
const startTime = Date.now();
const connectionTimer = setInterval(() => {
if (Date.now() - startTime < testDuration) {
createTestConnection().catch(() => {});
} else {
clearInterval(connectionTimer);
}
}, connectionInterval);
// Monitor connection counts
const monitorInterval = setInterval(() => {
const outerConnections = (outerProxy as any).connectionManager.getConnectionCount();
const innerConnections = (innerProxy as any).connectionManager.getConnectionCount();
console.log(`Active connections - Outer: ${outerConnections}, Inner: ${innerConnections}, Created: ${connectionsCreated}, Completed: ${connectionsCompleted}`);
}, 2000);
// Wait for test duration + cleanup time
await tools.delayFor(testDuration + 10000);
clearInterval(connectionTimer);
clearInterval(monitorInterval);
// Wait for all connections to complete
while (connectionsCompleted < connectionsCreated) {
await tools.delayFor(100);
}
// Give some time for cleanup
await tools.delayFor(5000);
// Check final connection counts
const finalOuterConnections = (outerProxy as any).connectionManager.getConnectionCount();
const finalInnerConnections = (innerProxy as any).connectionManager.getConnectionCount();
console.log(`\nFinal connection counts:`);
console.log(`Outer proxy: ${finalOuterConnections}`);
console.log(`Inner proxy: ${finalInnerConnections}`);
console.log(`Total created: ${connectionsCreated}`);
console.log(`Total completed: ${connectionsCompleted}`);
// Both proxies should have cleaned up all connections
expect(finalOuterConnections).toEqual(0);
expect(finalInnerConnections).toEqual(0);
});
tap.test('cleanup proxies', async () => {
await outerProxy.stop();
await innerProxy.stop();
});
export default tap.start();

View File

@ -258,22 +258,61 @@ export function createSocketWithErrorHandler(options: SafeSocketOptions): plugin
// Create socket with immediate error handler attachment // Create socket with immediate error handler attachment
const socket = new plugins.net.Socket(); const socket = new plugins.net.Socket();
// Track if connected
let connected = false;
let connectionTimeout: NodeJS.Timeout | null = null;
// Attach error handler BEFORE connecting to catch immediate errors // Attach error handler BEFORE connecting to catch immediate errors
socket.on('error', (error) => { socket.on('error', (error) => {
console.error(`Socket connection error to ${host}:${port}: ${error.message}`); console.error(`Socket connection error to ${host}:${port}: ${error.message}`);
// Clear the connection timeout if it exists
if (connectionTimeout) {
clearTimeout(connectionTimeout);
connectionTimeout = null;
}
if (onError) { if (onError) {
onError(error); onError(error);
} }
}); });
// Attach connect handler if provided // Attach connect handler
if (onConnect) { const handleConnect = () => {
socket.on('connect', onConnect); connected = true;
} // Clear the connection timeout
if (connectionTimeout) {
clearTimeout(connectionTimeout);
connectionTimeout = null;
}
// Set inactivity timeout if provided (after connection is established)
if (timeout) {
socket.setTimeout(timeout);
}
if (onConnect) {
onConnect();
}
};
// Set timeout if provided socket.on('connect', handleConnect);
// Implement connection establishment timeout
if (timeout) { if (timeout) {
socket.setTimeout(timeout); connectionTimeout = setTimeout(() => {
if (!connected && !socket.destroyed) {
// Connection timed out - destroy the socket
const error = new Error(`Connection timeout after ${timeout}ms to ${host}:${port}`);
(error as any).code = 'ETIMEDOUT';
console.error(`Socket connection timeout to ${host}:${port} after ${timeout}ms`);
// Destroy the socket
socket.destroy();
// Call error handler
if (onError) {
onError(error);
}
}
}, timeout);
} }
// Now attempt to connect - any immediate errors will be caught // Now attempt to connect - any immediate errors will be caught

View File

@ -69,6 +69,7 @@ export interface ISmartProxyOptions {
maxVersion?: string; maxVersion?: string;
// Timeout settings // Timeout settings
connectionTimeout?: number; // Timeout for establishing connection to backend (ms), default: 30000 (30s)
initialDataTimeout?: number; // Timeout for initial data/SNI (ms), default: 60000 (60s) initialDataTimeout?: number; // Timeout for initial data/SNI (ms), default: 60000 (60s)
socketTimeout?: number; // Socket inactivity timeout (ms), default: 3600000 (1h) socketTimeout?: number; // Socket inactivity timeout (ms), default: 3600000 (1h)
inactivityCheckInterval?: number; // How often to check for inactive connections (ms), default: 60000 (60s) inactivityCheckInterval?: number; // How often to check for inactive connections (ms), default: 60000 (60s)

View File

@ -1125,6 +1125,7 @@ export class RouteConnectionHandler {
const targetSocket = createSocketWithErrorHandler({ const targetSocket = createSocketWithErrorHandler({
port: finalTargetPort, port: finalTargetPort,
host: finalTargetHost, host: finalTargetHost,
timeout: this.settings.connectionTimeout || 30000, // Connection timeout (default: 30s)
onError: (error) => { onError: (error) => {
// Connection failed - clean up everything immediately // Connection failed - clean up everything immediately
// Check if connection record is still valid (client might have disconnected) // Check if connection record is still valid (client might have disconnected)