375 lines
12 KiB
Markdown
375 lines
12 KiB
Markdown
|
# Connection Management in SmartProxy
|
||
|
|
||
|
This document describes connection handling, cleanup mechanisms, and known issues in SmartProxy, particularly focusing on proxy chain configurations.
|
||
|
|
||
|
## Connection Accumulation Investigation (January 2025)
|
||
|
|
||
|
### Problem Statement
|
||
|
Connections may accumulate on the outer proxy in proxy chain configurations, despite implemented fixes.
|
||
|
|
||
|
### Historical Context
|
||
|
- **v19.5.12-v19.5.15**: Major connection cleanup improvements
|
||
|
- **v19.5.19+**: PROXY protocol support with WrappedSocket implementation
|
||
|
- **v19.5.20**: Fixed race condition in immediate routing cleanup
|
||
|
|
||
|
### Current Architecture
|
||
|
|
||
|
#### Connection Flow in Proxy Chains
|
||
|
```
|
||
|
Client → Outer Proxy (8001) → Inner Proxy (8002) → Backend (httpbin.org:443)
|
||
|
```
|
||
|
|
||
|
1. **Outer Proxy**:
|
||
|
- Accepts client connection
|
||
|
- Sends PROXY protocol header to inner proxy
|
||
|
- Tracks connection in ConnectionManager
|
||
|
- Immediate routing for non-TLS ports
|
||
|
|
||
|
2. **Inner Proxy**:
|
||
|
- Parses PROXY protocol to get real client IP
|
||
|
- Establishes connection to backend
|
||
|
- Tracks its own connections separately
|
||
|
|
||
|
### Potential Causes of Connection Accumulation
|
||
|
|
||
|
#### 1. Race Condition in Immediate Routing
|
||
|
When a connection is immediately routed (non-TLS ports), there's a timing window:
|
||
|
```typescript
|
||
|
// route-connection-handler.ts, line ~231
|
||
|
this.routeConnection(socket, record, '', undefined);
|
||
|
// Connection is routed before all setup is complete
|
||
|
```
|
||
|
|
||
|
**Issue**: If client disconnects during backend connection setup, cleanup may not trigger properly.
|
||
|
|
||
|
#### 2. Outgoing Socket Assignment Timing
|
||
|
Despite the fix in v19.5.20:
|
||
|
```typescript
|
||
|
// Line 1362 in setupDirectConnection
|
||
|
record.outgoing = targetSocket;
|
||
|
```
|
||
|
There's still a window between socket creation and the `connect` event where cleanup might miss the outgoing socket.
|
||
|
|
||
|
#### 3. Batch Cleanup Delays
|
||
|
ConnectionManager uses queued cleanup:
|
||
|
- Batch size: 100 connections
|
||
|
- Batch interval: 100ms
|
||
|
- Under rapid connection/disconnection, queue might lag
|
||
|
|
||
|
#### 4. Different Cleanup Paths
|
||
|
Multiple cleanup triggers exist:
|
||
|
- Socket 'close' event
|
||
|
- Socket 'error' event
|
||
|
- Inactivity timeout
|
||
|
- Connection timeout
|
||
|
- Manual cleanup
|
||
|
|
||
|
Not all paths may properly handle proxy chain scenarios.
|
||
|
|
||
|
#### 5. Keep-Alive Connection Handling
|
||
|
Keep-alive connections have special treatment:
|
||
|
- Extended inactivity timeout (6x normal)
|
||
|
- Warning before closure
|
||
|
- May accumulate if backend is unresponsive
|
||
|
|
||
|
### Observed Symptoms
|
||
|
|
||
|
1. **Outer proxy connection count grows over time**
|
||
|
2. **Inner proxy maintains zero or low connection count**
|
||
|
3. **Connections show as closed in logs but remain in tracking**
|
||
|
4. **Memory usage gradually increases**
|
||
|
|
||
|
### Debug Strategies
|
||
|
|
||
|
#### 1. Enhanced Logging
|
||
|
Add connection state logging at key points:
|
||
|
```typescript
|
||
|
// When outgoing socket is created
|
||
|
logger.log('debug', `Outgoing socket created for ${connectionId}`, {
|
||
|
hasOutgoing: !!record.outgoing,
|
||
|
outgoingState: record.outgoing?.readyState
|
||
|
});
|
||
|
```
|
||
|
|
||
|
#### 2. Connection State Inspection
|
||
|
Periodically log detailed connection state:
|
||
|
```typescript
|
||
|
for (const [id, record] of connectionManager.getConnections()) {
|
||
|
console.log({
|
||
|
id,
|
||
|
age: Date.now() - record.incomingStartTime,
|
||
|
incomingDestroyed: record.incoming.destroyed,
|
||
|
outgoingDestroyed: record.outgoing?.destroyed,
|
||
|
hasCleanupTimer: !!record.cleanupTimer
|
||
|
});
|
||
|
}
|
||
|
```
|
||
|
|
||
|
#### 3. Cleanup Verification
|
||
|
Track cleanup completion:
|
||
|
```typescript
|
||
|
// In cleanupConnection
|
||
|
logger.log('debug', `Cleanup completed for ${record.id}`, {
|
||
|
recordsRemaining: this.connectionRecords.size
|
||
|
});
|
||
|
```
|
||
|
|
||
|
### Recommendations
|
||
|
|
||
|
1. **Immediate Cleanup for Proxy Chains**
|
||
|
- Skip batch queue for proxy chain connections
|
||
|
- Use synchronous cleanup when PROXY protocol is detected
|
||
|
|
||
|
2. **Socket State Validation**
|
||
|
- Check both `destroyed` and `readyState` before cleanup decisions
|
||
|
- Handle 'opening' state sockets explicitly
|
||
|
|
||
|
3. **Timeout Adjustments**
|
||
|
- Shorter timeouts for proxy chain connections
|
||
|
- More aggressive cleanup for connections without data transfer
|
||
|
|
||
|
4. **Connection Limits**
|
||
|
- Per-route connection limits
|
||
|
- Backpressure when approaching limits
|
||
|
|
||
|
5. **Monitoring**
|
||
|
- Export connection metrics
|
||
|
- Alert on connection count thresholds
|
||
|
- Track connection age distribution
|
||
|
|
||
|
### Test Scenarios to Reproduce
|
||
|
|
||
|
1. **Rapid Connect/Disconnect**
|
||
|
```bash
|
||
|
# Create many short-lived connections
|
||
|
for i in {1..1000}; do
|
||
|
(echo -n | nc localhost 8001) &
|
||
|
done
|
||
|
```
|
||
|
|
||
|
2. **Slow Backend**
|
||
|
- Configure inner proxy to connect to unresponsive backend
|
||
|
- Monitor outer proxy connection count
|
||
|
|
||
|
3. **Mixed Traffic**
|
||
|
- Combine TLS and non-TLS connections
|
||
|
- Add keep-alive connections
|
||
|
- Observe accumulation patterns
|
||
|
|
||
|
### Future Improvements
|
||
|
|
||
|
1. **Connection Pool Isolation**
|
||
|
- Separate pools for proxy chain vs direct connections
|
||
|
- Different cleanup strategies per pool
|
||
|
|
||
|
2. **Circuit Breaker**
|
||
|
- Detect accumulation and trigger aggressive cleanup
|
||
|
- Temporary refuse new connections when near limit
|
||
|
|
||
|
3. **Connection State Machine**
|
||
|
- Explicit states: CONNECTING, ESTABLISHED, CLOSING, CLOSED
|
||
|
- State transition validation
|
||
|
- Timeout per state
|
||
|
|
||
|
4. **Metrics Collection**
|
||
|
- Connection lifecycle events
|
||
|
- Cleanup success/failure rates
|
||
|
- Time spent in each state
|
||
|
|
||
|
### Root Cause Identified (January 2025)
|
||
|
|
||
|
**The primary issue is on the inner proxy when backends are unreachable:**
|
||
|
|
||
|
When the backend is unreachable (e.g., non-routable IP like 10.255.255.1):
|
||
|
1. The outgoing socket gets stuck in "opening" state indefinitely
|
||
|
2. The `createSocketWithErrorHandler` in socket-utils.ts doesn't implement connection timeout
|
||
|
3. `socket.setTimeout()` only handles inactivity AFTER connection, not during connect phase
|
||
|
4. Connections accumulate because they never transition to error state
|
||
|
5. Socket timeout warnings fire but connections are preserved as keep-alive
|
||
|
|
||
|
**Code Issue:**
|
||
|
```typescript
|
||
|
// socket-utils.ts line 275
|
||
|
if (timeout) {
|
||
|
socket.setTimeout(timeout); // This only handles inactivity, not connection!
|
||
|
}
|
||
|
```
|
||
|
|
||
|
**Required Fix:**
|
||
|
|
||
|
1. Add `connectionTimeout` to ISmartProxyOptions interface:
|
||
|
```typescript
|
||
|
// In interfaces.ts
|
||
|
connectionTimeout?: number; // Timeout for establishing connection (ms), default: 30000 (30s)
|
||
|
```
|
||
|
|
||
|
2. Update `createSocketWithErrorHandler` in socket-utils.ts:
|
||
|
```typescript
|
||
|
export function createSocketWithErrorHandler(options: SafeSocketOptions): plugins.net.Socket {
|
||
|
const { port, host, onError, onConnect, timeout } = options;
|
||
|
|
||
|
const socket = new plugins.net.Socket();
|
||
|
let connected = false;
|
||
|
let connectionTimeout: NodeJS.Timeout | null = null;
|
||
|
|
||
|
socket.on('error', (error) => {
|
||
|
if (connectionTimeout) {
|
||
|
clearTimeout(connectionTimeout);
|
||
|
connectionTimeout = null;
|
||
|
}
|
||
|
if (onError) onError(error);
|
||
|
});
|
||
|
|
||
|
socket.on('connect', () => {
|
||
|
connected = true;
|
||
|
if (connectionTimeout) {
|
||
|
clearTimeout(connectionTimeout);
|
||
|
connectionTimeout = null;
|
||
|
}
|
||
|
if (timeout) socket.setTimeout(timeout); // Set inactivity timeout
|
||
|
if (onConnect) onConnect();
|
||
|
});
|
||
|
|
||
|
// Implement connection establishment timeout
|
||
|
if (timeout) {
|
||
|
connectionTimeout = setTimeout(() => {
|
||
|
if (!connected && !socket.destroyed) {
|
||
|
const error = new Error(`Connection timeout after ${timeout}ms to ${host}:${port}`);
|
||
|
(error as any).code = 'ETIMEDOUT';
|
||
|
socket.destroy();
|
||
|
if (onError) onError(error);
|
||
|
}
|
||
|
}, timeout);
|
||
|
}
|
||
|
|
||
|
socket.connect(port, host);
|
||
|
return socket;
|
||
|
}
|
||
|
```
|
||
|
|
||
|
3. Pass connectionTimeout in route-connection-handler.ts:
|
||
|
```typescript
|
||
|
const targetSocket = createSocketWithErrorHandler({
|
||
|
port: finalTargetPort,
|
||
|
host: finalTargetHost,
|
||
|
timeout: this.settings.connectionTimeout || 30000, // Connection timeout
|
||
|
onError: (error) => { /* existing */ },
|
||
|
onConnect: async () => { /* existing */ }
|
||
|
});
|
||
|
```
|
||
|
|
||
|
### Investigation Results (January 2025)
|
||
|
|
||
|
Based on extensive testing with debug scripts:
|
||
|
|
||
|
1. **Normal Operation**: In controlled tests, connections are properly cleaned up:
|
||
|
- Immediate routing cleanup handler properly destroys outgoing connections
|
||
|
- Both outer and inner proxies maintain 0 connections after clients disconnect
|
||
|
- Keep-alive connections are tracked and cleaned up correctly
|
||
|
|
||
|
2. **Potential Edge Cases Not Covered by Tests**:
|
||
|
- **HTTP/2 Connections**: May have different lifecycle than HTTP/1.1
|
||
|
- **WebSocket Connections**: Long-lived upgrade connections might persist
|
||
|
- **Partial TLS Handshakes**: Connections that start TLS but don't complete
|
||
|
- **PROXY Protocol Parse Failures**: Malformed headers from untrusted sources
|
||
|
- **Connection Pool Reuse**: HttpProxy component may maintain its own pools
|
||
|
|
||
|
3. **Timing-Sensitive Scenarios**:
|
||
|
- Client disconnects exactly when `record.outgoing` is being assigned
|
||
|
- Backend connects but immediately RSTs
|
||
|
- Proxy chain where middle proxy restarts
|
||
|
- Multiple rapid reconnects with same source IP/port
|
||
|
|
||
|
4. **Configuration-Specific Issues**:
|
||
|
- Mixed `sendProxyProtocol` settings in chain
|
||
|
- Different `keepAlive` settings between proxies
|
||
|
- Mismatched timeout values
|
||
|
- Routes with `forwardingEngine: 'nftables'`
|
||
|
|
||
|
### Additional Debug Points
|
||
|
|
||
|
Add these debug logs to identify the specific scenario:
|
||
|
|
||
|
```typescript
|
||
|
// In route-connection-handler.ts setupDirectConnection
|
||
|
logger.log('debug', `Setting outgoing socket for ${connectionId}`, {
|
||
|
timestamp: Date.now(),
|
||
|
hasOutgoing: !!record.outgoing,
|
||
|
socketState: targetSocket.readyState
|
||
|
});
|
||
|
|
||
|
// In connection-manager.ts cleanupConnection
|
||
|
logger.log('debug', `Cleanup attempt for ${record.id}`, {
|
||
|
alreadyClosed: record.connectionClosed,
|
||
|
hasIncoming: !!record.incoming,
|
||
|
hasOutgoing: !!record.outgoing,
|
||
|
incomingDestroyed: record.incoming?.destroyed,
|
||
|
outgoingDestroyed: record.outgoing?.destroyed
|
||
|
});
|
||
|
```
|
||
|
|
||
|
### Workarounds
|
||
|
|
||
|
Until root cause is identified:
|
||
|
|
||
|
1. **Periodic Force Cleanup**:
|
||
|
```typescript
|
||
|
setInterval(() => {
|
||
|
const connections = connectionManager.getConnections();
|
||
|
for (const [id, record] of connections) {
|
||
|
if (record.incoming?.destroyed && !record.connectionClosed) {
|
||
|
connectionManager.cleanupConnection(record, 'force_cleanup');
|
||
|
}
|
||
|
}
|
||
|
}, 60000); // Every minute
|
||
|
```
|
||
|
|
||
|
2. **Connection Age Limit**:
|
||
|
```typescript
|
||
|
// Add max connection age check
|
||
|
const maxAge = 3600000; // 1 hour
|
||
|
if (Date.now() - record.incomingStartTime > maxAge) {
|
||
|
connectionManager.cleanupConnection(record, 'max_age');
|
||
|
}
|
||
|
```
|
||
|
|
||
|
3. **Aggressive Timeout Settings**:
|
||
|
```typescript
|
||
|
{
|
||
|
socketTimeout: 60000, // 1 minute
|
||
|
inactivityTimeout: 300000, // 5 minutes
|
||
|
connectionCleanupInterval: 30000 // 30 seconds
|
||
|
}
|
||
|
```
|
||
|
|
||
|
### Related Files
|
||
|
- `/ts/proxies/smart-proxy/route-connection-handler.ts` - Main connection handling
|
||
|
- `/ts/proxies/smart-proxy/connection-manager.ts` - Connection tracking and cleanup
|
||
|
- `/ts/core/utils/socket-utils.ts` - Socket cleanup utilities
|
||
|
- `/test/test.proxy-chain-cleanup.node.ts` - Test for connection cleanup
|
||
|
- `/test/test.proxy-chaining-accumulation.node.ts` - Test for accumulation prevention
|
||
|
- `/.nogit/debug/connection-accumulation-debug.ts` - Debug script for connection states
|
||
|
- `/.nogit/debug/connection-accumulation-keepalive.ts` - Keep-alive specific tests
|
||
|
- `/.nogit/debug/connection-accumulation-http.ts` - HTTP traffic through proxy chains
|
||
|
|
||
|
### Summary
|
||
|
|
||
|
**Issue Identified**: Connection accumulation occurs on the **inner proxy** (not outer) when backends are unreachable.
|
||
|
|
||
|
**Root Cause**: The `createSocketWithErrorHandler` function in socket-utils.ts doesn't implement connection establishment timeout. It only sets `socket.setTimeout()` which handles inactivity AFTER connection is established, not during the connect phase.
|
||
|
|
||
|
**Impact**: When connecting to unreachable IPs (e.g., 10.255.255.1), outgoing sockets remain in "opening" state indefinitely, causing connections to accumulate.
|
||
|
|
||
|
**Fix Required**:
|
||
|
1. Add `connectionTimeout` setting to ISmartProxyOptions
|
||
|
2. Implement proper connection timeout in `createSocketWithErrorHandler`
|
||
|
3. Pass the timeout value from route-connection-handler
|
||
|
|
||
|
**Workaround Until Fixed**: Configure shorter socket timeouts and use the periodic force cleanup suggested above.
|
||
|
|
||
|
The connection cleanup mechanisms have been significantly improved in v19.5.20:
|
||
|
1. Race condition fixed by setting `record.outgoing` before connecting
|
||
|
2. Immediate routing cleanup handler always destroys outgoing connections
|
||
|
3. Tests confirm no accumulation in standard scenarios with reachable backends
|
||
|
|
||
|
However, the missing connection establishment timeout causes accumulation when backends are unreachable or very slow to connect.
|