smartproxy/readme.connections.md

375 lines
12 KiB
Markdown

# Connection Management in SmartProxy
This document describes connection handling, cleanup mechanisms, and known issues in SmartProxy, particularly focusing on proxy chain configurations.
## Connection Accumulation Investigation (January 2025)
### Problem Statement
Connections may accumulate on the outer proxy in proxy chain configurations, despite implemented fixes.
### Historical Context
- **v19.5.12-v19.5.15**: Major connection cleanup improvements
- **v19.5.19+**: PROXY protocol support with WrappedSocket implementation
- **v19.5.20**: Fixed race condition in immediate routing cleanup
### Current Architecture
#### Connection Flow in Proxy Chains
```
Client → Outer Proxy (8001) → Inner Proxy (8002) → Backend (httpbin.org:443)
```
1. **Outer Proxy**:
- Accepts client connection
- Sends PROXY protocol header to inner proxy
- Tracks connection in ConnectionManager
- Immediate routing for non-TLS ports
2. **Inner Proxy**:
- Parses PROXY protocol to get real client IP
- Establishes connection to backend
- Tracks its own connections separately
### Potential Causes of Connection Accumulation
#### 1. Race Condition in Immediate Routing
When a connection is immediately routed (non-TLS ports), there's a timing window:
```typescript
// route-connection-handler.ts, line ~231
this.routeConnection(socket, record, '', undefined);
// Connection is routed before all setup is complete
```
**Issue**: If client disconnects during backend connection setup, cleanup may not trigger properly.
#### 2. Outgoing Socket Assignment Timing
Despite the fix in v19.5.20:
```typescript
// Line 1362 in setupDirectConnection
record.outgoing = targetSocket;
```
There's still a window between socket creation and the `connect` event where cleanup might miss the outgoing socket.
#### 3. Batch Cleanup Delays
ConnectionManager uses queued cleanup:
- Batch size: 100 connections
- Batch interval: 100ms
- Under rapid connection/disconnection, queue might lag
#### 4. Different Cleanup Paths
Multiple cleanup triggers exist:
- Socket 'close' event
- Socket 'error' event
- Inactivity timeout
- Connection timeout
- Manual cleanup
Not all paths may properly handle proxy chain scenarios.
#### 5. Keep-Alive Connection Handling
Keep-alive connections have special treatment:
- Extended inactivity timeout (6x normal)
- Warning before closure
- May accumulate if backend is unresponsive
### Observed Symptoms
1. **Outer proxy connection count grows over time**
2. **Inner proxy maintains zero or low connection count**
3. **Connections show as closed in logs but remain in tracking**
4. **Memory usage gradually increases**
### Debug Strategies
#### 1. Enhanced Logging
Add connection state logging at key points:
```typescript
// When outgoing socket is created
logger.log('debug', `Outgoing socket created for ${connectionId}`, {
hasOutgoing: !!record.outgoing,
outgoingState: record.outgoing?.readyState
});
```
#### 2. Connection State Inspection
Periodically log detailed connection state:
```typescript
for (const [id, record] of connectionManager.getConnections()) {
console.log({
id,
age: Date.now() - record.incomingStartTime,
incomingDestroyed: record.incoming.destroyed,
outgoingDestroyed: record.outgoing?.destroyed,
hasCleanupTimer: !!record.cleanupTimer
});
}
```
#### 3. Cleanup Verification
Track cleanup completion:
```typescript
// In cleanupConnection
logger.log('debug', `Cleanup completed for ${record.id}`, {
recordsRemaining: this.connectionRecords.size
});
```
### Recommendations
1. **Immediate Cleanup for Proxy Chains**
- Skip batch queue for proxy chain connections
- Use synchronous cleanup when PROXY protocol is detected
2. **Socket State Validation**
- Check both `destroyed` and `readyState` before cleanup decisions
- Handle 'opening' state sockets explicitly
3. **Timeout Adjustments**
- Shorter timeouts for proxy chain connections
- More aggressive cleanup for connections without data transfer
4. **Connection Limits**
- Per-route connection limits
- Backpressure when approaching limits
5. **Monitoring**
- Export connection metrics
- Alert on connection count thresholds
- Track connection age distribution
### Test Scenarios to Reproduce
1. **Rapid Connect/Disconnect**
```bash
# Create many short-lived connections
for i in {1..1000}; do
(echo -n | nc localhost 8001) &
done
```
2. **Slow Backend**
- Configure inner proxy to connect to unresponsive backend
- Monitor outer proxy connection count
3. **Mixed Traffic**
- Combine TLS and non-TLS connections
- Add keep-alive connections
- Observe accumulation patterns
### Future Improvements
1. **Connection Pool Isolation**
- Separate pools for proxy chain vs direct connections
- Different cleanup strategies per pool
2. **Circuit Breaker**
- Detect accumulation and trigger aggressive cleanup
- Temporary refuse new connections when near limit
3. **Connection State Machine**
- Explicit states: CONNECTING, ESTABLISHED, CLOSING, CLOSED
- State transition validation
- Timeout per state
4. **Metrics Collection**
- Connection lifecycle events
- Cleanup success/failure rates
- Time spent in each state
### Root Cause Identified (January 2025)
**The primary issue is on the inner proxy when backends are unreachable:**
When the backend is unreachable (e.g., non-routable IP like 10.255.255.1):
1. The outgoing socket gets stuck in "opening" state indefinitely
2. The `createSocketWithErrorHandler` in socket-utils.ts doesn't implement connection timeout
3. `socket.setTimeout()` only handles inactivity AFTER connection, not during connect phase
4. Connections accumulate because they never transition to error state
5. Socket timeout warnings fire but connections are preserved as keep-alive
**Code Issue:**
```typescript
// socket-utils.ts line 275
if (timeout) {
socket.setTimeout(timeout); // This only handles inactivity, not connection!
}
```
**Required Fix:**
1. Add `connectionTimeout` to ISmartProxyOptions interface:
```typescript
// In interfaces.ts
connectionTimeout?: number; // Timeout for establishing connection (ms), default: 30000 (30s)
```
2. Update `createSocketWithErrorHandler` in socket-utils.ts:
```typescript
export function createSocketWithErrorHandler(options: SafeSocketOptions): plugins.net.Socket {
const { port, host, onError, onConnect, timeout } = options;
const socket = new plugins.net.Socket();
let connected = false;
let connectionTimeout: NodeJS.Timeout | null = null;
socket.on('error', (error) => {
if (connectionTimeout) {
clearTimeout(connectionTimeout);
connectionTimeout = null;
}
if (onError) onError(error);
});
socket.on('connect', () => {
connected = true;
if (connectionTimeout) {
clearTimeout(connectionTimeout);
connectionTimeout = null;
}
if (timeout) socket.setTimeout(timeout); // Set inactivity timeout
if (onConnect) onConnect();
});
// Implement connection establishment timeout
if (timeout) {
connectionTimeout = setTimeout(() => {
if (!connected && !socket.destroyed) {
const error = new Error(`Connection timeout after ${timeout}ms to ${host}:${port}`);
(error as any).code = 'ETIMEDOUT';
socket.destroy();
if (onError) onError(error);
}
}, timeout);
}
socket.connect(port, host);
return socket;
}
```
3. Pass connectionTimeout in route-connection-handler.ts:
```typescript
const targetSocket = createSocketWithErrorHandler({
port: finalTargetPort,
host: finalTargetHost,
timeout: this.settings.connectionTimeout || 30000, // Connection timeout
onError: (error) => { /* existing */ },
onConnect: async () => { /* existing */ }
});
```
### Investigation Results (January 2025)
Based on extensive testing with debug scripts:
1. **Normal Operation**: In controlled tests, connections are properly cleaned up:
- Immediate routing cleanup handler properly destroys outgoing connections
- Both outer and inner proxies maintain 0 connections after clients disconnect
- Keep-alive connections are tracked and cleaned up correctly
2. **Potential Edge Cases Not Covered by Tests**:
- **HTTP/2 Connections**: May have different lifecycle than HTTP/1.1
- **WebSocket Connections**: Long-lived upgrade connections might persist
- **Partial TLS Handshakes**: Connections that start TLS but don't complete
- **PROXY Protocol Parse Failures**: Malformed headers from untrusted sources
- **Connection Pool Reuse**: HttpProxy component may maintain its own pools
3. **Timing-Sensitive Scenarios**:
- Client disconnects exactly when `record.outgoing` is being assigned
- Backend connects but immediately RSTs
- Proxy chain where middle proxy restarts
- Multiple rapid reconnects with same source IP/port
4. **Configuration-Specific Issues**:
- Mixed `sendProxyProtocol` settings in chain
- Different `keepAlive` settings between proxies
- Mismatched timeout values
- Routes with `forwardingEngine: 'nftables'`
### Additional Debug Points
Add these debug logs to identify the specific scenario:
```typescript
// In route-connection-handler.ts setupDirectConnection
logger.log('debug', `Setting outgoing socket for ${connectionId}`, {
timestamp: Date.now(),
hasOutgoing: !!record.outgoing,
socketState: targetSocket.readyState
});
// In connection-manager.ts cleanupConnection
logger.log('debug', `Cleanup attempt for ${record.id}`, {
alreadyClosed: record.connectionClosed,
hasIncoming: !!record.incoming,
hasOutgoing: !!record.outgoing,
incomingDestroyed: record.incoming?.destroyed,
outgoingDestroyed: record.outgoing?.destroyed
});
```
### Workarounds
Until root cause is identified:
1. **Periodic Force Cleanup**:
```typescript
setInterval(() => {
const connections = connectionManager.getConnections();
for (const [id, record] of connections) {
if (record.incoming?.destroyed && !record.connectionClosed) {
connectionManager.cleanupConnection(record, 'force_cleanup');
}
}
}, 60000); // Every minute
```
2. **Connection Age Limit**:
```typescript
// Add max connection age check
const maxAge = 3600000; // 1 hour
if (Date.now() - record.incomingStartTime > maxAge) {
connectionManager.cleanupConnection(record, 'max_age');
}
```
3. **Aggressive Timeout Settings**:
```typescript
{
socketTimeout: 60000, // 1 minute
inactivityTimeout: 300000, // 5 minutes
connectionCleanupInterval: 30000 // 30 seconds
}
```
### Related Files
- `/ts/proxies/smart-proxy/route-connection-handler.ts` - Main connection handling
- `/ts/proxies/smart-proxy/connection-manager.ts` - Connection tracking and cleanup
- `/ts/core/utils/socket-utils.ts` - Socket cleanup utilities
- `/test/test.proxy-chain-cleanup.node.ts` - Test for connection cleanup
- `/test/test.proxy-chaining-accumulation.node.ts` - Test for accumulation prevention
- `/.nogit/debug/connection-accumulation-debug.ts` - Debug script for connection states
- `/.nogit/debug/connection-accumulation-keepalive.ts` - Keep-alive specific tests
- `/.nogit/debug/connection-accumulation-http.ts` - HTTP traffic through proxy chains
### Summary
**Issue Identified**: Connection accumulation occurs on the **inner proxy** (not outer) when backends are unreachable.
**Root Cause**: The `createSocketWithErrorHandler` function in socket-utils.ts doesn't implement connection establishment timeout. It only sets `socket.setTimeout()` which handles inactivity AFTER connection is established, not during the connect phase.
**Impact**: When connecting to unreachable IPs (e.g., 10.255.255.1), outgoing sockets remain in "opening" state indefinitely, causing connections to accumulate.
**Fix Required**:
1. Add `connectionTimeout` setting to ISmartProxyOptions
2. Implement proper connection timeout in `createSocketWithErrorHandler`
3. Pass the timeout value from route-connection-handler
**Workaround Until Fixed**: Configure shorter socket timeouts and use the periodic force cleanup suggested above.
The connection cleanup mechanisms have been significantly improved in v19.5.20:
1. Race condition fixed by setting `record.outgoing` before connecting
2. Immediate routing cleanup handler always destroys outgoing connections
3. Tests confirm no accumulation in standard scenarios with reachable backends
However, the missing connection establishment timeout causes accumulation when backends are unreachable or very slow to connect.