Compare commits (6 commits): 82a350bf51, 890e907664, 19590ef107, 47735adbf2, 9094b76b1b, 9aebcd488d

**package.json**

```diff
@@ -1,6 +1,6 @@
 {
   "name": "@push.rocks/smartproxy",
-  "version": "19.5.22",
+  "version": "19.5.25",
   "private": false,
   "description": "A powerful proxy package with unified route-based configuration for high traffic management. Features include SSL/TLS support, flexible routing patterns, WebSocket handling, advanced security options, and automatic ACME certificate management.",
   "main": "dist_ts/index.js",
```

**readme.connections.md** (new file, 676 lines)
# Connection Management in SmartProxy

This document describes connection handling, cleanup mechanisms, and known issues in SmartProxy, with a particular focus on proxy chain configurations.

## Connection Accumulation Investigation (January 2025)

### Problem Statement
Connections may accumulate on the outer proxy in proxy chain configurations, despite the fixes implemented so far.

### Historical Context
- **v19.5.12-v19.5.15**: Major connection cleanup improvements
- **v19.5.19+**: PROXY protocol support with WrappedSocket implementation
- **v19.5.20**: Fixed race condition in immediate routing cleanup

### Current Architecture

#### Connection Flow in Proxy Chains
```
Client → Outer Proxy (8001) → Inner Proxy (8002) → Backend (httpbin.org:443)
```

1. **Outer Proxy**:
   - Accepts client connection
   - Sends PROXY protocol header to inner proxy
   - Tracks connection in ConnectionManager
   - Immediate routing for non-TLS ports

2. **Inner Proxy**:
   - Parses PROXY protocol to get the real client IP
   - Establishes connection to backend
   - Tracks its own connections separately

### Potential Causes of Connection Accumulation

#### 1. Race Condition in Immediate Routing
When a connection is immediately routed (non-TLS ports), there is a timing window:
```typescript
// route-connection-handler.ts, line ~231
this.routeConnection(socket, record, '', undefined);
// Connection is routed before all setup is complete
```

**Issue**: If the client disconnects during backend connection setup, cleanup may not trigger properly.

#### 2. Outgoing Socket Assignment Timing
Despite the fix in v19.5.20:
```typescript
// Line 1362 in setupDirectConnection
record.outgoing = targetSocket;
```

There is still a window between socket creation and the `connect` event where cleanup might miss the outgoing socket.

#### 3. Batch Cleanup Delays
ConnectionManager uses queued cleanup (see the sketch after this list):
- Batch size: 100 connections
- Batch interval: 100ms
- Under rapid connection/disconnection, the queue might lag
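
A minimal sketch of this queued cleanup pattern with the batch parameters above (class and method names here are illustrative, not the actual ConnectionManager internals):

```typescript
// Hypothetical batched cleanup queue: drain up to `batchSize` items every `intervalMs`.
class CleanupQueue<T> {
  private queue: T[] = [];
  private timer: NodeJS.Timeout | null = null;

  constructor(
    private cleanupFn: (item: T) => void,
    private batchSize = 100,  // per the doc: 100 connections per batch
    private intervalMs = 100, // per the doc: drain every 100ms
  ) {}

  enqueue(item: T): void {
    this.queue.push(item);
    // Lazily start the drain timer; if it ever dies without a restart,
    // items accumulate -- exactly the failure mode discussed later.
    if (!this.timer) {
      this.timer = setTimeout(() => this.drain(), this.intervalMs);
    }
  }

  private drain(): void {
    this.timer = null;
    for (const item of this.queue.splice(0, this.batchSize)) {
      this.cleanupFn(item);
    }
    if (this.queue.length > 0) {
      this.timer = setTimeout(() => this.drain(), this.intervalMs);
    }
  }
}
```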

#### 4. Different Cleanup Paths
Multiple cleanup triggers exist:
- Socket 'close' event
- Socket 'error' event
- Inactivity timeout
- Connection timeout
- Manual cleanup

Not all of these paths may handle proxy chain scenarios properly.

#### 5. Keep-Alive Connection Handling
Keep-alive connections receive special treatment (see the snippet below):
- Extended inactivity timeout (6x normal)
- Warning before closure
- May accumulate if the backend is unresponsive
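
For illustration, the extended keep-alive timeout can be expressed as a small helper; a sketch only, where the 6x multiplier comes from this document and the field name is an assumption:

```typescript
// Hypothetical helper: compute the effective inactivity timeout for a record.
// Keep-alive connections get 6x the normal timeout, per the behavior above.
function getEffectiveInactivityTimeout(
  record: { hasKeepAlive?: boolean },
  baseTimeoutMs: number,
): number {
  const KEEPALIVE_MULTIPLIER = 6;
  return record.hasKeepAlive ? baseTimeoutMs * KEEPALIVE_MULTIPLIER : baseTimeoutMs;
}
```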

### Observed Symptoms

1. **Outer proxy connection count grows over time**
2. **Inner proxy maintains zero or low connection count**
3. **Connections show as closed in logs but remain in tracking**
4. **Memory usage gradually increases**

### Debug Strategies

#### 1. Enhanced Logging
Add connection state logging at key points:
```typescript
// When outgoing socket is created
logger.log('debug', `Outgoing socket created for ${connectionId}`, {
  hasOutgoing: !!record.outgoing,
  outgoingState: record.outgoing?.readyState
});
```

#### 2. Connection State Inspection
Periodically log detailed connection state:
```typescript
for (const [id, record] of connectionManager.getConnections()) {
  console.log({
    id,
    age: Date.now() - record.incomingStartTime,
    incomingDestroyed: record.incoming.destroyed,
    outgoingDestroyed: record.outgoing?.destroyed,
    hasCleanupTimer: !!record.cleanupTimer
  });
}
```

#### 3. Cleanup Verification
Track cleanup completion:
```typescript
// In cleanupConnection
logger.log('debug', `Cleanup completed for ${record.id}`, {
  recordsRemaining: this.connectionRecords.size
});
```

### Recommendations

1. **Immediate Cleanup for Proxy Chains**
   - Skip the batch queue for proxy chain connections
   - Use synchronous cleanup when PROXY protocol is detected

2. **Socket State Validation**
   - Check both `destroyed` and `readyState` before cleanup decisions
   - Handle 'opening' state sockets explicitly

3. **Timeout Adjustments**
   - Shorter timeouts for proxy chain connections
   - More aggressive cleanup for connections without data transfer

4. **Connection Limits** (see the sketch after this list)
   - Per-route connection limits
   - Backpressure when approaching limits

5. **Monitoring**
   - Export connection metrics
   - Alert on connection count thresholds
   - Track connection age distribution
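
A minimal sketch of recommendation 4, a per-route connection limit with backpressure (hypothetical names, not an existing SmartProxy API):

```typescript
// Hypothetical per-route limiter. Callers refuse or pause accepting new
// connections when tryAcquire returns false (backpressure).
class RouteConnectionLimiter {
  private counts = new Map<string, number>();

  constructor(private maxPerRoute: number) {}

  tryAcquire(routeName: string): boolean {
    const current = this.counts.get(routeName) ?? 0;
    if (current >= this.maxPerRoute) return false; // at capacity: apply backpressure
    this.counts.set(routeName, current + 1);
    return true;
  }

  release(routeName: string): void {
    const current = this.counts.get(routeName) ?? 0;
    this.counts.set(routeName, Math.max(0, current - 1));
  }
}
```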

### Test Scenarios to Reproduce

1. **Rapid Connect/Disconnect**
   ```bash
   # Create many short-lived connections
   for i in {1..1000}; do
     (echo -n | nc localhost 8001) &
   done
   ```

2. **Slow Backend**
   - Configure the inner proxy to connect to an unresponsive backend
   - Monitor the outer proxy connection count

3. **Mixed Traffic**
   - Combine TLS and non-TLS connections
   - Add keep-alive connections
   - Observe accumulation patterns

### Future Improvements

1. **Connection Pool Isolation**
   - Separate pools for proxy chain vs direct connections
   - Different cleanup strategies per pool

2. **Circuit Breaker**
   - Detect accumulation and trigger aggressive cleanup
   - Temporarily refuse new connections when near the limit

3. **Connection State Machine** (see the sketch after this list)
   - Explicit states: CONNECTING, ESTABLISHED, CLOSING, CLOSED
   - State transition validation
   - Timeout per state

4. **Metrics Collection**
   - Connection lifecycle events
   - Cleanup success/failure rates
   - Time spent in each state
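
A minimal sketch of improvement 3, an explicit state machine with validated transitions (hypothetical, not current SmartProxy code):

```typescript
// Explicit per-connection states with a transition table; invalid moves throw.
type ConnectionState = 'CONNECTING' | 'ESTABLISHED' | 'CLOSING' | 'CLOSED';

const allowedTransitions: Record<ConnectionState, ConnectionState[]> = {
  CONNECTING: ['ESTABLISHED', 'CLOSED'],
  ESTABLISHED: ['CLOSING', 'CLOSED'],
  CLOSING: ['CLOSED'],
  CLOSED: [],
};

function transition(current: ConnectionState, next: ConnectionState): ConnectionState {
  if (!allowedTransitions[current].includes(next)) {
    throw new Error(`Invalid transition ${current} -> ${next}`);
  }
  return next;
}
```

Each state could also carry its own timeout (e.g., a short one for CONNECTING), which would catch exactly the "stuck in opening state" failure described below.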

### Root Cause Identified (January 2025)

**The primary issue is on the inner proxy when backends are unreachable:**

When the backend is unreachable (e.g., a non-routable IP like 10.255.255.1):
1. The outgoing socket gets stuck in the "opening" state indefinitely
2. `createSocketWithErrorHandler` in socket-utils.ts doesn't implement a connection timeout
3. `socket.setTimeout()` only handles inactivity AFTER the connection is established, not the connect phase
4. Connections accumulate because they never transition to an error state
5. Socket timeout warnings fire, but the connections are preserved as keep-alive

**Code Issue:**
```typescript
// socket-utils.ts line 275
if (timeout) {
  socket.setTimeout(timeout); // This only handles inactivity, not connection!
}
```

**Required Fix:**

1. Add `connectionTimeout` to the ISmartProxyOptions interface:
```typescript
// In interfaces.ts
connectionTimeout?: number; // Timeout for establishing connection (ms), default: 30000 (30s)
```

2. Update `createSocketWithErrorHandler` in socket-utils.ts:
```typescript
export function createSocketWithErrorHandler(options: SafeSocketOptions): plugins.net.Socket {
  const { port, host, onError, onConnect, timeout } = options;

  const socket = new plugins.net.Socket();
  let connected = false;
  let connectionTimeout: NodeJS.Timeout | null = null;

  socket.on('error', (error) => {
    if (connectionTimeout) {
      clearTimeout(connectionTimeout);
      connectionTimeout = null;
    }
    if (onError) onError(error);
  });

  socket.on('connect', () => {
    connected = true;
    if (connectionTimeout) {
      clearTimeout(connectionTimeout);
      connectionTimeout = null;
    }
    if (timeout) socket.setTimeout(timeout); // Set inactivity timeout
    if (onConnect) onConnect();
  });

  // Implement connection establishment timeout
  if (timeout) {
    connectionTimeout = setTimeout(() => {
      if (!connected && !socket.destroyed) {
        const error = new Error(`Connection timeout after ${timeout}ms to ${host}:${port}`);
        (error as any).code = 'ETIMEDOUT';
        socket.destroy();
        if (onError) onError(error);
      }
    }, timeout);
  }

  socket.connect(port, host);
  return socket;
}
```

3. Pass connectionTimeout in route-connection-handler.ts:
```typescript
const targetSocket = createSocketWithErrorHandler({
  port: finalTargetPort,
  host: finalTargetHost,
  timeout: this.settings.connectionTimeout || 30000, // Connection timeout
  onError: (error) => { /* existing */ },
  onConnect: async () => { /* existing */ }
});
```

### Investigation Results (January 2025)

Based on extensive testing with debug scripts:

1. **Normal Operation**: In controlled tests, connections are properly cleaned up:
   - The immediate routing cleanup handler properly destroys outgoing connections
   - Both outer and inner proxies maintain 0 connections after clients disconnect
   - Keep-alive connections are tracked and cleaned up correctly

2. **Potential Edge Cases Not Covered by Tests**:
   - **HTTP/2 Connections**: May have a different lifecycle than HTTP/1.1
   - **WebSocket Connections**: Long-lived upgrade connections might persist
   - **Partial TLS Handshakes**: Connections that start TLS but don't complete
   - **PROXY Protocol Parse Failures**: Malformed headers from untrusted sources
   - **Connection Pool Reuse**: The HttpProxy component may maintain its own pools

3. **Timing-Sensitive Scenarios**:
   - Client disconnects exactly when `record.outgoing` is being assigned
   - Backend connects but immediately RSTs
   - Proxy chain where the middle proxy restarts
   - Multiple rapid reconnects with the same source IP/port

4. **Configuration-Specific Issues**:
   - Mixed `sendProxyProtocol` settings in the chain
   - Different `keepAlive` settings between proxies
   - Mismatched timeout values
   - Routes with `forwardingEngine: 'nftables'`

### Additional Debug Points

Add these debug logs to identify the specific scenario:

```typescript
// In route-connection-handler.ts setupDirectConnection
logger.log('debug', `Setting outgoing socket for ${connectionId}`, {
  timestamp: Date.now(),
  hasOutgoing: !!record.outgoing,
  socketState: targetSocket.readyState
});

// In connection-manager.ts cleanupConnection
logger.log('debug', `Cleanup attempt for ${record.id}`, {
  alreadyClosed: record.connectionClosed,
  hasIncoming: !!record.incoming,
  hasOutgoing: !!record.outgoing,
  incomingDestroyed: record.incoming?.destroyed,
  outgoingDestroyed: record.outgoing?.destroyed
});
```

### Workarounds

Until the fix is in place:

1. **Periodic Force Cleanup**:
```typescript
setInterval(() => {
  const connections = connectionManager.getConnections();
  for (const [id, record] of connections) {
    if (record.incoming?.destroyed && !record.connectionClosed) {
      connectionManager.cleanupConnection(record, 'force_cleanup');
    }
  }
}, 60000); // Every minute
```

2. **Connection Age Limit**:
```typescript
// Add max connection age check
const maxAge = 3600000; // 1 hour
if (Date.now() - record.incomingStartTime > maxAge) {
  connectionManager.cleanupConnection(record, 'max_age');
}
```

3. **Aggressive Timeout Settings**:
```typescript
{
  socketTimeout: 60000, // 1 minute
  inactivityTimeout: 300000, // 5 minutes
  connectionCleanupInterval: 30000 // 30 seconds
}
```

### Related Files
- `/ts/proxies/smart-proxy/route-connection-handler.ts` - Main connection handling
- `/ts/proxies/smart-proxy/connection-manager.ts` - Connection tracking and cleanup
- `/ts/core/utils/socket-utils.ts` - Socket cleanup utilities
- `/test/test.proxy-chain-cleanup.node.ts` - Test for connection cleanup
- `/test/test.proxy-chaining-accumulation.node.ts` - Test for accumulation prevention
- `/.nogit/debug/connection-accumulation-debug.ts` - Debug script for connection states
- `/.nogit/debug/connection-accumulation-keepalive.ts` - Keep-alive specific tests
- `/.nogit/debug/connection-accumulation-http.ts` - HTTP traffic through proxy chains

### Summary

**Issue Identified**: Connection accumulation occurs on the **inner proxy** (not the outer) when backends are unreachable.

**Root Cause**: The `createSocketWithErrorHandler` function in socket-utils.ts doesn't implement a connection establishment timeout. It only sets `socket.setTimeout()`, which handles inactivity AFTER the connection is established, not the connect phase itself.

**Impact**: When connecting to unreachable IPs (e.g., 10.255.255.1), outgoing sockets remain in the "opening" state indefinitely, causing connections to accumulate.

**Fix Required**:
1. Add a `connectionTimeout` setting to ISmartProxyOptions
2. Implement a proper connection timeout in `createSocketWithErrorHandler`
3. Pass the timeout value from route-connection-handler

**Workaround Until Fixed**: Configure shorter socket timeouts and use the periodic force cleanup suggested above.

The connection cleanup mechanisms were significantly improved in v19.5.20:
1. Race condition fixed by setting `record.outgoing` before connecting
2. The immediate routing cleanup handler always destroys outgoing connections
3. Tests confirm no accumulation in standard scenarios with reachable backends

However, the missing connection establishment timeout causes accumulation when backends are unreachable or very slow to connect.

### Outer Proxy Sudden Accumulation After Hours

**User Report**: "The counter goes up suddenly after some hours on the outer proxy"

**Investigation Findings**:

1. **Cleanup Queue Mechanism**:
   - Connections are cleaned up in batches of 100 via a queue
   - If the cleanup timer gets stuck or is cleared without a restart, connections accumulate
   - The timer is set with `setTimeout` and could be affected by event loop blocking

2. **Potential Causes for Sudden Spikes**:

   a) **Cleanup Timer Failure**:
   ```typescript
   // In ConnectionManager, if this timer gets cleared but not restarted:
   this.cleanupTimer = this.setTimeout(() => {
     this.processCleanupQueue();
   }, 100);
   ```

   b) **Memory Pressure**:
   - After hours of operation, memory fragmentation or pressure could cause delays
   - Garbage collection pauses might interfere with timer execution

   c) **Event Listener Accumulation**:
   - Socket event listeners might accumulate over time
   - Server 'connection' event handlers are particularly important

   d) **Keep-Alive Connection Cascades**:
   - Many keep-alive connections timing out simultaneously
   - The outer proxy has a different timeout than the inner proxy
   - Mass disconnection events can overwhelm the cleanup queue

   e) **HttpProxy Component Issues**:
   - If `useHttpProxy` is enabled, the HttpProxy bridge might maintain connection pools
   - These pools might not be properly cleaned after hours

3. **Why "Sudden" After Hours**:
   - Not a gradual leak but triggered by specific conditions
   - Likely related to periodic events or thresholds:
     - The inactivity check runs every 30 seconds
     - Keep-alive connections have extended timeouts (6x normal)
     - The parity check has a 30-minute timeout for half-closed connections

4. **Reproduction Scenarios**:
   - Mass client disconnection/reconnection (network blip)
   - Keep-alive timeout cascade when the inner proxy times out first
   - Cleanup timer getting stuck during high load
   - Memory pressure causing event loop delays

### Additional Monitoring Recommendations

1. **Add Cleanup Queue Monitoring**:
```typescript
setInterval(() => {
  const cm = proxy.connectionManager;
  if (cm.cleanupQueue.size > 100 && !cm.cleanupTimer) {
    logger.error('Cleanup queue stuck!', {
      queueSize: cm.cleanupQueue.size,
      hasTimer: !!cm.cleanupTimer
    });
  }
}, 60000);
```

2. **Track Timer Health**:
   - Monitor whether the cleanup timer is running
   - Check for event loop blocking
   - Log when batch processing takes too long

3. **Memory Monitoring**:
   - Track heap usage over time
   - Monitor for memory leaks in long-running processes
   - Force periodic garbage collection if needed

### Immediate Mitigations

1. **Restart Cleanup Timer**:
```typescript
// Emergency cleanup timer restart
if (!cm.cleanupTimer && cm.cleanupQueue.size > 0) {
  cm.cleanupTimer = setTimeout(() => {
    cm.processCleanupQueue();
  }, 100);
}
```

2. **Force Periodic Cleanup**:
```typescript
setInterval(() => {
  const cm = connectionManager;
  if (cm.getConnectionCount() > threshold) {
    cm.performOptimizedInactivityCheck();
    // Force process cleanup queue
    cm.processCleanupQueue();
  }
}, 300000); // Every 5 minutes
```

3. **Connection Age Limits** (a sketch follows this list):
   - Set a maximum connection lifetime
   - Force close connections older than the threshold
   - More aggressive cleanup for proxy chains
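
A minimal sketch of mitigation 3, reusing the `connectionManager` accessors shown in the workarounds above (the lifetime value is an assumption to tune per deployment):

```typescript
// Hypothetical periodic sweep enforcing a maximum connection lifetime.
const MAX_LIFETIME_MS = 30 * 60 * 1000; // assumption: 30 minutes

setInterval(() => {
  for (const [id, record] of connectionManager.getConnections()) {
    // Force close anything older than the threshold, regardless of activity.
    if (!record.connectionClosed && Date.now() - record.incomingStartTime > MAX_LIFETIME_MS) {
      connectionManager.cleanupConnection(record, 'max_age');
    }
  }
}, 60000); // sweep every minute
```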

## ✅ FIXED: Zombie Connection Detection (January 2025)

### Root Cause Identified
"Zombie connections" occur when sockets are destroyed without triggering their close/error event handlers. This leaves connections tracked with both sockets destroyed but `connectionClosed=false`. It is particularly problematic in proxy chains, where the inner proxy might close connections in ways that don't trigger the proper events on the outer proxy.

### Fix Implemented
Added zombie detection to the periodic inactivity check in ConnectionManager:

```typescript
// In performOptimizedInactivityCheck()
// Check ALL connections for zombie state
for (const [connectionId, record] of this.connectionRecords) {
  if (!record.connectionClosed) {
    const incomingDestroyed = record.incoming?.destroyed || false;
    const outgoingDestroyed = record.outgoing?.destroyed || false;

    // Check for zombie connections: both sockets destroyed but not cleaned up
    if (incomingDestroyed && outgoingDestroyed) {
      logger.log('warn', `Zombie connection detected: ${connectionId} - both sockets destroyed but not cleaned up`, {
        connectionId,
        remoteIP: record.remoteIP,
        age: plugins.prettyMs(now - record.incomingStartTime),
        component: 'connection-manager'
      });

      // Clean up immediately
      this.cleanupConnection(record, 'zombie_cleanup');
      continue;
    }

    // Check for half-zombie: one socket destroyed
    if (incomingDestroyed || outgoingDestroyed) {
      const age = now - record.incomingStartTime;
      // Give it a 30-second grace period for normal cleanup
      if (age > 30000) {
        logger.log('warn', `Half-zombie connection detected: ${connectionId} - ${incomingDestroyed ? 'incoming' : 'outgoing'} destroyed`, {
          connectionId,
          remoteIP: record.remoteIP,
          age: plugins.prettyMs(age),
          incomingDestroyed,
          outgoingDestroyed,
          component: 'connection-manager'
        });

        // Clean up
        this.cleanupConnection(record, 'half_zombie_cleanup');
      }
    }
  }
}
```

### How It Works
1. **Full Zombie Detection**: Detects when both the incoming and outgoing sockets are destroyed but the connection hasn't been cleaned up
2. **Half-Zombie Detection**: Detects when only one socket is destroyed, with a 30-second grace period for normal cleanup to occur
3. **Automatic Cleanup**: Immediately cleans up zombie connections when detected
4. **Runs Periodically**: Integrated into the existing inactivity check that runs every 30 seconds

### Why This Fixes the Outer Proxy Accumulation
- When the inner proxy closes connections abruptly (e.g., due to a backend failure), the outer proxy's outgoing socket might be destroyed without firing close/error events
- These become zombie connections that previously accumulated indefinitely
- Now they are detected and cleaned up within 30 seconds

### Test Results
Debug scripts confirmed:
- Zombie connections can be created when sockets are destroyed directly without events
- The zombie detection successfully identifies and cleans up these connections
- Both full zombies (both sockets destroyed) and half-zombies (one socket destroyed) are handled

This fix addresses the specific requirement, as stated by the user, that "connections that are closed on the inner proxy, always also close on the outer proxy".

## 🔍 Production Diagnostics (January 2025)

Since the zombie detection fix didn't fully resolve the issue, use the ProductionConnectionMonitor to diagnose the actual problem:

### How to Use the Production Monitor

1. **Add to your proxy startup script**:
```typescript
import ProductionConnectionMonitor from './production-connection-monitor.js';

// After proxy.start()
const monitor = new ProductionConnectionMonitor(proxy);
monitor.start(5000); // Check every 5 seconds

// Monitor will automatically capture diagnostics when:
// - Connections exceed threshold (default: 50)
// - Sudden spike occurs (default: +20 connections)
```

2. **Diagnostics are saved to**: `.nogit/connection-diagnostics/`

3. **Force capture anytime**: `monitor.forceCaptureNow()`

### What the Monitor Captures

For each connection:
- Socket states (destroyed, readable, writable, readyState)
- Connection flags (closed, keepAlive, TLS status)
- Data transfer statistics
- Time since last activity
- Cleanup queue status
- Event listener counts
- Termination reasons

### Pattern Analysis

The monitor automatically identifies:
- **Zombie connections**: Both sockets destroyed but not cleaned up
- **Half-zombies**: One socket destroyed
- **Stuck connecting**: Outgoing socket stuck in the connecting state
- **No outgoing**: Missing outgoing socket
- **Keep-alive stuck**: Keep-alive connections with no recent activity
- **Old connections**: Connections older than 1 hour
- **No data transfer**: Connections with no bytes transferred
- **Listener leaks**: Excessive event listeners (see the sketch below)
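
Listener counts can be sampled with Node's built-in `listenerCount`; a minimal sketch (the event list and what counts as "excessive" are assumptions, see the >20 heuristic in the next section):

```typescript
import * as net from 'net';

// Sample listener counts for the events a proxy typically attaches.
// Counts well above a handful suggest cleanup is not removing listeners.
function sampleListenerCounts(socket: net.Socket): Record<string, number> {
  const events = ['data', 'error', 'close', 'end', 'timeout'];
  const counts: Record<string, number> = {};
  for (const event of events) {
    counts[event] = socket.listenerCount(event);
  }
  return counts;
}
```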

### Common Accumulation Patterns

1. **Connecting State Stuck**
   - The outgoing socket shows `connecting: true` indefinitely
   - Usually means the connection timeout isn't working
   - Check whether the backend is reachable

2. **Missing Outgoing Socket**
   - The connection has no outgoing socket but isn't closed
   - May indicate immediate routing issues
   - Check error logs during connection setup

3. **Event Listener Accumulation**
   - High listener counts (>20) on sockets
   - Indicates cleanup isn't removing all listeners
   - Can cause memory leaks

4. **Keep-Alive Zombies**
   - Keep-alive connections not timing out
   - Check keepAlive timeout settings
   - May need more aggressive cleanup

### Next Steps

1. **Run the monitor in production** during accumulation
2. **Share the diagnostic files** from `.nogit/connection-diagnostics/`
3. **Look for patterns** in the captured snapshots
4. **Check specific connection IDs** that accumulate

The diagnostic files show exactly what state connections are in when accumulation occurs, allowing targeted fixes for the specific issue.

## ✅ FIXED: Stuck Connection Detection (January 2025)

### Additional Root Cause Found
Connections to hanging backends (those that accept but never respond) were not being cleaned up because:
- Both sockets remain alive (not destroyed)
- Keep-alive prevents the normal timeout
- No data is sent back to the client despite receiving data
- These don't qualify as "zombies" since the sockets aren't destroyed

### Fix Implemented
Added stuck connection detection to the periodic inactivity check:

```typescript
// Check for stuck connections: no data sent back to client
if (!record.connectionClosed && record.outgoing && record.bytesReceived > 0 && record.bytesSent === 0) {
  const age = now - record.incomingStartTime;
  // If connection is older than 60 seconds and no data sent back, likely stuck
  if (age > 60000) {
    logger.log('warn', `Stuck connection detected: ${connectionId} - received ${record.bytesReceived} bytes but sent 0 bytes`, {
      connectionId,
      remoteIP: record.remoteIP,
      age: plugins.prettyMs(age),
      bytesReceived: record.bytesReceived,
      targetHost: record.targetHost,
      targetPort: record.targetPort,
      component: 'connection-manager'
    });

    // Clean up
    this.cleanupConnection(record, 'stuck_no_response');
  }
}
```

### What This Fixes
- Connections to backends that accept but never respond
- Proxy chains where the inner proxy connects to unresponsive services
- Scenarios where keep-alive prevents the normal timeout mechanisms
- Connections that receive client data but never send anything back

### Detection Criteria
- The connection has received bytes from the client (`bytesReceived > 0`)
- No bytes have been sent back to the client (`bytesSent === 0`)
- The connection is older than 60 seconds
- Both sockets are still alive (not destroyed)

This complements the zombie detection by handling cases where the sockets remain technically alive but the connection is effectively dead.
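
The criteria above can be restated as a single predicate; a minimal sketch using the record fields from the snippets in this document:

```typescript
// True when a connection matches all four stuck-connection criteria.
function isStuckConnection(record: {
  connectionClosed: boolean;
  bytesReceived: number;
  bytesSent: number;
  incomingStartTime: number;
  incoming?: { destroyed: boolean };
  outgoing?: { destroyed: boolean };
}, now: number): boolean {
  return !record.connectionClosed &&
    record.bytesReceived > 0 &&          // client sent data
    record.bytesSent === 0 &&            // nothing ever came back
    now - record.incomingStartTime > 60000 && // older than 60 seconds
    !record.incoming?.destroyed &&       // both sockets still alive
    !record.outgoing?.destroyed;
}
```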

@@ -856,4 +856,42 @@ The WrappedSocket class has been implemented as the foundation for PROXY protocol

For detailed information about proxy protocol implementation and proxy chaining:
- **[Proxy Protocol Guide](./readme.proxy-protocol.md)** - Complete implementation details and configuration
- **[Proxy Protocol Examples](./readme.proxy-protocol-example.md)** - Code examples and conceptual implementation
- **[Proxy Chain Summary](./readme.proxy-chain-summary.md)** - Quick reference for proxy chaining setup

## Connection Cleanup Edge Cases Investigation (v19.5.20+)

### Issue Discovered
"Zombie connections" can occur when both sockets are destroyed but the connection record hasn't been cleaned up. This happens when sockets are destroyed without triggering their close/error event handlers.

### Root Cause
1. **Event Handler Bypass**: In edge cases (network failures, proxy chain failures, forced socket destruction), sockets can be destroyed without their event handlers being called
2. **Cleanup Queue Delay**: The `initiateCleanupOnce` method adds connections to a cleanup queue (batches of 100 every 100ms), which may not process fast enough
3. **Inactivity Check Limitation**: The periodic inactivity check only examines `lastActivity` timestamps, not actual socket states

### Test Results
The debug script (`connection-manager-direct-test.ts`) revealed:
- **Normal cleanup works**: When socket events fire normally, cleanup is reliable
- **Zombies ARE created**: Direct socket destruction creates zombies (destroyed sockets, connectionClosed=false)
- **Manual cleanup works**: Calling `initiateCleanupOnce` on a zombie does clean it up
- **Inactivity check misses zombies**: The check doesn't detect connections with destroyed sockets

### Potential Solutions
1. **Periodic Zombie Detection**: Add zombie detection to the inactivity check:
```typescript
// In performOptimizedInactivityCheck
if (record.incoming?.destroyed && record.outgoing?.destroyed && !record.connectionClosed) {
  this.cleanupConnection(record, 'zombie_detected');
}
```

2. **Socket State Monitoring**: Check socket states during connection operations
3. **Defensive Socket Handling**: Always attach cleanup handlers before any operation that might destroy sockets
4. **Immediate Cleanup Option**: For critical paths, use `cleanupConnection` instead of `initiateCleanupOnce`

### Impact
- Memory leaks in edge cases (network failures, proxy chain issues)
- Connection count inaccuracy
- Potential resource exhaustion over time

### Test Files
- `.nogit/debug/connection-manager-direct-test.ts` - Direct ConnectionManager testing showing zombie creation

**readme.monitoring.md** (new file, 202 lines)

# Production Connection Monitoring

This document explains how to use the ProductionConnectionMonitor to diagnose connection accumulation issues in real time.

## Quick Start

```typescript
import ProductionConnectionMonitor from './.nogit/debug/production-connection-monitor.js';

// After starting your proxy
const monitor = new ProductionConnectionMonitor(proxy);
monitor.start(5000); // Check every 5 seconds

// The monitor will automatically capture diagnostics when:
// - Connections exceed 50 (default threshold)
// - Sudden spike of 20+ connections occurs
// - You manually call monitor.forceCaptureNow()
```

## What Gets Captured

When accumulation is detected, the monitor saves a JSON file with:

### Connection Details
- Socket states (destroyed, readable, writable, readyState)
- Connection age and activity timestamps
- Data transfer statistics (bytes sent/received)
- Target host and port information
- Keep-alive status
- Event listener counts

### System State
- Memory usage
- Event loop lag (see the sketch below)
- Connection count trends
- Termination statistics
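
Event loop lag can be measured by scheduling a timer and checking how late it fires; a minimal sketch of that common technique (not necessarily how ProductionConnectionMonitor implements it):

```typescript
// Measure how late a timer fires relative to its scheduled delay.
function measureEventLoopLag(sampleMs = 100): Promise<number> {
  const start = process.hrtime.bigint();
  return new Promise((resolve) => {
    setTimeout(() => {
      const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
      resolve(Math.max(0, elapsedMs - sampleMs));
    }, sampleMs);
  });
}

// Usage: sustained lag of more than a few milliseconds suggests a blocked loop,
// which is one way the cleanup timer discussed above can fall behind.
// measureEventLoopLag().then((lag) => console.log(`event loop lag: ${lag.toFixed(1)}ms`));
```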

## Reading Diagnostic Files

Files are saved to `.nogit/connection-diagnostics/` with names like:
```
accumulation_2025-06-07T20-20-43-733Z_force_capture.json
```

### Key Fields to Check

1. **Socket States**
   ```json
   "incomingState": {
     "destroyed": false,
     "readable": true,
     "writable": true,
     "readyState": "open"
   }
   ```
   - Both destroyed = zombie connection
   - One destroyed = half-zombie
   - Both alive but old = potential stuck connection

2. **Data Transfer**
   ```json
   "bytesReceived": 36,
   "bytesSent": 0,
   "timeSinceLastActivity": 60000
   ```
   - No bytes sent back = stuck connection
   - High bytes but old = slow backend
   - No activity = idle connection

3. **Connection Flags**
   ```json
   "hasReceivedInitialData": false,
   "hasKeepAlive": true,
   "connectionClosed": false
   ```
   - hasReceivedInitialData=false on non-TLS = immediate routing
   - hasKeepAlive=true = extended timeout applies
   - connectionClosed=false = still tracked

## Common Patterns

### 1. Hanging Backend Pattern
```json
{
  "bytesReceived": 36,
  "bytesSent": 0,
  "age": 120000,
  "targetHost": "backend.example.com",
  "incomingState": { "destroyed": false },
  "outgoingState": { "destroyed": false }
}
```
**Fix**: The stuck connection detection (60s timeout) should clean these up.

### 2. Zombie Connection Pattern
```json
{
  "incomingState": { "destroyed": true },
  "outgoingState": { "destroyed": true },
  "connectionClosed": false
}
```
**Fix**: The zombie detection should clean these up within 30s.

### 3. Event Listener Leak Pattern
```json
{
  "incomingListeners": {
    "data": 15,
    "error": 20,
    "close": 18
  }
}
```
**Issue**: Event listeners are accumulating; a potential memory leak.

### 4. No Outgoing Socket Pattern
```json
{
  "outgoingState": { "exists": false },
  "connectionClosed": false,
  "age": 5000
}
```
**Issue**: Connection setup failed but cleanup didn't trigger.

## Forcing Diagnostic Capture

To capture the current state immediately:
```typescript
monitor.forceCaptureNow();
```

This is useful when you notice accumulation starting.

## Automated Analysis

The monitor automatically analyzes patterns and logs:
- Zombie/half-zombie counts
- Stuck connection counts
- Old connection counts
- Memory usage
- Recommendations

## Integration Example

```typescript
// In your proxy startup script
import { SmartProxy } from '@push.rocks/smartproxy';
import ProductionConnectionMonitor from './production-connection-monitor.js';

async function startProxyWithMonitoring() {
  const proxy = new SmartProxy({
    // your config
  });

  await proxy.start();

  // Start monitoring
  const monitor = new ProductionConnectionMonitor(proxy);
  monitor.start(5000);

  // Optional: Capture on specific events
  process.on('SIGUSR1', () => {
    console.log('Manual diagnostic capture triggered');
    monitor.forceCaptureNow();
  });

  // Graceful shutdown
  process.on('SIGTERM', async () => {
    monitor.stop();
    await proxy.stop();
    process.exit(0);
  });
}
```

## Troubleshooting

### Monitor Not Detecting Accumulation
- Check the threshold settings (default: 50 connections)
- Reduce the check interval for faster detection
- Use forceCaptureNow() to capture the current state

### Too Many False Positives
- Increase the accumulation threshold
- Increase the spike threshold
- Adjust the check interval

### Missing Diagnostic Data
- Ensure the output directory exists and is writable
- Check disk space
- Verify the process has write permissions

## Next Steps

1. Deploy the monitor to production
2. Wait for accumulation to occur
3. Share diagnostic files for analysis
4. Apply targeted fixes based on the patterns found

The diagnostic data will reveal the exact state of connections when accumulation occurs, enabling precise fixes for your specific scenario.

**test/test.proxy-chain-cleanup.node.ts** (new file, 182 lines)

```typescript
import { expect, tap } from '@git.zone/tstest/tapbundle';
import * as plugins from '../ts/plugins.js';
import { SmartProxy } from '../ts/index.js';

let outerProxy: SmartProxy;
let innerProxy: SmartProxy;

tap.test('setup two smartproxies in a chain configuration', async () => {
  // Setup inner proxy (backend proxy)
  innerProxy = new SmartProxy({
    routes: [
      {
        match: {
          ports: 8002
        },
        action: {
          type: 'forward',
          target: {
            host: 'httpbin.org',
            port: 443
          }
        }
      }
    ],
    defaults: {
      target: {
        host: 'httpbin.org',
        port: 443
      }
    },
    acceptProxyProtocol: true,
    sendProxyProtocol: false,
    enableDetailedLogging: true,
    connectionCleanupInterval: 5000, // More frequent cleanup for testing
    inactivityTimeout: 10000 // Shorter timeout for testing
  });
  await innerProxy.start();

  // Setup outer proxy (frontend proxy)
  outerProxy = new SmartProxy({
    routes: [
      {
        match: {
          ports: 8001
        },
        action: {
          type: 'forward',
          target: {
            host: 'localhost',
            port: 8002
          },
          sendProxyProtocol: true
        }
      }
    ],
    defaults: {
      target: {
        host: 'localhost',
        port: 8002
      }
    },
    sendProxyProtocol: true,
    enableDetailedLogging: true,
    connectionCleanupInterval: 5000, // More frequent cleanup for testing
    inactivityTimeout: 10000 // Shorter timeout for testing
  });
  await outerProxy.start();
});

tap.test('should properly cleanup connections in proxy chain', async (tools) => {
  const testDuration = 30000; // 30 seconds
  const connectionInterval = 500; // Create new connection every 500ms
  const connectionDuration = 2000; // Each connection lasts 2 seconds

  let connectionsCreated = 0;
  let connectionsCompleted = 0;

  // Function to create a test connection
  const createTestConnection = async () => {
    connectionsCreated++;
    const connectionId = connectionsCreated;

    try {
      const socket = plugins.net.connect({
        port: 8001,
        host: 'localhost'
      });

      await new Promise<void>((resolve, reject) => {
        socket.on('connect', () => {
          console.log(`Connection ${connectionId} established`);

          // Send TLS Client Hello for httpbin.org
          const clientHello = Buffer.from([
            0x16, 0x03, 0x01, 0x00, 0xc8, // TLS handshake header
            0x01, 0x00, 0x00, 0xc4, // Client Hello
            0x03, 0x03, // TLS 1.2
            ...Array(32).fill(0), // Random bytes
            0x00, // Session ID length
            0x00, 0x02, 0x13, 0x01, // Cipher suites
            0x01, 0x00, // Compression methods
            0x00, 0x97, // Extensions length
            0x00, 0x00, 0x00, 0x0f, 0x00, 0x0d, // SNI extension
            0x00, 0x00, 0x0a, 0x68, 0x74, 0x74, 0x70, 0x62, 0x69, 0x6e, 0x2e, 0x6f, 0x72, 0x67 // "httpbin.org"
          ]);

          socket.write(clientHello);

          // Keep connection alive for specified duration
          setTimeout(() => {
            socket.destroy();
            connectionsCompleted++;
            console.log(`Connection ${connectionId} closed (completed: ${connectionsCompleted}/${connectionsCreated})`);
            resolve();
          }, connectionDuration);
        });

        socket.on('error', (err) => {
          console.log(`Connection ${connectionId} error: ${err.message}`);
          connectionsCompleted++;
          reject(err);
        });
      });
    } catch (err) {
      console.log(`Failed to create connection ${connectionId}: ${err.message}`);
      connectionsCompleted++;
    }
  };

  // Start creating connections
  const startTime = Date.now();
  const connectionTimer = setInterval(() => {
    if (Date.now() - startTime < testDuration) {
      createTestConnection().catch(() => {});
    } else {
      clearInterval(connectionTimer);
    }
  }, connectionInterval);

  // Monitor connection counts
  const monitorInterval = setInterval(() => {
    const outerConnections = (outerProxy as any).connectionManager.getConnectionCount();
    const innerConnections = (innerProxy as any).connectionManager.getConnectionCount();

    console.log(`Active connections - Outer: ${outerConnections}, Inner: ${innerConnections}, Created: ${connectionsCreated}, Completed: ${connectionsCompleted}`);
  }, 2000);

  // Wait for test duration + cleanup time
  await tools.delayFor(testDuration + 10000);

  clearInterval(connectionTimer);
  clearInterval(monitorInterval);

  // Wait for all connections to complete
  while (connectionsCompleted < connectionsCreated) {
    await tools.delayFor(100);
  }

  // Give some time for cleanup
  await tools.delayFor(5000);

  // Check final connection counts
  const finalOuterConnections = (outerProxy as any).connectionManager.getConnectionCount();
  const finalInnerConnections = (innerProxy as any).connectionManager.getConnectionCount();

  console.log(`\nFinal connection counts:`);
  console.log(`Outer proxy: ${finalOuterConnections}`);
  console.log(`Inner proxy: ${finalInnerConnections}`);
  console.log(`Total created: ${connectionsCreated}`);
  console.log(`Total completed: ${connectionsCompleted}`);

  // Both proxies should have cleaned up all connections
  expect(finalOuterConnections).toEqual(0);
  expect(finalInnerConnections).toEqual(0);
});

tap.test('cleanup proxies', async () => {
  await outerProxy.stop();
  await innerProxy.stop();
});

export default tap.start();
```

**test/test.stuck-connection-cleanup.node.ts** (new file, 144 lines)

```typescript
import { expect, tap } from '@git.zone/tstest/tapbundle';
import * as net from 'net';
import { SmartProxy } from '../ts/index.js';
import * as plugins from '../ts/plugins.js';

tap.test('stuck connection cleanup - verify connections to hanging backends are cleaned up', async (tools) => {
  console.log('\n=== Stuck Connection Cleanup Test ===');
  console.log('Purpose: Verify that connections to backends that accept but never respond are cleaned up');

  // Create a hanging backend that accepts connections but never responds
  let backendConnections = 0;
  const hangingBackend = net.createServer((socket) => {
    backendConnections++;
    console.log(`Hanging backend: Connection ${backendConnections} received`);
    // Accept the connection but never send any data back
    // This simulates a hung backend service
  });

  await new Promise<void>((resolve) => {
    hangingBackend.listen(9997, () => {
      console.log('✓ Hanging backend started on port 9997');
      resolve();
    });
  });

  // Create proxy that forwards to hanging backend
  const proxy = new SmartProxy({
    routes: [{
      name: 'to-hanging-backend',
      match: { ports: 8589 },
      action: {
        type: 'forward',
        target: { host: 'localhost', port: 9997 }
      }
    }],
    keepAlive: true,
    enableDetailedLogging: false,
    inactivityTimeout: 5000, // 5 second inactivity check interval for faster testing
  });

  await proxy.start();
  console.log('✓ Proxy started on port 8589');

  // Create connections that will get stuck
  console.log('\n--- Creating connections to hanging backend ---');
  const clients: net.Socket[] = [];

  for (let i = 0; i < 5; i++) {
    const client = net.connect(8589, 'localhost');
    clients.push(client);

    await new Promise<void>((resolve) => {
      client.on('connect', () => {
        console.log(`Client ${i} connected`);
        // Send data that will never get a response
        client.write(`GET / HTTP/1.1\r\nHost: localhost\r\n\r\n`);
        resolve();
      });

      client.on('error', (err) => {
        console.log(`Client ${i} error: ${err.message}`);
        resolve();
      });
    });
  }

  // Wait a moment for connections to establish
  await plugins.smartdelay.delayFor(1000);

  // Check initial connection count
  const initialCount = (proxy as any).connectionManager.getConnectionCount();
  console.log(`\nInitial connection count: ${initialCount}`);
  expect(initialCount).toEqual(5);

  // Get connection details
  const connections = (proxy as any).connectionManager.getConnections();
  let stuckCount = 0;

  for (const [id, record] of connections) {
    if (record.bytesReceived > 0 && record.bytesSent === 0) {
      stuckCount++;
      console.log(`Stuck connection ${id}: received=${record.bytesReceived}, sent=${record.bytesSent}`);
    }
  }

  console.log(`Stuck connections found: ${stuckCount}`);
  expect(stuckCount).toEqual(5);

  // Wait for inactivity check to run (it checks every 30s by default, but we set it to 5s)
  console.log('\n--- Waiting for stuck connection detection (65 seconds) ---');
  console.log('Note: Stuck connections are cleaned up after 60 seconds with no response');

  // Speed up time by manually triggering inactivity check after simulating time passage
  // First, age the connections by updating their timestamps
  const now = Date.now();
  for (const [id, record] of connections) {
    // Simulate that these connections are 61 seconds old
    record.incomingStartTime = now - 61000;
    record.lastActivity = now - 61000;
  }

  // Manually trigger inactivity check
  console.log('Manually triggering inactivity check...');
  (proxy as any).connectionManager.performOptimizedInactivityCheck();

  // Wait for cleanup to complete
  await plugins.smartdelay.delayFor(1000);

  // Check connection count after cleanup
  const afterCleanupCount = (proxy as any).connectionManager.getConnectionCount();
  console.log(`\nConnection count after cleanup: ${afterCleanupCount}`);

  // Verify termination stats
  const stats = (proxy as any).connectionManager.getTerminationStats();
  console.log('\nTermination stats:', stats);

  // All connections should be cleaned up as "stuck_no_response"
  expect(afterCleanupCount).toEqual(0);

  // The termination reason might be under incoming or general stats
  const stuckCleanups = (stats.incoming.stuck_no_response || 0) +
    (stats.outgoing?.stuck_no_response || 0);
  console.log(`Stuck cleanups detected: ${stuckCleanups}`);
  expect(stuckCleanups).toBeGreaterThan(0);

  // Verify clients were disconnected
  let closedClients = 0;
  for (const client of clients) {
    if (client.destroyed) {
      closedClients++;
    }
  }
  console.log(`Closed clients: ${closedClients}/5`);
  expect(closedClients).toEqual(5);

  // Cleanup
  console.log('\n--- Cleanup ---');
  await proxy.stop();
  hangingBackend.close();

  console.log('✓ Test complete: Stuck connections are properly detected and cleaned up');
});

tap.start();
```
306
test/test.zombie-connection-cleanup.node.ts
Normal file
306
test/test.zombie-connection-cleanup.node.ts
Normal file
@ -0,0 +1,306 @@
import { tap, expect } from '@git.zone/tstest/tapbundle';
import * as net from 'net';
import * as plugins from '../ts/plugins.js';

// Import SmartProxy
import { SmartProxy } from '../ts/index.js';

// Import types through type-only imports
import type { ConnectionManager } from '../ts/proxies/smart-proxy/connection-manager.js';
import type { IConnectionRecord } from '../ts/proxies/smart-proxy/models/interfaces.js';

tap.test('zombie connection cleanup - verify inactivity check detects and cleans destroyed sockets', async () => {
  console.log('\n=== Zombie Connection Cleanup Test ===');
  console.log('Purpose: Verify that connections with destroyed sockets are detected and cleaned up');
  console.log('Setup: Client → OuterProxy (8590) → InnerProxy (8591) → Backend (9998)');

  // Create backend server that can be controlled
  let acceptConnections = true;
  let destroyImmediately = false;
  const backendConnections: net.Socket[] = [];

  const backend = net.createServer((socket) => {
    console.log('Backend: Connection received');
    backendConnections.push(socket);

    if (destroyImmediately) {
      console.log('Backend: Destroying connection immediately');
      socket.destroy();
    } else {
      socket.on('data', (data) => {
        console.log('Backend: Received data, echoing back');
        socket.write(data);
      });
    }
  });

  await new Promise<void>((resolve) => {
    backend.listen(9998, () => {
      console.log('✓ Backend server started on port 9998');
      resolve();
    });
  });
  // Create InnerProxy with faster inactivity check for testing
  const innerProxy = new SmartProxy({
    ports: [8591],
    enableDetailedLogging: true,
    inactivityTimeout: 5000,       // 5 seconds for faster testing
    inactivityCheckInterval: 1000, // Check every second
    routes: [{
      name: 'to-backend',
      match: { ports: 8591 },
      action: {
        type: 'forward',
        target: {
          host: 'localhost',
          port: 9998
        }
      }
    }]
  });

  // Create OuterProxy with faster inactivity check
  const outerProxy = new SmartProxy({
    ports: [8590],
    enableDetailedLogging: true,
    inactivityTimeout: 5000,       // 5 seconds for faster testing
    inactivityCheckInterval: 1000, // Check every second
    routes: [{
      name: 'to-inner',
      match: { ports: 8590 },
      action: {
        type: 'forward',
        target: {
          host: 'localhost',
          port: 8591
        }
      }
    }]
  });

  await innerProxy.start();
  console.log('✓ InnerProxy started on port 8591');

  await outerProxy.start();
  console.log('✓ OuterProxy started on port 8590');
  // Helper to get connection details
  const getConnectionDetails = () => {
    const outerConnMgr = (outerProxy as any).connectionManager as ConnectionManager;
    const innerConnMgr = (innerProxy as any).connectionManager as ConnectionManager;

    const outerRecords = Array.from((outerConnMgr as any).connectionRecords.values()) as IConnectionRecord[];
    const innerRecords = Array.from((innerConnMgr as any).connectionRecords.values()) as IConnectionRecord[];

    return {
      outer: {
        count: outerConnMgr.getConnectionCount(),
        records: outerRecords,
        // Full zombie: both sockets destroyed but record not yet cleaned up
        zombies: outerRecords.filter(r =>
          !r.connectionClosed &&
          r.incoming?.destroyed &&
          (r.outgoing?.destroyed ?? true)
        ),
        // Half-zombie: exactly one of the two sockets destroyed
        halfZombies: outerRecords.filter(r =>
          !r.connectionClosed &&
          (r.incoming?.destroyed || r.outgoing?.destroyed) &&
          !(r.incoming?.destroyed && (r.outgoing?.destroyed ?? true))
        )
      },
      inner: {
        count: innerConnMgr.getConnectionCount(),
        records: innerRecords,
        zombies: innerRecords.filter(r =>
          !r.connectionClosed &&
          r.incoming?.destroyed &&
          (r.outgoing?.destroyed ?? true)
        ),
        halfZombies: innerRecords.filter(r =>
          !r.connectionClosed &&
          (r.incoming?.destroyed || r.outgoing?.destroyed) &&
          !(r.incoming?.destroyed && (r.outgoing?.destroyed ?? true))
        )
      }
    };
  };
  console.log('\n--- Test 1: Create zombie by destroying sockets without events ---');

  // Create a connection and forcefully destroy sockets to create zombies
  const client1 = new net.Socket();
  await new Promise<void>((resolve) => {
    client1.connect(8590, 'localhost', () => {
      console.log('Client1 connected to OuterProxy');
      client1.write('GET / HTTP/1.1\r\nHost: test.com\r\n\r\n');

      // Wait for connection to be established through the chain
      setTimeout(() => {
        console.log('Forcefully destroying backend connections to create zombies');

        // Get connection details before destruction
        const beforeDetails = getConnectionDetails();
        console.log(`Before destruction: Outer=${beforeDetails.outer.count}, Inner=${beforeDetails.inner.count}`);

        // Destroy all backend connections without proper close events
        backendConnections.forEach(conn => {
          if (!conn.destroyed) {
            // Remove all listeners to prevent proper cleanup
            conn.removeAllListeners();
            conn.destroy();
          }
        });

        // Also destroy the client socket abruptly
        client1.removeAllListeners();
        client1.destroy();

        resolve();
      }, 500);
    });
  });

  // Check immediately after destruction
  await new Promise(resolve => setTimeout(resolve, 100));
  let details = getConnectionDetails();
  console.log(`\nAfter destruction:`);
  console.log(`  Outer: ${details.outer.count} connections, ${details.outer.zombies.length} zombies, ${details.outer.halfZombies.length} half-zombies`);
  console.log(`  Inner: ${details.inner.count} connections, ${details.inner.zombies.length} zombies, ${details.inner.halfZombies.length} half-zombies`);

  // Wait for inactivity check to run (should detect zombies)
  console.log('\nWaiting for inactivity check to detect zombies...');
  await new Promise(resolve => setTimeout(resolve, 2000));

  details = getConnectionDetails();
  console.log(`\nAfter first inactivity check:`);
  console.log(`  Outer: ${details.outer.count} connections, ${details.outer.zombies.length} zombies, ${details.outer.halfZombies.length} half-zombies`);
  console.log(`  Inner: ${details.inner.count} connections, ${details.inner.zombies.length} zombies, ${details.inner.halfZombies.length} half-zombies`);
  console.log('\n--- Test 2: Create half-zombie by destroying only one socket ---');

  // Clear backend connections array
  backendConnections.length = 0;

  const client2 = new net.Socket();
  await new Promise<void>((resolve) => {
    client2.connect(8590, 'localhost', () => {
      console.log('Client2 connected to OuterProxy');
      client2.write('GET / HTTP/1.1\r\nHost: test.com\r\n\r\n');

      setTimeout(() => {
        console.log('Creating half-zombie by destroying only outgoing socket on outer proxy');

        // Access the connection records directly
        const outerConnMgr = (outerProxy as any).connectionManager as ConnectionManager;
        const outerRecords = Array.from((outerConnMgr as any).connectionRecords.values()) as IConnectionRecord[];

        // Find the active connection and destroy only its outgoing socket
        const activeRecord = outerRecords.find(r => !r.connectionClosed && r.outgoing && !r.outgoing.destroyed);
        if (activeRecord && activeRecord.outgoing) {
          console.log('Found active connection, destroying outgoing socket');
          activeRecord.outgoing.removeAllListeners();
          activeRecord.outgoing.destroy();
        }

        resolve();
      }, 500);
    });
  });

  // Check half-zombie state
  await new Promise(resolve => setTimeout(resolve, 100));
  details = getConnectionDetails();
  console.log(`\nAfter creating half-zombie:`);
  console.log(`  Outer: ${details.outer.count} connections, ${details.outer.zombies.length} zombies, ${details.outer.halfZombies.length} half-zombies`);
  console.log(`  Inner: ${details.inner.count} connections, ${details.inner.zombies.length} zombies, ${details.inner.halfZombies.length} half-zombies`);

  // Wait for 30-second grace period (simulated by multiple checks)
  console.log('\nWaiting for half-zombie grace period (30 seconds simulated)...');

  // Manually age the connection to trigger half-zombie cleanup
  const outerConnMgr = (outerProxy as any).connectionManager as ConnectionManager;
  const records = Array.from((outerConnMgr as any).connectionRecords.values()) as IConnectionRecord[];
  records.forEach(record => {
    if (!record.connectionClosed) {
      // Age the connection by 35 seconds
      record.incomingStartTime -= 35000;
    }
  });

  // Trigger inactivity check
  await new Promise(resolve => setTimeout(resolve, 2000));

  details = getConnectionDetails();
  console.log(`\nAfter half-zombie cleanup:`);
  console.log(`  Outer: ${details.outer.count} connections, ${details.outer.zombies.length} zombies, ${details.outer.halfZombies.length} half-zombies`);
  console.log(`  Inner: ${details.inner.count} connections, ${details.inner.zombies.length} zombies, ${details.inner.halfZombies.length} half-zombies`);

  // Clean up client2 properly
  if (!client2.destroyed) {
    client2.destroy();
  }
  console.log('\n--- Test 3: Rapid zombie creation under load ---');

  // Create multiple connections rapidly and destroy them
  const rapidClients: net.Socket[] = [];

  for (let i = 0; i < 5; i++) {
    const client = new net.Socket();
    rapidClients.push(client);

    client.connect(8590, 'localhost', () => {
      console.log(`Rapid client ${i} connected`);
      client.write('GET / HTTP/1.1\r\nHost: test.com\r\n\r\n');

      // Destroy after random delay
      setTimeout(() => {
        client.removeAllListeners();
        client.destroy();
      }, Math.random() * 500);
    });

    // Small delay between connections
    await new Promise(resolve => setTimeout(resolve, 50));
  }

  // Wait a bit
  await new Promise(resolve => setTimeout(resolve, 1000));

  details = getConnectionDetails();
  console.log(`\nAfter rapid connections:`);
  console.log(`  Outer: ${details.outer.count} connections, ${details.outer.zombies.length} zombies, ${details.outer.halfZombies.length} half-zombies`);
  console.log(`  Inner: ${details.inner.count} connections, ${details.inner.zombies.length} zombies, ${details.inner.halfZombies.length} half-zombies`);

  // Wait for cleanup
  console.log('\nWaiting for final cleanup...');
  await new Promise(resolve => setTimeout(resolve, 3000));

  details = getConnectionDetails();
  console.log(`\nFinal state:`);
  console.log(`  Outer: ${details.outer.count} connections, ${details.outer.zombies.length} zombies, ${details.outer.halfZombies.length} half-zombies`);
  console.log(`  Inner: ${details.inner.count} connections, ${details.inner.zombies.length} zombies, ${details.inner.halfZombies.length} half-zombies`);
  // Cleanup
  await outerProxy.stop();
  await innerProxy.stop();
  backend.close();

  // Verify all connections are cleaned up
  console.log('\n--- Verification ---');

  if (details.outer.count === 0 && details.inner.count === 0) {
    console.log('✅ PASS: All zombie connections were cleaned up');
  } else {
    console.log('❌ FAIL: Some connections remain');
  }

  expect(details.outer.count).toEqual(0);
  expect(details.inner.count).toEqual(0);
  expect(details.outer.zombies.length).toEqual(0);
  expect(details.inner.zombies.length).toEqual(0);
  expect(details.outer.halfZombies.length).toEqual(0);
  expect(details.inner.halfZombies.length).toEqual(0);
});

tap.start();

@ -258,22 +258,61 @@ export function createSocketWithErrorHandler(options: SafeSocketOptions): plugin
  // Create socket with immediate error handler attachment
  const socket = new plugins.net.Socket();

  // Track if connected
  let connected = false;
  let connectionTimeout: NodeJS.Timeout | null = null;

  // Attach error handler BEFORE connecting to catch immediate errors
  socket.on('error', (error) => {
    console.error(`Socket connection error to ${host}:${port}: ${error.message}`);
    // Clear the connection timeout if it exists
    if (connectionTimeout) {
      clearTimeout(connectionTimeout);
      connectionTimeout = null;
    }
    if (onError) {
      onError(error);
    }
  });

  // Attach connect handler
  const handleConnect = () => {
    connected = true;
    // Clear the connection timeout
    if (connectionTimeout) {
      clearTimeout(connectionTimeout);
      connectionTimeout = null;
    }
    // Set inactivity timeout if provided (after connection is established)
    if (timeout) {
      socket.setTimeout(timeout);
    }
    if (onConnect) {
      onConnect();
    }
  };

  socket.on('connect', handleConnect);

  // Implement connection establishment timeout
  if (timeout) {
    connectionTimeout = setTimeout(() => {
      if (!connected && !socket.destroyed) {
        // Connection timed out - destroy the socket
        const error = new Error(`Connection timeout after ${timeout}ms to ${host}:${port}`);
        (error as any).code = 'ETIMEDOUT';

        console.error(`Socket connection timeout to ${host}:${port} after ${timeout}ms`);

        // Destroy the socket
        socket.destroy();

        // Call error handler
        if (onError) {
          onError(error);
        }
      }
    }, timeout);
  }

  // Now attempt to connect - any immediate errors will be caught
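
With this change, `timeout` does double duty: before `connect` fires it bounds connection establishment (failing with `ETIMEDOUT`), and afterwards it becomes the regular inactivity timeout via `socket.setTimeout()`. A minimal caller-side sketch of the new semantics; the import path and the assumption that the helper returns the socket are not confirmed by this diff:

```typescript
// Hypothetical import path - adjust to wherever socket-utils actually lives.
import { createSocketWithErrorHandler } from '../ts/core/utils/socket-utils.js';

const socket = createSocketWithErrorHandler({
  host: 'localhost',
  port: 9998,
  timeout: 5000, // pre-connect: establishment deadline; post-connect: inactivity timeout
  onConnect: () => {
    console.log('Connected; inactivity timeout is now armed');
  },
  onError: (err) => {
    // Fires both for immediate errors (e.g. ECONNREFUSED) and for the
    // synthetic ETIMEDOUT produced by the establishment timer above.
    console.error(`Connect failed: ${(err as any).code ?? err.message}`);
  },
});
```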

@ -456,6 +456,74 @@ export class ConnectionManager extends LifecycleComponent {
      }
    }

    // Also check ALL connections for zombie state (destroyed sockets but not cleaned up)
    // This is critical for proxy chains where sockets can be destroyed without events
    for (const [connectionId, record] of this.connectionRecords) {
      if (!record.connectionClosed) {
        const incomingDestroyed = record.incoming?.destroyed || false;
        const outgoingDestroyed = record.outgoing?.destroyed || false;

        // Check for zombie connections: both sockets destroyed but connection not cleaned up
        if (incomingDestroyed && outgoingDestroyed) {
          logger.log('warn', `Zombie connection detected: ${connectionId} - both sockets destroyed but not cleaned up`, {
            connectionId,
            remoteIP: record.remoteIP,
            age: plugins.prettyMs(now - record.incomingStartTime),
            component: 'connection-manager'
          });

          // Clean up immediately
          this.cleanupConnection(record, 'zombie_cleanup');
          continue;
        }

        // Check for half-zombie: one socket destroyed
        if (incomingDestroyed || outgoingDestroyed) {
          const age = now - record.incomingStartTime;
          // Give it 30 seconds grace period for normal cleanup
          if (age > 30000) {
            logger.log('warn', `Half-zombie connection detected: ${connectionId} - ${incomingDestroyed ? 'incoming' : 'outgoing'} destroyed`, {
              connectionId,
              remoteIP: record.remoteIP,
              age: plugins.prettyMs(age),
              incomingDestroyed,
              outgoingDestroyed,
              component: 'connection-manager'
            });

            // Clean up
            this.cleanupConnection(record, 'half_zombie_cleanup');
          }
        }

        // Check for stuck connections: no data sent back to client
        if (!record.connectionClosed && record.outgoing && record.bytesReceived > 0 && record.bytesSent === 0) {
          const age = now - record.incomingStartTime;
          // If connection is older than 60 seconds and no data sent back, likely stuck
          if (age > 60000) {
            logger.log('warn', `Stuck connection detected: ${connectionId} - received ${record.bytesReceived} bytes but sent 0 bytes`, {
              connectionId,
              remoteIP: record.remoteIP,
              age: plugins.prettyMs(age),
              bytesReceived: record.bytesReceived,
              targetHost: record.targetHost,
              targetPort: record.targetPort,
              component: 'connection-manager'
            });

            // Set termination reason and increment stats
            if (record.incomingTerminationReason == null) {
              record.incomingTerminationReason = 'stuck_no_response';
              this.incrementTerminationStat('incoming', 'stuck_no_response');
            }

            // Clean up
            this.cleanupConnection(record, 'stuck_no_response');
          }
        }
      }
    }

    // Process only connections that need checking
    for (const connectionId of connectionsToCheck) {
      const record = this.connectionRecords.get(connectionId);
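
The three checks above reduce to a small decision table over socket state, connection age, and byte counters. A condensed restatement with illustrative names, for reference only (not part of the actual ConnectionManager API):

```typescript
// Condensed restatement of the zombie/half-zombie/stuck checks above.
type ZombieVerdict = 'zombie' | 'half_zombie' | 'stuck_no_response' | 'healthy';

function classify(
  incomingDestroyed: boolean,
  outgoingDestroyed: boolean,
  ageMs: number,
  bytesReceived: number,
  bytesSent: number
): ZombieVerdict {
  // Both sockets gone: clean up immediately.
  if (incomingDestroyed && outgoingDestroyed) return 'zombie';
  // One socket gone: allow a 30s grace period for normal event-driven cleanup.
  if ((incomingDestroyed || outgoingDestroyed) && ageMs > 30_000) return 'half_zombie';
  // Data flowed in but nothing ever went back to the client for over 60s.
  if (bytesReceived > 0 && bytesSent === 0 && ageMs > 60_000) return 'stuck_no_response';
  return 'healthy';
}
```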

@ -69,6 +69,7 @@ export interface ISmartProxyOptions {
  maxVersion?: string;

  // Timeout settings
  connectionTimeout?: number; // Timeout for establishing connection to backend (ms), default: 30000 (30s)
  initialDataTimeout?: number; // Timeout for initial data/SNI (ms), default: 60000 (60s)
  socketTimeout?: number; // Socket inactivity timeout (ms), default: 3600000 (1h)
  inactivityCheckInterval?: number; // How often to check for inactive connections (ms), default: 60000 (60s)
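
With the new `connectionTimeout` option, backend connection establishment can be bounded separately from socket inactivity. A configuration sketch using only options that appear in this changeset; the package-level import is assumed from the published name:

```typescript
import { SmartProxy } from '@push.rocks/smartproxy';

const proxy = new SmartProxy({
  ports: [8080],
  connectionTimeout: 10000,       // fail backend connects after 10s (new in this change)
  socketTimeout: 3600000,         // 1h inactivity timeout on established sockets
  inactivityCheckInterval: 60000, // scan for zombie/stuck connections every 60s
  routes: [{
    name: 'to-backend',
    match: { ports: 8080 },
    action: { type: 'forward', target: { host: 'localhost', port: 3000 } }
  }]
});

await proxy.start();
```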

@ -1125,6 +1125,7 @@ export class RouteConnectionHandler {
    const targetSocket = createSocketWithErrorHandler({
      port: finalTargetPort,
      host: finalTargetHost,
      timeout: this.settings.connectionTimeout || 30000, // Connection timeout (default: 30s)
      onError: (error) => {
        // Connection failed - clean up everything immediately
        // Check if connection record is still valid (client might have disconnected)