Implement zombie connection detection and cleanup in ConnectionManager; enhance tests for edge cases
This commit is contained in:
parent
9094b76b1b
commit
47735adbf2
@ -372,4 +372,180 @@ The connection cleanup mechanisms have been significantly improved in v19.5.20:
|
|||||||
2. Immediate routing cleanup handler always destroys outgoing connections
|
2. Immediate routing cleanup handler always destroys outgoing connections
|
||||||
3. Tests confirm no accumulation in standard scenarios with reachable backends
|
3. Tests confirm no accumulation in standard scenarios with reachable backends
|
||||||
|
|
||||||
However, the missing connection establishment timeout causes accumulation when backends are unreachable or very slow to connect.
|
However, the missing connection establishment timeout causes accumulation when backends are unreachable or very slow to connect.
|
||||||
|
|
||||||
|
### Outer Proxy Sudden Accumulation After Hours
|
||||||
|
|
||||||
|
**User Report**: "The counter goes up suddenly after some hours on the outer proxy"
|
||||||
|
|
||||||
|
**Investigation Findings**:
|
||||||
|
|
||||||
|
1. **Cleanup Queue Mechanism**:
|
||||||
|
- Connections are cleaned up in batches of 100 via a queue
|
||||||
|
- If the cleanup timer gets stuck or cleared without restart, connections accumulate
|
||||||
|
- The timer is set with `setTimeout` and could be affected by event loop blocking
|
||||||
|
|
||||||
|
2. **Potential Causes for Sudden Spikes**:
|
||||||
|
|
||||||
|
a) **Cleanup Timer Failure**:
|
||||||
|
```typescript
|
||||||
|
// In ConnectionManager, if this timer gets cleared but not restarted:
|
||||||
|
this.cleanupTimer = this.setTimeout(() => {
|
||||||
|
this.processCleanupQueue();
|
||||||
|
}, 100);
|
||||||
|
```
|
||||||
|
|
||||||
|
b) **Memory Pressure**:
|
||||||
|
- After hours of operation, memory fragmentation or pressure could cause delays
|
||||||
|
- Garbage collection pauses might interfere with timer execution
|
||||||
|
|
||||||
|
c) **Event Listener Accumulation**:
|
||||||
|
- Socket event listeners might accumulate over time
|
||||||
|
- Server 'connection' event handlers are particularly important
|
||||||
|
|
||||||
|
d) **Keep-Alive Connection Cascades**:
|
||||||
|
- When many keep-alive connections timeout simultaneously
|
||||||
|
- Outer proxy has different timeout than inner proxy
|
||||||
|
- Mass disconnection events can overwhelm cleanup queue
|
||||||
|
|
||||||
|
e) **HttpProxy Component Issues**:
|
||||||
|
- If using `useHttpProxy`, the HttpProxy bridge might maintain connection pools
|
||||||
|
- These pools might not be properly cleaned after hours
|
||||||
|
|
||||||
|
3. **Why "Sudden" After Hours**:
|
||||||
|
- Not a gradual leak but triggered by specific conditions
|
||||||
|
- Likely related to periodic events or thresholds:
|
||||||
|
- Inactivity check runs every 30 seconds
|
||||||
|
- Keep-alive connections have extended timeouts (6x normal)
|
||||||
|
- Parity check has 30-minute timeout for half-closed connections
|
||||||
|
|
||||||
|
4. **Reproduction Scenarios**:
|
||||||
|
- Mass client disconnection/reconnection (network blip)
|
||||||
|
- Keep-alive timeout cascade when inner proxy times out first
|
||||||
|
- Cleanup timer getting stuck during high load
|
||||||
|
- Memory pressure causing event loop delays
|
||||||
|
|
||||||
|
### Additional Monitoring Recommendations
|
||||||
|
|
||||||
|
1. **Add Cleanup Queue Monitoring**:
|
||||||
|
```typescript
|
||||||
|
setInterval(() => {
|
||||||
|
const cm = proxy.connectionManager;
|
||||||
|
if (cm.cleanupQueue.size > 100 && !cm.cleanupTimer) {
|
||||||
|
logger.error('Cleanup queue stuck!', {
|
||||||
|
queueSize: cm.cleanupQueue.size,
|
||||||
|
hasTimer: !!cm.cleanupTimer
|
||||||
|
});
|
||||||
|
}
|
||||||
|
}, 60000);
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Track Timer Health**:
|
||||||
|
- Monitor if cleanup timer is running
|
||||||
|
- Check for event loop blocking
|
||||||
|
- Log when batch processing takes too long
|
||||||
|
|
||||||
|
3. **Memory Monitoring**:
|
||||||
|
- Track heap usage over time
|
||||||
|
- Monitor for memory leaks in long-running processes
|
||||||
|
- Force periodic garbage collection if needed
|
||||||
|
|
||||||
|
### Immediate Mitigations
|
||||||
|
|
||||||
|
1. **Restart Cleanup Timer**:
|
||||||
|
```typescript
|
||||||
|
// Emergency cleanup timer restart
|
||||||
|
if (!cm.cleanupTimer && cm.cleanupQueue.size > 0) {
|
||||||
|
cm.cleanupTimer = setTimeout(() => {
|
||||||
|
cm.processCleanupQueue();
|
||||||
|
}, 100);
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Force Periodic Cleanup**:
|
||||||
|
```typescript
|
||||||
|
setInterval(() => {
|
||||||
|
const cm = connectionManager;
|
||||||
|
if (cm.getConnectionCount() > threshold) {
|
||||||
|
cm.performOptimizedInactivityCheck();
|
||||||
|
// Force process cleanup queue
|
||||||
|
cm.processCleanupQueue();
|
||||||
|
}
|
||||||
|
}, 300000); // Every 5 minutes
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Connection Age Limits**:
|
||||||
|
- Set maximum connection lifetime
|
||||||
|
- Force close connections older than threshold
|
||||||
|
- More aggressive cleanup for proxy chains
|
||||||
|
|
||||||
|
## ✅ FIXED: Zombie Connection Detection (January 2025)
|
||||||
|
|
||||||
|
### Root Cause Identified
|
||||||
|
"Zombie connections" occur when sockets are destroyed without triggering their close/error event handlers. This causes connections to remain tracked with both sockets destroyed but `connectionClosed=false`. This is particularly problematic in proxy chains where the inner proxy might close connections in ways that don't trigger proper events on the outer proxy.
|
||||||
|
|
||||||
|
### Fix Implemented
|
||||||
|
Added zombie detection to the periodic inactivity check in ConnectionManager:
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
// In performOptimizedInactivityCheck()
|
||||||
|
// Check ALL connections for zombie state
|
||||||
|
for (const [connectionId, record] of this.connectionRecords) {
|
||||||
|
if (!record.connectionClosed) {
|
||||||
|
const incomingDestroyed = record.incoming?.destroyed || false;
|
||||||
|
const outgoingDestroyed = record.outgoing?.destroyed || false;
|
||||||
|
|
||||||
|
// Check for zombie connections: both sockets destroyed but not cleaned up
|
||||||
|
if (incomingDestroyed && outgoingDestroyed) {
|
||||||
|
logger.log('warn', `Zombie connection detected: ${connectionId} - both sockets destroyed but not cleaned up`, {
|
||||||
|
connectionId,
|
||||||
|
remoteIP: record.remoteIP,
|
||||||
|
age: plugins.prettyMs(now - record.incomingStartTime),
|
||||||
|
component: 'connection-manager'
|
||||||
|
});
|
||||||
|
|
||||||
|
// Clean up immediately
|
||||||
|
this.cleanupConnection(record, 'zombie_cleanup');
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check for half-zombie: one socket destroyed
|
||||||
|
if (incomingDestroyed || outgoingDestroyed) {
|
||||||
|
const age = now - record.incomingStartTime;
|
||||||
|
// Give it 30 seconds grace period for normal cleanup
|
||||||
|
if (age > 30000) {
|
||||||
|
logger.log('warn', `Half-zombie connection detected: ${connectionId} - ${incomingDestroyed ? 'incoming' : 'outgoing'} destroyed`, {
|
||||||
|
connectionId,
|
||||||
|
remoteIP: record.remoteIP,
|
||||||
|
age: plugins.prettyMs(age),
|
||||||
|
incomingDestroyed,
|
||||||
|
outgoingDestroyed,
|
||||||
|
component: 'connection-manager'
|
||||||
|
});
|
||||||
|
|
||||||
|
// Clean up
|
||||||
|
this.cleanupConnection(record, 'half_zombie_cleanup');
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### How It Works
|
||||||
|
1. **Full Zombie Detection**: Detects when both incoming and outgoing sockets are destroyed but the connection hasn't been cleaned up
|
||||||
|
2. **Half-Zombie Detection**: Detects when only one socket is destroyed, with a 30-second grace period for normal cleanup to occur
|
||||||
|
3. **Automatic Cleanup**: Immediately cleans up zombie connections when detected
|
||||||
|
4. **Runs Periodically**: Integrated into the existing inactivity check that runs every 30 seconds
|
||||||
|
|
||||||
|
### Why This Fixes the Outer Proxy Accumulation
|
||||||
|
- When inner proxy closes connections abruptly (e.g., due to backend failure), the outer proxy's outgoing socket might be destroyed without firing close/error events
|
||||||
|
- These become zombie connections that previously accumulated indefinitely
|
||||||
|
- Now they are detected and cleaned up within 30 seconds
|
||||||
|
|
||||||
|
### Test Results
|
||||||
|
Debug scripts confirmed:
|
||||||
|
- Zombie connections can be created when sockets are destroyed directly without events
|
||||||
|
- The zombie detection successfully identifies and cleans up these connections
|
||||||
|
- Both full zombies (both sockets destroyed) and half-zombies (one socket destroyed) are handled
|
||||||
|
|
||||||
|
This fix addresses the specific issue where "connections that are closed on the inner proxy, always also close on the outer proxy" as requested by the user.
|
@ -856,4 +856,42 @@ The WrappedSocket class has been implemented as the foundation for PROXY protoco
|
|||||||
For detailed information about proxy protocol implementation and proxy chaining:
|
For detailed information about proxy protocol implementation and proxy chaining:
|
||||||
- **[Proxy Protocol Guide](./readme.proxy-protocol.md)** - Complete implementation details and configuration
|
- **[Proxy Protocol Guide](./readme.proxy-protocol.md)** - Complete implementation details and configuration
|
||||||
- **[Proxy Protocol Examples](./readme.proxy-protocol-example.md)** - Code examples and conceptual implementation
|
- **[Proxy Protocol Examples](./readme.proxy-protocol-example.md)** - Code examples and conceptual implementation
|
||||||
- **[Proxy Chain Summary](./readme.proxy-chain-summary.md)** - Quick reference for proxy chaining setup
|
- **[Proxy Chain Summary](./readme.proxy-chain-summary.md)** - Quick reference for proxy chaining setup
|
||||||
|
|
||||||
|
## Connection Cleanup Edge Cases Investigation (v19.5.20+)
|
||||||
|
|
||||||
|
### Issue Discovered
|
||||||
|
"Zombie connections" can occur when both sockets are destroyed but the connection record hasn't been cleaned up. This happens when sockets are destroyed without triggering their close/error event handlers.
|
||||||
|
|
||||||
|
### Root Cause
|
||||||
|
1. **Event Handler Bypass**: In edge cases (network failures, proxy chain failures, forced socket destruction), sockets can be destroyed without their event handlers being called
|
||||||
|
2. **Cleanup Queue Delay**: The `initiateCleanupOnce` method adds connections to a cleanup queue (batch of 100 every 100ms), which may not process fast enough
|
||||||
|
3. **Inactivity Check Limitation**: The periodic inactivity check only examines `lastActivity` timestamps, not actual socket states
|
||||||
|
|
||||||
|
### Test Results
|
||||||
|
Debug script (`connection-manager-direct-test.ts`) revealed:
|
||||||
|
- **Normal cleanup works**: When socket events fire normally, cleanup is reliable
|
||||||
|
- **Zombies ARE created**: Direct socket destruction creates zombies (destroyed sockets, connectionClosed=false)
|
||||||
|
- **Manual cleanup works**: Calling `initiateCleanupOnce` on a zombie does clean it up
|
||||||
|
- **Inactivity check misses zombies**: The check doesn't detect connections with destroyed sockets
|
||||||
|
|
||||||
|
### Potential Solutions
|
||||||
|
1. **Periodic Zombie Detection**: Add zombie detection to the inactivity check:
|
||||||
|
```typescript
|
||||||
|
// In performOptimizedInactivityCheck
|
||||||
|
if (record.incoming?.destroyed && record.outgoing?.destroyed && !record.connectionClosed) {
|
||||||
|
this.cleanupConnection(record, 'zombie_detected');
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Socket State Monitoring**: Check socket states during connection operations
|
||||||
|
3. **Defensive Socket Handling**: Always attach cleanup handlers before any operation that might destroy sockets
|
||||||
|
4. **Immediate Cleanup Option**: For critical paths, use `cleanupConnection` instead of `initiateCleanupOnce`
|
||||||
|
|
||||||
|
### Impact
|
||||||
|
- Memory leaks in edge cases (network failures, proxy chain issues)
|
||||||
|
- Connection count inaccuracy
|
||||||
|
- Potential resource exhaustion over time
|
||||||
|
|
||||||
|
### Test Files
|
||||||
|
- `.nogit/debug/connection-manager-direct-test.ts` - Direct ConnectionManager testing showing zombie creation
|
306
test/test.zombie-connection-cleanup.node.ts
Normal file
306
test/test.zombie-connection-cleanup.node.ts
Normal file
@ -0,0 +1,306 @@
|
|||||||
|
import { tap, expect } from '@git.zone/tstest/tapbundle';
|
||||||
|
import * as net from 'net';
|
||||||
|
import * as plugins from '../ts/plugins.js';
|
||||||
|
|
||||||
|
// Import SmartProxy
|
||||||
|
import { SmartProxy } from '../ts/index.js';
|
||||||
|
|
||||||
|
// Import types through type-only imports
|
||||||
|
import type { ConnectionManager } from '../ts/proxies/smart-proxy/connection-manager.js';
|
||||||
|
import type { IConnectionRecord } from '../ts/proxies/smart-proxy/models/interfaces.js';
|
||||||
|
|
||||||
|
tap.test('zombie connection cleanup - verify inactivity check detects and cleans destroyed sockets', async () => {
|
||||||
|
console.log('\n=== Zombie Connection Cleanup Test ===');
|
||||||
|
console.log('Purpose: Verify that connections with destroyed sockets are detected and cleaned up');
|
||||||
|
console.log('Setup: Client → OuterProxy (8590) → InnerProxy (8591) → Backend (9998)');
|
||||||
|
|
||||||
|
// Create backend server that can be controlled
|
||||||
|
let acceptConnections = true;
|
||||||
|
let destroyImmediately = false;
|
||||||
|
const backendConnections: net.Socket[] = [];
|
||||||
|
|
||||||
|
const backend = net.createServer((socket) => {
|
||||||
|
console.log('Backend: Connection received');
|
||||||
|
backendConnections.push(socket);
|
||||||
|
|
||||||
|
if (destroyImmediately) {
|
||||||
|
console.log('Backend: Destroying connection immediately');
|
||||||
|
socket.destroy();
|
||||||
|
} else {
|
||||||
|
socket.on('data', (data) => {
|
||||||
|
console.log('Backend: Received data, echoing back');
|
||||||
|
socket.write(data);
|
||||||
|
});
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
await new Promise<void>((resolve) => {
|
||||||
|
backend.listen(9998, () => {
|
||||||
|
console.log('✓ Backend server started on port 9998');
|
||||||
|
resolve();
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
// Create InnerProxy with faster inactivity check for testing
|
||||||
|
const innerProxy = new SmartProxy({
|
||||||
|
ports: [8591],
|
||||||
|
enableDetailedLogging: true,
|
||||||
|
inactivityTimeout: 5000, // 5 seconds for faster testing
|
||||||
|
inactivityCheckInterval: 1000, // Check every second
|
||||||
|
routes: [{
|
||||||
|
name: 'to-backend',
|
||||||
|
match: { ports: 8591 },
|
||||||
|
action: {
|
||||||
|
type: 'forward',
|
||||||
|
target: {
|
||||||
|
host: 'localhost',
|
||||||
|
port: 9998
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}]
|
||||||
|
});
|
||||||
|
|
||||||
|
// Create OuterProxy with faster inactivity check
|
||||||
|
const outerProxy = new SmartProxy({
|
||||||
|
ports: [8590],
|
||||||
|
enableDetailedLogging: true,
|
||||||
|
inactivityTimeout: 5000, // 5 seconds for faster testing
|
||||||
|
inactivityCheckInterval: 1000, // Check every second
|
||||||
|
routes: [{
|
||||||
|
name: 'to-inner',
|
||||||
|
match: { ports: 8590 },
|
||||||
|
action: {
|
||||||
|
type: 'forward',
|
||||||
|
target: {
|
||||||
|
host: 'localhost',
|
||||||
|
port: 8591
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}]
|
||||||
|
});
|
||||||
|
|
||||||
|
await innerProxy.start();
|
||||||
|
console.log('✓ InnerProxy started on port 8591');
|
||||||
|
|
||||||
|
await outerProxy.start();
|
||||||
|
console.log('✓ OuterProxy started on port 8590');
|
||||||
|
|
||||||
|
// Helper to get connection details
|
||||||
|
const getConnectionDetails = () => {
|
||||||
|
const outerConnMgr = (outerProxy as any).connectionManager as ConnectionManager;
|
||||||
|
const innerConnMgr = (innerProxy as any).connectionManager as ConnectionManager;
|
||||||
|
|
||||||
|
const outerRecords = Array.from((outerConnMgr as any).connectionRecords.values()) as IConnectionRecord[];
|
||||||
|
const innerRecords = Array.from((innerConnMgr as any).connectionRecords.values()) as IConnectionRecord[];
|
||||||
|
|
||||||
|
return {
|
||||||
|
outer: {
|
||||||
|
count: outerConnMgr.getConnectionCount(),
|
||||||
|
records: outerRecords,
|
||||||
|
zombies: outerRecords.filter(r =>
|
||||||
|
!r.connectionClosed &&
|
||||||
|
r.incoming?.destroyed &&
|
||||||
|
(r.outgoing?.destroyed ?? true)
|
||||||
|
),
|
||||||
|
halfZombies: outerRecords.filter(r =>
|
||||||
|
!r.connectionClosed &&
|
||||||
|
(r.incoming?.destroyed || r.outgoing?.destroyed) &&
|
||||||
|
!(r.incoming?.destroyed && (r.outgoing?.destroyed ?? true))
|
||||||
|
)
|
||||||
|
},
|
||||||
|
inner: {
|
||||||
|
count: innerConnMgr.getConnectionCount(),
|
||||||
|
records: innerRecords,
|
||||||
|
zombies: innerRecords.filter(r =>
|
||||||
|
!r.connectionClosed &&
|
||||||
|
r.incoming?.destroyed &&
|
||||||
|
(r.outgoing?.destroyed ?? true)
|
||||||
|
),
|
||||||
|
halfZombies: innerRecords.filter(r =>
|
||||||
|
!r.connectionClosed &&
|
||||||
|
(r.incoming?.destroyed || r.outgoing?.destroyed) &&
|
||||||
|
!(r.incoming?.destroyed && (r.outgoing?.destroyed ?? true))
|
||||||
|
)
|
||||||
|
}
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
console.log('\n--- Test 1: Create zombie by destroying sockets without events ---');
|
||||||
|
|
||||||
|
// Create a connection and forcefully destroy sockets to create zombies
|
||||||
|
const client1 = new net.Socket();
|
||||||
|
await new Promise<void>((resolve) => {
|
||||||
|
client1.connect(8590, 'localhost', () => {
|
||||||
|
console.log('Client1 connected to OuterProxy');
|
||||||
|
client1.write('GET / HTTP/1.1\r\nHost: test.com\r\n\r\n');
|
||||||
|
|
||||||
|
// Wait for connection to be established through the chain
|
||||||
|
setTimeout(() => {
|
||||||
|
console.log('Forcefully destroying backend connections to create zombies');
|
||||||
|
|
||||||
|
// Get connection details before destruction
|
||||||
|
const beforeDetails = getConnectionDetails();
|
||||||
|
console.log(`Before destruction: Outer=${beforeDetails.outer.count}, Inner=${beforeDetails.inner.count}`);
|
||||||
|
|
||||||
|
// Destroy all backend connections without proper close events
|
||||||
|
backendConnections.forEach(conn => {
|
||||||
|
if (!conn.destroyed) {
|
||||||
|
// Remove all listeners to prevent proper cleanup
|
||||||
|
conn.removeAllListeners();
|
||||||
|
conn.destroy();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
// Also destroy the client socket abruptly
|
||||||
|
client1.removeAllListeners();
|
||||||
|
client1.destroy();
|
||||||
|
|
||||||
|
resolve();
|
||||||
|
}, 500);
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
// Check immediately after destruction
|
||||||
|
await new Promise(resolve => setTimeout(resolve, 100));
|
||||||
|
let details = getConnectionDetails();
|
||||||
|
console.log(`\nAfter destruction:`);
|
||||||
|
console.log(` Outer: ${details.outer.count} connections, ${details.outer.zombies.length} zombies, ${details.outer.halfZombies.length} half-zombies`);
|
||||||
|
console.log(` Inner: ${details.inner.count} connections, ${details.inner.zombies.length} zombies, ${details.inner.halfZombies.length} half-zombies`);
|
||||||
|
|
||||||
|
// Wait for inactivity check to run (should detect zombies)
|
||||||
|
console.log('\nWaiting for inactivity check to detect zombies...');
|
||||||
|
await new Promise(resolve => setTimeout(resolve, 2000));
|
||||||
|
|
||||||
|
details = getConnectionDetails();
|
||||||
|
console.log(`\nAfter first inactivity check:`);
|
||||||
|
console.log(` Outer: ${details.outer.count} connections, ${details.outer.zombies.length} zombies, ${details.outer.halfZombies.length} half-zombies`);
|
||||||
|
console.log(` Inner: ${details.inner.count} connections, ${details.inner.zombies.length} zombies, ${details.inner.halfZombies.length} half-zombies`);
|
||||||
|
|
||||||
|
console.log('\n--- Test 2: Create half-zombie by destroying only one socket ---');
|
||||||
|
|
||||||
|
// Clear backend connections array
|
||||||
|
backendConnections.length = 0;
|
||||||
|
|
||||||
|
const client2 = new net.Socket();
|
||||||
|
await new Promise<void>((resolve) => {
|
||||||
|
client2.connect(8590, 'localhost', () => {
|
||||||
|
console.log('Client2 connected to OuterProxy');
|
||||||
|
client2.write('GET / HTTP/1.1\r\nHost: test.com\r\n\r\n');
|
||||||
|
|
||||||
|
setTimeout(() => {
|
||||||
|
console.log('Creating half-zombie by destroying only outgoing socket on outer proxy');
|
||||||
|
|
||||||
|
// Access the connection records directly
|
||||||
|
const outerConnMgr = (outerProxy as any).connectionManager as ConnectionManager;
|
||||||
|
const outerRecords = Array.from((outerConnMgr as any).connectionRecords.values()) as IConnectionRecord[];
|
||||||
|
|
||||||
|
// Find the active connection and destroy only its outgoing socket
|
||||||
|
const activeRecord = outerRecords.find(r => !r.connectionClosed && r.outgoing && !r.outgoing.destroyed);
|
||||||
|
if (activeRecord && activeRecord.outgoing) {
|
||||||
|
console.log('Found active connection, destroying outgoing socket');
|
||||||
|
activeRecord.outgoing.removeAllListeners();
|
||||||
|
activeRecord.outgoing.destroy();
|
||||||
|
}
|
||||||
|
|
||||||
|
resolve();
|
||||||
|
}, 500);
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
// Check half-zombie state
|
||||||
|
await new Promise(resolve => setTimeout(resolve, 100));
|
||||||
|
details = getConnectionDetails();
|
||||||
|
console.log(`\nAfter creating half-zombie:`);
|
||||||
|
console.log(` Outer: ${details.outer.count} connections, ${details.outer.zombies.length} zombies, ${details.outer.halfZombies.length} half-zombies`);
|
||||||
|
console.log(` Inner: ${details.inner.count} connections, ${details.inner.zombies.length} zombies, ${details.inner.halfZombies.length} half-zombies`);
|
||||||
|
|
||||||
|
// Wait for 30-second grace period (simulated by multiple checks)
|
||||||
|
console.log('\nWaiting for half-zombie grace period (30 seconds simulated)...');
|
||||||
|
|
||||||
|
// Manually age the connection to trigger half-zombie cleanup
|
||||||
|
const outerConnMgr = (outerProxy as any).connectionManager as ConnectionManager;
|
||||||
|
const records = Array.from((outerConnMgr as any).connectionRecords.values()) as IConnectionRecord[];
|
||||||
|
records.forEach(record => {
|
||||||
|
if (!record.connectionClosed) {
|
||||||
|
// Age the connection by 35 seconds
|
||||||
|
record.incomingStartTime -= 35000;
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
// Trigger inactivity check
|
||||||
|
await new Promise(resolve => setTimeout(resolve, 2000));
|
||||||
|
|
||||||
|
details = getConnectionDetails();
|
||||||
|
console.log(`\nAfter half-zombie cleanup:`);
|
||||||
|
console.log(` Outer: ${details.outer.count} connections, ${details.outer.zombies.length} zombies, ${details.outer.halfZombies.length} half-zombies`);
|
||||||
|
console.log(` Inner: ${details.inner.count} connections, ${details.inner.zombies.length} zombies, ${details.inner.halfZombies.length} half-zombies`);
|
||||||
|
|
||||||
|
// Clean up client2 properly
|
||||||
|
if (!client2.destroyed) {
|
||||||
|
client2.destroy();
|
||||||
|
}
|
||||||
|
|
||||||
|
console.log('\n--- Test 3: Rapid zombie creation under load ---');
|
||||||
|
|
||||||
|
// Create multiple connections rapidly and destroy them
|
||||||
|
const rapidClients: net.Socket[] = [];
|
||||||
|
|
||||||
|
for (let i = 0; i < 5; i++) {
|
||||||
|
const client = new net.Socket();
|
||||||
|
rapidClients.push(client);
|
||||||
|
|
||||||
|
client.connect(8590, 'localhost', () => {
|
||||||
|
console.log(`Rapid client ${i} connected`);
|
||||||
|
client.write('GET / HTTP/1.1\r\nHost: test.com\r\n\r\n');
|
||||||
|
|
||||||
|
// Destroy after random delay
|
||||||
|
setTimeout(() => {
|
||||||
|
client.removeAllListeners();
|
||||||
|
client.destroy();
|
||||||
|
}, Math.random() * 500);
|
||||||
|
});
|
||||||
|
|
||||||
|
// Small delay between connections
|
||||||
|
await new Promise(resolve => setTimeout(resolve, 50));
|
||||||
|
}
|
||||||
|
|
||||||
|
// Wait a bit
|
||||||
|
await new Promise(resolve => setTimeout(resolve, 1000));
|
||||||
|
|
||||||
|
details = getConnectionDetails();
|
||||||
|
console.log(`\nAfter rapid connections:`);
|
||||||
|
console.log(` Outer: ${details.outer.count} connections, ${details.outer.zombies.length} zombies, ${details.outer.halfZombies.length} half-zombies`);
|
||||||
|
console.log(` Inner: ${details.inner.count} connections, ${details.inner.zombies.length} zombies, ${details.inner.halfZombies.length} half-zombies`);
|
||||||
|
|
||||||
|
// Wait for cleanup
|
||||||
|
console.log('\nWaiting for final cleanup...');
|
||||||
|
await new Promise(resolve => setTimeout(resolve, 3000));
|
||||||
|
|
||||||
|
details = getConnectionDetails();
|
||||||
|
console.log(`\nFinal state:`);
|
||||||
|
console.log(` Outer: ${details.outer.count} connections, ${details.outer.zombies.length} zombies, ${details.outer.halfZombies.length} half-zombies`);
|
||||||
|
console.log(` Inner: ${details.inner.count} connections, ${details.inner.zombies.length} zombies, ${details.inner.halfZombies.length} half-zombies`);
|
||||||
|
|
||||||
|
// Cleanup
|
||||||
|
await outerProxy.stop();
|
||||||
|
await innerProxy.stop();
|
||||||
|
backend.close();
|
||||||
|
|
||||||
|
// Verify all connections are cleaned up
|
||||||
|
console.log('\n--- Verification ---');
|
||||||
|
|
||||||
|
if (details.outer.count === 0 && details.inner.count === 0) {
|
||||||
|
console.log('✅ PASS: All zombie connections were cleaned up');
|
||||||
|
} else {
|
||||||
|
console.log('❌ FAIL: Some connections remain');
|
||||||
|
}
|
||||||
|
|
||||||
|
expect(details.outer.count).toEqual(0);
|
||||||
|
expect(details.inner.count).toEqual(0);
|
||||||
|
expect(details.outer.zombies.length).toEqual(0);
|
||||||
|
expect(details.inner.zombies.length).toEqual(0);
|
||||||
|
expect(details.outer.halfZombies.length).toEqual(0);
|
||||||
|
expect(details.inner.halfZombies.length).toEqual(0);
|
||||||
|
});
|
||||||
|
|
||||||
|
tap.start();
|
@ -456,6 +456,48 @@ export class ConnectionManager extends LifecycleComponent {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Also check ALL connections for zombie state (destroyed sockets but not cleaned up)
|
||||||
|
// This is critical for proxy chains where sockets can be destroyed without events
|
||||||
|
for (const [connectionId, record] of this.connectionRecords) {
|
||||||
|
if (!record.connectionClosed) {
|
||||||
|
const incomingDestroyed = record.incoming?.destroyed || false;
|
||||||
|
const outgoingDestroyed = record.outgoing?.destroyed || false;
|
||||||
|
|
||||||
|
// Check for zombie connections: both sockets destroyed but connection not cleaned up
|
||||||
|
if (incomingDestroyed && outgoingDestroyed) {
|
||||||
|
logger.log('warn', `Zombie connection detected: ${connectionId} - both sockets destroyed but not cleaned up`, {
|
||||||
|
connectionId,
|
||||||
|
remoteIP: record.remoteIP,
|
||||||
|
age: plugins.prettyMs(now - record.incomingStartTime),
|
||||||
|
component: 'connection-manager'
|
||||||
|
});
|
||||||
|
|
||||||
|
// Clean up immediately
|
||||||
|
this.cleanupConnection(record, 'zombie_cleanup');
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check for half-zombie: one socket destroyed
|
||||||
|
if (incomingDestroyed || outgoingDestroyed) {
|
||||||
|
const age = now - record.incomingStartTime;
|
||||||
|
// Give it 30 seconds grace period for normal cleanup
|
||||||
|
if (age > 30000) {
|
||||||
|
logger.log('warn', `Half-zombie connection detected: ${connectionId} - ${incomingDestroyed ? 'incoming' : 'outgoing'} destroyed`, {
|
||||||
|
connectionId,
|
||||||
|
remoteIP: record.remoteIP,
|
||||||
|
age: plugins.prettyMs(age),
|
||||||
|
incomingDestroyed,
|
||||||
|
outgoingDestroyed,
|
||||||
|
component: 'connection-manager'
|
||||||
|
});
|
||||||
|
|
||||||
|
// Clean up
|
||||||
|
this.cleanupConnection(record, 'half_zombie_cleanup');
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
// Process only connections that need checking
|
// Process only connections that need checking
|
||||||
for (const connectionId of connectionsToCheck) {
|
for (const connectionId of connectionsToCheck) {
|
||||||
const record = this.connectionRecords.get(connectionId);
|
const record = this.connectionRecords.get(connectionId);
|
||||||
|
Loading…
x
Reference in New Issue
Block a user